
Components Overview

FlowTask components are the building blocks of your workflows. Each component performs a specific function, and components can be chained together to build complex data processing pipelines.

Component Categories

Data Input Components

  • OpenWithPandas: Read various file formats (CSV, Excel, JSON) into pandas DataFrames
  • HTTPService: Fetch data from HTTP/REST APIs
  • DatabaseConnector: Connect to databases and execute queries
  • ExecuteSQL: Connect to the main PostgreSQL database and execute queries

Data Processing Components

  • TransformRows: Transform data rows using custom logic
  • FilterRows: Filter data based on conditions
  • tGroup: Group and aggregate data
  • tJoin: Merge multiple datasets

Data Output Components

  • PandasToFile: Save DataFrames to various file formats
  • TableOutput: Insert data into databases
  • PgVectorOutput: Save to PostgreSQL with vector embeddings
  • PDFGenerator: Generate PDF documents from data

Web Scraping Components

  • WebScraper: Generic web scraping functionality
  • CompanyScraper: Specialized company information scraping
  • ExtractHTML: Extract data from HTML using XPath or CSS selectors

Utility Components

  • Echo: Simple output for testing and debugging
  • FileRead: File system operations
  • SendMail: Send emails with attachments

Component Structure

All components share a common structure:

from flowtask.components.flow import FlowComponent

class ComponentName(FlowComponent):
    """
    Component Description

    Overview:
        Detailed description of what the component does.

    Properties:
        property1 (type): Description of property1
        property2 (type): Description of property2

    Example:
        ```yaml
        ComponentName:
            property1: value1
            property2: value2
        ```
    """

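Because every component docstring follows this layout, its YAML example can be pulled out programmatically, which is one way per-component documentation pages could be built. The helper below is a hypothetical sketch, not part of FlowTask; it assumes the Example: section holds a single yaml-fenced block.

```python
import inspect
import re
import textwrap
from typing import Optional

# "`{3}" matches a run of three backticks, i.e. the fence around the YAML
_EXAMPLE_RE = re.compile(r"Example:\s*`{3}yaml\n(.*?)`{3}", re.DOTALL)


def extract_yaml_example(component_cls: type) -> Optional[str]:
    """Return the YAML snippet from a component's Example: section, or None.

    Hypothetical helper (not part of FlowTask): assumes the docstring layout
    shown above, with one fenced yaml block under the Example: heading.
    """
    doc = inspect.getdoc(component_cls) or ""
    match = _EXAMPLE_RE.search(doc)
    if match is None:
        return None
    # Remove the indentation the YAML carried inside the docstring
    return textwrap.dedent(match.group(1)).strip()
```

Applied to the ComponentName template, this would return the three-line ComponentName mapping with its two properties.
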
Using Components

Basic Usage

steps:
  - ComponentName:
      parameter1: value1
      parameter2: value2
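
Once a YAML loader (for example PyYAML's yaml.safe_load) parses the steps block, each list item becomes a one-key mapping from the component name to its parameters. The sketch below walks that structure; iter_steps is a hypothetical helper, and the dict literal stands in for the loader's output so the example needs no third-party package.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_steps(task: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (component_name, parameters) for each entry under "steps"."""
    for step in task.get("steps", []):
        # Each YAML list item is a single-key mapping: {ComponentName: {...}}
        if not isinstance(step, dict) or len(step) != 1:
            raise ValueError(f"Malformed step: {step!r}")
        name, params = next(iter(step.items()))
        # A component listed with no parameters parses to None
        yield name, params or {}


# What a YAML loader returns for the Basic Usage example
task = {
    "steps": [
        {"ComponentName": {"parameter1": "value1", "parameter2": "value2"}},
    ]
}

for name, params in iter_steps(task):
    print(name, params)
```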

Chaining Components

steps:
  - OpenWithPandas:
      filename: input.csv
  - TransformRows:
      operation: normalize
  - PandasToFile:
      filename: output.xlsx
      mime: application/vnd.ms-excel
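
Conceptually, chaining means each step's result becomes the next step's input. The sketch below models only that idea and is not FlowTask's engine: run_chain, load, normalize, and save are hypothetical stand-ins for OpenWithPandas, TransformRows, and PandasToFile, and plain lists of dicts replace DataFrames to keep it dependency-free.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional

# A "step" here is any async callable taking the previous step's output
Step = Callable[[Optional[Any]], Awaitable[Any]]


async def run_chain(steps: List[Step]) -> Any:
    """Run steps in order, passing each result to the next step."""
    data: Optional[Any] = None  # the first step has no upstream input
    for step in steps:
        data = await step(data)
    return data


# Hypothetical stand-ins for OpenWithPandas -> TransformRows -> PandasToFile
async def load(_: Optional[Any]) -> List[dict]:
    return [{"name": " Alice "}, {"name": "BOB"}]


async def normalize(rows: List[dict]) -> List[dict]:
    return [{"name": row["name"].strip().title()} for row in rows]


async def save(rows: List[dict]) -> int:
    # A real output step would write a file; this one reports the row count
    return len(rows)


print(asyncio.run(run_chain([load, normalize, save])))  # prints 2
```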

Component Dependencies

Components can access outputs from previous steps:

steps:
  - DataLoader:
      source: database
  - DataProcessor:
      input_data: "{{ previous.output }}"  # References previous step
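
The reference syntax above suggests template-style substitution. One way such a reference could be resolved is sketched below; resolve_params, the context dictionary, and the whole-value matching rule are assumptions for illustration, not FlowTask's documented behavior.

```python
import re
from typing import Any, Dict

# Matches a whole value of the form "{{ scope.attribute }}"
PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\.(\w+)\s*\}\}$")


def resolve_params(params: Dict[str, Any], context: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Substitute placeholder values using a scope -> attributes mapping."""
    resolved: Dict[str, Any] = {}
    for key, value in params.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        if match:
            scope, attribute = match.groups()
            # Raises KeyError if the scope or attribute is unknown
            resolved[key] = context[scope][attribute]
        else:
            resolved[key] = value  # ordinary values pass through unchanged
    return resolved


params = {"input_data": "{{ previous.output }}", "limit": 10}
context = {"previous": {"output": [1, 2, 3]}}
print(resolve_params(params, context))  # prints {'input_data': [1, 2, 3], 'limit': 10}
```

Matching the entire value, rather than interpolating into a larger string, keeps the referenced object's type: a list stays a list instead of becoming text.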

Nested Components

Some components are grouped into subdirectories by function:

components/
├── data/
│   ├── input/
│   │   └── OpenWithPandas.py
│   └── output/
│       └── PandasToFile.py
├── web/
│   ├── scrapers/
│   │   └── CompanyScraper.py
│   └── ExtractHTML.py
└── utils/
    └── TransformRows/
        └── TransformRows.py
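
Because components sit at different depths in this tree, a loader cannot assume a fixed path. A hypothetical lookup, assuming each component is defined in a file named after its class (as in the layout above), can search recursively:

```python
from pathlib import Path
from typing import Optional


def find_component(root: Path, name: str) -> Optional[Path]:
    """Return the first <name>.py found anywhere under root, or None.

    Hypothetical lookup: assumes one module per component, named after
    the component class. Sorting makes the result deterministic if the
    same file name appears in more than one subdirectory.
    """
    matches = sorted(root.rglob(f"{name}.py"))
    return matches[0] if matches else None
```

Run against the tree above, find_component(Path("components"), "CompanyScraper") would return components/web/scrapers/CompanyScraper.py.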

Creating Custom Components

To create a custom component:

  1. Inherit from FlowComponent
  2. Implement the required async methods: start(), run(), and close()
  3. Add a docstring with an overview, properties, and a YAML example
  4. Place the module in the appropriate subdirectory

from flowtask.components.flow import FlowComponent

class MyCustomComponent(FlowComponent):
    """
    MyCustomComponent

    Overview:
        Does something amazing with your data.

    Example:
        ```yaml
        MyCustomComponent:
            setting1: value1
            setting2: value2
        ```
    """

    async def start(self):
        # Initialize component
        return True

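    async def run(self):
        # Main component logic; process_data() is a hypothetical helper
        # defined below, not a FlowComponent method
        return self.process_data()

    def process_data(self):
        # Replace with your component's actual transformation
        return None

    async def close(self):
        # Release any resources acquired in start()
        pass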
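
When a workflow runs, the task engine is expected to call these methods itself; the sketch below only illustrates the start(), run(), close() order from the list above. It replaces FlowComponent with a local stub so it runs without FlowTask installed, and execute() and UppercaseNames are hypothetical names, not part of the library.

```python
import asyncio
from typing import Any, List


class FlowComponent:
    """Stand-in for flowtask's FlowComponent so this sketch runs on its own."""


async def execute(component: Any) -> Any:
    """Drive a component through start() -> run() -> close().

    close() sits in a finally block, so cleanup runs even when start()
    fails or run() raises.
    """
    try:
        if not await component.start():
            raise RuntimeError(f"{type(component).__name__} failed to start")
        return await component.run()
    finally:
        await component.close()


class UppercaseNames(FlowComponent):
    """Hypothetical example component."""

    def __init__(self, names: List[str]):
        self.names = names
        self.closed = False

    async def start(self):
        return True

    async def run(self):
        return [name.upper() for name in self.names]

    async def close(self):
        self.closed = True


component = UppercaseNames(["ada", "linus"])
print(asyncio.run(execute(component)))  # prints ['ADA', 'LINUS']
```

Putting start() inside the try block is a deliberate choice: a component that opens a connection and then fails a later setup step still gets close() called.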

Browse Components

Explore the API Reference to see detailed documentation for all available components, including:

  • Complete parameter descriptions
  • Usage examples with YAML configurations
  • Return value specifications
  • Error handling information

Each component page includes real YAML examples that you can copy and adapt for your own workflows.