Components Overview¶
FlowTask components are the building blocks of your workflows. Each component performs a specific function, and components can be chained together to build complex data processing pipelines.
Component Categories¶
Data Input Components¶
- OpenWithPandas: Read various file formats (CSV, Excel, JSON) into pandas DataFrames
- HTTPService: Fetch data from HTTP/REST APIs
- DatabaseConnector: Connect to databases and execute queries
- ExecuteSQL: Connect to the main (PostgreSQL) database and execute queries
Data Processing Components¶
- TransformRows: Transform data rows using custom logic
- FilterRows: Filter data based on conditions
- tGroup: Group and aggregate data
- tJoin: Merge multiple datasets
Data Output Components¶
- PandasToFile: Save DataFrames to various file formats
- TableOutput: Insert data into databases
- PgVectorOutput: Save to PostgreSQL with vector embeddings
- PDFGenerator: Generate PDF documents from data
Web Scraping Components¶
- WebScraper: Generic web scraping functionality
- CompanyScraper: Specialized company information scraping
- ExtractHTML: Extract data from HTML using XPath or CSS selectors
Utility Components¶
- Echo: Simple output for testing and debugging
- FileRead: File system operations
- SendMail: Send emails with attachments
Component Structure¶
All components share a common structure:
class ComponentName(FlowComponent):
    """
    Component Description

    Overview:
        Detailed description of what the component does.

    Properties:
        property1 (type): Description of property1
        property2 (type): Description of property2

    Example:
    ```yaml
    ComponentName:
      property1: value1
      property2: value2
    ```
    """
Using Components¶
Basic Usage¶
Chaining Components¶
steps:
  - OpenWithPandas:
      filename: input.csv
  - TransformRows:
      operation: normalize
  - PandasToFile:
      filename: output.xlsx
      mime: application/vnd.ms-excel
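To illustrate how a chain like the one above hands data from step to step, here is a minimal sketch in plain Python. It is not FlowTask's runner: the `Step` base class, the `run_chain` helper, and the two toy steps are hypothetical stand-ins, and the assumption that each step receives the previous result through an `input` attribute is made only for this example.

```python
import asyncio


class Step:
    """Hypothetical stand-in for a component; not FlowTask's FlowComponent."""

    def __init__(self, **options):
        self.options = options
        self.input = None  # result of the previous step, set by run_chain

    async def start(self):
        return True

    async def run(self):
        raise NotImplementedError

    async def close(self):
        pass


class LoadNumbers(Step):
    async def run(self):
        # Pretend these values were read from a file.
        return list(self.options["values"])


class Normalize(Step):
    async def run(self):
        # Scale the previous step's values so the largest becomes 1.0.
        peak = max(self.input)
        return [value / peak for value in self.input]


async def run_chain(steps):
    """Run steps in order, handing each result to the next step."""
    result = None
    for step in steps:
        step.input = result
        await step.start()
        try:
            result = await step.run()
        finally:
            await step.close()  # always give the step a chance to clean up
    return result


chain_result = asyncio.run(run_chain([LoadNumbers(values=[2, 4, 8]), Normalize()]))
print(chain_result)  # [0.25, 0.5, 1.0]
```

The order of the list is the order of execution, which mirrors how the order of entries under `steps:` defines the pipeline.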
Component Dependencies¶
Components can access outputs from previous steps:
steps:
  - DataLoader:
      source: database
  - DataProcessor:
      input_data: "{{ previous.output }}"  # References previous step
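This page does not show how FlowTask resolves a placeholder such as `{{ previous.output }}`. As a rough sketch of the idea only, the following plain-Python example substitutes placeholders from a context dictionary. The `resolve` and `lookup` helpers and the shape of `context` are assumptions made for illustration, not part of FlowTask.

```python
import re

# Matches placeholders such as "{{ previous.output }}".
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, dotted_name):
    """Follow a dotted name like 'previous.output' through nested dicts."""
    value = context
    for key in dotted_name.split("."):
        value = value[key]
    return value


def resolve(value, context):
    """Replace placeholders in a step property with values from context."""
    if not isinstance(value, str):
        return value
    whole = PLACEHOLDER.fullmatch(value.strip())
    if whole:
        # The property is only a placeholder: return the object itself,
        # so lists or DataFrames are not turned into strings.
        return lookup(context, whole.group(1))
    # Otherwise substitute each placeholder as text.
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), value)


context = {"previous": {"output": [1, 2, 3]}}
resolved = resolve("{{ previous.output }}", context)
labelled = resolve("rows: {{ previous.output }}", context)
print(resolved)  # [1, 2, 3]
print(labelled)  # rows: [1, 2, 3]
```

Returning the referenced object unchanged when the whole property is a single placeholder keeps structured data intact between steps. Whether FlowTask behaves this way should be checked against its own documentation.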
Nested Components¶
Some components live in nested subdirectories, grouped by function:
components/
├── data/
│   ├── input/
│   │   └── OpenWithPandas.py
│   └── output/
│       └── PandasToFile.py
├── web/
│   ├── scrapers/
│   │   └── CompanyScraper.py
│   └── ExtractHTML.py
└── utils/
    └── TransformRows/
        └── TransformRows.py
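One consequence of a nested layout is that anything locating components has to search recursively. The sketch below shows one way to do that with `pathlib`. It rebuilds part of the layout shown above in a temporary directory and is not how FlowTask itself loads components; `find_components` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def find_components(root):
    """Map component names to their module files under root, at any depth."""
    found = {}
    for path in sorted(Path(root).rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private modules
        found[path.stem] = path.relative_to(root).as_posix()
    return found


# Build a throwaway copy of part of the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "components"
    for rel in (
        "data/input/OpenWithPandas.py",
        "web/ExtractHTML.py",
        "utils/TransformRows/TransformRows.py",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    components = find_components(root)

print(components["TransformRows"])  # utils/TransformRows/TransformRows.py
```

Keying the result by file stem assumes that each module is named after the component it defines, as in the tree above.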
Creating Custom Components¶
To create a custom component:
- Inherit from FlowComponent
- Implement required methods: start(), run(), close()
- Add proper docstring with examples
- Place in appropriate subdirectory
from flowtask.components.flow import FlowComponent


class MyCustomComponent(FlowComponent):
    """
    MyCustomComponent

    Overview:
        Does something amazing with your data.

    Example:
    ```yaml
    MyCustomComponent:
      setting1: value1
      setting2: value2
    ```
    """

    async def start(self):
        # Initialize component
        return True

    async def run(self):
        # Main component logic; process_data() is a placeholder for a
        # method you define yourself
        return self.process_data()

    async def close(self):
        # Cleanup
        pass
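To make the three lifecycle methods concrete, the following sketch fills in a small component and drives it by hand. It runs without FlowTask installed: the `FlowComponent` class here is a local stand-in, and the `UppercaseColumn` component, its `column` and `rows` properties, and the assumption that YAML properties arrive as keyword arguments are all hypothetical.

```python
import asyncio


class FlowComponent:
    """Stand-in base class so the sketch runs without FlowTask installed."""

    def __init__(self, **kwargs):
        # Assumption: YAML properties arrive as keyword arguments.
        for name, value in kwargs.items():
            setattr(self, name, value)


class UppercaseColumn(FlowComponent):
    """Hypothetical component: upper-cases one field in a list of dicts."""

    async def start(self):
        # Validate configuration before doing any work.
        if not hasattr(self, "column"):
            raise ValueError("UppercaseColumn requires a 'column' property")
        self.processed = 0
        return True

    async def run(self):
        return self.process_data()

    def process_data(self):
        # Build new rows rather than mutating the input.
        result = []
        for row in self.rows:
            result.append({**row, self.column: str(row[self.column]).upper()})
            self.processed += 1
        return result

    async def close(self):
        # Nothing to release here; a real component might close connections.
        self.rows = None


async def demo():
    component = UppercaseColumn(column="name", rows=[{"name": "ada", "id": 1}])
    await component.start()
    try:
        return await component.run()
    finally:
        await component.close()


output = asyncio.run(demo())
print(output)  # [{'name': 'ADA', 'id': 1}]
```

Checking configuration in `start()` means a misconfigured step fails before `run()` touches any data, and the `try`/`finally` in `demo()` ensures `close()` is called even if `run()` raises.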
Browse Components¶
Explore the API Reference to see detailed documentation for all available components, including:
- Complete parameter descriptions
- Usage examples with YAML configurations
- Return value specifications
- Error handling information
Each component page includes real YAML examples that you can copy and adapt for your own workflows.