AddDataset
flowtask.components.AddDataset
Bases: DBSupport, TemplateSupport, FlowComponent
AddDataset Component
Overview
This component joins two pandas DataFrames based on specified criteria.
It supports various join types and handles cases where one of the DataFrames might be empty.
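As a rough illustration of the behaviour described above, the sketch below joins two DataFrames with `pandas.merge` and tolerates an empty second DataFrame. The helper name `join_frames` and the fallback of returning the main DataFrame unchanged are assumptions made for illustration; the source does not say how AddDataset handles the empty case internally.

```python
import pandas as pd


def join_frames(main, other, on, how="left"):
    """Hypothetical helper: join two DataFrames, tolerating an empty second frame."""
    if how not in ("left", "inner"):
        raise ValueError(f"unsupported join type: {how}")
    if other.empty:
        # Assumed fallback: nothing to join, so return a copy of the main frame.
        return main.copy()
    return pd.merge(main, other, on=on, how=how)


main = pd.DataFrame({"formid": [1, 2, 3], "answer": ["a", "b", "c"]})
other = pd.DataFrame({"formid": [1, 2], "form_name": ["Audit", "Survey"]})

joined = join_frames(main, other, on=["formid"], how="left")
print(joined)
print("JOINED_ROWS:", len(joined))
```

With a left join every row of the main DataFrame is kept, so the unmatched `formid` ends up with a missing `form_name`.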
.. table:: Properties
   :widths: auto

   +---------------+----------+-----------------------------------------------------------------------------------+
   | Name          | Required | Summary                                                                           |
   +===============+==========+===================================================================================+
   | fields        | Yes      | List of field names to retrieve from the second dataset                           |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | dataset       | Yes      | Name of the second dataset to retrieve                                            |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | datasource    | No       | Source of the second dataset ("datasets" or "vision") (default: "datasets")       |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | join          | No       | List of columns to use for joining the DataFrames                                 |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | type          | No       | Type of join to perform (left, inner) (default: left)                             |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | no_copy       | No       | If True, modifies original DataFrames instead of creating a copy (default: False) |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | distinct      | No       | If True, retrieves distinct rows based on join columns (default: False)           |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | operator      | No       | Operator to use for joining rows (currently only "and" supported)                 |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | join_with     | No       | List of columns for a series of left joins (if main DataFrame has unmatched rows) |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | datatypes     | No       | Dictionary specifying data types for columns in the second dataset                |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | infer_types   | No       | If True, attempts to infer better data types for object columns (default: False)  |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | to_string     | No       | If True, attempts to convert object columns to strings during data type           |
   |               |          | conversion (default: True)                                                        |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | as_objects    | No       | If True, creates resulting DataFrame with all columns as objects (default: False) |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | drop_empty    | No       | If True, drops columns with only missing values after join (default: False)       |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | dropna        | No       | List of columns to remove rows with missing values after join                     |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | clean_strings | No       | If True, replaces missing values in object/string columns with empty strings      |
   |               |          | (default: False)                                                                  |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | Group         | No       | Usually FormData                                                                  |
   +---------------+----------+-----------------------------------------------------------------------------------+
   | skipError     | No       | Expected values: skip, Log Errors, Enforce                                        |
   +---------------+----------+-----------------------------------------------------------------------------------+

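The `drop_empty`, `dropna` and `clean_strings` options in the table correspond closely to standard pandas operations. The sketch below shows one plausible mapping; the helper name `post_join_cleanup`, the order in which the steps run, and the sample data are assumptions for illustration, not the component's confirmed implementation.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def post_join_cleanup(df, drop_empty=False, dropna=None, clean_strings=False):
    """Hypothetical helper mirroring the drop_empty, dropna and clean_strings options."""
    out = df.copy()
    if drop_empty:
        # Drop columns in which every value is missing.
        out = out.dropna(axis="columns", how="all")
    if dropna:
        # Drop rows that have a missing value in any of the listed columns.
        out = out.dropna(subset=dropna)
    if clean_strings:
        # Replace missing values with "" in object/string columns only.
        for col in out.columns:
            if is_object_dtype(out[col]) or is_string_dtype(out[col]):
                out[col] = out[col].fillna("")
    return out


# Made-up data standing in for the result of a join.
df = pd.DataFrame({
    "formid": [1, 2, 3],
    "form_name": ["Audit", "Survey", None],
    "score": [10.0, None, 30.0],
    "unused": [float("nan")] * 3,
})

cleaned = post_join_cleanup(df, drop_empty=True, dropna=["score"], clean_strings=True)
print(cleaned)
```

In this sample, `unused` is dropped because it holds only missing values, the row without a `score` is removed, and the remaining missing `form_name` becomes an empty string.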
Returns the joined DataFrame and a metric ("JOINED_ROWS")
representing the number of rows in the result.
Example:
```yaml
AddDataset:
  datasource: banco_chile
  dataset: vw_form_metadata
  distinct: true
  type: left
  fields:
    - formid
    - form_name
    - column_name
    - description as question
  join:
    - formid
    - column_name
```
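To make the configuration above more concrete, here is a pandas sketch of what such a step might amount to. Everything in it is illustrative: the sample rows are made up, `description as question` is interpreted as a column rename, and `distinct: true` is interpreted as de-duplicating on the join columns. The source confirms neither interpretation in detail.

```python
import pandas as pd

# Made-up stand-in for the rows of vw_form_metadata.
metadata = pd.DataFrame({
    "formid": [1, 1, 1, 2],
    "form_name": ["Audit", "Audit", "Audit", "Survey"],
    "column_name": ["q1", "q1", "q2", "q1"],
    "description": ["Store open?", "Store open?", "Shelf stocked?", "Rating"],
})

# Made-up main DataFrame arriving from the previous step.
main = pd.DataFrame({
    "formid": [1, 1, 2, 3],
    "column_name": ["q1", "q2", "q1", "q1"],
    "answer": ["yes", "no", "5", "n/a"],
})

join_cols = ["formid", "column_name"]  # join:

second = (
    metadata[["formid", "form_name", "column_name", "description"]]  # fields:
    .rename(columns={"description": "question"})  # "description as question"
    .drop_duplicates(subset=join_cols)            # distinct: true (assumed semantics)
)

result = main.merge(second, on=join_cols, how="left")  # type: left
joined_rows = len(result)                               # the JOINED_ROWS metric
print(result)
print("JOINED_ROWS:", joined_rows)
```

Because the join is a left join, the row with `formid` 3 is kept even though it has no match in the second dataset; its `form_name` and `question` are left missing.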