Adddataset

flowtask.components.AddDataset

AddDataset

AddDataset(loop=None, job=None, stat=None, **kwargs)

Bases: DBSupport, TemplateSupport, FlowComponent

AddDataset Component

Overview

This component joins two pandas DataFrames based on specified criteria.
It supports various join types and handles cases where one of the DataFrames might be empty.

.. table:: Properties
   :widths: auto

   =============  ========  ====================================================================================================
   Name           Required  Summary
   =============  ========  ====================================================================================================
   fields         Yes       List of field names to retrieve from the second dataset
   dataset        Yes       Name of the second dataset to retrieve
   datasource     No        Source of the second dataset ("datasets" or "vision") (default: "datasets")
   join           No        List of columns to use for joining the DataFrames
   type           No        Type of join to perform (left, inner) (default: left)
   no_copy        No        If True, modifies original DataFrames instead of creating a copy (default: False)
   distinct       No        If True, retrieves distinct rows based on join columns (default: False)
   operator       No        Operator to use for joining rows (currently only "and" supported)
   join_with      No        List of columns for a series of left joins (if main DataFrame has unmatched rows)
   datatypes      No        Dictionary specifying data types for columns in the second dataset
   infer_types    No        If True, attempts to infer better data types for object columns (default: False)
   to_string      No        If True, attempts to convert object columns to strings during data type conversion (default: True)
   as_objects     No        If True, creates resulting DataFrame with all columns as objects (default: False)
   drop_empty     No        If True, drops columns with only missing values after join (default: False)
   dropna         No        List of columns to remove rows with missing values after join
   clean_strings  No        If True, replaces missing values in object/string columns with empty strings (default: False)
   Group          No        Usually FormData
   skipError      No        Expected values: skip, Log Errors, Enforce
   =============  ========  ====================================================================================================

Returns the joined DataFrame and a metric ("JOINED_ROWS")
representing the number of rows in the result.
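
The post-join options listed in the table (`drop_empty`, `dropna`, `clean_strings`) resemble common pandas operations. The sketch below is only an illustration of that reading, not the component's implementation: the helper name `post_join_cleanup` and the sample data are hypothetical, and the mapping of each option to a pandas call is an assumption based on the summaries above.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def post_join_cleanup(df, drop_empty=False, dropna=None, clean_strings=False):
    """Hypothetical helper (not part of flowtask) mirroring the documented options."""
    out = df.copy()
    if drop_empty:
        # Assumed meaning: drop columns that contain only missing values.
        out = out.dropna(axis="columns", how="all")
    if dropna:
        # Assumed meaning: drop rows with a missing value in any listed column.
        out = out.dropna(subset=dropna)
    if clean_strings:
        # Assumed meaning: fill missing values in object/string columns with "".
        text_cols = [
            col
            for col in out.columns
            if is_object_dtype(out[col]) or is_string_dtype(out[col])
        ]
        if text_cols:
            out = out.fillna({col: "" for col in text_cols})
    return out


# Made-up joined result used only to exercise the helper.
joined = pd.DataFrame(
    {
        "formid": [1, 2, 3],
        "question": ["Store open?", None, None],
        "score": [10.0, None, 7.5],
        "unused": [None, None, None],
    }
)

cleaned = post_join_cleanup(
    joined, drop_empty=True, dropna=["score"], clean_strings=True
)
```

With this sample data, the all-missing `unused` column is dropped, the row without a `score` is removed, and the remaining missing `question` becomes an empty string.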



Example:

```yaml
AddDataset:
  datasource: banco_chile
  dataset: vw_form_metadata
  distinct: true
  type: left
  fields:
  - formid
  - form_name
  - column_name
  - description as question
  join:
  - formid
  - column_name
```
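
For readers more familiar with pandas than with the YAML configuration, the following sketch shows roughly what a configuration like the one above is expected to do. It is an approximation under stated assumptions, not the component's code: both DataFrames are invented, `description as question` is read as a column alias, and `distinct: true` is read as one row per combination of the join columns.

```python
import pandas as pd

# Hypothetical main DataFrame arriving from the previous component.
main = pd.DataFrame(
    {
        "formid": [1, 1, 2],
        "column_name": ["q1", "q2", "q1"],
        "answer": ["yes", "no", "yes"],
    }
)

# Hypothetical contents of the second dataset (vw_form_metadata).
metadata = pd.DataFrame(
    {
        "formid": [1, 1, 1],
        "form_name": ["Audit", "Audit", "Audit"],
        "column_name": ["q1", "q1", "q2"],
        "description": ["Store open?", "Store open?", "Shelf stocked?"],
    }
)

join_cols = ["formid", "column_name"]

# fields: keep the requested columns; the alias is an assumed interpretation.
second = metadata[["formid", "form_name", "column_name", "description"]].rename(
    columns={"description": "question"}
)

# distinct: true, assumed here to mean one row per join-key combination.
second = second.drop_duplicates(subset=join_cols)

# type: left keeps every row of the main DataFrame, matched or not.
joined = main.merge(second, how="left", on=join_cols)

# The component reports the row count of the result as a metric.
metrics = {"JOINED_ROWS": len(joined)}
```

Because the join is a left join, the row for `formid` 2 has no match in the second dataset and keeps missing values in `form_name` and `question`.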

start async

start(**kwargs)

Obtain the pandas DataFrame.