
Tjoin

flowtask.components.tJoin

tJoin

tJoin(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent


Overview

The tJoin class is a component that joins two pandas DataFrames on the specified join conditions. It supports left, right, inner, outer, and anti-joins, and handles cases such as missing data, custom join conditions, and multi-source joins.
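The four standard join types have direct counterparts in the `how` argument of `pandas.merge`. The sketch below is plain pandas, not tJoin's own code, and it assumes (the page does not confirm it) that tJoin delegates to `merge`. The frames and the `store_number` key are illustrative. It also shows how missing data surfaces: unmatched rows get `NaN` in the other frame's columns.

```python
import pandas as pd

# Two small example frames sharing an illustrative "store_number" key.
left = pd.DataFrame({"store_number": [1, 2, 3], "sales": [100, 200, 300]})
right = pd.DataFrame({"store_number": [2, 3, 4], "region": ["N", "S", "E"]})

# pandas.merge selects the join type through its `how` argument.
results = {
    how: pd.merge(left, right, on="store_number", how=how)
    for how in ("left", "right", "inner", "outer")
}

for how, df in results.items():
    print(how, len(df))

# In the left join, store 1 has no match in `right`, so its "region" is NaN.
print(results["left"])
```

With these inputs the left and right joins return 3 rows each, the inner join returns 2, and the outer join returns 4.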

.. table:: Properties
   :widths: auto

   +-----------+----------+----------+--------------------------------------------------------------+
   | Name      | Required | Default  | Description                                                  |
   +===========+==========+==========+==============================================================+
   | df1       | Yes      |          | The left DataFrame to join.                                  |
   +-----------+----------+----------+--------------------------------------------------------------+
   | df2       | Yes      |          | The right DataFrame to join.                                 |
   +-----------+----------+----------+--------------------------------------------------------------+
   | type      | No       | "left"   | The type of join to perform. Supported values are "left",    |
   |           |          |          | "right", "inner", "outer", and "anti-join". With             |
   |           |          |          | "anti-join", the result contains the rows of df1 that        |
   |           |          |          | have no match in df2.                                        |
   +-----------+----------+----------+--------------------------------------------------------------+
   | depends   | Yes      |          | A list of dependencies defining the sources for the join.    |
   +-----------+----------+----------+--------------------------------------------------------------+
   | operator  | No       | "and"    | The logical operator used to combine join conditions.        |
   +-----------+----------+----------+--------------------------------------------------------------+
   | fk        | No       |          | The foreign key or list of keys to join on.                  |
   +-----------+----------+----------+--------------------------------------------------------------+
   | no_copy   | No       | True     | When True, the DataFrames are not copied before the join.    |
   +-----------+----------+----------+--------------------------------------------------------------+
   | join_with | No       |          | A list of additional keys to use in the join conditions.     |
   +-----------+----------+----------+--------------------------------------------------------------+
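The anti-join option keeps the rows of `df1` that have no match in `df2`. A common way to express that in pandas is a left merge with `indicator=True`, filtered to `left_only`. This is a minimal sketch that assumes the join runs on the `fk` columns; `anti_join` is a hypothetical helper, not a flowtask function.

```python
import pandas as pd


def anti_join(df1: pd.DataFrame, df2: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Return the rows of df1 whose keys have no match in df2 (hypothetical helper)."""
    # Keep only df2's key columns, deduplicated, so no df2 data columns
    # (or _x/_y suffixes) end up in the result.
    right_keys = df2[keys].drop_duplicates()
    merged = df1.merge(right_keys, on=keys, how="left", indicator=True)
    # "_merge" is a temporary column added by indicator=True; drop it before returning.
    result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return result.reset_index(drop=True)


df1 = pd.DataFrame({"store_number": [1, 2, 3], "sales": [100, 200, 300]})
df2 = pd.DataFrame({"store_number": [2, 3, 3], "region": ["N", "S", "S"]})
print(anti_join(df1, df2, ["store_number"]))
```

Here only store 1 survives, because stores 2 and 3 both appear in `df2`. Passing a list of keys mirrors `fk` accepting either a single key or several.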

Return

The methods in this class manage the full join workflow: initialization, execution, and result handling. The component manages the temporary columns it creates during the join and records metrics on the joined rows.
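To illustrate temporary-column handling and row metrics, the sketch below uses pandas' `indicator` column as a temporary marker, tallies the match outcomes, and then drops the marker before returning. `join_with_metrics` and its metric names are hypothetical; the page does not say which metrics tJoin actually records.

```python
import pandas as pd


def join_with_metrics(df1: pd.DataFrame, df2: pd.DataFrame, keys: list[str], how: str = "left"):
    """Join df1 and df2 on keys and report simple row metrics (hypothetical helper)."""
    # indicator=True adds a temporary categorical "_merge" column holding
    # "both", "left_only", or "right_only" for each output row.
    merged = pd.merge(df1, df2, on=keys, how=how, indicator=True)
    counts = merged["_merge"].value_counts()
    metrics = {
        "joined_rows": len(merged),
        "matched": int(counts.get("both", 0)),
        "left_only": int(counts.get("left_only", 0)),
        "right_only": int(counts.get("right_only", 0)),
    }
    # Drop the temporary column so the caller only sees the joined data.
    return merged.drop(columns="_merge"), metrics


df1 = pd.DataFrame({"store_number": [1, 2, 3], "sales": [100, 200, 300]})
df2 = pd.DataFrame({"store_number": [2, 3, 4], "region": ["N", "S", "E"]})
joined, metrics = join_with_metrics(df1, df2, ["store_number"], how="outer")
print(metrics)
```

For this outer join the sketch reports 4 joined rows: 2 matched, 1 present only in `df1`, and 1 present only in `df2`.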



Example:

```yaml
tJoin:
  depends:
  - TransformRows_2
  - QueryToPandas_3
  type: left
  fk:
  - store_number
  args:
    validate: many_to_many
```
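The `args` block appears to hold extra keyword arguments for the underlying merge, and `validate: many_to_many` matches the `validate` parameter of `pandas.merge`. That pass-through is an assumption, since the page does not state it. The sketch below shows what the option does in plain pandas: `"many_to_many"` performs no uniqueness check, while a stricter value such as `"one_to_one"` raises `MergeError` on duplicate keys.

```python
import pandas as pd
from pandas.errors import MergeError

# Store 1 appears twice on the left, so the keys are not unique there.
left = pd.DataFrame({"store_number": [1, 1, 2], "sales": [10, 20, 30]})
right = pd.DataFrame({"store_number": [1, 2], "region": ["N", "S"]})

# "many_to_many" skips the uniqueness check, so duplicate keys are allowed.
ok = pd.merge(left, right, on="store_number", how="left", validate="many_to_many")

# "one_to_one" requires unique keys on both sides and fails here.
try:
    pd.merge(left, right, on="store_number", how="left", validate="one_to_one")
    raised = False
except MergeError:
    raised = True

print(len(ok), raised)
```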

start

async start(**kwargs)

Obtain the input pandas DataFrames.