Tgroup

flowtask.components.tGroup

tGroup

tGroup(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Overview

    The tGroup class is a component that performs a group-by operation on a DataFrame using the specified columns.
    It returns the unique combinations of the group-by columns and can optionally apply aggregation functions to each group.

.. table:: Properties
    :widths: auto

    +----------------+----------+------------------------------------------------------------------+
    | Name           | Required | Summary                                                          |
    +================+==========+==================================================================+
    | group_by       | Yes      | List of columns to group by.                                     |
    +----------------+----------+------------------------------------------------------------------+
    | columns        | No       | List of columns to retain in the result DataFrame. If None,      |
    |                |          | all columns in `group_by` are returned.                          |
    +----------------+----------+------------------------------------------------------------------+
    | agg            | No       | List of aggregation functions to apply to the grouped data.      |
    |                |          | Each aggregation should be a dictionary with the column name,    |
    |                |          | aggregation function, and an optional alias.                     |
    +----------------+----------+------------------------------------------------------------------+

Returns

    This component returns a DataFrame with one row per unique combination of the specified `group_by` columns.
    If `columns` is defined, only those columns are included in the result. When debugging is enabled, the
    component reports the data type of each column. Any error raised during grouping is logged and re-raised
    as an exception.

Example:

```
- tGroup:
    group_by:
    - store_id
    - formatted_address
    - state_code
    - latitude
    - longitude
    - store_name
    - city
```
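
To make the behavior described above concrete, here is a minimal pandas sketch of what "unique rows based on the `group_by` columns" could look like. It is not the component's actual implementation: the helper name `group_unique`, the sample data, and the choice of `drop_duplicates` are assumptions made for illustration only.

```python
import pandas as pd


def group_unique(df, group_by, columns=None):
    """Hypothetical helper: one row per unique combination of `group_by`."""
    # Assumption: when `columns` is None, only the group_by columns are kept,
    # as described in the Properties table.
    keep = columns if columns is not None else group_by
    # Keep the first row seen for each combination of the group_by columns.
    unique_rows = df.drop_duplicates(subset=group_by)
    return unique_rows[keep].reset_index(drop=True)


# Made-up sample data: two rows share the same (store_id, city) pair.
df = pd.DataFrame(
    {
        "store_id": [1, 1, 2],
        "city": ["Miami", "Miami", "Austin"],
        "sales": [10, 20, 30],
    }
)

result = group_unique(df, ["store_id", "city"])
only_ids = group_unique(df, ["store_id", "city"], columns=["store_id"])
print(result)
```

With this sample, `result` contains two rows (one per store) and only the `store_id` and `city` columns.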

Aggregation example:

```
- tGroup:
    group_by:
    - store_id
    - formatted_address
    - state_code
    - latitude
    - longitude
    - store_name
    - city
    agg:
        - store_id: distinct
          alias: unique_store_ids
        - latitude: mean
          alias: avg_latitude
        - longitude: mean
          alias: avg_longitude
        - state_code: count
```
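
The `agg` entries can be read as "column: function" pairs with an optional `alias` for the output column. The sketch below shows one way such a list might be translated into pandas named aggregations. Everything in it is an assumption rather than the component's documented behavior: the helper `group_agg`, the mapping of `distinct` to pandas `nunique`, and the default output name `<column>_<function>` when no alias is given. It also groups by a single column to keep the made-up sample data short.

```python
import pandas as pd

# Assumed mapping from the YAML function names to pandas aggregation names.
AGG_FUNCS = {"distinct": "nunique", "mean": "mean", "count": "count"}


def group_agg(df, group_by, agg):
    """Hypothetical helper: apply a list of {column: func, alias: name} specs."""
    named = {}
    for spec in agg:
        spec = dict(spec)  # copy so the caller's dictionary is not modified
        alias = spec.pop("alias", None)
        # After removing the alias, the remaining single item is column -> function.
        column, func = next(iter(spec.items()))
        # Assumption: fall back to "<column>_<function>" when no alias is given.
        out_name = alias or f"{column}_{func}"
        named[out_name] = pd.NamedAgg(column=column, aggfunc=AGG_FUNCS.get(func, func))
    return df.groupby(group_by, as_index=False).agg(**named)


# Made-up sample data.
df = pd.DataFrame(
    {
        "state_code": ["FL", "FL", "FL", "TX"],
        "store_id": [1, 1, 2, 3],
        "latitude": [10.0, 20.0, 30.0, 40.0],
    }
)

result = group_agg(
    df,
    group_by=["state_code"],
    agg=[
        {"store_id": "distinct", "alias": "unique_store_ids"},
        {"latitude": "mean", "alias": "avg_latitude"},
        {"store_id": "count"},
    ],
)
print(result)
```

Building the specs into `pd.NamedAgg` objects keeps the output column names explicit, which mirrors the role the `alias` key plays in the YAML.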