Tgroup
flowtask.components.tGroup
tGroup
Bases: FlowComponent
Overview
The tGroup component performs a group-by operation on a DataFrame using the specified columns.
It returns the unique combinations of the `group_by` columns and, when `agg` is given, applies aggregation functions to each group, which makes it useful for summarizing data.
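Conceptually, grouping without any aggregation reduces a DataFrame to the distinct combinations of its key values. The sketch below illustrates that idea with pandas. It is not tGroup's implementation, and the helper name `unique_groups` is hypothetical.

```python
import pandas as pd


def unique_groups(df: pd.DataFrame, group_by: list[str]) -> pd.DataFrame:
    """Hypothetical helper: one row per distinct combination of group_by values."""
    # groupby(...).size() yields one row per group plus a "size" count column;
    # dropping the count leaves only the key combinations. Note that pandas
    # drops groups whose keys are NaN by default.
    return df.groupby(group_by, as_index=False).size().drop(columns="size")


stores = pd.DataFrame(
    {
        "store_id": [1, 1, 2],
        "city": ["Austin", "Austin", "Dallas"],
        "sales": [10, 20, 30],
    }
)
print(unique_groups(stores, ["store_id", "city"]))
```

The two Austin rows for store 1 collapse into one, and the `sales` column is dropped because it is not a grouping key.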
.. table:: Properties
   :widths: auto

   +----------+----------+-------------------------------------------------------------+
   | Name     | Required | Summary                                                     |
   +==========+==========+=============================================================+
   | group_by | Yes      | List of columns to group by.                                |
   +----------+----------+-------------------------------------------------------------+
   | columns  | No       | List of columns to retain in the result DataFrame. If None, |
   |          |          | all columns in `group_by` are returned.                     |
   +----------+----------+-------------------------------------------------------------+
   | agg      | No       | List of aggregation functions to apply to the grouped data. |
   |          |          | Each entry is a dictionary that maps a column name to an    |
   |          |          | aggregation function, with an optional `alias` key.         |
   +----------+----------+-------------------------------------------------------------+
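Each `agg` entry can be read as a pandas named aggregation: output name, source column and function. The sketch below is a hypothetical translation, not the component's code. Two choices are assumptions made only for illustration: `distinct` is mapped to pandas `nunique`, and an entry without `alias` gets the output name `<column>_<function>`.

```python
import pandas as pd

# Assumption: spec function names that differ from pandas aggregation names.
FUNC_MAP = {"distinct": "nunique"}


def build_named_aggs(agg_spec: list[dict]) -> dict[str, tuple[str, str]]:
    """Convert [{"col": "func", "alias": "name"}, ...] into pandas named aggregations."""
    named = {}
    for entry in agg_spec:
        # Every key except "alias" names a column; exactly one is expected.
        pairs = [(key, value) for key, value in entry.items() if key != "alias"]
        if len(pairs) != 1:
            raise ValueError(f"expected exactly one column per entry: {entry!r}")
        column, func = pairs[0]
        # Hypothetical default output name when no alias is given.
        output = entry.get("alias") or f"{column}_{func}"
        named[output] = (column, FUNC_MAP.get(func, func))
    return named


def group_and_aggregate(
    df: pd.DataFrame, group_by: list[str], agg_spec: list[dict]
) -> pd.DataFrame:
    # Named aggregation: each keyword is an output column and each value is a
    # (source column, aggregation function) pair.
    return df.groupby(group_by, as_index=False).agg(**build_named_aggs(agg_spec))


stores = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Dallas"],
        "store_id": [1, 2, 2],
        "latitude": [30.0, 31.0, 32.0],
    }
)
spec = [
    {"store_id": "distinct", "alias": "unique_store_ids"},
    {"latitude": "mean", "alias": "avg_latitude"},
    {"store_id": "count"},
]
print(group_and_aggregate(stores, ["city"], spec))
```

Named aggregation keeps the output columns in the order the entries are listed, after the grouping keys.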
Returns
This component returns a DataFrame with one row per unique combination of the `group_by` columns. If `columns`
is defined, only those columns are included in the result. When debugging is enabled, the component also reports
the data type of each column. Any error that occurs during grouping is logged and raised as an exception.
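The return behavior described above can be sketched as follows: optional `columns` filtering, dtype debugging, and logging an error before re-raising it. This is a hypothetical sketch built on the standard `logging` module. The function name `run_group`, the `debug` flag, the logger name and the log messages are assumptions, not flowtask's API. It also assumes `columns` is a subset of `group_by`.

```python
import logging
from typing import Optional

import pandas as pd

logger = logging.getLogger("tgroup_sketch")  # hypothetical logger name


def run_group(
    df: pd.DataFrame,
    group_by: list[str],
    columns: Optional[list[str]] = None,
    debug: bool = False,
) -> pd.DataFrame:
    """Hypothetical sketch of the documented return behavior (not flowtask code)."""
    if debug:
        # Debug aid: log each input column's dtype before grouping.
        for name, dtype in df.dtypes.items():
            logger.debug("column %r has dtype %s", name, dtype)
    try:
        # One row per unique combination of the group_by values.
        result = df.groupby(group_by, as_index=False).size().drop(columns="size")
        if columns is not None:
            # Assumption: `columns` is a subset of `group_by`.
            result = result[columns]
    except Exception:
        # Log the failure with its traceback, then re-raise it unchanged.
        logger.exception("grouping failed")
        raise
    return result


logging.basicConfig(level=logging.DEBUG)
stores = pd.DataFrame({"store_id": [1, 1, 2], "city": ["Austin", "Austin", "Dallas"]})
print(run_group(stores, ["store_id", "city"], columns=["city"], debug=True))
```

Re-raising with a bare `raise` keeps the original exception type and traceback, so callers can still handle a `KeyError` for a missing column.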
Example:
```yaml
- tGroup:
    group_by:
      - store_id
      - formatted_address
      - state_code
      - latitude
      - longitude
      - store_name
      - city
```
Aggregation example:
```yaml
- tGroup:
    group_by:
      - store_id
      - formatted_address
      - state_code
      - latitude
      - longitude
      - store_name
      - city
    agg:
      - store_id: distinct
        alias: unique_store_ids
      - latitude: mean
        alias: avg_latitude
      - longitude: mean
        alias: avg_longitude
      - state_code: count
```