UniqueRows
flowtask.components.UniqueRows
Bases: FlowComponent
Overview
The UniqueRows class is a component for extracting unique rows from a Pandas DataFrame.
It supports pre-sorting of rows, custom options for handling duplicates, and an option to save
rejected rows that are duplicates.
.. table:: Properties
   :widths: auto

   +---------------+----------+----------------------------------------------------------------------+
   | Name          | Required | Summary                                                              |
   +===============+==========+======================================================================+
   | unique        | Yes      | List of columns to use for identifying unique rows.                  |
   +---------------+----------+----------------------------------------------------------------------+
   | order         | No       | Dictionary specifying columns and sort order (`asc` or `desc`).      |
   +---------------+----------+----------------------------------------------------------------------+
   | keep          | No       | Specifies which duplicates to keep: `first`, `last`, or `False`.     |
   +---------------+----------+----------------------------------------------------------------------+
   | save_rejected | No       | Dictionary with filename to save rejected rows as CSV, if specified. |
   +---------------+----------+----------------------------------------------------------------------+

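
The accepted values for `keep` match those of pandas' `DataFrame.drop_duplicates`. The sketch below shows how each value behaves in plain pandas, on the assumption that the component passes the setting through unchanged. The sample data and variable names are illustrative only.

```python
import pandas as pd

# Illustrative data: store_id 1 and 3 each appear twice
df = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 3, 3],
        "visits": [10, 20, 30, 40, 50],
    }
)

# keep="first": retain the first row seen for each store_id
kept_first = df.drop_duplicates(subset=["store_id"], keep="first")

# keep="last": retain the last row seen for each store_id
kept_last = df.drop_duplicates(subset=["store_id"], keep="last")

# keep=False: drop every row whose store_id appears more than once
kept_none = df.drop_duplicates(subset=["store_id"], keep=False)

print(kept_first["visits"].tolist())
print(kept_last["visits"].tolist())
print(kept_none["visits"].tolist())
```

Note that `False` here is the boolean value, not a string: it removes all members of a duplicated group rather than keeping one representative.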
Returns
This component returns a DataFrame containing only unique rows based on the specified columns.
If sorting is defined in `order`, rows are pre-sorted before duplicates are removed. Metrics such as
the number of rows passed and rejected are recorded. If `save_rejected` is specified, rejected rows
are saved to a file. Any data errors encountered during execution are raised with detailed error messages.
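
The flow described above (pre-sort, split into passed and rejected rows, optionally export the rejects) can be approximated in plain pandas. This is a minimal sketch under stated assumptions, not the flowtask implementation: the helper `unique_rows`, its signature, and the sample columns are hypothetical.

```python
import pandas as pd


def unique_rows(df, unique, order=None, keep="first"):
    """Return (passed, rejected) frames. Hypothetical helper, not flowtask code."""
    if order:
        # Pre-sort: a {"column": "asc" | "desc"} mapping becomes sort_values arguments
        df = df.sort_values(
            by=list(order.keys()),
            ascending=[direction == "asc" for direction in order.values()],
        )
    # Flag the rows that would be discarded for this `keep` setting
    is_rejected = df.duplicated(subset=unique, keep=keep)
    return df[~is_rejected], df[is_rejected]


df = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 3, 3],
        "updated_at": [
            "2024-01-01",
            "2024-03-01",
            "2024-02-01",
            "2024-05-01",
            "2024-04-01",
        ],
    }
)

# Sort newest first, then keep the first (newest) row per store_id
passed, rejected = unique_rows(
    df, unique=["store_id"], order={"updated_at": "desc"}, keep="first"
)

# Counts analogous to the passed/rejected metrics
print(len(passed), len(rejected))

# Rejected rows rendered as CSV text; pass a file path to to_csv to write a file instead
rejected_csv = rejected.to_csv(index=False)
```

Sorting before deduplicating is what makes `keep` meaningful: with a descending sort on a timestamp column, `first` selects the most recent row for each key.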
Example:
```yaml
UniqueRows:
  unique:
    - store_id
```