flowtask.components.UniqueRows

UniqueRows

UniqueRows(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Overview

    The UniqueRows component extracts unique rows from a pandas DataFrame, judging uniqueness on the
    columns listed in `unique`. It can pre-sort rows before deduplicating (`order`), choose which
    duplicate to keep (`keep`), and optionally save the rejected duplicate rows to a CSV file (`save_rejected`).

.. table:: Properties
    :widths: auto

    +----------------+----------+---------------------------------------------------------------------------+
    | Name           | Required | Summary                                                                   |
    +================+==========+===========================================================================+
    | unique         |   Yes    | List of columns to use for identifying unique rows.                       |
    +----------------+----------+---------------------------------------------------------------------------+
    | order          |   No     | Dictionary specifying columns and sort order (`asc` or `desc`).           |
    +----------------+----------+---------------------------------------------------------------------------+
    | keep           |   No     | Specifies which duplicates to keep: `first`, `last`, or `False`.          |
    +----------------+----------+---------------------------------------------------------------------------+
    | save_rejected  |   No     | Dictionary with filename to save rejected rows as CSV, if specified.      |
    +----------------+----------+---------------------------------------------------------------------------+

Returns

    This component returns a DataFrame containing only unique rows based on the specified columns.
    If sorting is defined in `order`, rows are pre-sorted before duplicates are removed. Metrics such as
    the number of rows passed and rejected are recorded. If `save_rejected` is specified, rejected rows
    are saved to a file. Any data errors encountered during execution are raised with detailed error messages.
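
    The behaviour described above can be approximated with plain pandas. The sketch below is not the
    component's actual implementation: `unique_rows` is a hypothetical helper, and it assumes that
    `order` maps onto `DataFrame.sort_values` and that `keep` has the same meaning as the `keep`
    argument of `DataFrame.duplicated`. The sample data is made up for illustration.

```python
import pandas as pd


def unique_rows(df, unique, order=None, keep="first"):
    """Hypothetical helper approximating the documented behaviour with plain pandas."""
    if order:
        # Pre-sort: translate "asc"/"desc" into pandas' ascending flags.
        columns = list(order)
        ascending = [str(order[c]).lower() == "asc" for c in columns]
        df = df.sort_values(by=columns, ascending=ascending)
    # Rows flagged as duplicates are the ones that would be rejected.
    rejected_mask = df.duplicated(subset=unique, keep=keep)
    return df[~rejected_mask], df[rejected_mask]


# Illustrative data only.
df = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 2, 3],
        "updated": ["2024-01-01", "2024-03-01", "2024-02-01", "2024-01-15", "2024-01-10"],
    }
)

# Keep the most recently updated row per store.
passed, rejected = unique_rows(
    df, unique=["store_id"], order={"updated": "desc"}, keep="first"
)
print(f"passed={len(passed)} rejected={len(rejected)}")

# The rejected rows could then be written out, e.g.:
# rejected.to_csv("rejected.csv", index=False)
```

    Sorting before deduplicating is what makes `keep` meaningful: with `first`, the row that sorts
    first within each group of duplicates survives. With `keep=False`, every row that has a duplicate
    is rejected, so only rows whose key occurs exactly once pass through.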


Example:

```yaml
UniqueRows:
  unique:
  - store_id
```

close

close()

Close.

run async

run()

Get the unique rows from a DataFrame.

start async

start(**kwargs)

Start the component.