Filterrows

flowtask.components.FilterRows.FilterRows

FilterRows

FilterRows(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Overview

The FilterRows class is a component that removes or cleans rows in a Pandas DataFrame according to the configured criteria.
It supports several cleaning and filtering operations and can save rejected rows to a file.

.. table:: Properties
   :widths: auto

   +-------------------+----------+-------------------------------------------------------------------------------------------+
   | Name              | Required | Description                                                                               |
   +===================+==========+===========================================================================================+
   | fields            | Yes      | A dictionary defining the fields and corresponding filtering conditions to be applied.    |
   +-------------------+----------+-------------------------------------------------------------------------------------------+
   | filter_conditions | Yes      | A dictionary defining the filter conditions for transformations.                          |
   +-------------------+----------+-------------------------------------------------------------------------------------------+
   | _applied          | No       | A list to store the applied filters.                                                      |
   +-------------------+----------+-------------------------------------------------------------------------------------------+
   | multi             | No       | A flag indicating if multiple DataFrame transformations are supported, defaults to False. |
   +-------------------+----------+-------------------------------------------------------------------------------------------+

Return

The methods in this class manage the filtering of rows in a Pandas DataFrame, including initialization, execution,
and result handling.



Example:

```yaml
FilterRows:
  filter_conditions:
    clean_empty:
      columns:
      - updated
    drop_columns:
      columns:
      - legal_street_address_1
      - legal_street_address_2
      - work_location_address_1
      - work_location_address_2
      - birth_date
    suppress:
      columns:
      - payroll_id
      - reports_to_payroll_id
      pattern: (\.0)
  drop_empty: true
```
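
To make the configuration above more concrete, the following is a minimal sketch in plain pandas of what these three conditions might do to a DataFrame. It is not the FilterRows implementation. The function name `apply_filters_sketch` and the sample data are hypothetical, and the semantics are assumptions: `suppress` is taken to strip the regex pattern from the listed columns, `drop_columns` to remove the listed columns, and `clean_empty` to discard rows whose listed columns are empty.

```python
import pandas as pd


def apply_filters_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pandas equivalent of the YAML example (assumed semantics)."""
    out = df.copy()

    # suppress (assumption): remove every match of the pattern from the values.
    for col in ["payroll_id", "reports_to_payroll_id"]:
        out[col] = out[col].astype(str).str.replace(r"(\.0)", "", regex=True)

    # drop_columns (assumption): remove the listed columns, ignoring absent ones.
    out = out.drop(columns=["birth_date"], errors="ignore")

    # clean_empty (assumption): discard rows where "updated" is missing.
    out = out.dropna(subset=["updated"])

    return out.reset_index(drop=True)


# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "payroll_id": ["1001.0", "1002.0", "1003"],
        "reports_to_payroll_id": ["2001.0", "2002", "2003.0"],
        "birth_date": ["1990-01-01", "1985-05-05", "1970-07-07"],
        "updated": ["2024-01-01", None, "2024-03-01"],
    }
)
result = apply_filters_sketch(df)
print(result)
```

Note that the pattern `(\.0)` is not anchored, so under this reading it would also match `.0` in the middle of a value; a pattern such as `\.0$` would restrict the match to a trailing suffix.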

start async

start(**kwargs)

Obtain the Pandas DataFrame.