FilterRows

flowtask.components.FilterRows

Component for filtering, dropping, and cleaning pandas DataFrames.

FilterRows

FilterRows(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Overview

The FilterRows class is a component that removes or cleans rows in a pandas DataFrame based on specified criteria.
It supports several cleaning and filtering operations and can save rejected rows to a file.

.. table:: Properties
   :widths: auto

   +------------------+----------+--------------------------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                                      |
   +------------------+----------+--------------------------------------------------------------------------------------------------+
   | fields           |   Yes    | A dictionary defining the fields and corresponding filtering conditions to be applied.           |
   +------------------+----------+--------------------------------------------------------------------------------------------------+
   | filter_conditions|   Yes    | A dictionary defining the filter conditions for transformations.                                 |
   +------------------+----------+--------------------------------------------------------------------------------------------------+
   | _applied         |   No     | A list to store the applied filters.                                                             |
   +------------------+----------+--------------------------------------------------------------------------------------------------+
   | multi            |   No     | A flag indicating if multiple DataFrame transformations are supported, defaults to False.        |
   +------------------+----------+--------------------------------------------------------------------------------------------------+

Return

The methods in this class manage the filtering of rows in a Pandas DataFrame, including initialization, execution,
and result handling.



Example:

```yaml
FilterRows:
  filter_conditions:
    clean_empty:
      columns:
      - updated
    drop_columns:
      columns:
      - legal_street_address_1
      - legal_street_address_2
      - work_location_address_1
      - work_location_address_2
      - birth_date
    suppress:
      columns:
      - payroll_id
      - reports_to_payroll_id
      pattern: (\.0)
  drop_empty: true
```
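
To make the example above more concrete, here is a rough pandas sketch of what the three conditions could amount to. The semantics are assumptions inferred from the condition names (`drop_columns` removes columns, `suppress` strips text matching `pattern`, `clean_empty` discards rows with empty values in the listed columns); the sample data is invented, and this is not the component's actual implementation.

```python
import pandas as pd

# Hypothetical sample data using a few column names from the YAML example.
df = pd.DataFrame(
    {
        "payroll_id": ["1001.0", "1002.0", "1003"],
        "updated": ["2024-01-01", None, "2024-02-01"],
        "birth_date": ["1990-01-01", "1985-05-05", "1979-09-09"],
    }
)

# drop_columns (assumed meaning): remove the listed columns.
cleaned = df.drop(columns=["birth_date"])

# suppress (assumed meaning): delete text matching the pattern in the listed columns.
cleaned["payroll_id"] = cleaned["payroll_id"].str.replace(r"(\.0)", "", regex=True)

# clean_empty (assumed meaning): discard rows where the listed columns are empty.
cleaned = cleaned.dropna(subset=["updated"])

print(cleaned)
```

Under these assumptions, a `suppress` pattern of `(\.0)` would turn identifiers that were read as floats (such as `1001.0`) back into plain strings like `1001`.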

start (async)

start(**kwargs)

Obtain the pandas DataFrame.

functions

Functions for FilterRows.

filter_rows_by_column

filter_rows_by_column(df, column_source, pattern, column_destination)

Filter rows in a DataFrame by removing rows where column_destination values are found in column_source values.

Parameters:

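+--------------------+-----------+------------------------------------------------------------------------------+----------+
| Name               | Type      | Description                                                                  | Default  |
+--------------------+-----------+------------------------------------------------------------------------------+----------+
| df                 | DataFrame | The DataFrame to filter.                                                     | required |
+--------------------+-----------+------------------------------------------------------------------------------+----------+
| column_source      | str       | The name of the column to extract values from.                               | required |
+--------------------+-----------+------------------------------------------------------------------------------+----------+
| pattern            | str       | The regex pattern to match (for string columns) or None for integer columns. | required |
+--------------------+-----------+------------------------------------------------------------------------------+----------+
| column_destination | str       | The name of the column to check against.                                     | required |
+--------------------+-----------+------------------------------------------------------------------------------+----------+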

Returns:

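+-----------+--------------------------------------------------------------------------------------------+
| Type      | Description                                                                                |
+-----------+--------------------------------------------------------------------------------------------+
| DataFrame | A DataFrame with rows removed where column_destination values are in column_source values. |
+-----------+--------------------------------------------------------------------------------------------+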
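
The behaviour described for filter_rows_by_column can be sketched in plain pandas as follows. This is an illustrative stand-in rather than the flowtask source: the name `filter_rows_by_column_sketch` is hypothetical, and it assumes that for string columns the first regex match in each `column_source` value is the value compared against `column_destination`.

```python
import re

import pandas as pd


def filter_rows_by_column_sketch(df, column_source, pattern, column_destination):
    """Illustrative stand-in for filter_rows_by_column; not the flowtask source."""
    source = df[column_source].dropna()
    if pattern is None:
        # Integer columns: compare the raw source values directly.
        values = set(source)
        mask = df[column_destination].isin(list(values))
    else:
        # String columns (assumption): use the first regex match in each source value.
        regex = re.compile(pattern)
        matches = (regex.search(str(item)) for item in source)
        values = {m.group(0) for m in matches if m}
        mask = df[column_destination].astype(str).isin(list(values))
    # Keep only rows whose destination value was not found among the source values.
    return df.loc[~mask]


# Invented sample data: "parent" embeds identifiers that also appear in "child".
demo = pd.DataFrame({"parent": ["ID-1", "ID-3", None], "child": ["1", "2", "3"]})
result = filter_rows_by_column_sketch(demo, "parent", r"\d+", "child")
print(result)
```

In this sample, the identifiers extracted from `parent` are `1` and `3`, so only the row whose `child` value is `2` remains.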