FilterRows
flowtask.components.FilterRows
FilterRows.
Component for filtering, dropping, and cleaning Pandas DataFrames.
FilterRows
Bases: FlowComponent
Overview
The FilterRows class is a component for removing or cleaning rows in a Pandas DataFrame based on specified criteria.
It supports several cleaning and filtering operations and can save the rejected rows to a file.
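The keep/reject split described above can be sketched in plain pandas as follows. This is a minimal illustration, assuming rejected rows are written as CSV; the helper `split_rows` and its `rejected_path` argument are hypothetical and not part of the flowtask API.

```python
from pathlib import Path
from typing import Optional, Tuple

import pandas as pd


def split_rows(
    df: pd.DataFrame,
    keep_mask: pd.Series,
    rejected_path: Optional[Path] = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split df into kept and rejected rows (hypothetical helper, not flowtask API)."""
    # Align the boolean mask with the frame's index; unknown rows count as rejected.
    keep_mask = keep_mask.reindex(df.index, fill_value=False).astype(bool)
    kept = df.loc[keep_mask]
    rejected = df.loc[~keep_mask]
    # Assumption: rejected rows are persisted as CSV; the real component may differ.
    if rejected_path is not None and not rejected.empty:
        rejected.to_csv(rejected_path, index=False)
    return kept, rejected


# Usage: keep only rows that have a non-null "updated" value.
df = pd.DataFrame({"id": [1, 2, 3], "updated": ["2024-01-01", None, "2024-02-01"]})
kept, rejected = split_rows(df, df["updated"].notna())
```

Returning both frames keeps the split explicit, so the caller decides whether to persist the rejected rows.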
Properties:

| Name | Required | Description |
|---|---|---|
| `fields` | Yes | A dictionary defining the fields and the filtering conditions to apply to them. |
| `filter_conditions` | Yes | A dictionary defining the filter conditions (transformations) to apply. |
| `_applied` | No | A list that stores the filters that have been applied. |
| `multi` | No | Whether multiple DataFrame transformations are supported. Defaults to `False`. |
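Because `filter_conditions` is a dictionary keyed by condition name, one plausible way to apply it is a name-to-handler registry that also records each applied condition, much like the `_applied` list above. This is a sketch under that assumption: `HANDLERS`, `drop_columns`, and `apply_filter_conditions` are hypothetical names, not the component's internals.

```python
from typing import Any, Callable, Dict, List, Tuple

import pandas as pd

# A handler receives the DataFrame and the options dict for one condition.
Handler = Callable[[pd.DataFrame, Dict[str, Any]], pd.DataFrame]


def drop_columns(df: pd.DataFrame, options: Dict[str, Any]) -> pd.DataFrame:
    # Drop the listed columns; names that are not present are ignored.
    return df.drop(columns=options.get("columns", []), errors="ignore")


# Hypothetical registry mapping condition names (keys of filter_conditions) to handlers.
HANDLERS: Dict[str, Handler] = {"drop_columns": drop_columns}


def apply_filter_conditions(
    df: pd.DataFrame, filter_conditions: Dict[str, Dict[str, Any]]
) -> Tuple[pd.DataFrame, List[str]]:
    applied: List[str] = []  # plays the role the docs assign to `_applied`
    for name, options in filter_conditions.items():
        handler = HANDLERS.get(name)
        if handler is None:
            raise ValueError(f"Unknown filter condition: {name!r}")
        df = handler(df, options or {})
        applied.append(name)
    return df, applied
```

Python dictionaries preserve insertion order, so in this sketch conditions run in the order they are declared in the configuration.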
Return
The methods in this class manage the filtering of rows in a Pandas DataFrame, including initialization, execution,
and result handling.
Example:
```yaml
FilterRows:
  filter_conditions:
    clean_empty:
      columns:
        - updated
    drop_columns:
      columns:
        - legal_street_address_1
        - legal_street_address_2
        - work_location_address_1
        - work_location_address_2
        - birth_date
    suppress:
      columns:
        - payroll_id
        - reports_to_payroll_id
      pattern: (\.0)
  drop_empty: true
```
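The conditions in the YAML example can be approximated with ordinary pandas operations. The semantics below are assumptions inferred from the option names, not taken from the flowtask source: `clean_empty` drops rows where a listed column is null or blank, `drop_columns` removes columns, `suppress` deletes regex matches from the listed columns, and a top-level `drop_empty` drops rows that are entirely null. The function `apply_example_conditions` is hypothetical.

```python
import re
from typing import Any, Dict

import pandas as pd


def apply_example_conditions(
    df: pd.DataFrame, conditions: Dict[str, Dict[str, Any]], drop_empty: bool = False
) -> pd.DataFrame:
    """Assumed pandas equivalents of the conditions used in the YAML example."""
    df = df.copy()
    if "clean_empty" in conditions:
        # Assumption: drop rows where any listed column is null or blank.
        keep = pd.Series(True, index=df.index)
        for col in conditions["clean_empty"]["columns"]:
            values = df[col]
            keep &= values.notna() & values.astype(str).str.strip().ne("")
        df = df.loc[keep].copy()
    if "drop_columns" in conditions:
        # Remove the listed columns entirely.
        df = df.drop(columns=conditions["drop_columns"]["columns"], errors="ignore")
    if "suppress" in conditions:
        # Assumption: delete every regex match from the text of each listed column.
        opts = conditions["suppress"]
        regex = re.compile(opts["pattern"])
        for col in opts["columns"]:
            df[col] = df[col].map(lambda v: v if pd.isna(v) else regex.sub("", str(v)))
    if drop_empty:
        # Assumption: drop rows in which every value is null.
        df = df.dropna(how="all")
    return df


# Usage with a small frame shaped like the example (hypothetical data).
df = pd.DataFrame({
    "updated": ["2024-01-01", "", "2024-03-01"],
    "birth_date": ["1990-01-01", "1985-05-05", "1970-07-07"],
    "payroll_id": [1001.0, 1002.0, None],
    "reports_to_payroll_id": [2001.0, None, 2003.0],
})
conditions = {
    "clean_empty": {"columns": ["updated"]},
    "drop_columns": {"columns": ["birth_date"]},
    "suppress": {"columns": ["payroll_id", "reports_to_payroll_id"], "pattern": r"(\.0)"},
}
result = apply_example_conditions(df, conditions, drop_empty=True)
```

If `suppress` works as assumed here, the unanchored pattern `(\.0)` also removes ".0" from the middle of a value such as "10.05"; an anchored pattern like `\.0$` would only strip a trailing ".0".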
functions
Functions for FilterRows.
filter_rows_by_column
Filter rows in a DataFrame by removing rows where column_destination values are found in column_source values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | The DataFrame to filter. | required |
| `column_source` | `str` | The name of the column to extract values from. | required |
| `pattern` | `str` | The regex pattern to match (for string columns), or `None` for integer columns. | required |
| `column_destination` | `str` | The name of the column to check against the extracted values. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame with rows removed where `column_destination` values are found in the `column_source` values. |
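A minimal sketch of the behavior described for `filter_rows_by_column`, written as a separate hypothetical function so it is not mistaken for the library's implementation. It assumes that, for string columns, `pattern` is applied with `re.findall` to each `column_source` value and the matches are compared with `column_destination` as strings; with `pattern=None`, the source values are used directly as integers.

```python
import re
from typing import Optional, Set

import pandas as pd


def filter_rows_by_column_sketch(
    df: pd.DataFrame,
    column_source: str,
    pattern: Optional[str],
    column_destination: str,
) -> pd.DataFrame:
    """Drop rows whose column_destination value occurs among column_source values."""
    source = df[column_source].dropna()
    if pattern is None:
        # Integer case: use the source values directly.
        values: Set = set(source.astype(int).tolist())
        mask = df[column_destination].isin(values)
    else:
        # String case (assumed semantics): collect every regex match in the source column.
        # The pattern should have at most one group, otherwise findall returns tuples.
        values = set()
        for text in source.astype(str):
            values.update(re.findall(pattern, text))
        # Compare as strings so the destination lines up with the extracted substrings.
        mask = df[column_destination].astype(str).isin(values)
    return df.loc[~mask]


# Usage: keep only employees who do not appear as anyone's manager.
employees = pd.DataFrame({"employee_id": [10, 20, 30], "manager_id": [None, 10, 10]})
non_managers = filter_rows_by_column_sketch(employees, "manager_id", None, "employee_id")
```

Building a set first and using `isin` keeps the check vectorized instead of testing each destination value against the source column row by row.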