TableInput
flowtask.components.TableInput
Bases: TableBase
Overview
The TableInput component loads data from a SQL table or a custom SQL query into a pandas DataFrame.
For large tables it can use Dask for partitioned loading. It also offers options for trimming whitespace
from string columns, setting a primary key, and removing empty rows or columns.
.. table:: Properties
   :widths: auto

   +------------+----------+--------------------------------------------------------------------------------+
   | Name       | Required | Summary                                                                        |
   +============+==========+================================================================================+
   | tablename  | Yes      | The name of the SQL table to load into a DataFrame.                            |
   +------------+----------+--------------------------------------------------------------------------------+
   | schema     | No       | Database schema where the table is located, if applicable.                     |
   +------------+----------+--------------------------------------------------------------------------------+
   | query      | No       | SQL query string to load data if `tablename` is not used.                      |
   +------------+----------+--------------------------------------------------------------------------------+
   | chunksize  | No       | Number of rows per chunk when reading data in chunks.                          |
   +------------+----------+--------------------------------------------------------------------------------+
   | bigfile    | No       | Boolean indicating if Dask should be used for large tables.                    |
   +------------+----------+--------------------------------------------------------------------------------+
   | drop_empty | No       | Boolean to drop columns and rows with only null values.                        |
   +------------+----------+--------------------------------------------------------------------------------+
   | trim       | No       | Boolean to trim whitespace from string columns.                                |
   +------------+----------+--------------------------------------------------------------------------------+
   | pk         | No       | Dictionary specifying primary key columns and settings for DataFrame indexing. |
   +------------+----------+--------------------------------------------------------------------------------+

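
The effect of `chunksize`, `trim`, and `pk` can be approximated with plain pandas. The sketch below is not the component's implementation: `load_table` is a hypothetical helper, and an in-memory SQLite database stands in for the real connection, which this page does not describe.

```python
import sqlite3

import pandas as pd


def load_table(conn, tablename, chunksize=None, trim=False, pk=None):
    """Hypothetical helper approximating the tablename/chunksize/trim/pk options."""
    # Identifiers cannot be bound as SQL parameters, so the name is quoted inline.
    sql = f'SELECT * FROM "{tablename}"'
    if chunksize:
        # With chunksize set, pandas yields DataFrames of at most that many rows.
        chunks = pd.read_sql_query(sql, conn, chunksize=chunksize)
        df = pd.concat(chunks, ignore_index=True)
    else:
        df = pd.read_sql_query(sql, conn)
    if trim:
        # Strip surrounding whitespace from string values; leave other values alone.
        for col in df.columns:
            if not pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].map(
                    lambda v: v.strip() if isinstance(v, str) else v
                )
    if pk:
        # Assumption: pk["columns"] lists the columns to use as the index.
        df = df.set_index(pk["columns"])
    return df


# Demo data in an in-memory SQLite database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_raw (sales_id INTEGER, store TEXT)")
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?)",
    [(1, "  north "), (2, "south  "), (3, " east")],
)

df = load_table(
    conn, "sales_raw", chunksize=2, trim=True, pk={"columns": ["sales_id"]}
)
print(df)
```

Reading in chunks bounds how many rows each fetch holds at once, but the concatenated result still has to fit in memory. That is presumably where the `bigfile` option and Dask come in.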
Returns
This component returns a DataFrame populated with data from the specified SQL table or query. If `drop_empty` is set,
columns and rows with only null values are removed. Metrics such as row and column counts are recorded, along with
information on the table or query used for data extraction. If no data is found, a `DataNotFound` exception is raised.
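
The post-processing described above can be sketched in a few lines of pandas. This is an illustration under assumptions, not the component's code: `finalize` is a hypothetical helper, the `DataNotFound` class is a local stand-in for the library's exception, and the order of the checks (drop empties first, then test for emptiness) is a guess.

```python
import pandas as pd


class DataNotFound(Exception):
    """Local stand-in for the exception named above; not imported from the library."""


def finalize(df, drop_empty=False):
    """Hypothetical helper: apply drop_empty, check for data, collect basic metrics."""
    if drop_empty:
        # Drop columns that are entirely null, then rows that are entirely null.
        df = df.dropna(axis="columns", how="all").dropna(axis="index", how="all")
    if df.empty:
        raise DataNotFound("the table or query returned no data")
    metrics = {"rows": len(df), "columns": len(df.columns)}
    return df, metrics


raw = pd.DataFrame(
    {
        "sales_id": [1, 2, None],
        "amount": [10.0, 20.0, None],
        "notes": [None, None, None],  # entirely null column
    }
)
result, metrics = finalize(raw, drop_empty=True)
print(result)
print(metrics)
```

Here the `notes` column is dropped because every value is null, and the third row is dropped once no non-null value remains in it.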
Example:
```yaml
TableInput:
  tablename: sales_raw
  schema: epson
  chunksize: 1000
  pk:
    columns:
      - sales_id
```