TableSchema
flowtask.components.TableSchema
Bases: QSSupport, FlowComponent
Overview
The TableSchema class is a component that reads a CSV file or DataFrame and creates a table schema based
on data models. It supports datatype inference, automatic handling of primary keys, and column name
normalization, including camelCase to snake_case conversion, illegal character removal, and customizable
name replacements.
.. table:: Properties
   :widths: auto

   +-------------------+----------+-----------------------------------------------------------------------+
   | Name              | Required | Summary                                                               |
   +===================+==========+=======================================================================+
   | filename          | Yes      | The CSV file or DataFrame input to read and infer the schema from.    |
   +-------------------+----------+-----------------------------------------------------------------------+
   | schema            | No       | The database schema for the table.                                    |
   +-------------------+----------+-----------------------------------------------------------------------+
   | tablename         | Yes      | The name of the table to be created based on the data model.          |
   +-------------------+----------+-----------------------------------------------------------------------+
   | drop              | No       | Boolean; whether to drop an existing table with the same name.        |
   +-------------------+----------+-----------------------------------------------------------------------+
   | normalize_names   | No       | Dictionary with options for column name normalization.                |
   +-------------------+----------+-----------------------------------------------------------------------+
   | pk                | No       | List of columns to define as primary keys.                            |
   +-------------------+----------+-----------------------------------------------------------------------+
   | replace_names     | No       | Dictionary of column name replacements for renaming specific columns. |
   +-------------------+----------+-----------------------------------------------------------------------+

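
The column name handling described above (camelCase to snake_case conversion, illegal character removal, and explicit replacements) can be sketched roughly as follows. This is a minimal illustration, not the component's actual implementation: the helper name ``normalize_column_name``, the rule that explicit replacements take precedence, and the exact set of characters treated as illegal are all assumptions made for the example.

```python
import re


def normalize_column_name(name, replace_names=None):
    """Hypothetical helper illustrating column name normalization.

    Not part of flowtask; the rules below are assumptions for illustration.
    """
    replace_names = replace_names or {}
    # Assumption: an explicit replacement wins over automatic normalization.
    if name in replace_names:
        return replace_names[name]
    # camelCase -> snake_case: insert "_" where a lowercase letter or digit
    # is immediately followed by an uppercase letter.
    result = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Assumption: anything other than letters, digits, and "_" is illegal
    # and is replaced by an underscore.
    result = re.sub(r"[^0-9a-zA-Z_]+", "_", result)
    # Collapse repeated underscores and trim them from both ends.
    result = re.sub(r"_+", "_", result).strip("_")
    return result.lower()


columns = ["storeId", "Total Sales ($)", "ID"]
normalized = [normalize_column_name(c, {"ID": "store_key"}) for c in columns]
```

Here ``normalized`` maps the three sample names to snake_case identifiers, with ``ID`` renamed through the replacement dictionary rather than through the automatic rules.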
Returns
This component returns the input data after creating a database table schema based on the data's inferred or
specified structure. If the input is a file, it reads and processes the file; if a DataFrame, it directly
processes the DataFrame. The component provides detailed metrics on column structure and row counts, as well as
logging for SQL execution status and any schema creation errors.
handle_sql_reserved_word
Verify if the column name is a SQL reserved word and raise an error if it is.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| column_name | str | Column name to verify | required |
Returns:
| Name | Type | Description |
|---|---|---|
| | str | The same column name if it is not a reserved word |
Raises:
| Type | Description |
|---|---|
| ComponentError | If the column name is a SQL reserved word |
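
The contract documented above (return the name unchanged, or raise if it is reserved) might look like the sketch below. It is an illustration under stated assumptions, not the library's code: the ``ComponentError`` class here is a local stand-in for flowtask's exception, the reserved word set is a tiny sample, and the case-insensitive comparison is an assumption.

```python
class ComponentError(Exception):
    """Local stand-in for flowtask's ComponentError (assumption)."""


# Assumption: a small sample only; a real list would be far longer and
# depends on the target database dialect.
SQL_RESERVED_WORDS = {"select", "from", "where", "table", "order", "group", "user"}


def handle_sql_reserved_word(column_name: str) -> str:
    """Return column_name unchanged, or raise if it is a reserved word."""
    # Assumption: the comparison ignores case, since SQL keywords do.
    if column_name.lower() in SQL_RESERVED_WORDS:
        raise ComponentError(
            f"Column name '{column_name}' is a SQL reserved word"
        )
    return column_name


safe_name = handle_sql_reserved_word("store_id")

try:
    handle_sql_reserved_word("Order")
    rejected = False
except ComponentError:
    rejected = True
```

Raising instead of silently quoting or renaming the column keeps the failure visible at schema creation time, which matches the documented behavior of reporting an error for reserved names.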