flowtask.components.TableSchema

TableSchema

TableSchema(loop=None, job=None, stat=None, **kwargs)

Bases: QSSupport, FlowComponent

Overview

    The TableSchema component reads a CSV file or DataFrame and creates a table schema based on data models.
    It infers column datatypes, handles primary keys automatically, and normalizes column names. Name
    normalization covers camelCase to snake_case conversion, illegal character removal, and customizable
    name replacements.

.. table:: Properties
    :widths: auto

    +-----------------+----------+-------------------------------------------------------------------------------+
    | Name            | Required | Summary                                                                       |
    +=================+==========+===============================================================================+
    | filename        | Yes      | The CSV file or DataFrame input to read and infer schema from.                |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | schema          | No       | The database schema for the table.                                            |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | tablename       | Yes      | The name of the table to be created based on the data model.                  |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | drop            | No       | Boolean specifying if an existing table with the same name should be dropped. |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | normalize_names | No       | Dictionary with options for column name normalization.                        |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | pk              | No       | List of columns to define as primary keys.                                    |
    +-----------------+----------+-------------------------------------------------------------------------------+
    | replace_names   | No       | Dictionary of column name replacements for renaming specific columns.         |
    +-----------------+----------+-------------------------------------------------------------------------------+
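To make the properties concrete, here is a hedged sketch of a property set written as a plain Python dictionary. Only the property names come from the table; every value (the file path, schema, table name, and column names) is an invented placeholder, and how these properties are actually passed to the component is not shown in this section. `normalize_names` is left out because its option keys are not listed here.

```python
# Hypothetical property values; only the keys are taken from the table above.
properties = {
    "filename": "/data/stores.csv",          # placeholder path (required)
    "tablename": "stores",                   # placeholder table name (required)
    "schema": "public",                      # placeholder database schema
    "drop": False,                           # keep an existing table of the same name
    "pk": ["store_id"],                      # columns that form the primary key
    "replace_names": {"id": "store_id"},     # assumed shape: old name -> new name
}

# The table marks these two properties as required.
REQUIRED = ("filename", "tablename")
missing = [key for key in REQUIRED if key not in properties]
```

An empty `missing` list means both required properties are present.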

Returns

    This component returns the input data after creating a database table schema based on the data's inferred or
    specified structure. If the input is a file, it reads and processes the file; if a DataFrame, it directly
    processes the DataFrame. The component provides detailed metrics on column structure and row counts, as well as
    logging for SQL execution status and any schema creation errors.

handle_sql_reserved_word

handle_sql_reserved_word(column_name)

Check whether the column name is a SQL reserved word and raise an error if it is.

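Parameters:

+-------------+------+-----------------------+----------+
| Name        | Type | Description           | Default  |
+=============+======+=======================+==========+
| column_name | str  | Column name to verify | required |
+-------------+------+-----------------------+----------+

Returns:

+------+---------------------------------------------------+
| Type | Description                                       |
+======+===================================================+
| str  | The same column name if it is not a reserved word |
+------+---------------------------------------------------+

Raises:

+----------------+-------------------------------------------+
| Type           | Description                               |
+================+===========================================+
| ComponentError | If the column name is a SQL reserved word |
+----------------+-------------------------------------------+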
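
The documented contract (return the name unchanged, or raise `ComponentError`) can be illustrated with a standalone sketch. It makes several assumptions: it is a free function rather than the component's method, `ComponentError` is a local stand-in for the library's exception, `SQL_RESERVED_WORDS` is a small illustrative subset, and the comparison is case-insensitive.

```python
class ComponentError(Exception):
    """Local stand-in for the exception named in the Raises table."""


# Illustrative subset only; a real check would use a full keyword list.
SQL_RESERVED_WORDS = frozenset(
    {"select", "from", "where", "table", "order", "group", "user"}
)


def handle_sql_reserved_word(column_name: str) -> str:
    """Return column_name unchanged, or raise if it is a reserved word."""
    # Assumption: reserved words are matched case-insensitively.
    if column_name.lower() in SQL_RESERVED_WORDS:
        raise ComponentError(
            f"Column name '{column_name}' is a SQL reserved word"
        )
    return column_name


checked = handle_sql_reserved_word("store_id")
```

Calling the function with a name such as `"order"` raises `ComponentError` in this sketch, so a caller would typically rename the column (for example through `replace_names`) before the schema is created.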