Data Engine documentation
This site combines hand-written authoring guides with generated API reference material. Start with the authoring guides when building flows, then use the API reference for exact signatures and docstring-level details.
Authoring Guides
- Getting Started
- The mental model
- The basic workspace layout
- Where flow module sources live
- Your first flow
- What the app actually does with that flow
- A starter-style polling flow
- Batch-oriented files
- Running flows from Python
- Manual, poll, and schedule at a glance
- Runtime context essentials
- A few good habits early
- Next steps
- Core Concepts
- Configuring Flows
- Authoring Flow Modules
- Flow Methods
- Flow(group, name=None, label=None)
- watch(...)
- mirror(root=...)
- step(fn, use=None, save_as=None, label=None)
- collect(extensions, root=None, recursive=False, use=None, save_as=None, label=None)
- map(fn, use=None, save_as=None, label=None)
- step_each(fn, use=None, save_as=None, label=None)
- run_once()
- run()
- preview(use=None)
- Shared Validation Caveats
- FlowContext
- Database Methods
- DuckDB Helpers
- Import style
- Shared conventions
- build_dimension(...)
- attach_dimension(...)
- normalize_columns(...)
- denormalize_columns(...)
- replace_rows_by_file(...)
- ensure_index(...)
- replace_rows_by_values(...)
- compact_database(...)
- read_rows_by_values(...)
- read_sql(...)
- read_table(...)
- replace_table(...)
- Design guidance
- When to use direct DuckDB instead
- Polars And Schema Helpers
- Excel Helpers
- Recipes
- Recipe: Mirror source files
- Recipe: Filter rows and write a cleaned output
- Recipe: Capture source metadata during processing
- Recipe: Produce a stable latest snapshot
- Recipe: Read selected worksheets from a multi-sheet workbook
- Recipe: Single-file settings workflow
- Recipe: Batch read with map(...) or step_each(...)
- Recipe: Load into DuckDB and export a summary
- Recipe: Use TOML workspace config
- Recipe: Normalize inbound column names before validation
- Recipe: Save an intermediate dataframe to the Debug view
- Recipe: Calculate business days and keep a grouped running total
- Recipe: Offset to the next business due date
- Recipe: Write dataframe outputs atomically
- Recipe: Propagate the last matching row value across a window
- Recipe: Count repeated visits to the same workflow
- Recipe: Replace one source slice in a DuckDB table
- Recipe: Write several outputs for one source
Runtime And Reference
- App Runtime and Workspaces
- The two roots to keep in mind
- How the app is structured
- Authored files vs generated runtime artifacts
- Shared workspace state
- Local state vs workspace state
- Control, handoff, and control requests
- The daemon and the selected workspace
- Workspace selection
- Workspace provisioning
- VS Code provisioning
- Flow-module compilation
- Logging and run history
- The kill switch
- How this affects flow authors
- API Reference
- Project Map
- Project Inventory