Metadata-Version: 2.4
Name: dartlab
Version: 0.5.0
Summary: A Python library for thorough analysis of DART disclosure filings, covering both numbers and text
Project-URL: Homepage, https://github.com/eddmpython/dartlab
Project-URL: Repository, https://github.com/eddmpython/dartlab
Project-URL: Issues, https://github.com/eddmpython/dartlab/issues
Author: eddmpython
License: MIT License
        
        Copyright (c) 2026 eddmpython
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: accounting,dart,disclosure,financial-statements,korea,polars
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: alive-progress>=3.3.0
Requires-Dist: beautifulsoup4>=4.14.3
Requires-Dist: lxml>=6.0.2
Requires-Dist: marimo>=0.20.4
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: polars>=1.0.0
Requires-Dist: requests>=2.32.5
Requires-Dist: rich>=14.3.3
Provides-Extra: ai
Requires-Dist: fastapi>=0.135.1; extra == 'ai'
Requires-Dist: httpx>=0.28.1; extra == 'ai'
Requires-Dist: sse-starlette>=2.0.0; extra == 'ai'
Requires-Dist: uvicorn[standard]>=0.30.0; extra == 'ai'
Provides-Extra: all
Requires-Dist: anthropic>=0.30.0; extra == 'all'
Requires-Dist: fastapi>=0.135.1; extra == 'all'
Requires-Dist: httpx>=0.28.1; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Requires-Dist: plotly>=5.0.0; extra == 'all'
Requires-Dist: sse-starlette>=2.0.0; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.30.0; extra == 'all'
Provides-Extra: charts
Requires-Dist: plotly>=5.0.0; extra == 'charts'
Provides-Extra: llm
Requires-Dist: openai>=1.0.0; extra == 'llm'
Provides-Extra: llm-anthropic
Requires-Dist: anthropic>=0.30.0; extra == 'llm-anthropic'
Requires-Dist: openai>=1.0.0; extra == 'llm-anthropic'
Provides-Extra: ui
Requires-Dist: fastapi>=0.135.1; extra == 'ui'
Requires-Dist: httpx>=0.28.1; extra == 'ui'
Requires-Dist: sse-starlette>=2.0.0; extra == 'ui'
Requires-Dist: uvicorn[standard]>=0.30.0; extra == 'ui'
Description-Content-Type: text/markdown

<div align="center">

<br>

<img alt="DartLab" src=".github/assets/logo.png" width="180">

<h3>DartLab</h3>

<p><b>Beyond the numbers</b> — Extract both financials and text from DART filings</p>

<p>
<a href="https://pypi.org/project/dartlab/"><img src="https://img.shields.io/pypi/v/dartlab?style=for-the-badge&color=ea4647&labelColor=050811&logo=pypi&logoColor=white" alt="PyPI"></a>
<a href="https://pypi.org/project/dartlab/"><img src="https://img.shields.io/pypi/pyversions/dartlab?style=for-the-badge&color=c83232&labelColor=050811&logo=python&logoColor=white" alt="Python"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-94a3b8?style=for-the-badge&labelColor=050811" alt="License"></a>
<a href="https://github.com/eddmpython/dartlab/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/eddmpython/dartlab/ci.yml?branch=master&style=for-the-badge&labelColor=050811&logo=github&logoColor=white&label=CI" alt="CI"></a>
</p>

<p>
<a href="https://eddmpython.github.io/dartlab/">Docs</a> · <a href="README_KR.md">한국어</a> · <a href="https://buymeacoffee.com/eddmpython">Sponsor</a>
</p>

<p>
<a href="https://github.com/eddmpython/dartlab/releases/tag/data-docs"><img src="https://img.shields.io/badge/Docs-260%2B_Companies-f87171?style=for-the-badge&labelColor=050811&logo=databricks&logoColor=white" alt="Docs Data"></a>
<a href="https://github.com/eddmpython/dartlab/releases/tag/data-finance-1"><img src="https://img.shields.io/badge/Finance-2,700%2B_Companies-818cf8?style=for-the-badge&labelColor=050811&logo=databricks&logoColor=white" alt="Finance Data"></a>
<a href="https://github.com/eddmpython/dartlab/releases/tag/data-report-1"><img src="https://img.shields.io/badge/Report-2,700%2B_Companies-34d399?style=for-the-badge&labelColor=050811&logo=databricks&logoColor=white" alt="Report Data"></a>
</p>

</div>

## What is DartLab?

DartLab is a Python package for parsing and analyzing corporate filings. Its stable core covers [DART](https://dart.fss.or.kr/) (Korea) with growing support for [SEC EDGAR](https://www.sec.gov/edgar) (US) — both accessed through the same `dartlab.Company(...)` facade.

The package extracts **both financial numbers and narrative text** from filings and exposes them through comparable tables, company facades, CLI workflows, and an AI web interface. The same `index → show → trace` workflow works for Korean and US stocks alike.

### Account Standardization

Every listed company in Korea reports financials through XBRL, but each company uses **different account IDs and names** for the same economic concept. "Revenue" alone appears as dozens of variations across 2,700+ companies.

DartLab maintains its own **unified account schema** — built through a 7-stage mapping pipeline covering 34,000+ learned synonyms. The result: **98.7% of all financial statement rows** (15.8 million rows tested) across 2,700+ companies are successfully mapped to standardized accounts. This means you can directly compare Samsung Electronics' revenue with any other listed company using the same `revenue` key.

```
Raw XBRL (company-specific)          DartLab (standardized)
─────────────────────────────        ──────────────────────
ifrs-full_Revenue                 →  revenue
dart_OperatingIncomeLoss          →  operating_income
dart_ConstructionRevenue          →  revenue
ifrs_ProfitLoss                   →  net_income
매출액, 수익(매출액), 영업수익     →  revenue
```
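
The mapping above can be pictured as a synonym lookup. The snippet below is a minimal sketch, not DartLab's actual pipeline: the `SYNONYMS` table and the `standardize` helper are hypothetical names, and the real schema uses a 7-stage process rather than a single dictionary.

```python
# Hypothetical synonym table built from the examples above.
# DartLab's real mapping covers 34,000+ learned synonyms.
SYNONYMS = {
    "ifrs-full_Revenue": "revenue",
    "dart_ConstructionRevenue": "revenue",
    "매출액": "revenue",
    "수익(매출액)": "revenue",
    "영업수익": "revenue",
    "dart_OperatingIncomeLoss": "operating_income",
    "ifrs_ProfitLoss": "net_income",
}


def standardize(rows):
    """Split raw (account, amount) rows into mapped and unmapped lists."""
    mapped, unmapped = [], []
    for account, amount in rows:
        key = SYNONYMS.get(account.strip())
        if key is None:
            unmapped.append((account, amount))
        else:
            mapped.append((key, amount))
    return mapped, unmapped


rows = [("ifrs-full_Revenue", 100), ("영업수익", 40), ("custom_Unknown", 7)]
mapped, unmapped = standardize(rows)
coverage = len(mapped) / len(rows)

print(mapped)              # [('revenue', 100), ('revenue', 40)]
print(unmapped)            # [('custom_Unknown', 7)]
print(round(coverage, 3))  # 0.667
```

The coverage figure here is the share of rows that resolve to a standardized key, which is the same kind of measure as the 98.7% quoted above.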

### 40 Parsing Modules

One stock code is all you need. DartLab's 40 modules extract structured DataFrames from disclosure filings: financial statements, notes, dividends, executives, governance, risk, and narrative text. All are accessed through simple properties on a `Company` object, in the style of the yfinance API.

## Installation

> **[uv](https://docs.astral.sh/uv/)** is required — a fast Python package manager written in Rust. It handles Python version management and virtual environments automatically.

```bash
# 1. Install uv (skip if already installed)
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create a project
uv init my-analysis && cd my-analysis

# 3. Install DartLab — pick the extras you need
uv add dartlab                # Core (financial statement parsing)
uv add "dartlab[ai]"          # + AI analysis web interface (dartlab ai)
uv add "dartlab[llm]"         # + OpenAI/Ollama LLM (CLI analysis)
uv add "dartlab[charts]"      # + Plotly charts
uv add "dartlab[all]"         # Everything

# 4. Verify
uv run python -c "import dartlab; c = dartlab.Company('005930'); print(c.corpName)"
# → 삼성전자

# 5. Launch AI analysis (requires dartlab[ai])
uv run dartlab ai
# → http://localhost:8400
```

## Quick Start

```python
import dartlab

# Korean stocks (DART)
c = dartlab.Company("005930")       # by stock code
c = dartlab.Company("삼성전자")      # by company name (Korean)
c.corpName                  # "삼성전자"

# US stocks (EDGAR) — same facade, same workflow
c = dartlab.Company("AAPL")
c.corpName                  # "Apple Inc."
```

Creating a `Company` object prints a usage guide. For the full guide, call `c.guide()`.

Data is auto-downloaded from GitHub Releases when not found locally.

```python
from dartlab.core.dataLoader import downloadAll

downloadAll("docs")                        # 260+ companies — disclosure documents
downloadAll("finance")                     # 2,700+ companies — financial numbers
downloadAll("report")                      # 2,700+ companies — periodic reports
downloadAll("finance", forceUpdate=True)   # re-download if remote is newer
```

## CLI

The `dartlab` command is a public interface, not just a helper for the web UI.

```bash
uv run dartlab status
uv run dartlab setup codex
uv run dartlab ask 005930 "Summarize debt risk"
uv run dartlab excel 005930
uv run dartlab ai
```

`dartlab ai` launches the web interface. `ask`, `status`, `setup`, and `excel` are supported CLI commands with stable entrypoint behavior.

---

## Features

### Company — The Unified Entry Point

One facade covers both markets. The ticker format determines the data source automatically:

```python
import dartlab

# Korean stock → DART engine
kr = dartlab.Company("005930")
kr.corpName    # "삼성전자"

# US stock → EDGAR engine
us = dartlab.Company("AAPL")
us.corpName    # "Apple Inc."
```

Both return the same `Company` interface with the same `index → show → trace` workflow.
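
How an identifier might be routed can be sketched with a few pattern checks. This is only an illustration under stated assumptions: `guess_source` is a hypothetical helper, and the rules DartLab actually applies inside `Company(...)` are not documented here.

```python
import re


def guess_source(identifier: str) -> str:
    """Guess the filing system for an identifier (illustrative rule only)."""
    text = identifier.strip()
    if re.fullmatch(r"\d{6}", text):
        return "dart"    # six-digit KRX stock code, e.g. "005930"
    if re.search(r"[\uac00-\ud7a3]", text):
        return "dart"    # Hangul company name, e.g. "삼성전자"
    if re.fullmatch(r"[A-Za-z][A-Za-z.\-]{0,9}", text):
        return "edgar"   # US-style ticker, e.g. "AAPL"
    raise ValueError(f"cannot infer data source for {identifier!r}")


print(guess_source("005930"))    # dart
print(guess_source("삼성전자"))   # dart
print(guess_source("AAPL"))      # edgar
```

Raising on anything ambiguous, rather than defaulting to one market, keeps a mistyped identifier from silently loading the wrong company.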

### index / show / trace

The current public flow is simple:

- `index` shows the company structure first
- `show(topic)` opens the actual payload
- `trace(topic)` explains whether `docs`, `finance`, or `report` won

```python
c = dartlab.Company("005930")
c.index              # structure index dataframe
c.show("BS")         # show one topic
c.trace("dividend")  # source trace
c.docs.sections      # pure docs source spine
c.finance.BS         # authoritative financial statement
c.report.dividend    # authoritative report series

# Same flow for EDGAR
us = dartlab.Company("AAPL")
us.index             # same 8-column structure
us.show("BS")        # SEC XBRL financials
us.show("riskFactors")  # 10-K narrative sections
```

`show()` returns a `ShowResult(text, table)` for disclosure topics — text and table blocks are separated:

```python
result = c.show("companyOverview")
result.text    # narrative text DataFrame
result.table   # table DataFrame
```

### Financial Statements

```python
c.BS    # Balance Sheet (DataFrame)
c.IS    # Income Statement (DataFrame)
c.CIS   # Comprehensive Income Statement (DataFrame)
c.CF    # Cash Flow Statement (DataFrame)
c.SCE   # Statement of Changes in Equity (DART only)
```

### Cross-Company Comparable Time Series

Every company's XBRL data is mapped through the unified account schema (98.7% coverage), then converted to **standalone quarterly time series**. Cumulative figures from semi-annual and annual reports are reverse-engineered into individual quarters.

```python
series, periods = c.timeseries
# periods = ["2016_Q1", "2016_Q2", ..., "2024_Q4"]
# series["IS"]["revenue"]            # quarterly revenue
# series["BS"]["total_assets"]       # quarterly total assets
# series["CF"]["operating_cashflow"] # quarterly operating cash flow

r = c.ratios
r.roe               # 8.29 (%)
r.operatingMargin   # 9.51 (%)
r.debtRatio         # 27.4 (%)
r.fcf               # Free Cash Flow (KRW)
```

2,700+ listed companies share the same snakeId schema. Compare any two companies directly — no manual mapping required.
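
Because every company shares the same keys, downstream arithmetic can be written once. The sketch below uses mock data and assumes that each entry in `series` is a list aligned with `periods`. That layout is inferred from the comments above, not confirmed, and `trailing_sum` and `margin_pct` are hypothetical helpers rather than DartLab functions.

```python
# Mock data shaped like the (series, periods) pair described above.
periods = ["2023_Q1", "2023_Q2", "2023_Q3", "2023_Q4", "2024_Q1"]
series = {
    "IS": {
        "revenue": [100.0, 110.0, 120.0, 130.0, 140.0],
        "operating_income": [10.0, 11.0, None, 13.0, 21.0],
    }
}


def trailing_sum(values, window=4):
    """Sum the last `window` values; return None if any value is missing."""
    tail = values[-window:]
    if len(tail) < window or any(v is None for v in tail):
        return None
    return sum(tail)


def margin_pct(numerator, denominator):
    """Return numerator / denominator as a percentage, or None if undefined."""
    if numerator is None or not denominator:
        return None
    return round(numerator / denominator * 100, 2)


income = series["IS"]
ttm_revenue = trailing_sum(income["revenue"])          # 500.0
ttm_operating = trailing_sum(income["operating_income"])  # None (Q3 missing)
latest_margin = margin_pct(income["operating_income"][-1], income["revenue"][-1])  # 15.0

print(periods[-1], ttm_revenue, ttm_operating, latest_margin)
```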

### Summary Financials with Bridge Matching

Extracts summary financial time series, automatically tracking accounts even when names change due to K-IFRS revisions.

```python
result = c.fsSummary()

result.FS          # Full financial time series (Polars DataFrame)
result.BS          # Balance Sheet
result.IS          # Income Statement
result.allRate     # Overall match rate (e.g. 0.97)
result.breakpoints # List of detected breakpoints
```

### K-IFRS Notes (12 items)

These are deep-access note parsers. In the normal company workflow, try `c.show(...)` first.

```python
c.notes.inventory          # Inventories
c.notes["재고자산"]         # Korean key also works
c.notes.receivables        # Trade receivables
c.show("tangibleAsset")    # preferred company payload
c.notes.tangibleAsset      # deep-access legacy note parser
c.notes.intangibleAsset    # Intangible assets
c.notes.investmentProperty # Investment property
c.notes.affiliates         # Associates
c.notes.borrowings         # Borrowings
c.notes.provisions         # Provisions
c.notes.eps                # Earnings per share
c.notes.lease              # Leases
c.notes.segments           # Operating segments
c.show("costByNature")     # preferred company payload
c.notes.costByNature       # deep-access legacy note parser
```

### Dividends

```python
c.dividend
# ┌──────┬───────────┬───────┬──────────────┬─────────────┬──────────────┬──────┐
# │ year ┆ netIncome ┆ eps   ┆ totalDividend┆ payoutRatio ┆ dividendYield┆ dps  │
# └──────┴───────────┴───────┴──────────────┴─────────────┴──────────────┴──────┘
```

### Major Shareholders

```python
c.majorHolder    # Largest shareholder + related parties ownership (time series)
```

For the full Result object: `c.get("majorHolder")`

```python
result = c.get("majorHolder")
result.majorHolder   # "이재용"
result.majorRatio    # 20.76
result.timeSeries    # Ownership ratio time series
```

### Employees

```python
c.employee    # year, totalEmployees, avgSalary, avgTenure, ...
```

### Disclosure Horizontalization

```python
c.sections          # merged topic x period company table
c.index             # same structure index dataframe
c.docs.sections     # pure docs horizontalization source
c.retrievalBlocks   # long DataFrame of source markdown blocks
c.contextSlices     # LLM-ready slices with semantic/detail metadata
```

`sections` is the company spine. Columns are time series. The row structure
comes from disclosure sections, then `finance` fills `BS / IS / CIS / CF / SCE`
and `report` fills better structured periodic disclosure rows.

`retrievalBlocks` and `contextSlices` keep raw markdown and table evidence,
so the text layer stays lossless while the runtime still returns DataFrames
directly.

DartLab does not store per-stock result tables as package data. Learned rules
ship with the package, and runtime returns DataFrames directly from the current
stock's disclosure parquet.

### Audit Opinion

```python
c.audit    # year, auditor, opinion, keyAuditMatters
```

### Executives

```python
c.executive      # year, totalRegistered, insideDirectors, outsideDirectors, ...
c.executivePay   # year, category, headcount, totalPay, avgPay
```

### Shares / Capital

```python
c.shareCapital     # Issued, treasury, outstanding shares
c.capitalChange    # Capital changes
c.fundraising      # Capital increases/decreases
```

### Subsidiaries / Associates

```python
c.subsidiary           # Investments in other corporations
c.affiliateGroup       # Affiliate group companies
c.investmentInOther    # Investee, ownership ratio, book value
```

### Board / Governance

```python
c.boardOfDirectors     # Board composition, attendance
c.shareholderMeeting   # Shareholder meeting agendas, resolutions
c.auditSystem          # Audit committee, audit activities
c.internalControl      # Internal control assessment
```

### Risk / Legal

```python
c.contingentLiability  # Contingent liabilities, lawsuits
c.relatedPartyTx       # Related party transactions
c.sanction             # Sanctions, penalties
c.riskDerivative       # FX sensitivity, derivatives
```

### Other Financials

```python
c.bond                 # Debt securities
c.rnd                  # R&D expenses
c.otherFinance         # Allowance for bad debt, etc.
c.productService       # Major products/services
c.salesOrder           # Sales performance, order backlog
c.articlesOfIncorporation  # Articles of incorporation amendments
```

### Company Info

```python
c.companyHistory         # Corporate history
c.companyOverviewDetail  # Incorporation date, listing date, CEO, address
```

### Disclosure Narratives

```python
c.business       # Business overview (sections + change detection)
c.overview       # Company overview (incorporation, address, credit rating)
c.mdna           # Management Discussion & Analysis
c.rawMaterial    # Raw materials, tangible assets, capex
```

### Raw Data Access

```python
c.rawDocs        # Original docs parquet (unprocessed)
c.rawFinance     # Original finance parquet (unprocessed)
c.rawReport      # Original periodic report parquet (unprocessed)
```

---

## AI Analysis (dartlab ai)

Chat with an LLM over DartLab's structured data to analyze companies interactively — `uv run dartlab ai` opens the web UI at `http://localhost:8400`.

All extracted data (financial statements, notes, dividends, executives, governance) is provided as context for natural-language Q&A with streaming responses. Data Explorer lets you browse raw data directly in the browser.

The web UI is one public surface. The same runtime also exposes CLI entrypoints such as `dartlab ask`, `dartlab status`, `dartlab setup`, and `dartlab excel`.

### Supported LLM Providers

| Provider | Auth | Description |
|----------|------|-------------|
| **ChatGPT** | OAuth (browser login) | ChatGPT Plus/Pro subscription — no API key needed |
| **Ollama** | None (local) | Free, offline, private — GPU auto-detected |
| **OpenAI API** | API key | GPT-4o, o3, o4-mini and more |
| **Anthropic API** | API key | Claude Opus, Sonnet, Haiku |
| **Codex CLI** | CLI auth | ChatGPT subscription via Codex CLI |
| **Claude Code** | CLI auth | Claude subscription via Claude Code CLI |

```bash
uv run dartlab ai              # http://localhost:8400
uv run dartlab ai --port 9000  # custom port
```

---

## Bulk Extraction

```python
d = c.all()    # All module data as dict (with progress bar)
# {"BS": df, "IS": df, "CF": df, "dividend": df, "notes": {...},
#  "timeseries": (series, periods), "ratios": RatioResult, ...}
```

```python
import dartlab
dartlab.verbose = False    # Suppress progress output

d = c.all()    # Silent extraction
```

---

## Result Object

Properties return the primary DataFrame. For the full Result object, use `c.get()`.

```python
# property — returns DataFrame directly
c.audit          # opinionDf (audit opinion DataFrame)

# get() — returns full Result object
result = c.get("audit")
result.opinionDf   # Audit opinion
result.feeDf       # Audit fees
```

---

## Company Search

```python
import dartlab

dartlab.Company.search("삼성")
# ┌──────────────┬──────────┬────────────────┐
# │ 회사명       ┆ 종목코드 ┆ 업종           │
# └──────────────┴──────────┴────────────────┘

dartlab.Company.listing()   # Full KRX listed companies
dartlab.Company.status()    # Local data index
c.filings()         # Filing list + DART viewer links
```

---

## Core Technology

### Horizontal Alignment of Filings

DART filings cover different periods depending on report type:

```
                           Q1         Q2         Q3         Q4
                          ┌──────┐
 Q1 Report                │  Q1  │
                          └──────┘
                          ┌──────────────┐
 Semi-Annual              │   Q1 + Q2    │
                          └──────────────┘
                          ┌─────────────────────┐
 Q3 Report                │    Q1 + Q2 + Q3     │
                          └─────────────────────┘
                          ┌──────────────────────────────┐
 Annual Report            │       Q1 + Q2 + Q3 + Q4      │
                          └──────────────────────────────┘
```

Q1 reports contain only Q1, semi-annual reports contain cumulative Q1+Q2, Q3 reports contain cumulative Q1+Q2+Q3, and annual reports contain the full year. DartLab derives standalone quarterly figures from these cumulative structures (for example, Q2 alone is the semi-annual figure minus Q1), and tracks accounts even when names change between filings.
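
The de-cumulation step amounts to taking successive differences. The function below is a minimal sketch with a hypothetical name (`to_standalone`); it applies to flow items such as income statement and cash flow lines, since balance sheet values are point-in-time and need no differencing.

```python
def to_standalone(cumulative):
    """Turn year-to-date values [Q1, H1, 9M, FY] into standalone quarters.

    A missing report (None) makes both that quarter and the following one
    unknown, because each quarter depends on the preceding cumulative value.
    """
    quarters = []
    previous = 0
    for value in cumulative:
        if value is None or previous is None:
            quarters.append(None)
        else:
            quarters.append(value - previous)
        previous = value
    return quarters


print(to_standalone([100, 250, 420, 600]))   # [100, 150, 170, 180]
print(to_standalone([100, None, 420, 600]))  # [100, None, None, 180]
```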

### Bridge Matching

K-IFRS revisions and internal restructuring frequently cause **account name changes within the same company**. Bridge Matching combines amount matching and name similarity across adjacent years to automatically link identical accounts.

```
             2022              2023              2024
             ──────            ──────            ──────
 매출액 ────────────── 매출액 ────────────── 수익(매출액)
                              ↑ name change              ↑ name change
 영업이익 ──────────── 영업이익 ──────────── 영업이익
 당기순이익 ────────── 당기순이익 ────────── 당기순이익(손실)
```

Four-stage matching process:

1. **Exact match** — identical amounts
2. **Restatement match** — within 0.5 tolerance
3. **Name change match** — amount error < 5% AND name similarity > 60%
4. **Special item match** — decimal-unit items like EPS

When match rate drops below 85%, a breakpoint is detected and the segment is split.
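
The first three stages and the breakpoint rule can be sketched as follows. Several things here are assumptions rather than DartLab's implementation: the function names are hypothetical, the 0.5 tolerance is treated as an absolute difference in the reported unit, and `difflib.SequenceMatcher` stands in for whatever name-similarity measure the package actually uses. Stage 4 is omitted because its rule is not described above.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1]; a stand-in metric, not necessarily DartLab's."""
    return SequenceMatcher(None, a, b).ratio()


def classify_match(prev, curr, tolerance=0.5):
    """Return the stage that links two (name, amount) rows, or None."""
    prev_name, prev_amount = prev
    curr_name, curr_amount = curr
    if prev_amount == curr_amount:
        return "exact"
    if abs(prev_amount - curr_amount) <= tolerance:
        return "restatement"  # assumes the tolerance is an absolute difference
    if prev_amount != 0:
        error = abs(prev_amount - curr_amount) / abs(prev_amount)
        if error < 0.05 and name_similarity(prev_name, curr_name) > 0.60:
            return "name_change"
    return None


def is_breakpoint(matched, total, threshold=0.85):
    """True when the share of matched accounts falls below the threshold."""
    return total > 0 and matched / total < threshold


print(classify_match(("매출액", 1000.0), ("수익(매출액)", 1000.0)))        # exact
print(classify_match(("영업이익", 500.0), ("영업이익", 500.4)))            # restatement
print(classify_match(("당기순이익", 1000.0), ("당기순이익(손실)", 1020.0)))  # name_change
print(is_breakpoint(8, 10))                                              # True
```

Checking amounts before names means a renamed account with an unchanged figure is linked at stage 1 without relying on string similarity at all.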

---

## Data

### Sources and Integrity

All data originates from **[OpenDART](https://opendart.fss.or.kr/)** and **[DART](https://dart.fss.or.kr/)**, Korea's official electronic disclosure system. The developer has **not modified a single number** — only metadata columns (stock code, year, report type, etc.) have been added for structural organization.

If you want to verify, you can cross-check any value against the original filings using the package's built-in DART viewer links (`c.filings()`).

Each Parquet file contains all filings for a single company:

- **Metadata**: stock code, company name, report type, filing date, business year
- **Quantitative**: summary financials, financial statement body, notes
- **Narrative**: business description, audit opinion, risk management, executive/shareholder status

### Data Releases

| Category | Release Tags | Description | Count |
|----------|-------------|-------------|-------|
| Disclosure | [`data-docs`](https://github.com/eddmpython/dartlab/releases/tag/data-docs) | Parsed annual report sections | 260+ |
| Finance | [`data-finance-1`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-1) [`2`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-2) [`3`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-3) [`4`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-4) | XBRL financial statement numbers | 2,700+ |
| Report | [`data-report-1`](https://github.com/eddmpython/dartlab/releases/tag/data-report-1) [`2`](https://github.com/eddmpython/dartlab/releases/tag/data-report-2) [`3`](https://github.com/eddmpython/dartlab/releases/tag/data-report-3) [`4`](https://github.com/eddmpython/dartlab/releases/tag/data-report-4) | Periodic report data | 2,700+ |

Finance and Report data are split into 4 tags by stock code range (GitHub's 1000-asset-per-release limit). `loadData()` and `downloadAll()` handle this automatically.
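
As a rough picture of how a stock code could be resolved to a release tag, consider the sketch below. The cut points in `SHARD_BOUNDS` are made up for illustration and `release_tag` is a hypothetical helper; the real shard mapping is not shown in this document. Only the tag names come from the table above.

```python
from bisect import bisect_right

# Hypothetical exclusive upper bounds for shards 1-3; shard 4 takes the rest.
SHARD_BOUNDS = ["050000", "100000", "200000"]


def release_tag(category, stock_code):
    """Map a category and six-digit stock code to a release tag name."""
    if category == "docs":
        return "data-docs"  # the docs category uses a single tag
    if category not in ("finance", "report"):
        raise ValueError(f"unknown category: {category!r}")
    shard = bisect_right(SHARD_BOUNDS, stock_code) + 1
    return f"data-{category}-{shard}"


print(release_tag("finance", "005930"))  # data-finance-1
print(release_tag("report", "373220"))   # data-report-4
print(release_tag("docs", "005930"))     # data-docs
```

Zero-padded six-digit codes compare correctly as strings, so no integer conversion is needed for the range lookup.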

### Bring Your Own Data

If you structure your own Parquet files to match DartLab's schema, all existing features work out of the box. Place files as `data/{category}/{stockCode}.parquet` and every property, extraction module, and analysis tool will function normally.
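
A small helper can keep custom files in the expected layout. This is a sketch under assumptions: `parquet_path` is a hypothetical function, and the category names are taken from the `downloadAll(...)` examples earlier in this document.

```python
from pathlib import Path

CATEGORIES = ("docs", "finance", "report")


def parquet_path(root, category, stock_code):
    """Build the data/{category}/{stockCode}.parquet path for one company."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return Path(root) / category / f"{stock_code}.parquet"


path = parquet_path("data", "finance", "005930")
print(path.as_posix())  # data/finance/005930.parquet
```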

### Disclaimer

This project is licensed under MIT. While the data faithfully mirrors OpenDART public disclosures, **no guarantee of commercial reliability is provided**. Always verify against official sources for investment or compliance decisions.

> **Update frequency**
>
> Data is collected directly without paid proxies, so updates may be slow. Adding new companies or reflecting the latest filings may take time.

---

## Why DartLab?

DART filings contain far more than financial numbers — business descriptions, risk factors, audit opinions, litigation status, and governance changes are all embedded in the text. Most tools only extract the numbers. The rest is discarded.

DartLab extracts both. It aligns quarterly, semi-annual, and annual reports on a single time axis, and automatically tracks accounts even when K-IFRS revisions or restructuring changes their names.

> **Current scope**
>
> Bridge Matching tracks account name changes **within a single company** across years. The finance engine enables **cross-company comparison** by mapping XBRL accounts to standardized snakeIds. 2,700+ listed companies are normalized to the same structure.
>
> The insight engine grades each company across 7 areas (performance, profitability, financial health, cash flow, governance, risk, plus an overall grade) and detects anomalies. The rank engine computes market-wide size rankings.
>
> Text analysis capabilities are being developed in a **separate project** and will be integrated into DartLab.
>
> The ultimate goal is a tool that can analyze the **entire market** at once, not just one company.

## Roadmap

- [x] Summary financial time series (Bridge Matching)
- [x] Consolidated BS, IS, CF
- [x] Segment revenue, associates, dividends, employees, shareholders, subsidiaries
- [x] Debt securities, expenses by nature, raw materials/capex
- [x] Audit opinion, executive status, executive compensation
- [x] PPE movement, note details (23 keywords)
- [x] Board of directors, capital changes, contingent liabilities, related party tx, sanctions, R&D, internal control
- [x] Affiliate groups, capital raises, sales/orders, products, risk management/derivatives
- [x] MD&A, business description, company overview
- [x] Company property API + Notes integration + all()
- [x] Rich terminal output (avatar + usage guide)
- [x] Account standardization engine — 2,700+ companies cross-comparable
- [x] Quarterly time series + financial ratios (c.timeseries, c.ratios)
- [x] Periodic report data engine (dividend, employees, major holders, audit, executives)
- [x] Sector classification (WICS 11 sectors — KSIC + keyword + override)
- [x] Insight grading engine (7 areas: performance, profitability, health, cashflow, governance, risk + overall)
- [x] Anomaly detection (Z-score + domain rules across 30+ financial metrics)
- [x] Market-wide size ranking (revenue, assets, growth — total + within-sector)
- [x] AI analysis web interface (dartlab ai) — Ollama local LLM
- [x] Cloud LLM providers (OpenAI, Anthropic, ChatGPT OAuth, Codex CLI, Claude Code)
- [x] Data Explorer — full-screen data browser with Korean/English label toggle
- [x] Excel export with templates
- [ ] Company `profile` report view (terminal/notebook document view focused on change points)
- [ ] Compare UX overhaul around the same `index/show/trace` philosophy
- [x] EDGAR Company UX alignment with the DART `Company` surface
- [x] EDGAR (US SEC) financial data integration
- [ ] Text analysis module integration (from separate project)
- [ ] Quantitative + qualitative cross-validation
- [ ] Visualization

## Architecture

```
src/dartlab/
├── company.py              # Company facade — auto-routes DART / EDGAR
├── core/                   # Data loading, report selection, table parsing
│   ├── dataLoader.py       # GitHub Releases ↔ local cache
│   ├── dataConfig.py       # Release tags, shard mapping
│   └── registry.py         # DataEntry — single source of truth for all modules
│
├── engines/
│   ├── dart/               # L1: DART data source (Korea)
│   │   ├── docs/           # Filing document parsing
│   │   │   ├── finance/    # 36 quantitative modules (BS, IS, CF, dividend, ...)
│   │   │   ├── disclosure/ # 4 narrative modules (business, MD&A, overview, ...)
│   │   │   └── notes.py    # K-IFRS notes wrapper (12 items)
│   │   ├── finance/        # XBRL normalization — 34K synonyms → unified snakeId
│   │   └── report/         # Periodic report API (dividend, employee, audit, ...)
│   │
│   ├── edgar/              # L1: EDGAR data source (US)
│   │   ├── docs/           # 10-K/10-Q section parsing + horizontal alignment
│   │   └── finance/        # SEC XBRL normalization → unified snakeId
│   │
│   ├── common/             # Shared utilities (extract, ratios)
│   ├── sector/             # L2: WICS 11-sector classification
│   ├── insight/            # L2: 7-area grading (A~F) + anomaly detection
│   ├── rank/               # L2: Market-wide size ranking
│   │
│   └── ai/                 # L3: LLM-powered analysis
│       ├── providers/      # ChatGPT, Ollama, OpenAI, Anthropic, Codex, Claude Code
│       ├── context.py      # Engine data → LLM context assembly
│       └── prompts.py      # System prompts (KR/EN)
│
├── server/                 # FastAPI backend for web UI
└── ui/                     # Svelte 5 SPA (Data Explorer, chat)
```

**Layer principles**: L1 defines the data (labels, ordering, units). L2 and L3 consume L1 without modification. Changes to data quality always start at L1.

## Contributing

Issues and pull requests are welcome. Before submitting:

- Test new features in `experiments/` first — verify the approach before modifying `src/`
- For data mapping improvements (e.g., `accountMappings.json`), include experiment results showing the before/after impact

### Development Setup

```bash
git clone https://github.com/eddmpython/dartlab.git
cd dartlab
uv sync --group dev
pre-commit install
pre-commit install --hook-type commit-msg
uv run pytest tests/ -v -m "not requires_data"
```

Questions or ideas? Open an [issue](https://github.com/eddmpython/dartlab/issues). Both Korean and English are fine.

## Sponsor

<a href="https://buymeacoffee.com/eddmpython">
  <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="180"/>
</a>

## License

MIT License

