Metadata-Version: 2.4
Name: tree-sitter-postgres
Version: 1.1.0
Summary: Postgres grammar for tree-sitter
License: BSD-3-Clause
Project-URL: Homepage, https://github.com/gmr/tree-sitter-postgres
Keywords: incremental,parsing,tree-sitter,postgres
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Software Development :: Compilers
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: core
Requires-Dist: tree-sitter>=0.25; extra == "core"
Dynamic: license-file

# tree-sitter-postgres

A [tree-sitter](https://tree-sitter.github.io/) grammar for PostgreSQL, generated directly from PostgreSQL's Bison grammar (`gram.y`) and keyword list (`kwlist.h`).

## Features

- **Current as of PostgreSQL 18** (generated from the `REL_18_3` tag)
- **727 grammar rules** covering the full PostgreSQL SQL syntax
- **494 case-insensitive keywords** across all four PG keyword categories
- **Correct operator precedence** — `1 + 2 * 3` parses as `1 + (2 * 3)`
- **PL/pgSQL support** via a separate grammar with language injection
- **Generated, not hand-written** — regenerate for any PostgreSQL version
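A common way to make keywords case-insensitive in a tree-sitter grammar is to spell each keyword as a regular expression with one two-letter character class per letter. The Python sketch below shows that general technique. The helper `keyword_pattern` is hypothetical, and whether this generator emits exactly this form is an assumption.

```python
import re


def keyword_pattern(word: str) -> str:
    """Return a regex source string that matches `word` in any letter case.

    Hypothetical helper: letters become two-character classes ("s" -> "[sS]"),
    everything else (digits, underscores) is escaped and kept literally.
    """
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = keyword_pattern("select")
    print(pattern)  # [sS][eE][lL][eE][cC][tT]
    for text in ("SELECT", "select", "SeLeCt", "selects"):
        # fullmatch requires the whole string to match, so "selects" fails
        print(text, re.fullmatch(pattern, text) is not None)
```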

## Quick start

```bash
npm install
cd postgres && npx tree-sitter generate && npx tree-sitter test
```

## Regenerating from PostgreSQL source

The grammar is generated from a local PostgreSQL checkout. Set `PG_SOURCE_DIR` to point at your PostgreSQL source tree:

```bash
export PG_SOURCE_DIR=/path/to/postgres

# Using just (recommended)
just generate

# Or run the script directly
node script/generate-grammar.js "$PG_SOURCE_DIR"
cd postgres && npx tree-sitter generate
```
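Reading rules out of `gram.y` is the first step of regeneration. The Python sketch below shows the core of that step for a single rule: strip comments and `{ ... }` C actions, split the body on `|`, and record any `%prec` annotation, so a `/*EMPTY*/` alternative comes out as an empty symbol list. It is not the project's `parse-gram-y.js`; the `Alternative` type and the sample rule are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alternative:
    symbols: list[str]
    prec: Optional[str] = None  # token named by a %prec annotation, if any


def strip_actions(text: str) -> str:
    """Drop C comments (including /*EMPTY*/) and brace-balanced { ... } actions.

    Simplification: braces inside C string or character literals are not
    special-cased, so an action containing "}" in a string would confuse it.
    """
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)
    out, depth = [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def parse_rule(text: str) -> tuple[str, list[Alternative]]:
    """Split one `name: alt | alt ... ;` rule into its alternatives."""
    name, _, body = strip_actions(text).partition(":")
    body = body.strip().rstrip(";")
    alternatives = []
    for chunk in body.split("|"):  # assumes '|' appears only as the separator
        tokens = chunk.split()
        prec = None
        if "%prec" in tokens:
            i = tokens.index("%prec")
            prec = tokens[i + 1]
            del tokens[i : i + 2]
        alternatives.append(Alternative(tokens, prec))
    return name.strip(), alternatives


# Synthetic rule written in gram.y's style (not copied from gram.y).
SAMPLE = """
example_expr:
        c_expr                          { $$ = $1; }
        | '-' a_expr %prec UMINUS       { if ($2) { $$ = $2; } }
        | a_expr '+' a_expr             { $$ = $1; }
        | /*EMPTY*/                     { $$ = NULL; }
    ;
"""

if __name__ == "__main__":
    name, alternatives = parse_rule(SAMPLE)
    for alt in alternatives:
        print(name, alt)
```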

### Input files

| File (under `PG_SOURCE_DIR`) | Contents                                     |
| ----------------------------- | -------------------------------------------- |
| `src/backend/parser/gram.y`   | Bison grammar (733 rules, 3236 alternatives) |
| `src/include/parser/kwlist.h` | Keyword definitions (494 keywords)           |
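`kwlist.h` is a C header of `PG_KEYWORD(...)` macro calls whose third argument is one of the four categories: `UNRESERVED_KEYWORD`, `COL_NAME_KEYWORD`, `TYPE_FUNC_NAME_KEYWORD`, or `RESERVED_KEYWORD`. The Python sketch below groups keywords by that argument with a regular expression. It is not the project's `parse-kwlist.js`; it assumes one entry per line in the four-argument form used by current releases, and the sample lines are only a handful of entries.

```python
import re
from collections import defaultdict

# One kwlist.h entry per line, e.g.
#   PG_KEYWORD("abort", ABORT_P, UNRESERVED_KEYWORD, BARE_LABEL)
# Arguments: SQL spelling, parser token, keyword category, and whether the
# keyword can be a bare column label.
ENTRY_RE = re.compile(
    r'PG_KEYWORD\("(?P<name>[^"]+)",\s*(?P<token>\w+),\s*'
    r'(?P<category>\w+),\s*(?P<label>\w+)\)'
)


def parse_kwlist(text: str) -> dict[str, list[str]]:
    """Group keyword spellings by category, preserving file order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        entry = ENTRY_RE.match(line.strip())
        if entry:  # comments, preprocessor lines and blanks are skipped
            groups[entry["category"]].append(entry["name"])
    return dict(groups)


SAMPLE = """
/* comment and preprocessor lines do not match and are skipped */
PG_KEYWORD("abort", ABORT_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("all", ALL, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("as", AS, RESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("authorization", AUTHORIZATION, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("between", BETWEEN, COL_NAME_KEYWORD, BARE_LABEL)
"""

if __name__ == "__main__":
    for category, names in sorted(parse_kwlist(SAMPLE).items()):
        print(category, names)
```

The categories matter to a grammar because they decide where a keyword may also serve as an identifier: unreserved keywords almost anywhere, the other categories only in restricted positions.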

### Generator scripts

| Script                          | Purpose                                                                  |
| ------------------------------- | ------------------------------------------------------------------------ |
| `script/generate-grammar.js`    | Orchestrator — reads PG source, writes `postgres/grammar.js`             |
| `script/parse-gram-y.js`        | Parses Bison grammar: rules, terminals, precedence, `%prec` annotations  |
| `script/parse-kwlist.js`        | Parses keyword list into categories                                      |
| `script/codegen.js`             | Generates tree-sitter grammar with precedence and optional-rule handling |
| `postgres/harvest-conflicts.sh` | Iteratively discovers GLR conflicts needed by tree-sitter                |
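The general shape of an iterative conflict-harvesting loop is sketched below in Python rather than shell: regenerate, read the conflict that tree-sitter suggests declaring, add it to the list, and repeat until generation succeeds. The report format the regex expects, the `harvest` and `fake_generate` functions, and the rule names are assumptions for illustration, not the actual logic of `harvest-conflicts.sh`.

```python
import re
from typing import Callable, Optional

Conflict = tuple[str, ...]

# Assumption: tree-sitter's "Unresolved conflict" report includes a line like
#   Add a conflict for these rules: `a_expr`, `b_expr`
SUGGESTION_RE = re.compile(r"Add a conflict for these rules: (.+)")


def suggested_conflict(report: str) -> Optional[Conflict]:
    """Extract the rule names from the first conflict suggestion, if any."""
    match = SUGGESTION_RE.search(report)
    if match is None:
        return None
    return tuple(re.findall(r"`(\w+)`", match.group(1)))


def harvest(generate: Callable[[list[Conflict]], str], max_rounds: int = 1000) -> list[Conflict]:
    """Re-run `generate` with a growing conflict list until it succeeds.

    `generate` stands in for writing grammar.js and running
    `tree-sitter generate`; it returns the error report, or "" on success.
    """
    conflicts: list[Conflict] = []
    for _ in range(max_rounds):
        report = generate(conflicts)
        if not report:
            return conflicts
        conflict = suggested_conflict(report)
        if conflict is None:
            raise RuntimeError("generation failed for a reason other than a conflict")
        if conflict in conflicts:
            raise RuntimeError(f"declaring {conflict} did not resolve it")
        conflicts.append(conflict)
    raise RuntimeError("no conflict-free grammar within max_rounds")


def fake_generate(conflicts: list[Conflict]) -> str:
    """Hypothetical stand-in that 'needs' two made-up conflicts."""
    for a, b in [("a_expr", "b_expr"), ("func_expr", "c_expr")]:
        if (a, b) not in conflicts:
            return f"Possible resolutions:\n  1:  Add a conflict for these rules: `{a}`, `{b}`\n"
    return ""


if __name__ == "__main__":
    print(harvest(fake_generate))  # [('a_expr', 'b_expr'), ('func_expr', 'c_expr')]
```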

## Repository structure

```
postgres/               PostgreSQL SQL grammar
  grammar.js            Generated tree-sitter grammar
  src/                  Generated parser (C)
  test/corpus/          Test cases (35 tests)
  known-conflicts.json  GLR conflict pairs

plpgsql/                PL/pgSQL grammar
  grammar.js            Hand-written tree-sitter grammar
  src/scanner.c         External scanner for dollar-quoting and keywords
  test/corpus/          Test cases
  queries/              Highlights and injection queries

script/                 Shared generator code
  generate-grammar.js   SQL grammar orchestrator
  parse-gram-y.js       Bison parser
  parse-kwlist.js       Keyword parser
  codegen.js            Grammar code generator

bindings/               Language bindings (Node, Rust, Python, Go, Swift, C)
```

## Design notes

### Empty rule handling

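Bison's `/* EMPTY */` alternatives cannot be translated directly, because tree-sitter forbids non-start rules that match the empty string. Instead, the generator propagates optionality upward in a fixpoint loop (a rule becomes optional when one of its alternatives is empty or consists only of optional rules) and wraps references to optional rules with `optional()` at their call sites.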
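The fixpoint can be sketched in Python over a hypothetical grammar representation: a dict from rule name to alternatives, each alternative a list of symbol names. This is a minimal sketch, not the project's `codegen.js`. In particular, branching on the first present symbol when an alternative consists entirely of optional rules is one way to keep such rules non-empty, and is an assumption here.

```python
Grammar = dict[str, list[list[str]]]  # rule -> alternatives -> symbols


def nullable_rules(grammar: Grammar) -> set[str]:
    """Fixpoint: a rule is nullable if some alternative is empty or consists
    only of nullable rules. Names not in `grammar` are terminals (never nullable)."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            if name not in nullable and any(
                all(sym in nullable for sym in alt) for alt in alternatives
            ):
                nullable.add(name)
                changed = True
    return nullable


def render(grammar: Grammar) -> dict[str, str]:
    """Emit tree-sitter DSL bodies in which no rule matches the empty string."""
    nullable = nullable_rules(grammar)

    def ref(sym: str, optional: bool) -> str:
        return f"optional($.{sym})" if optional else f"$.{sym}"

    def seq(parts: list[str]) -> str:
        return parts[0] if len(parts) == 1 else f"seq({', '.join(parts)})"

    rules = {}
    for name, alternatives in grammar.items():
        choices = []
        for alt in alternatives:
            if not alt:
                continue  # the /* EMPTY */ branch moves to optional() at call sites
            if all(sym in nullable for sym in alt):
                # Every symbol may be absent; branch on which one comes first so
                # that each branch still consumes at least one symbol.
                for i, first in enumerate(alt):
                    rest = [ref(s, s in nullable) for s in alt[i + 1 :]]
                    choices.append(seq([ref(first, False)] + rest))
            else:
                choices.append(seq([ref(s, s in nullable) for s in alt]))
        if not choices:
            raise ValueError(f"{name} matches only the empty string")
        rules[name] = choices[0] if len(choices) == 1 else f"choice({', '.join(choices)})"
    return rules


# Made-up miniature grammar; uppercase names are terminals.
GRAMMAR: Grammar = {
    "set_quantifier": [["opt_all", "opt_distinct"]],  # nullable only via propagation
    "opt_all": [["ALL"], []],
    "opt_distinct": [["DISTINCT"], []],
    "select_stmt": [["SELECT", "set_quantifier", "IDENT"]],
}

if __name__ == "__main__":
    for name, body in render(GRAMMAR).items():
        print(f"{name}: $ => {body},")
```

Because `set_quantifier` is listed before the rules it depends on, it only becomes optional on the loop's second pass, which is why the computation repeats until nothing changes.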

### Operator precedence

Binary operators are split into a separate `a_expr_prec` rule resolved by static precedence (without GLR), while complex patterns (IS, IN, BETWEEN, LIKE, subquery operators) stay in `a_expr` with GLR conflict resolution. The generator emits both `prec.left`/`prec.right`, which resolve conflicts when the parse table is generated, and `prec.dynamic`, which ranks competing parses at runtime.
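`prec.left` and `prec.right` follow the same idea as Bison's `%left` and `%right`: with `E op1 E` already parsed and `op2` next, the parser reduces when `op1` binds tighter, or equally tight and left-associative, and shifts otherwise. The Python sketch below applies that rule in a tiny shift-reduce parser so the resulting grouping is visible. It illustrates the rule only, not tree-sitter's table construction, and the precedence table is a hypothetical excerpt rather than the generated grammar's values.

```python
import re

# Hypothetical excerpt in the spirit of gram.y's %left declarations
# (higher binds tighter). PostgreSQL documents ^ as left-associative.
PRECEDENCE = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"), "%": (2, "left"),
    "^": (3, "left"),
}


def should_reduce(stack_op: str, incoming: str) -> bool:
    """Resolve the conflict in `E stack_op E . incoming` the way %left/%right do."""
    stack_prec, assoc = PRECEDENCE[stack_op]
    incoming_prec, _ = PRECEDENCE[incoming]
    return stack_prec > incoming_prec or (stack_prec == incoming_prec and assoc == "left")


def parse(text: str) -> str:
    """Shift-reduce parse of integers and binary operators, fully parenthesized."""
    operands: list[str] = []
    operators: list[str] = []

    def reduce() -> None:
        right, left = operands.pop(), operands.pop()
        operands.append(f"({left} {operators.pop()} {right})")

    for tok in re.findall(r"\d+|[-+*/%^]", text):
        if tok in PRECEDENCE:
            while operators and should_reduce(operators[-1], tok):
                reduce()
            operators.append(tok)  # shift
        else:
            operands.append(tok)
    while operators:
        reduce()
    return operands[0]


if __name__ == "__main__":
    print(parse("1 + 2 * 3"))  # (1 + (2 * 3))
    print(parse("1 - 2 - 3"))  # ((1 - 2) - 3)
    print(parse("2 ^ 3 ^ 2"))  # ((2 ^ 3) ^ 2)
```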

### PL/pgSQL

PL/pgSQL is implemented as a separate hand-written grammar in `plpgsql/` with an external scanner for dollar-quoting and context-sensitive keywords. SQL expressions and statements within PL/pgSQL blocks are delegated to the postgres grammar via tree-sitter language injection (`plpgsql/queries/injections.scm`).
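Dollar-quoting needs an external scanner because the closing delimiter must repeat the opening tag exactly, and tree-sitter's regex-based tokens have no backreferences. The Python sketch below illustrates only that matching rule. It is not a port of `plpgsql/src/scanner.c`, and the ASCII-only tag pattern is a simplification.

```python
import re
from typing import Optional

# $tag$ where tag is empty or identifier-like and cannot start with a digit.
# Simplification: only ASCII letters are accepted in tags here.
OPENER_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)?\$")


def scan_dollar_quote(text: str, pos: int = 0) -> Optional[tuple[str, str, int]]:
    """Read a dollar-quoted string starting at `pos`.

    Returns (tag, body, end), with `end` just past the closing delimiter, or
    None when `pos` does not start a complete dollar quote.
    """
    opener = OPENER_RE.match(text, pos)
    if opener is None:
        return None  # e.g. "$1" is a parameter reference, not a quote
    delimiter = opener.group(0)  # "$$" or "$tag$"; tags are case-sensitive
    close = text.find(delimiter, opener.end())
    if close == -1:
        return None  # unterminated
    return opener.group(1) or "", text[opener.end():close], close + len(delimiter)


if __name__ == "__main__":
    src = "$fn$ BEGIN RETURN 'a $$ inside'; END $fn$;"
    tag, body, end = scan_dollar_quote(src)
    print(tag, repr(body), repr(src[end:]))
```

The inner `$$` does not end the `$fn$` quote, which is what lets a function body contain its own dollar-quoted strings.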

## License

BSD 3-Clause
