Metadata-Version: 2.4
Name: daft-lts
Version: 0.7.8
Requires-Dist: pyarrow>=8.0.0,<24.0.0
Requires-Dist: fsspec<2026.3.0
Requires-Dist: tqdm<4.68.0
Requires-Dist: typing-extensions>=4.0.0 ; python_full_version < '3.11'
Requires-Dist: packaging
Requires-Dist: daft[aws,azure,clickhouse,deltalake,gcp,google,gravitino,hudi,huggingface,iceberg,lance,numpy,openai,pandas,postgres,ray,sentence-transformers,sql,transformers,turbopuffer,unity,video] ; extra == 'all'
Requires-Dist: soundfile>=0.13.0,<0.14.0 ; extra == 'audio'
Requires-Dist: librosa>=0.11.0,<0.12.0 ; extra == 'audio'
Requires-Dist: boto3<1.43.0 ; extra == 'aws'
Requires-Dist: mypy-boto3-glue ; extra == 'aws'
Requires-Dist: clickhouse-connect<0.12.0 ; extra == 'clickhouse'
Requires-Dist: deltalake<1.3.0 ; extra == 'deltalake'
Requires-Dist: google-genai<1.64.0 ; extra == 'google'
Requires-Dist: numpy<2.4.0 ; extra == 'google'
Requires-Dist: pillow==12.1.1 ; extra == 'google'
Requires-Dist: requests>=2.28.0,<3.0.0 ; extra == 'gravitino'
Requires-Dist: pyarrow>=8.0.0,<22.1.0 ; extra == 'hudi'
Requires-Dist: huggingface-hub<1.5.0 ; extra == 'huggingface'
Requires-Dist: datasets<4.6.0 ; extra == 'huggingface'
Requires-Dist: pyiceberg>=0.7.0,!=0.9.1,!=0.10.0,<=0.11.0 ; extra == 'iceberg'
Requires-Dist: pylance<0.40.0 ; extra == 'lance'
Requires-Dist: numpy<2.4.0 ; extra == 'numpy'
Requires-Dist: openai<2.21.0 ; extra == 'openai'
Requires-Dist: numpy<2.4.0 ; extra == 'openai'
Requires-Dist: pillow==12.1.1 ; extra == 'openai'
Requires-Dist: pandas<2.4.0 ; extra == 'pandas'
Requires-Dist: psycopg[binary]<3.4.0 ; extra == 'postgres'
Requires-Dist: pgvector<0.5.0 ; extra == 'postgres'
Requires-Dist: sqlglot<28.11.0 ; extra == 'postgres'
Requires-Dist: connectorx>=0.4.4,<0.5.0 ; extra == 'postgres'
Requires-Dist: ray[data,client]>=2.0.0,<2.54.0 ; platform_system != 'Windows' and extra == 'ray'
Requires-Dist: ray[data,client]>=2.10.0,<2.54.0 ; platform_system == 'Windows' and extra == 'ray'
Requires-Dist: connectorx>=0.4.4,<0.5.0 ; extra == 'sql'
Requires-Dist: sqlalchemy<2.1.0 ; extra == 'sql'
Requires-Dist: sqlglot<28.11.0 ; extra == 'sql'
Requires-Dist: transformers<5.2.0 ; extra == 'transformers'
Requires-Dist: sentence-transformers<5.3.0 ; extra == 'transformers'
Requires-Dist: torch<2.11.0 ; extra == 'transformers'
Requires-Dist: torchvision<0.26.0 ; extra == 'transformers'
Requires-Dist: pillow==12.1.1 ; extra == 'transformers'
Requires-Dist: turbopuffer<1.16.0 ; extra == 'turbopuffer'
Requires-Dist: httpx<=0.28.1 ; extra == 'unity'
Requires-Dist: unitycatalog<0.2.0 ; extra == 'unity'
Requires-Dist: deltalake<1.3.0 ; extra == 'unity'
Requires-Dist: av>=15.0.0,<16.2.0 ; extra == 'video'
Provides-Extra: all
Provides-Extra: audio
Provides-Extra: aws
Provides-Extra: azure
Provides-Extra: clickhouse
Provides-Extra: deltalake
Provides-Extra: gcp
Provides-Extra: google
Provides-Extra: gravitino
Provides-Extra: hudi
Provides-Extra: huggingface
Provides-Extra: iceberg
Provides-Extra: lance
Provides-Extra: numpy
Provides-Extra: openai
Provides-Extra: pandas
Provides-Extra: postgres
Provides-Extra: ray
Provides-Extra: sql
Provides-Extra: transformers
Provides-Extra: turbopuffer
Provides-Extra: unity
Provides-Extra: video
Provides-Extra: viz
License-File: LICENSE
Summary: Distributed Dataframes for Multimodal Data
Author-email: Eventual Inc <daft@eventualcomputing.com>
Maintainer-email: Sammy Sidhu <sammy@eventualcomputing.com>, Jay Chia <jay@eventualcomputing.com>
Requires-Python: >=3.10
Description-Content-Type: text/x-rst; charset=UTF-8
Project-URL: homepage, https://www.daft.ai
Project-URL: repository, https://github.com/Eventual-Inc/Daft

|Banner|

|CI| |PyPI| |Latest Tag| |Coverage| |Slack|

`Website <https://www.daft.ai>`_ • `Docs <https://docs.daft.ai>`_ • `Installation <https://docs.daft.ai/en/stable/install/>`_ • `Daft Quickstart <https://docs.daft.ai/en/stable/quickstart/>`_ • `Community and Support <https://github.com/Eventual-Inc/Daft/discussions>`_

Daft: High-Performance Data Engine for AI and Multimodal Workloads
==================================================================

|TrendShift|

`Daft <https://www.daft.ai>`_ is a high-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.

* **Native multimodal processing:** Process images, audio, video, and embeddings alongside structured data in a single framework
* **Built-in AI operations:** Run LLM prompts, generate embeddings, and classify data at scale using OpenAI, Transformers, or custom models
* **Python-native, Rust-powered:** Skip the JVM complexity with Python at its core and Rust under the hood for blazing performance
* **Seamless scaling:** Start local, then scale to distributed clusters on `Ray <https://docs.daft.ai/en/stable/distributed/ray/>`_ or `Kubernetes <https://docs.daft.ai/en/stable/distributed/kubernetes/>`_
* **Universal connectivity:** Access data anywhere (S3, GCS, Iceberg, Delta Lake, Hugging Face, Unity Catalog)
* **Out-of-the-box reliability:** Intelligent memory management and sensible defaults eliminate configuration headaches

Getting Started
---------------

Installation
^^^^^^^^^^^^

Install Daft with ``pip install daft``. Requires Python 3.10 or higher.

For more advanced installations (e.g. installing from source or with extra dependencies such as Ray and AWS utilities), please see our `Installation Guide <https://docs.daft.ai/en/stable/install/>`_.
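For example, the optional extras declared by the package can be installed with pip's extras syntax (``ray`` and ``aws`` are two of the extras this package provides; versions are resolved by pip):

```shell
# Base install (requires Python 3.10+)
pip install daft

# With optional extras, e.g. Ray for distributed execution and AWS support
pip install "daft[ray,aws]"
```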

Quickstart
^^^^^^^^^^

Get started in minutes with our `Quickstart <https://docs.daft.ai/en/stable/quickstart/>`_: load a real-world e-commerce dataset, process product images, and run AI inference at scale.


More Resources
^^^^^^^^^^^^^^

* `Examples <https://docs.daft.ai/en/stable/examples/>`_ - see Daft in action with use cases across text, images, audio, and more
* `User Guide <https://docs.daft.ai/en/stable/>`_ - take a deep dive into each topic within Daft
* `API Reference <https://docs.daft.ai/en/stable/api/>`_ - API reference for public classes/functions of Daft

Benchmarks
----------

|Benchmark Image|

To see the full benchmarks, detailed setup, and logs, check out our `benchmarking page <https://docs.daft.ai/en/stable/benchmarks>`_.

Contributing
------------

We ❤️ developers! To start contributing to Daft, please read `CONTRIBUTING.md <https://github.com/Eventual-Inc/Daft/blob/main/CONTRIBUTING.md>`_. This document describes the development lifecycle and toolchain for working on Daft. It also details how to add new functionality to the core engine and expose it through a Python API.

Here's a list of `good first issues <https://github.com/Eventual-Inc/Daft/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_ to get yourself warmed up with Daft. Comment in the issue to pick it up, and feel free to ask any questions!

Telemetry
---------

To help improve Daft, we collect non-identifiable data via Scarf (https://scarf.sh).

To disable this behavior, set the environment variable ``DO_NOT_TRACK=true``.

The data that we collect is:

1. **Non-identifiable:** Events are keyed by a session ID that is generated when Daft is imported
2. **Metadata-only:** We do not collect any of our users’ proprietary code or data
3. **For development only:** We do not buy or sell any user data

Please see our `documentation <https://docs.daft.ai/en/stable/resources/telemetry/>`_ for more details.

.. image:: https://static.scarf.sh/a.png?x-pxid=31f8d5ba-7e09-4d75-8895-5252bbf06cf6

Related Projects
----------------

+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| Engine                                            | Query Optimizer | Multimodal    | Distributed | Arrow Backed    | Vectorized Execution Engine | Out-of-core |
+===================================================+=================+===============+=============+=================+=============================+=============+
| Daft                                              | Yes             | Yes           | Yes         | Yes             | Yes                         | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `Pandas <https://github.com/pandas-dev/pandas>`_  | No              | Python object | No          | Optional >= 2.0 | Some(NumPy)                 | No          |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `Polars <https://github.com/pola-rs/polars>`_     | Yes             | Python object | No          | Yes             | Yes                         | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `Modin <https://github.com/modin-project/modin>`_ | Yes             | Python object | Yes         | No              | Some(Pandas)                | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `Ray Data <https://github.com/ray-project/ray>`_  | No              | Yes           | Yes         | Yes             | Some(PyArrow)               | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `PySpark <https://github.com/apache/spark>`_      | Yes             | No            | Yes         | Pandas UDF/IO   | Pandas UDF                  | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+
| `Dask DF <https://github.com/dask/dask>`_         | No              | Python object | Yes         | No              | Some(Pandas)                | Yes         |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+

License
-------

Daft is available under the Apache 2.0 license; please see the LICENSE file.


.. |Benchmark Image| image:: https://raw.githubusercontent.com/Eventual-Inc/Daft/refs/heads/main/assets/benchmark.png
   :alt: AI Benchmarks

.. |Banner| image:: https://daft.ai/images/diagram.png
   :target: https://www.daft.ai
   :alt: Daft dataframes can load any data such as PDF documents, images, protobufs, csv, parquet and audio files into a table dataframe structure for easy querying

.. |CI| image:: https://github.com/Eventual-Inc/Daft/actions/workflows/pr-test-suite.yml/badge.svg
   :target: https://github.com/Eventual-Inc/Daft/actions/workflows/pr-test-suite.yml?query=branch:main
   :alt: GitHub Actions tests

.. |PyPI| image:: https://img.shields.io/pypi/v/daft.svg?label=pip&logo=PyPI&logoColor=white
   :target: https://pypi.org/project/daft
   :alt: PyPI

.. |Latest Tag| image:: https://img.shields.io/github/v/tag/Eventual-Inc/Daft?label=latest&logo=GitHub
   :target: https://github.com/Eventual-Inc/Daft/tags
   :alt: latest tag

.. |Coverage| image:: https://codecov.io/gh/Eventual-Inc/Daft/branch/main/graph/badge.svg?token=J430QVFE89
   :target: https://codecov.io/gh/Eventual-Inc/Daft
   :alt: Coverage

.. |Slack| image:: https://img.shields.io/badge/slack-@distdata-purple.svg?logo=slack
   :target: https://join.slack.com/t/dist-data/shared_invite/zt-3rh9jr9iv-tmmTNOlQpfvhEy2NTMWS_w
   :alt: slack community

.. |TrendShift| image:: https://trendshift.io/api/badge/repositories/8239
   :target: https://trendshift.io/repositories/8239
   :alt: Eventual-Inc/Daft | Trendshift
   :width: 250px
   :height: 55px

