Metadata-Version: 2.4
Name: pyconverters-newsml
Version: 0.6.5
Summary: NewsML converter (AFP news)
Project-URL: Homepage, https://github.com/oterrier/pyconverters_newsml/
Author-email: Olivier Terrier <olivier.terrier@kairntech.com>
License: MIT
License-File: AUTHORS.md
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: beautifulsoup4
Requires-Dist: inscriptis>=2.3.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pymultirole-plugins<0.7.0,>=0.6.0
Provides-Extra: dev
Requires-Dist: bump2version; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Provides-Extra: docs
Requires-Dist: jupyter-sphinx; extra == 'docs'
Requires-Dist: lxml-html-clean; extra == 'docs'
Requires-Dist: m2r2; extra == 'docs'
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-rtd-theme; extra == 'docs'
Requires-Dist: sphinxcontrib-apidoc; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest>=7.0.0; extra == 'test'
Requires-Dist: requests-cache; extra == 'test'
Requires-Dist: ruff; extra == 'test'
Requires-Dist: tqdm; extra == 'test'
Description-Content-Type: text/markdown

# pyconverters_newsml

[![license](https://img.shields.io/github/license/oterrier/pyconverters_newsml)](https://github.com/oterrier/pyconverters_newsml/blob/master/LICENSE)
[![tests](https://github.com/oterrier/pyconverters_newsml/workflows/tests/badge.svg)](https://github.com/oterrier/pyconverters_newsml/actions?query=workflow%3Atests)
[![codecov](https://img.shields.io/codecov/c/github/oterrier/pyconverters_newsml)](https://codecov.io/gh/oterrier/pyconverters_newsml)
[![docs](https://img.shields.io/readthedocs/pyconverters_newsml)](https://pyconverters_newsml.readthedocs.io)
[![version](https://img.shields.io/pypi/v/pyconverters_newsml)](https://pypi.org/project/pyconverters_newsml/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyconverters_newsml)](https://pypi.org/project/pyconverters_newsml/)

Convert AFP NewsML-G2 XML feeds into `pymultirole` `Document` objects.

Supports text articles, picture captions, video descriptions, and graphic items.
Extracts IPTC media topics, AFP-specific subjects (persons, organisations, locations),
keywords, urgency, genre, language, and other metadata.

## Installation

```
pip install pyconverters_newsml
```

## Usage

The converter is registered as a `pyconverters.plugins` entry point under the name `newsml`
and integrates automatically with the `pymultirole` plugin system.

### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `subjects_as_metadata` | `str` | `""` | Comma-separated subject types to extract as metadata: `medtop`, `afpperson`, `afporganization`, `afplocation`, or `all`. |
| `subjects_code` | `bool` | `False` | When `True`, metadata values are `"code:name"` strings; when `False`, only the name is stored. |
| `mediatopics_as_categories` | `bool` | `False` | When `True`, IPTC media-topic codes are added as hierarchical `Category` objects. |
| `keywords_as_categories` | `bool` | `False` | When `True`, AFP slug keywords are added as `Category` objects. |
| `natures` | `str` | `"text"` | Comma-separated list of item natures to include: `text`, `video`, `picture`, `graphic`. |
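
The string-valued parameters are comma-separated lists, and `subjects_code` switches between two value formats. The following minimal sketch shows one way such values could be interpreted. The helpers `split_option` and `format_subject` and the sample code `X001` are assumptions made for illustration; they are not the converter's actual implementation, and the special value `all` is not modelled here.

```python
def split_option(value: str) -> list[str]:
    """Split a comma-separated option string, dropping blanks and whitespace.

    Hypothetical helper; the converter's own parsing may differ.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


def format_subject(code: str, name: str, subjects_code: bool = False) -> str:
    """Return "code:name" when subjects_code is True, otherwise just the name."""
    return f"{code}:{name}" if subjects_code else name


# Example option values as they might be passed to the converter.
natures = split_option("text, picture")
subject_types = split_option("medtop,afpperson")

print(natures)
print(subject_types)
# "X001" is a made-up placeholder, not a real IPTC or AFP code.
print(format_subject("X001", "Example", subjects_code=True))
print(format_subject("X001", "Example", subjects_code=False))
```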

## Developing

### Prerequisites

You will need [uv](https://github.com/astral-sh/uv) and Python 3.12 or later.

Clone the repository:

```
git clone https://github.com/oterrier/pyconverters_newsml
cd pyconverters_newsml
```

Install dependencies (including test extras):

```
uv sync --extra test
```

### Running the test suite

```
uv run pytest
```

### Linting

```
uv run ruff check .
uv run ruff format --check .
```

### Building the documentation

```
uv run --extra docs sphinx-build docs docs/_build
```

Once the build finishes, open `docs/_build/index.html` to view the documentation.
