Metadata-Version: 2.1
Name: sockit
Version: 0.0.2
Summary: Assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles
Home-page: https://github.com/ripl-org/sockit
Author: Mark Howison
Author-email: mhowison@ripl.org
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: Free for non-commercial use
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Human Machine Interfaces
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Provides: sockit
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt

*Sockit* is a self-contained toolkit for assigning probabilistic Standard Occupational Classification (SOC) codes to free-text job titles. It is developed by [Research Improving People's Lives (RIPL)](https://www.ripl.org).

## Installation

Requires Python 3.8 or later.

To install from PyPI using **pip**:

    pip install sockit

To install a **development version** from the current directory:

    pip install -e .

## Running

A single command-line script, `sockit`, is included. It processes existing CSV files that contain free-text job titles in one of their columns:

    sockit -h

    usage: sockit [-h] [-v] [-q] [-d] -i INPUT [-o OUTPUT] [--record_id RECORD_ID] [--title TITLE]

    Sockit: assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles https://github.com/ripl-org/sockit

    optional arguments:
      -h, --help            show this help message and exit
      -v, --version         show program's version number and exit
      -q, --quiet           suppress all logging messages except for errors
      -d, --debug           show all logging messages, including debugging output
      -i INPUT, --input INPUT
                            input CSV file containing the record ID and title fields
      -o OUTPUT, --output OUTPUT
                            output file (default: stdout) containing a JSON record per line: {'record_id': ..., 'title': ..., 'clean_title': ..., 'socs': [{'soc': ..., 'prob': ..., 'desc': ...}, ...]}
      --record_id RECORD_ID
                            field name corresponding to the record ID [default: 1-based index]
      --title TITLE         field name corresponding to the title [default: 'title']
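
The help text above describes the output as one JSON record per line, with a `socs` list of candidate codes and probabilities. A minimal sketch of reading that output and picking the most probable code might look like the following. It assumes each line is valid JSON with the field names shown in the help text. The helper names `read_records` and `top_soc` are hypothetical, and the sample record is invented for illustration, not real Sockit output.

```python
import io
import json


def read_records(lines):
    """Parse one JSON record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_soc(record):
    """Return the highest-probability entry in record['socs'], or None if empty."""
    socs = record.get("socs", [])
    if not socs:
        return None
    return max(socs, key=lambda s: s["prob"])


# Invented sample line for illustration only; not real Sockit output.
sample = io.StringIO(
    '{"record_id": 1, "title": "Sr. Software Engineer", '
    '"clean_title": "senior software engineer", '
    '"socs": [{"soc": "15-1252", "prob": 0.7, "desc": "Software Developers"}, '
    '{"soc": "15-1251", "prob": 0.2, "desc": "Computer Programmers"}]}\n'
)

for record in read_records(sample):
    best = top_soc(record)
    print(record["record_id"], best["soc"], best["prob"])
```

In practice, the `io.StringIO` sample would be replaced by an open file handle on the path passed to `-o`.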

Alternatively, you can import the `sockit` package in a Python script and process titles one at a time with the `clean()` and `search()` functions:

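    import sockit

    # Example free-text job title; any string can be used here
    title = "Registered Nurse"
    clean_title = sockit.clean(title)    # clean the raw title first
    result = sockit.search(clean_title)  # then search for matching SOC codes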
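
To apply the same two calls to every row of a CSV file from Python, one option is a small loop over `csv.DictReader`. The sketch below is an assumption-laden illustration: `classify_rows` is a hypothetical helper, and `fake_clean` and `fake_search` are stand-ins that let the example run without Sockit installed. They do not reflect the real behavior or return types of `sockit.clean` and `sockit.search`.

```python
import csv
import io


def classify_rows(rows, clean, search, title_field="title"):
    """Yield (row, result) pairs, applying clean() then search() to each title."""
    for row in rows:
        yield row, search(clean(row[title_field]))


# Stand-ins so the sketch runs on its own; in real use, pass
# sockit.clean and sockit.search instead.
def fake_clean(title):
    return title.strip().lower()


def fake_search(clean_title):
    return {"clean_title": clean_title}


# Invented input data for illustration only.
data = io.StringIO("id,title\n1, Registered Nurse \n2,Truck Driver\n")

for row, result in classify_rows(csv.DictReader(data), fake_clean, fake_search):
    print(row["id"], result)
```

Passing the two functions as arguments keeps the loop independent of the package, which makes it easy to test with stand-ins before running it on real data.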

## License

Sockit is freely available for non-commercial use under the license provided in [LICENSE.txt](https://github.com/ripl-org/sockit/blob/main/LICENSE.txt).
Please contact [connect@ripl.org](mailto:connect@ripl.org) to inquire about commercial use.

## Contributors

* Marcelle Goggins
* Ethan Ho
* Nile Dixon
* Mark Howison
* Joe Long
* Karen Shen


