Metadata-Version: 1.1
Name: mwcites
Version: 0.0.5
Summary: A collection of scripts and utilities for extracting citations to academic literature from Wikipedia's XML database dumps.
Home-page: https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia
Author: Aaron Halfaker
Author-email: ahalfaker@wikimedia.org
License: MIT
Description: Extract academic citaitons from Wikipedia
        =========================================
        This project contains a utility for extracting academic citation identifiers.
        
        ``pip install mwcites``
        
        Usage
        -----
        There's really only one utility in this package called ``extract_cites``.
        
        ::
        
            $ extract_cites enwiki-20150112-pages-meta-history*.xml*.bz2 > citations.tsv
        
        
        Documentation
        -------------
        Documentation is provided ``$ mwcites extract -h``.
        
        ::
        
            Extracts academic citations from articles from the history of Wikipedia
            articles by processing a pages-meta-history XML dump and matching regular
            expressions to revision content.
        
            Currently supported identifies include:
        
             * PubMed
             * DOI
             
            Outputs a TSV file with the following fields:
        
             * page_id: The identifier of the Wikipedia article (int), e.g. 1325125
             * page_title: The title of the Wikipedia article (utf-8), e.g. Club cell
             * rev_id: The Wikipedia revision where the citation was first added (int),
                       e.g. 282470030
             * timestamp: The timestamp of the revision where the citation was first added.
                          (ISO 8601 datetime), e.g. 2009-04-08T01:52:20Z
             * type: The type of identifier, e.g. pmid
             * id: The id of the cited scholarly article (utf-8),
                   e.g 10.1183/09031936.00213411
        
            Usage:
                mwcites extract -h | --help
                mwcites extract <dump_file>...
        
            Options:
                -h --help        Shows this documentation
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering
