Metadata-Version: 1.1
Name: sumy
Version: 0.0.1
Summary: Module for automatic text summarization of HTML documents.
Home-page: https://github.com/miso-belica/sumy
Author: Michal Belica
Author-email: miso.belica@gmail.com
License: Copyright 2013 Michal Belica

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Description: =========================
        Automatic text summarizer
        =========================
        .. image:: https://api.travis-ci.org/miso-belica/sumy.png?branch=master
           :target: https://travis-ci.org/miso-belica/sumy
        
        Here are some other summarizers:
        
        - `TextTeaser <https://github.com/MojoJolo/textteaser>`_ - Scala
        - https://github.com/thavelick/summarize/ - Python, TF (very simple)
        - `Sumtract: Second project for UW LING 572 <https://github.com/stefanbehr/sumtract>`_ - Python
        - `Simple program that summarize text <https://github.com/xhresko/text-summarizer>`_ - Python, TF without normalization
        - `Open Text Summarizer <http://libots.sourceforge.net/>`_ - C, TF without normalization
        - `Intro to Computational Linguistics <https://github.com/kylehardgrave/summarizer>`_ - Java, LexRank
        - `Automatic Document Summarizer <https://github.com/himanshujindal/Automatic-Text-Summarizer>`_ - Java, Bipartite HITS (no sources)
        - `Pythia <https://github.com/giorgosera/pythia/blob/dev/analysis/summarization/summarization.py>`_ - Python, LexRank & Centroid
        - `Reduction <https://github.com/adamfabish/Reduction>`_ - Python, some graph method
        - `SWING <https://github.com/WING-NUS/SWING>`_ - Ruby
        - `Topic Networks <https://github.com/bobflagg/Topic-Networks>`_ - R, topic models & bipartite graphs
        - `Almus: Automatic Text Summarizer <http://textmining.zcu.cz/?lang=en&section=download>`_ - Java, LSA (without source code)
        - `Musutelsa <http://www.musutelsa.jamstudio.eu/>`_ - Java, LSA (always freezes)
        - http://mff.bajecni.cz/index.php - C++
        - `MEAD <http://www.summarization.com/mead/>`_ - Perl, various methods + evaluation framework
        
        
        Installation
        ------------
        Currently sumy can be installed only from the git repository (make sure you have
        `Python installed <https://python-guide.readthedocs.org/en/latest/#getting-started>`_):
        
        .. code-block:: bash
        
            $ wget https://github.com/miso-belica/sumy/archive/master.zip # download the sources
            $ unzip master.zip # extract the downloaded file
            $ cd sumy-master/
            $ [sudo] python setup.py install # install the package
        
        Or simply run:
        
        .. code-block:: bash
        
            $ [sudo] pip install git+https://github.com/miso-belica/sumy.git
        
        
        Usage
        -----
        Sumy contains a command-line utility for quick summarization of documents.
        
        .. code-block:: bash
        
            $ sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
            $ sumy luhn --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
            $ sumy edmundson --language=czech --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
            $ sumy --help # for more info
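        
        The ``--length`` values above are written either as a plain number (``10``) or as a
        percentage (``3%``). As a rough illustration only, and not sumy's actual implementation,
        such a value could be turned into a number of sentences as sketched below. The helper
        name ``resolve_length``, the reading of a plain number as a sentence count, and the
        rounding rule are all assumptions made for this example.
        
        .. code-block:: python
        
            def resolve_length(length, total_sentences):
                """Return how many sentences a summary should contain.
        
                ``length`` is an absolute count (``10`` or ``"10"``) or a
                percentage of the document (``"3%"``). Hypothetical helper.
                """
                text = str(length).strip()
                if text.endswith("%"):
                    percentage = int(text[:-1])
                    # Assumption: round down, but keep at least one sentence.
                    count = max(1, total_sentences * percentage // 100)
                else:
                    count = int(text)
                # Never ask for more sentences than the document has.
                return min(count, total_sentences)
        
        
            if __name__ == "__main__":
                print(resolve_length("10", 200))
                print(resolve_length("3%", 200))
        
        Rounding down while keeping a minimum of one sentence means that a very short
        document still produces a non-empty summary.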
        
        The evaluation methods for a given summarization method can be run with the
        commands below:
        
        .. code-block:: bash
        
            $ sumy_eval lex-rank reference_summary.txt --url=http://en.wikipedia.org/wiki/Automatic_summarization
            $ sumy_eval lsa reference_summary.txt --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
            $ sumy_eval edmundson reference_summary.txt --language=czech --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
            $ sumy_eval --help # for more info
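        
        Which metrics ``sumy_eval`` reports is not described here. As a general illustration
        of comparing a generated summary with a reference summary, the sketch below computes
        sentence-level precision and recall. The function name ``precision_recall`` and the
        choice of exact sentence matching are assumptions for this example, not sumy's API.
        
        .. code-block:: python
        
            from __future__ import division
        
        
            def precision_recall(evaluated_sentences, reference_sentences):
                """Return (precision, recall) of a summary against a reference.
        
                Sentences are compared by exact equality. Hypothetical helper.
                """
                evaluated = set(evaluated_sentences)
                reference = set(reference_sentences)
                if not evaluated or not reference:
                    raise ValueError("Both summaries must contain at least one sentence.")
        
                # Sentences that appear in both summaries.
                common = len(evaluated & reference)
                precision = common / len(evaluated)
                recall = common / len(reference)
                return precision, recall
        
        
            if __name__ == "__main__":
                summary = ["First sentence.", "Second sentence.", "Third sentence."]
                reference = ["First sentence.", "Another sentence."]
                print(precision_recall(summary, reference))
        
        Precision is the share of generated sentences that also occur in the reference, and
        recall is the share of reference sentences that the generated summary recovered.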
        
        
        Python API
        ----------
        You can also use sumy as a library in your project.
        
        .. code-block:: python
        
            # -*- coding: utf8 -*-
        
            from __future__ import absolute_import
            from __future__ import division, print_function, unicode_literals
        
            from sumy.parsers.html import HtmlParser
            from sumy.nlp.tokenizers import Tokenizer
            from sumy.summarizers.lsa import LsaSummarizer
            from sumy.nlp.stemmers.czech import stem_word
            from sumy.utils import get_stop_words
        
        
            if __name__ == "__main__":
                url = "http://www.zsstritezuct.estranky.cz/clanky/predmety/cteni/jak-naucit-dite-spravne-cist.html"
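                # Fetch the page; the Czech tokenizer splits its text into sentences and words.
                parser = HtmlParser.from_url(url, Tokenizer("czech"))
        
                # Build the LSA summarizer with the Czech stemmer and stop words.
                summarizer = LsaSummarizer(stem_word)
                summarizer.stop_words = get_stop_words("czech")
        
                # Print a summary of the document that is 20 sentences long.
                for sentence in summarizer(parser.document, 20):
                    print(sentence)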
        
        
        Tests
        -----
        Run the tests with:
        
        .. code-block:: bash
        
            $ nosetests-2.6 && nosetests-3.2 && nosetests-2.7 && nosetests-3.3
        
        
        .. :changelog:
        
        Changelog
        =========
        
        0.1.0 (2013-MM-DD)
        ------------------
        - First public release.
        
Keywords: data mining,text summarization,data reduction,web-data extraction,NLP,natural language processing,latent semantic analysis,LSA,singular value decomposition,SVD
Platform: UNKNOWN
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Natural Language :: Czech
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Education
Classifier: Topic :: Internet
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: Implementation :: CPython
