Metadata-Version: 1.1
Name: shaman
Version: 0.0.6.dev1
Summary: Multiprocessing application to download and analyze the content of HTML pages.
Home-page: https://github.com/Landish145/shaman
Author: eugtsa,azraev
Author-email: eugtsa@gmail.com,azraev@gmail.com
License: MIT
Description: This is the documentation for Shaman, a multiprocessing application that combines separate single-purpose handlers and applies them to one message.
        
        The initial purpose was to create a tool that:
            - makes it possible to download and analyze the content of HTML pages.
            - is simple to extend with new functionality.
            - is scalable (multiprocessing).
        Actual usage may differ from this. Some other ideas:
            - scanning a mongo collection and parsing documents in parallel
            - parsing a lot of lines from multiple huge files, saving the results to any database (depending on the results)
        
        There are three parts in the shaman library:
            * stages (the actual processors, each implementing one piece of functionality)
            * consumer (a worker that runs them all in a particular order)
            * daemon (runs as many workers as needed; also used as a CLI tool)
        
        All stages run in a particular order and share the same message object (within one worker).
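        
        The relationship between stages, a worker and the message can be sketched in plain Python.
        This is a minimal illustration under assumptions, not shaman's actual API: the names
        strip_url, guess_scheme and run_stages are hypothetical, and the message is assumed
        to be a plain dict.
        
        ```python
        # Hypothetical sketch: none of these names come from the shaman package.
        
        def strip_url(message):
            # First stage: clean the raw input line and store it as a URL field.
            message["url"] = message["raw"].strip()
        
        def guess_scheme(message):
            # Second stage: uses the field written by the previous stage.
            message["scheme"] = message["url"].split("://", 1)[0]
        
        def run_stages(stages, message):
            # A worker applies every stage, in order, to the same message object.
            for stage in stages:
                stage(message)
            return message
        
        result = run_stages([strip_url, guess_scheme], {"raw": " http://google.ru\n"})
        print(result["url"], result["scheme"])
        ```
        
        Because every stage receives the same object, a later stage can rely on fields
        written by an earlier one, which is why the order of stages matters.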
        
        INSTALLATION:
            pip install shaman
        
        If everything is ok, you should be able to run:
            shaman --help
        
        It should display:
        
            usage: shaman [-h] [-i | -d] -c CONFIGURATION [--drop_first DROP_FIRST]
                          [-p PRINT_FIELDS [PRINT_FIELDS ...]]
                          [-r REMOVE_FIELDS [REMOVE_FIELDS ...]]
                          [--ignore_after IGNORE_AFTER]
                          [{stop,start,restart,} [{stop,start,restart,} ...]]
        
            Main shaman module. Use it to start|stop|restart daemon or start non-daemon
            modes of shaman
        
            positional arguments:
             {stop,start,restart,}
                                     Command to daemon (default: )
        
            optional arguments:
             -h, --help            show this help message and exit
             -i                    Use stdin input as main input (default: False)
             -d                    Daemonize main process (default: False)
             -c CONFIGURATION      Path to configuration file (default: None)
             --drop_first DROP_FIRST
                                   drop first lines (default: 0)
             -p PRINT_FIELDS [PRINT_FIELDS ...], --print_fields PRINT_FIELDS [PRINT_FIELDS ...]
             -r REMOVE_FIELDS [REMOVE_FIELDS ...], --remove_fields REMOVE_FIELDS [REMOVE_FIELDS ...]
             --ignore_after IGNORE_AFTER
        
        CONFIGURATION:
        
        You may find an example configuration file in <path_to_python_lib>/site-packages/shaman/etc/crawler.config
        It includes 4 stages:
        
            reading from stdin
            downloading page
            detecting charset
            printing the URL and charset
        
        By default, all stages reside in the <path_to_python_lib>/site-packages/shaman/src/analyzers/ folder.
        You may create a custom stage and put it into a custom folder.
        The folder is set by a parameter in the configuration file:
        
            custom_stage_dir = <custom_folder>
        
        If you put some stages into this folder, shaman will also "see" them.
        
        To check that everything is working, run:
        
            echo "http://google.ru" | shaman -c <path_to_config> -i
        
        More information about the package: `here
        <http://shaman.readthedocs.io/en/latest/>`_.
        Github: `<https://github.com/Landish145/shaman>`_.
        
Keywords: crawlers analyze development
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
