Metadata-Version: 2.1
Name: sitemap-generator
Version: 0.9.10
Summary: web crawler and sitemap generator.
Home-page: https://github.com/Haikson/sitemap-generator
Author: Kamo Petrosyan
Author-email: kamo@haikson.com
License: GPL3
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: NOTICE

pysitemap
=========

Sitemap generator

installing
----------

::

    pip install sitemap-generator

requirements
------------

::

    asyncio
    aiofile
    aiohttp

example
-------

::

    import sys
    import logging
    from pysitemap import crawler
    from pysitemap.parsers.lxml_parser import Parser

    if __name__ == '__main__':
        # Optional, Windows only: pass --iocp to run on the proactor (IOCP) event loop.
        if '--iocp' in sys.argv:
            from asyncio import events, windows_events
            sys.argv.remove('--iocp')
            logging.info('using iocp')
            el = windows_events.ProactorEventLoop()
            events.set_event_loop(el)

        # root_url = sys.argv[1]
        root_url = 'https://www.haikson.com'
        crawler(
            root_url, out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
            http_request_options={"ssl": False}, parser=Parser
        )

TODO
-----

-  Big sites with more than 100K pages will use more than 100 MB of
   memory. Move the queue and done lists into a database. Write Queue and
   Done backend classes based on:

   -  Lists
   -  SQLite database
   -  Redis

-  Write an API so users can extend the crawler with their own backends
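
One possible shape for the queue and done backends listed above is sketched
here. This is not part of pysitemap: the class names ``ListBackend`` and
``SQLiteBackend`` and the methods ``add``, ``pop``, ``mark_done`` and
``done_urls`` are hypothetical, chosen only for illustration. Both classes
expose the same methods, so a crawler could swap in-memory storage for
on-disk storage without other changes.

```python
import sqlite3
from collections import deque


class ListBackend:
    """Hypothetical in-memory backend: all state lives in Python containers."""

    def __init__(self):
        self._queue = deque()   # URLs waiting to be crawled
        self._seen = set()      # every URL ever added, to avoid duplicates
        self._done = []         # URLs that have been crawled

    def add(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def pop(self):
        # Return the next pending URL, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None

    def mark_done(self, url):
        self._done.append(url)

    def done_urls(self):
        return list(self._done)


class SQLiteBackend:
    """Hypothetical SQLite backend: state lives in one table, not in RAM."""

    # state: 0 = queued, 1 = handed out by pop(), 2 = done
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "url TEXT UNIQUE NOT NULL, "
            "state INTEGER NOT NULL DEFAULT 0)"
        )

    def add(self, url):
        # The UNIQUE constraint plus OR IGNORE skips URLs added before.
        self._conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        self._conn.commit()

    def pop(self):
        row = self._conn.execute(
            "SELECT id, url FROM urls WHERE state = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self._conn.execute("UPDATE urls SET state = 1 WHERE id = ?", (row[0],))
        self._conn.commit()
        return row[1]

    def mark_done(self, url):
        self._conn.execute("UPDATE urls SET state = 2 WHERE url = ?", (url,))
        self._conn.commit()

    def done_urls(self):
        rows = self._conn.execute("SELECT url FROM urls WHERE state = 2 ORDER BY id")
        return [r[0] for r in rows]


def drain(backend, start_urls):
    """Stand-in for a crawl loop: queue the URLs, then process each one once."""
    for url in start_urls:
        backend.add(url)
    while True:
        url = backend.pop()
        if url is None:
            break
        backend.mark_done(url)  # a real crawler would fetch and parse here
    return backend.done_urls()
```

Passing a file path to ``SQLiteBackend`` instead of the default
``":memory:"`` would keep the queue on disk, which is the point of the
TODO item for sites with very many pages. A Redis backend could follow the
same four methods.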

changelog
---------

