Scrapper

flowtask.components.ServiceScrapper.scrapper

ServiceScrapper

ServiceScrapper(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, SeleniumService, HTTPService

Service Scraper Component

Overview:

Pluggable component for scraping several services and sites using different scrapers.

.. table:: Properties
   :widths: auto

   +------------------+----------+------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                  |
   +==================+==========+==============================================================================+
   | url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +------------------+----------+------------------------------------------------------------------------------+

Return:

- DataFrame with company information

close async

close()

Clean up resources.

run async

run()

Scrape each requested URL in the input DataFrame.

start async

start(**kwargs)

Initialize the component and validate required parameters.
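
The three coroutines above suggest a start, run, close lifecycle. The sketch below shows that call order with a stand-in class; ``DummyScraper``, ``scrape`` and ``demo`` are hypothetical names invented for this example, and how the framework actually drives the real component is not described on this page.

```python
import asyncio


class DummyScraper:
    """Stand-in exposing the same start/run/close names; not the real component."""

    def __init__(self, urls):
        self.urls = urls
        self.calls = []  # records the order in which the methods were awaited

    async def start(self, **kwargs):
        # Validate required input before any scraping happens.
        if not self.urls:
            raise ValueError("no URLs to scrape")
        self.calls.append("start")

    async def run(self):
        # A real scraper would fetch each page; here we only echo the URLs.
        self.calls.append("run")
        return [{"url": url} for url in self.urls]

    async def close(self):
        self.calls.append("close")


async def scrape(component):
    """Await start and run in order, and always await close afterwards."""
    try:
        await component.start()
        return await component.run()
    finally:
        # Runs even if start() or run() raised, so resources are released.
        await component.close()


def demo():
    component = DummyScraper(["https://example.com/a"])
    rows = asyncio.run(scrape(component))
    return component.calls, rows
```

Wrapping the calls in ``try``/``finally`` is one way to make sure cleanup still happens when validation or scraping fails.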