Servicescrapper

flowtask.components.ServiceScrapper

ServiceScrapper

ServiceScrapper(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, SeleniumService, HTTPService

Service Scraper Component

Overview:

Pluggable component for scraping several services and sites using different scrapers.

.. table:: Properties
   :widths: auto

   +------------------+----------+------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                  |
   +------------------+----------+------------------------------------------------------------------------------+
   | url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +------------------+----------+------------------------------------------------------------------------------+

Return:

- DataFrame with company information
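
The default `wait_for` value is a `(kind, value)` tuple. As a rough illustration of how such a tuple could be turned into a locator, here is a minimal sketch. The helper `resolve_wait_for` and the short keys in `LOCATOR_STRATEGIES` are hypothetical and not part of flowtask; only the strategy strings on the right-hand side correspond to Selenium's `By` constants.

```python
# Hypothetical helper, NOT part of flowtask: it only illustrates one way a
# (kind, value) tuple such as ('class', 'company-overview') could be mapped
# to a Selenium-style (strategy, value) locator.

# The values match the strings behind selenium.webdriver.common.by.By
# (By.ID, By.NAME, By.CLASS_NAME, By.TAG_NAME, By.CSS_SELECTOR, By.XPATH).
# The short keys on the left are assumptions made for this sketch.
LOCATOR_STRATEGIES = {
    "id": "id",
    "name": "name",
    "class": "class name",
    "tag": "tag name",
    "css": "css selector",
    "xpath": "xpath",
}

# Default taken from the properties table above.
DEFAULT_WAIT_FOR = ("class", "company-overview")


def resolve_wait_for(wait_for=DEFAULT_WAIT_FOR):
    """Translate a (kind, value) tuple into a (strategy, value) locator."""
    kind, value = wait_for
    try:
        strategy = LOCATOR_STRATEGIES[kind]
    except KeyError:
        raise ValueError(f"Unsupported wait_for kind: {kind!r}") from None
    return strategy, value


locator = resolve_wait_for()
```

With Selenium installed, a locator of this shape could be passed to an explicit wait such as `expected_conditions.presence_of_element_located(locator)`; whether the component does exactly this is not stated in the documentation.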

close async

close()

Clean up resources.

run async

run()

Execute scraping for the requested URLs in the DataFrame.

start async

start(**kwargs)

Initialize the component and validate required parameters.
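
The three coroutines above suggest a start, run, close lifecycle. The sketch below shows one way a caller might drive it so that `close()` always runs. `FakeScraperComponent` and `execute` are hypothetical stand-ins written for this example; they do not use flowtask, and the real component is documented to return a DataFrame rather than a list.

```python
import asyncio


class FakeScraperComponent:
    """Hypothetical stand-in exposing the same start/run/close coroutines."""

    def __init__(self, url_column="search_url"):
        self.url_column = url_column
        self.calls = []  # records the order in which the coroutines ran

    async def start(self, **kwargs):
        # Validate required parameters before doing any work.
        if not self.url_column:
            raise ValueError("url_column is required")
        self.calls.append("start")

    async def run(self):
        self.calls.append("run")
        return []  # placeholder result for this sketch

    async def close(self):
        self.calls.append("close")


async def execute(component):
    """Drive the lifecycle, releasing resources even if run() raises."""
    await component.start()
    try:
        return await component.run()
    finally:
        await component.close()


component = FakeScraperComponent()
result = asyncio.run(execute(component))
```

Keeping `start()` outside the `try` block means a failed validation does not trigger `close()`; moving it inside is a reasonable alternative if `start()` may acquire resources before failing.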

parsers

base

ScrapperBase
ScrapperBase(*args, **kwargs)

Bases: SeleniumService, HTTPService

ScrapperBase Model.

Defines how scrapers should work.

connect abstractmethod async
connect()

Creates the Driver and Connects to the Site.

disconnect abstractmethod async
disconnect()

Disconnects the Driver and closes the Connection.

start async
start()

Starts the navigation to the main site.
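
Because `connect` and `disconnect` are marked as abstract, concrete scrapers have to implement both. A minimal sketch of that contract using Python's `abc` module follows. `BaseScraperSketch` and `DemoScraper` are hypothetical names; the real base class also inherits from SeleniumService and HTTPService, which this example leaves out.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseScraperSketch(ABC):
    """Hypothetical stand-in for a base class with abstract coroutines."""

    def __init__(self):
        self.connected = False
        self.started = False

    @abstractmethod
    async def connect(self):
        """Create the driver and connect to the site."""

    @abstractmethod
    async def disconnect(self):
        """Disconnect the driver and close the connection."""

    async def start(self):
        # Assumption: the default start() only marks navigation as begun.
        self.started = True


class DemoScraper(BaseScraperSketch):
    """Concrete subclass that satisfies both abstract methods."""

    async def connect(self):
        self.connected = True

    async def disconnect(self):
        self.connected = False


async def demo():
    scraper = DemoScraper()
    await scraper.connect()
    await scraper.start()
    state_while_open = (scraper.connected, scraper.started)
    await scraper.disconnect()
    return state_while_open, scraper.connected


outcome = asyncio.run(demo())
```

Instantiating `BaseScraperSketch` directly raises `TypeError`, which is how the `abc` module enforces that subclasses provide both methods.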

costco

CostcoScrapper
CostcoScrapper(*args, **kwargs)

Bases: ScrapperBase

connect async
connect()

Creates the Driver and Connects to the Site.

disconnect async
disconnect()

Disconnects the Driver and closes the Connection.

product_information async
product_information(response, idx, row)

Get the product information from Costco.

special_events async
special_events(response, idx, row)

Get the special events from Costco.
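
Both parser methods share the `(response, idx, row)` signature, which hints that they are invoked once per input row. The following sketch shows one plausible way to dispatch to such a method by name. Everything in it is an assumption made for illustration: `DemoParser`, `scrape_rows`, `fake_fetch`, the shape of `response` (a plain string here), and the returned tuple are not taken from flowtask.

```python
import asyncio


class DemoParser:
    """Hypothetical parser; only the method signature mirrors the docs."""

    async def product_information(self, response, idx, row):
        # Assumption: return the row index together with extracted fields.
        return idx, {"url": row["search_url"], "length": len(response)}


async def fake_fetch(url):
    """Stand-in for an HTTP or Selenium fetch; returns a fake page body."""
    return "<html>" + url


async def scrape_rows(parser, function_name, rows, fetch):
    """Call the named parser coroutine once per row, in order."""
    handler = getattr(parser, function_name)
    results = []
    for idx, row in enumerate(rows):
        response = await fetch(row["search_url"])
        results.append(await handler(response, idx, row))
    return results


rows = [{"search_url": "a"}, {"search_url": "bb"}]
results = asyncio.run(
    scrape_rows(DemoParser(), "product_information", rows, fake_fetch)
)
```

Looking the handler up with `getattr` keeps the loop independent of which parser method is requested, so a method such as `special_events` could be selected the same way.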

scrapper

ServiceScrapper

ServiceScrapper(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, SeleniumService, HTTPService

Service Scraper Component

Overview:

Pluggable component for scraping several services and sites using different scrapers.

.. table:: Properties
   :widths: auto

   +------------------+----------+------------------------------------------------------------------------------+
   | Name             | Required | Description                                                                  |
   +------------------+----------+------------------------------------------------------------------------------+
   | url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +------------------+----------+------------------------------------------------------------------------------+

Return:

- DataFrame with company information

close async
close()

Clean up resources.

run async
run()

Execute scraping for the requested URLs in the DataFrame.

start async
start(**kwargs)

Initialize the component and validate required parameters.