Scraper

flowtask.components.ProductInfo.scraper

ProductInfo

ProductInfo(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, HTTPService, SeleniumService

Product Information Scraper Component

This component extracts detailed product information by:

1. Searching for products using search terms
2. Extracting model codes from URLs
3. Parsing product details from manufacturer websites

Configuration options:

- search_column: Column name containing search terms (default: 'model')
- parsers: List of parser names to use (default: ['epson'])
- max_results: Maximum number of search results to process (default: 5)
- concurrently: Process items concurrently (default: True)
- task_parts: Number of parts to split concurrent tasks (default: 10)
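For reference, the documented defaults can be written out as a plain mapping. This is a minimal sketch: the option names and default values come from the list above, but `DEFAULT_OPTIONS` and `resolve_options` are hypothetical names used only for illustration and are not part of flowtask.

```python
# Documented ProductInfo defaults, collected in one place.
# DEFAULT_OPTIONS and resolve_options are hypothetical helpers, not flowtask API.
DEFAULT_OPTIONS = {
    "search_column": "model",   # column holding the search terms
    "parsers": ["epson"],       # parser names to apply
    "max_results": 5,           # search results processed per term
    "concurrently": True,       # process items concurrently
    "task_parts": 10,           # number of parts for concurrent tasks
}


def resolve_options(**overrides):
    """Merge user overrides onto the documented defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_OPTIONS, **overrides}


# Example: search on a different column and look at fewer results.
options = resolve_options(search_column="sku", max_results=3)
```

Options that are not overridden keep the defaults listed above.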

close async

close()

Clean up resources.

run async

run()

Execute product info extraction for each row.

split_parts

split_parts(tasks, num_parts=5)

Split tasks into parts for concurrent processing.
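One way such a split could work is shown below. This is a sketch only: the source gives the signature and a one-line description, not the algorithm, so the even-as-possible chunking and the name `split_parts_sketch` are assumptions rather than the component's actual implementation.

```python
def split_parts_sketch(tasks, num_parts=5):
    """Split tasks into at most num_parts contiguous, near-equal chunks.

    Assumed behaviour (not confirmed by the source): earlier chunks receive
    one extra item when the tasks do not divide evenly, and empty chunks
    are dropped.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    tasks = list(tasks)
    size, extra = divmod(len(tasks), num_parts)
    parts = []
    start = 0
    for index in range(num_parts):
        # The first `extra` chunks take one additional item.
        end = start + size + (1 if index < extra else 0)
        if end > start:
            parts.append(tasks[start:end])
        start = end
    return parts


chunks = split_parts_sketch(range(7), num_parts=3)
```

Each chunk can then be handed to a separate batch of concurrent work, which keeps the number of simultaneous requests bounded.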

start async

start(**kwargs)

Initialize component and validate requirements.
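The three coroutines suggest a start, run, close lifecycle. The sketch below illustrates that calling order with a stand-in class; `StubComponent` and `run_component` are hypothetical, and the real component's return values and the way flowtask itself drives these methods are not documented here.

```python
import asyncio


class StubComponent:
    """Stand-in exposing the same coroutine names as the component.

    Not the real ProductInfo: it only records the order of calls.
    """

    def __init__(self):
        self.calls = []

    async def start(self, **kwargs):
        self.calls.append("start")

    async def run(self):
        self.calls.append("run")
        return {"rows": 0}  # placeholder result, assumed for illustration

    async def close(self):
        self.calls.append("close")


async def run_component(component):
    """Start the component, run it, and always release its resources."""
    try:
        await component.start()
        return await component.run()
    finally:
        # close() runs even if start() or run() raises.
        await component.close()


component = StubComponent()
result = asyncio.run(run_component(component))
```

Wrapping the calls in `try`/`finally` matters for a component that holds HTTP sessions or a browser driver, since those should be released even when a run fails.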