Parsers
flowtask.components.ProductInfo.parsers
base
ParserBase
Bases: HTTPService, SeleniumService
Base class for product information parsers.
Defines the interface and common functionality for all product parsers.
create_search_query
Create a search query for the given term.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `term` | `str` | Search term (typically product model) | required |
Returns:
| Type | Description |
|---|---|
| `str` | Formatted search query |
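The reference does not say what the formatted query looks like. As a rough illustration only, the sketch below assumes a site-restricted query; the standalone function, the `domain` parameter, and the `site:` format are all hypothetical and not taken from the library.

```python
def create_search_query(term: str, domain: str = "example.com") -> str:
    """Build a site-restricted search query for a product term.

    `domain` is a hypothetical parameter used only in this sketch;
    the documented method takes `term` alone.
    """
    cleaned = " ".join(term.split())  # collapse stray whitespace
    return f"site:{domain} {cleaned}"


query = create_search_query("  HL-L2350DW ")
print(query)
```

A subclass such as a region-aware parser could override this method to swap in a different domain per region.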
extract_model_code
Extract model code from URL using the regex pattern if defined.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | URL to extract model code from | required |
Returns:
| Type | Description |
|---|---|
| `Optional[str]` | Extracted model code or None if not found or pattern not defined |
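The behaviour described above (apply a regex if one is defined, otherwise return None) might look roughly like the following. This is a minimal sketch: the attribute name `model_pattern`, the mixin class, and the example pattern are assumptions, not the library's actual names.

```python
import re
from typing import Optional


class ModelCodeMixin:
    # Hypothetical attribute: a regex with one capture group, or None.
    model_pattern: Optional[str] = None

    def extract_model_code(self, url: str) -> Optional[str]:
        # No pattern defined: nothing to extract.
        if not self.model_pattern:
            return None
        match = re.search(self.model_pattern, url)
        # Return the first capture group, or None when the URL does not match.
        return match.group(1) if match else None


class ExampleParser(ModelCodeMixin):
    # Illustrative pattern only; real parsers define their own.
    model_pattern = r"/p/([A-Za-z0-9-]+)"


parser = ExampleParser()
print(parser.extract_model_code("https://example.com/p/ABC-123?x=1"))
```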
get_product_urls
Extract relevant product URLs from search results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_results` | `List[Dict[str, str]]` | List of search result dictionaries | required |
| `max_urls` | `int` | Maximum number of URLs to return | `5` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of product URLs |
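One way to picture the filtering step is sketched below. The reference does not document the keys of each search result dictionary or how a URL is judged relevant, so the `"link"` key and the `/product/` path fragment are assumptions made for illustration; the documented method accepts only `search_results` and `max_urls`.

```python
from typing import Dict, List


def get_product_urls(
    search_results: List[Dict[str, str]],
    max_urls: int = 5,
    url_key: str = "link",                # hypothetical result key
    required_fragment: str = "/product/",  # hypothetical relevance test
) -> List[str]:
    urls: List[str] = []
    for result in search_results:
        if len(urls) >= max_urls:
            break  # stop once the cap is reached
        url = result.get(url_key, "")
        # Keep URLs that look like product pages, skipping duplicates.
        if required_fragment in url and url not in urls:
            urls.append(url)
    return urls


results = [
    {"link": "https://example.com/product/a"},
    {"link": "https://example.com/blog/x"},
    {"link": "https://example.com/product/a"},
    {"link": "https://example.com/product/b"},
    {"title": "entry without a link"},
]
print(get_product_urls(results, max_urls=5))
```

The brand-specific overrides listed later narrow the relevance test to each manufacturer's product URL pattern.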
parse (abstractmethod, async)
Parse product information from a URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | URL to parse | required |
| `search_term` | `str` | Original search term | required |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with extracted product information |
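Because `parse` is both abstract and asynchronous, every concrete parser must define it as a coroutine, and callers must await it. The sketch below shows that contract with a stand-in base class; `ParserSketch` and `StaticParser` are hypothetical and do not inherit from the real `ParserBase`, and the returned keys are invented for the example.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class ParserSketch(ABC):
    """Stand-in for the documented base class (not the real ParserBase)."""

    @abstractmethod
    async def parse(self, url: str, search_term: str) -> Dict[str, Any]:
        ...


class StaticParser(ParserSketch):
    async def parse(self, url: str, search_term: str) -> Dict[str, Any]:
        # A real parser would fetch and scrape the page here.
        return {"url": url, "search_term": search_term, "found": True}


# Coroutines must be awaited; asyncio.run drives one from synchronous code.
result = asyncio.run(
    StaticParser().parse("https://example.com/product/a", "a")
)
print(result)
```

Instantiating `ParserSketch` directly raises `TypeError`, which is how an abstract method enforces that subclasses supply their own `parse`.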
brother
BrotherParser
Bases: ParserBase
Parser for Brother product information.
Extracts product details from Brother's USA website using Selenium.
get_product_urls
Extract relevant product URLs from search results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_results` | `List[Dict[str, str]]` | List of search result dictionaries | required |
| `max_urls` | `int` | Maximum number of URLs to return | `5` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of product URLs that match the Brother product pattern |
parse (async)
Parse product information from a Brother URL using Selenium.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | Brother product URL | required |
| `search_term` | `str` | Original search term | required |
| `retailer` | `Optional[str]` | Optional retailer information (not used for Brother) | `None` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with product information |
canon
CanonParser
Bases: ParserBase
Parser for Canon product information.
Extracts product details from Canon's USA and Canada websites using Selenium.
create_search_query
Create region-specific search query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `term` | `str` | Search term (typically product model) | required |
Returns:
| Type | Description |
|---|---|
| `str` | Formatted search query for the appropriate region |
determine_region
Determine region based on retailer information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `retailer` | `Optional[str]` | Retailer string that may contain region information | required |
Returns:
| Type | Description |
|---|---|
| `str` | 'ca' for Canada, 'us' for United States (default) |
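The reference gives the return values but not the rule that maps a retailer string to a region. A minimal sketch, assuming the region is detected by a case-insensitive substring match on "canada" (a hypothetical rule, not confirmed by the source):

```python
from typing import Optional


def determine_region(retailer: Optional[str]) -> str:
    # Assumed rule: any mention of "canada" selects the Canadian site.
    if retailer and "canada" in retailer.lower():
        return "ca"
    # Missing or unrecognised retailer falls back to the US default.
    return "us"


for name in ("Best Buy Canada", "Staples", None):
    print(name, "->", determine_region(name))
```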
get_product_urls
Extract relevant product URLs from search results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_results` | `List[Dict[str, str]]` | List of search result dictionaries | required |
| `max_urls` | `int` | Maximum number of URLs to return | `5` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of product URLs that match the Canon product pattern |
parse (async)
Parse product information from a Canon URL using Selenium.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | Canon product URL | required |
| `search_term` | `str` | Original search term | required |
| `retailer` | `Optional[str]` | Optional retailer information to determine region | `None` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with product information |
epson
EpsonParser
Bases: ParserBase
Parser for Epson product information.
Extracts product details from Epson's website.
extract_model_code
Extract model code from URL using the regex pattern and clean it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | URL to extract model code from | required |
Returns:
| Type | Description |
|---|---|
| `Optional[str]` | Cleaned model code or None if not found |
parse (async)
Parse product information from an Epson URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | Epson product URL | required |
| `search_term` | `str` | Original search term | required |
| `retailer` | `str` | Optional retailer information | `None` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with product information |
hp
HPParser
Bases: ParserBase
Parser for HP product information.
Extracts product details from HP's website using Selenium for dynamic content.
get_product_urls
Extract relevant product URLs from search results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_results` | `List[Dict[str, str]]` | List of search result dictionaries | required |
| `max_urls` | `int` | Maximum number of URLs to return | `5` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of product URLs that match the HP product pattern |
parse (async)
Parse product information from an HP URL using Selenium.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | HP product URL | required |
| `search_term` | `str` | Original search term | required |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with product information |
samsung
SamsungParser
Bases: ParserBase
Parser for Samsung product information.
Extracts product details from Samsung's website using Selenium.
get_product_urls
Extract relevant product URLs from search results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_results` | `List[Dict[str, str]]` | List of search result dictionaries | required |
| `max_urls` | `int` | Maximum number of URLs to return (default: 1) | `1` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of product URLs that match the Samsung product pattern |
parse (async)
Parse product information from a Samsung URL using Selenium.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | Samsung product URL | required |
| `search_term` | `str` | Original search term | required |
| `retailer` | `Optional[str]` | Optional retailer information (not used for Samsung) | `None` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with product information |