
Parsers

flowtask.components.ProductInfo.parsers

base

ParserBase

ParserBase(*args, **kwargs)

Bases: HTTPService, SeleniumService

Base class for product information parsers.

Defines the interface and common functionality for all product parsers.

create_search_query
create_search_query(term)

Create a search query for the given term.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| term | str | Search term (typically product model) | required |

Returns:

| Type | Description |
| --- | --- |
| str | Formatted search query |
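The query format itself is not documented on this page. As a rough illustration only, the sketch below assumes a site-restricted query; the helper name `build_site_query` and the domain `example.com` are hypothetical and not part of flowtask.

```python
def build_site_query(term: str, domain: str = "example.com") -> str:
    """Return a site-restricted search query for ``term``.

    Hypothetical helper: the real create_search_query may format
    its query differently.
    """
    cleaned = " ".join(term.split())  # collapse repeated whitespace
    return f"site:{domain} {cleaned}"


query = build_site_query("  HL-L2350DW   printer ")
print(query)
```

Normalising whitespace before formatting keeps the query stable when the search term comes from loosely formatted input.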

extract_model_code
extract_model_code(url)

Extract model code from URL using the regex pattern if defined.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | URL to extract model code from | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[str] | Extracted model code, or None if not found or no pattern is defined |
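The behaviour described above (return a match, or None when there is no match or no pattern) can be sketched as a standalone function. This is a minimal sketch, not the library's implementation: the `pattern` argument and the `MODEL_PATTERN` value are assumptions made for illustration.

```python
import re
from typing import Optional


def extract_model_code(url: str, pattern: Optional[str]) -> Optional[str]:
    """Return the first capture group of ``pattern`` found in ``url``.

    Returns None when no pattern is defined or nothing matches.
    """
    if not pattern:
        return None
    match = re.search(pattern, url)
    return match.group(1) if match else None


# Hypothetical pattern: the last path segment is the model code.
MODEL_PATTERN = r"/products/([A-Za-z0-9-]+)/?$"

print(extract_model_code("https://example.com/products/ABC-123", MODEL_PATTERN))
print(extract_model_code("https://example.com/support", MODEL_PATTERN))
print(extract_model_code("https://example.com/products/ABC-123", None))
```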

get_product_urls
get_product_urls(search_results, max_urls=5)

Extract relevant product URLs from search results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_results | List[Dict[str, str]] | List of search result dictionaries | required |
| max_urls | int | Maximum number of URLs to return | 5 |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of product URLs |
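One plausible way to filter search results down to product URLs is shown below. It is a sketch under stated assumptions: the `link` key in each result dictionary and the `url_pattern` argument are hypothetical, since this page does not document the result schema or how each parser matches URLs.

```python
import re
from typing import Dict, List


def get_product_urls(
    search_results: List[Dict[str, str]],
    url_pattern: str,
    max_urls: int = 5,
) -> List[str]:
    """Collect up to ``max_urls`` unique result links matching ``url_pattern``."""
    urls: List[str] = []
    for result in search_results:
        if len(urls) >= max_urls:
            break
        link = result.get("link", "")  # assumed key name
        if link and re.search(url_pattern, link) and link not in urls:
            urls.append(link)
    return urls


results = [
    {"title": "Model A", "link": "https://example.com/products/a"},
    {"title": "Support", "link": "https://example.com/support"},
    {"title": "Model A again", "link": "https://example.com/products/a"},
    {"title": "Model B", "link": "https://example.com/products/b"},
]
print(get_product_urls(results, r"/products/", max_urls=5))
```

Checking the limit at the top of the loop means `max_urls=0` yields an empty list rather than a single URL.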

parse abstractmethod async
parse(url, search_term)

Parse product information from a URL.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | URL to parse | required |
| search_term | str | Original search term | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with extracted product information |
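Because `parse` is both abstract and asynchronous, a subclass has to override it with a coroutine before it can be instantiated. The sketch below shows that contract with stand-in classes; `SketchParserBase` and `DemoParser` are hypothetical names, and the real ParserBase additionally inherits from HTTPService and SeleniumService.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchParserBase(ABC):
    """Stand-in for ParserBase, reduced to the abstract parse() contract."""

    @abstractmethod
    async def parse(self, url: str, search_term: str) -> Dict[str, Any]:
        """Parse product information from a URL."""


class DemoParser(SketchParserBase):
    async def parse(self, url: str, search_term: str) -> Dict[str, Any]:
        # A real parser would fetch and scrape the page here.
        await asyncio.sleep(0)
        return {"url": url, "search_term": search_term}


result = asyncio.run(DemoParser().parse("https://example.com/products/a", "Model A"))
print(result)
```

Instantiating `SketchParserBase` directly raises `TypeError`, which is how the abstract method enforces that every concrete parser supplies its own `parse`.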

brother

BrotherParser

BrotherParser(*args, **kwargs)

Bases: ParserBase

Parser for Brother product information.

Extracts product details from Brother's USA website using Selenium.

get_product_urls
get_product_urls(search_results, max_urls=5)

Extract relevant product URLs from search results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_results | List[Dict[str, str]] | List of search result dictionaries | required |
| max_urls | int | Maximum number of URLs to return | 5 |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of product URLs that match the Brother product pattern |

parse async
parse(url, search_term, retailer=None)

Parse product information from a Brother URL using Selenium.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | Brother product URL | required |
| search_term | str | Original search term | required |
| retailer | Optional[str] | Optional retailer information (not used for Brother) | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with product information |

canon

CanonParser

CanonParser(*args, **kwargs)

Bases: ParserBase

Parser for Canon product information.

Extracts product details from Canon's USA and Canada websites using Selenium.

create_search_query
create_search_query(term)

Create region-specific search query.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| term | str | Search term (typically product model) | required |

Returns:

| Type | Description |
| --- | --- |
| str | Formatted search query for the appropriate region |

determine_region
determine_region(retailer)

Determine region based on retailer information.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| retailer | Optional[str] | Retailer string that may contain region information | required |

Returns:

| Type | Description |
| --- | --- |
| str | 'ca' for Canada, 'us' for United States (default) |
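How the retailer string is inspected is not stated here. The following sketch assumes a case-insensitive check for the word "canada" or a standalone "ca" token; both markers are guesses made for illustration, and only the return values ('ca', with 'us' as the default) come from the description above.

```python
import re
from typing import Optional


def determine_region(retailer: Optional[str]) -> str:
    """Return 'ca' when the retailer string mentions Canada, else 'us'."""
    if not retailer:
        return "us"
    # Assumed markers: the word "canada" or a standalone "ca" token.
    if re.search(r"\b(canada|ca)\b", retailer, flags=re.IGNORECASE):
        return "ca"
    return "us"


print(determine_region("Example Store Canada"))
print(determine_region("Example Store"))
print(determine_region(None))
```

The word boundaries keep names that merely contain the letters "ca" from being classified as Canadian.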

get_product_urls
get_product_urls(search_results, max_urls=5)

Extract relevant product URLs from search results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_results | List[Dict[str, str]] | List of search result dictionaries | required |
| max_urls | int | Maximum number of URLs to return | 5 |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of product URLs that match the Canon product pattern |

parse async
parse(url, search_term, retailer=None)

Parse product information from a Canon URL using Selenium.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | Canon product URL | required |
| search_term | str | Original search term | required |
| retailer | Optional[str] | Optional retailer information to determine region | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with product information |

epson

EpsonParser

EpsonParser(*args, **kwargs)

Bases: ParserBase

Parser for Epson product information.

Extracts product details from Epson's website.

extract_model_code
extract_model_code(url)

Extract model code from URL using the regex pattern and clean it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | URL to extract model code from | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[str] | Cleaned model code or None if not found |
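The page does not say what "cleaning" involves. As an illustration only, the sketch below assumes two cleaning steps (dropping a trailing `.html` and upper-casing) and a hypothetical URL pattern; the actual EpsonParser may use a different pattern and different rules.

```python
import re
from typing import Optional

# Hypothetical pattern: capture the path segment after "/p/".
MODEL_PATTERN = re.compile(r"/p/([^/?#]+)")


def extract_clean_model_code(url: str) -> Optional[str]:
    """Extract a model code from ``url`` and normalise it."""
    match = MODEL_PATTERN.search(url)
    if not match:
        return None
    code = match.group(1).strip()
    # Assumed cleaning steps: drop a trailing ".html" and upper-case the code.
    code = re.sub(r"\.html?$", "", code, flags=re.IGNORECASE)
    return code.upper() or None


print(extract_clean_model_code("https://example.com/p/c11ch42201.html?ref=x"))
print(extract_clean_model_code("https://example.com/support"))
```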

parse async
parse(url, search_term, retailer=None)

Parse product information from an Epson URL.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | Epson product URL | required |
| search_term | str | Original search term | required |
| retailer | Optional[str] | Optional retailer information | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with product information |

hp

HPParser

HPParser(*args, **kwargs)

Bases: ParserBase

Parser for HP product information.

Extracts product details from HP's website using Selenium for dynamic content.

get_product_urls
get_product_urls(search_results, max_urls=5)

Extract relevant product URLs from search results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_results | List[Dict[str, str]] | List of search result dictionaries | required |
| max_urls | int | Maximum number of URLs to return | 5 |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of product URLs that match the HP product pattern |

parse async
parse(url, search_term, retailer=None)

Parse product information from an HP URL using Selenium.

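Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | HP product URL | required |
| search_term | str | Original search term | required |
| retailer | Optional[str] | Optional retailer information | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with product information |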

samsung

SamsungParser

SamsungParser(*args, **kwargs)

Bases: ParserBase

Parser for Samsung product information.

Extracts product details from Samsung's website using Selenium.

get_product_urls
get_product_urls(search_results, max_urls=1)

Extract relevant product URLs from search results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| search_results | List[Dict[str, str]] | List of search result dictionaries | required |
| max_urls | int | Maximum number of URLs to return | 1 |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of product URLs that match the Samsung product pattern |

parse async
parse(url, search_term, retailer=None)

Parse product information from a Samsung URL using Selenium.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | Samsung product URL | required |
| search_term | str | Original search term | required |
| retailer | Optional[str] | Optional retailer information (not used for Samsung) | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | Dictionary with product information |