CompanyScraper

flowtask.components.CompanyScraper

parsers

base

ScrapperBase
ScrapperBase(*args, **kwargs)

Bases: SeleniumService, HTTPService

ScrapperBase Model.

Defines how scrapers should work.
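
Every scrapper below implements the same contract: an async `scrapping(document, idx, row)` coroutine that enriches one DataFrame row with data from its provider. A minimal standalone sketch of that pattern (the base class here is a hypothetical stand-in, and the `(idx, row)` return shape is an assumption, not the real flowtask API):

```python
import asyncio


class ScrapperBase:
    """Hypothetical stand-in for the flowtask ScrapperBase contract."""

    async def scrapping(self, document, idx, row):
        raise NotImplementedError


class DummyScrapper(ScrapperBase):
    """Enriches a row with data 'extracted' from a provider page."""

    async def scrapping(self, document, idx, row):
        # The real scrappers parse `document` (the fetched page);
        # here the extraction step is faked with static values.
        row.update({"website": "example.com", "phone_number": "555-0100"})
        return idx, row


idx, enriched = asyncio.run(
    DummyScrapper().scrapping(document=None, idx=0, row={"company": "Acme"})
)
```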

explorium

ExploriumScrapper
ExploriumScrapper(*args, **kwargs)

Bases: ScrapperBase

ExploriumScrapper Model.

scrapping async
scrapping(document, idx, row)

Scrape company information from Explorium. Updates the existing row with new data from Explorium.

leadiq

LeadiqScrapper
LeadiqScrapper(*args, **kwargs)

Bases: ScrapperBase

LeadiqScrapper Model.

scrapping async
scrapping(document, idx, row)

Scrape company information from LeadIQ. Updates the existing row with new data from LeadIQ.

rocket

RocketReachScrapper
RocketReachScrapper(*args, **kwargs)

Bases: ScrapperBase

RocketReachScrapper Model.

scrapping async
scrapping(document, idx, row)

Scrape company information from RocketReach. Updates the existing row with new data from RocketReach.

siccode

SicCodeScrapper
SicCodeScrapper(*args, **kwargs)

Bases: ScrapperBase

SicCodeScrapper Model.

scrapping async
scrapping(document, idx, row)

Scrapes company information from siccode.com and updates the row.

visualvisitor

VisualVisitorScrapper
VisualVisitorScrapper(*args, **kwargs)

Bases: ScrapperBase

VisualVisitorScrapper Model.

scrapping async
scrapping(document, idx, row)

Scrape company information from Visual Visitor. Updates the existing row with new data from Visual Visitor.

zoominfo

ZoomInfoScrapper
ZoomInfoScrapper(*args, **kwargs)

Bases: ScrapperBase

ZoomInfo Model.

scrapping async
scrapping(document, idx, row)

Scrape company information from ZoomInfo. Updates the existing row with new data from ZoomInfo.

scrapper

CompanyScraper

CompanyScraper(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, SeleniumService, HTTPService

Company Scraper Component

Overview:

This component scrapes company information from different sources using HTTPService. It can receive URLs from a previous component (like GoogleSearch) and extract specific company information.

.. table:: Properties
   :widths: auto

+------------------+----------+------------------------------------------------------------------------------+
| Name             | Required | Description                                                                  |
+------------------+----------+------------------------------------------------------------------------------+
| url_column (str) | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
+------------------+----------+------------------------------------------------------------------------------+
| wait_for (tuple) | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
+------------------+----------+------------------------------------------------------------------------------+

Return:

The component adds new columns to the DataFrame with company information:

- headquarters
- phone_number
- website
- stock_symbol
- naics_code
- employee_count
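
The shape of the merge can be sketched with plain dicts (the helper `merge_company_info` is hypothetical; only the column names come from the return list above):

```python
# Columns the component is documented to add to each row.
COMPANY_COLUMNS = [
    "headquarters", "phone_number", "website",
    "stock_symbol", "naics_code", "employee_count",
]


def merge_company_info(row, info):
    """Copy the documented columns from scraped `info` into `row`,
    filling missing fields with None so the DataFrame stays rectangular."""
    for col in COMPANY_COLUMNS:
        row[col] = info.get(col)
    return row


row = merge_company_info(
    {"search_url": "https://example.com/search"},
    {"website": "example.com", "employee_count": 250},
)
```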

close async
close()

Clean up resources.

extract_company_info
extract_company_info(soup, search_term, search_url)

Extract company information from the page.

run async
run()

Execute scraping for each URL in the DataFrame.

scrape_url async
scrape_url(idx, url)

Scrape company information from URL.

search_in_ddg async
search_in_ddg(search_term, company_name, scrapper, backend='html', region='wt-wt')

Search for a term in DuckDuckGo.
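
The `backend='html'` and `region='wt-wt'` defaults correspond to DuckDuckGo's public non-JavaScript HTML endpoint and its worldwide region code. How the method issues the request is not shown in these docs; a sketch of just the query-URL construction (the helper `ddg_query_url` is illustrative, not part of flowtask):

```python
from urllib.parse import urlencode

# DuckDuckGo's non-JavaScript endpoints; `kl` is its region parameter.
BACKENDS = {
    "html": "html.duckduckgo.com/html/",
    "lite": "lite.duckduckgo.com/lite/",
}


def ddg_query_url(search_term, backend="html", region="wt-wt"):
    """Build a DuckDuckGo query URL for the given backend and region."""
    return f"https://{BACKENDS[backend]}?" + urlencode(
        {"q": search_term, "kl": region}
    )


url = ddg_query_url("Acme Corp headquarters")
```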

split_parts
split_parts(task_list, num_parts=5)

Split task list into parts for concurrent processing.
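
A plausible implementation of this chunking, assuming it balances chunk sizes across workers (a sketch only; the real flowtask method may differ):

```python
def split_parts(task_list, num_parts=5):
    """Split `task_list` into up to `num_parts` similarly sized chunks
    so each chunk can be scraped by its own concurrent worker."""
    num_parts = min(num_parts, len(task_list)) or 1
    size, extra = divmod(len(task_list), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks take one leftover item each.
        end = start + size + (1 if i < extra else 0)
        parts.append(task_list[start:end])
        start = end
    return parts


parts = split_parts(list(range(12)), num_parts=5)
```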

start async
start(**kwargs)

Initialize the component and validate required parameters.