
Zoominfoscraper

flowtask.components.ZoomInfoScraper

ZoomInfoScraper

ZoomInfoScraper(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent, HTTPService, SeleniumService

ZoomInfo Scraper Component that can use either HTTP or Selenium for scraping.

Overview:

This component scrapes company information from ZoomInfo pages, either over plain HTTP (HTTPService) or through a browser session (SeleniumService). It can receive URLs from a previous component (such as GoogleSearch) and extracts specific company fields from each page.

.. table:: Properties
   :widths: auto

   +-----------------------+----------+------------------------------------------------------------------------------+
   | Name                  | Required | Description                                                                  |
   +-----------------------+----------+------------------------------------------------------------------------------+
   | url_column (str)      | Yes      | Name of the column containing URLs to scrape (default: 'search_url')         |
   +-----------------------+----------+------------------------------------------------------------------------------+
   | wait_for (tuple)      | No       | Element to wait for before scraping (default: ('class', 'company-overview')) |
   +-----------------------+----------+------------------------------------------------------------------------------+

Return:

The component adds new columns to the DataFrame with company information:

- headquarters
- phone_number
- website
- stock_symbol
- naics_code
- employee_count

close async

close()

Clean up resources.

extract_company_info

extract_company_info(soup, search_term, search_url)

Extract company information from the page.

run async

run()

Execute scraping for each URL in the DataFrame.
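
The per-URL loop described here can be pictured as batches of coroutines awaited together. The following is only a sketch under stated assumptions, not the component's actual implementation: `fake_scrape`, `run_all`, and `batch_size` are hypothetical names, and the stub returns a placeholder instead of touching the network.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def fake_scrape(idx: int, row: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Hypothetical stand-in for a per-row scrape; performs no network access."""
    await asyncio.sleep(0)  # yield control, as a real request would
    return idx, {"scraped_from": row["search_url"]}


async def run_all(rows: List[Dict[str, Any]], batch_size: int = 5) -> Dict[int, Dict[str, Any]]:
    """Process rows in fixed-size batches, awaiting each batch concurrently."""
    indexed = list(enumerate(rows))
    results: Dict[int, Dict[str, Any]] = {}
    for start in range(0, len(indexed), batch_size):
        batch = indexed[start:start + batch_size]
        # gather() runs the batch concurrently and preserves input order.
        done = await asyncio.gather(*(fake_scrape(i, r) for i, r in batch))
        for idx, info in done:
            results[idx] = info
    return results


if __name__ == "__main__":
    sample = [{"search_url": f"https://example.com/c/{n}"} for n in range(7)]
    print(asyncio.run(run_all(sample, batch_size=3)))
```

Keying results by row index lets the caller write each scraped field back to the matching DataFrame row, regardless of the order in which requests finish.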

scrape_url async

scrape_url(idx, row)

Scrape a single ZoomInfo URL using either HTTP or Selenium.

split_parts

split_parts(task_list, num_parts=5)

Split task list into parts for concurrent processing.
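
One common way to do such a split is to cut the list into contiguous, near-equal chunks. The sketch below illustrates that idea only; it assumes contiguous chunking and that empty chunks are dropped, neither of which is confirmed by the source, and `split_into_parts` is a hypothetical standalone function rather than the component's method.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(task_list: Sequence[T], num_parts: int = 5) -> List[List[T]]:
    """Split task_list into at most num_parts contiguous, near-equal chunks."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    items = list(task_list)
    # size is the base chunk length; the first `extra` chunks get one more item.
    size, extra = divmod(len(items), num_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # assumption: skip empty chunks when items < num_parts
            parts.append(items[start:end])
        start = end
    return parts


if __name__ == "__main__":
    print(split_into_parts(list(range(12)), num_parts=5))
```

With 12 tasks and the default of 5 parts, this yields chunk sizes of 3, 3, 2, 2, and 2, so no part is more than one item larger than another.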

start async

start(**kwargs)

Initialize the component and validate required parameters.
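
As an illustration of what validating a required parameter might involve, the sketch below checks that the configured `url_column` is present among the incoming column names. This is an assumption about the kind of check performed, not the component's code: `ConfigError` and `validate_url_column` are hypothetical names, and a plain list of column names stands in for the DataFrame.

```python
from typing import Iterable


class ConfigError(Exception):
    """Hypothetical error type for invalid component configuration."""


def validate_url_column(columns: Iterable[str], url_column: str = "search_url") -> str:
    """Return url_column if it is non-empty and present in columns."""
    available = list(columns)
    if not url_column:
        raise ConfigError("url_column must be a non-empty string")
    if url_column not in available:
        raise ConfigError(
            f"Column {url_column!r} not found; available columns: {available}"
        )
    return url_column


if __name__ == "__main__":
    print(validate_url_column(["company", "search_url"]))
```

Failing early at start-up, before any page is requested, keeps a misconfigured task from issuing requests it cannot map back to a row.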