Parsehtml

flowtask.components.ParseHTML

ParseHTML

ParseHTML(job=None, *args, **kwargs)

Bases: FlowComponent

Parses HTML content using lxml etree and BeautifulSoup.

Example:

ParseHTML:
  xml: true

get_soup

get_soup(content, parser='html.parser')

Return a BeautifulSoup object built from content with the given parser (html.parser by default).
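
A minimal sketch of what a helper with this signature might do, written as a standalone function rather than the component's actual method. It assumes the `beautifulsoup4` package is installed; the sample markup and the names `heading` and `lead` are illustrative only.

```python
from bs4 import BeautifulSoup


def get_soup(content: str, parser: str = "html.parser") -> BeautifulSoup:
    """Standalone stand-in: build a BeautifulSoup tree from raw markup."""
    return BeautifulSoup(content, parser)


# Illustrative markup, not taken from the flowtask documentation.
soup = get_soup("<html><body><h1>Title</h1><p class='lead'>Hello</p></body></html>")
heading = soup.find("h1").get_text()
lead = soup.find("p", class_="lead").get_text()
```

Passing a different parser name, such as `"lxml"`, selects another backend, provided that backend is installed.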

open_html async

open_html(filename)

Open the HTML file given by filename.
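
One way an async file open could look, using only the standard library. This is a sketch under assumptions, not the component's implementation: the return type (the file's text), the UTF-8 encoding, and the `demo` helper are all hypothetical.

```python
import asyncio
import tempfile
from pathlib import Path


async def open_html(filename) -> str:
    """Hypothetical stand-in: read an HTML file without blocking the event loop."""
    path = Path(filename)
    # Path.read_text is a blocking call, so it runs in a worker thread.
    return await asyncio.to_thread(path.read_text, encoding="utf-8")


def demo() -> str:
    """Write a small sample file and read it back through open_html."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "page.html"
        sample.write_text("<p>Hello</p>", encoding="utf-8")
        return asyncio.run(open_html(sample))
```

`asyncio.to_thread` requires Python 3.9 or later.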

run async

run()

Open every filename and convert each file into BeautifulSoup and etree objects.
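
The overall flow described here can be sketched as follows. This is an approximation under stated assumptions, not the component's code: `parse_file`, `parse_all`, and `demo` are hypothetical names, the files are assumed to be UTF-8, and both `beautifulsoup4` and `lxml` are assumed to be installed. How the real component stores or returns its results is not documented above.

```python
import asyncio
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup
from lxml import etree


async def parse_file(filename):
    """Read one file and parse it twice: once with BeautifulSoup, once with lxml."""
    text = await asyncio.to_thread(Path(filename).read_text, encoding="utf-8")
    soup = BeautifulSoup(text, "html.parser")
    tree = etree.fromstring(text, parser=etree.HTMLParser())
    return soup, tree


async def parse_all(filenames):
    """Parse every file concurrently; results keep the input order."""
    return await asyncio.gather(*(parse_file(f) for f in filenames))


def demo():
    """Create two sample pages and return each title as seen by both parsers."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("a", "b"):
            page = Path(tmp) / f"{name}.html"
            page.write_text(
                f"<html><head><title>{name}</title></head><body></body></html>",
                encoding="utf-8",
            )
            paths.append(page)
        results = asyncio.run(parse_all(paths))
    return [(str(soup.title.string), tree.findtext(".//title")) for soup, tree in results]
```

Keeping both representations lets later steps choose between BeautifulSoup's search helpers and lxml's XPath support on the same document.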