Docx

flowtask.components.LangchainLoader.loaders.docx

MSWordLoader

MSWordLoader(tokenizer=None, text_splitter=None, summarizer=None, markdown_splitter=None, source_type='file', doctype='document', device=None, cuda_number=0, llm=None, **kwargs)

Bases: AbstractLoader

Load Microsoft Word (.docx) files as LangChain Documents.

extract_text

extract_text(path)

Extract text from a docx file.

Parameters:

Name Type Description Default
path Path The source of the data. required

Returns:

Type Description
str The extracted text.
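
As a rough illustration of what extracting text from a docx file involves, the sketch below reads the paragraphs from `word/document.xml` using only the Python standard library. It is not the MSWordLoader implementation, which this page does not show. The function name `extract_text_sketch` is hypothetical, and the one-paragraph-per-line output format is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_text_sketch(path: Path) -> str:
    """Hypothetical stand-in for extract_text (not the flowtask code).

    Returns the text of each non-empty paragraph, one per line.
    """
    # A .docx file is a ZIP archive; the main body lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph (w:p) is split into runs; join the text nodes (w:t)
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

A call such as `extract_text_sketch(Path("report.docx"))` returns a single `str`, matching the return type documented above. The sketch ignores headers, footers, footnotes, and tab or line-break elements, so a real loader would likely need more than this.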