Analysis

HTML analysis modules for readability, forms, tables, and metadata extraction.

6 modules

| Module | Description |
| --- | --- |
| HTML Readability | Analyze content readability |
| Extract Forms | Extract form data from HTML |
| Extract Metadata | Extract metadata from HTML |
| Extract Tables | Extract table data from HTML |
| Find Patterns | Find repeating data patterns in HTML |
| HTML Structure | Analyze HTML DOM structure |

Modules

HTML Readability

analysis.html.analyze_readability

Analyze content readability

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
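
The output schema above does not list concrete fields, so the following is only a rough sketch of the kind of analysis a readability module might perform, not the module's actual implementation. It uses Python's standard `html.parser` to pull visible text out of the `html` input and computes a few simple statistics. The function name `readability_sketch` and the returned keys are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def readability_sketch(html: str) -> dict:
    """Hypothetical helper: basic text statistics for an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split on terminal punctuation; abbreviations will miscount.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }
```

Established readability formulas such as Flesch Reading Ease also need syllable counts, which this sketch leaves out to stay short.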

Extract Forms

analysis.html.extract_forms

Extract form data from HTML

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
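
To make "form data" more concrete, here is a minimal sketch of how forms and their fields could be collected from an `html` string with Python's standard `html.parser`. It is an illustration under assumptions, not the module's implementation: the function name `extract_forms_sketch` and the shape of the returned dictionaries are hypothetical.

```python
from html.parser import HTMLParser


class _FormExtractor(HTMLParser):
    """Records each <form> together with the fields nested inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                # HTML treats a missing method as GET.
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            self._current["fields"].append(
                {"tag": tag, "name": attrs.get("name"), "type": attrs.get("type")}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_forms_sketch(html: str) -> list:
    """Hypothetical helper: list the forms found in an HTML string."""
    parser = _FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Fields placed outside a `<form>` element (for example, linked through the `form` attribute) are ignored by this sketch.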

Extract Metadata

analysis.html.extract_metadata

Extract metadata from HTML

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
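
As a rough illustration of what metadata extraction can involve, the sketch below reads the `<title>` and the `<meta>` tags that carry a `name` or `property` attribute. It assumes nothing about the module's real output: `extract_metadata_sketch` and the `title`/`meta` keys are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class _MetadataExtractor(HTMLParser):
    """Collects the document title and name/property <meta> entries."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard metadata uses `name`; Open Graph tags use `property`.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata_sketch(html: str) -> dict:
    """Hypothetical helper: title and meta tags from an HTML string."""
    parser = _MetadataExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}
```

When two `<meta>` tags share a key, the later one overwrites the earlier one in this sketch.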

Extract Tables

analysis.html.extract_tables

Extract table data from HTML

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
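
The sketch below shows one simple way table data could be pulled from an `html` string: each `<table>` becomes a list of rows, and each row a list of cell strings. This is an illustrative assumption rather than the module's documented behavior, and `extract_tables_sketch` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TableExtractor(HTMLParser):
    """Turns each <table> into a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed markup

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables_sketch(html: str) -> list:
    """Hypothetical helper: tables as nested lists of strings."""
    parser = _TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The sketch expects explicit closing tags and does not handle nested tables, `rowspan`, or `colspan`.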

Find Patterns

analysis.html.find_patterns

Find repeating data patterns in HTML

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
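
"Repeating data patterns" is not defined further here. One plausible reading is that elements sharing the same tag and class list (product cards, list items, result rows) are counted as a pattern. The sketch below implements only that reading. The name `find_patterns_sketch`, the `min_repeats` parameter, and the output keys are all assumptions for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class _SignatureCounter(HTMLParser):
    """Counts how often each (tag, sorted class list) signature occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        # Sort class names so "a b" and "b a" count as the same signature.
        signature = (tag, " ".join(sorted(classes.split())))
        self.counts[signature] += 1


def find_patterns_sketch(html: str, min_repeats: int = 3) -> list:
    """Hypothetical helper: signatures seen at least `min_repeats` times."""
    parser = _SignatureCounter()
    parser.feed(html)
    parser.close()
    return [
        {"tag": tag, "class": cls, "count": n}
        for (tag, cls), n in parser.counts.most_common()
        if n >= min_repeats
    ]
```

A fuller approach would also compare the subtree shape under each candidate element. This sketch looks only at the element itself.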

HTML Structure

analysis.html.structure

Analyze HTML DOM structure

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | Yes | - | HTML content to analyze |

Output:

| Field | Type | Description |
| --- | --- | --- |
| type | any | object |
| properties | any | |
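
As an illustration of what a DOM structure analysis might report, the sketch below counts elements per tag and tracks the maximum nesting depth. It is a simplified stand-in under assumptions, not the module's implementation: `structure_sketch` and its output keys are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _StructureAnalyzer(HTMLParser):
    """Counts tags and tracks the deepest level of element nesting."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.max_depth = 0
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in VOID_TAGS:
            self.max_depth = max(self.max_depth, self._depth + 1)
            return
        self._depth += 1
        self.max_depth = max(self.max_depth, self._depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1


def structure_sketch(html: str) -> dict:
    """Hypothetical helper: element counts and maximum nesting depth."""
    parser = _StructureAnalyzer()
    parser.feed(html)
    parser.close()
    return {
        "total_elements": sum(parser.tag_counts.values()),
        "max_depth": parser.max_depth,
        "tag_counts": dict(parser.tag_counts),
    }
```

The depth figure is approximate for markup that omits optional closing tags such as `</p>` or `</li>`, because `html.parser` does not build a corrected tree the way a browser does.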

Released under the Apache 2.0 License.