
Text

Text analysis: word count, encoding detection, email/URL/number extraction.

6 modules

| Module | Description |
|---|---|
| Character Count | Count characters in text |
| Detect Encoding | Detect text encoding |
| Extract Emails | Extract all email addresses from text |
| Extract Numbers | Extract all numbers from text |
| Extract URLs | Extract all URLs from text |
| Word Count | Count words in text |

Modules

Character Count

text.char_count

Count characters in text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text to analyze |

Output:

| Field | Type | Description |
|---|---|---|
| total | number | Total character count |
| without_spaces | number | Character count without spaces |
| letters | number | Letter count |
| digits | number | Digit count |
| spaces | number | Space count |
| lines | number | Line count |
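To show how the output fields relate to each other, here is a minimal Python sketch of one plausible way to compute them. It is not the module's implementation, and the function name `char_count` is hypothetical. It assumes that "spaces" means any whitespace character (including newlines and tabs), so that `total == without_spaces + spaces`, and that lines are counted with `str.splitlines()`.

```python
def char_count(text: str) -> dict:
    """Hypothetical re-creation of the text.char_count output fields.

    Assumption: "spaces" covers every whitespace character, not only ' '.
    """
    return {
        "total": len(text),
        "without_spaces": sum(1 for c in text if not c.isspace()),
        "letters": sum(1 for c in text if c.isalpha()),
        "digits": sum(1 for c in text if c.isdigit()),
        "spaces": sum(1 for c in text if c.isspace()),
        # splitlines() ignores a single trailing newline and returns [] for "".
        "lines": len(text.splitlines()),
    }


print(char_count("Hi 42\nok"))
```

For `"Hi 42\nok"` this sketch counts 8 characters, 2 of them whitespace (a space and a newline), 4 letters, 2 digits, and 2 lines.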

Detect Encoding

text.detect_encoding

Detect text encoding

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text or bytes whose encoding should be detected |

Output:

| Field | Type | Description |
|---|---|---|
| encoding | string | Detected encoding |
| confidence | number | Confidence score (0-1) |
| is_ascii | boolean | Whether the input is pure ASCII |
| has_bom | boolean | Whether a byte order mark (BOM) is present |
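The detection method used by this module is not documented here. As a rough illustration of what the four fields can mean, the Python sketch below uses only the standard library: it checks for a BOM, then for pure ASCII, then tries a strict UTF-8 decode, and finally falls back to Latin-1. The function name `detect_encoding` is hypothetical, a `str` input is assumed to be UTF-8 encoded before inspection, and the confidence values are placeholders chosen for illustration, not scores the module is known to return.

```python
import codecs
from typing import Union

# BOM signatures, checked in this order so the UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for the shorter UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: Union[str, bytes]) -> dict:
    """Hypothetical heuristic detector; confidence values are placeholders."""
    raw = data.encode("utf-8") if isinstance(data, str) else data
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return {"encoding": name, "confidence": 1.0, "is_ascii": False, "has_bom": True}
    if raw.isascii():
        return {"encoding": "ascii", "confidence": 1.0, "is_ascii": True, "has_bom": False}
    try:
        raw.decode("utf-8")  # strict decode: any invalid sequence raises
        return {"encoding": "utf-8", "confidence": 0.9, "is_ascii": False, "has_bom": False}
    except UnicodeDecodeError:
        # Latin-1 can decode any byte sequence, so it is only a last resort.
        return {"encoding": "latin-1", "confidence": 0.5, "is_ascii": False, "has_bom": False}


print(detect_encoding("héllo".encode("utf-8")))
```

A successful UTF-8 decode is strong evidence but not proof, which is why a real detector reports a confidence score rather than a yes/no answer.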

Extract Emails

text.extract_emails

Extract all email addresses from text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text to extract emails from |
| unique | boolean | No | True | Return only unique emails |
| lowercase | boolean | No | True | Convert emails to lowercase |

Output:

| Field | Type | Description |
|---|---|---|
| emails | array | List of extracted emails |
| count | number | Number of emails found |
| domains | array | Domains of the extracted emails |
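The Python sketch below illustrates how the `unique` and `lowercase` parameters could interact. It is not the module's implementation: the function name `extract_emails` and the simplified regular expression are assumptions (the pattern does not cover every address RFC 5322 allows), and `domains` is assumed to list each domain once, in first-seen order.

```python
import re

# Simplified address pattern: local part, "@", domain with a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str, unique: bool = True, lowercase: bool = True) -> dict:
    """Hypothetical re-creation of the text.extract_emails output fields."""
    emails = EMAIL_RE.findall(text)
    if lowercase:
        emails = [e.lower() for e in emails]
    if unique:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        emails = list(dict.fromkeys(emails))
    domains = list(dict.fromkeys(e.split("@", 1)[1] for e in emails))
    return {"emails": emails, "count": len(emails), "domains": domains}


print(extract_emails("Mail Ann@Example.com or bob@example.com; again ann@example.com."))
```

Lowercasing before de-duplicating means `Ann@Example.com` and `ann@example.com` collapse into one entry when both defaults are on; with `lowercase=False` they stay distinct.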

Extract Numbers

text.extract_numbers

Extract all numbers from text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text to extract numbers from |
| include_decimals | boolean | No | True | Include decimal numbers |
| include_negative | boolean | No | True | Include negative numbers |

Output:

| Field | Type | Description |
|---|---|---|
| numbers | array | List of extracted numbers |
| count | number | Number of numbers found |
| sum | number | Sum of all numbers |
| min | number | Smallest number found |
| max | number | Largest number found |
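How the module treats edge cases is not specified here, so the following Python sketch is only one plausible reading of the parameters. It assumes that `include_decimals=False` and `include_negative=False` drop matching numbers entirely, rather than truncating them or stripping the sign, and that `min`/`max` are `None` when nothing is found. The function name `extract_numbers` is hypothetical, and the naive pattern reads any hyphen directly before digits as a minus sign and ignores thousands separators and exponents.

```python
import re

# Optional minus sign, integer part, optional fractional part.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text: str, include_decimals: bool = True, include_negative: bool = True) -> dict:
    """Hypothetical re-creation of the text.extract_numbers output fields."""
    numbers = []
    for token in NUMBER_RE.findall(text):
        is_decimal = "." in token
        if is_decimal and not include_decimals:
            continue  # assumption: decimals are skipped, not truncated
        if token.startswith("-") and not include_negative:
            continue  # assumption: negatives are skipped, not made positive
        numbers.append(float(token) if is_decimal else int(token))
    return {
        "numbers": numbers,
        "count": len(numbers),
        "sum": sum(numbers),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
    }


print(extract_numbers("Temp -3.5 to 12, then 7 items."))
```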

Extract URLs

text.extract_urls

Extract all URLs from text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text to extract URLs from |
| unique | boolean | No | True | Return only unique URLs |

Output:

| Field | Type | Description |
|---|---|---|
| urls | array | List of extracted URLs |
| count | number | Number of URLs found |
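URL extraction from free text usually has to deal with punctuation that touches the end of a link. The Python sketch below shows one common approach; it is not the module's implementation. The function name `extract_urls` is hypothetical, only `http://` and `https://` links are assumed to count, and trailing `.,;:!?)` characters are stripped, which would also cut a legitimate closing parenthesis from URLs that contain one.

```python
import re

# Scheme, then everything up to whitespace, an angle bracket, or a quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str, unique: bool = True) -> dict:
    """Hypothetical re-creation of the text.extract_urls output fields."""
    # Strip sentence punctuation that usually follows a link in prose.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    if unique:
        urls = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
    return {"urls": urls, "count": len(urls)}


print(extract_urls("See https://example.com/a, and (https://example.com/a). Also http://test.org/x?y=1!"))
```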

Word Count

text.word_count

Count words in text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | text | Yes | - | Text to analyze |

Output:

| Field | Type | Description |
|---|---|---|
| word_count | number | Total word count |
| unique_words | number | Number of unique words |
| sentence_count | number | Approximate sentence count |
| paragraph_count | number | Paragraph count |
| avg_word_length | number | Average word length |
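The table marks `sentence_count` as approximate, since sentence boundaries cannot be found reliably with simple rules. The Python sketch below shows one set of heuristics that produces all five fields; it is not the module's implementation. The function name `word_count` and every rule in it are assumptions: words are runs of word characters (keeping an inner apostrophe, as in "It's"), unique words are compared case-insensitively, sentences end at `.`, `!` or `?` (so "3.14" would split a sentence), paragraphs are separated by blank lines, and the average length is in characters, rounded to two decimals.

```python
import re


def word_count(text: str) -> dict:
    """Hypothetical re-creation of the text.word_count output fields."""
    words = re.findall(r"\w+(?:'\w+)?", text)
    # Approximation: any run of . ! ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    avg = round(sum(map(len, words)) / len(words), 2) if words else 0.0
    return {
        "word_count": len(words),
        "unique_words": len({w.lower() for w in words}),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
        "avg_word_length": avg,
    }


print(word_count("The cat sat. The dog ran!\n\nIt's late?"))
```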

Released under the Apache 2.0 License.