
Document

Excel, PDF, and Word document read/write/convert.

8 modules

| Module | Description |
|---|---|
| Read Excel | Read data from Excel files (xlsx, xls) |
| Write Excel | Write data to Excel files (xlsx) |
| Fill PDF Form | Fill PDF form fields with data and optionally insert images |
| Generate PDF | Generate PDF files from HTML content or text |
| Parse PDF | Extract text and metadata from PDF files |
| PDF to Word | Convert PDF files to Word documents (.docx) |
| Parse Word Document | Extract text and content from Word documents (.docx) |
| Word to PDF | Convert Word documents (.docx) to PDF files |

Modules

Read Excel

excel.read

Read data from Excel files (xlsx, xls)

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | Yes | - | Path to the Excel file |
| sheet | string | No | - | Sheet name (default: first sheet) |
| header_row | number | No | 1 | Row number for headers (1-based, 0 for no headers) |
| range | string | No | - | Cell range to read (e.g., "A1:D10") |
| as_dict | boolean | No | True | Return rows as dictionaries (using headers as keys) |

Output:

| Field | Type | Description |
|---|---|---|
| data | array | Extracted data rows |
| headers | array | Column headers |
| row_count | number | Number of data rows extracted |
| sheet_names | array | Names of the sheets in the workbook |

Example: Read entire sheet

```yaml
path: /tmp/data.xlsx
as_dict: true
```
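The example above reads the whole first sheet using the default header row. The sketch below reads a specific sheet and cell range instead, using only the parameters in the table. The file path and the sheet name `Sales` are hypothetical placeholders. With `header_row: 0` there is no header row, so there are no header names to use as keys. For that reason `as_dict` is switched off here.

```yaml
# Hypothetical file and sheet name; adjust to your workbook
path: /tmp/sales.xlsx
sheet: Sales
range: "A1:D10"
header_row: 0    # the range has no header row
as_dict: false   # no headers, so rows are not keyed by header name
```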

Write Excel

excel.write

Write data to Excel files (xlsx)

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | Yes | - | Path to the Excel file |
| data | array | Yes | - | Data to write (array of arrays or array of objects) |
| headers | array | No | - | Column headers (auto-detected from objects if not provided) |
| sheet_name | string | No | Sheet1 | Name of the worksheet |
| auto_width | boolean | No | True | Automatically adjust column widths |

Output:

| Field | Type | Description |
|---|---|---|
| path | string | Path to the created Excel file |
| row_count | number | Number of data rows written |
| size | number | Size of the created file |

Example: Write data to Excel

```yaml
path: /tmp/output.xlsx
data: [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
```
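The example above writes an array of objects, so the column headers are detected from the object keys. When `data` is an array of arrays, there are no keys to detect headers from. The minimal sketch below therefore passes `headers` explicitly. The path and sheet name are illustrative only.

```yaml
path: /tmp/people.xlsx
data: [["Alice", 30], ["Bob", 25]]
headers: ["name", "age"]   # one header per column of each row array
sheet_name: People         # overrides the default "Sheet1"
auto_width: false          # keep default column widths
```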

Fill PDF Form

pdf.fill_form

Fill PDF form fields with data and optionally insert images

Parameters:

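| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| template | string | Yes | - | Path to the PDF template file |
| output | string | Yes | - | Path for the output document |
| fields | object | No | {} | Key-value pairs of form field names and values |
| images | array | No | [] | List of images to insert, each with its file path and position (page, x, y, width, height) |
| flatten | boolean | No | True | Flatten form fields (make them non-editable) |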

Output:

| Field | Type | Description |
|---|---|---|
| output_path | string | Path to the filled PDF |
| fields_filled | number | Number of form fields filled |
| images_inserted | number | Number of images inserted |
| file_size_bytes | number | Size of the filled PDF in bytes |

Example: Fill form with text fields

```yaml
template: /templates/form.pdf
output: /output/filled.pdf
fields: {"name": "John Doe", "id_number": "A123456789", "date": "2024-01-01"}
```

Example: Fill form with photo

```yaml
template: /templates/id_card.pdf
output: /output/id_card_filled.pdf
fields: {"name": "Jane Doe"}
images: [{"file": "/photos/jane.jpg", "page": 1, "x": 50, "y": 650, "width": 100, "height": 120}]
```
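Both examples above rely on the default `flatten: true`, which makes the filled fields non-editable. To produce a PDF whose fields can still be edited afterwards, a sketch would set `flatten` to `false`. The output path is a placeholder.

```yaml
template: /templates/form.pdf
output: /output/form_editable.pdf
fields: {"name": "John Doe"}
flatten: false   # keep form fields editable in the output
```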

Generate PDF

pdf.generate

Generate PDF files from HTML content or text

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| content | string | Yes | - | HTML or text content to convert to PDF |
| output_path | string | Yes | - | Path for the output document |
| title | string | No | - | Document title (metadata) |
| author | string | No | - | Document author (metadata) |
| page_size | select (A4, Letter, Legal, A3, A5) | No | A4 | Page size format |
| orientation | select (portrait, landscape) | No | portrait | Page orientation |
| margin | number | No | 20 | Page margin in millimeters |
| header | string | No | - | Header text for each page |
| footer | string | No | - | Footer text for each page |

Output:

| Field | Type | Description |
|---|---|---|
| output_path | string | Path to the generated PDF |
| page_count | number | Number of pages in the PDF |
| file_size_bytes | number | Size of the generated PDF in bytes |

Example: Generate from HTML

```yaml
content: <h1>Report</h1><p>Content here</p>
output_path: /path/to/report.pdf
title: Monthly Report
```
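Layout options can be combined in a single call. The sketch below uses only parameters from the table above. The title, author, header and footer strings are made-up sample values.

```yaml
content: <h1>Quarterly Summary</h1><p>Content here</p>
output_path: /path/to/summary.pdf
title: Quarterly Summary
author: Finance Team
page_size: Letter
orientation: landscape
margin: 15                  # millimeters
header: Quarterly Summary   # repeated on every page
footer: Internal use only   # repeated on every page
```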

Parse PDF

pdf.parse

Extract text and metadata from PDF files

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | Yes | - | Path to the PDF file |
| pages | string | No | all | Page range (e.g., "1-5", "1,3,5", or "all") |
| extract_images | boolean | No | False | Extract embedded images |
| extract_tables | boolean | No | False | Extract tables as structured data |

Output:

| Field | Type | Description |
|---|---|---|
| text | string | Extracted text content |
| pages | array | Text content per page |
| metadata | object | Document metadata |
| page_count | number | Number of pages |

Example: Extract all text from PDF

```yaml
path: /tmp/document.pdf
pages: all
```
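To read only some pages and extract tables as structured data, `pages` accepts either a range or a comma-separated list. The sketch below uses a hypothetical file path. The page list is quoted so that it is unambiguously a string when the YAML is parsed.

```yaml
path: /tmp/report.pdf
pages: "1,3,5"          # only pages 1, 3 and 5
extract_tables: true    # extract tables as structured data
```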

PDF to Word

pdf.to_word

Convert PDF files to Word documents (.docx)

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| input_path | string | Yes | - | Path to the input document |
| output_path | string | No | - | Path for the output document |
| preserve_formatting | boolean | No | True | Preserve basic formatting |
| pages | string | No | all | Page range (e.g., "1-5", "1,3,5", or "all") |

Output:

| Field | Type | Description |
|---|---|---|
| output_path | string | Path to the generated Word document |
| page_count | number | Number of pages converted |
| file_size | number | Size of the generated Word document |

Example: Convert entire PDF to Word

```yaml
input_path: /tmp/document.pdf
```

Example: Convert specific pages

```yaml
input_path: /tmp/document.pdf
output_path: /tmp/output.docx
pages: 1-5
```
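`preserve_formatting` defaults to `true`. When only the text content is needed, it can be turned off. The sketch below uses placeholder paths.

```yaml
input_path: /tmp/document.pdf
output_path: /tmp/document_plain.docx
preserve_formatting: false   # skip basic formatting
```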

Parse Word Document

word.parse

Extract text and content from Word documents (.docx)

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the Word document (.docx) |
| extract_tables | boolean | No | True | Extract tables as structured data |
| extract_images | boolean | No | False | Extract embedded images |
| images_output_dir | string | No | - | Directory to save extracted images |
| preserve_formatting | boolean | No | False | Preserve basic formatting |

Output:

| Field | Type | Description |
|---|---|---|
| text | string | Full text content of the document |
| paragraphs | array | List of paragraphs |
| tables | array | Extracted tables as arrays |
| images | array | Extracted images |
| metadata | object | Document metadata |

Example: Extract text from Word

```yaml
file_path: /path/to/document.docx
```

Example: Extract with tables and images

```yaml
file_path: /path/to/document.docx
extract_tables: true
extract_images: true
images_output_dir: /path/to/images/
```
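By default, `extract_tables` is on and `preserve_formatting` is off. The sketch below reverses both, for example when only formatted paragraph text is wanted. The path is a placeholder.

```yaml
file_path: /path/to/document.docx
extract_tables: false        # skip table extraction
preserve_formatting: true    # keep basic formatting
```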

Word to PDF

word.to_pdf

Convert Word documents (.docx) to PDF files

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| input_path | string | Yes | - | Path to the input document |
| output_path | string | No | - | Path for the output document |
| method | select (auto, libreoffice, docx2pdf) | No | auto | Method to use for conversion |

Output:

| Field | Type | Description |
|---|---|---|
| output_path | string | Path to the generated PDF file |
| file_size | number | Size of the output file in bytes |
| method_used | string | Conversion method that was actually used |

Example: Convert Word to PDF

```yaml
input_path: /tmp/document.docx
```

Example: Convert with specific output path

```yaml
input_path: /tmp/document.docx
output_path: /tmp/output.pdf
```
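With the default `method: auto`, the module presumably chooses a converter on its own, and `method_used` reports which one ran. To force a particular backend, set `method` explicitly. The sketch below assumes LibreOffice is installed on the machine running the workflow. The paths are placeholders.

```yaml
input_path: /tmp/document.docx
output_path: /tmp/output.pdf
method: libreoffice   # assumes LibreOffice is available on the host
```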

Released under the Apache 2.0 License.