Document

Read, write, and convert Excel, PDF, and Word documents.

8 modules

| Module | Description |
| --- | --- |
| Read Excel | Read data from an Excel file (xlsx, xls) |
| Write Excel | Write data to an Excel file (xlsx) |
| Fill PDF Form | Fill PDF form fields with data and optionally insert images |
| Generate PDF | Generate a PDF file from HTML or text content |
| Parse PDF | Extract text and metadata from a PDF file |
| PDF to Word | Convert a PDF file to a Word document (.docx) |
| Parse Word Document | Extract text and content from a Word document (.docx) |
| Word to PDF | Convert a Word document (.docx) to a PDF file |

Modules

Read Excel

excel.read

Read data from an Excel file (xlsx, xls)

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| path | string | Yes | - | Path to the Excel file |
| sheet | string | No | - | Sheet name (default: first sheet) |
| header_row | number | No | 1 | Row number for headers (1-based, 0 for no headers) |
| range | string | No | - | Cell range to read (e.g., "A1:D10") |
| as_dict | boolean | No | True | Return rows as dictionaries (using headers as keys) |

Output:

| Field | Type | Description |
| --- | --- | --- |
| data | array | Extracted data rows |
| headers | array | Column headers |
| row_count | number | Number of data rows |
| sheet_names | array | Names of the sheets in the workbook |

Example: Read entire sheet

```yaml
path: /tmp/data.xlsx
as_dict: true
```
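
Example: Read a cell range from a named sheet

A minimal sketch that combines the optional parameters listed in the table above. The sheet name `Sales` is hypothetical, and `header_row: 1` only restates the documented default.

```yaml
path: /tmp/data.xlsx
sheet: Sales        # hypothetical sheet name; omit to read the first sheet
header_row: 1       # 1-based row that holds the column headers
range: "A1:D10"     # limit the read to this cell range
as_dict: true       # rows come back as dictionaries keyed by header
```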

Write Excel

excel.write

Write data to an Excel file (xlsx)

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| path | string | Yes | - | Path to the Excel file |
| data | array | Yes | - | Data to write (array of arrays or array of objects) |
| headers | array | No | - | Column headers (auto-detected from objects if not provided) |
| sheet_name | string | No | Sheet1 | Name of the worksheet |
| auto_width | boolean | No | True | Automatically adjust column widths |

Output:

| Field | Type | Description |
| --- | --- | --- |
| path | string | Path to the created Excel file |
| row_count | number | Number of rows written |
| size | number | Size of the created file |

Example: Write data to Excel

```yaml
path: /tmp/output.xlsx
data: [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
```
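
Example: Write an array of arrays with explicit headers

The `data` parameter also accepts an array of arrays. Rows in that form carry no keys, so this sketch supplies `headers` explicitly. The worksheet name `People` is a hypothetical value.

```yaml
path: /tmp/output.xlsx
headers: ["name", "age"]             # column headers, given explicitly
data: [["Alice", 30], ["Bob", 25]]   # one inner array per row
sheet_name: People                   # hypothetical; the default is Sheet1
auto_width: false                    # skip automatic column width adjustment
```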

Fill PDF Form

pdf.fill_form

Fill PDF form fields with data and optionally insert images

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| template | string | Yes | - | Path to the PDF template file |
| output | string | Yes | - | Path for the output document |
| fields | object | No | {} | Key-value pairs of form field names and values |
| images | array | No | [] | List of images to insert with position info |
| flatten | boolean | No | True | Flatten form fields (make them non-editable) |

Output:

| Field | Type | Description |
| --- | --- | --- |
| output_path | string | Path to the filled PDF |
| fields_filled | number | Number of form fields filled |
| images_inserted | number | Number of images inserted |
| file_size_bytes | number | Size of the output file in bytes |

Example: Fill form with text fields

```yaml
template: /templates/form.pdf
output: /output/filled.pdf
fields: {"name": "John Doe", "id_number": "A123456789", "date": "2024-01-01"}
```

Example: Fill form with photo

```yaml
template: /templates/id_card.pdf
output: /output/id_card_filled.pdf
fields: {"name": "Jane Doe"}
images: [{"file": "/photos/jane.jpg", "page": 1, "x": 50, "y": 650, "width": 100, "height": 120}]
```
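
Example: Fill form and keep fields editable

Since `flatten` defaults to True, filled fields are made non-editable unless it is turned off. The sketch below assumes that `flatten: false` leaves the fields editable in the output. The output path is hypothetical.

```yaml
template: /templates/form.pdf
output: /output/editable.pdf    # hypothetical output path
fields: {"name": "John Doe"}
flatten: false                  # assumed to keep the form fields editable
```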

Generate PDF

pdf.generate

Generate a PDF file from HTML or text content

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| content | string | Yes | - | HTML or text content to convert to PDF |
| output_path | string | Yes | - | Path for the output document |
| title | string | No | - | Document title (metadata) |
| author | string | No | - | Document author (metadata) |
| page_size | select (A4, Letter, Legal, A3, A5) | No | A4 | Page size format |
| orientation | select (portrait, landscape) | No | portrait | Page orientation |
| margin | number | No | 20 | Page margin in millimeters |
| header | string | No | - | Header text for each page |
| footer | string | No | - | Footer text for each page |

Output:

| Field | Type | Description |
| --- | --- | --- |
| output_path | string | Path to the generated PDF |
| page_count | number | Number of pages in the PDF |
| file_size_bytes | number | Size of the output file in bytes |

Example: Generate from HTML

```yaml
content: <h1>Report</h1><p>Content here</p>
output_path: /path/to/report.pdf
title: Monthly Report
```
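
Example: Generate with custom page layout

A sketch of the layout parameters from the table above, using only values the `select` types allow. The author, header, and footer strings are placeholders.

```yaml
content: "<h1>Report</h1><p>Content here</p>"
output_path: /path/to/report.pdf
title: Monthly Report
author: Jane Doe          # placeholder author metadata
page_size: Letter         # one of A4, Letter, Legal, A3, A5
orientation: landscape    # portrait or landscape
margin: 15                # millimeters; the default is 20
header: Monthly Report    # placeholder text repeated on each page
footer: Confidential      # placeholder text repeated on each page
```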

Parse PDF

pdf.parse

Extract text and metadata from a PDF file

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| path | string | Yes | - | Path to the PDF file |
| pages | string | No | all | Page range (e.g., "1-5", "1,3,5", or "all") |
| extract_images | boolean | No | False | Extract embedded images |
| extract_tables | boolean | No | False | Extract tables as structured data |

Output:

| Field | Type | Description |
| --- | --- | --- |
| text | string | Extracted text content |
| pages | array | Text content per page |
| metadata | object | Document metadata |
| page_count | number | Number of pages |

Example: Extract all text from PDF

```yaml
path: /tmp/document.pdf
pages: all
```
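
Example: Extract tables from selected pages

A sketch that narrows the page range and turns on table extraction, which is off by default. The output table does not say where extracted tables appear in the result, so this example makes no claim about that.

```yaml
path: /tmp/document.pdf
pages: "1,3,5"          # quoted so the value stays a string
extract_tables: true    # default is False
```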

PDF to Word

pdf.to_word

Convert a PDF file to a Word document (.docx)

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input_path | string | Yes | - | Path to the input document |
| output_path | string | No | - | Path for the output document |
| preserve_formatting | boolean | No | True | Preserve basic formatting |
| pages | string | No | all | Page range (e.g., "1-5", "1,3,5", or "all") |

Output:

| Field | Type | Description |
| --- | --- | --- |
| output_path | string | Path to the generated Word document |
| page_count | number | Number of pages converted |
| file_size | number | Size of the output file |

Example: Convert entire PDF to Word

```yaml
input_path: /tmp/document.pdf
```

Example: Convert specific pages

```yaml
input_path: /tmp/document.pdf
output_path: /tmp/output.docx
pages: 1-5
```
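
Example: Convert without preserving formatting

A sketch that switches off `preserve_formatting`, which defaults to True. The output file name is hypothetical.

```yaml
input_path: /tmp/document.pdf
output_path: /tmp/plain.docx    # hypothetical output path
preserve_formatting: false      # default is True
```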

Parse Word Document

word.parse

Extract text and content from a Word document (.docx)

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_path | string | Yes | - | Path to the Word document (.docx) |
| extract_tables | boolean | No | True | Extract tables as structured data |
| extract_images | boolean | No | False | Extract embedded images |
| images_output_dir | string | No | - | Directory to save extracted images |
| preserve_formatting | boolean | No | False | Preserve basic formatting |

Output:

| Field | Type | Description |
| --- | --- | --- |
| text | string | Full text content of the document |
| paragraphs | array | List of paragraphs |
| tables | array | Tables extracted as arrays |
| images | array | Extracted images |
| metadata | object | Document metadata |

Example: Extract text from Word

```yaml
file_path: /path/to/document.docx
```

Example: Extract with tables and images

```yaml
file_path: /path/to/document.docx
extract_tables: true
extract_images: true
images_output_dir: /path/to/images/
```
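
Example: Extract text only, preserving formatting

Table extraction is on by default for this module and formatting preservation is off. The sketch below flips both for a text-focused parse. The parameter table does not describe how preserved formatting is represented in the output, so that is left unspecified here.

```yaml
file_path: /path/to/document.docx
extract_tables: false        # default is True
preserve_formatting: true    # default is False
```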

Word to PDF

word.to_pdf

Convert a Word document (.docx) to a PDF file

Parameters:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input_path | string | Yes | - | Path to the input document |
| output_path | string | No | - | Path for the output document |
| method | select (auto, libreoffice, docx2pdf) | No | auto | Method to use for conversion |

Output:

| Field | Type | Description |
| --- | --- | --- |
| output_path | string | Path to the generated PDF file |
| file_size | number | Output file size in bytes |
| method_used | string | Conversion method used |

Example: Convert Word to PDF

```yaml
input_path: /tmp/document.docx
```

Example: Convert with specific output path

```yaml
input_path: /tmp/document.docx
output_path: /tmp/output.pdf
```
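
Example: Convert with a specific method

A sketch that pins the conversion method instead of relying on `auto`. It assumes LibreOffice is installed on the machine running the module, which this page does not confirm.

```yaml
input_path: /tmp/document.docx
output_path: /tmp/output.pdf
method: libreoffice    # one of auto, libreoffice, docx2pdf
```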

Released under the Apache 2.0 License.