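    from io import BytesIO

    from xhtml2pdf import pisa


    def html_to_pdf(html_string: str) -> bytes:
        """Render an HTML string to PDF bytes with xhtml2pdf."""
        pdf_buffer = BytesIO()
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_buffer)
        # CreatePDF reports problems through the status object instead of raising.
        if pisa_status.err:
            raise ValueError("xhtml2pdf reported errors while rendering the HTML")
        return pdf_buffer.getvalue()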
`ocrmypdf --output-type pdfa-2 --optimize 3 input.pdf output_pdfa.pdf`

Combine this with a file watcher such as watchdog to auto-convert any incoming PDF.
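The watcher idea can be sketched roughly as follows. This is a minimal sketch, not a hardened service: the names `build_ocr_command`, `is_incoming_pdf`, and `watch`, the `_pdfa.pdf` output suffix, and the inbox/outbox folder layout are all hypothetical, and the command uses only the `--output-type pdfa-2` and `--optimize 3` flags. It assumes `watchdog` and the `ocrmypdf` CLI are installed.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst_dir: Path) -> list[str]:
    """Return the ocrmypdf argv that converts `src` into `dst_dir` (hypothetical helper)."""
    dst = dst_dir / f"{src.stem}_pdfa.pdf"  # output naming is an assumption
    return ["ocrmypdf", "--output-type", "pdfa-2", "--optimize", "3", str(src), str(dst)]


def is_incoming_pdf(path: Path) -> bool:
    """True for paths that look like PDFs, judged by extension only."""
    return path.suffix.lower() == ".pdf"


def watch(inbox: Path, outbox: Path) -> None:
    """Block until Ctrl+C, converting each PDF created in `inbox`."""
    # Imported lazily so the helpers above work without watchdog installed.
    import time

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class PdfHandler(FileSystemEventHandler):
        def on_created(self, event):
            src = Path(event.src_path)
            if event.is_directory or not is_incoming_pdf(src):
                return
            # check=False: one failed conversion should not stop the watcher.
            subprocess.run(build_ocr_command(src, outbox), check=False)

    observer = Observer()
    observer.schedule(PdfHandler(), str(inbox), recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Calling `watch(Path("inbox"), Path("outbox"))` would start the loop. One caveat: `on_created` can fire before a large file has finished copying, so a production version would likely wait for the file size to settle before invoking `ocrmypdf`.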
Add a table of contents page programmatically with reportlab (Pattern #9) before merging.

Pattern #6: Splitting & Cropping (Optimized)

The Impact: Splitting by bookmark (outline) or page range is trivial; cropping pages to a specific region additionally reduces what downstream steps have to process.
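As an illustration of the page-range and cropping half of this pattern, here is a minimal sketch using pypdf. The helper names `parse_page_range` and `split_and_crop`, the `"2-4"` range syntax, and the `keep_top_fraction` parameter are assumptions made for the example; splitting by bookmark is not covered.

```python
from pathlib import Path


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based inclusive range such as "2-4" into 0-based page indices."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    if not 1 <= start <= end <= page_count:
        raise ValueError(f"range {spec!r} is outside 1..{page_count}")
    return list(range(start - 1, end))


def split_and_crop(src: Path, dst: Path, spec: str, keep_top_fraction: float = 1.0) -> None:
    """Copy the pages named by `spec` into `dst`, keeping the top part of each page."""
    # Imported lazily so parse_page_range stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    reader = PdfReader(str(src))
    writer = PdfWriter()
    for index in parse_page_range(spec, len(reader.pages)):
        page = reader.pages[index]
        box = page.mediabox
        left, right, top = float(box.left), float(box.right), float(box.top)
        # PDF coordinates start at the bottom-left corner, so raising the
        # lower edge of the crop box keeps the top of the page.
        bottom = top - float(box.height) * keep_top_fraction
        page.cropbox = RectangleObject((left, bottom, right, top))
        writer.add_page(page)
    with open(dst, "wb") as fh:
        writer.write(fh)
```

For example, `split_and_crop(Path("report.pdf"), Path("summary.pdf"), "2-4", keep_top_fraction=0.5)` would write pages 2 to 4 with only their top half visible. Setting a crop box changes the visible region rather than deleting content, so whether a downstream tool skips the hidden area depends on that tool.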
| Library | Best For | Verification Status |
| --- | --- | --- |
| | Speed, rendering, annotations, complex edits | ✅ Verified (Patterns 1-4) |
| pypdf | Pure-Python merging, splitting, rotation | ✅ Verified (Patterns 5-6) |
| pdfplumber | Text extraction with layout preservation | ✅ Verified (Patterns 7-8) |
| reportlab | Programmatic PDF generation from scratch | ✅ Verified (Patterns 9-10) |
| ocrmypdf | OCR + searchable PDFs | ✅ Verified (Patterns 11-12) |
By: Senior Dev Tooling Architect • Published: 2025 • 12 Verified Methodologies