pdfwarez
Differences
pdfwarez [2025-10-20 18:00:04] – jenda | pdfwarez [2025-10-20 18:01:09] (current) – jenda | ||
---|---|---|---|
Line 3: | Line 3: | ||
===== General PDF instructions (involves some resampling) ===== | ===== General PDF instructions (involves some resampling) ===== | ||
+ | Work in an empty directory | ||
+ | < | ||
+ | mkdir foo | ||
+ | cd foo | ||
+ | </ | ||
+ | |||
+ | Separate into individual pages | ||
< | < | ||
- | Separate to individual pages: | ||
pdfseparate ../foo.pdf separated-%05d.pdf | pdfseparate ../foo.pdf separated-%05d.pdf | ||
+ | </ | ||
Render to PNG: | Render to PNG: | ||
+ | < | ||
for f in separated-*pdf; | for f in separated-*pdf; | ||
+ | </ | ||
OCR: | OCR: | ||
+ | < | ||
export OMP_THREAD_LIMIT=1 | export OMP_THREAD_LIMIT=1 | ||
for f in *.png; do t=${f# | for f in *.png; do t=${f# | ||
+ | </ | ||
Combine originals + OCR text layer: | Combine originals + OCR text layer: | ||
+ | < | ||
for f in ocr-*.pdf; do t=${f# | for f in ocr-*.pdf; do t=${f# | ||
+ | </ | ||
Produce final output: | Produce final output: | ||
+ | < | ||
pdfunite combined*.pdf output.pdf | pdfunite combined*.pdf output.pdf | ||
</ | </ |
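The per-page loops above derive each output name from the input name with shell parameter expansion (`t=${f#`...), and the `%05d` in the `pdfseparate` pattern zero-pads page numbers so that globs such as `combined*.pdf` expand in page order. A minimal sketch of both ideas, assuming the `separated-` prefix from the `pdfseparate` step. The helper names `page_name` and `page_id` are hypothetical, and the truncated loops above may strip a different prefix:

```shell
# Hypothetical helpers for illustration; not part of the original instructions.

# Build the zero-padded per-page name that a separated-%05d.pdf pattern yields.
page_name() {
    printf 'separated-%05d.pdf\n' "$1"
}

# Recover the bare page id from such a name.
page_id() {
    id=${1#separated-}   # drop the leading "separated-" (shortest prefix match)
    id=${id%.pdf}        # drop the trailing ".pdf" (shortest suffix match)
    printf '%s\n' "$id"
}

page_name 7                   # prints separated-00007.pdf
page_id "$(page_name 7)"      # prints 00007
```

Without the zero padding, page 10 would sort before page 2 in a glob, and the final `pdfunite` step would likely assemble pages out of order.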
pdfwarez.txt · Last modified: by jenda