pdfwarez
Differences
This shows you the differences between two versions of the page.
| Both sides previous revision · Previous revision | |||
| pdfwarez [2025-10-20 18:00:04] – jenda | pdfwarez [2025-10-20 18:01:09] (current) – jenda | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| ===== General PDF instructions (involves some resampling) ===== | ===== General PDF instructions (involves some resampling) ===== | ||
| + | Work in an empty directory: | ||
| + | < | ||
| + | mkdir foo | ||
| + | cd foo | ||
| + | </ | ||
| + | |||
| + | Split into individual pages: | ||
| < | < | ||
| - | Separate to individual pages: | ||
| pdfseparate ../foo.pdf separated-%05d.pdf | pdfseparate ../foo.pdf separated-%05d.pdf | ||
| + | </ | ||
| Render to PNG: | Render to PNG: | ||
| + | < | ||
| for f in separated-*pdf; | for f in separated-*pdf; | ||
| + | </ | ||
| OCR: | OCR: | ||
| + | < | ||
| export OMP_THREAD_LIMIT=1 | export OMP_THREAD_LIMIT=1 | ||
| for f in *.png; do t=${f# | for f in *.png; do t=${f# | ||
| + | </ | ||
| Combine originals + OCR text layer: | Combine originals + OCR text layer: | ||
| + | < | ||
| for f in ocr-*.pdf; do t=${f# | for f in ocr-*.pdf; do t=${f# | ||
| + | </ | ||
| Produce final output: | Produce final output: | ||
| + | < | ||
| pdfunite combined*.pdf output.pdf | pdfunite combined*.pdf output.pdf | ||
| </ | </ | ||
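
The loops above derive a per-page name `t` from each file name `f` with a `${f#...}` expansion, but the patterns are cut off in this view. The following is only a minimal sketch of how that kind of prefix and suffix stripping works in a POSIX shell, not the page's actual commands. The helper names `page_id` and `padded_name`, and the `separated-` / `.pdf` patterns inside them, are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): extract the page
# number from a per-page file name. Prefix and suffix are assumed.
page_id() {
    f=$1
    t=${f#separated-}   # remove the shortest leading match of "separated-"
    t=${t%.pdf}         # remove the shortest trailing match of ".pdf"
    printf '%s\n' "$t"
}

# Hypothetical helper: build a name the way a %05d pattern would.
# Zero-padding keeps the names in page order when a glob such as
# combined*.pdf is expanded, because globs sort lexicographically.
padded_name() {
    printf 'separated-%05d.pdf\n' "$1"
}

page_id "$(padded_name 7)"   # prints 00007
```

Without the zero-padding, page 10 would sort before page 2 in a glob, so the final `pdfunite` step would presumably assemble pages out of order.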
pdfwarez.1760976004.txt.gz · Last modified: by jenda
