pdfwarez

Previous revision: pdfwarez [2025-10-20 18:00:04] jenda
Current revision: pdfwarez [2025-10-20 18:01:09] jenda
===== General PDF instructions (involves some resampling) =====
  
Work in an empty directory (the commands below expect the input at ../foo.pdf):
<code>
mkdir foo
cd foo
</code>

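The steps on this page call several external programs. A minimal sketch of a check that they are all installed is shown below; the helper name ''missing_tools'' is made up for illustration and is not part of the original instructions.

```shell
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The programs used by the steps on this page.
missing_tools pdfseparate convert parallel tesseract pdftk pdfunite
```

If this prints nothing, every program was found. Note that piping generated command lines into ''parallel'' is the behaviour of GNU parallel.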
Separate into individual pages:
<code>
pdfseparate ../foo.pdf separated-%05d.pdf
</code>
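The ''%05d'' in the output pattern is a printf-style placeholder for the page number, zero-padded to five digits. The padding matters later, because shell globs expand in lexical order and unpadded names would put page 10 before page 2. A small sketch of the resulting names, using the shell's own ''printf'' (the function name ''page_name'' is hypothetical):

```shell
# Hypothetical helper: show the file name the pattern yields for a page number.
page_name() {
  printf 'separated-%05d.pdf\n' "$1"
}

page_name 2    # prints separated-00002.pdf
page_name 10   # prints separated-00010.pdf
```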
  
Render to PNG:
<code>
for f in separated-*.pdf; do t=${f#separated-}; echo convert -density 300 $f rendered-${t%.pdf}.png ; done | parallel
</code>
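The loop itself runs nothing: it only prints one ''convert'' command per page, and ''parallel'' executes the printed lines concurrently. Leaving off ''| parallel'' therefore gives a dry run. The sketch below isolates the name mapping, where ''${f#separated-}'' strips the prefix and ''${t%.pdf}'' strips the extension; the function name ''render_commands'' is made up for illustration.

```shell
# Hypothetical helper: print the render command for each given page file
# without executing it.
render_commands() {
  for f in "$@"; do
    t=${f#separated-}                 # 00001.pdf
    echo convert -density 300 "$f" "rendered-${t%.pdf}.png"
  done
}

render_commands separated-00001.pdf
# prints: convert -density 300 separated-00001.pdf rendered-00001.png
```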
  
OCR:
<code>
export OMP_THREAD_LIMIT=1
for f in rendered-*.png; do t=${f#rendered-}; echo tesseract -c textonly_pdf=1 --oem 1 --dpi 300 -l eng $f ocr-${t%.png} pdf; done | parallel
</code>
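Because the OCR jobs run in parallel, a failure on a single page is easy to miss in the output. One possible sanity check before continuing is to list the rendered pages that have no matching OCR result. This is a sketch, not part of the original instructions, and the function name ''missing_ocr'' is hypothetical.

```shell
# Hypothetical helper: list rendered pages in the current directory
# that have no corresponding ocr-NNNNN.pdf.
missing_ocr() {
  for f in rendered-*.png; do
    [ -e "$f" ] || continue           # the glob matched nothing
    t=${f#rendered-}
    [ -e "ocr-${t%.png}.pdf" ] || echo "$f"
  done
}

missing_ocr
```

An empty result means every rendered page has an OCR output.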
  
Combine originals + OCR text layer:
<code>
for f in ocr-*.pdf; do t=${f#ocr-}; pdftk separated-$t background ocr-$t output combined-$t; done
</code>
  
Produce final output:
<code>
pdfunite combined*.pdf output.pdf
</code>
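''pdfunite'' joins whatever the glob matches, so a page that was dropped in an earlier step would silently disappear from ''output.pdf''. A hedged sketch of a page-count comparison that could be run before uniting (the function name ''count_existing'' is made up for illustration):

```shell
# Hypothetical helper: count how many of the given names exist as files.
# An unmatched glob is passed through literally and counts as zero.
count_existing() {
  n=0
  for f in "$@"; do
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}

if [ "$(count_existing separated-*.pdf)" -eq "$(count_existing combined-*.pdf)" ]; then
  echo "page counts match"
else
  echo "page counts differ" >&2
fi
```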
pdfwarez.txt · Last modified: by jenda

Except where otherwise noted, content on this wiki is licensed under the following license: Public Domain