Prerequisites:
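The commands below call convert (ImageMagick), tesseract, pdftk, pdfunite (from Poppler) and parallel (GNU parallel, judging by how it is used). A minimal sketch for checking that they are on the PATH before starting; the helper name missing_tools is made up for illustration:

```shell
# Hypothetical helper: print every tool from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the steps in this walkthrough.
missing_tools convert tesseract pdftk pdfunite parallel
```

If nothing is printed, all five tools were found.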
First, create an individual PDF from each of these images:
for f in *.ppm; do echo "convert -level 25,95% -quality 70 -density 300 -compress jpeg $f ${f%ppm}pdf"; done | parallel
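Next, run the OCR engine on the original files to create PDFs that contain only the text layer. Setting OMP_THREAD_LIMIT=1 keeps each tesseract process single-threaded, so the parallel jobs do not compete for the same cores:
export OMP_THREAD_LIMIT=1
for f in *.ppm; do echo "tesseract -c textonly_pdf=1 --oem 1 --dpi 300 -l eng $f $f pdf"; done | parallel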
Now merge the image layer and the text layer of each page:
for f in *.ppm; do echo "pdftk $f.pdf background ${f%.ppm}.pdf output combined-${f%.ppm}.pdf"; done | parallel
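The loops rely on a naming convention that is easy to miss: convert writes the image layer to the name with ppm swapped for pdf, tesseract appends .pdf to the full input name, and pdftk writes a combined- file. A small sketch that prints the three names for one page; the file name page-001.ppm is a hypothetical example:

```shell
# Hypothetical input name, used only to illustrate the naming scheme.
f="page-001.ppm"

image_pdf="${f%ppm}pdf"                 # image layer from convert
text_pdf="$f.pdf"                       # text-only layer from tesseract
combined_pdf="combined-${f%.ppm}.pdf"   # merged page from pdftk

printf '%s\n' "$image_pdf" "$text_pdf" "$combined_pdf"
```

Because the text layer keeps the .ppm part in its name, it never collides with the image layer for the same page.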
Finally, merge all the generated pages into one big PDF:
pdfunite combined*.pdf output.pdf
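One thing to watch in this last step: the shell expands combined*.pdf in lexicographic order, so page numbers without zero padding would be merged out of order (combined-10.pdf before combined-2.pdf). A minimal sketch of the difference, assuming GNU sort with version sort (-V) is available; the file names are hypothetical:

```shell
# Hypothetical page names without zero padding.
pages='combined-1.pdf
combined-10.pdf
combined-2.pdf'

# Version sort (GNU sort -V, an assumption here) orders the numeric parts naturally.
ordered=$(printf '%s\n' "$pages" | sort -V)
printf '%s\n' "$ordered"
```

If the page names are zero-padded (page-001, page-002, and so on), the plain glob already gives the right order and no extra sorting is needed.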