Archiving paper documents as PDF files is a great way to save shelf space and preserve essential records.
However, scanning alone is not enough. You should also process the scans with Optical Character Recognition (OCR). Once OCR has processed a PDF scan, the file contains an invisible text version in addition to the scanned image of the document. macOS Spotlight can then index the content, and you can use HoudahSpot to search your document archive.
Scanning paper documents to PDF files lets you archive important (and not so important) documents without filling up cabinets.
Optical Character Recognition (OCR) makes these scanned documents much more useful than their paper originals. Once a scan has been processed by OCR, the PDF file contains both an image of the document and an invisible text version. The text can then be searched using HoudahSpot.
Unfortunately, you will find that not all of your PDF files have text content. You may have forgotten to run them through OCR. Or you may have received the scanned document from someone else.
How can you find these files and rectify this?
With a little trick, HoudahSpot can find PDF files that lack text content. It is safe to assume that any real text contains at least one space or period. Thus, we will be looking for PDF files whose text content contains neither a space nor a period.
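To make the heuristic concrete, here is a minimal Python sketch of the same test. It is only an illustration of the logic, not how HoudahSpot or Spotlight works internally. The function names and the sample data are hypothetical, and the sketch assumes the text of each PDF has already been extracted by some other tool.

```python
from typing import Dict, List


def looks_textless(extracted_text: str) -> bool:
    """Return True if the text contains neither a space nor a period."""
    return " " not in extracted_text and "." not in extracted_text


def find_textless(documents: Dict[str, str]) -> List[str]:
    """Return the sorted names of documents whose text fails the heuristic."""
    return sorted(name for name, text in documents.items() if looks_textless(text))


# Hypothetical sample data: file name -> text extracted from that PDF.
sample = {
    "invoice-scan.pdf": "",  # scanned, never run through OCR
    "contract-ocr.pdf": "This agreement is made between the parties.",
    "receipt-scan.pdf": "\n",  # only stray whitespace, no real text
}

print(find_textless(sample))
```

With this sample data, the two scans without real text are reported, while the OCR-processed contract is not. The same caveat applies as in HoudahSpot: a file whose only text is a single word without punctuation would also be flagged.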