Scanning paper documents to PDF files lets you archive important (and not so important) documents without filling up cabinets.
Optical Character Recognition (OCR) makes these scanned documents much more useful than their paper originals. Once a scan has been processed by OCR, the PDF file contains both an image of the document and an invisible text version. The text can then be searched using HoudahSpot.
Unfortunately, you will find that not all of your PDF files have text content. You may have forgotten to run them through OCR. Or you may have received the scanned document from someone else.
How can you find these files and rectify this?
With a little trick, HoudahSpot can find PDF files that lack text content. It is reasonably safe to assume that any real text contains at least one space or period. So we will look for PDF files whose text content contains neither a space nor a period.
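The rule of thumb boils down to a simple predicate. The Python sketch below only illustrates that logic on plain strings. It does not read PDF files, and the function names and sample file names are hypothetical. In HoudahSpot itself you express the same rule with search criteria, not code.

```python
from typing import Optional


def has_text_markers(text: str) -> bool:
    # The rule of thumb: real text almost always contains a space or a period.
    return " " in text or "." in text


def needs_ocr(text_content: Optional[str]) -> bool:
    # No text layer at all (None) or an empty one clearly needs OCR.
    if not text_content:
        return True
    # Otherwise flag the file only if it has neither a space nor a period.
    return not has_text_markers(text_content)


# Hypothetical file names and extracted text, for illustration only.
samples = {
    "invoice-ocr.pdf": "Invoice No. 1234 dated 3 March",
    "scan-0001.pdf": "",
    "scan-0002.pdf": None,
    "label.pdf": "4006381333931",
}

for name, text in samples.items():
    print(name, "-> needs OCR" if needs_ocr(text) else "-> has text")
```

The "label.pdf" entry shows the limit of the trick: a file whose only text is a single word or number gets flagged even though it does have a text layer, so it is worth glancing at the matches before running them through OCR again.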
macOS file metadata holds many useful and diverse facts about files. It goes well beyond file size, extension or creation date. Depending on the file type and the application that created it, a whole lot of information about a file’s properties is available.
Sender name, duration, f-stop number – and more
For e-mail messages, this can be sender name, e-mail subject and attachment type. Or duration, bit rate and musical genre for audio files. In photos, you can even find information on camera model, f-stop number and exposure time.
This metadata is useful when organizing files by specific criteria or searching for files with certain properties: for example, images with a resolution higher than 72 dpi, audio files shorter than thirty seconds, or e-mail messages with PDF attachments.
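To make the three example searches concrete, here is a minimal Python sketch that filters a handful of simplified metadata records. The record fields (`dpi`, `duration_seconds`, `attachment_types`) and the sample files are hypothetical stand-ins; on macOS the real values live in Spotlight attributes such as kMDItemResolutionWidthDPI and kMDItemDurationSeconds, and HoudahSpot lets you build these criteria without code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileMeta:
    # Hypothetical, simplified stand-in for a file's metadata record.
    name: str
    kind: str                                  # "image", "audio" or "email"
    dpi: Optional[float] = None                # images only
    duration_seconds: Optional[float] = None   # audio only
    attachment_types: List[str] = field(default_factory=list)  # e-mail only


def high_res_images(files: List[FileMeta]) -> List[str]:
    # Images with a resolution higher than 72 dpi.
    return [f.name for f in files
            if f.kind == "image" and f.dpi is not None and f.dpi > 72]


def short_audio(files: List[FileMeta]) -> List[str]:
    # Audio files shorter than thirty seconds.
    return [f.name for f in files
            if f.kind == "audio" and f.duration_seconds is not None
            and f.duration_seconds < 30]


def emails_with_pdf(files: List[FileMeta]) -> List[str]:
    # E-mail messages that carry at least one PDF attachment.
    return [f.name for f in files
            if f.kind == "email" and "pdf" in f.attachment_types]


files = [
    FileMeta("scan.png", "image", dpi=300),
    FileMeta("web.png", "image", dpi=72),
    FileMeta("jingle.m4a", "audio", duration_seconds=12.5),
    FileMeta("podcast.mp3", "audio", duration_seconds=3600),
    FileMeta("invoice.eml", "email", attachment_types=["pdf"]),
    FileMeta("hello.eml", "email"),
]

print(high_res_images(files))  # ['scan.png']
print(short_audio(files))      # ['jingle.m4a']
print(emails_with_pdf(files))  # ['invoice.eml']
```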
Inspecting file metadata using HoudahSpot
But what metadata is available for a specific file type? How is it labeled? And what kind of information does it contain? HoudahSpot can help you find out.
HoudahSpot searches always go into subfolders. For example, when you search in your home folder, you can find letters saved to your Documents folder.
When you don’t want files from a subfolder to clutter search results, excluding the subfolder is easy: just drag the folder from the breadcrumb path at the bottom of the HoudahSpot window to the Locations/Exclude list.
You can repeat the procedure to exclude more folders. But if you only want to see results from the top-level folder itself, it is easier to use the path filter to hide results from nested folders.
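The difference between excluding a subfolder and showing only top-level results can be sketched with Python's `pathlib`. This is an illustration of the two filtering rules, not how HoudahSpot implements them; the function names and the paths under `/Users/me` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def without_excluded(results: Iterable[str], excluded: Iterable[str]) -> List[str]:
    # Drop any result that sits anywhere below an excluded folder.
    excluded_paths = [PurePosixPath(e) for e in excluded]
    kept = []
    for r in results:
        parents = list(PurePosixPath(r).parents)
        if not any(e in parents for e in excluded_paths):
            kept.append(r)
    return kept


def top_level_only(results: Iterable[str], top: str) -> List[str]:
    # Keep only results whose immediate parent is the searched folder itself.
    top_path = PurePosixPath(top)
    return [r for r in results if PurePosixPath(r).parent == top_path]


results = [
    "/Users/me/letter.pages",
    "/Users/me/Documents/letter-2.pages",
    "/Users/me/Library/Caches/notes.txt",
]

print(without_excluded(results, ["/Users/me/Library"]))
# ['/Users/me/letter.pages', '/Users/me/Documents/letter-2.pages']
print(top_level_only(results, "/Users/me"))
# ['/Users/me/letter.pages']
```

Excluding works folder by folder, so each unwanted branch needs its own entry; the top-level rule needs only the searched folder, which is why it is the easier choice when you want no nested results at all.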
In Mac OS X’s Finder, you can save searches as “Smart Folders”. These give you quick access to all your files that meet certain criteria: for example, all Microsoft Word files modified this month, all JPG photos taken with a specific camera, or all e-mails you’ve received from certain senders within the last seven days.
Because Mac OS X Smart Folders are saved searches, they differ from regular folders: they don’t actually hold anything. They only list files stored elsewhere. The content of a Smart Folder is dynamic rather than static. It is updated continuously as files on your Mac are added, changed, or removed and come to meet, or stop meeting, the search criteria.
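A toy model helps show why a Smart Folder's content changes on its own: it stores criteria, not files. In the Python sketch below, the `SmartFolder` class, the file records and the camera name "Camera A" are all hypothetical. Unlike Finder, which relies on Spotlight to keep results current, this model simply re-applies the criteria each time you ask for the contents.

```python
from typing import Callable, Dict, List

FileInfo = Dict[str, str]


class SmartFolder:
    """Hypothetical model: a Smart Folder stores search criteria, not files."""

    def __init__(self, name: str, criteria: Callable[[FileInfo], bool]):
        self.name = name
        self.criteria = criteria

    def contents(self, all_files: List[FileInfo]) -> List[str]:
        # The criteria are applied to whatever files exist right now,
        # so the listing changes whenever the underlying files change.
        return [f["name"] for f in all_files if self.criteria(f)]


camera_photos = SmartFolder(
    "JPG photos from Camera A",
    lambda f: f["ext"] == "jpg" and f.get("camera") == "Camera A",
)

files = [
    {"name": "beach.jpg", "ext": "jpg", "camera": "Camera A"},
    {"name": "report.docx", "ext": "docx"},
]
print(camera_photos.contents(files))  # ['beach.jpg']

files.append({"name": "sunset.jpg", "ext": "jpg", "camera": "Camera A"})
print(camera_photos.contents(files))  # ['beach.jpg', 'sunset.jpg']

files.pop(0)  # beach.jpg is deleted
print(camera_photos.contents(files))  # ['sunset.jpg']
```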
With HoudahSpot (4.1 or later), you can easily set up a search and export it as a Finder Smart Folder.
In HoudahSpot, snippets let you set aside frequently used combinations of search criteria. HoudahSpot comes with a few pre-installed snippets, but you can also define your own.
Snippets can hold a single search criterion or a group of criteria that serve a certain purpose. The pre-installed snippet “Date Created range”, for example, holds two criteria in an “All of the following are true” group: “Content created before” and “Content created after”. Use this snippet whenever you want to find files created in a range of dates.
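Conceptually, a snippet like “Date Created range” is a reusable group: two date criteria combined so that both must match. The Python sketch below models that idea with small predicate functions. The names `created_after`, `created_before` and `all_of` are hypothetical, and treating “before” and “after” as strict comparisons is an assumption, not documented HoudahSpot behavior.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class File:
    name: str
    created: date


Criterion = Callable[[File], bool]


def created_after(day: date) -> Criterion:
    return lambda f: f.created > day  # strict comparison (assumption)


def created_before(day: date) -> Criterion:
    return lambda f: f.created < day  # strict comparison (assumption)


def all_of(*criteria: Criterion) -> Criterion:
    # "All of the following are true": every criterion in the group must match.
    return lambda f: all(c(f) for c in criteria)


# A reusable "Date Created range" group, here covering the first quarter of 2023.
date_created_range = all_of(
    created_after(date(2022, 12, 31)),
    created_before(date(2023, 4, 1)),
)

files = [
    File("old.pdf", date(2022, 12, 15)),
    File("q1-report.pdf", date(2023, 2, 15)),
    File("april.pdf", date(2023, 4, 2)),
]
print([f.name for f in files if date_created_range(f)])  # ['q1-report.pdf']
```

Because the group is built once and reused, changing the range only means changing the two dates, which mirrors why a snippet saves you from rebuilding the same pair of criteria for every search.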
Sometimes, you get a list of file names and need to find the actual files. You may be a photographer who sent out a contact sheet for photo proofing and got back an email listing the images to be printed. Or you may have sent friends messages with tiny photos and got a reply asking for larger copies of images 7359, 7365 and 7366.
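One way to think about the matching step is sketched below in Python, assuming the requested numbers appear as runs of digits in names like IMG_7359.jpg. The naming pattern and the `find_requested` function are hypothetical illustrations, not HoudahSpot's own method. Comparing whole digit runs, rather than plain substrings, keeps a file such as IMG_17359.jpg from being picked up when you asked for 7359.

```python
import re
from typing import Iterable, List


def find_requested(file_names: Iterable[str], wanted: Iterable[str]) -> List[str]:
    wanted_set = set(wanted)
    matches = []
    for name in file_names:
        # Every run of digits in the name, e.g. "IMG_7359.jpg" -> ["7359"].
        numbers = re.findall(r"\d+", name)
        # Compare whole numbers so "17359" does not count as a hit for "7359".
        if wanted_set.intersection(numbers):
            matches.append(name)
    return matches


names = ["IMG_7358.jpg", "IMG_7359.jpg", "IMG_7365.jpg", "IMG_7366.CR2", "IMG_17359.jpg"]
print(find_requested(names, ["7359", "7365", "7366"]))
# ['IMG_7359.jpg', 'IMG_7365.jpg', 'IMG_7366.CR2']
```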