To reduce clutter and improve the efficiency of my home office, I like to digitize all my “important” documents. To do this, I use an HP all-in-one scanner, an open-source document management system called KnowledgeTree, and scanimage(1) from the SANE project.
It turns out that scanimage(1) is a fairly cumbersome command-line program to use. I chose not to use graphical scanning software because I don’t want to be clicking around with the mouse when scanning large numbers of document sets. In practice, I use a consistent set of options for all my document scanning. The only options that vary are the number of pages and whether the document set is single- or double-sided.
For the longest time, I was very happy with a simple bash script that stored my scanimage(1) command options, and I would “manually” set any remaining arguments (e.g. the number of pages). This worked pretty well for single-sided documents. Eventually, I got tired of editing the command options by hand whenever I wanted to scan double-sided documents, so I wrote a Python script that wraps all of this knowledge. I call it scan_pages.py. Its usage is very simple:
scan_pages.py [-d | --double] total-pages
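For readers who want to build something similar, the usage line above could be parsed with Python's standard argparse module. This is a minimal sketch, not the code from the downloadable script. The function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: scan_pages.py [-d | --double] total-pages"""
    parser = argparse.ArgumentParser(prog="scan_pages.py")
    # -d / --double: the job is double-sided and needs two passes.
    parser.add_argument("-d", "--double", action="store_true",
                        help="double-sided job scanned in two passes")
    # total-pages: every page face to scan, fronts and backs together.
    parser.add_argument("total_pages", metavar="total-pages", type=int,
                        help="total number of pages, front and back")
    return parser
```

In a real script you would call build_parser().parse_args() from an if __name__ == "__main__": block. For example, "scan_pages.py -d 6" yields double=True and total_pages=6.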
The script does three things. First, it manages scanimage’s “--batch-count” and “--batch-start” command-line options according to the parameters of the scan job. Second, it automatically calculates how many pages to scan in each pass of a two-pass double-sided job, so you only need to specify the -d flag and the total number of pages, front and back. Third, it uses the convert(1) command to trim any whitespace in the scanned image that is not part of the original document.
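Here is a rough sketch of those three steps. It is not the downloadable script, and it rests on several assumptions. It assumes a single-sided automatic document feeder. It assumes the stack is re-fed for the second pass so that sheets go through in the same order as the first. The output pattern page%d.tiff is hypothetical, and so are all function names. Interleaving the back pages with the fronts afterward is not shown. The scanimage(1) options --format, --batch, --batch-start and --batch-count are real, as are ImageMagick's convert(1) options -trim and +repage.

```python
import math
import subprocess

# Hypothetical output pattern; scanimage substitutes the page number for %d.
BATCH_PATTERN = "page%d.tiff"


def batch_passes(total_pages: int, double: bool) -> list[tuple[int, int]]:
    """Return a (batch_start, batch_count) pair for each feeder pass.

    Assumption: fronts are scanned in pass one and backs in pass two,
    so back pages are numbered after the fronts and need interleaving later.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be positive")
    if not double:
        return [(1, total_pages)]
    fronts = math.ceil(total_pages / 2)  # odd totals: one sheet has no back
    backs = total_pages - fronts
    passes = [(1, fronts), (fronts + 1, backs)]
    return [p for p in passes if p[1] > 0]  # skip an empty second pass


def scanimage_command(start: int, count: int) -> list[str]:
    """Build one scanimage(1) call; device-specific options are omitted."""
    return [
        "scanimage",
        "--format=tiff",
        f"--batch={BATCH_PATTERN}",
        f"--batch-start={start}",
        f"--batch-count={count}",
    ]


def trim_command(path: str) -> list[str]:
    """convert(1): trim uniform borders in place; +repage resets the canvas."""
    return ["convert", path, "-trim", "+repage", path]


def run_job(total_pages: int, double: bool) -> None:
    """Scan every pass, pausing between passes, then trim each page."""
    for i, (start, count) in enumerate(batch_passes(total_pages, double)):
        if i > 0:
            input("Flip the stack and press Enter to scan the backs...")
        subprocess.run(scanimage_command(start, count), check=True)
        for page in range(start, start + count):
            subprocess.run(trim_command(BATCH_PATTERN % page), check=True)
```

For a 6-page double-sided job, batch_passes(6, True) gives [(1, 3), (4, 3)]: three fronts, then three backs numbered 4 to 6. Options such as resolution, mode or feeder source depend on the SANE backend, so they are left out. Running "scanimage --help -d" followed by your device name lists them. If scans have slightly grey borders, ImageMagick's -fuzz option lets -trim treat near-white edges as background.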
If you use scanimage(1), you may find this script helpful. Let me know if you like it. You may download it here.