...the resolution does not matter as long as it's both legible by eye and your average OCR engine has no issues.
So just don't worry about it? I think that archivists might have issues with that. Keep in mind that, while I agree with you on the PDFs I'm producing for my own use, I'm aiming to produce the best original images possible with my equipment (consistent with scanning with reasonable speed and convenience) for others to process if they're interested in getting better copies.
Again, unless you are saving items such as marketing literature, posters or detailed diagrams...
There are definitely detailed diagrams in some of the work I'm scanning. Particularly schematics. You've mentioned SAMS; I've seen plenty of those and similar where it's been very frustrating trying to figure out what's on a schematic.
Oh, and it's probably worth mentioning that a fair amount of the material is Japanese, which simply seems to need higher resolution than English, on average, to get decent OCR. (This is probably not surprising, since it has to tell the difference between characters such as 諸 and 諳.)
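To put a rough number on that: the pixels available to each character scale with both the type size and the scan resolution. Here's a minimal sketch using only the definition of a point as 1/72 inch. The 7-point size is a hypothetical figure for illustration, not a measurement from any of these books.

```python
def glyph_height_px(point_size: float, dpi: float) -> float:
    """Approximate height in pixels of a glyph's em box at a given scan resolution."""
    # One typographic point is 1/72 inch, so points / 72 gives inches.
    return point_size / 72.0 * dpi


# Hypothetical example: 7 pt type scanned at a few common resolutions.
for dpi in (150, 300, 600):
    print(dpi, round(glyph_height_px(7, dpi), 1))
```

At 300 dpi that works out to about 29 pixels for the whole em box, which has to hold a dozen or more strokes in a dense kanji but only a handful in a Latin letter.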
...there is nothing wrong with the JPEG format. The next person who is going to read this PDF very likely does not care if there's a bit of compression noise. They do not care about the grain of the paper.
Well, again, see my first paragraph above. And also, you didn't mention the effects on processing, where it seems that @Al Kossow at least disagrees with you.
Output images are tweaked in Gimp 2.6...
Ouch! I definitely don't want to be touching Gimp (unless it's via an automated script). Remember, there are two things I'm trying to achieve here:
- The best possible quality of source image for someone else to tweak in Gimp or whatever they wish.
- "Good enough" quality PDFs for my own reference until someone uses those source images to make something better.
....sorted with Adobe Acrobat which also de-skews and ADF's the document if I wish.
What is ADF?
ImageMagick is a command-line tool. It can perform the same operation on multiple files in one command. Multiple commands can be placed in one batch file.
Yup. I'm familiar with ImageMagick in general, and use it for some basic operations (such as format conversion) daily. I'm not familiar with how sophisticated one can get with the scripts, though.
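For a sense of what "one operation on many files" could look like when driven from a script rather than a batch file, here's a minimal sketch. The directory names, the `.tif` input extension and the 50% resize are all assumptions for illustration; `magick` is the ImageMagick 7 entry point (on version 6 the equivalent command is `convert`). It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(src_dir, dst_dir, resize="50%", ext=".png"):
    """Return one ImageMagick command (as an argument list) per source image."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    commands = []
    for src in sorted(src_dir.glob("*.tif")):
        dst = dst_dir / (src.stem + ext)
        # Output format is chosen by ImageMagick from the destination extension.
        commands.append(["magick", str(src), "-resize", resize, str(dst)])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands("scans", "out"))` prints the commands it would run; passing `dry_run=False` actually runs them and stops at the first failure.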
I normally keep one batch file with the common commands I use....
I use a flat bed scanner and a document scanner.
Yeah, so it sounds as if you don't have some of the processing requirements I do: in particular, identifying the locations of the pages in the image (which vary from image to image, because it's a camera), decurling, and finger removal.
If one were consistent about placing the manual so the pages split in the same place, then an ImageMagick batch file could theoretically split out pages.
Yeah, I don't think I can get that consistent with my scanner. It's going to require something that can identify where the page is.
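To illustrate the difference between the two approaches, here's a minimal sketch of the "find the page first, then split" idea. It assumes a bright page on a darker background and a grayscale image given as a plain list of rows; the threshold and fraction values are guesses, and a real tool would also need to cope with skew, curl and fingers, none of which this handles.

```python
def find_page_bbox(gray, threshold=128, min_fraction=0.3):
    """Locate a bright page on a darker background.

    gray is a list of rows, each a list of 0-255 pixel values.
    A row or column counts as "page" when at least min_fraction of its
    pixels are brighter than threshold. Returns (left, top, right, bottom)
    with right and bottom exclusive, or None if no page is found.
    """
    height, width = len(gray), len(gray[0])
    rows = [sum(p > threshold for p in row) >= min_fraction * width for row in gray]
    cols = [
        sum(gray[y][x] > threshold for y in range(height)) >= min_fraction * height
        for x in range(width)
    ]
    if True not in rows or True not in cols:
        return None
    top = rows.index(True)
    bottom = height - rows[::-1].index(True)
    left = cols.index(True)
    right = width - cols[::-1].index(True)
    return left, top, right, bottom


def split_spread(bbox):
    """Split a two-page spread at its horizontal midpoint into two crop boxes."""
    left, top, right, bottom = bbox
    mid = (left + right) // 2
    return (left, top, mid, bottom), (mid, top, right, bottom)
```

The point is that the split position is computed per image from the detected bounding box instead of being hard-coded, so the book doesn't have to sit in exactly the same place for every shot.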
I would, however, be wary of software that tries to automatically remove anything from an image - they can get things wrong and remove content.
It's not a big deal here; I have and upload all the original images with the processed PDF, so if something is removed in making the PDF, it's still there in the original. And if it's serious, I don't mind reprocessing a page or two by hand here or there to fix it; I just don't have the time or inclination to do hundreds (or even thousands) of pages by hand.
I've said all that to back up what I say next: for document scanning of things other than text, the interaction between the scan bar or CCD/CMOS sensor pixel pattern and the specific screen used when halftoning the colors or the grayscale can produce significant moire and other quantization noise artifacts.
I don't really understand that, but....
Scan at as high of a resolution as possible, then de-screen the scan before down sampling to PDF.
Yeah, probably not an issue for me. So long as I get the original image as good as I can, and the PDF I create is readable, I can leave it to others to create better PDFs as and when they find the inclination.
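For anyone else puzzled by the halftone point, here's a toy sketch of why the order matters. It is not a real descreening filter (those are tuned to the actual screen frequency); it only shows, on a synthetic one-pixel checkerboard standing in for a 50% gray halftone, that throwing away pixels without averaging first gives the wrong tone, while averaging each block first recovers it. The same sampling-grid-versus-dot-pattern interaction is what shows up as moire on real scans.

```python
def subsample(gray, factor):
    """Naive downsampling: keep every factor-th pixel, discard the rest."""
    return [row[::factor] for row in gray[::factor]]


def box_downsample(gray, factor):
    """Average each factor x factor block, which low-pass filters before shrinking."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(0, height - factor + 1, factor):
        out_row = []
        for x in range(0, width - factor + 1, factor):
            block = [gray[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A synthetic "halftone": a one-pixel checkerboard standing in for 50% gray.
halftone = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]

print(subsample(halftone, 2))       # every sample lands on a white dot
print(box_downsample(halftone, 2))  # every block averages to mid gray
```

The naive version comes out pure white, the averaged version comes out as the mid gray the printer intended.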
But you can also build a jig to help if your camera supports scanning in a different direction from "directly overhead" and you can add lighting well to the sides in all directions so reflective surfaces are more evenly illuminated, and you can use things like sheets of plastic or glass to hold pages flat.
Most of what I'm scanning is books on matte paper, so reflections haven't been much of an issue, though I do have a diffuser that I use often enough for the covers.
But unfortunately a scanning jig, such as the one in the video, is out of the question: something that size would involve selling off a half dozen vintage computers to make the space to store it. (I live in Tokyo, in 25 m².)
But actually the glass idea might be worth trying; it would at least get my fingers off of the page edges if I can manage to handle the reflection issues. (Unfortunately it still doesn't help with the need for decurling, since I have to lay the book flat.)