
Best way to scan documentation?

SunDown79 (Veteran Member; joined Sep 8, 2005; 783 messages; Netherlands)
What is the best way to scan documentation, magazines, etc.?
I ask because at work I have access to some Xerox multifunction machines (monochrome and colour) that scan pretty decently (directly to PDF or multipage TIFF, automatic splitting of book pages, etc.). I usually go for 400 dpi, but I can go up to 600. Is higher always better?
 
For monochrome pages, I scan at 600 DPI for the quality.

For greyscale and colour pages, I will often go down to 400 DPI, sometimes 300 DPI, because otherwise the resulting document gets too large. I decide that on a case-by-case basis.
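To put rough numbers on that trade-off, here is a small sketch of the raw (uncompressed) size of one page at different settings. The page size (US Letter, 8.5 x 11 inches) and the helper name `raw_page_bytes` are my own assumptions for illustration, and real files will be much smaller once compressed.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: US Letter, 8.5 x 11 inches.
for label, dpi, bits in [
    ("monochrome 600 dpi", 600, 1),
    ("greyscale 400 dpi", 400, 8),
    ("colour 400 dpi", 400, 24),
    ("colour 300 dpi", 300, 24),
]:
    size_mb = raw_page_bytes(8.5, 11, dpi, bits) / 1_000_000
    print(f"{label}: {size_mb:.1f} MB per page before compression")
```

A 600 dpi monochrome page works out to about 4.2 MB raw, while a 400 dpi 24-bit colour page is about 44.9 MB, which is why dropping the resolution matters far more for colour than for black and white.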
 
I have access to some high-volume Fujitsu scanners at work (high $$$$, too: seriously, I paid a fraction of what they paid for one of these scanners on my last car, lol), and I have scanned a few manuals on them. I always scan at 600 dpi. If the resulting file is too large, I tweak it after the fact. That way I still have a high-quality original scan, but I can distribute or share a smaller version too.

I have scanned a few several-hundred-page manuals (spiral or ring bound, with the binding removed) in a matter of minutes. Pretty handy scanners.
 
From http://bitsavers.org/:
The preferred form for any contributed text scan is as a collection of lossless
Group 4 fax compression (ITU-T recommendation T.6) images saved as TIFF
files with a minimum scan resolution of 400 dpi.

Lower scan resolutions produce noticeable artifacts if a page needs to be
straightened in post-processing.

Lossy compression formats, such as JPEG, should NEVER be used to save pages
of text, since the compression format destroys edge resolution and contrast
would make it difficult to OCR in the future.
Of course, magazines might have a high non-text to text ratio...
 
Here's my way of scanning mags & books

[attached image]

Above I'm scanning in the American Heritage hardbound Civil War book that runs to 640 pages.

This may or may not be relevant to your use, because it requires destroying the magazine you wish to scan, but the information is sound and comes from my own experience. I had a lot of old magazines lying around, some musty smelling, and they all took up needed shelf space. Since I was going to be scanning complete magazines and wanted it to be as easy as possible, I did some research and found that I was going to need a scanner with a document feeder. My Lexmark X2470 flatbed scanner wasn't going to hack it: far too troublesome and slow for doing over 100 pages per magazine, sometimes even 200.

I also already had an HP Officejet 4215, which has a 20-page document feeder. I tried that and it worked fairly well (that's the one in the picture above), but one thing you want when you're scanning hundreds of pages is speed. It's a boring job even with the automation of a document feeder. I did some more research and found that the Lexmark X4270 had a 30-page document feeder and popped out pages much faster than the HP Officejet 4215, and faster than most others unless you wanted to spend the big bucks. I checked eBay and won an auction for a nice Lexmark X4270 for $20, with another $13 for shipping. I didn't care that its ink carts were empty or that it had some silly fax phone thing attached. I just wanted the scanner part. It turned out to be a great buy.

I scanned old computer mags, car mags, Popular Electronics mags, and even pulp sci-fi mags at a very good clip. I just sat there and stacked the pages as they popped out of the document feeder. I used the HP pictured above for quite a while and it does the same, just a little slower than I'd prefer. I scanned with Irfanview under Windows 7 64-bit, using its multiple-page mode with automatic file saving: numbering started at page 1 and continued 3, 5, 7, 9, and when the mag was done I fed the other side in as 2, 4, 6, 8, etc.

You do need to pay attention to get the page sequence right, but when it's cranking out a whole mag in about 45 minutes it's not that bad a chore. I did well over a hundred magazines this way. Now the Lexmark X4270 sits in the closet in case I ever need to do more quick mag scanning. I have digital copies of all my magazines, and I pitched the originals since I had to tear them apart to scan them. Some people, many in fact, may balk at pitching original magazines in favor of digital copies, but I'm not one of them. I have backups on 2 separate hard drives. I prefer the digital copies, especially when I can sit in my easy chair and read the mags on my Tablet PC. I converted the scanned pages of each magazine into a single CBZ comic book reader file for convenience. It works for me :)
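For anyone who wants to script that last step: a CBZ file is just a ZIP archive of page images with the extension changed, so the bundling can be sketched in a few lines of Python. The function name `make_cbz` and the list of image suffixes are assumptions for illustration, not part of any particular tool.

```python
import zipfile
from pathlib import Path

# Image types to include; adjust to whatever your scans are saved as.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def make_cbz(page_dir, cbz_path):
    """Bundle the page images in page_dir into one CBZ file, sorted by name."""
    pages = sorted(
        p for p in Path(page_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # The images are already compressed, so store them without recompression.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Readers show the pages in file-name order, so this only works if the page numbers are zero-padded to the same width.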
 
Nice setup... this makes me feel stupid, but I never thought of doing it with an auto-page feeder. How does this compare so far as rotations, etc? Does it scan straight? How about bleed-through? If I'm going for preservation scans, I'm wanting to do as little post-processing as possible.

Currently, I scan on the Canon business copier/printer at work. It stores everything on a hard drive, and outputs either TIF (to 64-color, a waste for magazines and whatnot), greyscale (great for text docs), or JPG (what I have to do for all magazine/color scans). Of course PDF is an option as well, but it doesn't snag things as straight as I'd like.

Come to think of it, there is an auto-feeder on this thing, but as it's a walk-up scanner, it autonames files based on its own internal rules. I'd have to rename all of the files one-by-one. Not the most arduous task with some of the Renamers available for Windows, but still a bit cumbersome. Still... the time I save in manually turning pages and whatnot might be worth it. Assuming that the bleed-through is non-existent.
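If the renaming is the only thing holding you back, it can be scripted rather than done one-by-one. This is a minimal sketch that assumes the copier's automatic names sort in the same order the pages were scanned; the function name `rename_in_scan_order` and its parameters are made up for illustration.

```python
from pathlib import Path


def rename_in_scan_order(scan_dir, prefix, start=1, step=1, digits=3):
    """Rename every file in scan_dir to '<prefix> <page><ext>'.

    Assumes the scanner's own file names sort in scan order.
    """
    files = sorted(p for p in Path(scan_dir).iterdir() if p.is_file())
    renamed = []
    for index, old in enumerate(files):
        page = start + index * step
        new = old.with_name(f"{prefix} {page:0{digits}d}{old.suffix.lower()}")
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new.name}")
        old.rename(new)
        renamed.append(new.name)
    return renamed
```

Try it on a copy of the folder first. With `step=2` and `start=1` or `start=2` it would also cover scanning all the front sides in one pass and all the back sides in another.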
 
Here's the thing. With a document feeder you don't have all the activity you would have with a flat glass scanner. I load 30 pages in the feeder and set up Irfanview (which works with many different scanners; you don't have to use the software that came with your scanner, you know). For example, I set the Irfanview Acquire dialog this way:

[attached image]


Now the file naming is automatic, as is the page numbering. The pages here will be called 198309 Creative Computing, for the Sept 1983 issue, and the page numbering will start at 1 and increment to 3, 5, 7, etc. As each page is scanned and sent to the output tray, I pick it up and turn it over, stacking for the flip side. When the first side of the magazine is done I scan the other side: I simply change the starting page in the dialog box to 2 and the pages then number 2, 4, 6, 8, etc. I use 3 digits to ensure a proper page sort. When both passes are done I have my Creative Computing magazine pages in proper order and numbered right. Then I simply rar the pages into one big file, and with a comic book reader installed I change the rar extension to cbr and I have my book.
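The numbering scheme is easy to see in a few lines of Python. This is only an illustration of the idea: the exact file-name layout Irfanview produces depends on the dialog settings, so the format used here (title, space, 3-digit page number, `.jpg`) and the helper name `page_names` are assumptions.

```python
def page_names(title, sheet_count, first_page, digits=3, ext="jpg"):
    """File names for one pass through the feeder, numbering every other page."""
    return [
        f"{title} {first_page + 2 * i:0{digits}d}.{ext}"
        for i in range(sheet_count)
    ]


title = "198309 Creative Computing"
fronts = page_names(title, sheet_count=4, first_page=1)  # pages 1, 3, 5, 7
backs = page_names(title, sheet_count=4, first_page=2)   # pages 2, 4, 6, 8

# Zero-padded numbers sort correctly as plain text, so merging is just a sort.
for name in sorted(fronts + backs):
    print(name)
```

The 3 digits matter because file names are sorted as text: without padding, a page numbered 10 would sort ahead of a page numbered 2.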

[attached image]


As you can see in the following picture there can be some bleed through if the following page has some dark areas, but I haven't found it to be objectionable at all. It really depends on the magazine. Some have some pretty thin pages, but this is about as bad as it gets, and that ain't bad :)

[attached image]


As for scanning straight and not slipping two pages through - I make sure I cut the glued edges off with scissors on the pages I've torn apart from the magazine so there isn't a 'thicker' side.

Incidentally, I bought all these Creative Computing magazines from Erik through the Marketplace a couple of years ago.
 
Thumpnugget scanned a lot of Byte magazines over at the AtariAge forum. In one posting he mentioned how he removes bleed-through (although he doesn't go into details):
http://www.atariage.com/forums/topic/167235-byte-magazine/page__view__findpost__p__2183275

"I use the Fujitsu ScanSnap S510 which is a double sided auto-feed scanner. The scans are saved into a PDF which I then separate into TIFFS and use PhotoShop to clean up (my own small macros that remove some of the bleed through from the other side of the page and convert non color pages into greyscale). I use Acrobat Pro to reassemble the Tiffs back into a PDF. The OCR process straightens out the pages."

-Tor
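Since the quoted post doesn't say what the macros actually do, here is one common approach as a guess, not Thumpnugget's method: a white-point clip, where every greyscale value at or above a threshold is pushed to pure white and the rest is stretched to fill the range. The sketch works on plain lists of 0-255 values so it needs no imaging library; the function name `clip_white_point` and the default threshold of 200 are assumptions.

```python
def clip_white_point(pixels, white_point=200):
    """Return a copy of a greyscale page with faint tones pushed to white.

    pixels is a list of rows, each a list of 0-255 values (0 = black).
    """
    if not 0 < white_point <= 255:
        raise ValueError("white_point must be in 1..255")
    cleaned = []
    for row in pixels:
        cleaned.append([
            # Light values become white; darker ones are stretched to 0-255.
            255 if value >= white_point else round(value * 255 / white_point)
            for value in row
        ])
    return cleaned
```

The catch is that faint detail on the page itself (light pencil marks, pale tints) gets wiped along with the bleed-through, so the threshold needs tuning per magazine.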
 
Thanks for the explanation, Vint. I'll keep it in mind, as there are some magazines that I may scan in the future (as they'd be going to the recycle bin afterwards, I've no problem cutting the binding edge off with the industrial cutter at work).

For my software/video-game preservation thing, however, bleed-through isn't something I'm going to accept, nor am I willing to destroy my manuals for the efforts... Guess it's back to the tried-and-true (time-consuming) method!!
 
My local library has a collection of PC Magazines. I'd like to scan them in situ, and I can't dismember them. Would a hand-scanner be a viable option?
 
Probably, but you don't see hand scanners much anymore.

I have seen people build elaborate stands with digital SLR cameras and some lights to photograph antique books without ruining them. The same type of rig should work for magazines. I would think a modern DSLR would beat an outdated hand scanner.
 
You can create a document viewer to scan your documentation. The scanner can customize your document viewer interface and quickly open a multi-page document from a local file. It's really amazing; I have tried it.
 