
Digital Equipment Corporation - MicroFiche Underground

I haven't seen many affordable scanners that have a higher resolution than 9600 DPI.

From your numbers, a 9600 DPI scanner might work. I would set the scanner to black and white mode and low contrast (to remove some of the noise).
 
I have an EPSON 9600 DPI film scanner (back-lit). Rather than buy a fancy microfiche scanner, any film scanner of high enough resolution should be able to scan the fiche into a PDF, which can then be OCRed for full text conversion.
9600 dpi could be sufficient, but this needs to be the true optical resolution, not some marketing exaggeration. Can you scan a dense micro fiche (COM listing) and show results?
 
Hi,
And here is an independent test (sorry, German).

https://www.filmscanner.info/EpsonPerfectionV700Photo.html

Translation of the key passage: "Scanning a USAF test chart yields an effective resolution of 2300 dpi. That's 40% of the indicated resolution. Not bad for a flatbed scanner, but advertising 6400 dpi is eyewash. The V700's sensor may be able to capture that many pixels, but the whole optical system is not high-grade enough to come anywhere close to supporting that resolution."

I've read similar results in tests of other "6400" or "9600" flatbed scanners.

If anybody knows of a true 9600dpi scanner, please let me know.

Jörg
 
My only question is whether they were in 4800 DPI, 6400 DPI, or 9600 DPI mode.

I have done some extremely detailed scans with this scanner but never anything like microfiche. This would also depend on the software they were using to do the scanning.

I am still willing to try to help you if you want.
 
9600 dpi could be sufficient, but this needs to be the true optical resolution, not some marketing exaggeration. Can you scan a dense micro fiche (COM listing) and show results?

There is a lot of discussion on this and related topics on various photography and film conversion sites with similar frustration levels. A few things stick out to even a casual reader.

The goal tends to be to pick up sufficient detail from the film to be worth the trouble. One runs into the same question over and over: "What is the equivalent dpi of the silver halide grains in the film emulsion?" Simply because you're not going to extract more detail than the film captured. And the answer to the question is "It depends." It depends on the quality of the film, the speed of the film, the care taken in the processing, and quite a few other factors. But the numbers often quoted tend to be in the ballpark of 2500 to 3500 dpi.

https://www.photography-forums.com/t...-as-dpi.58679/

The Nyquist sampling criterion says that you will want to sample each grain at least twice in order not to lose whatever detail may be present on the film. So now you're in a ballpark where a 6400 DPI scanner just won't be good enough. And the DPI specs claimed by the manufacturers tend to be more fantasy than reality.
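That sample-each-grain-twice rule turns into simple arithmetic. A minimal sketch in Python, using the 2500-3500 dpi grain figures quoted above (ballpark numbers, not measurements) and a 2x Nyquist oversampling factor:

```python
# Back-of-the-envelope check: scanner resolution needed to capture film grain.
# The grain-equivalent figures (2500-3500 dpi) are the ballpark numbers quoted
# in the thread; the 2x factor is the Nyquist sampling criterion.

def required_scan_dpi(grain_dpi: float, oversample: float = 2.0) -> float:
    """Minimum scanner dpi needed to avoid losing grain-level detail."""
    return grain_dpi * oversample

for grain in (2500, 3000, 3500):
    print(f"grain ~{grain} dpi -> need >= {required_scan_dpi(grain):.0f} dpi optical")
```

Even the low end demands 5000 dpi of true optical resolution, which is why a scanner that tests out at ~2300 dpi effective falls well short regardless of the number on the box.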

https://www.largeformatphotography.i...3651b84c311fb8

Keep in mind that the original listings supposedly captured on the microfiche were often generated on line printers, not known even then for producing high-quality text. Also keep in mind that by the time thought was given to preserving and distributing documents via microfiche, there often were no real "originals" to copy. Rather, the "originals" used were actually copies of copies, and you start to understand why running OCR on something "recovered" from microfiche often gives such poor results.
 
Keep in mind that the original listings supposedly captured on the microfiche were often generated on line printers

Very little was scanned from listings, most is COM (Computer Output to Microfilm)
I've been slowly working through some IBM listings with a high-quality Canon microfiche scanner with 600dpi effective resolution, which
can be seen at http://bitsavers.org/pdf/ibm/system34/S34_Software_Fiche/SSP/
The results have been quite good.
 
The Epson as tested was 4 times that, so it might be usable. I don't know, I've never scanned microfiche before.
 
Hi,

It seems the "flatbed-scanner-yes-or-no" discussion will never be resolved; it depends too much on the fiches being scanned and on the expected document resolution.

I feel there's a large range of acceptable document quality:

- I've seen worse-than-fax-like scans at < 200 dpi, barely readable, but they can really save your day.

- Then there's OCRable "bitsavers" quality, rated at 200 dpi, like the attached page from 0069_CNKMCA0_KMV11A.pdf (bitsavers.jpg).

- Then the same page from my current work, obsessively oversampled and intended to be better than the fiche's own resolution (future-proof, so I never, never, never have to do these scan jobs again). It is gray-leveled and may have to be downsized before being made into a PDF.

jh.jpg
Joerg
 
Looks like an awesome quality to me!

Sometimes we just have to work with the quality that is available... I would still be happy with the first one if there were nothing else. I've seen documentation that was scanned 20 years ago and stored at low quality because storage was expensive back then. But I'm very happy that you are putting so much effort into making these microfiches available. I have a box of them for you as well... also PDP-8 related stuff on microfiche.
 
Very little was scanned from listings, most is COM (Computer Output to Microfilm)
I've been slowly working through some IBM listings with a high-quality Canon microfiche scanner with 600dpi effective resolution, which
can be seen at http://bitsavers.org/pdf/ibm/system3...are_Fiche/SSP/
The results have been quite good.

My experience is mostly with microfiche from Western Electric (Lucent Technologies) for their ESS products. The #1ESS and #1AESS PR's (program listings) showed the lack of character alignment characteristic of the Model 35 TTY and its replacement, the Model 40 TTY line printer. They also had the lack of character definition that came from printing with a cloth ribbon and the crud that would accumulate in the small nooks and crannies of the type face itself. The results looked a lot like Joerg's first sample above. Joerg's second example above is much easier for a human to read, but take a look at the '#' and '$' characters and imagine trying to recover, via OCR, usable source code that could then be run through an assembler.

My only personal attempt to recover something useful from DEC microfiche was when I took a set containing the source for the BDV11 ROM to the library and made a hard copy on a Xerox fiche-to-paper copier. The quality of the images on the fiche varied from page to page, and the resulting hard copy looked about like Joerg's first example. Mostly legible, better than nothing, but lots of room for improvement. If Joerg had thought that was good enough, he could have stopped there. I'm glad he didn't.

Regardless, Joerg, thanks for your considerable effort and expense incurred in this project and making the results available to the community.
 
My experience with scanning text for OCR purposes is to set the scanner for black-and-white scanning and to set the scan depth to 8 bit or even 4 bit. The goal is to produce what is effectively a binary image, either black or white. Depending on the source material, the threshold between black and white can be adjusted so that the characters are readable and OCRable without adding noise from dirt and other imperfections in the media.

A good OCR algorithm can handle dropping a few pixels from a character better than it can handle extra pixels that are not part of any character. In addition, the OCR package I used a few years ago would prompt for all un-OCRable characters after it was done.

That was my experience last time I scanned text for OCR.

If the fiche has images, the images should be scanned with different settings on the scanner for increased clarity. For example, you might scan the text at 4-bit gray scale, but images could be scanned at 256 or even more gray levels.
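The thresholding step described above can be sketched with Pillow. This is an illustration, not the package the poster used; the helper name and the default threshold are assumptions, and the threshold would be tuned per fiche as described:

```python
from PIL import Image

def binarize_for_ocr(src_path: str, dst_path: str, threshold: int = 128) -> None:
    """Convert a scan to a pure black-and-white image for OCR.

    Pixels darker than `threshold` become black, everything else white,
    which suppresses dirt and film noise at the cost of occasionally
    dropping a few pixels from weak characters -- the trade-off the
    post describes.
    """
    img = Image.open(src_path).convert("L")                       # 8-bit grayscale
    bw = img.point(lambda p: 0 if p < threshold else 255, mode="1")  # 1-bit bitonal
    bw.save(dst_path)
```

Raising the threshold darkens faint strokes but also pulls in more dirt; lowering it does the opposite, which is exactly the adjustment being discussed.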
 
How do you eat an elephant? One bite at a time.

"What's for breakfast today?" "Eggs and smoked elephant strips." "What's for lunch?" "Elephant sandwiches." "What's for dinner?" "Roast elephant."

This is a pachyderm project to say the least.
 
Milestone: the 1,000th DEC PDP-11 Diagnostic Program Listing has been scanned.
Up to now we have 1017 new listings on 1273 fiches with a total of 146,910 pages; raw data size is 680 GB.
... continuing ...
Are you looking for mirror sites? I'm in New York, US with a gigabit internet connection.
 
Hi,

I completed most of the PDP-11 Diagnostic Program Listing fiches now:
1366 documents, 1700 fiches, 194584 listing pages. Whoa!

Now PDFs must be made.
After some trial and error, and for maximum bitsavers compatibility,
I'd like to use Eric Smith's "tumble" http://tumble.brouhaha.com/
as it apparently can compress TIFFs very well.
Is there a MS-Windows .exe?

Thanks,
Joerg
 
I'd like to use Eric Smith's "tumble" http://tumble.brouhaha.com/
as it apparently can compress TIFFs very well.

There is none that I know of, and it is fairly challenging to build with the patches and finding the right libraries.
Also, it only supports bitonal TIFFs. It has its own G4 compressor, I should see if anyone ever migrated it to use libtiff.
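For what it's worth, Pillow can also write bitonal TIFFs with CCITT Group 4 compression, the same fax-class compression tumble produces. A minimal sketch, offered only as an alternative to building tumble (file names are placeholders):

```python
from PIL import Image

def save_g4_tiff(src_path: str, dst_path: str) -> None:
    """Re-save a page scan as a 1-bit TIFF with CCITT Group 4 compression.

    G4 is the standard compression for bitonal page images and typically
    shrinks text scans dramatically compared to raw or LZW data.
    """
    img = Image.open(src_path).convert("1")               # force bitonal
    img.save(dst_path, format="TIFF", compression="group4")
```

A stack of such TIFFs can then be assembled into a PDF with any TIFF-aware PDF tool; this sketch only covers the compression step that tumble otherwise handles internally.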
 
Al,
There is none that I know of, and it is fairly challenging to build with the patches and finding the right libraries.
Also, it only supports bitonal TIFFs. It has its own G4 compressor, I should see if anyone ever migrated it to use libtiff.
Thanks for reply.

After your post and a look into https://github.com/brouhaha/tumble I found it easier to reinvent the wheel.

Enhancing my own PDF maker was surprisingly easy; the commercial https://www.imageen.com/ components do all the heavy lifting now.

Joerg
 