what is the best way to archive a service manual?

icono · Aug 16, 2022

Super excited! I just got my hands on a copy of a service manual I haven't been able to find online, and I would like to archive it so more people can use it.

It is a Technical Reference manual for the "LASER TURBO XT, LASER TURBO TX/2, LASER TURBO XT/3"
as well as the Technical Reference manual for the LASER Multi-I/O Card

Both manuals include full board schematics and parts lists.
The manual for the I/O board includes a PCB layout as well as some data sheets for some of the chips on the board.

I'm experimenting with different DPI and color settings (the manual is all monochrome). The schematics have some very fine print that it's not picking up very well.

modem7 · Aug 16, 2022

I scan to PDF.

Sometimes, I make multiple PDF files, then join them into a single PDF file. Examples:
* Cover is in colour, but insides are black & white. So, I scan the cover in colour creating a single PDF. I scan the insides as black & white creating a single PDF. Join the PDFs.
* Some pages (e.g. schematics) require higher scan resolution compared to the other pages. ...
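
A minimal Python sketch of the batching idea above: group consecutive pages that share the same scan settings, scan each group to its own PDF, then join the PDFs in order. The page labels, modes and DPI values are hypothetical examples, not the settings of any particular manual.

```python
from itertools import groupby

# Hypothetical page plan: (page label, colour mode, DPI).
PAGES = [
    ("cover", "colour", 200),
    ("p1", "bw", 300),
    ("p2", "bw", 300),
    ("schematic-1", "bw", 600),
    ("p3", "bw", 300),
]

def scan_batches(pages):
    """Group consecutive pages that share the same (mode, dpi) settings."""
    batches = []
    for settings, run in groupby(pages, key=lambda p: (p[1], p[2])):
        batches.append((settings, [label for label, _, _ in run]))
    return batches

# Each batch would become one PDF; joining them in this order
# reproduces the original page sequence.
for (mode, dpi), labels in scan_batches(PAGES):
    print(f"{mode} @ {dpi} DPI: {', '.join(labels)}")
```

Grouping only consecutive pages keeps the page order intact, at the cost of a few more PDFs to join.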

T-R-A · Aug 17, 2022

modem7 said:
I scan to PDF.

Sometimes, I make multiple PDF files, then join them into a single PDF file. Examples:
* Cover is in colour, but insides are black & white. So, I scan the cover in colour creating a single PDF. I scan the insides as black & white creating a single PDF. Join the PDFs.
* Some pages (e.g. schematics) require higher scan resolution compared to the other pages. ...

Second that. PDF is the most common format available and for even more safety I make 2 digital copies and also email them to myself on 2 separate email accounts.

Minus zeros' (Modem7 - above) site is considered one of the best should you wish to share.

icono · Aug 17, 2022

Should I delete blank pages (like the back of the schematics) or leave them in so it would print correctly?

Ruud · Aug 17, 2022

Just leave them. I don't know if the pages are numbered but if they are, you will get questions about the missing pages.
FYI: I do as Modem7 said. Color: 200 DPI, B&W: 300 DPI. And schematics at higher resolutions, if needed.
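
To get a rough feel for why schematics may need a higher resolution, here is a small Python sketch that estimates how many pixels tall a line of small print ends up at a given DPI. The 6-point size is an assumed example of fine print, and the point size is the nominal body height of the font rather than the exact height of a letter.

```python
POINTS_PER_INCH = 72  # typographic points per inch

def text_height_px(point_size, dpi):
    """Approximate pixel height of text of a given point size at a scan DPI."""
    return point_size / POINTS_PER_INCH * dpi

# Assumed fine print of 6 points at the resolutions mentioned in this thread.
for dpi in (200, 300, 600):
    print(f"{dpi} DPI: about {text_height_px(6, dpi):.1f} px tall")
```

Doubling the DPI doubles the pixel height of each character, but it also quadruples the number of pixels per page.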

Please let us know if you scanned everything. I'm always looking for schematics. TIA!

booboo · Aug 17, 2022

May I suggest it be hosted at archive.org?

SomeGuy · Aug 17, 2022

I normally scan such documents with the idea that it should be possible to print the results and have something look good and readable.

This usually involves adjusting various contrast/brightness settings until the black print is mostly true black (printers can badly dither grey text or graphics) and the background is white (no need to preserve every grain of the paper or yellow water stains).

Also, deskew the pages but keep an eye out for errors - images with slanted text can confuse deskewing software.

That does mean including blank pages. If printing double sided, then ideally all the pages should match up with the original.
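
A short Python sketch of why the blank pages matter for double-sided printing: each PDF page lands on a sheet and a side, and removing one blank page flips the side of every page after it. The page names are made up for illustration.

```python
def sheet_and_side(page_number):
    """Map a 1-based PDF page number to (sheet, side) for duplex printing."""
    sheet = (page_number + 1) // 2
    side = "front" if page_number % 2 == 1 else "back"
    return sheet, side

# Hypothetical page sequence with one blank page behind a schematic.
original = ["text-1", "text-2", "schematic", "blank", "text-3"]
without_blank = [p for p in original if p != "blank"]

for pages in (original, without_blank):
    layout = {name: sheet_and_side(i) for i, name in enumerate(pages, start=1)}
    print(layout)
```

With the blank page kept, "text-3" prints on the front of sheet 3; with it removed, "text-3" moves to the back of the schematic's sheet.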

Personally, I use a process of scanning to 600 DPI color TIF or PNG files and then use ImageMagick (mogrify) to adjust most of the image, then import the results to OmniPage, OCR the contents, and save to PDF. OCRing the content is important as it lets people easily search for keywords within the document.
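
The exact mogrify settings are not given in the post, so the following Python sketch only shows what such a clean-up command might look like. The level percentages, the 40% deskew threshold, the file names and the output folder are all assumptions to be tuned per document. On ImageMagick 7 the command is usually invoked as "magick mogrify".

```python
import shlex

def mogrify_command(files, black="15%", white="85%", out_dir="cleaned"):
    """Build (but do not run) an ImageMagick mogrify command line.

    All values here are illustrative assumptions, not the poster's settings.
    """
    return [
        "mogrify",
        "-path", out_dir,              # write results here, keep the originals
        "-format", "png",              # save the processed pages as PNG
        "-level", f"{black},{white}",  # push dark print to black, paper to white
        "-deskew", "40%",              # straighten slightly rotated pages
        *files,
    ]

cmd = mogrify_command(["scans/page_001.tif", "scans/page_002.tif"])
print(shlex.join(cmd))
```

The output folder has to exist before the command is run, and it is worth trying the settings on a few pages before processing a whole manual.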

I always sanity check the final results to make sure all pages are present, and there are no horrific errors such as pages getting cut off, upside-down pages, unreadable figures, or such.
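
Part of that sanity check can be automated. As a minimal sketch, assuming the scans are saved with the page number at the end of the file name (a hypothetical naming scheme such as page_001.tif), this Python snippet lists the page numbers that have no scan file. It does not replace looking at the pages for cut-off or upside-down content.

```python
import re

def missing_pages(filenames, expected_count):
    """Return page numbers in 1..expected_count that have no scan file."""
    found = set()
    for name in filenames:
        match = re.search(r"(\d+)\.(?:tif|tiff|png)$", name, re.IGNORECASE)
        if match:
            found.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in found]

files = ["page_001.tif", "page_002.tif", "page_004.tif"]
print(missing_pages(files, 5))  # prints [3, 5]
```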

Foldouts such as schematics can be a bit tougher as those may have to be stitched together from multiple images. With schematics, keeping small details can be important. What I have sometimes done is make a PDF file, but include copies of those pages in external PNG files as well.

Ruud · Aug 18, 2022

SomeGuy said:
.... then import the results to OmniPage, OCR the contents, and save to PDF. ....

In 2006 I bought a Brother printer with scanner and, most important, built-in sheet feeder. Then I started to scan almost everything I had. I did this by cutting the back of the magazine or book and using the sheet feeder to scan the now loose pages. "Almost" because I skipped the, IMHO, rare ones.
I also tried to OCR some of them, preferably the ones with source code, but in general that was a big failure.

It could be interesting to give that stuff another try. So my question: what are your experiences with OCR software?

Trixter · Aug 18, 2022

My method: https://trixter.oldskool.org/2020/07/14/how-to-reasonably-archive-color-magazines-to-pdf/

daver2 · Aug 18, 2022

booboo said:
May I suggest it be hosted at archive.org?

Please let @Al Kossow know and he will upload it to bitsavers. Have you checked bitsavers to make sure it isn't already there?

Dave

NeXT · Aug 18, 2022

I know a few great methods have been already mentioned above but this thread is giving me PTSD from the countless arguments that have been on CCTALK over the years.

SomeGuy · Aug 18, 2022

Ruud said:
It could be interesting to give that stuff another try. So my question: what are your experiences with OCR software?

The key thing with OCR is that you always want "images on text". That is, what you see on the screen is the (cleaned up) scan of the page. OCR *WILL* get things wrong, so you always want to see what was actually where.

You have to consider how the OCRed text will be used. Primarily it is for quick searching, and perhaps copying and pasting a small selection of text.

I don't usually bother spell-checking or changing text/graphics bounding beyond the first couple title pages. Once in a while I have to mark "figures" on a page as graphics rather than text. For example, OCR tools may try to interpret a grainy photo figure of a spreadsheet and produce piles of puke. So I may manually mark that figure as a graphic to prevent that.

OCR won't usually recognize symbols used in manuals. For example, a manual might repeatedly use a bent arrow symbol within its text to represent the "Enter" key, which then gets OCRed as "L" or some such. However, some OCR software allows "training" of symbols and mapping to Unicode.

So, it's not good for copying code, unless you want to manually proof read and correct everything.

The quality of the OCR job heavily depends on the clarity of the image you feed to the program. In some cases OCR programs may not even see grey or light colored text. They also need the text to be properly de-skewed. Which can get annoying if the print isn't actually straight on the original page. This is why I process the images before feeding them into an OCR/PDF generator.

Again, I like to have the background mostly pure white, the center of the text mostly pure black, but the edges of the characters still shades of grey so they don't look jagged on the screen. That usually gives the best OCR results.
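
The effect described here is essentially a levels adjustment. A minimal Python sketch on 8-bit grey values, with black and white points that are assumed and would need tuning per scan:

```python
def apply_levels(value, black_point=60, white_point=200):
    """Remap an 8-bit grey value.

    Values at or below black_point become 0 (pure black), values at or
    above white_point become 255 (pure white), and values in between are
    stretched linearly so character edges stay as shades of grey.
    """
    if value <= black_point:
        return 0
    if value >= white_point:
        return 255
    return round((value - black_point) * 255 / (white_point - black_point))

# One row of pixels crossing the edge of a character: ink, edge, paper.
row = [30, 90, 130, 180, 225, 240]
print([apply_levels(v) for v in row])
```

Dark ink and light paper are clipped to the extremes, while the in-between edge pixels keep intermediate values.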

Trixter · Aug 18, 2022

SomeGuy said:
The key thing with OCR is that you always want "images on text". That is, what you see on the screen is the (cleaned up) scan of the page. OCR *WILL* get things wrong, so you always want to see what was actually where.

Seconded. I deliver everything in PDF so that the original images are there, but are also searchable thanks to OCR. If someone doesn't like the OCR, they can re-do it using a variety of methods. (And I save my original .tiffs, so that when storage and bandwidth are effectively free in 20 years, I can just upload those somewhere.)

icono · Aug 19, 2022

The auto-deskewing on my scanner seemed to have some trouble with the blank pages, so I deleted the scans of the blank pages and replaced them with actual blank pages in the PDF. It did a good job otherwise.

My wife had a few weeks left on her Adobe Pro subscription, and I was able to use Adobe Pro to flip all the pages the right way around, run OCR, and add a ton of bookmarks.

All told it comes to 428 pages and 13.7 MB. I tried compressing it more, but the schematics became even more unreadable (they are not that readable to begin with, even in the paper manual).

Ruud · Aug 19, 2022

SomeGuy said:
You have to consider how the OCRed text will be used. Primarily it is for quick searching, and perhaps copying and pasting a small selection of text.

In most cases I'm interested in the source code published in the books/magazines. But what is quite annoying is that in quite a few books these sources are displayed in a font that mimics the output of a matrix printer, something that wasn't understood at all by the OCR program I used at that time.

SomeGuy said:
The quality of the OCR job heavily depends on the clarity of the image you feed to the program. In some cases OCR programs may not even see grey or light colored text.

Most of the time I scanned in B/W, 300 DPI. That ended up in 30-40 KB for an A4 page. Color would end up in 1 MB for 200 DPI. Almost all books of the 80's and 90's were B/W. Grey-scale pictures were scanned as B/W as well and most of the time that worked out good enough. If a picture contained vital information, I scanned it again in grey-scale.
I used (and still use) PaperPort to organize the various output and to turn it into a PDF at the end.
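
For comparison with those file sizes, here is a rough Python estimate of the uncompressed size of an A4 page at the same settings. It assumes 1 bit per pixel for B/W and 24 bits per pixel for color; the saved files are much smaller because the scanner or PDF software compresses them.

```python
MM_PER_INCH = 25.4

def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8

a4_bw = raw_page_bytes(210, 297, 300, 1)       # B/W at 300 DPI
a4_color = raw_page_bytes(210, 297, 200, 24)   # color at 200 DPI

print(f"A4 B/W 300 DPI:   about {a4_bw / 1024:.0f} KB uncompressed")
print(f"A4 color 200 DPI: about {a4_color / 1024 / 1024:.1f} MB uncompressed")
```

Even uncompressed, the color page at the lower resolution is roughly ten times larger than the B/W page, which is one reason to scan monochrome pages as B/W.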

SomeGuy said:
They also need the text to be properly de-skewed. Which can get annoying if the print isn't actually straight on the original page.

Hmmm, a part of my material is skewed. Reason: wear and tear of the sheet feeder mechanism. I gave that Brother away to a student who mostly needed the printer part and bought a new one. One that could scan A3 as well. Before that, I had to scan A3 schematics as two A4 JPGs and then had to use a program, Micrografx, to merge those parts. In the end I was so fed up with that, that I skipped all those manuals etc. that contained A3 pages. Philips was notorious for using A3 schematics inside their A4 manuals.

VileR · Aug 19, 2022

What's a decent OCR solution that doesn't cost an arm and a leg? I've used PDF-Xchange Editor where this is one of the (few) fully-supported features in the free version. Works well but not all that tunable.

icono · Aug 19, 2022

For anyone interested, I emailed the manuals to modem7 and they are now up on the site under VTech.

exidyboy · Aug 21, 2022

VileR said:
What's a decent OCR solution that doesn't cost an arm and a leg? I've used PDF-Xchange Editor where this is one of the (few) fully-supported features in the free version. Works well but not all that tunable.

You could look at https://en.wikipedia.org/wiki/Tesseract_(software) - I know people who are having some success with dot matrix fonts as well which other tools really struggle with.
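
As a minimal sketch of what using Tesseract can look like: its command-line tool can write a searchable PDF in which the scanned image stays visible and the recognized text sits underneath. The file names below are hypothetical, and the Python wrapper only runs the command if both the tool and the image are actually present.

```python
import shutil
import subprocess
from pathlib import Path

def tesseract_pdf_command(image_path, output_base, language="eng"):
    """Build a Tesseract command that writes <output_base>.pdf."""
    return ["tesseract", image_path, output_base, "-l", language, "pdf"]

# Hypothetical input file; replace with a real cleaned-up scan.
cmd = tesseract_pdf_command("page_001.tif", "page_001")

if shutil.which("tesseract") and Path("page_001.tif").exists():
    subprocess.run(cmd, check=True)
else:
    print("Not running; the command would be:", " ".join(cmd))
```

Recognition quality still depends on the input image, so cleaning up and deskewing the scan first is likely to help here as well.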

Ruud · Aug 21, 2022

I have tried several OCR programs, free ones as well as demos. In this case I skipped the ones like ABBYY FineReader for which you have to pay per month. It seems that Readiris 17 does the job. Just a bit more testing and I will buy it.
