The book is completely disbound: all the pages are detached, and the corners
of most of them are broken off. The paper is very brittle. Here's an image of a typical page.
It is a fairly important book and worth rescuing.
An HP ScanJet 3C.
The scanning software is DeskScan II, which came with the scanner. The settings are millions of colors (24-bit) and 150 dpi.
A PowerPC 7100/80, with 24 MB of RAM.
The software includes Photoshop 3.0, OmniPage Professional 6.0, Fetch 3.1, and Netscape 3.1.
An UltraSparc I, with 256 MB of RAM, running Solaris 2.5.1.
The software is xv 3.10.
The text is scanned in with OmniPage and saved in HTML format.
It is then transferred by FTP to the web server for further treatment: proofreading
and HTML formatting.
The plates are scanned in with DeskScan II and saved in TIFF format. As shown in the sample image, the whole page is scanned. Because the size and scan settings stay the same for every plate, there is no need to preview each page before scanning, which saves a lot of time. Each scanned page is about 1260 x 1700 pixels, with a file size of about 6.4 MB.
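The file size quoted above is consistent with an uncompressed 24-bit image. As a rough check, assuming 3 bytes per pixel and ignoring any TIFF header overhead (the function name is ours, for illustration only):

```python
def raw_image_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed image size; 3 bytes per pixel assumes 24-bit color."""
    return width_px * height_px * bytes_per_pixel


size = raw_image_bytes(1260, 1700)
print(f"{size / 1_000_000:.1f} MB")  # 6.4 MB
```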
The scanned images are then transferred by FTP to the web server, where xv is used to crop them and convert them to JPEG format at various sizes and a resolution of 72 dpi.
This procedure could also be done with Photoshop; we chose xv because it is very easy to use.
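For a sense of what the 72 dpi conversion does to pixel dimensions, here is a minimal sketch. It assumes the image keeps its physical size while the resolution drops from 150 dpi to 72 dpi. In practice the plates were cropped first and saved at varied sizes, so actual dimensions will differ. The function name is hypothetical and is not part of xv.

```python
def resample_dimensions(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling to dst_dpi at the same physical size."""
    scale = dst_dpi / src_dpi
    return round(width_px * scale), round(height_px * scale)


# An uncropped 150 dpi page scan, resampled to 72 dpi
print(resample_dimensions(1260, 1700, 150, 72))  # (605, 816)
```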
What we learned
Proofreading is the most tedious and time-consuming part. We knew it
would take a lot of time, which is why we selected a book with as little
text as possible. Still, we were buried in it. OCR technology is just
not there yet.
The decision to scan the plates at 150 dpi was a tough one. Originally we planned to scan at a much higher resolution (300 dpi or higher), but scanning at 300 dpi would have created a file of about 23 MB per plate, which is simply beyond the capacity of our current hardware.
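The jump in file size follows from the fact that doubling the resolution quadruples the number of pixels. A rough estimate, assuming the same scanned area and color depth and taking the 6.4 MB full-page scan at 150 dpi as the baseline (the function name is illustrative):

```python
def scaled_file_size_mb(base_size_mb, base_dpi, new_dpi):
    """Estimate uncompressed size at new_dpi; pixel count grows with dpi squared."""
    return base_size_mb * (new_dpi / base_dpi) ** 2


print(f"{scaled_file_size_mb(6.4, 150, 300):.1f} MB")  # 25.6 MB
```

This full-page estimate is in the same range as the per-plate figure for 300 dpi given above; the exact number depends on the area actually scanned.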
The scanned images take up 830 MB of hard drive space. We are planning
to make two CD-ROMs out of them.
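The two-disc figure can be checked with a quick calculation. The 650 MB capacity is an assumption for a standard CD-ROM; it is not stated in the text, and file system overhead is ignored.

```python
import math


def discs_needed(total_mb, disc_capacity_mb=650):
    """Number of discs needed to hold total_mb (capacity is an assumed value)."""
    return math.ceil(total_mb / disc_capacity_mb)


print(discs_needed(830))  # 2
```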