DIzzIE's Scanning Tutorial (on using FineReader)
BY: DIzzIE [antikopyright 2003]

Intro.
This is a quick tutorial on how to scan content (i.e. books,
magazines, pamphlets and the like) using the popular software called
FineReader. I'm well aware that there are a couple of tuts already out
on this, but they all focus either on scanning fiction, that is to say
on OCRing text from text-only books, or conversely on perfecting
images, i.e. scanning comic books. This
guide will briefly discuss working with raw images (as well as touch
on OCRing and the basic features of FineReader).

0. Naturally, you first need a scanner. You can pick one up cheap at a
pawnshop or Salvation Army-type store, or borrow one from a friend or nearby
library. You may also want to try scamming a scanner:
dizzy.ws/kodak.htm . Alternatively, if you have a high quality
digital camera some would suggest simply taking snapshots of pages.

1. Get ABBYY FineReader OCR Professional 7.0 (the latest version at
the time of this guide). Download here:
download.com.com/3000-2079-10228095.html?tag=lst-0-1
or fill in your e-mail and get a download link emailed to you here:
download.abbyy.com/content/default.aspx

2. Download a keygen to register your try-n-buy version here:
allcracks.net/html/a-1.html or find more places to download here:
dizzy.ws/serials.htm

3. Once you install/run keygen, connect your scanner to your
computer and run FineReader (FR).

4. First, go to Tools > Options. Under the Scan/Open Image Tab, if
your scanner is not automatically listed in the TWAIN Driver box,
click on Select Source. If nothing shows up, FR can't detect your
scanner. Make sure the scanner is connected and turned on, and that
you have the latest drivers (download them from the scanner
manufacturer's website), then restart your computer. If FR still does
not pick up your scanner after you have updated the drivers and
checked the connection, try running the default software that came
with your scanner. If even that does not work, contact your scanner's
manufacturer.

5. If FR recognized your scanner in Step 4, that is, if you can see
your scanner's name in the TWAIN Driver box, then select the Use
FineReader Interface radio button, not Use TWAIN-Source interface. If,
however, in Step 4 you could only get your scanner to work with its
default program, then keep the TWAIN-Source interface button checked.

6. If you are using the TWAIN-Source for scanner settings (the
default program that came with your scanner), you will need to
configure the same things I describe in the next steps in the default
program for your scanner. Using the TWAIN-Source software is
recommended only if you could not get FR to recognize your scanner.

7. Still in the Scan/Open Image Tab, click Scanner Settings.

8. Here you can configure a variety of options; these will vary
depending on what you are scanning. A few guidelines:
*Unless you are scanning something that is very lightly printed, the
Brightness should be kept on Automatic (default), with the slide bar
in the middle of the light/dark spectrum bar.
*Paper size should be changed to match the dimensions of whatever
you are scanning. This saves time in that you don't have to wait for
the scanner bar to go all the way to the end for every scan. It also
saves us the trouble of splitting excess image blocks later on.
*Pause between pages is how long you want the scanner to wait before
automatically scanning the next page. 5-10 seconds should be
sufficient.
*The Resolution should be at a minimum of 300dpi, moving upwards to
600dpi if what you are scanning has small print or detailed pictures.
*Pictures Scanning Mode should be color if you're scanning color
images (magazines, book covers, etc), or grayscale if you're scanning
text/b&w pictures. The black & white mode is not recommended as it
produces grainy poor-quality images.
*Unless you want to see the Scanner Settings dialog every time you
scan a page, uncheck Show This Dialog Before Scanning.
*If you have a feeder scanner (versus a flatbed), that is, if you
feed pages into your scanner like a fax machine versus laying them
down on the scanner like a copy machine, you may want to select Use
automatic document feeder (doesn't work for all feeder scanners).
*Finally hit OK to exit out of Scanner Settings.
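
To get a feel for what the Resolution and Pictures Scanning Mode
choices above cost you, here is a rough Python sketch that estimates
the pixel dimensions and the raw (uncompressed) size of one scanned
page. The function names are made up for this example, and the
bytes-per-pixel figures assume 24-bit color and 8-bit grayscale, which
is typical but may not match your scanner exactly. Whatever FR saves
in the end will be compressed and usually much smaller.

```python
# Rough size estimate for one uncompressed scanned page.
# Assumption: 24-bit color (3 bytes/pixel) and 8-bit grayscale (1 byte/pixel).
BYTES_PER_PIXEL = {"color": 3, "grayscale": 1}


def scan_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Return the raw size in bytes of one page scanned at `dpi` in `mode`."""
    width_px, height_px = scan_dimensions(width_in, height_in, dpi)
    return width_px * height_px * BYTES_PER_PIXEL[mode]


if __name__ == "__main__":
    # Letter paper is 8.5 x 11 inches.
    for dpi in (300, 600):
        for mode in ("grayscale", "color"):
            size_mb = uncompressed_bytes(8.5, 11, dpi, mode) / 1_000_000
            print(f"Letter page, {dpi} dpi, {mode}: about {size_mb:.1f} MB raw")
```

Doubling the resolution quadruples the pixel count, which is why
600dpi is worth it only for small print or detailed pictures.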

9. Now let's configure a few more things back in the Options menu.
Still in the Scan/Open Image Tab, select the following options:
*Despeckle Image
*Split Dual Pages (optional, more on this in a little bit)
*Detect Image Orientation (during recognition)
*Open Image During Scanning

10. Under the Recognition Tab select the following options:
*Recognition Language: make sure it's set to the language of the
content that you're scanning
*Autodetect Layout
*Clear Background Noise
*Autodetect (print type)
*Do not use user patterns

11. Under the Formatting Tab select the following options:
*Retain Full Page Layout
*Keep Pictures

12. Finally hit OK to exit out of the Options menu. Feel free to
look at any other options and modify them as you wish. Most are self
explanatory, and if not, FR has a great help file (just hit F1), or
you can download an additional FR tutorial from the manufacturer:
download.abbyy.com/content/default.aspx

13. One more thing that needs to be changed: go to Process and
select Start Background Recognition.

14. Before you start scanning, clean your scanner (if it's a
flatbed) with some window cleaning solution, or just soapy water. Use
a window cleaner if possible, or a towel with even swipes, to avoid
leaving streaks. Once your scanner is clean and dry, proceed to step
15.

15. Now then, onto scanning. Position the material on the scanner
and hit the Scan&Read button. You should see a "collaborating
scanner...." pop-up window followed by a ScanGear progress bar (or
your own scanner driver's equivalent). The image should then be
scanned. Wait for the automatic recognition process to finish and
then you can work on the image.

16. Let's look at the image you scanned. You should see a thumbnail
picture of the image on the left-hand menu, a larger picture in the
middle menu, and any recognized text on the right-hand menu. The
middle "image" window is what we'll be looking at next.

17. You need to make some decisions about how you want your finished
scan to look: do you want it OCRed (optical character recognition),
meaning that the words will be converted to text? The upside of
OCRing is that your finished product will be smaller in terms of file
size, it will be searchable for specific words, and it will be easier
to read. The downside is that it takes more time to produce an OCRed
text because it will require at least minimal proofreading of the
text to root out any OCRing mistakes. OCRing is thus recommended if
you're scanning a largely text-only book, have sufficient time on
your hands to proofread the scan, and are not scanning a precise text
that involves important formulas/calculations. If you're scanning a
magazine, comic book, or a scientific text with precise formulae,
OCRing is NOT recommended.

18. In the Image menu, you should see a list of buttons on the
left-hand side. The two we'll be working with are the OCR (text)
button and the Image button. They are the 2nd (the green-bordered T
button) and 4th (the red-bordered mountain button) buttons from the
top, respectively. Briefly, you select text blocks (that will be
OCRed) and image blocks (that won't) and then hit the Read All
button. But there are a few things you need to do first.

19. If you had automatic image splitting enabled and FR didn't split
the scanned images the way you want, or you want to get rid of excess
borders and such, go to Image > Split Image and then select how you
want to split the image. You can then delete portions you don't want
by clicking on the thumbnail image in the left-hand menu, and
pressing delete.

20. If the image is not rotated correctly, go to Image and choose
the needed rotation.

21. Also, if at any time you find that an image scanned badly or you
skipped a page and such, scan the image again; it should now appear
as the last numbered page in the thumbnailed Batch menu. Then, if you
are simply replacing an image, select the image to be replaced in the
thumbnailed Batch menu and delete it. Then highlight (select) the
rescanned image, go to Batch > Renumber Pages..., and with Selected
Pages selected, type in the page number that the original image had,
thus sliding it into place.
If you're inserting a missed image, things are a little trickier.
Find the spot where the image should be and then do the following
(for this example the image should have been #21): select all images
from the current 21 (inclusive, meaning select the current 21) to the
end (non-inclusive, meaning don't select the image that you are going
to be inserting). To do this, click on number 21, hold down shift,
and click on the second-to-last image. Go to Batch > Renumber Pages,
and with All Pages, Continuous Page Renumbering selected, type in 22
for First Renumbered Page. Then repeat the steps for replacing an
image explained in the preceding paragraph.
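
If the renumbering shuffle is hard to picture, the following Python
sketch models it. It has nothing to do with FR's internals: the Batch
is just a list of (page number, label) pairs, the rescanned page is
assumed to sit at the end, and the function name is made up for this
example.

```python
def insert_rescanned_page(batch, target):
    """Move the last page of `batch` (the rescan) to page number `target`.

    `batch` is a list of (page_number, label) pairs in Batch order, with the
    freshly rescanned page last. Pages numbered `target` or higher are
    shifted up by one first, mirroring the "type in 22 for First Renumbered
    Page" step, and then the rescan takes the freed number.
    """
    *existing, (_, rescanned_label) = batch
    renumbered = []
    next_number = target + 1
    for number, label in existing:
        if number >= target:
            # Shift everything from the insertion point onward up by one.
            renumbered.append((next_number, label))
            next_number += 1
        else:
            renumbered.append((number, label))
    # Slide the rescanned page into the gap.
    renumbered.append((target, rescanned_label))
    return sorted(renumbered)


if __name__ == "__main__":
    # 24 scanned pages, then the missed page rescanned as number 25.
    batch = [(n, f"scan {n}") for n in range(1, 25)] + [(25, "missed page")]
    for number, label in insert_rescanned_page(batch, 21)[19:23]:
        print(number, label)
```

The old page 21 ends up as 22, everything after it moves up by one,
and the rescan lands at 21.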

22. Now that you have your images scanned/fitted correctly, back to
the middle Image menu we go. If you're not happy with the
fields/boxes auto recognition selected for you, you can click on that
box and just delete it. Then select portions that you want OCRed (if
any), and the images. After you have done this for all scanned images
hit Read All. Note that after you experiment with a few sample pages,
you can select Scan&Read Multiple Images from the Scan&Read dropdown
menu (this will save you the trouble of hitting the same button for
every scan).

23. Once your new recognition has finished, if you have only chosen
to recognize images (no OCRing) you are ready to save.

24. Click the Save button. If you want to save all your images as
one PDF file (FR has a built-in PDF printer driver, so no need to
install any additional software), click on Formats Settings... and go
to the PDF Tab. Flirt with the Save Mode options, by saving only a
page or two of your scan and seeing if you're satisfied with how it
looks in the created PDF document. Text and Pictures Only will save
only the pictures you recognized (recommended), while Page Image
saves the original, unedited image seen as a thumbnail in the left-
hand Batch menu.

25. Under Font Use Mode, keep the default Use Standard Fonts option,
and under Reduce Picture Resolution To and JPEG Quality, experiment
with amounts to balance the total file size (the higher
resolution/quality the larger the file size) with image quality. You
may want to create two versions of your scan, one with smaller file
size and slightly worse quality, and one with a larger size and
better quality. Regardless of different sizes/recognition ratios, any
of your versions should be readable without eyestrain. After closing
Formats Settings, click Save to File (keeping the Keep Pictures box
checked), select PDF, and keep the default save options unless
there's something you want to change (all the save options are self
explanatory so I won't go into them here).
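
As a rough guide while experimenting with Reduce Picture Resolution
To, the Python sketch below shows how much picture data is left after
downsampling. The function name is made up for this example, and it
only counts pixels; the actual PDF size also depends on JPEG Quality
and on the picture content, so treat the numbers as a ballpark.

```python
def pixel_fraction(original_dpi, target_dpi):
    """Return the fraction of pixels kept when reducing picture resolution.

    Pixel count scales with the square of the resolution. A target at or
    above the original keeps everything (reducing never adds pixels).
    """
    if target_dpi >= original_dpi:
        return 1.0
    return (target_dpi / original_dpi) ** 2


if __name__ == "__main__":
    for target in (300, 200, 150):
        kept = pixel_fraction(600, target)
        print(f"600 dpi -> {target} dpi keeps about {kept:.0%} of the pixels")
```

Halving the resolution leaves a quarter of the pixels, so a small
step down in resolution can buy a large drop in file size.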

26. Once you are satisfied with your PDF, you are done.

27. If however, your scan involves text that you felt like OCRing,
there is some more work that you will have to do.

28. Once you have selected all the text/image portions and clicked
Read All (as per step 22), you will now need to edit/format the
scanned material. This is best done in a word processing program
rather than FR.

29. Select the Save button, and click on Formats Settings. Under the
DOC/RTF/Word XML Tab select the following options:
*Default Paper Size: Letter (the 'automatically increase paper size'
feature usually does not matter, if you start getting irregular paper
sizes, by all means uncheck it)
*Make sure that everything else is unchecked, save for Retain Text
Color and Save in Word 97 or Later Format (both are default options)

30. Back in the main save menu be sure to select either retain font
and font size or remove all formatting in the retain layout section.
Keeping the default radio button, retain full page layout, selected
will result in restricted margins, awkward page breaks and other
annoyances when you are editing the file.

31. Now save the scan as either doc or rtf (both can be opened using
Microsoft Word, the free WordPad, or the free office suite
OpenOffice: openoffice.org ). You will now have to
proofread/format the scan. Some basic things to do include:
*Read over the text and run a spell check to catch any spelling
mistakes, as well as any false positives, that is, words that when
OCRed form real words, just not the correct contextual words, for
instance "mom" instead of "morn".
*Cut/Paste misplaced pictures/captions/titles. While the advantages
of saving without the formatting feature in FR are many, one of the
disadvantages is that graphics often get shifted from their correct
order, sometimes requiring that you look at the original paper
(treeware) document to see where they belong.
*Set desired spacing. Various spacing issues may need to be fixed as
well; these include (but are not limited to) paragraph indentation,
spacing between chapters, and removing the '-' mark that may have
split words across lines in the original treeware version.
*Renumber the Table of Contents (TOC). If your document included
a table of contents, you may want to change the page numbers to
match your scanned version.
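
Two of the chores above can be partly scripted if you save a plain
text copy of your scan. The Python sketch below is only an
illustration, not anything built into FR or Word: both function names
are made up, and the word list is a tiny stand-in for a real
dictionary file that you would have to supply yourself. The first
function joins words split by a '-' at a line end; the second flags
words where swapping "m" and "rn" gives another real word, the kind of
mistake a spell checker will not catch.

```python
import re


def join_hyphenated(text):
    """Join words split by a hyphen at a line end: 'scan-\\nning' -> 'scanning'.

    Caution: real hyphenated words that happen to break at a line end
    (e.g. 'left-\\nhand') get joined too, so proofread the result.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def rn_m_suspects(text, known_words):
    """Return (word, alternative) pairs where swapping 'm' and 'rn' in a
    word from `text` produces a different word found in `known_words`."""
    suspects = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        candidates = set()
        for i, ch in enumerate(lower):
            if ch == "m":
                candidates.add(lower[:i] + "rn" + lower[i + 1:])
        for i in range(len(lower) - 1):
            if lower[i:i + 2] == "rn":
                candidates.add(lower[:i] + "m" + lower[i + 2:])
        for alternative in sorted(candidates):
            if alternative in known_words:
                suspects.append((word, alternative))
    return suspects


if __name__ == "__main__":
    # Stand-in word list; a real run would load a proper dictionary.
    words = {"mom", "morn", "the", "was", "cold"}
    print(join_hyphenated("the scan-\nning process"))
    print(rn_m_suspects("The mom was cold", words))
```

Anything the second function flags still needs a human to decide
which word the original page actually had.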

32. Finally you are ready to save your work and release it to the
public. You may save as rtf, which is a popular format similar to
doc, except that it is much more versatile and does not
require special software, while at the same time allowing formatting
features (unlike pure txt files). The downside is that rtf files are
usually a bit larger than doc files. If you wish to save in pdf
format you will need to get a pdf printer driver such as Fineprint
PDF Factory PRO (check out dizzie.serein.us/serials.htm for tips on
finding serial numbers).

33. Also remember that even if you scanned your document using
another program, and now just have images of pages, you can easily
import them into FR and OCR/format/save as pdf (basically any of the
aforementioned steps). To import images go to File > Open Image and
select the images you want to import (hold down ctrl or shift and
select more than one image). If a pop-up window appears asking about
resizing, select Leave Original. The images should now appear in the
left-hand Batch menu.
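
When importing a pile of page images, it helps to check beforehand
that the files sort in true page order, since plain alphabetical
sorting puts page10 before page2. The Python sketch below sorts
filenames "naturally" so you can see the order you should end up with
in the Batch menu. The function name and filenames are made up for
this example, and whether FR itself keeps any particular order on
import is something to verify on a few test pages.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as numbers, not as text."""
    # re.split with a capturing group alternates text and digit pieces,
    # so text is only ever compared with text and numbers with numbers.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


if __name__ == "__main__":
    files = ["page10.tif", "page2.tif", "page1.tif", "cover.tif"]
    for name in sorted(files, key=natural_key):
        print(name)
```

If the printed order is wrong for your files, renaming them with
zero-padded numbers (page001, page002, and so on) avoids the problem
in any program.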

Well, I should wrap this up; this guide has gotten a tad bit longer
than I intended. As you will have doubtless realized by now, FR is a
very powerful tool with a vast array of features. To give a final
summary, a basic process of creating an e-document involves: 1)
scanning the document and 2) editing/formatting/proofing/saving the
document. Obviously everything could not be covered in this guide; if
you have a question about something, look through the official FR
help file, and if you still can't find an answer feel free to drop me
a line.

-
Comments? Get in touch: xcon0 @t yahoo \/d0t/\ c||o|m
(or call +1 (610) 887-6072)

For more knowledge check out www.rorta.net and www.dizzy.ws