Myths And Realities
You get what you pay for: Not really. Some OCR packages are expensive yet perform poorly, while some cheaper alternatives can be much better.
You can expect to get a page scanned with almost no errors: Despite technological advancements, there is still no such thing as perfect OCR software. This is why you still need to check and proofread an OCRed document.
Handwriting cannot be OCRed: Thanks to advances in character recognition technology, many OCR packages can now recognise handwriting in a variety of languages, though accuracy is generally lower than for printed text.
Questions To Ask
What is the minimum and maximum character size that it will recognize?
Look for OCR software that can correctly recognise font sizes as small or as large as you need. Typically, you should look for support for sizes from 5 to 72 points.
Does it have a built-in spell-checker?
Some OCR packages have this feature, letting you spell-check a document immediately after OCRing it. You can then correct the errors and save the document in your preferred format.
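To make the idea concrete, here is a minimal Python sketch of how a post-OCR spell-check might flag suspect words and propose corrections. It is an illustration only, not how any particular product works: the tiny word list, the function names and the similarity cutoff are all assumptions made for this example.

```python
import difflib

# Hypothetical mini-dictionary; a real spell-checker ships a full word list.
DICTIONARY = ["character", "document", "recognition", "scanner", "software"]

def suggest(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the word if it is known, else the closest dictionary entry (or None)."""
    lowered = word.lower()
    if lowered in dictionary:
        return lowered
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def flag_suspects(text):
    """Map each unknown word in the OCR output to its suggested correction."""
    return {w: suggest(w) for w in text.split() if w.lower() not in DICTIONARY}
```

Calling flag_suspects on a line of OCR output gives you a short list of words to review, rather than forcing you to reread the whole page.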
Can it recognise compound documents with text and graphics?
Separating text and graphics in compound documents such as PowerPoint presentations can be a headache for OCR software. Look for software that can recognise and separate the text while retaining the images in the final document.
How many OCR engines does it use?
Some OCR software supports more than one OCR engine. Engines divide a page into zones; the more zones there are, the slower the scanning, but the higher the accuracy.
Usage Tips
· Cleaning the glass scanner bed and the paper to be scanned will make a great deal of difference in the quality and accuracy of the OCRed document.
· Training the OCR software before you begin using it will increase its accuracy, so you’ll get fewer errors.
· If it can’t recognise a complex document, divide the page into parts and scan the text parts separately. You can later integrate the text results with the images into a single document.
· Explore the options; this can help. Sometimes you need to specify how to handle different fonts, which font the document is in, whether to process pictures, and so on.
· Sometimes, you’ll need to scan documents somewhere OCR is unavailable, and your only option is to bring the scanned images back to your PC. Many OCR packages can accept various image formats as input and extract text from them. This is a bonus worth looking for.
What To Look For
Learning mode
Many OCR packages have a learning mode, in which the software learns from the corrections you made where it had erred earlier. It thus adapts itself so as not to repeat its mistakes.
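As a rough illustration of the principle, the Python sketch below remembers the corrections a user has made and applies them to later output. This is a deliberate simplification and an assumption on our part: the class and method names are hypothetical, and real products may adapt in other ways, for example by learning character shapes rather than whole words.

```python
class CorrectionMemory:
    """Remembers user corrections and reapplies them to later OCR output."""

    def __init__(self):
        self._fixes = {}  # misread word -> corrected word

    def learn(self, wrong, right):
        """Record that the OCR output `wrong` should have been `right`."""
        self._fixes[wrong] = right

    def apply(self, text):
        """Replace every previously corrected word in new OCR output."""
        return " ".join(self._fixes.get(word, word) for word in text.split())
```

For instance, after memory.learn("rnodern", "modern"), the same misreading in a later page is corrected automatically by memory.apply.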
Language support
If your office deals with foreign languages, check how many languages the OCR software supports. Almost all OCR software can recognise more than one language, though most of the supported languages other than English are European; you might not see support for Indian languages as yet. Even if a language is not supported, some OCR software lets you add language dictionaries, thus increasing the number of supported languages. In that case, check whether a pack for your language is available.
Saving formats supported
Most OCR software can save documents in MS Word format. Apart from this, look for support for saving in other formats such as PDF, HTML, and MS Excel.
Batch-processing support
This feature is very useful when there are a large number of documents to be scanned. With it, you can stack the documents in the automatic document feeder, or ADF (a scanner with an ADF is necessary, of course), and OCR all of them instead of sitting in front of the scanner, feeding in one sheet at a time.
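The same idea applies to a folder of images you have already scanned. The Python sketch below shows one way a batch run might be organised; the recognise argument is a placeholder for whatever OCR call your software exposes, and the list of image suffixes is an assumption chosen for this example.

```python
from pathlib import Path

# Assumed set of image types; adjust to what your OCR software accepts.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}

def batch_ocr(folder, recognise):
    """Run `recognise` on every image in `folder`, returning {filename: text}.

    `recognise` is a hypothetical stand-in for a real OCR call: it takes a
    file path and returns the recognised text.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip anything that is not a scanned image (notes, logs, and so on).
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = recognise(path)
    return results
```

Files are processed in name order, so numbering your scans (page001, page002, and so on) keeps the results in page order.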