Offices are increasingly digitising their paper documents. OCR (Optical Character Recognition) software is the tool that enables this, and an MFD makes the process convenient. Working with the scanner, OCR software recognises the characters in the scanned image of the hard copy and converts them into a document format that a word processor can open. The document can then be edited if necessary.
It is sometimes necessary to convert the document to PDF, because this file format is platform-independent and does not require any fonts to be installed: a PDF looks the same on any computer that can read the format. This makes PDF ideal for exchanging and distributing electronic documents. We shall see how to create both a DOC file and a PDF from a hard copy using an MFD.
The OCR software we’ll be using is SimpleOCR, because it’s free! SimpleOCR has many of the features of paid software: a 1,20,000-word dictionary, error correction, format and image retention, plain-text extraction, batch and zone OCR, and more. It is a 9.3 MB download, available from www.charactell.com/scanstore/InstSocr.exe.
To convert a scanned document to PDF, you need a PDF converter. You can use CutePDF Writer, formerly known as CutePDF Printer. CutePDF Writer installs itself as a printer subsystem, so creating a PDF is as easy as printing to a printer. This means any Windows application that can print can also output PDF files. CutePDF Writer is free too, and is available from www.cutepdf.com/download/CuteWriter.exe. You may also want to install CutePDF Writer Companion, available from www.cutepdf.com/download/CuteComp.exe. CutePDF Writer needs a PS-to-PDF (PostScript-to-PDF) converter such as Ghostscript, which is available from www.cutepdf.com/download/converter.exe.
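The PS-to-PDF converter does the actual work of turning PostScript output into a PDF. The following Python sketch shows roughly what such a conversion looks like when Ghostscript is run by hand. It assumes Ghostscript is installed and on the PATH. The function names are hypothetical and describe no part of CutePDF's internals. The Ghostscript options themselves (`-dBATCH`, `-dNOPAUSE`, `-dSAFER`, `-sDEVICE=pdfwrite`, `-sOutputFile=`) are standard.

```python
import shutil
import subprocess


def find_ghostscript():
    """Return the path of the Ghostscript console binary, or None if absent.

    Ghostscript is usually called "gs" on Linux/macOS and
    "gswin64c" or "gswin32c" on Windows.
    """
    for name in ("gs", "gswin64c", "gswin32c"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_ps2pdf_command(gs, ps_file, pdf_file):
    """Build the argument list that asks Ghostscript's pdfwrite device
    to convert a PostScript file into a PDF (hypothetical helper)."""
    return [
        gs,
        "-dBATCH",    # exit once the input is processed instead of prompting
        "-dNOPAUSE",  # do not wait for a keypress after each page
        "-dSAFER",    # restrict file access from within the PostScript
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={pdf_file}",
        str(ps_file),
    ]


def ps_to_pdf(ps_file, pdf_file):
    """Convert ps_file to pdf_file; raises if Ghostscript is missing or fails."""
    gs = find_ghostscript()
    if gs is None:
        raise FileNotFoundError("Ghostscript was not found on the PATH")
    subprocess.run(build_ps2pdf_command(gs, ps_file, pdf_file), check=True)
```

For example, `ps_to_pdf("document.ps", "document.pdf")` would convert a PostScript file saved from a "print to file" job (the file names here are hypothetical). CutePDF Writer automates this step, so you never see the intermediate PostScript file.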
Using A Laser MFD You Can…
Send faxes directly from your computer without having to print the document and feed it manually into the fax machine.
Scan photos and e-mail them at the press of a button (on models that support the scan-to-e-mail feature).
Scan multiple documents in batch-processing mode, and OCR them as well.
Save received faxes on your computer and print only the ones you need.
Use it as a copier without connecting it to a PC, just by pressing the Copy button.
Switch automatically between fax and phone: if you have a single line for both phone calls and faxes, some MFDs switch to the fax function when a fax comes in.
How To Go About It
1. Install all the above software. Place your document on the flatbed glass of the MFD. Launch SimpleOCR.
2. You will be prompted to choose the type of document you want to read. The options are Machine Print and Hand Writing. With Machine Print recognition, you can only scan and OCR printed documents. With Hand Writing recognition, you can also scan and OCR handwritten documents, because a different and more powerful OCR engine is used. The Machine Print recognition engine is free, while the Hand Writing recognition engine can be evaluated for 14 days (it costs $60, or Rs 2,700). Choose the type of document and proceed.
The SimpleOCR welcome screen
Choose the type of document to scan
3. On the next screen, you can choose the language. SimpleOCR recognises four languages, including US English and UK English. Choose the one you need and click Select.
4. You need to select the appropriate scanner for the software to use. Go to File > Select Scanner. A window listing the scanners on your system will appear. Choose the correct one and click OK.
Select the appropriate scanner
5. Choose the scanner settings and start the scan.
Choose the scanner settings
6. The document will now be scanned, and it will appear in the window. To begin the OCR process, click “Convert to text.”
The Fax Features Of An MFD
Most MFDs come with a fax feature. They can send as well as receive faxes without the need for a PC. The number of pages an MFD can store depends on its fax memory. Some MFDs can provide a detailed report on the number of faxes sent and received, and on the numbers they were sent to and received from. This is useful for system administrators. The “junk fax barrier” feature in some MFDs lets you list fax numbers that you think are likely to send junk faxes, and faxes from these numbers are then blocked. The Broadcast feature lets you send a fax to several recipients at once. The Delayed Fax (or Scheduled Fax) feature lets you prepare a fax now and have it sent at a later time.
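The junk fax barrier, Broadcast and Delayed Fax features described in the box can be modelled in a few lines. The following Python sketch is a hypothetical illustration of the logic only, not any MFD's actual firmware. The class and method names are invented. It assumes the sender's number is available to the MFD (for example, through caller ID) and that two numbers count as the same once spaces and dashes are removed.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime


def normalise(number):
    """Keep only the digits, so "022-1234 5678" and "02212345678" match."""
    return re.sub(r"\D", "", number)


@dataclass
class FaxJob:
    recipient: str
    document: str
    send_at: datetime  # a delayed (scheduled) fax is not sent before this time


@dataclass
class FaxSettings:
    blocked: set = field(default_factory=set)  # the "junk fax barrier" list
    outbox: list = field(default_factory=list)

    def block(self, number):
        """Add a number to the junk fax barrier list."""
        self.blocked.add(normalise(number))

    def accept_incoming(self, sender_number):
        """Junk fax barrier: refuse faxes from numbers on the blocked list."""
        return normalise(sender_number) not in self.blocked

    def broadcast(self, document, recipients, send_at):
        """Broadcast: queue one copy of the same fax for every recipient."""
        for number in recipients:
            self.outbox.append(FaxJob(normalise(number), document, send_at))

    def due(self, now):
        """Return the queued jobs whose scheduled time has arrived."""
        return [job for job in self.outbox if job.send_at <= now]
```

In this model, a broadcast is simply one queued job per recipient. A delayed fax is a job with a future `send_at` that `due()` ignores until that time arrives.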
7. You can now click Accept to accept a word if it is correctly recognised, or you may correct it. You can also click Decide Later to, well, decide about a word later, or Keep As Image if an image was misinterpreted as text.
Convert the scanned document to text
8. Go to File > Save As and save the file as a TXT or DOC file. If you wish to save only the text, choose TXT. If you also want to keep the formatting and images of the original document, choose DOC. You can also save the scan as an image by choosing File > Save Image As. You can save it as a TIFF image, but doing so defeats the purpose of the OCR software!
Finally, close SimpleOCR.
We’ve now converted the hard copy to the electronic format; let’s now convert the electronic document to PDF.
1. In your word processor, open the DOC file you just created from the scan.
2. Go to File > Print and select CutePDF Writer from the list of printers. You can change the paper size, orientation and other settings as required, and adjust the CutePDF Writer options to suit your preferences. Click OK.
Select CutePDF as the printer device
3. Click Print. The CutePDF Writer Companion will launch; here, click Save to save the document as a PDF.
The CutePDF Writer Companion
4. Remember that, before creating the PDF, you can also use the Companion to insert or delete pages, edit document properties, combine PDFs and so on.
While SimpleOCR can get most of the job done, it is not as powerful or accurate as the better OCR applications available. Its weak spots are retaining formatting, recognising images as images, and coping with stray marks on the paper, which can throw the software off.
But then, it’s free!