While hardware such as the CCD (charge-coupled device) in the scanner senses the light reflected from the scanned document, it’s document-imaging software, such as scanner software, that does most of the work. The scanner software saves the captured red-green-blue light information in a standard graphic format such as JPEG.
Text in Document Images
The saved JPEG image might contain text characters (in a business context, most of the scanned documents would be text documents). These text characters are human-readable but not yet computer-readable.
Unless the text is computer-readable, you can’t edit the document or add data to it. If you want to use the text document in your workflow, you will most likely need to convert it into a machine-readable text format such as ASCII.
Character recognition software, such as Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) programs, performs this task. It makes the text characters machine-readable.
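As a rough illustration of what "making characters machine-readable" means, the sketch below uses template matching, one classical character recognition technique, on toy 3x5 bitmaps. The glyph templates and the names TEMPLATES, mismatch, and recognize are made up for this example; commercial OCR and ICR programs use far more sophisticated methods.

```python
# Toy 3x5 glyph templates ('#' = ink, '.' = paper). Hypothetical data for illustration.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def mismatch(glyph, template):
    """Count the pixels where a scanned glyph differs from a template."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize(glyph):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: mismatch(glyph, TEMPLATES[ch]))

# Three glyphs as they might come out of a scanned image.
scanned = [
    ["#..", "#..", "#..", "#..", "###"],  # a clean L
    ["###", ".#.", ".#.", "...", "###"],  # an I with one pixel of ink missing
    ["###", ".#.", ".#.", ".#.", ".#."],  # a clean T
]

text = "".join(recognize(g) for g in scanned)
print(text)                  # LIT
print(text.encode("ascii"))  # b'LIT'
```

Once the glyphs have been mapped to character codes, the result is ordinary text that can be edited, searched, and stored like any other text file.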
Making Documents Searchable
The machine-readable document is not yet an integral part of the enterprise content. To make it one, the document needs to be indexed and its metadata (also known as document properties) included in the relevant index file.
Again, it’s a piece of software, the indexing software, that does the job. It links the document to the words in its content (full-text indexing) or to the words in the tags attached to the document (tag indexing, or document-property indexing). It then becomes possible to retrieve the document by searching for any of the linked words.
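Full-text indexing can be pictured as an inverted index: a table that maps each word to the documents containing it. The following is a minimal sketch under simple assumptions (lowercased words, no stemming or stop words); the function names and sample documents are hypothetical, and real indexing software does considerably more.

```python
import re
from collections import defaultdict

def build_full_text_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical recognized text of two scanned documents.
documents = {
    "invoice-001": "Invoice for scanner maintenance, March",
    "contract-007": "Maintenance contract for the document management system",
}

index = build_full_text_index(documents)
print(sorted(search(index, "maintenance")))          # ['contract-007', 'invoice-001']
print(sorted(search(index, "maintenance invoice")))  # ['invoice-001']
```

Because the index is built from the recognized text alone, this step needs no manual input.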
Indexing software can even work automatically if the selected option is full-text indexing. For tag-based indexing, you have to attach relevant tags (usually words that people use to look for the document) to the document.
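Tag-based indexing works the same way, except that the words come from a person rather than from the document content. The sketch below assumes a hypothetical TagIndex class and made-up file names; it only illustrates the idea of looking documents up by the tags attached to them.

```python
from collections import defaultdict

class TagIndex:
    """Look up documents by manually attached tags (case-insensitive)."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def tag(self, doc_id, *tags):
        """Attach one or more tags to a document."""
        for t in tags:
            self._docs_by_tag[t.strip().lower()].add(doc_id)

    def find(self, tag):
        """Return the sorted IDs of documents carrying the given tag."""
        return sorted(self._docs_by_tag.get(tag.strip().lower(), set()))

tags = TagIndex()
tags.tag("scan-0042.jpg", "Invoice", "2023", "Acme")
tags.tag("scan-0043.jpg", "contract", "Acme")

print(tags.find("acme"))     # ['scan-0042.jpg', 'scan-0043.jpg']
print(tags.find("invoice"))  # ['scan-0042.jpg']
```

Tag lookup works even for images that contain no recognizable text, which is one reason to choose tags that people would actually search for.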
Fine-Tuning Images and Character Recognition
The original scanned images might not be all that good. They might contain illegible characters from poor contrast in the paper document, unsightly black borders, distorted characters from folds in the paper, and so on.
Here again, document imaging software can help. Programs with sophisticated algorithms can produce images that are cleaner and more legible than the original paper document.
Character recognition can also face problems. For example, similar-looking characters (such as the letter O and the digit 0) might be confused with each other, producing unreliable text documents. OCR programs are now available that can handle such problems and carry out highly reliable character recognition.
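Two of the clean-up steps mentioned above, removing black borders and improving poor contrast, can be sketched on a tiny grayscale image stored as a list of rows (0 = black, 255 = white). The function names, the threshold of 40, and the sample pixel values are assumptions chosen for illustration; real imaging software works on full-size images with more robust algorithms.

```python
def crop_dark_border(image, threshold=40):
    """Trim outer rows and columns whose pixels are all darker than the threshold."""
    keep_rows = [i for i, row in enumerate(image) if any(p >= threshold for p in row)]
    if not keep_rows:
        return []  # the whole image is dark
    top, bottom = keep_rows[0], keep_rows[-1]
    keep_cols = [
        j for j in range(len(image[0]))
        if any(image[i][j] >= threshold for i in range(top, bottom + 1))
    ]
    left, right = keep_cols[0], keep_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]

def stretch_contrast(image):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

# A washed-out scan (grey ink at 120 on paper at 200) surrounded by a black border.
scan = [
    [0,   0,   0,   0, 0],
    [0, 200, 120, 200, 0],
    [0, 200, 120, 200, 0],
    [0,   0,   0,   0, 0],
]

cropped = crop_dark_border(scan)
print(cropped)                    # [[200, 120, 200], [200, 120, 200]]
print(stretch_contrast(cropped))  # [[255, 0, 255], [255, 0, 255]]
```

The border is cropped first so that its black pixels do not influence the contrast stretch.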
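One simple way to reduce confusion between look-alike characters is to use context: a token that is mostly digits probably should not contain a letter O, and a token that is mostly letters probably should not contain a digit 1. The sketch below implements only this heuristic, with a small hand-picked confusion table; it is an assumption for illustration and not how any particular OCR product works. Real programs typically rely on dictionaries and statistical language models.

```python
import re

# Hand-picked look-alike pairs (an illustrative, incomplete table).
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})
TO_LETTER = str.maketrans({"0": "O", "1": "l", "5": "S", "8": "B"})

def fix_token(token):
    """Make a token consistently numeric or alphabetic, whichever already dominates."""
    digits = sum(c.isdigit() for c in token)
    letters = sum(c.isalpha() for c in token)
    if digits > letters:
        return token.translate(TO_DIGIT)
    if letters > digits:
        return token.translate(TO_LETTER)
    return token  # evenly split: too ambiguous to change

def fix_confusions(text):
    """Apply fix_token to every alphanumeric run in the text."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: fix_token(m.group()), text)

print(fix_confusions("Invoice 2O23 tota1 1O50"))  # Invoice 2023 total 1050
```

A heuristic like this can also introduce errors (for example in part numbers that legitimately mix letters and digits), so corrected text is usually still worth spot-checking.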
Selection of Software
There are some issues you should be aware of when selecting document imaging software. The selected programs must be compatible with your existing systems: scanners, document management system, operating system, and so on.
Most software has been developed to be compatible with well-known scanner brands and major document management systems. You only have to check that the software you buy will work with your systems. Backward compatibility with earlier versions of the software can also be an issue if you have document images created with those earlier versions.
Some document management software packages come complete with a document-imaging component, and you may save money by purchasing the package as a whole, rather than purchasing each component separately.
Cost is another criterion. Going for unnecessarily sophisticated features can push up your costs. Assess your needs and select only what you need, now and in the near future.
It’s document imaging software that does most of the work related to document imaging, such as saving the image in a standard format, making text characters in the image machine-readable, and indexing the documents to integrate them with the enterprise content.
You should select software that is compatible with your existing systems.