Scanning paper documents into digital images is what most of us think of when we hear the term digital document imaging. However, digital document imaging involves additional processing before the image becomes useful content.
While the scanner reads light information from the document or object being scanned, the scanner software saves this information in a graphic format like JPEG. This is the first processing task.
Often, the image quality leaves something to be desired. There might be punch-hole marks, black borders, illegible characters owing to poor contrast (such as blue ink characters on a bluish background), and so on.
Sophisticated scanning software can remove undesired elements from the digital image and adjust contrast to provide legible characters (often better than on the original). Improving image quality can be the second processing task.
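As a rough illustration of the contrast adjustment mentioned above, here is a minimal sketch of linear contrast stretching. It assumes the scanned page is already available as a grid of grayscale values from 0 to 255; the function name `stretch_contrast` and the sample data are made up for this example, and real scanning software uses far more elaborate methods.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest pixel
    becomes 0 and the brightest becomes 255."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in pixels]


# Hypothetical low-contrast patch: faint ink on a similar background.
faint = [[100, 110],
         [120, 130]]
print(stretch_contrast(faint))  # [[0, 85], [170, 255]]
```

The values that were squeezed into a narrow band (100 to 130) are spread over the full range, which is what makes faint characters easier to read.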
Where the digital image contains text characters, they will be in a format that’s not readable by the computer. Text characters need to be converted into ASCII or another text-specific format before machines can read them.
Such conversion is done by character recognition software using OCR (Optical Character Recognition) or ICR (Intelligent Character Recognition) technologies. Here again, more sophisticated software distinguishes between closely related characters and produces the right output. This is the next processing task for text documents.
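To give a feel for what distinguishing between closely related characters can mean, here is a small sketch of one possible clean-up heuristic applied to recognized text. It is an assumption made for illustration, not how any particular OCR or ICR product works: the lookup tables and the names `fix_token` and `fix_text` are hypothetical. The idea is that a word made mostly of digits probably contains no letters, and vice versa.

```python
# Hypothetical tables of look-alike characters (illustrative only).
TO_DIGIT = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
TO_LETTER = {"0": "O", "1": "l", "5": "S", "8": "B"}


def fix_token(token):
    """Resolve look-alike characters using the rest of the word as context."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits > letters:
        # Mostly digits: treat stray look-alike letters as digits.
        return "".join(TO_DIGIT.get(ch, ch) for ch in token)
    if letters > digits:
        # Mostly letters: treat stray look-alike digits as letters.
        return "".join(TO_LETTER.get(ch, ch) for ch in token)
    return token  # Ambiguous: leave unchanged.


def fix_text(text):
    """Apply fix_token to every whitespace-separated word."""
    return " ".join(fix_token(word) for word in text.split())


print(fix_text("T0TAL 1O5"))  # TOTAL 105
```

A rule this simple will make mistakes on mixed codes such as part numbers, which is one reason commercial software relies on dictionaries and more context.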
Paper-based text documents need to be scanned and their characters recognized before they can become proper digital text documents. These processes might require advanced capabilities to provide consistently high-quality images and accurate character recognition.
Even though the text in the document is editable now, the document cannot yet be part of the electronic workflow or an integral part of the enterprise content management system. To achieve this status, one more processing step is needed: indexing.
Indexing involves associating words from the document content, or tags describing the document, with the document itself. The document can then be retrieved based on those words. Some refer to these words as document properties.
You enter the words into a search box, and the search facility brings up a list of documents that have been indexed with those words. You can then select the particular document you want from the list.
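The indexing and retrieval steps described above can be sketched with a simple inverted index, a mapping from each word to the documents that contain it. This is only a minimal illustration under assumed names: `build_index`, `search`, and the sample file names are hypothetical, and a real content management system would add tags, ranking, and access control.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical recognized text from three scanned documents.
documents = {
    "invoice-001.txt": "Invoice for scanner maintenance",
    "memo-17.txt": "Memo about scanner purchase",
    "invoice-002.txt": "Invoice for paper supplies",
}
index = build_index(documents)
print(search(index, "invoice"))          # ['invoice-001.txt', 'invoice-002.txt']
print(search(index, "scanner invoice"))  # ['invoice-001.txt']
```

A document that was never passed through `build_index` cannot be found by `search`, even though its file exists, which mirrors the point that unindexed documents are not retrievable.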
It’s at this stage that the document can be said to be a true part of the enterprise content management system. Until then, the document might not be retrievable even though it may reside on the computer storage media.
If it’s not retrievable, it cannot become part of the workflow processes.
Once the document has become part of the enterprise content, it can be made available on the intranet and extranet. People in the organization, whether they work locally or in a geographically distant location, can access it if they have the necessary access rights.
No large enterprise of today can stay competitive if it tries to manage information flows using traditional paper/folder/filing cabinet/file room retrieval methods.
Digital document imaging means more than scanning. Scanned images have to be saved in a recognized graphic format. Any text in the image has to be made machine-readable using character recognition technologies. The quality of the image and the accuracy of the character recognition might need improvement using sophisticated processing software. The text documents then need to be indexed before they become an integral part of the enterprise workflow.