Many of our customers face a daily challenge with document classification. It may be documents (files) sitting on shared drives whose contents nobody knows. It may be data files submitted to a central archive without a sufficient description. It may arise when migrating documents and data to new systems, or in many other scenarios. Cleaning up situations like these is a mature discipline, and we have previously written about it (see our earlier articles). This post deals with another approach.
One of our customers wants to significantly reduce their cost of archiving. The document classification requirement is basically that every document uploaded to the archive database has a classification attached, representing the origin of the document (main process and subprocess) as well as a retention time. Relatively few pieces of information are needed for the document to be saved correctly, and it can then be retrieved (with some difficulty). In this case, the end goal is accurate storage more than easy and fast retrieval.
The discussion was about how we can extract information from the submitted documents with an automated solution. As the articles referred to above show, there are really good solutions for this, and many of them are deeply impressive (… an article about the use of artificial intelligence is on its way …), but whatever the method, there is a risk of errors.
During the discussion we found that, in the typical scenario, all the information we need to determine the final classification is available at the moment the document is created. Therefore, if we can go back to the birth of the document and collect the information there, the risk of error is significantly reduced compared to the situation where we receive the document for classification out of context, later in its lifecycle. Now we need to go out and talk to the business, identify the places where the information should be collected, and find a format for collecting it. Some cases will definitely allow automated collection, but in other cases we will probably need to rely on an archiving form being filled in. But it will be better and cheaper – no doubt.
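To make the idea concrete, here is a minimal sketch of capturing the classification at creation time. It is only an illustration under our own assumptions: the field names (`main_process`, `subprocess`, `retention_years`), the helper functions, and the choice to express retention in whole years are all hypothetical and not taken from the customer's system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Mapping

# Hypothetical field names: the few pieces of information the archive needs.
REQUIRED_FIELDS = ("main_process", "subprocess", "retention_years")


@dataclass(frozen=True)
class ArchiveClassification:
    """Classification attached to a document when it is created."""
    main_process: str
    subprocess: str
    retention_years: int  # assumption: retention time expressed in whole years
    created_on: date

    def retain_until(self) -> date:
        """Date until which the document must be kept."""
        target_year = self.created_on.year + self.retention_years
        try:
            return self.created_on.replace(year=target_year)
        except ValueError:
            # 29 February in a non-leap target year: fall back to 28 February.
            return self.created_on.replace(year=target_year, day=28)


def missing_fields(context: Mapping[str, object]) -> List[str]:
    """List the required fields that the creating process did not supply."""
    return [f for f in REQUIRED_FIELDS if context.get(f) in (None, "")]


def classify_at_creation(context: Mapping[str, object],
                         created_on: date) -> ArchiveClassification:
    """Build the classification from the context known at document birth.

    Raises ValueError when information is missing, which is the point where
    an archiving form would have to be filled in instead.
    """
    missing = missing_fields(context)
    if missing:
        raise ValueError("archiving form needed for: " + ", ".join(missing))
    return ArchiveClassification(
        main_process=str(context["main_process"]),
        subprocess=str(context["subprocess"]),
        retention_years=int(context["retention_years"]),
        created_on=created_on,
    )
```

A creating system that already knows its process context could call `classify_at_creation` directly. Where the call fails, the list from `missing_fields` tells us exactly which questions the archiving form has to ask.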
Looking at the ISO standard for records management (ISO 15489), one is also convinced that this is the better practice in every respect. We should therefore often consider going upstream in the document process / lifecycle to get our information, rather than trying later to recreate it from documents taken out of context.