Archive or Migrate?

Moving documents from an existing system to a new home need not be treated as one uniform task. By distinguishing between two different ways of moving documents, you can approach the work in a more nuanced way and identify where to put your efforts.
A customer case demonstrates the point and the huge savings involved.  

Why Differentiate?

We will discuss and compare two different ways of moving documents. For lack of better terms, we will refer to them as migration and archiving. The main difference is that the documents end up in two different states and are prepared for two different types of use. The effort put into moving a particular document should be focused on preparing it for the way it will be used in the future. As the customer case below shows, this approach can significantly reduce the workload of the document move.

Active and Inactive Documents

The first step in the differentiation is to distinguish between active and inactive documents.  

Inactive indicates that the documents are in their final form and, therefore, no longer need to be modified. Active documents, on the other hand, are alive, used and perhaps even evolving.
An active document could be a procedure currently in use, or a report or drawing from an ongoing project. Examples of inactive documents could be a procedure that is no longer applicable but must be kept for regulatory reasons, or a contract from a closed project.

In the article “Digital Archive – What and why”, we argue that long-term storage is best done in a system dedicated to this purpose. Most often, what we call inactive documents are what is generally referred to as ‘records’. The management of records is a separate discipline, associated with a significant set of guidelines and best practices.

We will not repeat that argument here, but simply point out that the requirements for use of active vs. inactive documents are quite different. Active documents must function smoothly in a system supporting the business process of the document. This requires fast and easy retrieval, editing capabilities and a granular permission model. Inactive documents, on the other hand, only rarely need to be found and accessed, so the focus is more on longevity and bulk storage.

Migration versus Archiving

When we propose distinguishing between migration and archiving, it is primarily a matter of distinguishing between active and inactive documents, and thereby giving each the attention it needs when moving from one system to another.

Migration is the broader term, meaning to move documents. Strictly speaking, archiving is thus a specialised migration, where the receiving system is an archive. Although archiving as a discipline involves much more than merely moving documents, we use the two terms as they are most commonly used:

Migration

Moving documents from one business system to another.

Archiving

Moving documents from a business system to an archive.

Examples from Real Life

A pharmaceutical company had decided to move from a legacy business system (eDMS) to a new system. It quickly became apparent that the majority of the workload in the project lay in enriching the documents with metadata. Classifying and indexing the documents was a huge task. This is actually quite normal, but the extent had not been realised beforehand. It resulted in a change of plans, where only the documents that needed to remain active were moved to the new business system. “Everything else” was transferred to the archive.
The difference in workload for the two activities was enormous. Had they chosen not to differentiate and put everything into the new system, the project costs would have exploded. As an added bonus, the new system ended up in a much better condition with nicely indexed and relevant data, with no interference from irrelevant data.

Data

The customer's legacy system contained a lot of valuable documents: just under 2 million. The system had been in operation for many years and had undergone several changes along the way. As a result, the documents were stored inconsistently and with very inconsistent metadata. This is the rule rather than the exception and should not be seen as a lack of control. After all, the world evolves over the years and so do ways of working.

The Migration 

In the legacy system, the documents were arranged neatly in a folder structure built for end-user logic. This structure contained, amongst other things, information on product names. Metadata on the documents themselves contained country information. This meant that as part of the migration we needed to extract data from both the folder structure (like product names) and the document metadata (like country). Both were to end up as document metadata in the new system.
In theory a straightforward task, although time-consuming. 
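To illustrate the extraction step, here is a minimal Python sketch. It is not the tooling used in the project. The path layout (product name as the first folder), the function name build_target_metadata and the field names are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def build_target_metadata(legacy_path: str, legacy_metadata: dict) -> dict:
    """Merge folder-derived and document-level metadata into one record."""
    path = PurePosixPath(legacy_path)
    # Drop the file name and the root marker, keeping only the folder names.
    folders = [part for part in path.parts[:-1] if part != "/"]
    # Assumption: the first folder below the root holds the product name.
    product = folders[0] if folders else None
    return {
        "title": path.name,
        "product": product,                         # from the folder structure
        "country": legacy_metadata.get("country"),  # from document metadata
    }


record = build_target_metadata(
    "/Produkt-X/Reports/stability_report.pdf",
    {"country": "DK"},
)
```

The values come out exactly as they were stored in the legacy system. Whether they are good enough for the new system is a separate question.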

However, as theory became reality, we encountered problems on two fronts in particular:

Product Names

The product names stored in the logical folder structure turned out to be an inconsistent mix of working names, marketing names, country-specific names, etc. Throw various misspellings and languages into the mix, and the straightforward task becomes quite complex and voluminous. Since product names were regarded as essential metadata, it was decided that they needed cleaning up and indexing.
Through a lot of automated rules and logic, as well as a great deal of manual indexing, the product names were cleaned up and aligned with the company master data standard. In the end it turned out well, but it required an incredible amount of work.

Country

The country information presented a different type of problem, rooted in the different ways the old and the new systems were structured. Since the legacy system had a country stored as document metadata, we just “translated” this to the country names prescribed by the master data plan and entered them into the new system.
However, in the legacy system the country information was just plain metadata for sorting and filtering documents. Access to documents was controlled independently of country, and a user might have access to documents from both Denmark and Iceland. This was by design, and the users were used to having shared documents labelled with one country or the other.
In the new system, the country label was integrated in the permission model and used to restrict access. Danish users only had access to documents labelled ‘Denmark’, and so on. Without additional considerations (and work), Icelanders would not be able to access the documents they had in common with the Danes. As with the product names, this led to extra and partly manual effort.
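A simple check can reveal this kind of gap before go-live. The sketch below is not taken from the project. It assumes that each document carries one country label and a list of legacy readers, and that each user belongs to exactly one country. The function and field names are hypothetical.

```python
def find_access_gaps(documents, user_country):
    """List, per document, the legacy readers who would lose access once
    the country label is used to restrict access."""
    gaps = {}
    for doc in documents:
        lost = sorted(
            user
            for user in doc["legacy_readers"]
            if user_country[user] != doc["country"]
        )
        if lost:
            gaps[doc["id"]] = lost
    return gaps


# Hypothetical users and documents.
user_country = {"anna": "Denmark", "bjork": "Iceland"}
documents = [
    {"id": "SOP-1", "country": "Denmark", "legacy_readers": ["anna", "bjork"]},
    {"id": "SOP-2", "country": "Iceland", "legacy_readers": ["bjork"]},
]

gaps = find_access_gaps(documents, user_country)
```

Here SOP-1 is flagged because the Icelandic user could read it in the legacy system but would be locked out by the 'Denmark' label. Each flagged document then needs a decision, which is where the manual effort comes in.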

The Archiving

Anything that didn’t go into the new system went into the archive. In fact, this turned out to be almost 95% of the content. This led to the decision that, rather than isolating the 95%, the archiving would simply include everything. Alternatively, we could have devised a way of tagging migrated documents to allow easy filtering. However, this was not done in this particular project.

Most archiving systems store the content in dedicated “boxes”, and this kind of migration is very much about dividing the content into the right boxes and setting retention dates. Of course, metadata is still important, and for this project the archived documents also needed to be labelled with both product name and country. The big difference is that since this is an archive, we can use the metadata as is. As the archive data is only rarely needed, having to spend a little more time finding it can be justified. For product name, we used the mix of working names, marketing names and country-specific names as they were. The permissions residing in the country field could be disregarded in the archive, since the archive has its own permission model.
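The archiving logic is much simpler by comparison, as the following sketch suggests. The retention periods, the box key (document type plus retention year) and the field names are assumptions for illustration and do not describe the customer's archive. Note that the product and country values are carried over untouched.

```python
from collections import defaultdict

# Hypothetical retention periods in years, per document type.
RETENTION_YEARS = {"contract": 10, "procedure": 5}
DEFAULT_RETENTION_YEARS = 10


def assign_boxes(documents):
    """Group documents into boxes keyed by (document type, retain-until year).

    Metadata such as product and country is stored as is, without clean-up.
    """
    boxes = defaultdict(list)
    for doc in documents:
        years = RETENTION_YEARS.get(doc["type"], DEFAULT_RETENTION_YEARS)
        retain_until = doc["closed_year"] + years
        boxes[(doc["type"], retain_until)].append(doc)
    return dict(boxes)


# Hypothetical documents with uncleaned legacy metadata.
documents = [
    {"id": "C-1", "type": "contract", "closed_year": 2015,
     "product": "exampline", "country": "DK"},
    {"id": "P-7", "type": "procedure", "closed_year": 2020,
     "product": "Project X-17", "country": "Danmark"},
    {"id": "C-2", "type": "contract", "closed_year": 2015,
     "product": "Examplin", "country": "Denmark"},
]

boxes = assign_boxes(documents)
```

No alias tables, review queues or access checks are involved, which is why the effort per document stays low.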

Conclusion

Almost 2 million documents were moved in the project. Active documents were migrated to the new system and the rest was archived. In addition, the data in the new document management system was (greatly!) reduced in quantity and the quality was improved. The users ended up with not just a new and more up-to-date system, but also enriched and updated data that is easier to find and manage.

The project really underlined the need for cleaning up and enhancing the data quality, not just for the sake of the end users, but also for the project itself. The per-document effort for migration was at least an order of magnitude higher than for archiving. It could almost be said that the 5% migration accounted for 95% of the work in the project. And with almost 2 million documents, the numbers speak for themselves.