Guide to Document Migration

OK, so you got the new system. Now you just need to get the documents transferred from the old system.
It sounds simple, but it’s not as easy as one might think… The project gets delayed, users can’t find the documents, data is lost, etc.
Hopefully this guide can help you avoid some of these things.

1. Introduction

This guide covers the migration of documents. Migrating documents differs from migrating (structured) data in several ways. These differences are often the culprits leading to misjudgements about an impending document migration. Here, we go through and review a migration from A to Z.   

1.1. All Sectors – Including Regulated 

The experiences and methodologies shared in this guide largely stem from the heavily regulated pharmaceutical industry. However, we explain our methods in broadly applicable terms and only note in passing where regulated industries have particularities. Towards the end, we finish with a section specific to regulated industries.

1.2. Different Reasons for Document Migration

Documents are migrated for three main reasons:

  • A new system is being implemented and the content from the old one needs to be loaded into the new one. We call this (system) migration.  
  • A large delivery of documents from a supplier or partner needs to be loaded into the local solution. We call this import.
  • Documents from a running solution must be moved into the archive. We call this archiving.   

The guide focuses on what we call (system) migration, but much of what is described below will be equally valid for the other two situations.

2. Table of Contents

  1. Introduction
  2. Table of Contents
  3. New System – Appendix Migration
  4. Before You Start
  5. Phases of the Migration
  6. Migration-Center
  7. Life Science
  8. Conclusion

3. New System – Appendix Migration

The new system is on its way. Business users have been involved in requesting, designing, and setting up the new solution and are looking forward to putting it into use. The focus has been on business processes, functionality, and ease of use. A natural, and very important, focus. 

3.1. History

However, there are also the documents and the document history in the old system. Some documents may need to be archived; others discarded in a controlled manner. What has to be migrated to the new system must be identified. It is rarely wise or correct to assume that everything must be migrated. Managing the documents that are not migrated is a separate project, and we cover those considerations elsewhere. The focus of this article is on the documents eligible for migration: how to get them migrated in such a way that they can be found, used and versioned in the new system.

3.2. User Acceptance of the New System

It is essential for the user acceptance of the new system that the documents are migrated in a way that does the new system justice. We have seen unfortunate cases where otherwise excellent systems are simply declared unusable by users because they cannot find and use their documents after migration. This should be avoided for obvious reasons.

3.3. New System – All Inclusive

The supplier of the new system often offers a good deal on migration with a promise to import documentation right before go-live. They just need you to provide your documents and metadata in a particular specified format. So now we have a plan. Or do we?

It seems very approachable that the vendor only needs to be handed a bunch of files and some spreadsheets right before go-live. In rare cases, that is in fact all there is to it. However, most often it is not. Sometimes it is much worse. 
We’re not out to vendor-bash. The vendors offer exactly what they need to, and they will very often be the best at getting the documents safely into the system. Although, if it’s a well-known standard system, it might not be so important to have the vendor involved.

However, we want to focus on what appears to be an easy task – “just delivering” document files and metadata. Many are deceived by this, and the consequences can be huge. Let’s start at the beginning.

4. Before You Start

4.1. Understand Your Migration

Migration can be problematic if it is underestimated and taken too lightly. It is therefore necessary, first and foremost, to understand the migration that one is facing. Below are some brief points on that, as well as references to other articles that go into more depth on the individual aspects and claims. 

Recommended articles:

4.1.1. The Short Version

Document migration has some aspects that are particularly complex, and these are largely related to the difference between document- and data-migration, namely the fact that a document consists of several parts: both the content and the metadata. 

Some of this metadata is merely descriptive and used for retrieval. Other metadata must be considered an integral part of the document, for example, proving that the document is authentic. Additionally, documents have relations, which are themselves a type of metadata, to other documents, other versions, other formats, other files on the same case, etc.

Documents thus have two parts and a lot of relations, and these must be handled without losing the links. Some of the articles referenced above explain how documents can lose their usability and integrity – and thus, for example, legal validity – if this handling is done incorrectly.

The result is that we are faced with a critical and complex operation that is all too easily underestimated.

4.2. Value-Adding Migration 

The historical documentation does not attract the same attention as the exciting new system. Understandable, but a pity. It is a fantastic opportunity to clean up and get high-quality data into the new system, which in turn reflects positively on the user experience and system value.

Even if the old system has been used with care, it often turns out – we dare almost say “always” – that something in the data is not as one thought. Perhaps a significant number of documents are stored under “miscellaneous”. Perhaps documents with no actual purpose or value have been filed. Perhaps the owner of the documents is long gone, etc. Some of these things are minor, but typically the volume of such things is overwhelming, and it makes the job of delivering the spreadsheet and files to the system vendor grow out of hand.

If we look beyond the new system and beyond the organisation, improving the quality of documentation is something that contributes to efficiency, quality, and compliance in general. 

Migration can be seen as an annoying additional cost of implementing a new system. On the other hand, it might be welcomed as a value-adding activity. If it is to create value, the right skills must be brought to the job and the work must be taken seriously.

4.3. Set the Right Team

Although there is a lot of heavy IT involved in a migration, it should not be regarded as an IT exercise. First and foremost, it is a business exercise. Value is created by addressing documents and data in a completely systematic way, and by cleansing and enriching them in the process. 

A migration requires a broad range of knowledge: 

  • Technical knowledge of the old system and the new system. 
  • Business understanding of the use of the old system and intended use of the new system 
  • Knowledge of the organisation’s master data 
  • Knowledge of the rules and regulations the company is subject to (especially for highly regulated companies) 
  • Document management knowledge 

In particular, we would like to point out that many organisations actually have a trained archivist, records manager, information specialist or similar somewhere in the organisation. This is a skill that could be particularly useful to include in the migration process. Unlike the rest of the team, this person will typically be interested in the historical documentation, the traceability, and critical metadata – a much needed skill in any migration.

5. Phases of the Migration

The reality of migration is a lot of trial and error and unexpected surprises in the data. We can’t ignore that reality, but we can decide to handle it. The best thing to start with is to divide the migration into phases to give it structure. Structure is a great help, but it should be used with respect for the reality surrounding it. For instance, something is bound to come up that needs to be analysed, even though the analysis phase is long over.

Our recommended structure consists of a pre-analysis and a choice of methods and technologies based on the results of the pre-analysis. The next step is an iterative process, with analysis, configuration and testing of the chosen method and technology. When testing is satisfactory, a more formal test is conducted. If successful, the migration is ready to proceed. Finally, reporting follows.

In the following sections, each phase will be discussed.

5.1. Pre-Analysis

The purpose of the pre-analysis is to identify the overall requirements for the migration, to make informed choices about technology and methodology.

A pre-analysis can be very long as there is always more to analyse. It is important to set clear goals and stop when you know enough to make a decision.

Typical issues to be addressed are: 

  • Approximate scope (which documents and data) 
  • Overall technical differences between source and target systems 
  • Overall structural differences between source and target systems
  • The data quality in the source system 
  • Regulatory requirements 
  • Target system implementation strategy (e.g., incremental, or big bang)  
  • Business requirements e.g., timing, possible decommissioning etc. 
  • Financial, resource or time constraints

5.2. Method and Technology Selection

Based on the pre-analysis, a decision is made on how to migrate. Three main questions need to be answered:

  • Technology:
    Do you use a migration tool, homemade scripts or is it “dump and load”?
  • Segmentation:
    Should the migration be “big bang” or phased, and if so, how? 
  • Quality assurance:
    How will we assure the quality of the migration (in life science: how do we define our validation approach) and how will we report on the migration?

5.2.1. Technology

In the context of the technology choice, we have the following advice and experiences to share: 

If the two systems are structurally similar, e.g., the new system is a newer version of the old system, it may be possible to detach/dump the entire database and attach/load it in the new system. If so, the migration task is purely technical and some of the above-mentioned concerns about integrity and the right skills in the project can be dismissed.

A migration is basically an export from the source and import into the target system. Often the supplier of the new system has offered to import to the new system. If the source system has a sufficient export option, and the output can be used by the import feature, this approach could be the right solution. 

Often, however, there are differences in the metadata the new system and the old system rely on. This means that metadata must be transformed in the process. There may also very well be a need for enrichment or clean-up during the migration. If this is the case, we recommend using a commercial tool designed to handle metadata mapping and metadata value transformations. The IT department may offer to develop something, but this option is seldom worthwhile compared to buying or leasing an existing tool.

Depending on the platforms to be migrated to and from, there is a variety of tools on the market. We have a favourite that we like to rely on unless circumstances dictate otherwise. It is the migration-center tool from the German company fme AG. In many cases, this tool is a really good choice. It can connect to a wide range of technologies out of the box. In addition, it has all the functionality we have needed so far in terms of mapping, transformation, and reporting. Artificial intelligence for classification is starting to appear in the tool as an option. Here fme AG is not a frontrunner but rather exercises deliberate caution, in order to maintain the tool’s core capability: full, documented control throughout the migration.

5.2.2. Segmentation of the Migration

There are physical limits to how much data can be moved at a given time. We see that sometimes it is not practically possible to shut down on a Friday night, migrate over the weekend and be ready Monday morning. This is however very often requested. In these cases, the migration has to be broken down into smaller chunks.

If users are onboarding to the new system in phases, and data should be migrated accordingly, there is a need for a segmentation of the migration.

It is also often decided not to migrate documents from in-progress projects. These are left in the old system until the project is finished. This minimises the impact on the business, but means that a catch-up migration must be performed when the project is finished. Only then can the old system be closed completely.

In short, you very often end up having to migrate in stages, adding to the complexity of the project. You need to keep track of what has been migrated, deal with new documents, and documents that might have been accidentally edited in the old system after the initial migration. 
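Keeping track across stages can be pictured as a comparison between the current state of the old system and a ledger of what has already been migrated. The following is a minimal sketch in Python, assuming that each document has an identifier and a last-modified timestamp. The names `plan_catch_up`, `source_state` and `ledger` are hypothetical and are not taken from any particular tool.

```python
from datetime import datetime


def plan_catch_up(source_state, ledger):
    """Classify source documents for a catch-up migration.

    source_state: dict of document id -> last-modified time in the old system
    ledger:       dict of document id -> time the document was migrated
    """
    new_docs = []       # never migrated
    changed_docs = []   # edited in the old system after they were migrated
    for doc_id, modified in sorted(source_state.items()):
        migrated_at = ledger.get(doc_id)
        if migrated_at is None:
            new_docs.append(doc_id)
        elif modified > migrated_at:
            changed_docs.append(doc_id)
    return new_docs, changed_docs


# Hypothetical example data.
source_state = {
    "DOC-001": datetime(2023, 1, 10),
    "DOC-002": datetime(2023, 3, 5),   # edited after the first stage
    "DOC-003": datetime(2023, 3, 7),   # created after the first stage
}
ledger = {
    "DOC-001": datetime(2023, 2, 1),
    "DOC-002": datetime(2023, 2, 1),
}

new_docs, changed_docs = plan_catch_up(source_state, ledger)
print("new:", new_docs)
print("changed:", changed_docs)
```

A commercial migration tool will normally keep such a ledger for you. The sketch only illustrates why the bookkeeping is needed once a migration is split into stages.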

It gets even worse if you have to allow documents to potentially be modified in both systems by two different user groups and these need to be synchronised. This should be avoided, if at all possible, because the complexity becomes insanely high.  

The thing to decide is whether to subdivide the migration into smaller chunks or go for the big bang. Subdividing is usually a good indicator that a commercial migration tool is needed.

5.2.3. Quality Assurance

If the documents you are dealing with are records, then migration is a critical action. We leave this as an assertion here; the articles referenced in the “Understand Your Migration” section above explain and justify it.

This means that we need to have an apparatus in place that ensures – and can document – that records are properly migrated. There are two tracks to this: 

  • One is to ensure that the method and technology are tested and work correctly 
  • The second is to quality assure and document the migration execution  

If the technology is homemade, the testing work will obviously be quite extensive.

5.3. Analysis, Configuration and Testing

This phase consists of iterative cycles of examining, testing and improving until a satisfactory result is achieved.

  1. The first thing to work on is to define the subset of the source that you want to migrate. This could be all documents of a certain type, from a certain department, in a certain state, related to certain products or cases, or many other things. 
  2. The next thing to manage is classification and mapping. This covers how objects in one system and the other system fit together. Often the systems operate with some basic types of documents, and these are the ones we need to match between the new and the old system. In the old system we might logically split our system by the type of report, while in the new system we logically split by the type of product the report describes.
  3. Next, focus on the metadata of each of the above classes or types. For example, in the old system there was a ‘title’ on a report, whereas in the new system it is ‘name’. The ‘title’ fields must then be mapped to the ‘name’ field so the value is put in the right field during the migration. 
    But these can be – and typically are – much more complicated rules and transformations. Sometimes a rule does not give the expected result when applied to the document metadata. Even if the rule is correct, there may be a number of documents that differ from what was expected. Typically, this means that the rule must be adjusted, or perhaps the data must be further subdivided.
     
  4. Finally, there are the values themselves. In many systems there is a list of allowed values for certain fields. Months, countries, product numbers, client names, etc. Therefore, the values migrated into a given field must match the allowed ones. 
    If there are differences between the two systems – and there almost always are – the value lists must be mapped against each other. Examples could be translations, misspellings, new department names, changed product names etc.
    The tool we often use has an option to simulate a migration. It can collect all the information from the source system and experiment with it, offline from the actual system. Once the rules are set up, they can be applied to all documents in the subset to show the result. Did it behave as expected? Probably not, in the first few attempts. At some point the results must be reviewed by someone with business knowledge of the document contents. It might be convenient to be able to export the results to a spreadsheet for review. Once a satisfactory result has been achieved for all datasets, you can start preparing for the actual import to the target system.
  5. Before starting an actual migration, it is highly advisable to consider setting up a rollback strategy. For example, should a script be constructed that removes and cleans up after a failed migration? Rollback scripts and processes can be written manually or set up as a feature in a suitable migration tool.
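The mapping of fields and values described in steps 3 and 4, and the idea of simulating before importing, can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the way any particular migration tool works. The field names, value lists and the functions `transform` and `simulate` are all hypothetical.

```python
# Hypothetical mapping rules: source field name -> target field name.
FIELD_MAP = {"title": "name", "dept": "department"}

# Hypothetical value mappings per target field (misspellings, renamed departments).
VALUE_MAP = {"department": {"R and D": "Research", "Reserch": "Research"}}

# Hypothetical lists of allowed values in the target system.
ALLOWED = {"department": {"Research", "Quality", "Production"}}


def transform(source_doc):
    """Apply field and value mappings; return (target_doc, problems)."""
    target, problems = {}, []
    for src_field, value in source_doc.items():
        tgt_field = FIELD_MAP.get(src_field)
        if tgt_field is None:
            problems.append(f"unmapped field: {src_field}")
            continue
        # Translate the value if a mapping exists, otherwise keep it as is.
        value = VALUE_MAP.get(tgt_field, {}).get(value, value)
        allowed = ALLOWED.get(tgt_field)
        if allowed is not None and value not in allowed:
            problems.append(f"value not allowed in {tgt_field}: {value}")
        target[tgt_field] = value
    return target, problems


def simulate(docs):
    """Dry run: transform every document without touching any system."""
    return {doc_id: transform(doc) for doc_id, doc in docs.items()}


# Hypothetical source metadata for two documents.
docs = {
    "DOC-001": {"title": "Stability report", "dept": "Reserch"},
    "DOC-002": {"title": "Batch record", "dept": "Misc"},
}
for doc_id, (target, problems) in simulate(docs).items():
    print(doc_id, target, problems or "OK")
```

In this sketch the second document is flagged because “Misc” is not an allowed department in the target. Findings of this kind are what typically send you back to adjust a rule or subdivide the data further.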

5.4. Formal Test/Pilot

So far, we have tested and checked the result. Now it’s time for a formal test that leads to us being ready to perform the migration. What a formal test should then contain is industry-dependent and very dependent on technology and methodology choices.  

It is customary to do a pilot migration, where a small but representative set of documents is migrated, so that you can go over them in detail and make sure that what you hope is happening is in fact happening. Typically, it should not be the technicians and business analysts who have been part of the migration who do the testing here. It should be done by the business users. It can be thought of as a user acceptance test of the outcome of the migration – much as one typically does such an acceptance test of the system itself.

In the life science industry, one should expect having to qualify the technology and methodology and subsequently validate the migration. Here we are at the qualification stage. Strator has templates ready for IQ, OQ and PQ/UAT, which can be adapted to the circumstances and your QMS.

The result of this – regardless of industry and form – is of course that we dare to start our migration.  

5.5. Migration

It is now time to launch the migration itself. This could be a big bang migration, where we close the old system for good on Friday night and open the new one on Monday morning, and in the meantime the migration is done. 

More often, the migration consists of smaller migrations, as indicated above. Regardless, you will often have a closing window during which you get (the last parts of) the migration done and close the old system, or parts of it.

So, in short, this phase is about doing what we have been practicing – migrating the documents and testing that it went well. 

In some situations, you need to have a tight schedule running and have worked out when it is the last chance for a rollback. It is understandable that when problems occur, you keep trying to solve problems and don’t focus on anything else. That’s when you need a well-calculated schedule that ensures you stop in time to roll back. There’s nothing that can be said in general about that, other than to be aware of whether it might be a problem in your case. 

There may also be some mandatory steps in the execution from a quality assurance point of view. For example, you will typically require the migration to be verified (tested) and approved before users are given access to the new system.

One must also be prepared to deal with the fact that there will be failing documents. Something odd will always happen, and it is essential to be clear in advance about what you are going to do about it. Often you will collect them in a list and end the migration with a deficiency list, which you then deal with afterwards, but there may be reasons why this is not a good practice.

5.6. Reporting

After migration, it is good practice to collect and archive documentation of the migration’s completion. In many contexts this will be an explicit requirement.

The documentation may include a log from the technology used. Ideally, this documents what transformations the data has undergone along the way and shows that the content files in the receiving system are identical to those in the source system. 
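One common way to show that content files are identical is to compare cryptographic checksums of each file before and after the migration. The following is a minimal sketch in Python using SHA-256. It assumes that content files can be exported byte for byte from both systems; the function names and the small demonstration with temporary files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_content(pairs):
    """pairs: iterable of (doc_id, source_path, target_path).

    Returns one report row per document.
    """
    report = []
    for doc_id, source_path, target_path in pairs:
        source_hash = sha256_of(source_path)
        target_hash = sha256_of(target_path)
        report.append({
            "doc_id": doc_id,
            "source_sha256": source_hash,
            "target_sha256": target_hash,
            "identical": source_hash == target_hash,
        })
    return report


# Demonstration with temporary files standing in for exported content.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "source.pdf")
    source.write_bytes(b"original content")
    good = Path(tmp, "target_ok.pdf")
    good.write_bytes(b"original content")
    bad = Path(tmp, "target_bad.pdf")
    bad.write_bytes(b"original cont")  # simulates a truncated file
    report = verify_content([
        ("DOC-001", source, good),
        ("DOC-002", source, bad),
    ])

for row in report:
    print(row["doc_id"], "identical" if row["identical"] else "MISMATCH")
```

If the target system re-renders or wraps files on import, the checksums will differ even though nothing is wrong. In that case another form of comparison has to be agreed on and documented.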

Documentation of the verification of the final migration, and documentation from the phase where we formally tested the system, will both constitute key evidence that the migration is in order.

All this documentation should be stored. Indeed, it constitutes the proof that the documents have maintained their integrity during the migration.

6. Migration-Center

As mentioned earlier in this guide, we have a tool that we use/suggest when relevant. It is called migration-center and is owned by the German consulting company fme AG. 

Like us in Strator, fme AG has historically worked a lot with the large document platform Documentum, and the migration tool was originally developed to migrate documents into Documentum, and as such we in Strator have known it for around 15 years.

It is not a bad heritage to have. Documentum is a very large platform that supports very complex relationships between documents, so when the tool can handle that complexity, there’s not much on the market it cannot support. In addition, Documentum has historically been widely used in the life science industry, where the tool also can meet strict regulatory requirements at its core.

The tool has now been generalised and the migration engine at the core of the tool can now be used with different connections between source and destination, the so-called “migration paths”.

6.1. Structure of the Tool

The tool is – for now – a traditional client-server tool, designed to be installed inside the customer’s firewall. For now, the source of a migration is usually on-premises (i.e., not cloud), and so you get the best speed from having the migration host alongside. 

The tool is operated from a client, and behind it runs a migration server and a database.  

The tool works from the operator point of view as described below.

6.1.1. Scanning

  • The source is scanned for documents  
  • All available metadata for the documents is retrieved into the migration database from the source 
  • You can also choose to download the document file itself into the migration tool’s file system

6.1.2. Processing

  • A subset of the documents is selected 
  • For that subset, metadata is mapped, and rules are set up for transformations 
  • A transformation is performed, which means that migration-center simulates the migration for the selected subset and presents the result
  • A validation is performed, which means that the migration-center technically checks the calculated values. It will check that the data type is correct, e.g., that a data field contains a date.  
  • You keep correcting rules and mappings, transforming and validating until you have a satisfactory dataset.
  • Now it will make sense to get business experts involved and check that they are getting what they expect.  
  • One works with and prepares a subset of the documents at a time – subsets that make it simplest to set up rules for the whole subset.
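The technical validation mentioned above, checking that a calculated value has the right data type, can be illustrated with a small Python sketch. The schema, the field names and the ISO date format are assumptions made for the example and do not describe how migration-center implements its checks.

```python
from datetime import datetime

# Hypothetical target schema: field name -> expected type.
SCHEMA = {"name": "text", "approved_on": "date", "page_count": "integer"}


def is_valid(value, expected):
    """Return True if the string value can be read as the expected type."""
    try:
        if expected == "date":
            datetime.strptime(value, "%Y-%m-%d")  # assumes ISO dates
        elif expected == "integer":
            int(value)
        return True
    except (TypeError, ValueError):
        return False


def validate(doc):
    """Return the names of fields whose values fail the type check."""
    return [field for field, expected in SCHEMA.items()
            if field in doc and not is_valid(doc[field], expected)]


# 30 February does not exist, so the first document fails on the date field.
print(validate({"name": "Batch record", "approved_on": "2021-02-30", "page_count": "12"}))
print(validate({"name": "Batch record", "approved_on": "2021-02-28", "page_count": "12"}))
```

A check like this only catches technical errors. Whether a valid date is also the right date is a question for the business experts.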

6.1.3. Import

  • After the simulations, the confidence is now there to start importing to the destination system. According to the phasing we presented earlier in the guide, here you will perform the formal test before proceeding to initiate the import into the production system.  
  • Once the import is started, there are basically two options. One is to import the data that was loaded during the scan. The other is to ask the system to perform a new scan, retrieve the documents in their current state and process them according to the rules you have set up. The difference is whether you need and want the latest state of the documents.
  • During the import, all transactions will be logged, and the log can be pulled up afterwards and formatted into a migration report.

6.2. Gate Model

There is one more subtlety to the tool that is worth highlighting. It is built on a gate model. This means that there are a number of states a document can be in inside the migration tool, for example transformed, validated, or imported. The states are linked (gated), so a document cannot be validated before it is transformed, or imported before it is validated.

That is, migration-center keeps track of every document for us: whether it is ready to be imported, and how the import went. When a document fails, migration-center knows it, and if you fix something in a rule and run your migration again, it will try to migrate that document again – but it skips the others that have already been migrated successfully. It’s a tremendous relief and support in the practical execution of the migration.
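The idea of a gate model can be sketched as a small state machine. The Python below is an illustration of the principle only, assuming five states and a simple retry rule. The class `Tracker`, the state names and the `import_one` callback are hypothetical and do not reflect the internals of migration-center.

```python
from enum import Enum


class State(Enum):
    SCANNED = "scanned"
    TRANSFORMED = "transformed"
    VALIDATED = "validated"
    IMPORTED = "imported"
    IMPORT_FAILED = "import failed"


# A document may enter a state only from one of the listed states (the gates).
ALLOWED_FROM = {
    State.TRANSFORMED: {State.SCANNED},
    State.VALIDATED: {State.TRANSFORMED},
    State.IMPORTED: {State.VALIDATED, State.IMPORT_FAILED},
    State.IMPORT_FAILED: {State.VALIDATED, State.IMPORT_FAILED},
}


class Tracker:
    def __init__(self, doc_ids):
        self.states = {doc_id: State.SCANNED for doc_id in doc_ids}

    def move(self, doc_id, new_state):
        """Move a document to a new state, enforcing the gates."""
        current = self.states[doc_id]
        if current not in ALLOWED_FROM[new_state]:
            raise ValueError(
                f"{doc_id}: cannot go from {current.value} to {new_state.value}")
        self.states[doc_id] = new_state

    def run_import(self, import_one):
        """Try documents that are ready or failed earlier; skip the rest."""
        attempted = []
        for doc_id, state in list(self.states.items()):
            if state not in (State.VALIDATED, State.IMPORT_FAILED):
                continue
            attempted.append(doc_id)
            try:
                import_one(doc_id)
            except Exception:
                self.move(doc_id, State.IMPORT_FAILED)
            else:
                self.move(doc_id, State.IMPORTED)
        return attempted


tracker = Tracker(["DOC-001", "DOC-002"])
for doc_id in ("DOC-001", "DOC-002"):
    tracker.move(doc_id, State.TRANSFORMED)
    tracker.move(doc_id, State.VALIDATED)

broken = {"DOC-002"}  # stands in for a rule that fails for one document


def import_one(doc_id):
    if doc_id in broken:
        raise RuntimeError("target system rejected the document")


first_run = tracker.run_import(import_one)
broken.clear()  # the rule has been fixed
second_run = tracker.run_import(import_one)
print(first_run, second_run)
```

In the second run only the previously failed document is attempted, which mirrors the behaviour described above: successful documents are skipped and failed ones are retried.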

6.3. A Large and Complex Tool

The migration-center is not a tool you learn to use in a day. It can do a lot and is therefore complex. It requires training and practice before you can take on a migration project. We highly recommend that you seek experienced assistance, if not for the whole project, then at least to get you off to a good start.

Previously, migration-center and other similar tools were exclusive to one or two particular systems, and hence quite expensive. However, as the tools have matured and become more general, prices have come down significantly and license models have become more flexible.

7. Life Science

This section is specific to handling of GxP-critical material, which requires special care. Everything we have discussed above applies here too, or perhaps in particular here too.

Strator works across a number of industries, but the life science industry has always received special attention from us. It is here our deep expertise is particularly needed, and we have a long-time experience in dealing with this regulated environment.  

The migration-center tool has over time been used in migrations in many large pharmaceutical companies. The supplier – fme AG – is open to auditing and can provide the SOPs for the development of the tool upon request. In other words, it is both a tool and a supplier geared for the pharmaceutical industry.  

The tool is also absolutely fit for life science in terms of functionality. One example is the thorough logging of all activities. Quite simply, there is full traceability at document and metadata level.

Strator has developed a process and a package of documentation to go with migration-center. This can be used as a starting point for the validation and documentation required by your QMS. We have various best practice templates we use in gathering requirements and specifying the migration. We have an “operation handbook” for the migration tool, and a set of IQ and OQ test case templates.

8. Conclusion

Migration is often underestimated, at the expense of data quality or even of the new system itself. Migration is, in many people’s perspective, “straightforward” and “something IT can fix”. The reality is that this is rarely the case. We hope this guide has helped you see the migration you are facing more clearly and anticipate where the problems and challenges may lie.

8.1. The Experience Behind

Our consultants at Strator have worked as document management specialists with a technical or process perspective throughout most of their careers, some with as much as 10-20 years of experience. Our recommendations, best practices and not least our statements about common mismanagement of migration projects stem from this long and collective experience.

This is also why we decided to turn migration into a formula. We gathered all our experience and created an operating model with built-in continuous improvement. We reached out to fme AG and incorporated their tool into our methodologies.

8.2. Want to Know More?

This guide is our attempt to share our experiences and methods. It is by no means exhaustive, but we hope you find it helpful.

The knowledge section on our website contains articles about the inconveniences of migration, dos and don’ts and other topics within handling, maintaining and moving documents. We try to share our experiences with what we do: Managing, organising, classifying, and migrating large volumes of electronic documents.

Please visit our knowledge section to learn more and feel free to reach out and get in touch – your questions are welcome.