Digital Archive – What and Why?

Active documents must be in a system where they can be easily accessed and worked with, such as a document management system. But should inactive documents also be stored there just because they were once active? This article looks at that issue.

Active and Inactive Documents

Documents are stored for many different reasons. Some are stored because they are in active use right now: something clever we have written and rely on, ongoing work that is being documented, or a valid procedure that must be followed. These active or ‘hot’ documents are alive: they are searched for, read, and perhaps even edited and evolving.

Other documents are stored but not actively used: a procedure for a piece of equipment no longer in use, a report of something that has happened, a contract from a done deal. These documents are not in active use but need to be kept for reference and as documented evidence. As the name implies, these inactive or ‘cold’ documents are in their final form and therefore no longer need to be changed. They must be kept and remain retrievable, but they will be needed only on rare occasions and perhaps only by a limited audience.

It makes sense to store active documents in a system that supports active use. Active documents need fast and easy retrieval, granular permission models, authoring, editing and collaboration. In regulated industries, version history and an audit trail are also required. Document management systems are designed to do just that. But what about the inactive documents? Most of them start out as active documents and are therefore ‘born’ in a document management system, where at some point they become inactive. But should they stay there? In many cases it makes sense to transfer them to a digital archive: a system designed specifically for inactive documents and the requirements that apply to them.

In an ideal world, processes for retention, deletion and archiving are in place. In the real world, however, such regular ‘spring cleaning’ is often neglected.

What is a Digital Archive?

A digital archive is a system purpose-built for long-term storage of inactive electronic documents. It is a place where documents are kept for reference during the period they should and/or must be kept. Unlike the document management system, the digital archive is not a place where you version and update your documents. If a document still needs that, it does not belong in the archive.

The key requirements of a digital archive are retrieval (not necessarily on-demand), data preservation, and retention management. Some documents must be kept for a minimum number of years, while others require deletion after a certain period. The permission model of an archive is usually simpler and more restrictive than in other systems.
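The two kinds of retention rule mentioned above (a minimum keep period and a mandatory deletion date) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RetentionRule`, `add_years` and `retention_status` are hypothetical names, periods are counted in whole years from the archiving date, and a real archive would take its rules from the applicable regulations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical retention rule; both limits are counted in whole years."""
    keep_at_least_years: Optional[int] = None  # must not be deleted before this
    delete_after_years: Optional[int] = None   # must be deleted once this has passed


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_status(archived_on: date, rule: RetentionRule, today: date) -> str:
    """Classify a document as 'must_keep', 'may_delete' or 'must_delete'."""
    if (rule.delete_after_years is not None
            and today >= add_years(archived_on, rule.delete_after_years)):
        return "must_delete"
    if (rule.keep_at_least_years is not None
            and today < add_years(archived_on, rule.keep_at_least_years)):
        return "must_keep"
    return "may_delete"


# Example: a contract that must be kept for at least ten years (assumed period).
contract_rule = RetentionRule(keep_at_least_years=10)
print(retention_status(date(2015, 3, 1), contract_rule, date(2020, 1, 1)))
```

Running such a check on a schedule is what replaces the person who "keeps track of the dates" in a paper archive.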

Maintenance of Your Digital Archive

We know from paper document archives that managing them requires a lot of work. Retention periods mean that someone must keep track of the dates and handle each document accordingly. With digital archives, even more factors must be considered. Some of them do not exist in the paper world and are therefore less ingrained in us.

Future-Proof Format

Paper can – if stored under proper conditions – last a very long time. That is not necessarily the case for digital formats. If you put a Microsoft Word document in an archive, it may be difficult to read ten years later. You may need a 10-year-old version of Microsoft Word (and a machine to run it on). Alternatively, you depend on the software vendor having included suitable backwards compatibility in newer versions.
Thus, if data is to be stored for a very long period, it may be a good idea to convert the document to a format suitable for long-term storage, for example TIFF or PDF/A.

Integrity

Another issue in the digital world is authenticity. Digital documents are easy to copy and edit. The document owner must ensure that the integrity of the document is preserved: that it is demonstrably original and unaltered, and that its context is accurately represented. In practice, this means that the documents, together with relevant metadata – typically at least author and dates – must be archived in a controlled way and must not be altered afterwards.
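A common building block for demonstrating that a document is unaltered is a cryptographic checksum recorded together with the metadata at archiving time. The sketch below illustrates the idea with Python's standard `hashlib`. The record layout and the names `build_archive_record` and `is_unaltered` are assumptions made for this example, not a description of any particular archive product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_archive_record(path: Path, author: str, created: str, archived: str) -> dict:
    """Capture the metadata and checksum at the moment of archiving.

    The field names are illustrative; dates are passed as ISO 8601 strings.
    """
    return {
        "file_name": path.name,
        "author": author,
        "created": created,
        "archived": archived,
        "sha256": sha256_of(path),
    }


def is_unaltered(path: Path, record: dict) -> bool:
    """Return True if the file still matches the checksum in its record."""
    return sha256_of(path) == record["sha256"]
```

The checksum only helps if the record itself is protected: it should be stored where the people who can change the document cannot also change the record, for example in write-once storage.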

Data Degradation

Finally, there is the concept of data degradation. If kept from fire and rot, the only real concern for paper documents is the fading of the ink. For digital documents, the actual data stored on the media may deteriorate over time. All forms of magnetic and optical storage (traditional hard drives, DVDs, etc.) will slowly deteriorate and cause data loss. Solid-state media (Flash, SSD) also lose data over time as their memory cells leak charge. Regardless of your long-term storage media, measures should be taken to periodically verify and refresh or rewrite the data.
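One way to act on this is a periodic check that compares every archived file against a checksum recorded at archiving time and restores damaged files from a second copy. The following Python sketch assumes such a setup: a `manifest` mapping file names to SHA-256 checksums and a `replica` directory on separate media. Both, like the function name `audit_and_repair`, are hypothetical choices for illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(primary: Path, replica: Path, manifest: Dict[str, str]) -> Dict[str, str]:
    """Verify each file listed in `manifest` and repair it from the replica if needed.

    Returns a report mapping file name to 'ok', 'repaired' or 'lost'.
    """
    report = {}
    for name, expected in manifest.items():
        target = primary / name
        if target.is_file() and sha256_of(target) == expected:
            report[name] = "ok"
            continue
        backup = replica / name
        if backup.is_file() and sha256_of(backup) == expected:
            # The replica is intact: rewrite the damaged or missing primary copy.
            shutil.copyfile(backup, target)
            report[name] = "repaired"
        else:
            # Neither copy matches the recorded checksum.
            report[name] = "lost"
    return report
```

Anything reported as 'lost' can no longer be recovered from these two copies, which is why such checks are typically run often enough that both copies are unlikely to degrade between runs.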

Why Archive?

The alternative to archiving is to leave the documents where they are – typically in an active document management system. Often this can work perfectly well. The many documents may take up valuable disk space, clutter the navigation and perhaps affect performance. But it can work.

So, the motivation to spend money and effort on archiving must come from somewhere else. Some of the most common reasons we see are either practical or financial:

  • The active system does not support archive features. 
    Main archiving requirements like retention time and file formats suitable for long-term storage are typically not supported by document management systems. 
  • The active system has lower usability and performance.
    Too many documents in your active system will make it harder to locate the documents you need on a day-to-day basis. It will become increasingly difficult to browse for your documents. Also, your system will require many resources to index and search a repository with too many documents.
  • Operational costs.
    To ensure availability and performance, active systems run 24/7 on high-performing servers with redundancy on both servers and storage. Keeping all your documents active and available on-demand will have a higher cost, both financially and environmentally.
  • Your current active system is being replaced with a new one. 
    It is then often cheaper and easier to move the documents to a digital archive than to enrich the metadata and make it ready for migration to the new system. 

What’s the Alternative…

The above are typical – and perfectly valid – reasons why archiving should be considered. They can lay a good foundation for a financial business case for the creation and operation of an archive. Decisions are often driven by practicality and money, and good business cases can carry a lot of weight. However, the best reason for archiving is really something else, namely the great risk in not doing so.

An archive is a specialised system with business-critical functionality. Archiving is a profession. It is not a job for a student assistant. An archive must be managed by records managers to make sure the records are in order, data is secure, and regulatory compliance is maintained. Even if you can store documents in business systems, those systems are not built for good archiving, and the staff who run them are not trained for it.

Conclusion

If long-term storage is important for you in case of legal disputes, regulatory compliance, patient safety, business continuity, due diligence, or any other reason, you ought to consider “doing it right”.

Setting up a digital archive with the right systems, staffed with the right subject matter experts, will make sense from a practical, financial, and business point of view.

Our experience and best advice for a controlled migration, one where the documents remain in good order afterwards, are gathered in our Guide to Document Migration.