Digital Archiving Strategy for Production Archives:
A Pragmatic Starting Point
Presenters

Nicolas Hans
Dalet, Paris

Johan de Koster
Radio Netherlands

Katherine Straub
Dalet Digital Media Systems

ABSTRACT


An increasing number of broadcasters and organizations are considering the digitization of their media archives. Implementing digital media libraries so as to ensure the proper preservation of legacy archives has been recognized as a priority. Yet, many organizations are faced with a paradox: although strategic, these digitization projects are postponed because of budgetary constraints. As a result, little attention is paid to the opportunity and necessity to archive day-to-day programming and use that as a starting point of a digital archiving campaign. This paper, a follow-up to one recently presented to AES in Berlin, discusses several case studies and suggests a new approach to implementing a pragmatic archiving strategy – one that will get approval and support from management.

_______________________________

PRESENTATION

By 2020, storing 1.4 million hours of audio content on-line should cost less than 100 euros. One petabyte – a thousand terabytes – will hold 165 years’ worth of continuous broadcast encoded in 16 bits at 48 kHz. Storing the equivalent amount of media in analog form typically requires 2.5 million documents spread over 80 kilometers of shelves! [1] Fifteen years from now, that amount of content will be in our pockets, on our children’s future iPods.
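The petabyte figure above can be checked with a few lines of arithmetic. The sketch below assumes two-channel (stereo) linear PCM and a decimal petabyte of 10^15 bytes, neither of which is stated explicitly in the text; the function name is illustrative only.

```python
# Back-of-the-envelope check of the "165 years per petabyte" figure.
SAMPLE_RATE_HZ = 48_000      # 48 kHz, from the text
BYTES_PER_SAMPLE = 2         # 16 bits, from the text
CHANNELS = 2                 # assumption: stereo
PETABYTE_BYTES = 10**15      # assumption: decimal petabyte
HOURS_PER_YEAR = 8_760       # 365 days of continuous broadcast


def hours_per_petabyte() -> float:
    """Hours of uncompressed audio that fit in one petabyte."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return PETABYTE_BYTES / bytes_per_second / 3600


if __name__ == "__main__":
    hours = hours_per_petabyte()
    print(f"{hours / 1e6:.2f} million hours, "
          f"{hours / HOURS_PER_YEAR:.0f} years of continuous broadcast")
```

Under these assumptions, one petabyte holds roughly 1.45 million hours, or about 165 years of continuous audio, which is consistent with the figures quoted above.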

It is time for audio archive departments to recognize the drastic impact that digitization has on their mission. Whereas in the analog world the preservation of a master copy was paramount, in a digital environment all duplicates are masters. Where one used to choose a carrier format for decades to come, in today’s digital world carriers have become virtual. Hierarchical storage management (HSM) systems that combine hard-disk storage with tape libraries and DVD jukeboxes now make it possible to store hundreds of thousands of hours of broadcast material on-line or near on-line. In effect, carrier virtualization is the most cost-efficient option for preserving and distributing records. But the impact of the digital wave is not limited to storage.

The emergence of open standards such as Broadcast Wave Format (BWF) and Material eXchange Format (MXF), combined with the increased quality of streaming formats such as MPEG-1 Layer 3 (MP3), is creating the conditions for a new ecosystem. Archivists need to reconsider their priorities and shift their primary focus from preservation to distribution. Archive professionals need to move out of their historical habitat – the basement of most broadcast facilities – into the production space. By moving upstream in the workflow, they can ensure proper metadata collection and guarantee that search and retrieval do not become more and more expensive as archive volumes increase.

Such a migration does not happen overnight. It requires a step-by-step, pragmatic approach similar to that of a surgeon dealing with a patient in an emergency room. By first digitizing the day-to-day archival process and making sure that tomorrow’s program will not be stored on tape, archivists “stop the bleeding”. This first step improves the quality of service that archives provide to production departments (the “heartbeat”); it also frees up the resources required for “reconstructive surgery” i.e., ensuring the digital transfer and proper cataloguing of decaying media. That process can be spread over multiple years and pave the way to “re-education” and the launch of new services open to both internal and external organizations.

The need for virtual carriers

A number of institutions still consider shelved CDs the most efficient preservation medium for audio archives. In a recent survey of 84 international archiving institutions, 45% of collection owners used CD-R as their preservation format of choice and 57% still regarded CDs as the appropriate medium for access and distribution. [2]

Digital storage is cheaper than labor

At first sight, this may seem a smart economic move. CDs remain one of the cheapest digital media to date at 0,75 € apiece or 0,001 € per MB, but their cost of maintenance – from production to retrieval – is huge. Handling CDs is a labor-intensive operation. “Burning” a single CD, generating the corresponding bar-coded cover and storing the resulting carrier on a shelf can be estimated as a 15-minute operation in an assembly-line type of environment. On this basis, archiving one year’s worth of audio recording on CDs – or 8 760 hours at 48 kHz, 16 bit – costs nearly 40 000 €, of which 80% is labor. If one compares this cost to that of permanently storing the corresponding media files on-line, the adoption of CDs as carriers turns out to be an economic absurdity. Storing 8 760 hours of audio recording requires a total of 6 terabytes (TB). At today’s market prices, the corresponding storage solution based on high-performance, redundant hard drives costs 20 000 €. In a fully on-line environment, handling operations are limited to triggering file copies and can be considered negligible. As a result, storing audio archives permanently on-line can be estimated to be at least 50% less expensive than storing them on CDs!


Figure 1: Archiving audio on CDs is not cost efficient

This estimate is conservative. On the one hand, it does not refer to near on-line storage systems such as tape libraries and DVD jukeboxes which provide cheaper storage than on-line drives. On the other, it does not take into consideration labor costs linked to retrieving material archived on CDs.
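The comparison in this section can be retraced with a short calculation. The sketch below uses only figures quoted in the text, plus two assumptions that the text does not state: the audio is taken to be two-channel (stereo), and the CD capacity of about 750 MB is inferred from the quoted price per disc and price per MB. The hourly labor cost is not given in the text either; the code derives the rate implied by the quoted total rather than assuming one. All names are illustrative.

```python
import math

# Figures quoted in the text
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2                 # 16 bit
HOURS_PER_YEAR = 8_760
CD_PRICE_EUR = 0.75
CD_PRICE_PER_MB_EUR = 0.001
MINUTES_PER_CD = 15
CD_TOTAL_COST_EUR = 40_000           # "nearly 40 000 EUR"
LABOR_SHARE = 0.80
ONLINE_STORAGE_COST_EUR = 20_000

# Assumption (not stated in the text): stereo recording
CHANNELS = 2


def yearly_audio_bytes() -> int:
    """Bytes needed for one year of continuous uncompressed audio."""
    return HOURS_PER_YEAR * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def cd_count(audio_bytes: int) -> int:
    """Number of discs needed, with capacity inferred from the quoted prices."""
    capacity_mb = CD_PRICE_EUR / CD_PRICE_PER_MB_EUR   # about 750 MB
    return math.ceil(audio_bytes / (capacity_mb * 1_000_000))


def handling_hours(discs: int) -> float:
    """Staff hours spent burning, labelling and shelving the discs."""
    return discs * MINUTES_PER_CD / 60


def implied_hourly_rate(discs: int) -> float:
    """Hourly labor cost implied by the quoted total and labor share."""
    return CD_TOTAL_COST_EUR * LABOR_SHARE / handling_hours(discs)


def online_saving() -> float:
    """Fraction saved by on-line storage relative to CDs."""
    return 1 - ONLINE_STORAGE_COST_EUR / CD_TOTAL_COST_EUR


if __name__ == "__main__":
    size = yearly_audio_bytes()
    discs = cd_count(size)
    print(f"Audio volume: {size / 1e12:.2f} TB")
    print(f"Discs needed: {discs}")
    print(f"Handling time: {handling_hours(discs):.0f} hours")
    print(f"Implied labor rate: {implied_hourly_rate(discs):.2f} EUR/hour")
    print(f"Saving with on-line storage: {online_saving():.0%}")
```

Under these assumptions, a year of audio comes to about 6 TB, spread over roughly 8 000 discs and 2 000 hours of handling. The disc count, and therefore the implied labor rate, shifts with the assumed disc capacity, so the output should be read as an order-of-magnitude check rather than a precise costing.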

Digital files are a remedy against technology obsolescence

As demonstrated, adopting information technology (IT) based on-line or near on-line storage systems offers immediate cost savings. In the mid-term, it also means faster retrieval of archived material: it takes less time to access a file over a network than to retrieve a carrier from a shelf. In the longer run, it enables archivists to truly take advantage of the fact that digital copies are in no way different from original digital recordings. Keeping archives from going off-line simplifies the generation of back-up copies at alternate sites and allows for the implementation of a disaster recovery policy. What is more, it makes future migrations from one storage medium to another seamless and constitutes a pre-emptive measure against technology obsolescence.

Adopting virtual carriers is necessary because the lifespan of digital data systems is much shorter than that of analog technologies. The life expectancy of polyester-based magnetic tape maintained in a proper environment is estimated at 50 years; for digital optical media, it is estimated to be between 30 and 200 years; but for IT-based storage, it will not exceed 5 to 10 years. [3] From that perspective, the use of files as virtual carriers constitutes a major paradigm shift for most players in the archive world. It recognizes that the quest for the ideal audio-visual storage medium is a dead end and that archives can no longer be defined by their physical attributes but as a “logical space independent of the production environment where records are protected from loss, alteration and deterioration”. [4]

Figure 2: IT-based systems such as this tape library in use at Emirates Media Inc. constitute “virtual carriers”.

Archive tomorrow’s programs

The digital transfer of legacy archives is a multi-year process. Emirates Media Incorporated, for instance, started the digital transfer of its legacy archives some 18 months ago. Of a total of 60 000 hours of legacy recordings, roughly 20% have been digitized to date.

Play against the clock

The biggest challenge for broadcast organizations lies in determining the most efficient starting point for the digitization process. When faced with this situation, most archivists focus on decaying carriers and disappearing playback devices. This approach offers little benefit from a financial perspective. On the one hand, most studies show that “reformatting is […] always one of the most expensive options compared to providing the proper storage for originals to extend the usable life”. [3] On the other, this approach strictly focuses on preservation and does not consider usage patterns.

When it started its digitization project in the mid-nineties, CBC – Radio Canada decided to analyze how its archives were used. The resulting study showed that 50% of requests were for program content that had been broadcast in the preceding 12 months. [5] As a result, CBC – Radio Canada decided to “stop the bleeding”. It put an end to the constant addition of tapes to its archives by first digitizing the recording of its on-going broadcasts.

Accelerate Return on Investment

The major benefit of this approach is that it provides a short-term return on investment. It requires minimal capital expenditure, cuts archiving costs and offers immediate visibility to the digital archiving project. Archiving tomorrow’s programs requires little initial investment compared to the digital transfer of past recordings. As detailed previously (see “Digital storage is cheaper than labor”), digital on-line storage is now cheap enough to allow for the continuous logging of broadcast programs in linear, uncompressed format.

In addition, the recording of current programs can be automated. As a result, the workload of a broadcaster’s archiving department is decreased and staff can be freed up for the coming digitization campaign of legacy analog carriers. More importantly, day-to-day archiving can be used as a pilot to spearhead the overall digitization process. As pointed out by the European Broadcasting Union, [6] the digital transfer of archives implies profound change management of existing work practices; “cultural change of this scale will be an evolutionary process over several years”. Digitizing the day-to-day archiving process and generating the associated metadata constitutes the first logical step of that evolution.

Pull archives out of the basement

Abandoning tape based archiving for tomorrow’s productions allows broadcast organizations to review current work practices. Existing workflows can be optimized across departments so as to address the challenge of metadata collection. IT-based storage may provide the ideal, virtual carrier but it does not solve the challenge of indexing content.

Build alliances

Traditionally, production teams have had little concern for archiving. Tapes and analog carriers required large storage space and careful logistical procedures. As a result, these were systematically stored in the basement of broadcast facilities or in remote silos. Ease of use and access were limited. The move to a digital, on-line environment eliminates these technical constraints and allows production teams to have direct access to archived material. News and feature producers can have immediate and around-the-clock access to existing recordings and sound libraries. As a result, they can stop maintaining their own private micro-archive islands, which typically litter the corridors of most broadcast organizations. When one considers that nearly a third of a typical European newscast takes advantage of archived material, [6] the impact of pulling archives out of the basement is potentially tremendous.

Although on-line access to archives may provide production staff with immediate benefits in their daily jobs, it is not sufficient to turn them into supporters and advocates of a digitization process. Archive professionals act as keepers of the collective memory and as such are valuable assets in newsgathering meetings and brainstorms about future programs. They should take advantage of this position. To that effect, some suggest that archivists and cataloguers change their title to media or knowledge manager. Although nomenclature has its importance, the strategic issue is for archive professionals to move out into the production space. There, they are in a better position to promote the usage of the goods they preserve and advocate the benefits of modern archiving practices.

Collect metadata at the source

Building alliances with production departments is important because producers and journalists can be of great help in the archiving value chain. By properly documenting production material, they can play a crucial role in the proper indexing of archives.

Cataloguing content is time-consuming. In the analog realm, properly indexing one hour of radio material typically takes three times as long as the recording itself. [6] A large fraction of that time is dedicated to ‘metadata safaris’ whereby cataloguers go out into the wild production realm to obtain the proper spelling of a name or the details of a location mentioned in a recorded program.

Archivists can save valuable time by promoting the merging of metadata collection into the production workflow. This should not boil down to having production staff painfully fill in compulsory, pre-defined forms. Much of the information required for indexing is already collected by journalists and feature producers. In many cases, technical and descriptive metadata can be aggregated with minimal changes to daily operations.

For example, most broadcast operations have to continuously record their programs for legal purposes. Today this requirement is often met by dedicated logging robots which continuously record on-going broadcasts in low audio quality; once the legally imposed retention period has passed, recorded material is purged. An efficient digital archive infrastructure can leverage this process to generate properly indexed archive material. In parallel to the recording, the corresponding broadcast logs, line-ups and scripts used to produce the programs can be collected. In most cases, this metadata track already exists in electronic format; it can be combined with the continuous digital recording so as to generate time-code markers. As an extension, speech-to-text engines can be deployed to automatically generate a transcript which can be associated with the original audio track so as to enable full-text searches. [7] Whichever option is chosen, the resulting time-code based markers can serve as indexes for navigation and future extractions.
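The combination of a broadcast log with a continuous recording can be sketched in a few lines. The example below is a minimal illustration and does not describe any particular product: the LogEntry and Marker structures, the build_markers function and the sample data are all hypothetical. It assumes that each log entry carries the wall-clock time at which an item went to air and that the start time of the continuous recording is known.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    """One line of a broadcast log (hypothetical structure)."""
    air_time: datetime
    title: str


@dataclass
class Marker:
    """A time-code marker pointing into the continuous recording."""
    timecode: str
    title: str


def format_timecode(offset: timedelta) -> str:
    """Render an offset from the start of the recording as HH:MM:SS."""
    total_seconds = int(offset.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_markers(recording_start: datetime,
                  recording_length: timedelta,
                  log: list[LogEntry]) -> list[Marker]:
    """Turn log entries that fall inside the recording into markers."""
    markers = []
    for entry in sorted(log, key=lambda e: e.air_time):
        offset = entry.air_time - recording_start
        # Ignore items that aired before or after this recording.
        if timedelta(0) <= offset < recording_length:
            markers.append(Marker(format_timecode(offset), entry.title))
    return markers


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    start = datetime(2004, 1, 1, 6, 0, 0)
    sample_log = [
        LogEntry(datetime(2004, 1, 1, 6, 12, 30), "Weather"),
        LogEntry(datetime(2004, 1, 1, 6, 0, 0), "News bulletin"),
        LogEntry(datetime(2004, 1, 1, 7, 30, 0), "Outside this recording"),
    ]
    for marker in build_markers(start, timedelta(hours=1), sample_log):
        print(marker.timecode, marker.title)
```

The same markers could equally be derived from a speech-to-text transcript; only the source of the timestamps would change, not the offset calculation.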

Figure 3: Metadata gathering needs to be merged into the production workflow.

Ensure the digital transfer of decaying media

By first digitizing present and future programs, archive professionals free up the resources required to ensure the digital transfer of decaying media. By improving the quality of service offered to other departments, they ensure that they will obtain the internal political support they need to secure the investment funds that a digitization campaign requires.

Ride the wave

The need for qualified professionals is often underestimated. Experience shows that the proper handling of legacy carriers often turns out to be trickier than expected. [8] Conversely, the investment required for storage infrastructure tends to be overestimated at the outset.

The digitization of material recorded on analog carriers takes time. Yet many digital archiving project teams consider the deployment of a large-scale storage system a prerequisite to any other task. Although this approach is often the consequence of budgetary constraints, it is not financially sound. Since 1997, raw storage prices have declined 50 to 60% per year. [9] Hard drive “capacity is doubling at a frantic 100 percent per year. This means that the capacity (for a single hard drive) is likely to be […] 2,56 terabytes in 2007.” [10] In parallel, local area data networking solutions are becoming easier to deploy and Network Attached Storage (NAS) systems are turning into plug-and-play appliances. Digital archivists need to ride this wave and adopt a step-by-step, incremental storage policy. Ideally, capacity should be acquired on demand; storage should be treated as an operating cost rather than as capital expenditure.
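The argument for incremental purchasing can be illustrated with a small calculation. The scenario below is not taken from the studies cited: it assumes a hypothetical five-year campaign that needs one additional 6 TB block of storage each year, prices that block at 20 000 € in the first year (the on-line storage figure used earlier in this paper) and applies a 50% yearly price decline, the low end of the range quoted above.

```python
# Hypothetical scenario: buy all storage up front, or year by year?
FIRST_YEAR_PRICE_PER_BLOCK_EUR = 20_000  # 6 TB block, figure used earlier
YEARLY_PRICE_DECLINE = 0.50              # low end of the 50-60% range
CAMPAIGN_YEARS = 5                       # assumption for illustration
BLOCKS_PER_YEAR = 1                      # assumption for illustration


def upfront_cost() -> float:
    """Cost of buying the whole campaign's capacity at first-year prices."""
    return float(FIRST_YEAR_PRICE_PER_BLOCK_EUR * BLOCKS_PER_YEAR * CAMPAIGN_YEARS)


def incremental_cost() -> float:
    """Cost of buying each year's capacity at that year's price."""
    total = 0.0
    price = float(FIRST_YEAR_PRICE_PER_BLOCK_EUR)
    for _ in range(CAMPAIGN_YEARS):
        total += price * BLOCKS_PER_YEAR
        price *= 1 - YEARLY_PRICE_DECLINE  # next year's price
    return total


if __name__ == "__main__":
    print(f"Up front:    {upfront_cost():,.0f} EUR")
    print(f"Incremental: {incremental_cost():,.0f} EUR")
```

With these inputs, buying everything up front costs 100 000 € whereas buying year by year costs 38 750 €. The size of the gap depends entirely on the assumed schedule and decline rate, but the direction of the result holds for any sustained fall in prices.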

Turn archives into assets

The digitization of legacy archives is expensive. The European PRESTO project (Preservation Technologies for European Broadcast Archives) recommends that “the value of an item be more than four times the preservation cost in order to be financially justified on a commercial basis. […] For most broadcast archive material, this condition can easily be met as one minute of sold or re-used archive material will pay for the preservation of one hour of archive material”. [11] Maximizing the number of times a recording is re-purposed, so as to lower the cost per use, should be the priority. To turn archives into such assets, archivists need to focus not only on the descriptive metadata which is the key to future access but also on the associated copyright information.

Figure 4: Assets are more than just archived recordings.

Although most broadcasters set aside a financial reserve for the rights they might have to pay for the re-use of historical material, the re-issuing of past recordings is too often halted because of a lack of information regarding copyright. To put an end to this type of scenario, the ARD public broadcaster in Germany has linked its archive and rights management databases. A basic color code (green, orange or red) lets users know whether a recording can be used with no limitation, whether a detailed inquiry is necessary or whether no broadcasting rights are available. The EBU Project for Future Radio Archives (P/FRA) goes one step further; it considers “Rights” to be one of the fifteen core fields of the metadata scheme it promotes for radio archives. [12]

Promote and distribute assets

Converting legacy archives to digital assets constitutes a heavy financial burden. Distribution and promotion are the keys to revenue and long-term return on investment. Providing on-line, self-service access to producers and adopting a “build it and they will come” approach is not sufficient. Archivists must partner with program makers to seek and promote new distribution channels.

Create new broadcast channels

Although some predict the end of traditional broadcast in favor of a stock model whereby content is strictly consulted on-demand, creating programs that leverage archived recordings can be a key factor to changing consumption habits. The digitization of video archives at the RAI in Italy resulted in an 85% increase of hours of archive material used on-air. Although this success was partially due to the improved service provided to production departments, it also resulted from the launch of several digital television channels dedicated to historical recordings. It is true that until digital radio platforms such as DAB or DRM come of age and simplify the multiplication of broadcast channels, such a radical approach is difficult to implement for audio archives. Specific slots can nonetheless be developed within the existing program grids. [6]

Move to content distribution networks

Archivists can also explore the opportunity to build their own content distribution networks. These can take the form of CD compilations. For example, Radio Netherlands Music has encountered greater than expected success with a series of Royal Concertgebouw Orchestra concerts directed by renowned conductors. The internet provides an even more powerful platform for both consumer-to-consumer and business-to-business distribution. National Public Radio (NPR) set up an agreement with audible.com to sell MP3 versions of its most successful programs. In parallel, it launched the Public Radio Exchange (www.prx.org) to provide an online service for peer-review and digital distribution of public radio programming. This website allows station program directors to search for new programs and directly download the corresponding recordings. Such services result in increased distribution of programs and a lower cost per use of existing recordings.

Conclusion

Recognizing that today’s broadcast is tomorrow’s archive, improving the quality of service provided to production departments, launching a digitization campaign of legacy archives, promoting the resulting assets and then distributing them; these are the five pillars of a pragmatic archiving strategy. Revisiting archives from a digital media asset management perspective is a cost saving exercise for broadcast organizations.

The financial benefits of digital storage are obvious. If one considers that tape typically costs 25 € for 730 meters and that archiving tape speed is 0,38 m per second, then the cost of the tape required for archiving a year’s worth of continuous recording in analog form is 410 400 €. In other words, tape is twenty-one times more expensive than on-line storage! The benefits do not stop there. Sharing files across the network facilitates access to content and reduces administrative costs. More importantly, the use of a scalable and open digital asset management system can guarantee that archives become the backbone of the production workflow. Metadata can hence be aggregated at every step of the lifecycle of a recording. Indexing can be improved and content made easier to find and re-purpose. Audio archivists have to learn new skill sets and become digitally literate. They also have the unique opportunity to redefine their role within broadcast organizations.
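The tape figure quoted above follows directly from the stated price, reel length and tape speed. The short sketch below reproduces it and compares it with the 20 000 € on-line storage cost used earlier in this paper; the function names are illustrative, and a 365-day year is assumed.

```python
# Figures quoted in the text
TAPE_PRICE_EUR = 25            # price of one reel
TAPE_LENGTH_M = 730            # meters per reel
TAPE_SPEED_M_PER_S = 0.38      # archiving tape speed
ONLINE_STORAGE_COST_EUR = 20_000

# Assumption: a 365-day year of continuous recording
SECONDS_PER_YEAR = 8_760 * 3600


def yearly_tape_cost() -> float:
    """Cost of the tape consumed by one year of continuous recording."""
    meters_needed = SECONDS_PER_YEAR * TAPE_SPEED_M_PER_S
    reels_needed = meters_needed / TAPE_LENGTH_M
    return reels_needed * TAPE_PRICE_EUR


def tape_to_online_ratio() -> float:
    """How many times more expensive tape is than on-line storage."""
    return yearly_tape_cost() / ONLINE_STORAGE_COST_EUR


if __name__ == "__main__":
    print(f"Tape cost per year: {yearly_tape_cost():,.0f} EUR")
    print(f"Ratio to on-line storage: {tape_to_online_ratio():.1f}")
```

The calculation covers the cost of blank tape only; shelving, handling and playback equipment would widen the gap further.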

Acknowledgements

The authors wish to thank Mat Hans, Eric Richardson, Rich D’Angelo, Alexis Rowell and Anne-Marie Swift for their careful reading of this paper and for their suggestions. In addition, they wish to give a special mention to the Google team for providing a fabulous tool for searching the world-wide web on-line archives.

References

[1] INA Official Website (www.ina.fr).

[2] “Survey: Dams & Digitization Preparedness”, John Spence, ABC Sound Archives, IASA Conference, Aarhus, 2002.

[3] “Preservation Reformatting: Digital Technology vs. Analog Technology”, Steven Puglia, US National Archives and Records Administration, 18th Annual Preservation Conference, March 2003.

[4] “Preservation of Electronic Records”, Charles Dollar, National Association of Government Archives and Records Administrators, 1999.

[5] “Overview of the CBC Radio Digital Archiving System”, Tom Holden, Canadian Broadcasting Corporation, SMPTE 143rd Technical Conference and Exhibition, November 2001.

[6] “Archives in Digital Broadcasting”, EBU Archive Report, European Broadcasting Union, September 2003.

[7] “SpeechBot, the first internet site for content-based indexing of streaming spoken audio”, Eileen Quinn, Compaq Computer Corporation.

[8] “Audio and Video Preservation Reformatting: A Library of Congress Perspective”, Carl Fleischhauer, Library of Congress, 18th Annual Preservation Conference, March 2003.

[9] “The evolution of storage systems”, R.J.T. Morris and B.J. Truskowski, IBM Systems Journal, Vol. 42, N° 2, 2003.

[10] “Hard disks and media PC”, John C. Dvorak, PC Magazine, November 2003.

[11] “Archive preservation and exploitation requirements”, Preservation Technologies for European Broadcast Archives, PRESTO IST-1999-20013, June 2001.

[12] “EBU Core metadata set for Radio archives”, European Broadcasting Union, Tech 3293, December 2001.

_______________________________

SPEAKER BIOS

Nicolas Hans is Director of Product Strategy for Dalet, with offices in Paris, France. He can be contacted at nhans@dalet.com. The corporate website is at www.dalet.com.

Johan de Koster is Head of News Production at Radio Netherlands. He can be contacted at johan.dekoster@rnw.nl.

Katherine Straub is currently based in New York as Worldwide Director of Training for Dalet Digital Media Systems. She is a longtime member of AMIA. She was formerly the news librarian for the TV station CHCH in Canada, and has worked for major broadcast software companies, in addition to serving as an independent international broadcasting consultant.