Session: Linear Uncompressed Video Archiving on High Performance Computer Tapes
Presenter: Franz Pavuza and Julia Ahamer, Phonogrammarchiv, Austrian Academy of Sciences
ABSTRACT
A large percentage of video footage recorded during the last four decades used analogue signal representation. High-quality archiving of this material is possible - even under the restrictions set by the limited budgets of small archival institutions - by applying accurate digitisation and storage in a linear, uncompressed form.
Magnetic tape proves to be a viable target storage medium at present and for the foreseeable future, especially when high-performance computer tapes are used that have been designed for reliable and fast data storage for critical applications requiring high data security.
Our paper gives a survey of our PC-based archival system and explains the qualitative, technical and financial aspects that led to the system configuration. It discusses the advantages and the drawbacks for the archival process and explains the current workflow and the different forms of access for the user. Finally it offers guidelines for future additions and possible adaptations to upcoming standards.
_______________________________
PRESENTATION
Summary
Video archiving requires careful consideration of both technical and budgetary aspects if valuable footage is to be preserved with methods that keep the carrier media and their contents in the best possible condition. Today’s hardware technology allows transfer of analog source material to the digital domain and subsequent uncompressed storage on high quality computer tapes, originally designed for backup tasks, while staying within acceptable financial limits.
This paper describes the structure of the newly established video archiving system of the Phonogrammarchiv based on these principles. It outlines the philosophy behind the concept, which rests on the mainly audio-related roots of the archive, and the modifications necessary for the video archiving procedures. Technical guidelines, partly influenced by budgetary boundaries, and their consequences for the layout of the system are discussed.
It is shown that a solution based on modules of the personal computer (PC) technology can be found which is suitable especially for small or medium sized archives.
Introduction
Audio archiving of scientific materials at the Phonogrammarchiv started back in 1899. One hundred years later an internal evaluation recommended the inclusion of scientific video materials into our archive’s agenda.
The condition of the footage held by the academic community (i.e. universities and museums) ranges from well maintained material to poorly stored media that are deteriorating rapidly and are in acute danger of disintegrating completely in the near future. This holds especially for analog source material recorded two or three decades ago.
Traditionally, the archiving process consists of two stages:
First, an accurate transfer of the content to a technically equal but newer and more stable carrier while preserving the technical nature of the content, using the best available hardware to minimise the signal deterioration usually connected with the transfer of the content.
Second, a careful preservation of both the new carrier containing the information and the original carrier as well, if possible. The last two decades brought a significant change in the philosophy of electronic archiving by introducing the transfer to the digital domain, thereby accepting an inherent but very small loss at the initial digitisation process for the benefit of lossless subsequent transfers.
In the field of video, both the digitisation process and the storage of the digital data are real challenges for hard- and software engineers since the necessary accuracy and the high data rate push the converters and the storage media to their limits, the latter with respect to both the data transfer rate and the storage capacity. As a consequence, compression algorithms to lower these demands have been developed and introduced to the professional and consumer market with great success as they reduce data rates and storage requirements to a fraction of the original value. These benefits come along with some drawbacks, admittedly negligible for many applications. However, in our opinion only the uncompressed way for analog source material is adequate for archival purposes and it can be taken even with reasonably small financial means.
The source material
For the academic community, video production history is linked with the classical U-Matic tape recorder based on the composite signal representation. The rugged layout of these recorders guaranteed a long lifetime, exceeding that of modern consumer equipment by an order of magnitude. Consequently, and fortunately for video archivists, quite a lot of these recorders are still functional and can be reliably operated with the tapes recorded then, thus providing one of the major sources of analog footage.
The era of relatively low cost home video recording created a substantial amount of VHS and S-VHS material though the camcorders could not compete with the DV equipment of today in picture quality, and even less in costs. The Hi-8 camcorder, being reasonably priced and very portable, extended the fields of application to the lower academic levels, while the top quality applications (e.g. in the medical area) welcomed the introduction of the Betacam system. Fig.1 shows the result of a survey of the video footage among the Austrian academic community a few years ago.

Fig.1: Archival footage at the Austrian academic community (2001).
The total footage for older analog material certainly did not change during the last few years. On the other hand the DV era led to a boom of new material, from which only a fraction will eventually be archived. Nevertheless, the DV bar in our diagram will very likely now be far beyond the 1000 hour mark, with a few hundred hours of valuable footage to be added every year.
Analog or digital storage?
Our very first deliberations on possible ways of video archiving included analog storage on a high-end analog system (Betacam SP), which had proven reliable, was widely accepted and performed very well. Later, developments in hardware, especially in analog-to-digital conversion systems and storage devices, shifted our interest to digital target media. Digi-Beta seemed to be the right choice, for its very moderate compression provides the excellent picture quality essential for maintaining as much original content as possible. Digi-Beta, though substantially cheaper than D1 and D5, was still an expensive choice, not only because of the initial hardware costs but also because of the ongoing expenses for the tapes. But at that point, despite the high costs, Digi-Beta was the system of choice.
Compression topics
The only major technical concern about the Digi-Beta system was its internal data compression, which modified the digitised signal, though in an almost negligible way. There was good reason to hope that nobody would be able to recognize any changes directly, e.g. by viewing the recorded signal on an excellent monitor.
Still the question remained whether an analysis of single frames in deeper detail by future video processing software for scientific investigations would not be obstructed by compression artefacts. This question was spurred by experiences gained in the field of digital audio, where the extraction of signal characteristics of original analog sources was possible only when first applying accurate digitisation and subsequently processing the uncompressed data stream.
Though the equivalent video processing algorithms are – due to the substantially higher software requirements and the more complex mathematical background – still to be developed, archivists already have to consider the consequences for such future applications. This results in efforts to avoid even small alterations possibly induced when using lossy compression algorithms.
Although our core concern is future data extraction software, doubts were also fuelled by visual tests and picture quality comparisons of systems whose compression ratios are higher, though not by orders of magnitude. For instance, medical videos showing heart activity revealed visible artefacts at ratios of about 1:5 (Digi-Beta has 1:2).
An interesting example from the world of sports with compressed video streams at the DV level (ratio 1:5, with an additional loss of colour data already during the digitisation process) showed the passing of a basketball between two players. A careful investigation of the specific spin of the ball was more difficult than with the same video sequence that was recorded in an uncompressed way. These examples led to the assumption that even the very slight compression used in Digi-Beta systems might be an obstacle for scientific processing software of the future.
Therefore it came as a pleasant surprise for our archive when hardware became available that allowed uncompressed recording of analog source material. And even more so because this hardware could be implemented into a modern, off-the-shelf standard PC equipped with a high performance hard disk array ready to cope with the enormous data rates associated with uncompressed video data streams.
The costs are still high when compared to a standard PC loaded with consumer software for DV editing. While even good quality components for the PC would not drive the costs above a few thousand dollars, the capture card, the accompanying “breakout box” for the analog interfaces and a reasonably fast and large hard disk array require much higher investments.
Still, the costs for this solution are far below the expenses of a D1-based or a Digi-Beta system, and have the additional advantage of being within the IT world by proper selection of the final file format.
The storage medium
The use of magnetic tapes of very high reliability as the long-time storage media may appear questionable, considering the alternatives. Optical disks offer faster access and a better cost-to-storage space ratio due to their popularity on the consumer market. However, their long-time stability still gives reason for concern. So far the manufacturers give no guarantees about the readability of the data on the disks over a well specified period of time. But this issue is vital for archivists because they have to know exactly when to transfer the valuable data to a new disk (or to other media developed in the meantime).
Hard disk arrays have reached an astonishingly high level of reliability, especially when selected from types designed for server applications (24-hour operation, 365 days/year). Access times are negligible compared to other media, but permanent online access for each and every archived item is usually not required, making this advantage appear less important in our eyes. Costs for larger archives containing the complete material on hard disk arrays are very high.
This leaves the magnetic tape for long-time storage of video data. The handling procedures have been developed to perfection over half a century, and the knowledge about best storage conditions far exceeds experience with newer media. The slow access is acceptable for offline operations, and while data transfer rates still cannot compete with fast hard disks, they are already comparable to those of optical disks when high performance computer tapes designed for backup purposes are selected.
At our institute all three types of storage media are used, depending on the application. Users’ or authors’ copies, previously recorded on tape, are increasingly delivered on CD or DVD, depending on the equipment of the customer. In keeping with the media used, these copies contain compressed material. Selected parts of the archived material will be stored on the video server, for access over the internet (low level browsing, MPEG1 quality level) and for medium level research and more extensive browsing at the archive (MPEG4 compressed material).
The long-term storage material is put on LTO computer tape, well known within the IT community. Designed and manufactured by global players and well defined for 3 generations, it guarantees compatibility over 2 generations (for reading the content). Fortunately, prices dropped substantially during the last two years and encouraged the archive to generate one more copy for long-time archiving, thus increasing the safety of the valuable footage.
This tape theoretically offers information about the status of the medium (tape and cassette) by monitoring access and other operational parameters. Unfortunately, until now no further information about this particular property has been released, but the data, if accessible in the future by the user, would be of importance to determine the point of time when – at the end of the lifespan of the tape – a transfer of data to a new tape is advisable.
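The read-compatibility rule mentioned above (content remains readable over 2 generations) can be turned into a simple planning check. The following is only a sketch under that stated rule: the function names and the generation numbers in the usage lines are illustrative assumptions, and the compatibility information published for a specific drive should always take precedence.

```python
def can_read(drive_generation, tape_generation, read_back=2):
    """Return True if the drive is the same generation as the tape
    or at most `read_back` generations newer (rule taken from the text)."""
    return 0 <= drive_generation - tape_generation <= read_back


def last_readable_drive(tape_generation, read_back=2):
    """Newest drive generation that can still read the given tape generation."""
    return tape_generation + read_back


# Illustrative usage with hypothetical generation numbers.
print(can_read(3, 1))   # drive two generations newer than the tape
print(can_read(4, 1))   # drive three generations newer than the tape
print(last_readable_drive(1))
```

Under this rule a tape has to be migrated before drives more than two generations newer become the only ones available, which gives an upper bound for the refresh cycle independent of the physical condition of the tape.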
At this time, analog sources are stored in an uncompressed way using a proprietary format as a temporary solution, selected to safeguard endangered material. Already compressed digital video source material is stored "as is", without any modification.
Lossless compression based on MJPEG2000 may be considered for the future when the video data stream is to be included into the final file format as the "essence", together with metadata.
Archival requirements concerning the accurate documentation of the material through metadata are – together with financial aspects – the main reason for a decision in favour of an IT based solution, rather than a technically suitable (but otherwise less flexible) implementation using uncompressed video archiving by streaming the video data to D1, D5 or Digi-Beta tapes.
The "ideal" file format
For the archivist the ideal file format allows a flexible inclusion of the essence (the video stream itself) into a file that usually is referred to as a "container" because it also consists of a section filled with metadata. The format is open for future enhancements, and also accepts variable sizes of both the essence and the metadata partition. It is independent of platform and operating system for easy data exchange, and it is not proprietary. For applications in the video field it is of utmost priority that this format has passed the tests and standardising procedures of the two major societies linking the video community: SMPTE (Society of Motion Picture and Television Engineers) for North America, and EBU (European Broadcasting Union) for the European Community. The broad support by manufacturers will in all likelihood follow these procedures.
The MXF file format (Material Exchange Format), a subset of the well-known AAF (Advanced Authoring Format), is certainly a good choice, and the development on the market within the last year seems to strengthen this opinion. It will take quite a while until the support covers all the different video systems on the market. It is no surprise that the first applications have been developed for the broadcasting industry, particularly for the news-gathering departments, currently based on IMX, DVCPro and DVCAM systems. As soon as the description for the inclusion of consumer DV, MPEG2 and uncompressed signal representation formats has been published by the MXF consortium, corresponding applications will follow.
The structure of the video archiving system
Having outlined the major aspects of capturing and storing, we now add a short description of our archiving system.

Fig.2: Video archiving system of the Phonogrammarchiv (Schematic)
The core is a modern PC, supplemented by a capture card accepting analog sources over a breakout box. The analog sources, mainly U-Matic, S-VHS, Hi-8 and Betacam players, are combined in a rack and feed the breakout box.

Fig.3: Video archiving system (part 1)

Fig.4: Video archiving system (part 2)
Digital sources are captured either over the onboard IEEE1394 interface (DV material) or a digital I/O card (SDI). The capture card controls its own hard disk array of currently 550 GB capacity, enough to process more than six hours of uncompressed material or 25 hours of DV material (now being the major source of new footage from field research).
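The capacity figure for uncompressed material can be checked with a short back-of-the-envelope calculation. The sketch below assumes 8-bit 4:2:2 sampling of a 720 x 576 frame at 25 frames per second and decimal gigabytes; the sampling format actually used by the capture card is not stated here, so these parameters and the function names are illustrative assumptions, and audio and file-system overhead are ignored.

```python
def uncompressed_rate_bytes_per_s(width=720, height=576, fps=25, bytes_per_pixel=2):
    """Video payload rate in bytes per second.
    8-bit 4:2:2 sampling averages 2 bytes per pixel (assumed format)."""
    return width * height * bytes_per_pixel * fps


def hours_on_array(capacity_gb, rate_bytes_per_s):
    """Hours of material that fit on an array of `capacity_gb` decimal gigabytes."""
    return capacity_gb * 1e9 / rate_bytes_per_s / 3600


rate = uncompressed_rate_bytes_per_s()
print(f"data rate: {rate / 1e6:.1f} MB/s")
print(f"hours on 550 GB: {hours_on_array(550, rate):.1f}")
```

With these assumed parameters the array holds a little over seven hours, which agrees with the figure of more than six hours; a 10-bit format would lower the result accordingly.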
The editing software, used to process the material because dedicated video archiving software is lacking, enables the collecting and (virtual) segmenting of the raw video stream, which is monitored on both the data monitor and the separate video monitor.

Fig.5: Screenshot of editing software
The processed video data is written to the LTO tape using conventional backup software. For easy retrieval the virtual segmentation is maintained and no rendering is applied. This will simplify the use of the possibilities of the MXF format when available.
The video server will be implemented within the next few months; the first browsing station has already been implemented and extended with hardware and software for the production of users’ and authors’ copies and educational material.
A closer look at the cost breakdown shows that the capture station will require about $ 30-40k, depending on the size of the external hard disk array (the capture card alone accounts for $ 12k of that sum). The remainder of the station's costs covers the PC, its system disk, the memory and the I/O cards, the monitors and the sound system.
It depends on the luck and the skills of the management how much money has to be spent on the analog players, namely the ones that can only be acquired as used modules since their production stopped long ago. The Phonogrammarchiv managed to buy well-serviced equipment from the Austrian Broadcasting Company and the University of Technology, and so the costs were limited to about $ 20k. The digital players (a combined Digi-Beta/IMX player with analog playback compatibility, and two DV players) added another $ 17k.
The costs for the video server will depend strongly on its capacity; about $ 40k is a reasonable estimate. The browsing stations cost between $ 4k and $ 6k each, depending mainly on the size of the hard disks.
Additional test hardware has been bought to monitor the status especially of the older analog players. The Phonogrammarchiv got hold of an infrequently used test system, originally priced around $ 80k, for a fraction of this sum.
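The figures quoted above can be added up to give a rough overall budget. This is only an illustrative sum: it assumes a single browsing station, and it leaves out the test hardware, tapes and tape drives because no prices are given for them here.

```python
# Cost items in thousands of US dollars as (low, high), taken from the text.
COSTS_K = {
    "capture station": (30, 40),
    "analog players": (20, 20),
    "digital players": (17, 17),
    "video server (estimate)": (40, 40),
    "browsing station (one, assumed)": (4, 6),
}


def total_range(costs):
    """Sum the low and the high end of every (low, high) cost item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(COSTS_K)
print(f"estimated total: $ {low}k to $ {high}k")
```

Each further browsing station would add another $ 4k to $ 6k to this range.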
Workflow issues
When starting with video archiving we were inclined to make use of procedures already tested in the audio field. However, the nature of the footage, particularly the typical file sizes of video material (compared to audio files of equal duration), required some changes to our approach. The basic philosophy underlying the archival process has been maintained, that is, storing the signal in its technically best possible form and adding documentary material in a separate database, but some procedures have been modified. Intelligent time management is even more important than in audio archiving: different archival tasks have to run concurrently to make use of lengthy transfer times. These initial experiences suggested the acquisition of a second capture station even for archival tasks done by a single operator. The production of users’ and authors’ copies has been moved to a slightly upgraded browsing station in order to avoid blocking the capture station.
Markers – special tools to point out specific positions along an archived audio file – will be replaced for video material by keyframes, preferably combined with rapid access to essential points on the timeline that contain key activities of the content.
Further steps, possible enhancements
To maintain the highest possible quality it is mandatory for the archive to compile a set of basic internal test procedures to constantly monitor the status of the system. A second project will try to access the analog signal components of the U-Matic players directly at the output of the video-head amplifiers, before they are combined by the internal electronic circuits to form the composite output signal. Thus we hope to avoid distortions of the signal introduced by the original circuits.
The previously mentioned status information hidden on the LTO tape or on an internal chip of the cassettes is of high interest to us as a major step for the development of refreshing cycles for the tapes.
We are closely watching the developments on the analog/digital converter market and will upgrade the input modules for the analog sources when affordable solutions are offered on the market.
We will have to adapt to the recently defined consumer High-Definition material, though we strongly hope that the majority of the incoming material will be of standard definition at least for the next five years.
We also would appreciate the development of dedicated video archiving software.
And finally, video archiving would not make any sense without having in mind an optimised user interface, not only by means of convenient browsing stations but also by tools that further enhance the access, especially for scientific purposes. This would include intelligent data mining and tools for careful restoration procedures, leaving the original material unharmed but offering a new or extended approach to specific properties of the footage and better support for scientific exploitation.
Conclusion
We have shown that uncompressed archiving of valuable analog video footage can be done with reasonable expenses, using off-the-shelf personal computers with additional dedicated hardware. The final storage medium can still be the magnetic tape if modern developments introduced for IT tapes with high demands on durability, reliability and fast data transfer are considered. The versatile new file format MXF will guarantee compatibility and simple data exchange for many years.
__________________________________
SPEAKER BIO
Julia Ahamer
Julia Ahamer currently works at the Phonogrammarchiv of the Austrian Academy of Sciences in Vienna. She is a scholar of African Studies specialising in the Chadic language Hausa and currently working on her PhD thesis about contemporary Hausa literature. She has worked as a network administrator and webmaster for some time, and is now an archivist at the Phonogrammarchiv. She started her training at the Archive by audio archiving African Language recordings. Since then she has focussed on video archiving and in this capacity is responsible for the videography workflow at their new video archive.