Session Long-Term Storage of Video in the Digital World

Panel Coordinator

Panelists

Jim Wheeler

Ian Gilmour
ScreenSound Australia

Jim Wheeler
Tape Restoration and Archival Services

Jim Lindner
Media Matters

ABSTRACT

It is now possible to store over two hours of uncompressed high-quality video on a single medium and at a reasonable cost. This savings in space and cost means that Hard Disk Drives or DataTapes are the answer to our search for an archival digital medium for video.

What are the pros and cons of these new technologies as archival media for video?

The speakers will discuss these new concepts for storage of digital video, the costs and densities of the Hard Disk Drives or DataTapes that are on the market now, and the advantages of storing video in data formats rather than conventional video formats.

Panel Speakers:
Ian Gilmour
ScreenSound Australia
DataTapes

Jim Wheeler
Tape Restoration and Archival Services
Archiving on Hard Disk Drives

Jim Lindner
Media Matters
New Ways to Migrate Video

_______________________________

PRESENTATION TRANSCRIPT: LINDNER

We're going to be talking in this session about video as data. My session focuses not so much on the "how" but the "what" of storage. Jim Wheeler and Ian Gilmour will be talking more about the what. I'll be talking about the results of some research that Media Matters did for the Dance Heritage Coalition. This project was a digital video preservation reformatting project. We did some research on compression formats, and that's what my presentation today is about. I want to thank the Mellon Foundation, who funded the Dance Heritage Coalition, thus enabling us to do this research as well as the preservation work itself. And I also want to thank Carl Fleischauer, who is in the audience, and who is a continual inspiration. If it weren't for him, we wouldn't have been involved in the first place. And Carl is the one who helps me out whenever I get in a pickle.

I'd like to talk first a little bit about the impact of compression on video. There are all sorts of technical issues relating to compression, but what's really important from an archival point of view is that the artifacts that are created in compression actually become part of the piece. It's something people don't really talk about much. When we're thinking about compression, usually we're thinking about bandwidth or bit rate - those sorts of things - and we don't talk about some of the larger issues.

One doesn't normally think of dance as being on the cutting edge of technology; they're usually just struggling to survive. The reason we started this work with the Dance Heritage Coalition is because their archives are mostly on motion picture and video - moving image media. There are various forms of notation for dance, but the notation doesn't relate very much to performance. The critical documentation is recorded on film and video. And so for this "financially challenged" dance community, how we preserve dance is very important. People in the dance community are struggling; they're desperate for answers - "What do we do? Do we put everything on DV?" They came to us with questions about formats and conversions and compression and cost factors.

And we found we didn't have any real information, any research that specified one method or format, or suggested the proper terms for a decision making process. Every format has strengths and weaknesses, and there is usually the issue of compression in the reformatting scenario. No matter what type of compression you're talking about there is no free lunch -- it's fair to acknowledge that up front. As with anything else, compression offers only trade-offs.

In order to address the dance community's questions, we needed to examine those trade-offs, to determine what was acceptable, to in some sense quantify them in order to be able to apply that knowledge to a decision making process. We needed to look at those trade-offs from the archival point of view. We wanted to understand what would be on one hand minimally acceptable and on another hand, what would offer the best possible solution. We went into many issues in great depth, and ultimately, what I'm going to present today, which is based on that research, is a case study for determining how to select a video preservation file format.

I'm not going to be talking much about file wrappers, but I can tell you that we determined from the beginning that, by definition, we are looking at AAF and MXF. They both capture metadata satisfactorily, so we consider that file wrapper format is not a significant issue. That being the case, the metadata issue becomes the quality of your data itself. I have been thinking recently about some of the AAF and the MXF features and extensions that allow you to edit extensively inside the file. I don't know anyone who has. For instance, it would be interesting to hear about someone who's made an AAF file, done a thousand edits in that file, and attempts to reconstitute the original essence - to compare and assess whether there have been any changes. I don't know whether that's been done. But for the Dance Heritage Coalition Project, we did not test that at all.

So what are the objectives of our test? We looked at three basic areas: quality, usability, and preservability. I want to give you an idea first of the kind of material we were working with. [Ref: demonstration image] You'll notice that there are a lot of lines in the image - that's video interlace, not a compression artifact. That's an artifact of actually showing this here today.

We are really looking at quality from a technical point of view, and we are looking at real-world examples, not test charts of signals. We are looking at the characteristics of picture and sound quality, including resolution, chroma bandwidth, luminance -- the different criteria we use to make a before-and-after comparison. What did it look like before? What did it look like after? We decided that a copy will pass the quality test if the measurement of these elements shows little or no diminishment or degradation when compared to the measurement of the original. The quality criteria objective, if you will, is to make sure that what we end up with shares the same characteristics - technically speaking, the same resolution, chroma bandwidth and luminance - as what we start out with.

And the techniques we use to effect this transfer have to be affordable. For instance, there are all sorts of esoteric encoding schemes that may be very effective in one context or another, but that are not affordable by a dance community. We decided that it had to be possible to edit the new copy, and that the new copy had to retain any innate information that supports any kind of search engine. Since we believe that the new copy will have to work in an environment where HD will be ubiquitous, we decided that one of the objectives is that the new copy has to be able to be output to HD; it has to have "upward mobility," in the sense that it should be migratable to a higher resolution context.

And the new copy must permit tape to film transfer. Some of this material will be used again - projected - in the context of a live dance performance. So the ability for dance to use material in performance again is very important, and we don't want to limit its utility in those possible contexts.

Finally, we specified that the new copy should have the characteristic of preservability. That's a word we don't have in the dictionary yet. The idea of preservability is that the end product, the new copy, must be migratable and must avoid technical protection such as encryption. The format must be open source, public, well documented, and it should require little or nothing in the way of license fees (which gets us back to "affordable").

The result of this research I am presenting today is a 150 page report, and so you're only getting a very superficial look at the data we produced in the short period we have here today. I'm going to move through this very quickly, but you'll be able to read the report and get into all of this more deeply. So what was the essence of our research project? In conjunction with the Dance Heritage Coalition, we selected twenty-two clips for analysis. The clips were generally in excess of one thousand frames, and were selected for both technical and aesthetic criteria. We prepared these clips from the original tapes for analysis.

This slide shows the process. [REF: PowerPoint slide]

You have the original tapes on different formats -- Betacam SP, Hi-8, DV, and so on. DV was interesting because we did cross-transcoding, which is a nightmare. We then took the product of that exercise and tried to calibrate it using some of the Samma analysis tools, to try to find equivalencies in terms of video levels. For instance, we didn't want a video level registering one hundred ten percent, because that would necessarily cause a codec problem. So we of necessity had to impose a certain equivalence, to modulate the video playback in the transfer process so that it would fall within legal video range. These twenty-two samples were thus transferred to Betacam SP. We did not put them on Digital Betacam because we didn't want to compound the compression error -- that is, we needed to avoid combining the compression programs being tested in playback with the native, proprietary compression system of Digital Betacam. We then converted the clips to uncompressed AVI files, and we compressed each clip using six different codecs. Finally, we played back the migrated, digitized, compressed data files of these twenty-two clips and analyzed each of those variations using fourteen different metrics. This testing process thus required no fewer than 1,848 specific measurements (22 clips times 6 codecs times 14 metrics), and it took more than a year to complete.
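
The scale of that test matrix follows directly from the numbers just given. As a rough sketch (the clip, codec and metric labels below are placeholders, not the names used in the report), the count can be reproduced like this:

```python
from itertools import product

# Placeholder labels; the report's actual clip, codec and metric names
# are not reproduced here.
clips = [f"clip_{i:02d}" for i in range(1, 23)]      # 22 clips
codecs = [f"codec_{i}" for i in range(1, 7)]         # 6 compression types
metrics = [f"metric_{i:02d}" for i in range(1, 15)]  # 14 metrics

# One measurement per (clip, codec, metric) combination.
test_matrix = list(product(clips, codecs, metrics))
print(len(test_matrix))  # 1848
```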

Let's talk about the clips. I'm not going to go over them in detail, but I'll show you a few samples to give you an idea of the range and types of material that was analyzed.

Perhaps the most important thing is to look at the original formats. The first is Hi-8. Then DV-cam. Then 3/4" U-matic. [REF: images from each format]. We did not try to bias the test in terms of format - say, by only demonstrating the upmarket alternatives such as Betacam SP or Digital Betacam. We tried to get a wide range of quality and we even included VHS.

In terms of compression, we applied six different types. For example, MOV files were submitted to Sorenson Video-3. I can't go into all the details here, but you can consult the report and learn about the full complement.

Here are the criteria for MPEG IV. You'll notice the bit rate is what we believe to be reasonable considering the task at hand. We wanted to keep things relatively equivalent, insofar as that was possible. Windows Media, Real Media. And as you probably know, there's a relationship between Real Media, Windows Media and MPEG IV. There's a lot of politics involved in all this, but MPEG IV and Windows Media have a particular relationship with one another. MPEG II at twenty megabit. We also did MPEG II at fifteen megabit. And JPEG 2000. JPEG 2000 was our lossless compression type for this test. One of the main things that we wanted to test with JPEG 2000 lossless was whether it was lossless, because we in fact didn't know. We'd been told it was lossless, but those of you who work with Avid Editing Systems have been told that too, but in fact, that system is not lossless. So we wanted to verify that what has been sold as lossless is in fact lossless. And indeed JPEG 2000 was lossless; I'm not going to show you any of the graphs on that test because the results are the identity set. What went into compression is what came out of compression. And because of this, the JPEG 2000 lossless samples actually proved to be a challenge for our analysis tools.
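
The check described here can be stated very simply: decode the compressed file and confirm that every frame is bit-for-bit identical to the original. Below is a minimal sketch of that idea. It uses zlib purely as a stand-in lossless codec (it is not JPEG 2000), and the frame data and function names are invented for illustration, not taken from the study.

```python
import hashlib
import zlib


def frame_digest(frame: bytes) -> str:
    """Return a SHA-256 checksum of one frame's raw bytes."""
    return hashlib.sha256(frame).hexdigest()


def is_lossless_round_trip(frames, encode, decode) -> bool:
    """True only if every frame survives encode/decode bit-for-bit."""
    return all(
        frame_digest(decode(encode(frame))) == frame_digest(frame)
        for frame in frames
    )


# Synthetic stand-in frames (hypothetical data, not real video).
frames = [bytes((x * (n + 1)) % 256 for x in range(4096)) for n in range(5)]

# zlib is lossless, so the round trip reproduces the input exactly.
lossless_ok = is_lossless_round_trip(frames, zlib.compress, zlib.decompress)


# A deliberately lossy "codec" that throws away the low 4 bits of each byte.
def lossy_encode(frame: bytes) -> bytes:
    return bytes(b & 0xF0 for b in frame)


lossy_ok = is_lossless_round_trip(frames, lossy_encode, lambda data: data)

print(lossless_ok, lossy_ok)  # True False
```

Comparing checksums rather than eyeballing graphs is what makes "the identity set" a yes-or-no result: either every frame matches or the codec is not lossless.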

There are devices available - one made by Tektronix, for instance - that will allow you to do an analysis of a system, encoder, decoder, VTR, etc. And you can test material from tape or hard disk or whatever source you need to analyze by passing a standard reference - a test pattern or other standard analytical content - through each of the components of the system and measuring the signal. We felt these systems didn't speak to the issues at hand, the issues of real-world material in an archival context.

We wanted to work on two levels: the technical level and the perceptual quality level. We chose one tool which was the only analytical software tool for video we were able to find, made by [get data from PowerPoint slide or JL], a small Japanese company. Their system has a number of different metrics. Probably the most important metric is the MOS -- the mean opinion score. The MOS is a simulation based upon real world testing of different criteria, and a weighting of it as a derivation of the responses of a subjective audience. Data for the MOS is generated by showing audiences different sample material and asking them to rate the samples with slider switches. The compiled information was reduced to an algorithm that models audience response. The MOS is a sort of mean opinion of overall perceptual quality. So MOS was perhaps the pre-eminent metric of the fourteen characteristics we measured. We looked at metrics that take into account the image content. We looked at blurriness and blocking. We looked at video quality metrics. [REF: PowerPoint slide] In terms of technical metrics, the analyses can be grouped into two categories: spatial and temporal. Spatial metrics include effects such as blockiness. Temporal metrics are mainly directed at the instabilities produced by incorrect interframe movements, what we call "jerkiness." To complete the spatial-temporal metrics of our test program, we looked at several million individual frames. There are advantages to both referenced and non-referenced metrics, but in this study we concentrated on non-referenced metrics because we needed to measure the real-world test footage against itself - before and after compression. We looked at fidelity metrics, and at measurements that quantify the mathematical differences between samples. We looked at the spatial-temporal metrics related to ANSI and its standards matrix.
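
To give a sense of what a measurement of "mathematical differences between samples" looks like, here is a minimal sketch of one widely used fidelity measure: mean squared error, and the peak signal-to-noise ratio derived from it. This is offered only as an illustration. The specific fidelity metrics used in the study are not named in this talk, and the sample values below are invented.

```python
import math


def mean_squared_error(original, processed):
    """Average squared difference between two equal-length sample sequences."""
    if len(original) != len(processed):
        raise ValueError("frames must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    mse = mean_squared_error(original, processed)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Invented 8-bit luminance samples, for illustration only.
original = [16, 64, 128, 200, 235]
degraded = [18, 60, 130, 198, 235]

print(mean_squared_error(original, degraded))
print(psnr(original, degraded))
print(psnr(original, original))  # identical input gives an infinite PSNR
```

A measure like this says nothing about how a viewer perceives the difference, which is why a perceptual score such as the MOS was tracked alongside the purely mathematical ones.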

Here are some of the individual criteria we looked at, and these were the ones that I found most telling. Simpler aspects of the image -- jerkiness, blockiness -- stand out assertively when you're looking at different compression. Some of the other defects we tracked include blur, noise, ringing, color saturation. MOS was extremely revealing as an analytical tool, as I mentioned.

I'm going to now show you that clip again, and you've seen it once before, so I'm going to ask you to concentrate on it. Understand that you're looking at it through PowerPoint. [REF: clip from PowerPoint] Here's what you're about to see. Beginning with an AVI file, the clip was compressed by the Windows Media 9 encoder using a middle setting. It's playing back on my laptop through this data projector, so a lot of the artifacts you're seeing relate not to the compression but to the system that we're playing it through. I'd like you to look at a couple of things in particular. Look at details such as edges along the humans as they are moving quickly. These are characteristic areas where compression algorithms fail. Notice that in the middle of the clip, we go to a close-up. You can see two windows on each side where they are actually using the video in the dance production. Those of you who work with compression would know from a technical point of view why we would pick a clip like this: an overall dark area and a blown out region in the middle where all of the action is taking place. The codec has a lot of work to do to try to keep up with all that. Let's look at some of the results. Let's look at MOS first. And here we're comparing two different codecs. This is Sorenson Video-3 versus MPEG IV. [REF: PowerPoint slide] In this particular graph, higher is better, and five would be the best quality. You can see MPEG IV is lower than that. I'd like to draw your attention to one particular segment: the close-up. You'll notice that the Sorenson codec didn't compress this segment much. And the image quality oscillated back and forth within a rather narrow band, and the results were not particularly good. But if you look at MPEG IV, you'll see that there are several spikes along here. Those are probably the base frames. The interpolated frames have generally lower quality than the base frames.

The MOS analysis of the Sorenson compression indicates consistently better performance in a tighter range than the MPEG IV clip. However, while averaging lower overall, at times the MPEG IV produced moments of extremely high quality.

By contrast, the Sorenson delivered a more even and better level of quality, although clearly the results were not outstanding. Moments of greatness don't necessarily equate with an overall enhanced viewing experience. In fact, the inconsistency produced by this compression can be quite distracting. You have a base level of quality which improves and reverts in a pattern which draws attention to itself and to the lower end of the quality spectrum when it appears on screen.
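
One way to put numbers on that distinction between average quality and consistency is to look at both the mean and the spread of the per-frame scores. The sketch below does that for two invented score series on the one-to-five MOS scale. The values are hypothetical and are not taken from the study's measurements.

```python
from statistics import mean, pstdev

# Hypothetical per-frame MOS values (1 = worst, 5 = best), illustration only.
steady_codec = [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 3.0]
spiky_codec = [2.0, 4.8, 2.1, 2.0, 4.7, 2.2, 2.0, 2.1]


def summarize(scores):
    """Return (mean, standard deviation, min, max) for a series of scores."""
    return mean(scores), pstdev(scores), min(scores), max(scores)


for name, scores in (("steady", steady_codec), ("spiky", spiky_codec)):
    avg, spread, low, high = summarize(scores)
    print(f"{name}: mean={avg:.2f} spread={spread:.2f} range={low}-{high}")
```

In this made-up example the spiky series reaches a higher peak but has a lower mean and a much larger spread, which is the pattern described above: occasional excellent frames surrounded by visibly poorer ones.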

One of the things we learned by this study is that it's very difficult to forecast how program content will respond to compression. In the same clip, all along we have blockiness. The problem occurs in MPEG II, perhaps less seriously.

This is MPEG II versus MPEG IV. [REF: PowerPoint clip]. All of a sudden, the quality level of MPEG II at the close-up takes a dive.

In this particular graph, we've tracked percent of distortion. MPEG II performs better, and Sorenson actually got dramatically worse. The MOV was looking consistently good; you don't see too much oscillation. Pretty consistent results all of the way across and a very nice tight band. And then all of a sudden everything sort of goes to hell up here. Okay. And by contrast, the MPEG II actually got better. I can't tell you why that happened but I can tell you that it did happen.

What did we learn from that? It's almost impossible to predict codec performance in advance. There was little consistency in codec performance from clip to clip or within a clip. Sometimes performance is divergent and other times the performance is uniformly bad.

Here is Real Media versus Windows Media. Now these are two arch-rival companies. And here you can see Windows Media and Real Media almost shadow each other exactly. And when performance drops off, it is a very substantial decline. Neither of them is a 'stellar' performer. So sometimes you have a codec which performs well until it encounters an anomaly in the datastream that causes the compression algorithm to perform very sub-optimally, and then it returns to normal performance when the signal returns to what it recognizes as a stable source. There is no way to forecast that. I call that "smooth sailing followed by disaster."

This chart is for the second example [REF: PowerPoint slide], a different clip from the first demonstration. This represents Windows Media versus MPEG IV, which is interesting because they are variations of the same basic compression engine. And you can see that at the beginning they were comparable -- clearly neither codec was doing a very good job at the beginning of the clip. I suspect that's because of the quality of the initial frames. But as the clip progresses, you can also see that both of them totally fall apart towards the end of the clip. We had those kinds of results in many, many, many of the clips.

We also had clips of better quality with large variations in quality; we felt this variation was actually worse in the sense of being more distracting than a lower-scored but more even performance. In this set of examples [REF: PowerPoint slide], you see wild oscillations in quality versus consistent, lower-quality performance. And whether the eye is drawn to these wild oscillations or not, whether subjective viewing finds the inconsistent or the lower-quality performer easier to watch -- in either case, it seems clear from an archival point of view that this kind of performance is not acceptable.

There was no clear leader over a wide variety of material. Each codec had its own problems and we found that bit rate was not a good overall predictor of quality. That surprised me because I assumed bit rate would be a good indicator -- that's the 'real world' feeling we all have - that you'll get better performance out of a 20 megabit encoder than out of a 15 megabit encoder. But our results indicate that is not the case. It depends upon the material that's going in and on the specific algorithm at the heart of the codec. You can't generalize that higher bit rate equals superior results. You probably could make the generalization that one and a half megabits is worse than 15, but within a reasonable scale, you can't assume that higher bit rate equals superior results. Artifacts can be generated by any codec at any bit rate. Some are perceptually significant and others aren't. Business issues and marketing dynamics notwithstanding, there was no clear performance leader even in the match-ups such as Windows Media v. MPEG IV or Windows Media v. Real Media, where very similar systems were closely compared. In some ways, these are disappointing results.

Our opinion from an archival point of view is that lossy compression is unacceptable, any type at any rate. The reason is that there is no way to reliably predict performance over a wide spectrum of material unless you can do scene-to-scene compression, which we reject as being economically unfeasible. For example, such a solution would be prohibitive for the dance community. Those of you who have been involved with mastering a DVD can attest to what I've suggested here this morning. Unless you're going section by section and closely analyzing the scenic content, it's very hard to identify the algorithm that will give the best performance for a given video source. No one algorithm or combination of algorithms showed outstanding performance in our test.

As a result, we believe that lossless compression is the only viable and acceptable option for video preservation. In terms of lossless compression, there is a standard. JPEG 2000 is a compression scheme based on a mathematically lossless algorithm, and it is an open standard. Mathematically lossless compression offers many advantages. There are no artifacts due to the compression process. Frames are available as discrete units -- this is with JPEG 2000 -- which is very important. MPEG compression, for example, does not retain all the frames as discrete units. We think a three to one compression factor is feasible, and possibly more with certain types of material. With JPEG 2000, each frame can compress differently because each frame is an individual unit.
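
What it means for each frame to be an individual unit can be shown with a small sketch. Here zlib stands in for a real intra-frame lossless codec (it is not JPEG 2000), and the frames are synthetic. The point is only that every frame is compressed on its own, ends up with its own ratio, and can be decoded without touching its neighbors.

```python
import os
import zlib

FRAME_SIZE = 8192  # bytes per synthetic frame (arbitrary choice)

# Three synthetic frames with very different content.
frames = [
    bytes(FRAME_SIZE),                          # flat field: highly compressible
    bytes(i % 256 for i in range(FRAME_SIZE)),  # repeating ramp: compressible
    os.urandom(FRAME_SIZE),                     # noise: barely compressible
]

# Compress every frame independently of the others.
compressed = [zlib.compress(frame) for frame in frames]

for index, (raw, packed) in enumerate(zip(frames, compressed)):
    print(f"frame {index}: ratio {len(raw) / len(packed):.1f}:1")

# Any single frame can be recovered on its own, exactly.
assert zlib.decompress(compressed[1]) == frames[1]
```

An overall figure such as three to one is therefore an average over frames that may individually compress much better or much worse.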

Now, quality of the compression program is the most important factor, but certainly not the only factor in the viability of any mass conversion effort. If compression is going to work well in an overall program of archival migration, it needs to be real-time. Analog Devices has introduced JPEG 2000 lossless -- JPEG 2000 codecs that work in real-time and aren't hardware based. We have a prototype in our lab, and we are very encouraged by it. We expect that JPEG 2000 lossless will be able to run on cost-effective hardware - that is, workstations that cost less than ten thousand dollars. And I'm hoping that if the media-sector market responds positively to JPEG 2000, there will be an increase and concomitant price drop in media-enhanced chips so that it will become very cost effective to run lossless codecs on relatively inexpensive hardware.

Even compressed data needs to exist somewhere in storage, and fortunately, we find storage technology trending cheaper as well. Just over the course of this study we've seen a substantial and characteristic drop in the cost of data storage. [REF: PowerPoint slide]. In 1998, the cost per gigabyte was $57.97. When we started the project, data storage was about a dollar a gigabyte for raw storage - not for a higher level system such as a RAID system, but just for the basic storage function. The cost of a gigabyte is down to seventy-nine cents as of May 28. And based upon this curve, which should look very familiar to many of you, we believe that by 2010, the cost will be somewhere around six cents per gigabyte. So that being the case, the need for high data compression is mitigated, and it becomes more economically feasible to deploy archivally acceptable lossless compression even though it uses more space than other forms of compression. There are many cost components involved in the migration of video to data files, including facility overhead. As the economics of processors and storage evolve, the additional cost of lossless compression (as compared to lossy) becomes smaller relative to overall project cost. By 2010, an hour of content is going to cost about $1.50 US in 2004 dollars for raw data storage. This is roughly where audio is now in terms of unit cost. And you may recall that the audio archivists had the same conversation we are having now several years ago. The price of storage began to drop, acceptable compression tools became available, and now one hardly ever hears about these factors as significant impediments to conversion. We believe video is following that curve.
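
The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted above. Treating the 79-cent figure as a 2004 price, assuming a constant year-over-year rate of change between two data points, and the function and variable names are all illustrative assumptions, not part of the report.

```python
def annual_price_factor(start_price, end_price, years):
    """Constant yearly multiplier that takes start_price to end_price."""
    return (end_price / start_price) ** (1 / years)


# Figures quoted in the talk (US dollars per gigabyte of raw storage).
price_1998 = 57.97
price_2004 = 0.79   # assumed to be the 2004 figure
price_2010 = 0.06   # projected

# Implied average yearly change over each six-year span.
past_factor = annual_price_factor(price_1998, price_2004, 6)
future_factor = annual_price_factor(price_2004, price_2010, 6)
print(f"1998-2004: price falls about {(1 - past_factor) * 100:.0f}% per year")
print(f"2004-2010: price falls about {(1 - future_factor) * 100:.0f}% per year")

# $1.50 per hour at $0.06 per gigabyte implies this much data per hour.
cost_per_hour_2010 = 1.50
gigabytes_per_hour = cost_per_hour_2010 / price_2010
print(f"implied storage: {gigabytes_per_hour:.0f} GB per hour of content")
```

Read this way, the projection assumes a slower rate of decline after 2004 than the one observed since 1998, and the per-hour cost corresponds to roughly 25 gigabytes for each hour of content.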

At this point, I want to hand this over to Jim Wheeler, who is going to talk about the emerging medium for archival storage of this migrating data. Thank you.

_________________________

PRESENTATION TRANSCRIPT: WHEELER

Jim Wheeler
Tape Restoration and Archival Services
Archiving on Hard Disk Drives

Good morning. I brought in a couple of hard drives for people who haven't seen them. This is a standard hard drive package. It's got a 50-pin connector for the data and a four-pin connector for power; that's a universal open format. There is a 370 page international standards document if you want to know about this kind of disk - it's virtually all standardized. There are several manufacturers of this type of disk, and four manufacturers make about 90 percent of the hard drives that you find in desktop computers today. These are not laptop drives -- laptops are a whole different ballgame. They're very sensitive to being dropped; supposedly you can drop them from five feet. This one I took apart so you can see the head and other components. You can move the head back and forth. Both of these specimens I got from friends. The disks had failed; they are all from the same manufacturer. I've got several bad hard drives, all from the same manufacturer, not one of the big four. I will put these out on the stage so you can look at them later.

Last week I was a speaker at a one-day symposium in Silicon Valley on the reliability of hard disk drives. Reliability representatives from all the hard drive manufacturers in the world -- about 50 people -- attended. The first thing that I said in my talk was "You have a different definition for archive than an archivist does. An archivist has to think forever." And I emphasized that word "forever." Nobody came up afterward and said, "That's ridiculous," which had been the typical response I got when I used that term. I was introduced to this concept by Paul Spehr, the long-time head of the Library of Congress' Motion Picture, Broadcasting and Recorded Sound Division. We learned some interesting things at this symposium. For example, the TiVo -- which is an example of a hard-drive based storage system -- is running 24 hours a day. It never shuts off. The TiVo reliability engineer indicated that they expect the lifespan of the system to be at least seven years. Of course they haven't run any that long, and they are based on new hard drives, and it is impossible to do real-time tests on longevity before the product is deployed. It's impressive that we've reached the point where a disk system can run continuously, under all kinds of environmental conditions, for seven years or more. Similarly, the expected on/off cycles for electronic components used to be around 10,000, and last week, the engineers were talking in terms of 100,000 or more on/off cycles before failure. So I popped the big question: "Should I turn my computer off at night?" No one at the conference seemed to have the answer -- nobody knows. I found that very interesting.

I'm discussing two things today. One is a term originated by Bill O'Farrell: the Digital Oasis -- thinking of an archive as an oasis. And a new idea I introduced: the active archive. I'll be discussing these two ideas plus other ways of archiving that involve methods such as migration of data and so forth.

For some time, archivists have recognized two basic problems of archiving electronic moving image material (video): equipment obsolescence and degradation of media. We are just beginning to take into account a third critical factor that can be generically identified as software. There will always be issues pertaining to the specific problems of continuously upgrading software.

There is a widespread feeling that these software problems will somehow take care of themselves, and that the absorption of video into the digital world will be a backwards-compatible process. I strongly believe that the task of continuously updating software constitutes a third major problem for the archive.

Digital technology has been changing much faster than the previous analog technology, as you all know. The audio reel-to-reel tape formats were around for 50 years; two-inch quad video lasted 25 years. With the advent of video, we are forced to think of product cycles as short as three years; with data engineers, the perceived life cycle appears to be about five years. In the data world, people tend to follow the model that has been in place for about four decades - that is, refresh the data tapes every five years.

If you talk to a television station engineer or administrator, they'll tend to say five years is long enough. Apart from certain events such as an assassination or a fire, where there might be legal or economic issues documented on tape, why would you need to keep something longer than five years? So normally, five years seems to be the consensus time period for retention in that community.

Clearly there's a divergence of perspective now between those people and the archival community with its goal of indefinite - i.e. permanent - retention. That may not seem like a realistic objective given the technology environment today, but that's the true goal of archiving, and what we have to aim for.

As I review these issues, I'm not talking as much about media products that are born digital; I'm really more concerned with the problems of the analog video material that constitutes our legacy archives today. We need to convert this vast body of material to digital. Analog equipment isn't being made any longer, with a few exceptions such as audio reel-to-reel recorders and VHS, since that's a very common format. VHS will persist for a while; even though DVD is pushing it out, there are still some advantages to VHS. But other than a very few exceptions (which are not archival in any case), there will be no analog choices, and you'll have to commit to digital or your library will deteriorate beyond recovery.

It's important to start migrating right away, incrementally, and this begins with planning. Start budgeting for digital equipment, networking, storage media and so forth, as well as for staff and overhead.

Now other than suicide or retirement, how can you cope with the ever-changing digital technology? Well, I'm proposing some paradigm shifts, and shifting paradigms always means stepping on toes. When this talk is available, you'll be able to read my paper. There will be resistance to what I'm proposing, and I sure would appreciate comments and criticism by e-mail at JimWheeler@aol.com.

Okay. Before I get into paradigm shifting, let's go back to Bill O'Farrell's idea for a moment. Think of your archive as an oasis. You're really isolated from the rest of the world. I doubt that any archive here is networking their material with external institutions or facilities via wire or fiber, let alone wireless. You're keeping everything in-house. You've got your storage vault, playback equipment, computer, your work stations, your tapes or drives -- that's your archive. Just as the desert caravans take their products to or from an oasis, your digital archival material can be sent out, and you can receive digital material from others, by way of electronic wire or fiber optic cable. You can do this with NTSC, PAL, SECAM - and these are well-defined standards and legacy systems that have been around since the middle of the last century -- or whatever other systems you may find useful. High definition is coming up, so in the near future you may have to deal with that format.

Now let's get down to paradigms. The first paradigm shift is to refrain from upgrading the software. And refrain from upgrading your hardware. This is going to be difficult to do. Think of your archive as an oasis, and that's your equipment and software and contents. Don't change it - never upgrade. So, you need not so much an IT person, or information technologist, but what Linda Tadic has called an AT person, an archival technologist. IT specialists are thinking very strictly within the confines of the five-year cycle, and they're not familiar with the archival imperative. Obviously, there are certain things that will be necessary under either an IT or an AT regime, but the AT has to be more creative, to be able to think outside the normal IT box so to speak. If you want to make sure your digital material is going to be playable in several years -- in 10, 20, 30, 50 years -- you want someone who can choose good equipment and good software, someone who thinks about a very different set of information technology needs, a very unusual person. And you need to back up your equipment. I'm recommending that you buy three of every critical piece of equipment used in the course of archiving your assets. Whatever work station and software you choose, acquire three identical copies: use one and maintain the other two, archive them, as it were, for the eventualities of the future, so you've got a double back-up.

My experience is mainly in the old technologies. I've been maintaining tape recorders and radios and other equipment that are as old as 77 years, even though none of the components and replacement parts are made any longer. I'm talking about vacuum tubes and many kinds of analog electronic components. Whenever anything breaks down, I have to salvage a part from other equipment or improvise a replacement solution. But I can generally keep this equipment working. This is much harder with equipment based on circuit boards. You have to maintain your equipment meticulously. For example, why do most circuit boards fail? Because of high temperature. So make sure you've got a schedule for cleaning the filters on your computer, and also put a muffin fan on the back. Regardless of what's in the computer, and regardless of what your room environment is, you need to guarantee air circulation over the printed circuit board in the computer. These are some of the kinds of modifications and recommended practices that your AT person should take a pro-active interest in. These things are not difficult -- you can get a muffin fan at almost any electronics store.

If you can do this and similar things - if you establish a regime of rigorous equipment maintenance and operation that protects your hardware, you will be able to continue to use the same hardware, and thus the same software, and you will be able to fulfill the mandates of my proposed archival paradigm.

I know this flies in the face of conventional wisdom, and you may not be able to convince your management to establish this high-standard maintenance regime. In that case, anytime you upgrade your computers to new software, make sure to test a sample of your archived contents - on hard disk drive or data tapes, or whatever medium has been selected by your archive - to ensure that they can be played back correctly and without problems. It's important to test all your media forms when you upgrade or modify your hardware or software system. In your archive, you may have different forms of video (from analog formats such as Betacam to digital formats such as D1 to DigitalBeta and beyond), or data files (DCT, DLT, LTO, AIT, etc.), or possibly even hard drives as your storage medium. Each time you upgrade a critical component in your system, you need to test each of these media forms to see if they can be successfully migrated to your new system. You can't assume that because one media type can be successfully migrated, all of the others will be equally compatible.
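As an illustration of the sampling idea above, here is a minimal sketch in Python. It is not from the talk: it assumes the archive keeps a manifest of SHA-256 checksums recorded before the upgrade, and the names `sha256_of`, `verify_sample` and `verify_each_medium` are hypothetical. It only confirms that sampled files are still readable and bit-for-bit unchanged on the upgraded system; it does not confirm that the video decodes and plays correctly, which still needs a playback test.

```python
import hashlib
import random


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sample(manifest, sample_size, seed=None):
    """Check a random sample of files against previously recorded digests.

    manifest: dict mapping file path -> expected SHA-256 hex digest
              (an assumed format, recorded before the upgrade).
    Returns a list of (path, problem) tuples; an empty list means every
    sampled file was readable and unchanged.
    """
    rng = random.Random(seed)
    paths = sorted(manifest)
    chosen = rng.sample(paths, min(sample_size, len(paths)))
    problems = []
    for path in chosen:
        try:
            actual = sha256_of(path)
        except OSError as error:
            problems.append((path, f"unreadable: {error}"))
            continue
        if actual != manifest[path]:
            problems.append((path, "checksum mismatch"))
    return problems


def verify_each_medium(manifests_by_medium, sample_size, seed=None):
    """Run the sample check separately for each media form, because one
    medium passing says nothing about the others."""
    return {
        medium: verify_sample(manifest, sample_size, seed)
        for medium, manifest in manifests_by_medium.items()
    }
```

Keeping one manifest per media form (for example one for hard disks and one for each data tape format) means a clean result on one medium is never taken as evidence about another.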

I have been something of a proponent of hard drive as a storage medium for archives, so I want to talk a little here about hard drives. There are several manufacturers to choose from. I can't say at this time whether any of the major manufacturers has better quality control than the others.

After talking to many archivists at AMIA, it seems clear that very little of an archive's holdings is accessed regularly. Probably less than 10% of a typical collection is active, and the rest is very seldom used. Normally, I would recommend a robot system for near-line retrieval, but in a scenario where 90% of your content is rarely accessed, a robotic retrieval system does not make sense economically. Perhaps the frequently accessed part of the collection could be accessed robotically. And in practice, robotic systems are often combined with a program of periodic refreshing - that is, transfer of data to a fresh tape. Another solution - one that both Ian Gilmour and Jim Lindner have advocated - is to provide access via a RAID system to the most frequently accessed resources. And for the other resources, just store the hard disk on a shelf, and catalog it.

Hard drive manufacturers are not used to hard drives not being used, and they're a little concerned about what happens to the disk, to the bearings. In the past, disks used ball bearings, but all of the major manufacturers are now making hard drives with fluid bearings, which seems to have solved the problems of the most characteristic failure mode for these devices. Mick Newnham said he had a computer that had been stored for eight years, and he set it up and turned it on recently, and it started immediately and functioned properly. So that hard drive worked after eight years of not being used.

But it's very hard to get specific information from the hard drive manufacturers. They seem to suggest that the disks be started every three or twelve months - they have suggested various periods. If you're going to do that, then you need a power connector on the back of the storage shelf. So, you wire the shelf power to power-up the drive every three months, and you can take this a step further and also wire the data connector so when you power up to test the disk, you can actually run tests -- bit error rate tests, for example - and check the quality of the disk content. I've always recommended quality checks for whatever media you're using for archival storage, but in a real-world context, this just hasn't been very practical given the volume of material in a library. But if you've got hard disks in storage that are already wired, then periodic testing of the entire collection becomes much more practical.
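The periodic power-up routine described above could be tracked with something as simple as the following Python sketch. It is an illustration under assumptions, not a procedure from the talk: the `ShelfDrive` record, the `drives_due` function and the 90-day default are hypothetical, and the interval should be whatever period the drive manufacturer actually recommends. The sketch only decides which shelved drives are due; the quality test itself (a bit error rate test, for example) would be run with whatever tool the archive uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ShelfDrive:
    drive_id: str          # hypothetical catalog identifier for the drive
    last_powered_up: date  # date of the most recent spin-up and test


def drives_due(drives, today, interval_days=90):
    """Return the drives whose last power-up was at least interval_days
    before today, longest-idle first."""
    cutoff = today - timedelta(days=interval_days)
    due = [drive for drive in drives if drive.last_powered_up <= cutoff]
    return sorted(due, key=lambda drive: drive.last_powered_up)
```

After each drive is powered up and tested, its `last_powered_up` date would be updated in the catalog so that the next run picks up only the drives that have been idle longest.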

And hard disks don't take up much space - they're only four inches tall and not very deep - so you can store them at high density efficiently. Since most of them are not running, you don't have a heat dissipation problem or a power problem. As I mentioned before, one of the big concerns with hard drives in the past has been head crashing on the disk -- that seems to be the only generic problem with these devices, and this problem seems to have been solved by the adoption of fluid bearing technology.

So, for your Digital Oasis, there are really two important regimes to think about. The first is to archive your technology, to maintain historical equipment that will play back your original archival materials as well as possible. The second is to develop a form of active archive that gives you access, allows for a regular program of data checking and refreshment. And I still suggest that you not upgrade the software until the ideal archival medium is invented, or if you do upgrade, verify the upgrade for all your media.

Thank you.

_______________________________

POWERPOINT PRESENTATIONS

Ian Gilmour
ScreenSound Australia

Datatape Storage and Technology

Jim Wheeler
Tape Restoration and Archival Services

_______________________________

SPEAKER BIOS

Ian Gilmour

Ian is currently the Senior Manager of Preservation and Technical Services at the National Screen and Sound Archive in Australia. He is also a member of SMPTE and Chair of the AMIA Preservation Committee. Ian is in charge of the digitizing of the audiovisual content at ScreenSound, and works with the IT section to design and implement the Digital Mass Storage and delivery system. He has worked as an engineer and conservator at the National Library in Australia and at the Department of Defense, and is a consultant for many AV organizations around the world.

Jim Lindner

Jim Lindner, an internationally respected authority on the preservation and migration of magnetic media, is the Managing Member of Media Matters. Jim pioneered many of the techniques now commonly used for videotape restoration and has lectured widely and written about media preservation for the past twenty years. After founding the videotape restoration company VidiPax, he served as its president and executive director, stepping down after selling the company in 2001. He is a founding director of the National Television & Video Preservation Foundation and acted as a witness and panelist for the Library of Congress' "The State of American Television." Jim was twice a member of the board of directors of the Association of Moving Image Archivists and FIAT. Currently, Jim sits on the Executive Board of SEAPAVAA and is the Chief Video Consultant for the National Audio-Visual Conservation Center at the Library of Congress. An active participant in SMPTE and ANSI standards committees, Jim has also served as Chairman of the Board of Anthology Film Archives.

Jim Wheeler

TAPE ENGINEER AND EXPERT WITNESS

  • Internationally recognized authority on tape preservation and tape restoration.
  • Engineer, tape media and tape recorder design and development, Ampex Corporation (Ampex perfected professional audio and video tape recording).
  • Presented papers at Conferences in Italy, Germany, Spain, France, Sweden, Brazil, Canada, Chile, and U.S.
  • Presented seminars at UCLA, U.C. San Diego, Canadian Maritime Provinces, and Southeast Asia.
  • Annually, teach two days at the Selznick Film School.
  • Chaired two international conferences on tape problems and solutions.
  • Member of special Library of Congress Video Heritage Task Force.
  • Expert for the U.S. Department of Justice Nixon White House (Watergate) tape case.
  • Member, Ford Foundation Advisory Committee for preserving historical material in tropical countries.
  • Advisor to FBI, Library of Congress, U.S. National Archives, National Transportation Safety Board, and other organizations.
  • Consultant for NASA to determine problem with Jupiter Galileo probe tape recorder.
  • Invented the original instant replay. Co-recipient of an Emmy for subsequent instant replay development.

PROFESSIONAL AFFILIATIONS:

  • Speaker, newsletter contributor and Past-Chairman of the Preservation Committee, AMIA
  • Speaker and Journal contributor, Society of Motion Picture & Television Engineers (SMPTE)
  • Tape Standards Committee, Speaker and Journal contributor, Audio Engineering Society (AES)
  • Preservation Committee & Speaker, International Federation of Television Archives (FIAT)
  • Tape Standards Commission, American National Standards Institute (ANSI)
  • Tape Standards Commission, International Standards Organization (ISO)