Episode 8 of More Podcast, Less Process Unleashed
31 March 2014
Episode #8 of “More Podcast, Less Process”, the archives podcast co-produced by METRO and AVPreserve, is now available for streaming and download. This week’s episode is “The Video Word Made Flesh” with guests Nicole Martin (Multimedia Archivist and Systems Manager, Human Rights Watch), Erik Piil (Digital Archivist, Anthology Film Archives), and Peter Oleksik (Assistant Media Conservator, Museum of Modern Art) discussing the different approaches and challenges of managing video collections for access and preservation. Whether dealing with video in an archival or production environment, there are a number of decision points around digitization, storage, description, and playback, the options for which are highly dependent on the mission and capabilities of the organization. Josh & Jefferson and their guests talk about these issues and all things video. It’s a visual treat for your ears.
New Digital Preservation Services From AVPreserve
24 March 2014
AVPreserve At Code4Lib
24 March 2014
AVPreserve Senior Consultant Kara Van Malssen will be attending the Code4Lib conference this week in Raleigh, North Carolina. We have long followed the work that comes out of the Code4Lib community and strive to follow a similar ethos of innovation, openness, and collaboration in achieving technological solutions for the challenges of digital preservation and access. We’re excited to be attending the conference for the first time and look forward to learning and sharing. If you’re in Raleigh, stop and chat with Kara about the projects we’re working on — just don’t mention Date and Time formatting…
Catalyst Case Study #1 – Sudden Loss Of Institutional Knowledge
20 March 2014
What happens to a collection when its sole caretaker suddenly goes away? This case study examines such a situation and how AVP’s Catalyst inventory solution was used to document an audio collection in support of preservation planning. Download the first in a series of case studies about practical, outcomes-based approaches to audiovisual collection appraisal and processing.
New Catalyst Case Studies in Audiovisual Preservation
20 March 2014
What happens to a collection when its sole caretaker is suddenly out of the picture and has left no documentation? This is an all too common occurrence with archival collections, and the problem is compounded with audiovisual collections where content may not be accessible and identifiable, production practices may create multiple versions or derivatives, and preservation or reformatting is often a necessary first step before anything else.
AVPreserve at New England Archivists Spring Meeting
19 March 2014
AVPreserve is proud to be participating in the New England Archivists Annual Meeting taking place March 20th-22nd in Portsmouth, New Hampshire. AVPreserve President Chris Lacinak will be presenting on two separate panels. First, Chris will join Yvonne Ng (Archivist, WITNESS) and Jane Mandelbaum (IT Project Manager at Library of Congress) on the Free Open Source Tools panel, where he will speak about Fixity, MDQC, Interstitial, and AVCC, four free digital preservation applications that AVPreserve has released in the past year. On Saturday Chris will be speaking with Elizabeth Walters (Program Officer for Audiovisual Materials, Weissman Preservation Center, Harvard Library) on The End of Analog Audiovisual Media: The cost of inaction & what you can do about it. He will discuss our new Cost of Inaction Calculator (a tool that helps archives quantify and effectively articulate what is lost in the way of access, intellect and finances by not acting to reformat collections) and our Catalyst inventory tool, which was used to help the New Jersey Network efficiently create an item-level inventory of 100,000 audiovisual items so they could begin to prioritize and plan for preservation.
MDQC Case Studies
14 March 2014
By Alex Duryee
You can also download a PDF version of these case studies here.
INTRODUCTION
The sheer quantity of digital assets created in migration projects and large-scale ingests often overwhelms library staff. Outside of digitization on demand, objects are typically digitized at scale in order to capitalize on efficiencies of volume. In such cases it is not uncommon for small archival teams to handle many thousands of digital assets, each of which must go through an ingest workflow. The most important part of the ingest workflow, quality control of incoming preservation masters, is often the most time-consuming step for digital archivists. These assets are typically reviewed manually at the item level. As a result, a bottleneck forms as the rate at which quality control is performed falls behind the rate at which newly digitized assets are created or digital collections are acquired.
Item-level quality verification also tends to be an ineffective use of staff time: although important, it is tedious work that does not draw on the skills of trained staff. Digitization projects and departments can sink unanticipated amounts of valuable time and resources into item-level quality control, thus detracting from other services (both real and potential). All told, asset quality control is a step in archival workflows that is ripe for improvement.
Tool development
AVPreserve developed MDQC (Metadata Quality Control) to address these bottlenecks and expedite digitization workflows. MDQC is an open source tool based on Exiftool and MediaInfo that allows users to set baseline rules for specifications related to digital media asset quality (such as resolution, framerate, and colorspace) and embedded metadata (such as date formatting, completed fields, and standard values). Once a set of rules is created, it can be applied across an entire collection at once, reporting any assets that fail to meet the quality standard (e.g. wrong colorspace, below minimum resolution, gaps in descriptive metadata, wrong sample rate). From these reports, which are generated using almost no staff time, an archivist can separate problematic assets from those that do meet the required specifications. As such, MDQC tremendously expedites the quality control of digital media collections, replacing a manual item-level task with an automated collection-level one.
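To make the rule-based approach concrete, here is a minimal Python sketch of the general idea: a set of baseline rules is defined once and then applied to every asset's metadata, and only the failures are reported. This is not MDQC's code or its rule format. The field names, thresholds, file names, and the `Rule`, `failures`, and `report` helpers are all hypothetical, and the metadata is supplied as plain dictionaries rather than read from files with Exiftool or MediaInfo.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One baseline rule: a metadata field and a test its value must pass."""
    field: str
    description: str
    check: Callable[[Any], bool]


def failures(metadata: dict[str, Any], rules: list[Rule]) -> list[str]:
    """Return the description of every rule this asset's metadata fails.

    A missing field counts as a failure, which also catches gaps in metadata.
    """
    failed = []
    for rule in rules:
        value = metadata.get(rule.field)
        if value is None or not rule.check(value):
            failed.append(rule.description)
    return failed


def report(assets: dict[str, dict[str, Any]], rules: list[Rule]) -> dict[str, list[str]]:
    """Map each failing asset to its failed rules; passing assets are omitted."""
    result = {}
    for name, metadata in assets.items():
        failed = failures(metadata, rules)
        if failed:
            result[name] = failed
    return result


# Hypothetical imaging standard (values chosen for illustration only).
RULES = [
    Rule("XResolution", "resolution at least 600", lambda v: v >= 600),
    Rule("ColorSpace", "colorspace is Adobe RGB", lambda v: v == "Adobe RGB"),
    Rule("BitsPerSample", "16 bits per sample", lambda v: v == 16),
]

# Hypothetical metadata, standing in for values extracted from real files.
ASSETS = {
    "neg_0001.tif": {"XResolution": 600, "ColorSpace": "Adobe RGB", "BitsPerSample": 16},
    "neg_0002.tif": {"XResolution": 300, "ColorSpace": "sRGB", "BitsPerSample": 16},
    "neg_0003.tif": {"XResolution": 600, "ColorSpace": "Adobe RGB"},
}

if __name__ == "__main__":
    for name, failed in report(ASSETS, RULES).items():
        print(f"{name} FAILED: {'; '.join(failed)}")
```

In this sketch the first file passes silently, the second is flagged for resolution and colorspace, and the third is flagged because a required field is absent. The archivist's attention goes only to the flagged files.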
CASE STUDIES
Overview
During the development of MDQC, AVPreserve worked with two organizations to test and implement the tool in a production setting. The Digital Lab at the American Museum of Natural History applied MDQC in a large-scale image digitization project, and successfully used it to greatly expedite their processing workflow. Similarly, the Carnegie Hall Archives used MDQC to rapidly verify whether vendor-generated assets were meeting the preservation quality specified in the statement of work.
The following brief case studies outline how the two organizations implemented MDQC and its effect on their digital asset workflows.
Unsupervised Image Digitization: American Museum of Natural History
Background and practices
The Digital Lab at the American Museum of Natural History (AMNH) is working on an ambitious project digitizing historical photonegatives, with the goal of scanning each one – over one million in total – and making them accessible in a public digital asset management system for research use. Currently, the AMNH is digitizing these photonegatives using a volunteer force, which generates roughly 200-300 images per week, in tandem with a small team of archivists performing quality control and image processing. Due to the nature of volunteer labor, changing standards over time, and turnover, quality control is tremendously important to the Digital Lab’s project. Traditionally, this was performed on a per-image basis, where scans were loaded into Photoshop and visual/technical assessments were performed. This process was slow and repetitive, and was a bottleneck in the imaging workflow.
Selection and implementation
AVPreserve selected the Digital Lab as a pilot partner for MDQC, as its scenario was ideal for testing and implementing the tool. The Digital Lab was able to set its imaging quality standard for resolution, color space, file format, compression, and bits per sample. While this does not capture every aspect of image quality control (a brief visual inspection is still needed for alignment, cropping, and similar issues), it allows for rapid automated testing against basic technical standards. This tremendously expedites the image review step in the digitization workflow, as images can now be assessed hundreds at a time for technical quality.
One area in which MDQC had unexpected success was in legacy asset management. The Digital Lab, when first embarking on its project, did not have established standards or workflows for its volunteer scanning efforts. As such, there were an overwhelming number of images – approximately sixty thousand – that were created without a standard specification in mind. These images may or may not meet the current standard, and may or may not need to be reprocessed. Manually performing quality control on these legacy images would be overly arduous and a poor use of staff time, and would create a backlog among the new images that arrive for quality control every day. By automating technical quality control, MDQC has allowed the Digital Lab to bring these legacy assets under control. The archivist can encode the current imaging standard in a rule template and apply it across thousands of images at once, automatically separating images that meet the specification from those that do not. As of writing, MDQC has helped the Digital Lab to bring three thousand legacy assets forward into their workflow, saving the Lab weeks of labor.
Benefits to the organization
Excitingly, MDQC has allowed the AMNH to expand its digitization services and production processes into new realms. Due to the sheer number of images to be processed, the Digital Lab is always looking for new scanning sources. The Lab has recently implemented project-based digitization, where researchers scan sub-collections of images both for personal use and to contribute to the general collection. Previously, this was a difficult service to implement in production workflows, as it required additional processing and review for a collection of images outside of the standard workflow and expected weekly volume.
By employing MDQC, the Digital Lab is able to very quickly assess researchers’ images for baseline quality and bring them into their production workflow. MDQC has also allowed the archivists in the Digital Lab to develop a training program on image processing for interns, as there are now plenty of verified images to work with and prepare for final deposit, as well as no pressing backlog for image review by the staff.
Vendor Performed Mass Digitization: Carnegie Hall
Background and practices
In 2012, the Carnegie Hall Archives launched the Digital Archives Project, a comprehensive digitization program, to digitize and store a majority of their media holdings. Due to the scope and speed of the project, the Archives used a vendor service to digitize manuscripts, audio, and video recordings, which were returned in bulk on hard disks. As the vendor-supplied materials will be the digital masters for these assets, the archivists at Carnegie Hall implemented a quality control workflow for returned assets.
Prior to implementing MDQC, the workflow involved a technician opening each file in Adobe Bridge and comparing technical metadata against a set standard in an Excel spreadsheet. This step is important in guaranteeing that the vendor has met the minimum standard for quality, but it is also tremendously time-consuming. The archivist estimates that a technician processed 70-100 images per hour, with a total of 35,816 images digitized. This would have required roughly 400 hours of labor to perform quality control for the images alone, not to mention the 1,235 audio and 1,376 video assets also in the pipeline.
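The labor estimate can be checked with simple arithmetic. The short Python sketch below divides the image count by the slow and fast ends of the quoted review rate; the function and variable names are ours, for illustration only, and the calculation assumes the rate holds steady across the whole batch.

```python
def review_hours(item_count: int, items_per_hour: float) -> float:
    """Hours needed to review item_count items at a steady hourly rate."""
    return item_count / items_per_hour


IMAGES = 35_816      # images digitized, from the case study
SLOW_RATE = 70       # images per hour, low end of the archivist's estimate
FAST_RATE = 100      # images per hour, high end of the archivist's estimate

if __name__ == "__main__":
    best = review_hours(IMAGES, FAST_RATE)
    worst = review_hours(IMAGES, SLOW_RATE)
    print(f"Manual review: about {best:.0f} to {worst:.0f} hours")
```

At these rates the manual review works out to between about 358 and 512 hours, which is consistent with the figure of roughly 400 hours.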
Selection and implementation
The Digital Archives Project (DAP) was developing backlogs of material to review, making MDQC a natural fit in their production and ingest workflow. The manual step of verifying technical quality could be automated via MDQC by establishing baseline rules (as outlined in the service contract with the vendor) and testing returned assets against those rules. This fit neatly into the Archives’ workflow, as returned assets could be scanned in-place on the hard drives before further action was taken.
Benefits to the organization
As a result of MDQC, the Carnegie Hall Archives tremendously expedited their digitization workflow. Returned batches of digitized assets were assessed for technical quality (resolution, compression, format, colorspace) within minutes instead of weeks or months. While there is still a need for human analysis of assets (for issues such as digital artifacts and playback problems), that review can be performed more efficiently once the analysis of technical quality is automated. As such, the Archives were able to accelerate their workflow and make remarkable headway on this aspect of DAP in a very short time.
CONCLUSIONS
MDQC has allowed our pilot organizations to greatly accelerate their digitization workflows. By automating technical quality control, it has allowed these organizations to focus their time and labor on more fruitful tasks. Technicians are able to focus on processing and ingest instead of technical standards, and interns can be trained on more valuable tasks than the rote checking of technical metadata. Additionally, by expediting the previously slow process of quality control, more assets can go through production than ever before. As such, we are excited about the possibilities of MDQC in increasing digitization throughput and archival productivity.
The most surprising and exciting development from our pilot program was how dramatically MDQC could affect an organization: by automating a tedious and time-intensive task, it opened the door to new services as well as expediting existing ones. The AMNH was able to use MDQC to offer new research services by applying it to patron-generated assets, thus creating a new source of materials for their digital archive. This came about due to how quickly MDQC allows for the quality control of assets – verifying a small batch requires minimal additional work by the archivist, and can thus easily be done as part of a daily workflow. We hope that other organizations find similar success with MDQC and are excited to see what springs from it.
MDQC is a free application developed by AVPreserve. Download and further information can be found at https://www.avpreserve.com/avpsresources/tools/, along with many other free digital preservation resources.
ACKNOWLEDGEMENTS
AVPreserve would like to extend special thanks to the following staff and stakeholders for their contributions and generous feedback that made these case studies possible:
Testing Support:
Miwa Yokoyama, Carnegie Hall
Anna Rybakov, AMNH
Jen Cwiok, AMNH
Development Support:
Phil Harvey (Exiftool)
Jerome Martinez (MediaInfo)
This Is Your Thesis; This Is Your Thesis On CD-R
14 March 2014
I was recently contacted by one of my alma maters about my master’s thesis. The school required deposit (in duplicate) with the Library as part of the application for graduation, but luckily by this time they were accepting (perhaps enforcing: who wants to manage all those bound theses?) submission of an electronic copy on CD-R. This was the second time in 10 years that someone had written to me about my thesis. Pretty awesome for a Humanities master’s thesis, so I was starting to feel awfully big-headed.
Don’t Blame Poor Records Management On Overpreservation
7 March 2014
There’s been a lot of good discussion lately about the meaning of the word archive(s) and its use by those outside of the profession. Much of this discourse is focused on the relation of the issue to the profession, questioning what impact the broad (and what some might think of as improper) application of the term(s) has on the public’s view of the role of archives and archivists.