Reading In: Analyzing Embedded Metadata in Digital Images
July 6, 2015
When the news broke about the website of Dylann Roof, the accused South Carolina church shooter, and the photographs posted there, I visited the site and downloaded the package of digital images out of archival curiosity: I wanted to look inside the images. I was interested in the metadata, in breaking open the files, so to speak, and evaluating the bytes themselves. As archivists, that is exactly what we do with any collection of digital objects. We look under the hood. We look for patterns and information that help us understand the context in which the files were created, and for any evidence that might tell us about the provenance, originality, and history of the documentary resources we have acquired.
As an archivist working in a contemporary information landscape, I find that digital files are ultimately what interest me most. They are the containers of the information we often seek to preserve and make accessible. At a high level of abstraction, a digital file is a stored block of information available to a computer program. Operating systems treat files as sequences of bytes, while application software interprets that binary data as, say, text characters, image pixels, or audio samples.
If you work in the library or archive world, you have probably long since tired of hearing the word metadata. Metadata undefined is a useless word. In this article, I focus on metadata embedded in a file's header (most file types allocate some bytes for metadata, which lets a file carry basic information about itself separate from the binary payload). This is the information within the file that helps a piece of software understand what the file is and how to decode its bits so that a human can perceive them as intended. It can also include embedded chunks of data that provide additional descriptive or ancillary information about the file's contents. Because this information lives inside the file, it is very unlikely to change when the file is copied to a new storage environment.
A variety of tools make it easier to extract packets of embedded metadata from particular file types. A hex editor lets you look at the actual bytes, but without a clear understanding of a file format's specification at the level of bytes and offsets, it is extremely difficult to make sense of the embedded information. A few developers have taken the initiative to build automated tools that translate the embedded bytes into human-readable information. Examples include ExifTool (optimized for still image files), MediaInfo (optimized for audio and video files), and ffprobe (optimized for video files). For this evaluation, I used ExifTool because its strength is extracting metadata from digital images.
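To make the point about bytes and offsets concrete, here is a minimal Python sketch of what reading a JPEG at that level involves. It assumes a simple file layout: a start-of-image marker (FF D8), then marker segments of the form FF plus a type byte plus a two-byte big-endian length, with Exif metadata stored in an APP1 (FF E1) segment whose payload begins with "Exif" and two NUL bytes. The function name find_exif_segment is hypothetical, and the sketch ignores edge cases (such as fill bytes between markers) that a mature tool like ExifTool handles.

```python
import struct


def find_exif_segment(data: bytes):
    """Return (offset, payload) for the first APP1 Exif segment, or None.

    Simplified walk over JPEG marker segments: it assumes every segment
    before the image data carries a 2-byte big-endian length that counts
    the length field itself but not the two marker bytes.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker FF D8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, so stop
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return pos, payload
        pos += 2 + length  # skip the marker bytes plus the whole segment
    return None
```

Usage would look like `with open("100_1611.JPG", "rb") as f: find_exif_segment(f.read())`. Everything after the returned payload's first six bytes is a TIFF-structured block of tags, which is exactly the part a tool like ExifTool decodes into readable names and values.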
What follows is an embedded metadata evaluation of the digital images posted to Dylann Roof's website (http://lastrhodesian.com/), downloaded on 2015-06-22. The images are available for download from the Internet Archive here.
Overview of the Files
Zipped file downloaded from website: 103600296_19.zip.
Decompressed on a local MacBook Air into a folder named 103600296_19.
Folder contains 60 digital images.
Contents:
Evaluation of the Files
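I inspected the files with ExifTool using the command exiftool -csv -a -r ./ > out.csv, which exports the tags in CSV format (-csv), allows duplicate tags to be extracted (-a), and recursively processes subdirectories (-r). The following observations come from that output.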
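For readers who want to work with out.csv programmatically, the following Python sketch loads it with the standard csv module. It assumes ExifTool's CSV layout of one row per file, with a SourceFile column followed by tag names (such as Make, Model, or CreateDate) as the remaining headers; the helper name load_exiftool_csv is hypothetical. Empty cells are dropped so that later checks can simply ask whether a tag is present.

```python
import csv
import io


def load_exiftool_csv(text: str) -> dict:
    """Map each SourceFile path to a dict of its non-empty tag values."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        path = row.pop("SourceFile")
        # A file that lacks a tag gets an empty cell in that column; drop it.
        rows[path] = {tag: value for tag, value in row.items() if value}
    return rows
```

A typical call would be `with open("out.csv", newline="", encoding="utf-8") as f: rows = load_exiftool_csv(f.read())`, giving a dictionary keyed by file path that the comparisons below can be run against.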
These 60 images were likely taken with two different cameras. Two cameras are actually visible in 100_1611.JPG, and the extracted metadata corroborates this observation: files within each set share similar metadata fields that differ from those of the other set, and the two sets follow different filename logic.
The evaluation suggests that 34 images were extracted directly from what we will call Camera 1, and 26 images came from what we will call Set 2. I am confident these 26 images were not created with Camera 1, but I cannot say for sure that all of them came from the same camera.
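One way to make this two-set split reproducible is to group files by their naming convention together with which key tags are present. The Python sketch below assumes the rows are a mapping from file path to a dict of non-empty tag values. The choice of key tags and the two filename patterns are illustrative assumptions drawn from the filenames in this collection, not a fixed rule, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative assumptions: which tags to check, and the two naming patterns
# seen here (camera-style 100_1611.JPG vs. numeric 103600296_4.jpg).
KEY_TAGS = ("Make", "Model", "CreateDate")
NAME_STYLES = {
    "camera-style": re.compile(r"^\d{3}_\d{4}\.JPG$"),
    "numeric-id": re.compile(r"^\d+_\d+\.jpg$"),
}


def name_style(filename: str) -> str:
    """Label a filename by the first naming pattern it matches."""
    for label, pattern in NAME_STYLES.items():
        if pattern.match(filename):
            return label
    return "other"


def group_files(rows: dict) -> dict:
    """Key each file by (naming style, which KEY_TAGS are present)."""
    groups = defaultdict(list)
    for path, tags in rows.items():
        filename = path.rsplit("/", 1)[-1]
        present = tuple(tag in tags for tag in KEY_TAGS)
        groups[(name_style(filename), present)].append(filename)
    return {key: sorted(names) for key, names in groups.items()}
```

If the split described above holds, two dominant groups should emerge; any small leftover group points to files that deserve a closer look.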
Camera 1 Files
Images from Camera 1 have the following create date/modified data:
From this set, one photo was taken in 2014, at 2014-08-03 16:56:55 to be precise [100_1443.JPG]. The remaining photos were taken between 2015-03-18 and 2015-05-11, a span of 54 days across parts of three months.
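The date range above can be checked with a few lines of Python. This sketch assumes CreateDate values in the Exif layout YYYY:MM:DD HH:MM:SS, which is how ExifTool prints them by default; capture_span is a hypothetical helper. For the 2015 photos, the endpoints 2015-03-18 and 2015-05-11 come out 54 days apart.

```python
from datetime import datetime

EXIF_DATE = "%Y:%m:%d %H:%M:%S"  # e.g. "2014:08:03 16:56:55"


def capture_span(create_dates):
    """Return (first, last, whole calendar days between them)."""
    stamps = sorted(datetime.strptime(s, EXIF_DATE) for s in create_dates)
    first, last = stamps[0], stamps[-1]
    return first, last, (last.date() - first.date()).days
```

Passing only the 2015 CreateDate values keeps the single 2014 photo from stretching the span.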
Four of these images from Camera 1 were modified using Microsoft Windows Photo Viewer 6.3.9600.16384 on 2015-06-17 at the following times:
14:45:38-05:00 (CDT) [100_1611.JPG]
14:48:10-05:00 (CDT) [100_1688.JPG]
14:49:30-05:00 (CDT) [100_1706.JPG]
14:51:46-05:00 (CDT) [100_1808.JPG]
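A filter like the following Python sketch can pull out files edited with a given program on a given day. It assumes rows map file paths to tag dicts and that the editing program is recorded in the Software tag. Which timestamp tag recorded each edit is not stated above, so the tag is a parameter, defaulting (as an assumption) to ModifyDate; edited_with is a hypothetical helper name.

```python
def edited_with(rows, software_prefix, day, date_tag="ModifyDate"):
    """List files whose Software tag starts with software_prefix and whose
    date_tag value starts with day (an Exif-style "YYYY:MM:DD" string)."""
    return sorted(
        path
        for path, tags in rows.items()
        if tags.get("Software", "").startswith(software_prefix)
        and tags.get(date_tag, "").startswith(day)
    )
```

For this collection, a call such as `edited_with(rows, "Microsoft Windows Photo Viewer", "2015:06:17")` is the kind of query that would surface the Camera 1 files listed above.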
Set 2 Files
Images from Set 2 have no date of creation in the files themselves, but file modification dates exist:
Twenty-two of the twenty-six images were last modified on 2015-03-16 between 21:30:10-05:00 and 21:34:18-05:00, likely when they were extracted from a camera and converted to JPEG with computer software: about four minutes to extract and convert 22 images.
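The length of that window can be computed directly. This Python sketch assumes timestamps formatted like 2015:03:16 21:30:10-05:00, with the UTC offset attached as in the times quoted above; batch_window is a hypothetical name. For the two endpoints above it gives 4 minutes 8 seconds. Parsing the offset into timezone-aware datetimes keeps the subtraction correct even if a batch were to straddle a daylight-saving change.

```python
from datetime import datetime

FILE_DATE = "%Y:%m:%d %H:%M:%S%z"  # e.g. "2015:03:16 21:30:10-05:00"


def batch_window(modify_dates):
    """Return the timedelta between the earliest and latest timestamps."""
    # %z accepts an offset written with a colon ("-05:00") on Python 3.7+.
    stamps = sorted(datetime.strptime(s, FILE_DATE) for s in modify_dates)
    return stamps[-1] - stamps[0]
```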
Two images, 103753459_18.jpg and 103753459_21.jpg, were last modified on 2015-03-24 at 23:01:50-05:00 and 23:02:10-05:00 respectively. They show the same photographs as 100_1636.JPG and 100_1644.JPG from Camera 1. Comparing file modification dates, it appears that the Camera 1 images are the originals and that these two Set 2 images are modified versions of them.
Two images, 103600296_4.jpg and 103600296_3.jpg, were last modified on 2015-06-17 at 16:53:10-05:00 and 16:53:20-05:00 respectively. The Exif metadata supports this: both files were adjusted with Microsoft Windows Photo Viewer 6.3.9600.16384 (likely to change the orientation from horizontal to vertical).
Additionally, as the color coding above shows, Set 2 appears to consist of three separate batches of photographs, based on the machine-generated filenames and the modification dates.
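Grouping by filename is easy to automate. The sketch below assumes the input is already limited to Set 2 filenames of the form number_number.jpg and treats the part before the underscore as a batch identifier; that reading of the machine-generated names is an interpretation, not documented behavior, and batches_by_prefix is a hypothetical helper.

```python
import re
from collections import defaultdict

SET2_NAME = re.compile(r"^(\d+)_(\d+)\.jpg$", re.IGNORECASE)


def batches_by_prefix(filenames):
    """Group names like 103600296_4.jpg by prefix, with sorted suffix numbers."""
    batches = defaultdict(list)
    for name in filenames:
        match = SET2_NAME.match(name)
        if match:  # names that do not fit the pattern are skipped
            batches[match.group(1)].append(int(match.group(2)))
    return {prefix: sorted(nums) for prefix, nums in batches.items()}
```

If two batches happen to share a prefix, the modification dates serve as the second signal for telling them apart.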
Overview of Cameras Used by Dylann Roof
Camera 1: Blue 14-MP Kodak EasyShare C1530 digital camera
Set 2: Based on the ExifByteOrder value, which read "Big Endian (Motorola MM)," it is possible that images from Set 2 came from one of these cameras:
Also, based on Exif data for two of the images associated with Set 2, it is likely that these photos were imported into a PC with a Microsoft operating system and saved as JPEGs using Microsoft software: PhotoViewer.dll 6.3.9600.16384.
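The ExifByteOrder value comes from the first bytes of the TIFF header inside a JPEG's Exif APP1 segment: "MM" followed by 00 2A declares big-endian (Motorola) order, and "II" followed by 2A 00 declares little-endian (Intel) order. The Python sketch below reads it from an APP1 payload; exif_byte_order is a hypothetical helper. Because the byte order is set by whatever firmware or software last wrote the Exif block, it can narrow the list of candidate devices but cannot identify one on its own.

```python
def exif_byte_order(app1_payload: bytes) -> str:
    """Report the byte order declared in the TIFF header of an Exif payload."""
    if not app1_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("payload does not start with the Exif identifier")
    header = app1_payload[6:10]  # byte-order mark plus the magic number 42
    if header == b"MM\x00\x2a":
        return "big-endian (Motorola, MM)"
    if header == b"II\x2a\x00":
        return "little-endian (Intel, II)"
    raise ValueError(f"unrecognized TIFF header: {header!r}")
```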
Additional Analysis
Six of the 60 files were modified on the day of Roof's terrorist actions: four from Camera 1 and two from Set 2.
The most recent photographs in the batch were taken on 2015-05-11; three photographs were taken that day.
The earliest photograph in the batch is from 2014-08-03 and was taken with Camera 1.
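The first and last capture days, and how many photos fall on each, can be tallied with collections.Counter. This sketch assumes Exif-style CreateDate strings (YYYY:MM:DD HH:MM:SS); because that layout is fixed-width and zero-padded, the first ten characters sort chronologically as plain text. The helper name day_extremes is hypothetical.

```python
from collections import Counter


def day_extremes(create_dates):
    """Return ((first_day, count), (last_day, count)) from Exif date strings."""
    per_day = Counter(s[:10] for s in create_dates)  # "YYYY:MM:DD" prefixes
    first, last = min(per_day), max(per_day)
    return (first, per_day[first]), (last, per_day[last])
```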
Creation and modification dates suggest that most images from Set 2 were taken on or before 2015-03-16, whereas the 2015 images from Camera 1 begin on 2015-03-17.
I stop here because these are not my photographs. These are not photographs that I intend to collect. However, I include a link to the raw extracted metadata for others to evaluate and examine.