FIELD GUIDE / DATA MANAGEMENT
last update: July 19 2016
All data transfers should be verified with checksums.
Regardless of the operating system or file manager used (Windows Explorer, OSX's Finder), data transfers are generally not verified. Transfer errors do happen and stay undetected without verification, and clips can be rendered unusable. Unstable consumer interfaces like Thunderbolt, FireWire and USB are especially prone to transfer errors, but defective drives, memory, processors or any other component in a computer can also corrupt data. One likely source of problems is the fragile micro USB3 B-plug used on small 2,5" external hard drives.
Offload programs use checksums (or "hashes") to verify that the copy is identical to the source. A checksum is a short string of characters that is, for practical purposes, unique to one file. This string is computed by applying a mathematical function to the content of the file. Common functions are: md5, crc32, sha1, xxhash, blake2. The checksum for a file is usually stored in a simple text file next to it.
Example blake2-checksum: 46d53ca028b087187f2c7b1987eb22ca
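A minimal sketch of how such a checksum could be computed with Python's standard hashlib module. The function name file_checksum, the chunk size and the 16-byte digest length (which yields a 32-character string like the example above) are assumptions for illustration; the programs named in this guide may use other parameters.

```python
import hashlib

def file_checksum(path, chunk_size=1024 * 1024):
    """Return a BLAKE2b checksum of the file at `path` as a hex string."""
    # digest_size=16 gives 32 hex characters (an assumption, chosen to
    # match the length of the example checksum in the text).
    digest = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as handle:
        # Read in chunks so large clips never have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```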
A checksum changes even when only a single bit of a file has changed. Checksum functions are not perfect: a cleverly chosen, or highly unlikely random, change of bits might produce the same checksum. Checksum functions differ in calculation speed and in how strongly they guarantee the uniqueness of the resulting string (collision rate).
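The sensitivity to a single changed bit can be tried out with a few lines of Python. This is only an illustrative sketch; the helper names and the sample data are made up for the demonstration.

```python
import hashlib

def short_hash(data: bytes) -> str:
    """BLAKE2b checksum of a byte string, shortened to 32 hex characters."""
    return hashlib.blake2b(data, digest_size=16).hexdigest()

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(changed)

original = b"frame data " * 100      # stand-in for the content of a clip
damaged = flip_bit(original, 0)      # same length, one bit different

print(short_hash(original))
print(short_hash(damaged))           # differs from the line above
```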
md5 is the most widely used function, but it is limited to about 300MB/s processing speed. The faster xxhash removes this bottleneck when working with SSDs or RAID storage. When dealing with individual HDDs, xxhash will be slower than md5. Blake2 is certainly overkill with regard to collisions, but on a powerful system it is also faster than md5.
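Processing speeds depend heavily on the CPU, so it can be worth measuring them on the machine actually used. The following is a rough sketch using only functions from Python's standard library; xxhash is not part of it and is left out here. The function name and the test sizes are assumptions, and the result reflects pure hashing speed in memory, not disk speed.

```python
import hashlib
import time

def throughput_mb_s(algorithm, total_mb=64, chunk_mb=1):
    """Hash `total_mb` megabytes of zeros and return the speed in MB/s."""
    chunk = bytes(chunk_mb * 1024 * 1024)
    digest = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        digest.update(chunk)
    digest.hexdigest()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

for name in ("md5", "sha1", "blake2b"):
    print(f"{name}: {throughput_mb_s(name):.0f} MB/s")
```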
The alternative to checksum verification is a simple bit comparison, where both source and copy are read simultaneously and compared bit by bit. But no record remains from such a process. A higher-end production might need logs of transfers for insurance purposes. Offload programs produce logs and drop the respective checksums next to the transferred files or directories.
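A direct comparison of source and copy could look like the sketch below. The function name and chunk size are assumptions; the comparison works on chunks of bytes, which is equivalent to comparing bit by bit.

```python
def files_identical(source_path, copy_path, chunk_size=1024 * 1024):
    """Read both files in parallel and compare them chunk by chunk."""
    with open(source_path, "rb") as source, open(copy_path, "rb") as copy:
        while True:
            a = source.read(chunk_size)
            b = copy.read(chunk_size)
            if a != b:
                return False      # content or length differs
            if not a:
                return True       # both files ended at the same point
```

Python's standard library offers the same kind of check as filecmp.cmp(source, copy, shallow=False). Either way, nothing is written to disk, which is exactly the drawback described above.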
Checksums should travel from the camera medium, through post, to the archive.
Digital storage technologies such as magnetic hard disk drives, video or data tape, and flash (SSDs, thumb drives) all suffer from data degradation. Keeping the original checksums alongside the material allows this decay to be detected through verification at regular intervals, so that damaged files can be recovered from other copies.
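Such a recurring verification can be sketched as follows. The manifest layout assumed here (one "checksum *relative/path" entry per line, comment lines starting with "#") and the 32-character BLAKE2b checksum are assumptions for illustration and not necessarily what a given offload program writes.

```python
import hashlib
from pathlib import Path

def checksum(path):
    """BLAKE2b checksum of a file, 32 hex characters (assumed format)."""
    digest = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path):
    """Check every file listed in the manifest against its stored checksum.

    Returns three lists of relative paths: (ok, changed, missing).
    """
    manifest_path = Path(manifest_path)
    root = manifest_path.parent
    ok, changed, missing = [], [], []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank and comment lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        target = root / name
        if not target.is_file():
            missing.append(name)
        elif checksum(target) == expected.lower():
            ok.append(name)
        else:
            changed.append(name)          # candidate for recovery from another copy
    return ok, changed, missing
```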
EXAMPLES FOR MANUAL DATA MANAGEMENT
Windows Explorer and OSX's Finder offer only basic functions. A 3rd party file manager is a necessity for efficient data management. File managers like Totalcommander for Windows can queue up transfer operations, create checksums, verify and compare files, find duplicates, offer advanced search functions, batch rename files, and give a constant overview of directory sizes and storage usage, among other functions.
While Totalcommander can create checksums and use them for verification as well, an elegant program puristically called Checksum offers higher performance and advanced functionality, including logging of the verification process.
Project data often travels through different systems during a production. For example, the LTFS file system of the LTO tape technology does not understand all symbols allowed under Apple's OSX.
For interoperability between different programs and operating systems/file systems, it is good practice to choose simple file & directory names without symbols and language-specific characters.
Example card(reel) directory name: 001_2016_06_28_muenchen
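Names following this pattern can also be generated, which avoids typing mistakes on set. The sketch below is one possible approach; the function name and the exact replacement rules (German umlauts spelled out, accents dropped, everything else turned into underscores) are assumptions modelled on the example name above.

```python
import re
import unicodedata
from datetime import date

def card_directory_name(card_number, shoot_date, location):
    """Build a name like 001_2016_06_28_muenchen from plain ASCII characters."""
    text = unicodedata.normalize("NFC", location).lower()
    # Spell out German umlauts the way the example name does.
    for char, replacement in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        text = text.replace(char, replacement)
    # Drop remaining accents, then collapse anything that is not a-z or 0-9.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "_", text).strip("_")
    return f"{card_number:03d}_{shoot_date:%Y_%m_%d}_{text}"

print(card_directory_name(1, date(2016, 6, 28), "München"))  # 001_2016_06_28_muenchen
```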
EXAMPLE 1: MANUAL ON-SET DATA MANAGEMENT FOR A DOCUMENTARY (WINDOWS)
SOFTWARE: totalcommander, checksum
HARDWARE ALTERNATIVE 1:
external LTO-6 drive IBM TS2260 SAS/USB3, 2 LTO-6 cartridges (2x 2,5TB)
+ workflow (already archived)
+ price (if a drive already exists)
- not for rough conditions, dusty environments and extreme heat
- weight (drive)
- price (generally only used on fiction film sets)
- not for screen footage (notebook's internal drive must be large enough)
HARDWARE ALTERNATIVE 2:
2-3x HGST Ultrastar 2TB-4TB (3,5" enterprise)
or sealed Helium drives for high altitudes
or SSDs for rough conditions (shock, vibration, climate, magnetic fields)
Fantec HDD-Sneaker USB 3.0 e.g.
or simply USB3-SATA adapter (with 2nd USB3 connector to power SSDs)
2-3x 2.5"/3,5" CRU DriveBox or ORICO PHX-35
- (3,5" drives: needs external power)
HARDWARE ALTERNATIVE 3:
2-3x G-Technology G-Drive ev ATC 1TB, USB 3.0/Thunderbolt 1
+ powered by interface
+ reliability in rough conditions (also SSD G-Drive available)
HARDWARE ALTERNATIVE 4:
external 2.5" HGST USB3 drives
+ powered by interface
- reliability (connection)
- capacity (only up to 1,8 TB)
0 Switch camera medium to "write protected" when removing it from the camera
1 TotalCommander: copy entire content of each camera medium (e.g. SSD, SD card, CFast) to
a separate directory whose name starts with the number of the card:
001_date_location, 002_date_location, ... so that sorting a directory by name keeps the cards in order
2 TotalCommander: select entire content of the camera medium, right click one directory, hold
"shift" when selecting "create checksum" and the program Checksum starts with the options
dialog. (Checksum must be integrated into the windows shell).
3 Checksum: Select "one-file root checksum", type in the corresponding "output directory",
select the Blake2 algorithm. Checksum is now computing the checksums...
4 Checksum created one .hash file containing all calculated Blake2-checksums of all files of
the camera medium.
The content of the hash file of an FS7 XQD card looks like this:
# made with checksum.. point-and-click hashing for windows (64-bit edition).
# from corz.org.. http://corz.org/windows/software/checksum/
5 Verify: navigate to the destination folder of the copy and open the hash file to start
the program Checksum
6 Create at least 2 copies.
Backup at least daily.
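For readers who prefer scripting, the steps above can be approximated in a few lines of Python. This is a sketch under assumptions and not what TotalCommander or Checksum do internally: the function names, the manifest file name "card.hash" and its line layout are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path

def checksum(path):
    """BLAKE2b checksum of a file, 32 hex characters (assumed format)."""
    digest = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def offload(card_dir, dest_dir, manifest_name="card.hash"):
    """Copy a card, write one manifest for all files, verify the copy.

    Returns the relative paths whose copy does not match the card.
    """
    card_dir, dest_dir = Path(card_dir), Path(dest_dir)
    shutil.copytree(card_dir, dest_dir)        # raises if dest_dir already exists
    lines, failed = [], []
    for source in sorted(p for p in card_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(card_dir).as_posix()
        source_sum = checksum(source)          # computed from the camera medium
        lines.append(f"{source_sum} *{relative}")
        if checksum(dest_dir / relative) != source_sum:
            failed.append(relative)            # copy differs from the card
    (dest_dir / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return failed
```

Calling offload once per destination drive gives the two or more copies asked for in step 6, each with its own manifest travelling along.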
Consider using 2,5" drives with an attached cable for a more stable connection. Such drives are offered by G-Tech, WD or LaCie. G-Tech is owned by HGST. HGST drives have a reputation for high reliability, which is supported by the Backblaze drive statistics.
Use sealed 3,5" HGST helium drives with a Thunderbolt or USB3 HDD docking station for high altitude locations.
EXAMPLE 2: MANUAL ARCHIVING OF A PROJECT TO LTO TAPE (WINDOWS)
used programs: totalcommander, checksum by corz
0 If the project size exceeds the tape capacity, split the project into directories. For each
1 Create one checksum file, like in example 1, for each of these directories
2 Open LTFS Configuration to format and mount the tape by "Select an unused drive letter" and
3 A new drive letter appears, drive is ready.
4 Sort directories by name and copy data to tape incl. the checksum files.
5 Verify the data on tape with checksums by opening the checksum files, like in example 1.
6 If verification is positive, reopen LTFS Configuration and click "Remove" and under
"Cartridge utilities..." click "Eject" or press the eject button on the drive. Don't leave
tapes in the drive.
7 Repeat. The minimum is 2 copies.
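Step 0 can be planned before anything is written. The sketch below assigns directories to tapes in name order, matching the sort order used in step 4. The function name, the example sizes and the usable capacity of 2.4 TB (slightly below the nominal 2,5TB of an LTO-6 cartridge, to leave some headroom) are assumptions.

```python
def plan_tapes(directory_sizes, tape_capacity):
    """Group directories onto tapes in name order without exceeding capacity.

    directory_sizes: dict mapping directory name to size in bytes.
    Returns a list of tapes, each a list of directory names.
    """
    tapes, current, used = [], [], 0
    for name in sorted(directory_sizes):
        size = directory_sizes[name]
        if size > tape_capacity:
            raise ValueError(f"{name} does not fit on one tape, split it further")
        if current and used + size > tape_capacity:
            tapes.append(current)          # this tape is full, start the next one
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        tapes.append(current)
    return tapes

GB = 10**9
example = {"001": 1200 * GB, "002": 900 * GB, "003": 800 * GB,
           "004": 700 * GB, "005": 400 * GB}
print(plan_tapes(example, 2400 * GB))  # [['001', '002'], ['003', '004', '005']]
```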
COMPRESSION OF UNCOMPRESSED MEDIA
Uncompressed master image sequences (DPX, TIF) take very long to transfer to tape, since they can consist of 100.000+ files. The RAR archiver, for example, compresses them into one file of about 50% of the original size. RAR's "recovery record" feature, with perhaps 3 percent redundancy, brings additional protection from corruption: damage of up to about 3% of the archive can then be corrected.
Before writing to tape, an archive should be tested after creation, either by checking "test archived files" during creation or afterwards from RAR's program menu. Compressing a 2k 10bit master and creating the recovery information can take a day.
EXAMPLE 3: CHECK OF A TAPE/DISK (WINDOWS)
used programs: Checksum by corz
0 Mount tape / connect disk
1 Hold shift while opening the hash file to start the options dialog of Checksum
and customize logging. Or right click a folder with several checksum files, click "verify
checksums" and Checksum will automatically find and verify all checksum files it understands.
2 Checksum verifies the tape automatically.
3 Overwrite damaged files from other copies if necessary.
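Step 3 carries a risk: the other copy may be damaged as well. The sketch below therefore checks the replacement against the stored checksum before overwriting anything and again after copying. The function names and the form of the inputs (a list of damaged relative paths and a dictionary of expected checksums) are assumptions for illustration.

```python
import hashlib
import shutil
from pathlib import Path

def checksum(path):
    """BLAKE2b checksum of a file, 32 hex characters (assumed format)."""
    digest = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repair_from_copy(damaged, expected_sums, target_root, other_root):
    """Overwrite damaged files in target_root with intact ones from other_root.

    Returns the relative paths that could not be repaired.
    """
    unrepaired = []
    for name in damaged:
        candidate = Path(other_root) / name
        # Only use the other copy if it still matches the original checksum.
        if candidate.is_file() and checksum(candidate) == expected_sums[name]:
            target = Path(target_root) / name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(candidate, target)
            if checksum(target) == expected_sums[name]:
                continue                   # repaired and verified
        unrepaired.append(name)
    return unrepaired
```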
EXAMPLE 4: VERIFIED TRANSFER OVER THE INTERNET (WINDOWS, OSX)
Transfers over the Internet are the ones most likely to corrupt a file. Files should either travel with checksums or be packed into rar or zip files. (An archive file, like rar or zip, includes a checksum, which is used to test the integrity of the archive during decompression.)
Prominent examples are WAV files of an audio mix or an mp4 for a small festival. Any DCP has built-in sha1 checksums, but since it consists of several files, it should be sent in an archive as well.
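On the receiving end, the checksum stored inside a zip archive can be tested without unpacking it. A minimal sketch using Python's standard zipfile module; the function name is an assumption, and rar archives would need a separate tool.

```python
import zipfile

def archive_is_intact(zip_path):
    """Return True if every file inside the zip archive passes its CRC check."""
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() reads all members and returns the name of the first
            # one with a bad CRC or header, or None if everything is fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False                       # not a readable zip file at all
```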
0 Click "checksum" in the context menu of the file to be transferred and an md5 is created.
0 Open FF-md5drop.
1 Drop file into the program window.