A Quick Guide to eDiscovery Archive Files [7z, ZIP, RAR And TAR]
Takeaway: Archives make it easy to store and send groups of files. They combine, compress, and (sometimes) encrypt data without altering it. You’ll find hundreds of archive formats out there, though, so choose a versatile eDiscovery application that can handle the more popular ones.
eDiscovery becomes simpler when we use file archives for productions.
File archives are ‘container’ files into which you can put your data so that it’s easy to store and share. They’re especially useful when you have folders within folders because they keep this folder hierarchy intact. Many also ‘compress’ your files, shrinking them down to take up less space. And some allow you to encrypt your data, too. These kinds of features make them ideal for sharing eDiscovery productions via the Cloud, email, etc.
All archives group your files into a ‘container’. But some do more than just that.
There’s a different archive type to suit every need. Some archives simply group files, others compress them, a few encrypt them, and all help you conveniently store your productions. But most archives process your data using the following steps.
1. Embedding additional information that helps make sense of your data.
The following information needs to be embedded into each archive so that your eDiscovery software can make sense of it.
- File system data. Raw data is just a chunk of information with no indication of where one piece of data stops and the next begins. A ‘file system’ is the set of rules that decides how that data is organized, stored and retrieved as named files and folders. (It’s similar to how we would sort through and organize a random stack of paper files on our desk.) Archives need to embed this information (file names, folder paths, sizes and so on) so your eDiscovery software can read and unpack your files properly.
- Metadata. When you create a document on your computer, the application you’re using (e.g., Microsoft Word) records a bunch of information about it: who created it, when they created it, when it was last opened, etc. This ‘data about data’ (i.e., metadata) is a digital footprint that tracks the history of the document. And just like with file system data, archives need to embed metadata to give your files meaning and context.
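To make this concrete, here’s a minimal Python sketch that builds a tiny ZIP archive in memory and then reads back what the archive recorded about each entry: its folder path, its sizes, and its last-modified time. The file names are made up for illustration, and the sketch uses only Python’s standard `zipfile` module. Note that it shows the archive-level information only. Application metadata (such as a Word document’s author) sits inside the files themselves and travels along with them.

```python
import io
import zipfile
from datetime import datetime

# Build a small in-memory archive with nested folders (file names are hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("production/emails/msg001.txt", "Hello, counsel.")
    archive.writestr("production/contracts/nda.txt", "Confidential " * 50)


def describe_entries(zip_bytes):
    """Return (path, original size, stored size, last-modified time) per entry."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            modified = datetime(*info.date_time)  # date_time is a 6-item tuple
            rows.append((info.filename, info.file_size, info.compress_size, modified))
    return rows


entries = describe_entries(buffer.getvalue())
for path, size, stored, modified in entries:
    print(f"{path}: {size} bytes ({stored} stored), modified {modified:%Y-%m-%d %H:%M}")
```

The folder paths come back exactly as they went in, which is what lets review software rebuild the original hierarchy.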
2. Assigning a ‘checksum’ to make sure your data stays intact.
You need a way to tell whether an archive’s files have been altered – either by mistake or on purpose. And checksums help with this. A checksum is a string of numbers and letters that acts as a fingerprint for an archive, and it can be cross-checked later on to verify that the archive is intact. It is generated by a hashing algorithm (MD5, SHA-1, SHA-256, etc.) and is highly sensitive to change. (A checksum value changes if you alter just a single character in a document.) After you’ve shared or received an archive, your software will re-generate the checksum and compare it to the original. Here’s an example of a checksum generated using the MD5 algorithm: bc527343c7ffc103111f3a694b004e2.
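Here’s a rough sketch of that generate-and-compare step using Python’s standard `hashlib` module. The sample text and the `checksum` helper are assumptions made for illustration. In practice, you would hash the bytes of the archive file itself.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


original = b"Produced documents, volume 1"
altered = b"Produced documents, volume 2"  # a single character differs

sent = checksum(original)          # computed by the sender
received_ok = checksum(original)   # recomputed by the recipient, data unchanged
received_bad = checksum(altered)   # recomputed after the data was altered

print("sent:    ", sent)
print("intact:  ", received_ok == sent)
print("tampered:", received_bad == sent)
```

The unchanged data reproduces the sender’s checksum, while the one-character edit produces a completely different value.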
3. Compressing data so it takes up less space and is quicker to transfer.
Electronic data is essentially a string of 0s and 1s. So, the longer the string, the more space it takes up. Archives compress files by encoding the redundancies in this string more efficiently, thereby shortening it. This is called ‘lossless’ compression in the data compression world because none of the original data is lost: unpacking the archive restores every bit. (In contrast, a ‘lossy’ compression nips away data we can afford to ignore – for example, removing sounds we can’t hear in an MP3 file. We won’t notice the difference, but we’ve still changed the file.) How much an archive shrinks depends on the type of compression and on the data itself. Repetitive files such as text can shrink to half their size or less, while files that are already compressed (JPEGs, for example) barely shrink at all.
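As a small illustration, the sketch below uses Python’s standard `zlib` module (an implementation of the DEFLATE method that ZIP archives commonly use) to compress two made-up inputs: highly repetitive text, and random bytes with no redundancy to remove. The sample data and the `compression_ratio` helper are assumptions for demonstration only.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(data, level=9)) / len(data)


repetitive = b"Privileged and confidential. " * 1000
random_like = os.urandom(len(repetitive))  # no redundancy to remove

packed = zlib.compress(repetitive, level=9)
restored = zlib.decompress(packed)

# Lossless: the decompressed bytes match the original exactly.
print("round trip identical:", restored == repetitive)
print(f"repetitive text ratio: {compression_ratio(repetitive):.3f}")
print(f"random bytes ratio:    {compression_ratio(random_like):.3f}")
```

The repetitive text shrinks to a tiny fraction of its size, the random bytes don’t shrink at all, and in both cases decompression returns the original data unchanged.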
4. Encrypting data to keep it safe from prying eyes.
To encrypt data, archives use an encryption algorithm together with a ‘key’. The algorithm uses the key to scramble the data, and only someone holding the right key can unscramble it. If you’re okay sharing the key with the recipient, then the same key can both encrypt and decrypt the archive (a process called ‘symmetric encryption’). In that case, the key (often a password) has to reach the recipient somehow, ideally separately from the archive itself. But if you don’t want to share a secret key, you can use ‘asymmetric encryption’, which uses two keys – one public (to encrypt the archive) and one private (to decrypt it). That way, anyone can encrypt the archive using the public key, but only the recipient ever has access to the private key. So it doesn’t need to be passed around.
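The difference between the two approaches is easier to see in a toy example. The Python sketch below is for illustration only and is not how archive tools encrypt data (real tools rely on vetted ciphers such as AES). The symmetric half uses a one-time pad, where XOR-ing with the same random key both scrambles and unscrambles the message. The asymmetric half uses textbook RSA with deliberately tiny primes, which is completely insecure but shows a public key locking a value that only the private key can unlock. All names and values here are assumptions chosen for the demo.

```python
import secrets


# --- Symmetric: one shared key both scrambles and unscrambles (one-time pad). ---
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte; applying it twice undoes it."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"Bates range ABC0001-ABC0500"
shared_key = secrets.token_bytes(len(message))  # both parties must hold this key
ciphertext = xor_bytes(message, shared_key)
decrypted = xor_bytes(ciphertext, shared_key)

# --- Asymmetric: textbook RSA with tiny primes (insecure, illustration only). ---
p, q = 61, 53
n = p * q                  # part of both the public and the private key
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: anyone may use (e, n) to encrypt
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

secret_number = 65
locked = pow(secret_number, e, n)   # encrypt with the public key
unlocked = pow(locked, d, n)        # decrypt with the private key

print("symmetric round trip: ", decrypted == message)
print("asymmetric round trip:", unlocked == secret_number)
```

In the first half, the sender and the recipient need the same `shared_key`. In the second, only the recipient ever needs to know `d`.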
When choosing your eDiscovery software, make sure it can handle some of these popular archive formats.
There are more than 250 types of archives, but these are the most popular ones.
- ZIP archives were created in the late 80s and are still very common. You can use them to just group files together or to both group and compress them. They’re quick and easy to use, and both Windows and macOS can create and open them without extra software. (WinZip, a popular ZIP application, is a separate commercial product.) Their main drawback, though, is that they don’t compress files as well as the other compressed formats in this list. As a fix for this, WinZip introduced the ZIPX archive. It does a better job with compression but as a tradeoff, it’s slower than ZIPs.
- RAR archives are the most well-known rivals to ZIPs. They’re better at compressing files and are impressively fast. The format is proprietary, though. To create RAR archives, you’ll need the WinRAR software (for Windows), which is a paid product with a free trial period. To open them, free tools such as The Unarchiver (for Mac) are enough.
- TAR archives were created for Unix/Linux systems in the late 70s. They were originally designed for ‘tape drives’ that would store your data on magnetic tapes. In fact, that’s how they got the name ‘TAR’ which stands for Tape Archive. Today, they’re still a standard way for Unix/Linux users to group files for backups or sharing. TARs group files without compressing them, but you can compress them afterwards – converting them into TAR.BZ2 or TAR.GZ variants. [Side note: TARs are often called ‘tarballs’ because they collect files just as a sticky tarball would.]
- 7Z (or 7-Zip) archives first came out back in 1999, and they’re still popular because they work with a range of other compression software. The 7z format stands out because it can handle massive files – up to 16 billion GB (in theory). But, like RAR, their parent software doesn’t come built into Windows or Mac, and you’ll need to download the free 7-Zip software (or a compatible tool) to use them.
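If you’d like to see how two of these formats behave, here’s a small Python sketch that packs a nested folder into a ZIP and a TAR.GZ, unpacks each one somewhere else, and lists what comes back. It relies only on the standard library (`shutil`, `tempfile`, `pathlib`), and the folder and file names are invented for the demo. One assumption worth flagging: Python’s standard library doesn’t read 7Z or RAR archives, so those would need separate tools.

```python
import shutil
import tempfile
from pathlib import Path


def round_trip(archive_format: str) -> list[str]:
    """Archive a nested folder, unpack it elsewhere, and list the restored files."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        source = work / "production"  # hypothetical production folder
        (source / "emails" / "2021").mkdir(parents=True)
        (source / "emails" / "2021" / "msg001.txt").write_text("Hello, counsel.")
        (source / "index.txt").write_text("1 document")

        # make_archive appends the right extension (.zip, .tar.gz, ...) itself.
        archive_path = shutil.make_archive(
            str(work / "production_set"), archive_format, root_dir=source
        )
        restored = work / "restored"
        shutil.unpack_archive(archive_path, restored)
        return sorted(
            p.relative_to(restored).as_posix()
            for p in restored.rglob("*")
            if p.is_file()
        )


zip_files = round_trip("zip")
tar_gz_files = round_trip("gztar")
print(zip_files)
print(tar_gz_files)

# Formats this Python installation can unpack out of the box.
print([name for name, _, _ in shutil.get_unpack_formats()])
```

Both formats hand back the same files in the same folders, which is the behavior you want from any archive used for a production.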
Looking for eDiscovery software that can handle all popular archives? Try GoldFynch.
GoldFynch is an eDiscovery service that is perfect for small- and midsize law firms and companies. It’s great with file archives and has other things going for it too.
- It costs just $27 a month for a 3 GB case. That’s significantly less than most comparable software. With GoldFynch, you know exactly what you’re paying for – its pricing is simple and readily available on the website. (Note: You’ll get a free 512 MB trial case to sample first.)
- It’s easy to budget for. GoldFynch charges only for storage (processing is free). So, choose from a range of plans (3 GB to 150+ GB) and know up front how much you’ll be paying. It takes just a few clicks to move from one plan to another, and billing is prorated – so you’ll pay only for the time you spend on any given plan. With legacy software, pricing is much less predictable.
- It’s simple to use. Many eDiscovery applications take hours to master. GoldFynch takes minutes. It handles a lot of complex processing in the background, but what you see is minimal and intuitive. Just drag-and-drop your files into GoldFynch and you’re good to go. Plus, it’s designed, developed, and run by the same team. So you get prompt and reliable tech support.
- It keeps you flexible. To build a defensible case, you need to be able to add and delete files freely. Many applications charge to process each file you upload, so you’ll be reluctant to let your case organically shrink and grow. And this stifles you. With GoldFynch, you get unlimited processing for free. So, on a 3 GB plan, you could add and delete 5 GB of data at no extra cost – as long as there’s only 3 GB in your case at any point. And if you do cross 3 GB, your plan upgrades automatically and you’ll be charged for only the time spent on each plan. That’s the beauty of prorated pricing.
- Access it from anywhere. And 24/7. All your files are backed up and secure in the Cloud.