How To Choose The Right File Format For eDiscovery? (PDF, Native, etc.)

18 October 2023 by Anith eDiscovery pdf native-format file-format tiff

Takeaway: Ideally, you want a native load-file production for eDiscovery. If that isn’t available, a PDF load-file production is the next best choice. You still have options beyond these, but they’re not ideal.

File formats matter with eDiscovery, and here’s how to choose the best one for you.

Navigating eDiscovery processes means making important decisions about which file format to use. Choosing the right format makes the process more efficient and accurate, ensuring that vital data is accessible and meaningful. So, here’s a step-by-step guide to help you select the right eDiscovery format.

First, understand your file format options.

Understanding your file format options is crucial for eDiscovery because different formats come with their own sets of advantages, limitations, and use cases. Here’s what that translates to for day-to-day eDiscovery.

a. Native files.

Native files are files in their original creation format, like .docx for Word or .xlsx for Excel. They’re great because they preserve the metadata embedded in the file (which can be crucial for legal cases), retain embedded content such as hyperlinks and embedded files, and are easy to index and search (using an eDiscovery search engine). But they have drawbacks, too. For instance, you’ll likely need the original ‘parent’ software, or a compatible viewer, to open native files (e.g., Microsoft Word for .docx files). Also, a native document’s layout might change when you view it on different devices or in different software versions.

b. TIFF files.

TIFF is a raster graphics format that can store multiple images in one file, including the pages of a multi-page document. (A raster graphic consists of tiny, uniformly sized pixels arranged in a grid of columns and rows.) TIFFs are great because they look the same across devices, can be easily annotated, and ‘flatten’ images to minimize the risk of overlooking hidden data. (Higher-end graphics applications like Photoshop offer a multi-layer file format so you can manipulate each layer independently. TIFFs ‘flatten’ these multiple layers into a single, easily shareable equivalent.) But TIFFs have their weaknesses, too. For example, they’re not ‘searchable’ (using an eDiscovery search engine) unless they have an associated text file. And since they’re ‘images’ of a document, you can’t interact with a TIFF (e.g., select and highlight its text) the way you would with the original document. Finally, TIFFs take up a lot of space compared to other image formats.
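To make the ‘associated text file’ point concrete, here’s a minimal Python sketch that lists which TIFF images in a production have no matching text file. It assumes each image’s text file shares its base name (e.g., DOC0001.tif and DOC0001.txt), which is a common convention but not one every production follows. The function name and file names are hypothetical.

```python
from pathlib import PurePath

def tiffs_missing_text(image_names, text_names):
    """Return the image files that have no text file with the same base name."""
    # Assumption: DOC0001.tif pairs with DOC0001.txt (matched case-insensitively).
    text_stems = {PurePath(name).stem.lower() for name in text_names}
    return sorted(
        name for name in image_names
        if PurePath(name).stem.lower() not in text_stems
    )

# Hypothetical production: three images, but only two text files.
images = ["DOC0001.tif", "DOC0002.tif", "DOC0003.tif"]
texts = ["DOC0001.txt", "DOC0003.txt"]
missing = tiffs_missing_text(images, texts)
print(missing)  # ['DOC0002.tif']
```

Any image this reports won’t turn up in keyword searches until it has text extracted for it (for example, through OCR).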

c. PDF files.

PDFs are a file-sharing format that freezes the layout of any source document. This means a PDF’s layout stays the same even if you access it on a different computer with different settings. Also, you can search a PDF for keywords (as long as it contains a text layer rather than only scanned page images) and password-protect its contents. The main drawback with PDFs is that you might lose some metadata when converting a native file into a PDF. And ‘dynamic’ content like Excel formulas becomes ‘static’ (i.e., unchangeable and unresponsive) in a PDF. Also, PDFs often take up more space than regular text files.

d. PST & OST files.

PSTs (Personal Storage Tables) and OSTs (Offline Storage Tables) are email storage formats used by Microsoft Outlook. They’re great because they keep the original hierarchical structure of emails and folders in a mailbox. And in addition to storing emails, they store attachments, address books, calendar events, and more. The main issue with them, though, is that they become more prone to corruption as they grow larger. Also, you’ll need specialized proprietary software (e.g., Microsoft Outlook) to view them. Still, they’re useful if yours is an email-heavy case where you’ll review full email threads and attachments.

Once you’ve reviewed your file format options, figure out your priorities.

The file format you choose will depend on what you want the format to do for you. In other words, you’ll need to figure out your priorities. Here are some to consider.

a. Searchability.

How quickly and effectively do you want to be able to search file content and metadata? If this is a priority, choose a format that’s inherently searchable or can be made so with other readily available software.

b. Metadata preservation.

If your case depends on details like timestamps or geolocation, select a format that doesn’t compromise metadata.

c. Accessibility.

Will you be collaborating with a team regularly? If so, you’ll want a format that’s easily accessible. Ideally, that means an ‘open’ format (one not tied to proprietary software) with wide compatibility.

d. Size and storage.

Consider storage costs and efficiency. Some formats, while superior in other respects, might be bulkier and less optimal for storage.

e. Security.

Given the rising concerns over data breaches, you might want a format with encryption, malware protection, and compatibility with DRM tools.

f. Durability.

Sometimes, you’ll need to prioritize formats that can stand the test of time, considering factors like adoption rate (i.e., how widely a format is used) and backward compatibility (i.e., whether a format works with legacy software).

g. Appearance.

If maintaining a document’s original layout and appearance is crucial, select a format that stays consistent across different platforms.

Now, match these needs with your file format options. For eDiscovery, here’s what we suggest.

We recommend the following formats to our clients, in order of decreasing preference:

Ideally, you want a native load-file production for eDiscovery.

Native formats are great because they preserve more file metadata than other formats. So, when receiving productions, ask for them in native format where possible. Ideally, these native productions should come with a load file, which helps your software systematically organize the data into an underlying database.
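If you haven’t seen a load file before, here’s a rough Python sketch of how software might read one. It assumes a Concordance-style DAT layout, where the character U+0014 separates fields and þ (U+00FE) wraps each value; other load-file formats use different delimiters. The field names (DOCID, CUSTODIAN, NATIVE_PATH) and file paths are made up for illustration, and this isn’t how any particular product does it.

```python
import csv
import io

# Assumed Concordance-style delimiters; adjust them to match your production.
FIELD_SEP = "\x14"   # separates one field from the next
QUOTE = "\u00fe"     # the thorn character that wraps each value

def read_load_file(text):
    """Return one dict per document row, keyed by the header row's field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=FIELD_SEP, quotechar=QUOTE)
    return [dict(row) for row in reader]

def make_row(values):
    """Build one load-file line (used here only to create sample data)."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)

# Hypothetical two-document production.
sample = "\n".join([
    make_row(["DOCID", "CUSTODIAN", "NATIVE_PATH"]),
    make_row(["DOC0001", "J. Smith", "natives/DOC0001.docx"]),
    make_row(["DOC0002", "J. Smith", "natives/DOC0002.xlsx"]),
])

records = read_load_file(sample)
for record in records:
    print(record["DOCID"], "->", record["NATIVE_PATH"])
```

Each row links a document ID to its metadata and to the native file’s location, which is what lets review software build its database without anyone keying in details by hand.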

If native files aren’t available, your next best bet is a PDF load-file production.

PDF load-file productions are gaining traction as an industry norm, so requesting one won’t be unusual. However, insist that each document is its own PDF rather than part of one massive PDF of all case documents. (The consolidated approach makes reviews much more complicated. Consider how much time you’ll waste if searching a single document for keywords means searching, by default, every other document stuffed into that PDF!)

If you can’t get PDFs, ask for a TIFF load-file production.

The TIFF format hasn’t been updated since 1992, but it’s still usable. However, know that TIFF productions might come with security vulnerabilities and reduced digital resolution, and they aren’t as feature-packed as natives and PDFs.

As a last-ditch option, you can work with large consolidated PDFs, TIFF assortments without load files, and paper files.

Processing these formats will undoubtedly be more labor-intensive but is workable if you insist on certain standards. For instance, you can ask for physical documents to be scanned at a resolution of 300 PPI (pixels per inch). And you can use optical character recognition (OCR) to convert the scanned image into text that your computer can ‘read’ and search.
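For a sense of what 300 PPI means in practice, here’s a small Python calculation of a scanned page’s pixel dimensions and uncompressed size. It assumes a US Letter page (8.5 x 11 inches) and no compression, so real files will usually be smaller. The function name is hypothetical.

```python
def scan_size(width_in, height_in, ppi=300, bits_per_pixel=1):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Integer ceiling division so a partial trailing byte still counts.
    raw_bytes = (width_px * height_px * bits_per_pixel + 7) // 8
    return width_px, height_px, raw_bytes

# Assumed page size: US Letter, 8.5 x 11 inches.
black_and_white = scan_size(8.5, 11)                 # 1 bit per pixel
full_color = scan_size(8.5, 11, bits_per_pixel=24)   # 24 bits per pixel
print(black_and_white)  # (2550, 3300, 1051875)
print(full_color)       # (2550, 3300, 25245000)
```

So a black-and-white page comes to about 1 MB before compression, and a full-color page to about 25 MB, which is part of why image-based productions can get bulky.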

Are you worried about struggling with the ‘wrong’ format? It won’t be an issue if you use the right eDiscovery software.

Whatever the format, top-tier eDiscovery software can help simplify things. Take GoldFynch, for example. It’s a subscription service with essential document review tools at an affordable price. And it can handle any type of eDiscovery file format! Here are some of its highlights:

  • It costs just $27 a month for a 3 GB case. That’s significantly less than most comparable software. With GoldFynch, you know exactly what you’re paying for: its pricing is simple and readily available on the website.
  • It’s easy to budget for. GoldFynch charges only for storage (processing files is free). So, choose from a range of plans (3 GB to 150+ GB) and know up-front how much you’ll be paying. You can upload and cull as much data as you want as long as you stay below your storage limit. And even if you do cross the limit, you can upgrade your plan with just a few clicks. Also, billing is prorated – so you’ll pay only for the time you spend on any given plan. With legacy software, pricing is much less predictable.
  • It takes just minutes to get going. GoldFynch runs in the Cloud, so you use it through your web browser (Google Chrome recommended). No installation. No sales calls or emails. Plus, you get a free trial case (0.5 GB of data and a processing cap of 1 GB) without adding a credit card.
  • It’s simple to use. Many eDiscovery applications take hours to master. GoldFynch takes minutes. It handles a lot of complex processing in the background, but what you see is minimal and intuitive. Just drag-and-drop your files into GoldFynch, and you’re good to go. Plus, you get prompt and reliable tech support (our average response time is 30 minutes).
  • You can access it from anywhere, 24/7. All your files are backed up and secure in the Cloud.

Want to find out more about GoldFynch?