DeNISTing Explained: A Step in Tackling Data Overload

03 January 2025 by Uday eDiscovery DeNISTing data-culling

Takeaway: When managing large datasets in legal investigations, it’s easy to feel overwhelmed by the sheer volume of information. One method of data culling in eDiscovery is DeNISTing, which helps streamline document review by removing irrelevant files. While effective, DeNISTing is not an essential component of every eDiscovery workflow. Understanding its definition, process, and challenges can help you determine if this method is the right fit for your case.

Think of DeNISTing as cleaning your digital closet.

So, what is DeNISTing? At its simplest, it’s a way to filter out unnecessary files (system files, software updates, and temporary installation junk) so you can get to the good stuff. The process relies on a reference database maintained by the National Institute of Standards and Technology (NIST), commonly called the NIST list, which contains hash values for known files such as operating system files and standard application files. By matching your dataset against this list, DeNISTing removes some of the clutter, leaving you with a smaller subset of files that are more likely to be relevant to your case.

Before we dive deeper into DeNISTing, let’s talk about data culling and its importance for eDiscovery.

In eDiscovery, data culling is all about filtering out irrelevant information so you can focus on what’s important. Common culling methods include:

  • Keyword filters: Narrowing down files based on specific search terms.
  • Date filters: Focusing on files created during particular timeframes.
  • Deduplication: Removing duplicate copies of the same file.
  • DeNISTing: Tossing out those pesky system-generated files.

DeNISTing is just one of the tools in the data culling toolbox. It’s handy, but it’s not the end-all-be-all of eDiscovery efficiency.

But how does the process of DeNISTing work?

The process of DeNISTing sounds technical, but don’t worry—it’s easier than you might think. Here’s how it typically plays out:

1. Get the NIST list

The NIST list contains hash values (file signatures) for millions of known, non-user-generated files. It’s like a giant “do not call” list for your dataset.

2. Collect the data

The first step in eDiscovery is gathering all potentially relevant data from sources like email servers, laptops, mobile devices, and cloud platforms. This raw dataset is often massive and unwieldy, containing everything from business documents to system logs.

3. Generate hash values

A hash value is computed from the contents of each file in the dataset using an algorithm like MD5 or SHA-1. These hash values act like digital fingerprints: two files with identical contents produce the same hash, so known files can be identified without a manual review.

4. Compare with the NIST list

The hash values from the dataset are then compared against the NIST list. Files that match entries in the list—like system files or default application components—are flagged for exclusion.

5. Filter out irrelevant files

You can remove files that match the NIST list from the dataset. The remaining files are typically user-generated content, such as emails, contracts, and spreadsheets, which are more likely to contain relevant information for the case.

6. Review and validate

After DeNISTing, you must validate the results through sampling to prevent the exclusion of critical files. This step helps maintain confidence in the integrity of the dataset.
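
Steps 3 to 5 above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular eDiscovery tool implements DeNISTing. The helper names (`file_hash`, `load_known_hashes`, `denist`) are hypothetical, and the known-hash file is assumed to be a plain text file with one hash per line, which is a simplification rather than the format NIST actually distributes.

```python
import hashlib
from pathlib import Path


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the uppercase hex digest of a file's contents, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def load_known_hashes(list_path):
    """Load known hashes from a text file, one per line (assumed format)."""
    with open(list_path, encoding="utf-8") as handle:
        return {line.strip().upper() for line in handle if line.strip()}


def denist(root, known_hashes, algorithm="sha1"):
    """Split the files under root into (kept, excluded) lists.

    A file is excluded when its hash appears in known_hashes.
    """
    kept, excluded = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if file_hash(path, algorithm) in known_hashes:
            excluded.append(path)
        else:
            kept.append(path)
    return kept, excluded
```

Keeping the excluded files in their own list, rather than deleting them outright, leaves room for the validation in step 6: the excluded set can be sampled and spot-checked before anything is dropped from the case.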

DeNISTing sounds highly technical, and you may wonder if it’s worth the effort.

Let’s look into why DeNISTing is a useful data-culling technique:

  • It shrinks the dataset: By eliminating unnecessary files, DeNISTing can reduce the size of your dataset significantly. That means less to review and faster results.
  • It reduces costs: Every file removed from the dataset means one less document to review. This efficiency translates into significant cost savings, especially in cases involving terabytes of data.
  • It keeps you focused: By removing the noise of system files and other junk, you can zero in on the files more likely to contain the information you need.

While DeNISTing is a valuable tool, it has challenges and limitations.

It’s important to know what you are getting yourself into before adding DeNISTing to your eDiscovery process:

  • Risk of over-culling: There’s always a slight risk of discarding relevant files if they match something on the NIST list. For example, you could accidentally remove a user-created document embedded in a system file.
  • Staying current with the NIST list: The NIST list is updated regularly. If you don’t use the latest version, you might fail to filter out files from newer software releases.
  • Risk of false positives: Occasionally, files you review may share hash values with entries on the NIST list. This overlap is rare but highlights the importance of validating results through sampling.
  • It’s just one way of data-culling: DeNISTing doesn’t address all irrelevant files, such as duplicate emails or outdated drafts, which require other culling methods. It’s a foundational step but not a complete solution.
  • Not a one-size-fits-all solution: Some cases (like software disputes) might need those system files. In those situations, DeNISTing could end up doing more harm than good.
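
The sampling check mentioned in the list above could look something like the sketch below. The function name `sample_for_review` and its default sample size are assumptions made for illustration. The right sample size depends on the case and on whatever validation protocol your team has agreed to.

```python
import random


def sample_for_review(excluded_files, sample_size=50, seed=0):
    """Return a reproducible random sample of excluded files for manual review.

    If there are no more than sample_size files, all of them are returned.
    """
    items = sorted(excluded_files, key=str)
    if len(items) <= sample_size:
        return items
    # A fixed seed makes the sample repeatable, so the same check can be rerun.
    return random.Random(seed).sample(items, sample_size)
```

If a reviewer finds a relevant file in the sample, that is a signal to examine the excluded set more closely before relying on the DeNISTed dataset.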

Here’s the million-dollar question: Is DeNISTing worth it for your eDiscovery project?

The answer depends on your situation. If you’re working with a massive dataset where system files are clogging up the works, DeNISTing can be a lifesaver. But if your dataset is smaller or you suspect system files might be relevant, you might want to skip it.

Ask yourself these questions:

  • How big is the dataset? Larger datasets benefit more from DeNISTing.
  • Are system files relevant to the case? If yes, avoid DeNISTing—or at least proceed cautiously.
  • What’s the budget and timeline? DeNISTing can speed things up, but it comes with upfront costs.

Ultimately, DeNISTing is a tool, not a rule. It’s there to help, but it’s not always necessary.

DeNISTing is a method from the data-culling toolbox that cuts out the noise.

Removing non-user-generated files allows you to focus on the important stuff and save time and money in the process.

But let’s keep it real—DeNISTing isn’t essential for every case. It’s one of many ways to tackle data culling, and it works best when combined with other strategies like keyword and date filtering.

So, the next time you’re knee-deep in eDiscovery, take a moment to consider whether DeNISTing makes sense for your case. If it does, great! If not, you have plenty of other tools in your arsenal to do the job.

Looking for eDiscovery Software to help with your case? Try GoldFynch

GoldFynch is an affordable, streamlined, secure eDiscovery service for small to mid-size organizations. It has a free trial that you can sign up for in seconds without a credit card.

  • It costs just $27 a month for a 3 GB case: That is significantly less than most comparable software. With GoldFynch, you know exactly what you’re paying for – its pricing is simple and readily available on the website.
  • It’s easy to budget for. GoldFynch charges only for storage (processing is free). So, choose from a range of plans (3 GB to 150+ GB) and know upfront how much you’ll be paying. It takes just a few clicks to move from one plan to another, and billing is prorated – so you’ll pay only for the time you spend on any given plan. With legacy software, pricing is much less predictable.
  • It’s simple to use. Many eDiscovery applications take hours to master. GoldFynch takes minutes. It handles a lot of complex processing in the background, but what you see is minimal and intuitive. Just drag-and-drop your files into GoldFynch and you’re good to go. Plus, it’s designed, developed, and run by the same team. So you get prompt and reliable tech support.
  • It keeps you flexible. To build a defensible case, you need to be able to add and delete files freely. Many applications charge to process each file you upload, so you’ll be reluctant to let your case organically shrink and grow. And this stifles you. With GoldFynch, you get unlimited processing for free. So, on a 3 GB plan, you could add and delete 5 GB of data at no extra cost – as long as there’s only 3GB in your case at any point. And if you do cross 3GB, your plan upgrades automatically and you’ll be charged for only the time spent on each plan. That’s the beauty of prorated pricing.
  • Access it from anywhere. And 24/7. All your files are backed up and secure in the Cloud.

Want to learn more about GoldFynch?