What Does 'Normalizing' eDiscovery Data Mean?

04 July 2021 by Ross eDiscovery

Takeaway: Normalization is the process of making sure all your data is stored using a uniform set of rules. This might include having a specific format for phone numbers, fixed acronyms for job titles, and more. It might seem trivial, but most eDiscovery review tools would be useless without this. So, make sure to find eDiscovery software that offers reliable data normalization.

Think of normalization as making sure that data is stored uniformly across all your records.

Data normalization has been around for a while, long before eDiscovery. It’s a way of making sure that information is stored using uniform rules. Say you’re in sales, with a list of prospective clients and notes about these clients. If you scribbled down this information on the go, it’ll probably be in different formats. For example, maybe you’ve written VP-Marketing for some clients, and Vice President of Marketing for others. Or maybe you’ve jotted down some phone numbers using the format (543) 210-0000 and others as 5432100000. When you’re just skimming these notes, these differences won’t matter. But for a well-ordered set of records, you’ll want to ‘normalize’ the formats. So, you’ll stick with VP-Marketing, and (543) 210-0000, perhaps. It doesn’t really matter what you choose, as long as you use the same rules everywhere.
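To make the salesperson example concrete, here is a minimal Python sketch of what such rules could look like. The function names and the alias table are hypothetical, made up for illustration, and the phone rule assumes 10-digit US numbers.

```python
import re

# Hypothetical alias table: every known variant maps to one canonical title.
TITLE_ALIASES = {
    "vice president of marketing": "VP-Marketing",
    "vp marketing": "VP-Marketing",
    "vp-marketing": "VP-Marketing",
}


def normalize_phone(raw: str) -> str:
    """Rewrite a 10-digit US phone number as (XXX) XXX-XXXX."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {raw!r}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def normalize_title(raw: str) -> str:
    """Map a known title variant to its canonical form; leave unknown titles as-is."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return TITLE_ALIASES.get(key, raw.strip())


print(normalize_phone("5432100000"))
print(normalize_title("Vice President of Marketing"))
```

Whichever formats you pick, the point is that every record passes through the same functions.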

Normalization in eDiscovery is an advanced version of the same concept.

eDiscovery normalization means applying uniform rules when filling in databases. So, what are these databases? Well, think of them as a highly structured way of organizing seemingly chaotic information. For example, the data you get from clients is usually a random mix of files like PDFs, emails, spreadsheets, and images. This is all useful information but it’s set up for humans to read – not for computers to process. Instead, computers need to slot all this information into behind-the-scenes databases before they can use it. (Imagine a database as being a spreadsheet with rows and rows of cells.) So, your eDiscovery software strategically breaks down uploaded files into information fragments that it then plugs into a database’s cells (also called ‘fields’).
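As a rough illustration of breaking a file into fields, the sketch below uses Python’s standard `email` module to turn one raw email into a record of named fields. The sample message and the field names are invented for this example; real eDiscovery processing handles far more (attachments, metadata, many file types).

```python
from email import message_from_string

# Invented sample message: headers, a blank line, then the body.
RAW_EMAIL = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Quarterly numbers\n"
    "Date: Mon, 05 Jul 2021 09:30:00 -0400\n"
    "\n"
    "Please see the attached spreadsheet.\n"
)


def email_to_record(raw: str) -> dict:
    """Split a simple (non-multipart) email into database-style fields."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": msg["Date"],
        "body": msg.get_payload().strip(),
    }


record = email_to_record(RAW_EMAIL)
print(record["subject"])
```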

Normalizing an eDiscovery database means making sure all its information fragments are structured the same.

It’s the digital version of the salesperson’s client information. That salesperson was tidying up their records just to make them more presentable. But if an eDiscovery database isn’t normalized, it becomes useless. That’s because tiny differences in format – which don’t bother us humans – will throw a computer off. For example, the words ‘cafe’ and ‘café’ mean the same thing to us. But a computer stores every character as a number, and the accented ‘é’ gets a different code from the plain ‘e’ (it falls outside basic ASCII altogether). So, to your software, these are two unrelated words unless normalization maps them to a common form. It’s the same for uppercase and lowercase letters. Normalizing how words are capitalized helps your software tell the custodian named ‘Sam’ apart from the job title acronym ‘SAM’ (Systems Administration Manager).
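Here is a small Python sketch of one way to handle the ‘cafe’ and ‘café’ case, using the standard `unicodedata` module. The `search_key` function is a hypothetical helper, not any particular product’s method. It strips accents and, optionally, folds case. Folding case suits free-text keyword matching, but a field where capitalization carries meaning (like ‘Sam’ versus ‘SAM’) might skip that step.

```python
import unicodedata


def search_key(text: str, fold_case: bool = True) -> str:
    """Build a comparison key: strip accents, optionally ignore case."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold() if fold_case else stripped


# Without normalization the two spellings do not match.
print("caf\u00e9" == "cafe")
# With normalization they share one key.
print(search_key("caf\u00e9") == search_key("cafe"))
```

Typically the original text would be kept as well, with the key stored alongside it purely for matching.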

Normalization isn’t just for words, though. For example, images need to get normalized, too.

There are other things that need to be normalized aside from names and numbers. For instance, all the images in your case need to have consistent dimensions, orientations, resolutions, etc. Otherwise, there’ll be formatting problems down the line. (Perhaps some images will have blank space at the bottom for Bates stamps, but others won’t?) Even things like time zones have to be normalized or the same emails will show different time stamps depending on where in the world they were sent and received.
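For time stamps, a common approach is to convert everything to a single reference zone such as UTC. The sketch below assumes the time stamps come from email `Date` headers that include a UTC offset; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header: str) -> datetime:
    """Parse an email Date header and convert it to UTC."""
    parsed = parsedate_to_datetime(date_header)
    if parsed.tzinfo is None:
        # No usable offset in the header, so the real instant is unknown.
        raise ValueError(f"no time zone offset in {date_header!r}")
    return parsed.astimezone(timezone.utc)


# The same moment, recorded in two different local time zones.
new_york = to_utc("Mon, 05 Jul 2021 09:30:00 -0400")
berlin = to_utc("Mon, 05 Jul 2021 15:30:00 +0200")
print(new_york.isoformat(), new_york == berlin)
```

Once both values are in UTC, the software can see that the two headers describe the same instant.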

eDiscovery searches – one of the most vital tools available – wouldn’t be possible without normalization.

eDiscovery applications can make pretty advanced searches by combining keywords and metadata using Boolean connectors. For example, you can give your software a very specific search command, such as, “Show me the documents John Krasinski emailed Jenna Fischer. But only the ones with the keywords ‘Golden Ticket’ in them. And which were also sent before 2013.” This sort of complex multi-step search saves a lot of time, but your software will need to rapidly access database fields. And as we’ve seen, this wouldn’t work if those fields haven’t been normalized.
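The search described above can be sketched in a few lines of Python. This is an illustration only: the `EmailRecord` type, the `search` function, and the sample records are all made up, and the stored fields are assumed to be already normalized to lowercase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EmailRecord:
    doc_id: int
    sender: str      # assumed already normalized to lowercase
    recipient: str   # assumed already normalized to lowercase
    sent: date
    body: str        # assumed already normalized to lowercase


def search(records, sender, recipient, phrase, sent_before):
    """Return IDs of records that satisfy every condition (a Boolean AND)."""
    sender, recipient, phrase = (s.casefold() for s in (sender, recipient, phrase))
    return [
        r.doc_id
        for r in records
        if r.sender == sender
        and r.recipient == recipient
        and phrase in r.body
        and r.sent < sent_before
    ]


# Invented sample data.
RECORDS = [
    EmailRecord(1, "john krasinski", "jenna fischer", date(2012, 5, 1), "about the golden ticket idea"),
    EmailRecord(2, "john krasinski", "jenna fischer", date(2014, 2, 3), "golden ticket follow-up"),
    EmailRecord(3, "jenna fischer", "john krasinski", date(2011, 1, 9), "golden ticket"),
    EmailRecord(4, "john krasinski", "jenna fischer", date(2010, 8, 8), "lunch on friday?"),
]

hits = search(RECORDS, "John Krasinski", "Jenna Fischer", "Golden Ticket", date(2013, 1, 1))
print(hits)
```

The exact-match comparisons only work because every record stores names and text in the same form. If one record held ‘J. Krasinski’ and another ‘KRASINSKI, JOHN’, the same query would silently miss documents.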

Poor-quality normalization can potentially ruin your data review. And that’s why it’s worth finding eDiscovery software you can trust.

At GoldFynch, we’ve tailored our eDiscovery service for small and midsize law firms. We focus on keeping our software simple, reliable, and affordable. And we’re continually refining eDiscovery basics like normalization. But there’s more about GoldFynch that might interest you.

  • It costs just $27 a month for a 3 GB case. That’s significantly less than most comparable software. With GoldFynch, you know exactly what you’re paying for – its pricing is simple and readily available on the website.
  • It’s easy to budget for. GoldFynch charges only for storage (processing is free). So, choose from a range of plans (3 GB to 150+ GB) and know up-front how much you’ll be paying. You can upload and cull as much data as you want, as long as you stay below your storage limit. And even if you do cross the limit, you can upgrade your plan with just a few clicks. Also, billing is prorated – so you’ll pay only for the time you spend on any given plan. With legacy software, pricing is much less predictable.
  • It takes just minutes to get going. GoldFynch runs in the Cloud, so you use it through your web browser (Google Chrome recommended). No installation. No sales calls or emails. Plus, you get a free trial case (0.5 GB of data and a processing cap of 1 GB), without adding a credit card.
  • It’s simple to use. Many eDiscovery applications take hours to master. GoldFynch takes minutes. It handles a lot of complex processing in the background, but what you see is minimal and intuitive. Just drag-and-drop your files into GoldFynch, and you’re good to go. Plus, you get prompt and reliable tech support.
  • Access it from anywhere, and 24/7. All your files are backed up and secure in the Cloud.

Want to find out more about GoldFynch?