What is 'Stemming'? And How Does It Help eDiscovery Searches?

04 August 2019 by Anith Mathai
Tags: eDiscovery, law-firm, stemming

Takeaway: A good eDiscovery search is about more than just keywords. It’s about finding the information you need. Search features like ‘stemming’ keep eDiscovery searches flexible enough to find the important stuff that’ll help you build your case.

Every eDiscovery search begins with a keyword.

In the old days of ‘paper’ discovery, we’d spend weeks reading through hundreds (or thousands) of documents, noting important facts to help build our case. Now, eDiscovery search engines take just seconds to find files with our keywords. So, for example, if we’re looking for the word ‘acquitted’, the search engine shows us the places in our documents where the word ‘acquitted’ popped up.

But a search is about more than just a keyword. It’s about context.

Most of the time, we’re looking for more than just a particular word. We may search for ‘acquitted’, but are actually looking to prove that a client was considered ‘not guilty’ in an earlier case. So, now there are other related words that are equally important. For example: ‘acquits’, ‘acquit’, ‘acquitting’ and ‘acquittal’. Because they’re all part of the same general theme.

This is where ‘stemming’ comes in.

With stemming, your search engine adds a quick step before it starts to search. It trims a keyword down to its root – or stem – and then searches for variations of this stem. So, it’ll first trim ‘acquitted’ down to ‘acquit’, and then look for its variations. Which makes for more effective searches. (Note: Did you notice how Google suddenly became easier to use in the mid-2000s? One of the reasons is that it started stemming. Before 2004, Google wouldn’t have seen a connection between ‘acquit’ and ‘acquitted’.)

eDiscovery search engines stem words using a bunch of algorithms.

These algorithms use rules like:

  • “If a word ends in ‘EED’, and the part before that ending has at least one vowel and one consonant, change the ending to ‘EE’.” So, ‘agreed’ becomes ‘agree’.
  • “Convert the plural form of a word to its singular form.” So, ‘children’ becomes ‘child’.
  • “Remove the past-tense suffix ‘ed’ and the suffix ‘ing’.” So, ‘played’ and ‘playing’ both become ‘play’.
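
To make these rules concrete, here is a minimal Python sketch of a toy stemmer. It is an illustration only, not the algorithm any particular eDiscovery product uses: the names `toy_stem`, `find_matches` and `IRREGULAR_PLURALS` are made up for this example, and real stemmers apply far more rules. Note that this toy version does not reduce ‘acquittal’ to ‘acquit’.

```python
import re

VOWELS = "aeiou"

# Hypothetical lookup table for irregular plurals (an assumption for this sketch).
IRREGULAR_PLURALS = {"children": "child", "men": "man", "women": "woman"}


def has_vowel_and_consonant(part):
    """True if the text contains at least one vowel and one consonant."""
    has_vowel = any(c in VOWELS for c in part)
    has_consonant = any(c.isalpha() and c not in VOWELS for c in part)
    return has_vowel and has_consonant


def toy_stem(word):
    """Trim a word down to a rough stem using three simplified rules."""
    w = word.lower()
    # Rule: convert (irregular) plurals to their singular form.
    if w in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[w]
    # Rule: 'eed' -> 'ee', but only if a vowel and a consonant come before it.
    if w.endswith("eed"):
        return w[:-1] if has_vowel_and_consonant(w[:-3]) else w
    # Rule: strip 'ing' / 'ed' when the remaining part still contains a vowel.
    for suffix in ("ing", "ed"):
        base = w[: -len(suffix)]
        if w.endswith(suffix) and any(c in VOWELS for c in base):
            # Undo a doubled final consonant: 'acquitt' -> 'acquit'.
            if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in VOWELS + "lsz":
                base = base[:-1]
            return base
    # Rule: strip a regular plural (or third-person) 's'.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def find_matches(keyword, text):
    """Return the words in the text that share a stem with the keyword."""
    target = toy_stem(keyword)
    return [t for t in re.findall(r"[A-Za-z]+", text) if toy_stem(t) == target]


if __name__ == "__main__":
    sample = "The jury acquits him; she was acquitted after acquitting."
    print(find_matches("acquitted", sample))
```

Searching the sample sentence for ‘acquitted’ also picks up ‘acquits’ and ‘acquitting’, because all three words trim down to the same stem.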

And stemming is just the beginning. There are loads of other eDiscovery search tools.

For example:

  • Boolean searches: Mix and match keywords using ‘Boolean’ operators like ‘AND’, ‘OR’ and ‘NOT’. So, for example, tell your eDiscovery software to “Find all emails John Anderson sent Sally Nedry, which mention the Pfizer meeting. And which were sent before 2015.” Learn more about Boolean searches.
  • Fuzzy searches: Automatically pull up words with similar spellings. Perfect for catching misspellings. So it won’t matter if you type ‘Johnn’ by mistake.
  • Slop searches: Instead of searching for an exact phrase, your search engine pulls up results even where the keywords aren’t right next to each other. So, if you search for “Friday deposition”, your search engine will show you “Friday night’s deposition”, too. Even though it’s not an exact match. Perfect for when you want to broaden your search net. Learn more about how to do slop searches.
  • ‘Stop lists’: There are some words that pop up often but don’t add meaning to a search phrase. For example, “the,” “and,” “a,” “them,” etc. Search engines make a list of these ‘stop’ words and exclude them from searches – which saves a lot of time. That’s why the search terms “deposition” and “the deposition” will get you the same results.
  • Technology-assisted review (TAR): Also called CAR – Computer Assisted Review, or predictive coding. Here, your search engine studies the files you mark as ‘relevant’ and learns what you’re looking for. It then starts pulling up similar documents for you to review. Which saves a lot of time. It’s like how YouTube learns your taste in videos and then suggests new clips that you might like to watch.
  • Metadata searches: The search engine uses metadata to find the files you need. With documents, that could be things like when they were created, who created them, etc. With emails, it could be when they were sent, when they were opened, and who opened them.
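
For the curious, three of the tools above (stop lists, fuzzy searches and slop searches) can be sketched in a few lines of Python. This is a rough illustration under simplifying assumptions, not how any specific eDiscovery engine works. The function names and the tiny `STOP_WORDS` list are invented for this example, and ‘slop’ here simply means the number of extra words allowed between neighboring keywords. Real search engines may define it differently.

```python
import difflib
import re

# A tiny stop list taken from the examples above (real lists are much longer).
STOP_WORDS = {"the", "and", "a", "them"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(query):
    """Drop stop words so 'the deposition' searches like 'deposition'."""
    return [t for t in tokenize(query) if t not in STOP_WORDS]


def fuzzy_match(term, candidates, cutoff=0.8):
    """Return candidates spelled similarly to the term (catches typos)."""
    return difflib.get_close_matches(term, candidates, n=3, cutoff=cutoff)


def slop_match(phrase, text, slop=1):
    """True if the phrase's words appear in order in the text, with at
    most `slop` extra words between each pair of neighboring keywords."""
    words = tokenize(phrase)
    tokens = tokenize(text)
    if not words:
        return False

    def match_rest(prev, idx):
        # All keywords matched: the phrase was found.
        if idx == len(words):
            return True
        # The next keyword must appear within `slop` words of the previous one.
        window_end = min(prev + 2 + slop, len(tokens))
        return any(
            tokens[j] == words[idx] and match_rest(j, idx + 1)
            for j in range(prev + 1, window_end)
        )

    return any(
        tokens[i] == words[0] and match_rest(i, 1) for i in range(len(tokens))
    )


if __name__ == "__main__":
    print(remove_stop_words("the deposition"))
    print(fuzzy_match("johnn", ["john", "joan", "jane"]))
    print(slop_match("Friday deposition", "Friday night's deposition", slop=1))
```

With a slop of 1, “Friday deposition” matches “Friday night's deposition”. With a slop of 0, only the exact phrase would match.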

Looking for eDiscovery software that makes searching easy? Try GoldFynch.

  • It costs just $25 a month for a 3 GB case: That’s significantly less – every month – than the nearest comparable software. And hundreds of dollars less than many others. With GoldFynch, you know what you’re paying for exactly – its pricing is simple and readily available on the website.
  • It’s easy to budget for. GoldFynch has a flat, prorated rate. With legacy software, your bill changes depending on how much data you use.
  • It takes just minutes to get going. It runs in the Cloud, so you use it through your web browser (Google Chrome recommended). No installation. No sales calls or emails. Plus, you get a free, fully-functional trial case (0.5 GB of data and a processing cap of 1 GB), without adding a credit card.
  • It can handle even the largest cases. GoldFynch scales from small to large, since it’s in the Cloud. So, choose from a range of case sizes (3 GB to 150 GB, and more) and don’t waste money on space you don’t need.
  • You can access it from anywhere. And 24/7. All your files are backed up and secure in the Cloud. And you can monitor its servers here.
  • You won’t have to worry about technical stuff. It’s designed, developed, and run by the same team. So, its technical support isn’t outsourced. Which means you get prompt and reliable service.

Want to learn more about GoldFynch?