A Complete Glossary of eDiscovery Terms

Archive - A database where records and files are stored for long periods of time.

Attachment - A file connected to (and accompanying) another file. The attachment can be either separate (e.g., a document attached to an email) or embedded (e.g., a video in a PowerPoint presentation).

Batching - Dividing a case’s data into subgroups (i.e., batches) to speed up processing and review. Data is often batched based on who it’s from (i.e., the custodian) and what it’s about (i.e., the topic).

Bates Stamps (Also, Bates Numbers or Bates ID) - An identification system to keep track of produced documents. Each Bates stamp is a string of letters and numbers unique to a single page or image in a production. The numbers are generated in sequence by the eDiscovery software. For example, DEF0000003. Learn more about Bates stamping.
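
To make the sequence idea concrete, here is a minimal Python sketch of how sequential Bates numbers like DEF0000003 could be generated. The function name, the 'DEF' prefix, and the seven-digit width are assumptions chosen to match the example above; real eDiscovery software may number pages differently.

```python
def bates_numbers(prefix: str, start: int, count: int, width: int = 7) -> list[str]:
    """Return `count` sequential Bates numbers, zero-padded to `width` digits."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


# One stamp per page of a three-page production (hypothetical prefix).
stamps = bates_numbers("DEF", start=1, count=3)
print(stamps)  # ['DEF0000001', 'DEF0000002', 'DEF0000003']
```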

Boolean Search - An advanced search process where you’ll use Boolean operators like AND, OR, and NOT to connect multiple keywords. This helps cut down the number of irrelevant results. Without Boolean, you’ll be limited to searching for single keywords (e.g., the name ‘John’). With Boolean, you can give a much more specific instruction like, ‘Find all emails that John Anderson sent to Sally Nedry before 2015 and that also mention the Pfizer meeting.’ Learn more about Boolean searches.
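
The example instruction above can be sketched in Python as several conditions joined with AND. The sample emails and field names ('sender', 'recipient', 'body', 'sent') are made up for illustration; each review platform has its own query syntax.

```python
from datetime import date

# Hypothetical sample data: three emails from the same sender.
emails = [
    {"sender": "John Anderson", "recipient": "Sally Nedry",
     "body": "Notes from the Pfizer meeting", "sent": date(2014, 6, 2)},
    {"sender": "John Anderson", "recipient": "Sally Nedry",
     "body": "Lunch on Friday?", "sent": date(2014, 7, 1)},
    {"sender": "John Anderson", "recipient": "Sally Nedry",
     "body": "Pfizer meeting follow-up", "sent": date(2016, 1, 5)},
]


def matches(email: dict) -> bool:
    """sender AND recipient AND keyword AND sent before 2015."""
    return (
        email["sender"] == "John Anderson"
        and email["recipient"] == "Sally Nedry"
        and "pfizer meeting" in email["body"].lower()
        and email["sent"] < date(2015, 1, 1)
    )


hits = [email for email in emails if matches(email)]
print(len(hits))  # 1
```

Only the first email satisfies every condition: the second never mentions the meeting, and the third was sent after 2015.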

Case/Matter - All the files in an eDiscovery case. These files have all been processed by the eDiscovery software and are ready for review. (Note: The term ‘matter’ means ‘legal proceedings’, but essentially means the same as ‘case.’)

Child Document - A file that is attached to, or embedded in, another file. E.g., an email attachment or an image in a PowerPoint presentation. (The file to which it’s attached/embedded is called the ‘parent’ document.)

Collection - The process of getting copies of all the relevant files in a case. This needs to be done carefully (often by a digital forensics provider) without destroying or altering precious file metadata.

Container File - A single file that stores a mix of other files and documents. Containers are a convenient way of transferring files or compressing them to use less storage space. E.g., Microsoft Outlook OST and PST files, ZIP files, forensic image files, etc. Note: You’ll need to ‘unpack’ container files before taking a file count of your case. Learn more about container archives.

Culling - Removing irrelevant files from a case before formally reviewing your data. eDiscovery software often culls some irrelevant files while processing them, and you’ll isolate and remove the rest at the review stage.

Custodian - Someone from whom you’ve collected (or will collect) potentially useful data. They are not necessarily the person who created the data.

Data Extraction - The process where your eDiscovery software slots all your data into behind-the-scenes databases. It happens automatically when you upload your files, and along with identifying text, it’s how the software isolates keywords, dates, email addresses, and metadata for you to search through later.

Deduplication - Removing identical copies of files from your case. Deduplicating files means you have less to review and you’ll be more consistent. For example, you won’t mistakenly give different tags to two copies of the same file.

DeNISTing - Filtering out irrelevant computer-generated files (e.g., system, EXE, and font files) from your case by comparing them against the list of such files maintained by the National Institute of Standards and Technology (NIST). Learn more about DeNISTing.
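
As a rough sketch of the comparison step, the Python below hashes each collected file and drops any file whose hash appears in a set of known system-file hashes. The tiny `known_system_hashes` set and the file contents are stand-ins invented for this example, and SHA-1 is an assumption; a real workflow would load the hash list published by NIST.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash of some file contents as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Stand-in for the NIST list: hashes of known system files (made up here).
known_system_hashes = {sha1_hex(b"fake font file bytes")}

# Hypothetical collected files, keyed by file name.
collected = {
    "arial.ttf": b"fake font file bytes",
    "memo.docx": b"Quarterly memo for the board",
}

# Keep only the files whose hash is NOT on the known list.
kept = sorted(
    name for name, data in collected.items()
    if sha1_hex(data) not in known_system_hashes
)
print(kept)  # ['memo.docx']
```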

Digital Forensics - The science of finding and collecting digital evidence (emails, PDFs, Word documents, etc.) from devices like PCs, laptops, smartphones, and servers. It especially focuses on evidence that someone might be trying to hide. A digital/computer forensics expert needs specialized tools and often has to reconstruct damaged data. Learn more about digital forensics.

Discovery - The process of finding out what evidence opposing counsel intends to use in a trial. Both parties exchange the information they have so that no one is ambushed with new facts. Practically, this involves identifying, collecting, processing, reviewing, and producing data that might be relevant to a case. ‘eDiscovery’ is the digital version of older paper-based discovery.

Discovery Request (or just ‘Request’) - An official request to deliver relevant documents for opposing counsel to review.

Document Stamping - A way of labeling and categorizing your produced files with document stamps such as Bates Numbers, tags and other custom information. Learn more about using and placing document stamps.

e-Disclosure - The name for eDiscovery in the European Union and the UK.

eDiscovery or Electronic Discovery - The process of exchanging electronically stored information (ESI) before a trial, deposition, hearing, or mediation. This ESI includes emails, documents, spreadsheets, audio, video, presentations, social media, and more. By exchanging information, attorneys can ‘discover’ the evidence opposing counsel intends to use. And since both sides now have the same data, they can each build the most compelling case possible.

Electronically Stored Information (ESI) - Information stored in an electronic (i.e., non-paper) format. For example, emails, PDFs, Word documents, spreadsheets, presentations, social media posts, text messages, etc. Learn more about ESI.

Email Threading - Linking emails that are part of an email conversation so that you can read them in order. Email threading gives context to each email in the thread.
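
One common way to link a conversation is to follow each reply back to the message it answers. The Python sketch below assumes every email carries an 'id' and an 'in_reply_to' value (modeled on the standard Message-ID and In-Reply-To email headers); the sample messages are invented, and real threading tools also handle missing headers and forwarded mail.

```python
from collections import defaultdict

# Hypothetical emails, deliberately out of order.
emails = [
    {"id": "<3@x>", "in_reply_to": "<2@x>", "sent": "2014-06-03", "subject": "RE: RE: Budget"},
    {"id": "<1@x>", "in_reply_to": None, "sent": "2014-06-01", "subject": "Budget"},
    {"id": "<2@x>", "in_reply_to": "<1@x>", "sent": "2014-06-02", "subject": "RE: Budget"},
    {"id": "<9@x>", "in_reply_to": None, "sent": "2014-06-02", "subject": "Lunch"},
]
by_id = {email["id"]: email for email in emails}


def thread_root(email: dict) -> str:
    """Follow replies upward until we reach the first email of the conversation."""
    while email["in_reply_to"] in by_id:
        email = by_id[email["in_reply_to"]]
    return email["id"]


# Group emails by conversation, each thread in the order it was sent.
threads = defaultdict(list)
for email in sorted(emails, key=lambda e: e["sent"]):
    threads[thread_root(email)].append(email["id"])

print(threads["<1@x>"])  # ['<1@x>', '<2@x>', '<3@x>']
```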

ESI Protocol - A set of agreed-upon guidelines and procedures governing the management of electronically stored information during litigation. These protocols cover the identification, preservation, collection, processing, and production of ESI. Learn more about ESI protocol.

File Family (or Document Family) - A group of associated files. For example, an email (the ‘parent’) and its attachment (the ‘child’). Or a PowerPoint presentation (the ‘parent’) and its embedded videos (the ‘children’). Learn more about file families.

Filtering - Finding the files you want by using ‘filters’ to search your case. These filters might include keywords, phrases, dates, custodians, tags, etc. Often used to reduce the number of documents to be reviewed.

Hash Value - The ‘digital fingerprint’ of a file or document. It is a string of letters and numbers automatically generated by a hashing algorithm. Hash values are so specific that altering even a single comma in a document’s contents will change its hash value. So, they are very useful for deduplication. Examples of hash value algorithms include MD5 and SHA1. Learn more about hash values.
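
Here is a small Python sketch of both points: a single added comma changes the hash, and identical copies share a hash, which is what makes deduplication possible. The file names and contents are made up, and SHA-1 is used only because it is one of the algorithms named above.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-1 hash value of a file's contents."""
    return hashlib.sha1(data).hexdigest()


original = b"The meeting is on Monday at noon."
edited = b"The meeting is on Monday, at noon."  # one comma added
copy = b"The meeting is on Monday at noon."     # exact duplicate

print(fingerprint(original) == fingerprint(copy))    # True
print(fingerprint(original) == fingerprint(edited))  # False

# Deduplicate: keep the first file seen for each hash value.
files = {"a.txt": original, "b.txt": edited, "c.txt": copy}
unique = {}
for name, data in files.items():
    unique.setdefault(fingerprint(data), name)

print(sorted(unique.values()))  # ['a.txt', 'b.txt']
```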

Hosting - The process where an eDiscovery provider stores your data on their platform for you to review and produce. They often use decentralized, cost-effective Cloud servers, which means you can use their tools and features without investing in any hardware or expensive downloadable software. Learn more about Cloud hosting.

Identification - The process of setting the scope for eDiscovery. This might include identifying who has all the data (i.e., the custodians), where the data is stored, what categories of data are available, etc.

Legacy Data - Data from older eDiscovery software. Legacy data is often hard to access, process, and use because it was created using technology that’s now being phased out.

Legal Hold (Also, ‘preservation order’ or ‘hold order’) - A notification sent to an organization to preserve and protect potentially relevant data. It prevents valuable evidence from being destroyed by the organization's default document retention/destruction policies.

Load File - A text file that tells eDiscovery software how all the incoming files are related, and which often contains the files’ metadata. For example, a production may have a folder named ‘IMAGES,’ with a confusing list of files: 0001.TIF, 0002.TIF, 0003.TIF, etc. The load file would tell eDiscovery software that images 0001.TIF to 0009.TIF are pages of a single document named ‘Report.docx’ created by user ‘John.’ The reference name given for each category of metadata is often referred to as a ‘Header’ or ‘Field.’ Learn more about load files.
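
To show the idea, the Python sketch below reads a simplified, comma-separated load file and rebuilds the ‘Report.docx’ example above. The header names (BegDoc, EndDoc, FileName, Author) and the second row are assumptions made up for illustration; real load files come in several formats with their own delimiters and fields.

```python
import csv
import io

# A simplified, hypothetical load file: one row per document.
load_file = io.StringIO(
    "BegDoc,EndDoc,FileName,Author\n"
    "0001,0009,Report.docx,John\n"
    "0010,0012,Budget.xlsx,Sally\n"
)

documents = []
for row in csv.DictReader(load_file):
    first, last = int(row["BegDoc"]), int(row["EndDoc"])
    # Work out which image files are the pages of this document.
    pages = [f"{n:04d}.TIF" for n in range(first, last + 1)]
    documents.append({"name": row["FileName"], "author": row["Author"], "pages": pages})

report = documents[0]
print(report["name"], report["author"], len(report["pages"]))  # Report.docx John 9
```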

Metadata - Information about the data in an electronic file. For example, the ‘sent’ and ‘received’ dates of an email, the ‘creator’ of a PDF, the ‘last modified’ date of a Word file, etc. Metadata adds context to a file, making it very valuable in litigation. For example, you can find out who leaked privileged information by looking at the ‘sent from’ metadata field of an email. There are hundreds of types of metadata, but it’s hard to find it without some technical know-how. Learn more about metadata.

Native Format - The format in which a file was originally created. E.g., Microsoft Word creates DOCX files (i.e., files with a ‘.docx’ extension), Excel creates XLSX files, and so on. You’ll often lose valuable metadata when you convert files out of their native format into an easy-to-share alternative like PDF or TIFF. This metadata includes things like comments in a Word document or pivot tables in a spreadsheet. Learn more about native files.

Near-duplicates - Documents that are similar but not exact copies of each other. E.g., an edited version of a Word document with a few commas added. It’s harder for eDiscovery software to find ‘near’ duplicates than exact duplicates. (Finding exact duplicates is called deduplication.)
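
A hash comparison can't catch near-duplicates, because any edit changes the hash. One simple alternative is to score how similar two texts are. The Python sketch below uses the standard library's `difflib.SequenceMatcher`; the 0.9 threshold and the sample sentences are assumptions for illustration, and commercial tools use their own, more scalable methods.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Similar enough to be related, but not an exact copy."""
    return a != b and similarity(a, b) >= threshold


original = "The parties agree to meet on Monday at noon to review the contract."
edited = "The parties agree to meet on Monday, at noon, to review the contract."
unrelated = "Quarterly sales figures are attached for your records."

print(is_near_duplicate(original, edited))     # True
print(is_near_duplicate(original, unrelated))  # False
print(is_near_duplicate(original, original))   # False (exact duplicate)
```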

Normalization - Converting data into a common, standardized format so it can be plugged into a database. Different applications will have their own set of rules for normalizing files.

OCR (Optical Character Recognition) - A software tool used to extract text from scanned documents, PDFs, and image files like TIFFs. Although these files might have text in them, your eDiscovery software can’t process this text until it’s converted into a computer-friendly form. Learn more about OCR.

Parent Document - The primary document in a file family. A file family is a group of connected files -- for example, an email with an attachment or a PowerPoint presentation and an embedded video. The main document in this family (e.g., the email or the PPT) is the ‘parent,’ and the secondary, connected ones are the ‘children.’

Personal Storage Table (PST) - The file format Microsoft Outlook uses to store emails, calendar events, contacts, etc. PSTs are also called ‘personal folders.’ Learn more about PSTs.

Predictive Coding - Using ‘machine learning’ (i.e., a form of artificial intelligence) to get eDiscovery software to organize and code (i.e., tag or label) files for you. You’ll initially train the software by labeling a few files until its algorithm sees patterns in your choices. Then, it will be able to predict how you would code the remaining files and code them automatically. Learn more about predictive coding.

Privilege - The right to keep certain information private. For example, an email between attorney and client may have privileged information redacted in an eDiscovery production.

Privilege Log - The list of documents that have been deliberately excluded from a production because they have privileged information. E.g., documents with attorney-client privilege.

Presentation - Displaying your productions at trials, depositions, hearings, and/or mediations. eDiscovery applications aren’t used for this step.

Preservation - Making sure important data is not altered or destroyed. Preservation ensures that you have all the evidence you’ll need to build a case. Ideally, you’ll give a ‘hold order’ to preserve files as soon as you suspect there will be litigation.

Processing - Ingesting collected files into eDiscovery software and preparing them for review. This includes things like (1) converting files into a common format, (2) extracting text and image content/metadata and slotting it into a database, and (3) flagging system files and errored files. Learn more about eDiscovery processing.

Production - Converting relevant files into a specified format to share with opposing counsel. Formats may include native files, PDF, TIFF, etc., as well as load files and Bates stamping on each page of the production. Learn more about producing files.

Redaction or sanitization - Censoring sensitive information in productions. Your eDiscovery software will let you ‘black out’ words, sentences and/or entire pages that have privileged information that opposing counsel isn’t cleared to see. Learn more about redaction.

Repository - A database storing your files/documents and their metadata. It’s often called a ‘records store,’ ‘records archive,’ or ‘online repository.’

Responsiveness - A measure of how relevant a document is for a case. Both parties in a case decide what data will be considered responsive (e.g., only documents from certain custodians or those created within a particular date range).

Review - Going through all the processed data to build your case. This includes checking and marking processed data for responsiveness and privilege, and can be done much more efficiently if you use a document review platform to search, tag and redact.

Search - Finding information in your case documents using keywords, keyword combinations, and/or metadata. Some platforms let you create and refine advanced ‘search queries’ with multiple search terms to find the evidence you need. Learn more about eDiscovery searches.

Slip sheet - A placeholder document that represents a native file in your case. Native files have a different format from PDFs and TIFFs, so they’ll disrupt the Bates numbering sequence in a PDF/TIFF production. A slip sheet corrects this by standing in for a native file. It gets numbered along with the other PDFs/TIFFs but will have text saying, ‘This file has been produced in native format.’

Structured Data - Data that has been fit into a database. To prepare documents for review, your software will break them down into their text and image content, along with their metadata. It then slots these into specific database fields (these are similar to the cells in a spreadsheet). This carefully segmented and organized data in a database is called ‘structured’ data. Learn more about databases and data structuring.

System Files - Digital files that are created by an operating system, a piece of software, or a device driver. (For example, files with the .SYS extension.) System files are essential for a computer to run properly but not for eDiscovery. Some eDiscovery software runs deNISTing to help get rid of them.

Tagging - Labeling a file so that it can be grouped with other similar files. (E.g., labeling a file as ‘confidential.’) Tags are ‘attached’ to a file without altering its contents, and they make it easier to find and produce related files. Learn more about tagging.

TAR (Technology-Assisted Review) - Using technology like predictive coding and advanced machine learning to speed up reviews and cut costs. First, reviewers tag files in a subset of the case. Then, the eDiscovery application analyzes the reviewers’ tagging patterns to ‘learn’ what they consider important. Next, it starts tagging files using these patterns, while reviewers correct errors. And through this iterative process, the software reviews the entire case in a fraction of the time the reviewers would take. Learn more about technology-assisted reviews.

TIFF (Tagged Image File Format) - An image file type often used in eDiscovery productions. They’re similar to other image file types like JPEGs/BMPs, and scanned documents are often stored as TIFFs. Learn more about TIFF files.

Unitization - Stitching together individually scanned pages into a single document.

Unstructured Data - Data that hasn’t yet been fit into a database. To prepare documents for review, your software will break them down into their text and image content, as well as their metadata. It then slots these into specific database fields (these are similar to the cells in a spreadsheet). This carefully segmented and organized data in a database is called ‘structured’ data. Data that hasn’t yet been fit into a database is ‘unstructured.’ For example, emails, PDFs, scanned documents, images, etc. Learn more about databases and data structuring.

Zero-byte file - A file that contains no file data. Usually caused by data corruption during creation or conversion. Learn more about zero-byte files.
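
Zero-byte files are easy to flag because their size on disk is exactly 0. The Python sketch below creates two temporary files (with made-up names) and checks each one; it is an illustration only, not how any particular eDiscovery platform reports them.

```python
import os
import tempfile


def is_zero_byte(path: str) -> bool:
    """Return True if the file at `path` contains no data."""
    return os.path.getsize(path) == 0


with tempfile.TemporaryDirectory() as folder:
    empty_path = os.path.join(folder, "empty.pdf")
    normal_path = os.path.join(folder, "memo.txt")

    # Create one empty file and one file with content.
    open(empty_path, "wb").close()
    with open(normal_path, "wb") as handle:
        handle.write(b"Meeting notes")

    empty_result = is_zero_byte(empty_path)
    normal_result = is_zero_byte(normal_path)

print(empty_result, normal_result)  # True False
```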

eDiscovery Glossary:
Every Essential Term You’ll Need to Know
