GoldFynch's de-duplication function identifies whether multiple copies of the same file are present in a case and flags such files with a special "DUPE" tag. You can use different Strategies (modes) to compare the files, and you can also select the Scope of the comparison. Once the selections are made, an initial evaluation is run, and detailed statistics for the files that fall within your Strategy and Scope are displayed.
When a de-duplication operation is run, all duplicate documents are collected into groups. Within each group, one or more documents are designated as primary candidates, and all the others are treated as duplicates.
The de-duplication process can be run on one of the following scopes:
Whole Case - Finds all the duplicates in the case.
Whole Case vs. Folder - Compares all the files in a single folder against the entire case to see whether any of them can also be found elsewhere in the case.
Folder A vs. Folder B - Compares all the files in one folder (also called the Target) against all the files in another folder (also called the Source).
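The three scopes can be illustrated with a minimal Python sketch. This is not GoldFynch's implementation: the function names (`in_folder`, `files_with_match`, `run_scope`), the scope labels, and the use of a path-to-hash dictionary are all assumptions made for illustration. It only shows which set of files is checked against which other set in each scope.

```python
from pathlib import PurePosixPath


def in_folder(path, folder):
    """True if `path` lies anywhere beneath `folder` (hypothetical helper)."""
    return PurePosixPath(folder) in PurePosixPath(path).parents


def files_with_match(hashes, checked, reference):
    """Paths in `checked` whose hash also belongs to a different path in `reference`."""
    return sorted(
        p for p in checked
        if any(q != p and hashes[q] == hashes[p] for q in reference)
    )


def run_scope(hashes, scope, folder_a=None, folder_b=None):
    """Pick the two file sets to compare for a given scope (labels are assumed)."""
    everything = list(hashes)
    if scope == "whole_case":
        # Every file is compared against every other file in the case.
        return files_with_match(hashes, everything, everything)
    if scope == "case_vs_folder":
        # Files in one folder are compared against the entire case.
        folder = [p for p in everything if in_folder(p, folder_a)]
        return files_with_match(hashes, folder, everything)
    if scope == "folder_vs_folder":
        # Files in folder A (Target) are compared against folder B (Source).
        target = [p for p in everything if in_folder(p, folder_a)]
        source = [p for p in everything if in_folder(p, folder_b)]
        return files_with_match(hashes, target, source)
    raise ValueError(f"unknown scope: {scope}")


# Example data: path -> content hash (values are placeholders, not real hashes).
example = {
    "/case/a/x.pdf": "h1",
    "/case/b/y.pdf": "h1",
    "/case/b/z.pdf": "h2",
    "/case/c/w.pdf": "h2",
}
print(run_scope(example, "case_vs_folder", folder_a="/case/a"))
```

In this sketch a narrower scope simply shrinks the set of files being checked, which is why a Folder A vs. Folder B comparison can report no matches even when the whole case contains duplicates.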
The method used to compare the files and identify the duplicates is known as the de-duplication strategy. The different strategies available on GoldFynch are:
Hash-based Strategy - Compares the item hashes directly and can be used for all types of files. Learn more about MD5 hash values here.
Message-ID based Strategies - These strategies are primarily used to compare eml/msg files and look at Email-IDs/Message-IDs to find duplicates. If an item does not have a Message-ID, it will be ignored. Each of the Message-ID based options listed below compares the stated parameters and requires all of them to match for items to be flagged as duplicates.
- Message-ID alone
- Message-ID and Email/Message Subject
- Message-ID, Email/Message Subject, and Time of the Email/Message
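The difference between the strategies comes down to which key is used to decide that two items are "the same". The sketch below uses Python's standard `hashlib` and `email` modules to show the idea. It is an illustration under assumptions, not GoldFynch's code: the function names (`md5_key`, `message_id_key`, `group_duplicates`) are hypothetical, and mapping "Time of the Email/Message" to the `Date` header is a guess.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes


def md5_key(raw):
    """Hash-based strategy: the key is the MD5 digest of the file's bytes."""
    return hashlib.md5(raw).hexdigest()


def message_id_key(raw, fields=("Message-ID",)):
    """Message-ID based strategies: the key is a tuple of header values.

    Returns None when the item has no Message-ID, so that it is ignored.
    """
    msg = message_from_bytes(raw)
    if msg["Message-ID"] is None:
        return None
    return tuple((msg[name] or "").strip() for name in fields)


def group_duplicates(items, key_func):
    """Group item names by key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for name, raw in items.items():
        key = key_func(raw)
        if key is not None:
            groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


# The three Message-ID based options differ only in the headers compared.
# Using "Date" for the time of the message is an assumption.
ID_ONLY = ("Message-ID",)
ID_SUBJECT = ("Message-ID", "Subject")
ID_SUBJECT_TIME = ("Message-ID", "Subject", "Date")


def by_fields(fields):
    """Build a key function for one of the Message-ID based options."""
    return lambda raw: message_id_key(raw, fields)
```

For example, `group_duplicates(items, by_fields(ID_SUBJECT_TIME))` groups only emails whose Message-ID, subject, and date all match, while `group_duplicates(items, md5_key)` groups files whose bytes are identical. Adding more fields to the key can only make groups smaller, never larger.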
Step 1. Navigate to the De-Dupe view by clicking on the button in the left pane
Step 2. Click on the +New De-dupe Session button
Step 3. Enter a name for the de-dupe
Step 4. Click on the
Step 5. Select the De-Dupe Scope
- If you select the Whole Case option, it is recommended that you check the "Untag current case-wide DUPEs and start over" checkbox (5b.) to get an accurate evaluation based on the dupes currently present in the case
- If you have selected either the Whole Case vs. Folder A or the Folder A vs. Folder B option, you will be prompted to select the folders you wish to compare. To do so, click on the corresponding Browse button and select the folder to be used for the de-dupe process.
Step 6. Select the De-Dupe Strategy from the drop-down list
Step 7. Click on the Save and Evaluate button. Once the evaluation process is completed, a report of the specified datasets, along with information about the duplicates present in them, will be displayed.
- If no duplicates are found you will not be able to proceed further
- You can also save a de-dupe session and come back to it at a later time to apply it by clicking on the folder icon.
Step 8. Click on the Apply.. button to run the final de-dupe process.
Once the de-duplication process is complete, you will see a confirmation message at the top of your screen with the Scope and Strategy used.
If a more recent de-dupe operation has been performed, the de-dupe session will indicate this instead.
The system scans for conflicts within the selected file set(s) that may affect the de-dupe process and will display warnings in the following scenarios:
- When files within a de-dupe session have different processing states. For example, if a file that is still processing is compared to an identical, fully processed file, it will not register as a dupe. This can affect the hash-based strategy.
- When two or more items within a group have redactions
- When you apply a de-dupe session with the option to untag case-wide DUPEs and start over selected
Save and Re-evaluate a De-Duplication Session
You can re-evaluate a de-duplication session as long as it has not been applied. The steps to do so are given below:
Step 1. Navigate to the
Step 2. Click on the folder icon against the de-dupe session you want to re-evaluate
Step 3. Make any changes you wish and click on the Save and Re-evaluate button
Note: If you change the strategy or scope, the message "Evaluation is required to compute new statistics. Your settings have changed since the last snapshot was taken" will be displayed.
Delete a De-Duplication Session
Step 1. Navigate to the
Step 2. Click on the delete icon against the de-dupe session you want to delete
Step 3. Click on the Delete button on the Delete De-Dupe Session screen overlay
Note: Alternatively, you can delete a de-dupe session by navigating to it and then clicking on the Delete button on the screen.