We recently hosted a webinar, “Recognize, Reduce, Review: Techniques to defensibly reduce your document review.” During the 30-minute presentation, Percipient’s Head of Forensics Vaish Palavalli and Document Review Project Manager Adam Szulczewski discussed eight techniques they use to reduce data collections to manageable sizes for efficient document reviews.
Check out the video of the presentation and summaries of the culling techniques below.
Potential Data Culling Ideas:
1. File Types
“…we can potentially reduce by over a third of the documents for review.”
The first topic covered was culling by file types. Vaish began by pointing out that it is generally a good idea to “recognize file types that are not relevant to your project and pull them out of your review set.” This is a type of data culling that may not always be done due to technological limitations. If you have access to technology that allows filtering and removal by file type, that can greatly reduce the initial data set and start your savings on storage space from the beginning.
2. Date Ranges
“…recognizing documents that fall outside of your project’s timeline and eliminating them from your review.”
Not all data reduction techniques need to be sexy. Adam discussed an “oldie but goodie”: limiting the data set to documents dated between two predetermined dates. “This cuts down on costs significantly, because you’re only hosting data that falls within this date range period and not documents that fall outside that range that you’ll probably likely never look at.”
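As a rough illustration of the idea, date-range culling amounts to a simple filter. The sketch below is not taken from the webinar; the document records, the `sent` field, and the `cull_by_date` helper are hypothetical names chosen for the example.

```python
from datetime import date

def cull_by_date(documents, start, end):
    """Split documents into those dated within [start, end] and those outside it."""
    kept, culled = [], []
    for doc in documents:
        # Both boundary dates are treated as inside the range (an assumption).
        if start <= doc["sent"] <= end:
            kept.append(doc)
        else:
            culled.append(doc)
    return kept, culled

# Hypothetical sample records; real metadata would come from your processing tool.
documents = [
    {"id": "DOC-001", "sent": date(2018, 3, 14)},
    {"id": "DOC-002", "sent": date(2019, 7, 2)},
    {"id": "DOC-003", "sent": date(2021, 1, 20)},
]

kept, culled = cull_by_date(documents, date(2019, 1, 1), date(2020, 12, 31))
print([d["id"] for d in kept])
print([d["id"] for d in culled])
```

Keeping the culled list, rather than discarding it, makes it easier to document what was excluded and why.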
3. Search Terms
“…one of the first things we always do with our clients is create a list of search terms”
Developing search terms is a pivotal part of effective data culling. Search terms can be used both to identify responsive data that needs to be reviewed and to find and remove non-responsive data, reducing the noise during your review.
4. Privilege
“If we can withhold privileged documents for later and review things that are less likely to need privilege culls first, then we can get our production out faster.”
Culling by privilege is a great technique to expedite your review. Segregating potentially privileged files at the outset of a review may limit your overall review size. It also helps you understand which documents are a priority to review now and which can wait for a later pass.
5. Near-Duplicate Analysis
“…how amazingly helpful deduping data can be when it comes to reviewing fewer documents.”
Reducing the volume of data for review with near-duplicate analysis is another helpful technique. In “near dupe” analysis, one document is chosen as the base, or pivot, document. The ediscovery software then compares other documents to the pivot document, looking for textual similarity at or above a pre-selected percentage. It is a very efficient way to quickly review a large amount of similar data, and when augmented with human review of sample sets it is highly accurate.
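The pivot comparison can be sketched in a few lines. This is only an illustration: ediscovery platforms use their own similarity measures, and here Python's standard `difflib.SequenceMatcher` stands in for them. The `near_duplicates` helper, the sample texts, and the 90% threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher

def near_duplicates(pivot_text, candidates, threshold=0.9):
    """Return ids of candidates whose similarity to the pivot meets the threshold."""
    matches = []
    for doc_id, text in candidates.items():
        # ratio() returns a similarity score between 0.0 and 1.0.
        similarity = SequenceMatcher(None, pivot_text, text).ratio()
        if similarity >= threshold:
            matches.append(doc_id)
    return matches

# Hypothetical pivot and candidate documents.
pivot = "The quarterly report is attached for your review."
candidates = {
    "A": "The quarterly report is attached for your review.",  # exact copy
    "B": "The quarterly report is attached for your review!",  # one character differs
    "C": "12345 67890",                                        # unrelated text
}

near_dupe_ids = near_duplicates(pivot, candidates)
print(near_dupe_ids)
```

Documents that land in the same near-duplicate group can then be sampled and spot-checked by a reviewer instead of being read one at a time.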
6. Domains
“…recognize irrelevant data using a list of domains that are extracted from our data.”
Domain culling is a very simple technique that can cut review time and costs tremendously. Running a domain report early in a project makes it easy to identify irrelevant email and related data so that it never reaches the review set.
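A domain report is essentially a count of sender domains. The sketch below shows one way such a report might be built with Python's standard library; the `domain_report` helper and the sample addresses are hypothetical, and a real review platform would produce this report for you.

```python
from collections import Counter
from email.utils import parseaddr

def domain_report(senders):
    """Count emails per sender domain, most frequent first."""
    counts = Counter()
    for sender in senders:
        # parseaddr splits "Display Name <user@host>" into (name, address).
        _, address = parseaddr(sender)
        if "@" in address:
            counts[address.rsplit("@", 1)[1].lower()] += 1
    return counts.most_common()

# Hypothetical sender values taken from email metadata.
senders = [
    "Deals <offers@newsletter.example>",
    "jane@client.example",
    "Offers <offers@newsletter.example>",
    "alerts@NEWSLETTER.example",
]

report = domain_report(senders)
print(report)
```

Sorting by frequency puts high-volume senders such as newsletters at the top, where they are easiest to spot and exclude.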
7. Email Threading
“…group-related emails together, so the reviewer sees one coherent conversation.”
Email threading, which identifies the most inclusive email in a chain, is a great way to reduce your data storage costs by at least 25%. With this technique, reviewers can read the last or most inclusive email in an exchange rather than reading each message one by one (and possibly out of order).
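To make “most inclusive” concrete, here is a simplified sketch. It treats an email as non-inclusive when its full text appears inside a longer email in the same thread. That is an assumption made for illustration: real threading engines also rely on message headers and attachment comparison, and the `inclusive_emails` helper and sample thread are hypothetical.

```python
def inclusive_emails(thread):
    """Return ids of emails whose text is not fully contained in a longer email."""
    inclusive = []
    for email in thread:
        contained = any(
            len(other["body"]) > len(email["body"]) and email["body"] in other["body"]
            for other in thread
        )
        if not contained:
            inclusive.append(email["id"])
    return inclusive

# Hypothetical thread: each reply quotes the earlier message below a separator.
SEPARATOR = "\n--- Original message ---\n"
first = "Can we meet Tuesday?"
reply = "Tuesday works." + SEPARATOR + first
final = "Great, see you then." + SEPARATOR + reply
branch = "I can't make Tuesday." + SEPARATOR + first

thread = [
    {"id": "e1", "body": first},
    {"id": "e2", "body": reply},
    {"id": "e3", "body": final},
    {"id": "e4", "body": branch},
]

to_review = inclusive_emails(thread)
print(to_review)
```

Note that a side branch of the conversation stays in the review set, because its content does not appear in the final email of the main chain.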
8. Artificial Intelligence & TAR
“…there’s multiple ways to leverage analytics to help you cull down your review set…”
The final topic discussed was the use of AI. Vaish covered two types of AI for data culling. The first was clustering, a method that uses AI to collect and organize related data into buckets. The other AI tool discussed was Technology-Assisted Review (TAR). With the use of TAR, a software algorithm is trained to identify files that are similar or related to others that have been previously reviewed and tagged as responsive, privileged, etc.