What Can A Litigator Do When There are Hundreds of Thousands of Documents to Review in a Short Period of Time, and a Strict Litigation Budget is in Place?

By Katie Cole on April 26, 2018

Traditional document review can be one of the most variable and expensive aspects of the discovery process. The good news is that many analytic tools are available to help attorneys work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation. And, while various vendors have “proprietary” tools with catchy names, the available tools all seek to achieve the same result: smarter, more cost-effective review that is defensible and strategic.

Today’s blog post discusses one of those various tools – predictive coding.  The next few blog posts will focus on other tools such as email threading, clustering, conceptual analytics, and keyword expansion.

Predictive Coding

Predictive coding is a machine learning process in which software takes the judgments that people make when searching for responsive documents (keyword searches, search logic, and coding decisions) and applies them to a much larger dataset, reducing the number of irrelevant and non-responsive documents that need to be reviewed manually. While each predictive algorithm will vary in its actual methodology, the process at a very simplistic level involves the following steps:

  1.  Data most likely relevant to the litigation is collected. Traditional filtering and de-duplication are applied. Then, human reviewers identify a representative cross-section of documents, known as a “seed set,” from the remaining (de-duplicated) population of documents that need to be reviewed. The number of documents in the seed set will vary, but the set should be sufficiently representative of the overall document population.
  2.  Attorneys most familiar with the substantive aspects of the litigation code each document in the seed set as responsive or non-responsive, as appropriate. Note that many predictive coding tools allow users to classify documents for multiple issues simultaneously (e.g., responsiveness and confidentiality). These coding results are then input into the predictive coding software.
  3.  The predictive coding software analyzes the seed set and creates an internal algorithm to predict the responsiveness of other documents in the broader population. It is critically important after this step that the review team that coded the seed set spend time sampling the results of the algorithm on additional documents, and refine the algorithm by continuing to code and input sample documents until the desired results are achieved. This “active learning” is important to achieve optimal results. Simply stated, active learning is an iterative process whereby the seed set is repeatedly augmented by additional documents chosen by the algorithm and manually coded by a human reviewer. (This differs from “passive learning,” which is an iterative process that uses totally random document samples to train the machine until optimal results are achieved.)
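For readers who want to see the mechanics, the three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular vendor's product works: the naive Bayes word-count model, the function names (`train`, `score`, `active_learning`), the `ask_reviewer` callback standing in for the attorney, and the choice to pick the documents the model is least sure about are all assumptions made for the example. Setting `passive=True` swaps in random sampling, which corresponds to the “passive learning” described above.

```python
import math
import random
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """Build a toy naive Bayes model from (text, is_responsive) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, responsive in labeled:
        counts[responsive].update(tokenize(text))
        docs[responsive] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def score(model, text):
    """Estimated probability (0 to 1) that a document is responsive."""
    counts, docs, vocab = model
    log_odds = math.log((docs[True] + 1) / (docs[False] + 1))
    for label, sign in ((True, 1), (False, -1)):
        total = sum(counts[label].values()) + len(vocab) + 1
        for tok in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            log_odds += sign * math.log((counts[label][tok] + 1) / total)
    log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-log_odds))

def pick_batch(model, pool, batch, passive=False, rng=random):
    """Return indices of the next documents to send to a human reviewer."""
    idx = list(range(len(pool)))
    if passive:
        # Passive learning: a purely random sample.
        return rng.sample(idx, min(batch, len(idx)))
    # Active learning (assumed strategy): scores closest to 0.5 first.
    idx.sort(key=lambda i: abs(score(model, pool[i]) - 0.5))
    return idx[:batch]

def active_learning(seed, unlabeled, ask_reviewer, rounds=3, batch=2,
                    passive=False, rng=random):
    """Repeatedly augment the seed set with newly reviewed documents."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        chosen = set(pick_batch(model, pool, batch, passive, rng))
        labeled += [(pool[i], ask_reviewer(pool[i])) for i in sorted(chosen)]
        pool = [d for i, d in enumerate(pool) if i not in chosen]
    return train(labeled), labeled, pool
```

In practice the seed set would contain hundreds or thousands of documents and the reviewer callback would be a person working in a review platform; the loop structure (train, pick, code, retrain) is the part that mirrors the process described above.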

Once the team is comfortable with the results being returned, the software applies the refined algorithm to the entire review set and codes all remaining documents as responsive or non-responsive.
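One way a team might decide whether it is “comfortable” is to compare the software's calls against human coding on a sample and look at precision (how many documents flagged as responsive really are) and recall (how many truly responsive documents were flagged). The sketch below is a hedged illustration only: the 0.5 cutoff, the function names, and the `score_fn` parameter (any function returning a 0-to-1 responsiveness score) are assumptions for the example, not features of a specific product.

```python
def code_population(documents, score_fn, cutoff=0.5):
    """Code every document as responsive (True) or non-responsive (False).

    score_fn is a hypothetical stand-in for the trained model; cutoff is
    an assumed threshold the review team would choose and document.
    """
    return [(doc, score_fn(doc) >= cutoff) for doc in documents]

def precision_recall(predicted, actual):
    """Compare software calls with human calls on the same sample."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged in error
    fn = sum(1 for p, a in pairs if not p and a)    # responsive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, if the software flags two of four sampled documents and the human reviewers agree with only one of those flags while also finding one responsive document the software missed, both precision and recall come out to 0.5, a signal that more training rounds are needed before the algorithm is applied to the full review set.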

  • Posted in:
    E-Discovery
  • Blog:
    All About eDiscovery
  • Organization:
    Farrell Fritz, P.C.
