To schedule a demonstration, call 1-800-998-4874

Themes are More Effective Than Keywords in Reducing Data During Early Case Assessment

The Way eDiscovery Should be Done



The Issue

A recurring problem in eDiscovery is determining the best method for reducing the volume of documents that must be reviewed by humans.  Predictive coding and advanced search technology are the two tools most often cited as effective ways to sort documents into two piles: potentially relevant and not relevant.  However, many attorneys are uncomfortable letting computers make these determinations for them, especially when judges hold them responsible for the work.  They are particularly concerned about documents on the margin, which really should be examined but can often be reviewed quickly or in bulk.  Developing alternative tools that review teams and attorneys can use to achieve these same goals is therefore an important consideration.

To offer an alternative approach, Cavo eD focused on how to make better use of our unique document theme algorithm and created a workflow that can help quickly eliminate documents from the set that the full team must review.  First, a quick explanation.  Rather than indexing only keywords from the text of all documents, Cavo eD uses an algorithm to capture document themes during processing, using nouns, predicates and verbs to create a richer, more detailed source of information.  Additionally, the algorithm measures how dominant each theme is in the document, based on its frequency and location within the document.

The approach focuses on the analytic results displayed in the Themes Count.  This report aggregates all the documents in a corpus by their shared themes, listing each theme along with the number of documents in which it appears.  It addresses the challenge of a corpus full of disorganized and unknown information by revealing themes that the staff may not have considered important, or that use language they were not even aware of.  This road map rapidly accelerates the user’s basic knowledge about the data.  Users can access all the documents that contain a theme of interest with a single click of the mouse.
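As a rough illustration of the kind of aggregation such a report performs, the sketch below builds a theme-to-documents index and a count table in Python. It is not Cavo eD's implementation: the function names `themes_count` and `report`, the sample document IDs, and the theme labels are all hypothetical, and the themes are assumed to have already been extracted for each document.

```python
from collections import defaultdict


def themes_count(doc_themes):
    """Map each theme to the set of document IDs in which it appears.

    `doc_themes` is a hypothetical input: {doc_id: [theme, ...]}.
    """
    index = defaultdict(set)
    for doc_id, themes in doc_themes.items():
        for theme in themes:
            index[theme].add(doc_id)
    return index


def report(index):
    """Return (theme, document count) rows, most common theme first."""
    rows = [(theme, len(ids)) for theme, ids in index.items()]
    # Sort by descending count, then alphabetically for stable output.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Made-up corpus: themes are assumed to be already extracted per document.
docs = {
    "doc1": ["quarterly earnings", "board meeting"],
    "doc2": ["fantasy football", "board meeting"],
    "doc3": ["fantasy football"],
    "doc4": ["quarterly earnings", "fantasy football"],
}

index = themes_count(docs)
rows = report(index)
for theme, count in rows:
    print(f"{count:>3}  {theme}")
```

Keeping the document IDs behind each count (rather than the count alone) is what makes the "single click" drill-down possible: `index["fantasy football"]` returns every document that shares that theme.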

Theme Clustering takes the automatic grouping of documents a step further than keyword clustering.  Clustering can further leverage the themes from the Themes Count list by grouping them so that the clusters carry more meaning than clusters of keywords alone.

Using these two tools together and reviewing a percentage of the documents within each theme category can often lead to a reasonable determination about relevancy.  Bulk tagging tools can then be applied so that those documents are set aside from the documents that need to be reviewed.  This workflow is not a definitive method of parsing all documents correctly, but it can be used to remove large numbers of unrelated or unresponsive documents.
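The sample-then-bulk-tag step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's workflow: the helper names `sample_theme` and `bulk_tag_if_unresponsive`, the 20% sample size, the "set aside" tag, and the zero-tolerance threshold are all hypothetical, and the reviewer's judgment is stood in for by a simple callback.

```python
import random


def sample_theme(doc_ids, fraction, seed=0):
    """Pick a reproducible random sample of a theme's documents for review."""
    ids = sorted(doc_ids)
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, k)


def bulk_tag_if_unresponsive(doc_ids, sample, is_responsive, tags, threshold=0.0):
    """Tag every document in the theme as set aside when the share of
    responsive documents in the reviewed sample is at or below `threshold`.

    `is_responsive` stands in for a human reviewer's decision (assumption).
    Returns True if the bulk tag was applied.
    """
    if not sample:
        return False
    hits = sum(1 for doc_id in sample if is_responsive(doc_id))
    if hits / len(sample) > threshold:
        return False  # too many responsive documents; leave for full review
    for doc_id in doc_ids:
        tags[doc_id] = "set aside"
    return True


# Made-up theme category of ten documents, none of them responsive.
theme_docs = {f"d{i}" for i in range(1, 11)}
sample = sample_theme(theme_docs, fraction=0.2)

tags = {}
applied = bulk_tag_if_unresponsive(theme_docs, sample, lambda d: False, tags)

# Same category, but the reviewer finds the sampled documents responsive.
kept_tags = {}
kept = bulk_tag_if_unresponsive(theme_docs, sample, lambda d: True, kept_tags)
```

The threshold is the judgment call: a value of zero sets a theme aside only when nothing in the sample was responsive, which matches the cautious spirit of the workflow but still does not guarantee that every unsampled document is unresponsive.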
