Mathematical sampling is a powerful tool for gaining a representative understanding of a large set of documents. (This is not to be confused with Predictive Coding, which will be covered in a future blog.) While it sounds advanced and can be difficult to deploy correctly, it can help users understand what documents are in a database by providing a broad overview of the contents. Cavo eD has created a sampling process that can be run iteratively, with no training, at any point during the eDiscovery process. It is another form of analytics that a Case Administrator can use to understand how best to set up a review strategy, determine tagging protocols, and inform senior counsel about some of the broad concepts in the data corpus.
Sampling can be used to test the waters in a variety of other ways as well. In preprocessing, sampling can be used to confirm that document collection for the database has been thorough and relevant. Sampling can be used to whittle down the corpus by a process of search, sample, analyze, and assign. Additionally, sampling can test the search parameters proposed by each party during the Meet and Confer process, helping to drive thorough collection and review procedures. It is a tool that Cavo eD allows users to deploy at any point in the eDiscovery process to assist in understanding the document corpus.
How does Sampling Help?
One of the greatest uses of statistics and sampling is to generalize from knowledge of sample data to (very likely accurate) knowledge of a larger population. It allows you to gain a quick understanding of what your document corpus contains. It is not a complete picture, but a representative idea that can be used to determine a methodology for document review. Using a built-in Sampling Generator ensures that the results are determined consistently regardless of who is doing the work.
The randomizer built into Cavo eD allows you to select documents according to conditions that you set and control, with each document in the population having an equal likelihood of being chosen as part of the sample. The Sampling Generator allows the user to control three key measures:
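As a rough illustration of generalizing from a sample to the population, the sketch below estimates what share of a full corpus has some characteristic (say, responsiveness) from the share found in a random sample. It uses the textbook normal-approximation interval. The function name and the counts in the usage line are hypothetical, and this is not a description of how Cavo eD reports results.

```python
import math
from statistics import NormalDist


def estimate_proportion(hits, sample_size, confidence=0.95):
    """Return (estimate, low, high) for a population proportion.

    Uses the normal-approximation (Wald) interval, which assumes a simple
    random sample that is reasonably large.
    """
    if sample_size <= 0 or not 0 <= hits <= sample_size:
        raise ValueError("need 0 <= hits <= sample_size and sample_size > 0")
    p = hits / sample_size
    # Two-sided z-score for the requested confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp the interval to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical counts: 96 of 383 sampled documents were tagged responsive.
estimate, low, high = estimate_proportion(96, 383)
print(f"{estimate:.1%} (between {low:.1%} and {high:.1%})")
```

The width of the interval is what the margin of error describes: a larger sample narrows it, while a smaller sample widens it.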
- Margin of Error
- Confidence Level
- Probability distribution
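The "equal likelihood" property described above is what statisticians call a simple random sample. A minimal sketch using Python's standard library follows; the function name, the document IDs, and the optional seed are illustrative assumptions, not Cavo eD's implementation.

```python
import random


def take_random_sample(document_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every document has the same chance of being selected. Passing a seed
    makes the draw repeatable, which can help when documenting a protocol.
    """
    population = list(document_ids)
    if not 0 <= sample_size <= len(population):
        raise ValueError("sample_size must be between 0 and the population size")
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical corpus of 10,000 document IDs.
corpus = [f"DOC-{i:05d}" for i in range(10_000)]
sample = take_random_sample(corpus, 370, seed=42)
print(len(sample), "documents selected")
```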
A Practical Application of Sampling
The use of sampling that I am going to discuss today is how to take a mathematical sampling of documents and email messages to help develop accurate Tag Profiles and Rules so that a review can proceed quickly and efficiently. Taking the time to do this type of analysis and review a sample set of documents will reduce the number of rules changes that need to be made once the review has started.
To sample the entire database or a subsection of it, select a review set or other collection of documents in the Case Navigator (this could be the entire corpus or the result of a specific search). From the Reviewer module on the Ribbon Bar, click Current Results and then choose Take Sample from the dropdown menu. In the Take Sample panel, you will note that we have set the defaults to the most common sampling values, which provide a good cross section of data; these can be adjusted on the fly.
- Margin of error defaults to 5%.
- Confidence level defaults to 95%.
- Probability distribution defaults to 50%.
- The system then calculates the minimum sample size that you should select, shown in the Recommended section, based on the overall size of the collection that you are sampling, and places that number in the Enter a Sample Size field. This can be overridden based on the conditions required by the user.
- Click Take a Sample. The results display in a Document Grid.
- Examining this sample document set will give you an excellent cross-section of likely documents in the larger population. Reviewing the sample will allow you to create Tag Profiles and Rules that are relevant to this particular case, based on ideas, themes and concepts that you determine are important from the sample.
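For readers curious how a recommended minimum can follow from those three settings, here is a sketch using the standard sample-size formula for a proportion with a finite population correction. This is an assumption about the kind of calculation involved, not a statement of the formula Cavo eD uses, and the function and parameter names are made up for illustration.

```python
import math
from statistics import NormalDist


def recommended_sample_size(population_size, margin_of_error=0.05,
                            confidence_level=0.95, proportion=0.5):
    """Minimum sample size for estimating a proportion (assumed formula).

    Cochran's formula gives the size for an unlimited population; the
    finite population correction then scales it down for a known corpus.
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    # Two-sided z-score for the confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # A proportion of 0.5 is the most conservative choice: it maximizes p(1-p).
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population_size)
    # Round up, and never ask for more documents than exist.
    return min(population_size, math.ceil(n))


for size in (1_000, 100_000):
    print(size, "documents ->", recommended_sample_size(size))
```

With the default settings, this calculation levels off below 400 documents no matter how large the collection grows, which is why a modest sample can represent a very large corpus.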