Processing data is one of the most fundamental and important steps in preparing Electronically Stored Information (ESI) for review by a legal team. Because it is such a basic requirement of eDiscovery, most people assume that “processing is processing”: they treat it as generic and pay little attention to how much information, and what type, is actually processed by various platforms and organizations. When a company processes only a minimal amount of information to get data into a review platform faster, its processing speeds may seem artificially high. Often that means the data must be processed a second time before more advanced analytics can be performed. This distinction is rarely transparent to clients, and the limitations might not become apparent until it is too late, when processing the data a second time can cause significant delays. How should processing be handled in the first place?
At Cavo eD, when we quote a processing speed, we are quoting a true processing speed that includes extracting, in one pass, all the information that might be needed. It covers everything required to prepare data for search, review and analytics. Cavo eD’s Processing Plus+ is, in fact, one-pass processing. It is all-inclusive, with a full range of capabilities and customization to fit virtually any need that might arise. We even include the load time it takes to copy the raw data to the Cavo platform before processing begins. Our processing is fully distributed and scalable, which allows us to spread the work across multiple servers simultaneously in the AWS Cloud or another cloud configuration.
The Details and Benefits of Processing Plus+
- Processing Plus+ – We have achieved a processing speed of 57 GB per hour using raw, unfiltered, unculled data from a major case. All steps are included in this rate: data loading, exploding PSTs and containers, customizable deNIST lists, multiple deduplication options, boilerplate data removal, theme capturing, email threading and OCR, with documents loaded and ready for further analytics or first-pass review. Customizable settings include:
- Thematic Document Capture – Stronger than keywords, our algorithms capture and rank multiple themes in each document, creating thematic fingerprints that allow more comprehensive and targeted searches among documents.
- Boilerplate Option – Using the boilerplate template, users can eliminate language patterns that are unneeded such as standard disclaimer language that appears at the bottom of so many corporate email messages. This reduces the volume of false positive search results, allowing users to focus on truly relevant document content.
- DeNIST – In addition to the standard NIST list used to exclude files from processing, customizable lists can be created to exclude further unneeded data files.
- Deduplication – Multiple deduplication options are included so that deduplication can be tailored to the specifics of the litigation. Duplicates can be displayed during the review or hidden from review, based on project needs.
- OCR – All non-text documents are automatically OCR’d to provide text for thematic searching.
- Embedded Objects – Before processing, users can select the maximum number and size of embedded objects to include in the searchable data, and whether to display them within documents or as separate documents.
- Hidden Data – The system can be set to display hidden data (e.g., white text, document changes, etc.) after processing.
- Culling Settings – Data can be culled by type, custodian, size and folder for exclusion or processing priority.
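To make the deNIST, culling and deduplication settings above more concrete, here is a minimal Python sketch of how the three filters could be applied in a single pass over a document set. It is an illustration only, not Cavo eD's implementation: the `Doc` record, the `one_pass_filter` function and its parameters are hypothetical names, and it assumes exact-match SHA-1 content hashing, with the known-file list supplied as a plain set of hashes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record, used for illustration only."""
    path: str
    custodian: str
    content: bytes


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the raw bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def one_pass_filter(docs, known_hashes, excluded_extensions=frozenset(), max_bytes=None):
    """Apply deNIST, culling and exact-duplicate removal in one loop.

    known_hashes: hashes of known system/application files (assumed input).
    excluded_extensions / max_bytes: simple culling rules by type and size.
    Returns the kept documents and a count of what was dropped and why.
    """
    seen = set()
    kept = []
    dropped = {"denist": 0, "culled": 0, "duplicate": 0}
    for doc in docs:
        digest = sha1_hex(doc.content)
        # deNIST: skip files whose hash appears on the known-file list.
        if digest in known_hashes:
            dropped["denist"] += 1
            continue
        # Culling: skip by file type or by size.
        ext = doc.path.rsplit(".", 1)[-1].lower() if "." in doc.path else ""
        too_big = max_bytes is not None and len(doc.content) > max_bytes
        if ext in excluded_extensions or too_big:
            dropped["culled"] += 1
            continue
        # Deduplication: keep only the first copy of identical content.
        if digest in seen:
            dropped["duplicate"] += 1
            continue
        seen.add(digest)
        kept.append(doc)
    return kept, dropped


docs = [
    Doc("a/memo.docx", "alice", b"quarterly memo"),
    Doc("b/memo_copy.docx", "bob", b"quarterly memo"),
    Doc("a/kernel32.dll", "alice", b"system file"),
    Doc("a/video.mp4", "alice", b"x" * 10),
]
known = {sha1_hex(b"system file")}
kept, dropped = one_pass_filter(docs, known, excluded_extensions={"mp4"})
print([d.path for d in kept], dropped)
```

This sketch deduplicates globally across custodians. A per-custodian option could key the `seen` set on the pair of custodian and hash instead, which is one way the "multiple deduplication options" mentioned above might differ.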
The power of one-pass processing puts the user in total control of customizing what is needed in each case. Understanding your options before processing begins allows you to make the right decisions with full knowledge and accelerates your understanding of the document corpus.
The Way eDiscovery Should be Done…