When you have been involved in a particular area of law (in our case, eDiscovery) for a long time, it is easy to forget that not everyone has the same knowledge you do. We tend to assume that everyone in the field shares our understanding, when in fact many people are missing some of the fundamental building blocks that we treat as common knowledge. This new periodic posting series is geared toward people who have limited exposure to the world of eDiscovery and may need help with specific terms and definitions.
The word “deNIST” names an important concept in eDiscovery, but “NIST” is the root that must be understood first. NIST stands for the National Institute of Standards and Technology, a U.S. federal agency. NIST runs the National Software Reference Library (NSRL) project, whose purpose is to promote the efficient use of technology in the investigation of crimes. What is commonly called the NIST list is the NSRL's Reference Data Set (RDS), which is updated four times per year and is made available to anyone. The RDS is a collection of digital signatures of known, traceable software application files (now numbering over 28 million), each of which is identified by a unique hash value.*
The RDS identifies known files that are not likely to contain relevant information for investigative purposes. Most software applications consist of multiple files, each with a unique hash value*. For instance, when a program like Microsoft Word is installed on a computer, hundreds of standard files are copied to the hard drive so that Word will run. Multiply that by all the programs installed on a computer’s hard drive and it is easy to understand that there are thousands of files on every computer that have no evidentiary value, because they are standard program and operating system files rather than user-created files. Examples of file types many have heard of include:
.EXE – Executable files that launch software programs
.WIN – Windows system files
.DAT – Generic data files used by many programs
.DLL – Dynamic Link Library files that hold code shared by multiple Windows programs
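One of the primary goals of processing data in eDiscovery is to reduce the volume of information that must be examined; eliminating files reduces both the time and the cost of review. Most eDiscovery processing software therefore has a built-in ability to use the NIST list to remove (“DeNIST”) files that do not need to be examined during the course of the litigation. The software calculates a hash value for each file in the Electronically Stored Information (ESI), compares it against the RDS, and removes the files that match, since those are known files unlikely to have relevant information. Because matching is done by hash value rather than by file name or extension, a renamed copy of a known file is still removed, while a user-created file with a familiar extension is kept.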
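For readers who want to see the comparison step concretely, here is a minimal sketch in Python. It is an illustration only, not how any particular processing product works. It assumes the known-file list has already been loaded into a set of uppercase SHA-1 strings; the names `sha1_of_file`, `denist`, and `known_hashes` are hypothetical and chosen for this example.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def denist(paths, known_hashes):
    """Split paths into (kept, removed) based on membership in known_hashes.

    known_hashes is an assumed, pre-loaded set of uppercase SHA-1 strings
    taken from a known-file list; loading that list is not shown here.
    """
    kept, removed = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            removed.append(path)  # known file, dropped from review
        else:
            kept.append(path)     # not on the list, stays in the review set
    return kept, removed
```

Calling `denist(list_of_paths, known_hashes)` returns the files to review and the files that were filtered out. Keeping the removed list, rather than discarding it, makes it possible to report exactly what was excluded.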
Cavo Legal has taken DeNIST one step further by allowing a customized list of files to be added to the standard NIST removal list. This was done for two primary reasons. First, the NIST list is always several months behind in its updates, so it may be slightly out of date at any given time. Second, file types particular to your litigation may not be on the master list, so the ability to customize the list allows users to meet the needs of each litigation.
* Hash value – A hash value is a numeric value of a fixed length that, for practical purposes, uniquely identifies data. Hash values represent large amounts of data as much smaller numeric values, so they are used with digital signatures. Each distinct file has its own hash value, and identical copies of a file always produce the same value, which is what makes known files traceable.
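To make the footnote concrete, the short Python sketch below shows the two properties that matter here: identical data always yields the same hash value, and even a small change yields a different one. SHA-256 is used purely as an example algorithm, and the function name `fingerprint` and the sample byte strings are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = fingerprint(b"Standard program file, version 1.0")
exact_copy = fingerprint(b"Standard program file, version 1.0")
changed = fingerprint(b"Standard program file, version 1.1")

# Identical content gives an identical hash value.
print(original == exact_copy)
# Changing a single character gives a different hash value.
print(original == changed)
# The hash value has the same length no matter how large the input is.
print(len(original), len(fingerprint(b"x" * 1_000_000)))
```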