E-Discovery is the process of collecting and exchanging electronically stored information (ESI) that is requested during the discovery phase of civil litigation. The need for a well-devised e-Discovery process stems from the Federal Rules of Civil Procedure (FRCP), as several of these rules directly affect the e-Discovery process. The Electronic Discovery Reference Model (EDRM) was developed in 2005 to help create best practices and guidelines for attorneys, e-Discovery vendors, and anyone working within the e-Discovery field. The EDRM project has become the de facto standard for navigating the e-Discovery process and adhering to the federal rules governing it. Further information regarding the EDRM and its projects can be found at www.edrm.net.
There are nine phases in this model, which begins with the Information Management phase and ends with the Presentation phase. While these two phases are very important in e-Discovery, the middle seven phases are the focus of this article.
The Identification phase of the EDRM helps to determine the potential locations of ESI for the discovery plan. The most common locations of ESI that may need to be preserved are the hard drives of custodian computers, e-mail, network home shares, network group or department shares, backup tapes, and external media such as CDs, DVDs, or flash drives. Less common locations include structured databases, documents in collaboration tools such as SharePoint, and instant message chat logs. During this phase the legal team may work with the client's IT team and outside consultants to perform an early case assessment, which helps to uncover additional locations of potentially relevant ESI and additional custodians who were not initially identified. Interviewing key custodians, reviewing e-mail conversations, and sampling data from the initially identified ESI locations are techniques used during an early case assessment to determine what needs to be preserved and collected.
The Preservation phase is where the duty to preserve evidence is initiated; preservation should begin when litigation is reasonably anticipated. The litigation hold management process is essential to ensure that the correct custodians and information are part of the collection process. Custodians may be added to or removed from a matter based on the investigations during the Identification and Preservation phases, which can require additional holds to be sent out or an existing hold to be removed from a custodian.
Litigation hold management includes maintaining accurate records of when each notice was sent, when it was acknowledged by the custodian and the Information Technology (IT) department, and when the hold was removed. The company's IT department also needs a copy of the litigation hold for each custodian so that it can suspend the information management and document retention policies that apply to that custodian, ensuring that the data and its related metadata are not deleted or corrupted.
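As an illustration of the record keeping described above, here is a minimal Python sketch of a per-custodian hold record. The class name `HoldRecord` and its field names are hypothetical, chosen for this example only; real matter management systems track considerably more.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HoldRecord:
    """One litigation hold notice for one custodian (illustrative fields)."""
    custodian: str
    sent: date
    custodian_ack: Optional[date] = None  # when the custodian acknowledged
    it_ack: Optional[date] = None         # when the IT department acknowledged
    released: Optional[date] = None       # when the hold was removed

    def is_active(self):
        """A hold stays active until a release date is recorded."""
        return self.released is None

    def awaiting_acknowledgement(self):
        """True while an active hold lacks either acknowledgement."""
        return self.is_active() and (
            self.custodian_ack is None or self.it_ack is None
        )
```

A list of such records can be filtered with `awaiting_acknowledgement()` to find the custodians who still need a reminder.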
Metadata is essentially detailed information about an electronic file. Some of this information can be seen and edited by an individual when creating or using an electronic file, whereas other metadata is generated by the computer or application that created or used the file. Examples of user-created metadata are document properties such as author and title in Microsoft Office Word documents, while examples of computer- or system-generated metadata are the dates and times a file was created, modified, and accessed.
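To make system-generated metadata concrete, the following Python sketch reads the size and timestamps that the file system keeps for a file, using the standard library's `os.stat`. The function name `system_metadata` and the dictionary keys are assumptions made for this example.

```python
import os
from datetime import datetime, timezone


def system_metadata(path):
    """Return the size and timestamps the file system records for a file."""
    info = os.stat(path)

    def to_utc(seconds):
        # Convert a POSIX timestamp to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "size_bytes": info.st_size,
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # st_ctime is the creation time on Windows but the time of the last
        # metadata change on most Unix systems, so the key says so.
        "created_or_changed": to_utc(info.st_ctime),
    }
```

On some systems, opening a file to read it can update its accessed time, which is one reason casual handling of files can alter metadata.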
The duty to preserve potentially relevant documents and their metadata is not one taken lightly by the courts. Sanctions have been levied in cases where a party did not fully comply with a discovery request. Where the preservation of requested data has not been effective, the consequences have ranged from attorney fees and costs to awards in excess of $20 million, as in the case of Zubulake v. UBS Warburg. While the duty to preserve ESI is determined during the Preservation phase, the data is not actually captured until the Collection phase.
During the Collection phase, the ESI sought in the discovery request is acquired. There are different methods of collecting ESI, including self-collection and forensic imaging. Self-collection refers to individual custodians manually copying files or forwarding e-mails that they believe are relevant based on the litigation hold notice they received. Because it is a very low-cost method, it can be used effectively for basic data sampling during the Identification phase for early case assessment. As a complete collection method, however, self-collection can be very risky for several reasons: custodians may unknowingly or knowingly fail to forward all of their relevant data, the metadata of the files may be altered during the collection, and the chain of custody documentation may be incomplete.
In contrast, there are two methods of forensic imaging collection, which are generally performed by a third-party vendor: the full forensic disk collection and the targeted forensic collection. Each of these methods creates a forensically sound container to hold all of the data and its associated metadata.
A forensic image is a bit-by-bit copy of the entire physical media, in which even the operating system (OS) files and inactive data such as the unallocated disk space can be collected and preserved. The unallocated space of a hard drive is the area that is not currently assigned to any file: either data has never been written there, or it holds remnants of deleted files that have not yet been overwritten. It is generally not accessible to a normal user. A full forensic disk collection may be necessary in cases related to trade secret misappropriation or intellectual property theft, where in-depth forensic analysis may be needed because of a concern about how the data was used.
A targeted forensic collection is a bit-by-bit copy of specifically targeted files from a custodian's computer or a network location. All of the metadata for each collected file is still forensically preserved, but the OS files and the unallocated space of the drive are generally not collected. A targeted forensic collection can help reduce the amount of data that will need to be processed and reviewed. It can generally be performed over the network, which allows custodians to keep using their computers, reduces downtime, and helps to mitigate business interruption.
Although both forensic imaging processes are typically more expensive because trained forensic professionals are required to perform a collection of this type, they are lower-risk methods: chain of custody is created and maintained throughout, all metadata is preserved, and a hash value is automatically created not only for each individually collected file but for the forensic image file as well. A hash value can be thought of as a digital fingerprint and is used to determine whether two files are exact duplicates of one another. Two types of hash values are commonly used during the e-Discovery process: MD5 and SHA-1. Both are calculated by a mathematical algorithm that examines the content of a file and generates the value based on that content. When any data in a file is added, changed, or deleted, the hash value of that file changes.
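The idea of a content-based fingerprint can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not how any particular forensic tool works; the function name `file_hashes` and the chunk size are assumptions.

```python
import hashlib


def file_hashes(path, chunk_size=1024 * 1024):
    """Compute the MD5 and SHA-1 digests of a file's content in one pass."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Because only the content is hashed, two files with different names but identical bytes produce the same pair of values, while changing a single byte produces different ones.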
The Preservation and Collection phases can be conducted simultaneously; as soon as the litigation hold for a custodian has been issued, the collection can begin. The preservation and collection of ESI on network shares can likewise begin as soon as the shares are identified. It is essential for attorneys to maintain an active and open dialogue with their e-Discovery vendors and clients during the Collection phase, as it helps to keep the e-Discovery process on track and on budget and puts the teams in a better position to handle any unexpected issues or concerns that may arise.
In the Processing phase, the collected ESI is indexed, searched, and de-duplicated according to the requirements of the litigation, to reduce the number of non-relevant documents before the Review phase begins. Several methods can be used during the Processing phase to reduce the data set, such as date range searching, keyword searching, removing known operating system files, and de-duplication. The metadata for the ESI can be extracted, indexed, and searched during this phase as well.
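Two of the culling methods named above, date range filtering and de-duplication, can be sketched in a few lines of Python. The `Document` class, its fields, and the `cull` function are hypothetical names for this example, and SHA-1 over the raw content stands in for whatever duplicate detection a processing platform actually uses.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    name: str
    content: bytes
    modified: date


def cull(documents, start, end):
    """Keep documents dated within [start, end], dropping exact duplicates."""
    seen = set()
    kept = []
    for doc in documents:
        # Date range filter: skip anything outside the agreed period.
        if not (start <= doc.modified <= end):
            continue
        # De-duplication: identical content yields an identical hash value.
        digest = hashlib.sha1(doc.content).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept
```

The first copy of each document encountered is the one retained, so the order in which custodians are processed determines which copy survives.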
Keyword searching is the most common method used to find documents pertaining to the subject of the litigation. The use of keyword searching and the terms to be searched are generally discussed and agreed upon during the Rule 26(f) Meet and Confer session. Common keyword search techniques include Boolean, proximity, and stemming searches. Boolean searching uses the AND, OR, and NOT operators to locate specific keywords in a document, while proximity searching finds keywords that appear within a certain number of words of each other in a document. Stemming allows grammatical variations of a keyword to be found; a search for "contract", for example, would also find "contracts" and "contracted".
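The three techniques can be illustrated with a deliberately simple Python sketch. All function names here are hypothetical, and the `stem` function is a crude suffix stripper for illustration only, not a real stemming algorithm of the kind used by review platforms.

```python
import re


def tokens(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(word):
    """Crude suffix stripping for illustration; not a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contains(text, keyword, use_stemming=False):
    """True if the keyword (or, optionally, a stemmed variant) occurs."""
    words = tokens(text)
    if use_stemming:
        return stem(keyword.lower()) in {stem(w) for w in words}
    return keyword.lower() in words


def within(text, first, second, distance):
    """Proximity search: both keywords occur within `distance` words."""
    words = tokens(text)
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)
```

Boolean logic then falls out of ordinary Python operators: `contains(memo, "merger") and not contains(memo, "audit")` expresses "merger AND NOT audit", and `within(memo, "vendor", "bank", 4)` expresses "vendor within four words of bank".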
While searching techniques can drastically reduce the amount of ESI that passes to the Review phase, they do not eliminate the need for attorneys to review the documents that have met the search criteria.
During the Review phase, ESI that has passed the filtering criteria from Processing is reviewed by the attorneys for privilege status, for responsiveness and relevance to the litigation, and for any redactions that should be made before the files are produced to the other party. An e-Discovery document review is usually performed as a native file review or a TIFF/PDF-based review, and can be done in a web-based or a desktop-based application.
A native file review allows the attorneys reviewing the ESI to see the data in its original application format, such as Excel. E-mail files, however, are generally converted to an HTML-based file format for a native review. One downside of a native file review is the possibility that the metadata of a file is changed when a reviewer opens it natively. To safeguard against this problem, most hosted review platforms can place documents in a read-only status so that the metadata is not changed when the files are opened natively.
A TIFF/PDF review is one in which all of the documents have been converted to an image file, such as a PDF or TIFF. This type of review ensures that the metadata of the original ESI cannot be altered, but it does not allow the reviewer to see the file in its native application.
Another type of review is a non-responsive document review, which consists of reviewing a sample of the documents that were considered non-responsive either by keyword culling or by the review team. This is a good practice because it allows the lead attorney to verify that critical concepts are not being missed during the review; a document may not contain a specific keyword but may still relate to the litigation. If a high number of documents are found to be responsive during this type of review, a change in the culling techniques or a change of directive for the review team may be required.
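The sampling step of a non-responsive document review can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed seed, and the representation of second-look results as a dictionary of document ID to `True`/`False` are all hypothetical, and the sketch does not address how large a sample should be.

```python
import random


def sample_for_second_look(non_responsive_ids, sample_size, seed=0):
    """Draw a reproducible random sample of documents tagged non-responsive."""
    ids = list(non_responsive_ids)
    rng = random.Random(seed)  # fixed seed so the sample can be re-created
    return rng.sample(ids, min(sample_size, len(ids)))


def responsive_rate(sample_results):
    """Fraction of sampled documents found responsive on second look.

    `sample_results` maps a document ID to True if the second reviewer
    judged the document responsive after all.
    """
    if not sample_results:
        return 0.0
    return sum(sample_results.values()) / len(sample_results)
```

A rate that is higher than the team is comfortable with is the signal, described above, that the culling techniques or the review directive may need to change.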
The Analysis phase of the EDRM should be taking place concurrently with Review. During Analysis, higher level attorneys or subject matter experts are needed to evaluate the processed data as this phase requires superior knowledge of the litigation subject matter. Analysis is not a stand-alone phase as it is iterative in nature; it works in conjunction with the Processing and Review phases until all of the data has been processed and reviewed.
There are different technologies such as concept searching tools that can be used to help bring forth key concepts and subject matter in the ESI. Concept searching enables a reviewer to examine the collected ESI for the meaning of phrases and the documents' subject matter, not just a specific word. It can be a powerful tool that is used to uncover relevant documents or potential keywords that may not have been found using a traditional keyword search. As with all of the culling techniques discussed, no method is perfect, but when used in conjunction with attorney analysis and review they can provide time and cost savings during the e-Discovery process.
The Production phase is tied to several FRCP rules, which provide the guidelines for determining how, when, and what information is to be produced in response to a discovery request. Rule 34(b) allows the requesting party to specify the form in which the ESI is to be produced. Possible types of ESI production are paper, native file, and image file.
- A paper-based production requires that all of the relevant ESI be printed and produced to the other party. Because a discovery request can involve a large volume of data, the cost of printing all of the documents can be extensive.
- A native file-based production delivers the producible documents in their original electronic format, with additional electronic files containing formatting information.
- An image file-based production is one in which each producible document is generally a PDF or TIFF image file. Image file is currently the most common type of production, as it allows documents to be redacted without altering the native file and allows the other party to easily ingest the files into its system.
With both native file and image-based productions, the production is generally accompanied by a load file, which can contain the metadata, extracted text, and formatting information of the ESI being produced. Rule 26 also discusses when the producible data should be delivered to the requesting party. Options for delivery of the data include a final production or a rolling production.
- A final production occurs after all of the collected data has been searched and reviewed, and is delivered to the requesting party at one time.
- A rolling production allows the documents to be produced to the requesting party at different stages. This method can be used to provide data sampling, small test runs to ensure the requesting party has no concerns with the received data format, and the prioritization of custodian data.
The timing and production methods are generally agreed upon during the 26(f) conference but may be modified in a Rule 16(b) scheduling order issued by the court. The purpose of the Rule 16(b) scheduling order is to detail the time necessary to complete discovery and modify its extent, file motions, amend pleadings, and assert claims of privilege or protection after the data is produced.
The e-Discovery process, like so many other processes, has many techniques and technologies that can be used during its various phases to reduce the initial high volume of data to a smaller set of files, based on relevancy and privilege, which may ultimately reduce the time and cost of attorney review.
For everyone in this field, it is extremely important to stay current on case law and to maintain a good understanding of the Federal Rules of Civil Procedure and their amendments to help ensure successfully litigated cases. Working with reputable vendors and using guidelines such as the EDRM and resources such as the Sedona Conference, www.thesedonaconference.org, will allow you to navigate successfully through your e-Discovery cases, create a defensible, repeatable process, and mitigate the risks associated with e-Discovery.