ICAIL 2013 Workshop on
Standards for Using Predictive Coding, Machine Learning,
and Other Advanced Search and Review Methods in E-Discovery
(DESI V Workshop)

June 14, 2013, Casa dell'Aviatore, viale dell'Universita 20, Rome, Italy



The DESI V workshop is intended to focus on best practices and standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery. The scope of methods to be considered includes applications of automation to any aspect of e-discovery (e.g., early case assessment, review for responsiveness, or review for privilege) with the goal of improving accuracy, reducing cost, or both. The workshop is intended to build upon past discussions in ICAIL/DESI forums in promoting the use of AI and other advanced search techniques in legal settings as cost-efficient alternatives to traditional Boolean and manual searching.


Lawyers around the world continue to face the problem of how to effectively and efficiently conduct searches for relevant documents across increasingly complex, enterprise-wide collections within corporate and institutional settings. As noted by The Sedona Conference®, a leading legal think tank, "the legal profession is at a crossroads" in terms of its willingness to embrace automated, analytical, and statistical approaches that may help solve the twin problems of volume and complexity of data (The Sedona Conference, 2009). Rules development over the past half-decade governing the conduct and practice of civil litigation, particularly in the U.S. and U.K., has fostered new approaches by encouraging parties to "meet and confer" on matters involving "electronically stored information" (ESI), including issues involving continued storage, preservation, and most importantly, access to ESI in the parties' respective custodies. Until this year, however, there had been limited consideration in published opinions of advanced search methods familiar to the AI and information retrieval communities. That has now changed.

Indeed, the year 2012 marked a watershed in the development of legal practice in the United States, with published opinions in a total of three courts, and a multi-day evidentiary proceeding in a fourth court, all concerning the propriety of lawyers' use of advanced search techniques in civil litigation to find relevant documents in large evidentiary data sets. In particular, the judicial ruling in the federal case of da Silva Moore v. Publicis Groupe (U.S. District Court, New York) has advanced the law in this area. There, the magistrate judge found that lawyers need no longer consider themselves "guinea pigs" in coming into court to justify the use of advanced search techniques such as predictive coding, as the court was willing to give a judicial blessing to the use of such techniques in the name of greater efficiency and efficacy in results. The Moore court specifically took judicial notice of past research in the information retrieval area, including published reports arising out of the TREC Legal Track and related endeavors. Other rulings have followed, including a recent case in which the Court on its own initiative ordered the use of predictive coding without a prior motion from either party.

Notwithstanding these early judicial blessings, as well as the widespread attention presently being given to the use of advanced search techniques during document review processes, there is no widely agreed-upon set of standards or best practices with respect to how lawyers actually go about using these new techniques. Indeed, the two "joint protocols" entered into in the Moore case and in another case, In re Actos (U.S. District Court, Louisiana), have proven to be controversial in requiring a level of transparency in the sharing of information between opposing parties to carry out quality control sampling as well as in judging what constitutes relevant or non-relevant evidence. Past DESI workshops, as well as leading organizations such as The Sedona Conference®, have also recognized the lack of best practice standards in this area (DESI IV, 2011; The Sedona Conference, 2009).

The purpose of the DESI V workshop will be to continue to advance the discussion by providing a platform for the consideration of best practices in using predictive coding and other forms of machine learning algorithms. We invite participation from e-discovery stakeholders and practitioners from law, government, and industry, along with researchers on process quality, information retrieval, human language technology, human-computer interaction, artificial intelligence, and other fields connected with e-discovery. The dialogue at the workshop is expected to center on how lawyers are currently using such techniques, how better protocols can be developed that will satisfy the interests of the legal community, and what open questions exist that would benefit from further research into optimizing the use of these techniques in a variety of legal and investigatory settings. (See the Call for Submissions for further details.)


A final schedule and speaker bios are now available.


Research Papers

Additional Papers

Slides and Other Materials

Keynote Addresses

Discussant Notes

Research Presentations

Other Presentations

DESI History

At the beginning of the second decade of the 21st century, lawyers increasingly face the problem of how to effectively and efficiently conduct searches for relevant documents across ever more complex, enterprise-wide collections within corporate and institutional settings. Hundreds of millions of documents are now routinely subject to searches across a wide spectrum of litigation and investigatory contexts (e.g., the Lehman Brothers investigation necessitated a review of 350 million pages or 3.5 petabytes of material). In recognition of this situation, in 2006 the United States adopted new rules governing civil litigation in federal courts. These courts have recognized "electronically stored information" (ESI) as a term of art, embracing all forms of electronic documents made subject to the civil discovery process. Under the new rules, opposing parties in federal court litigation now have an early "meet and confer" duty to discuss a range of electronic discovery ("e-discovery") issues, including the continued storage and preservation of, and access to, ESI in their respective physical and legal custodies.

DESI V follows four successful prior DESI (Discovery of Electronically Stored Information) Workshops: at ICAIL 2007 (DESI I, Palo Alto), ICAIL 2009 (DESI III, Barcelona), and ICAIL 2011 (DESI IV, Pittsburgh), plus an intermediate workshop (DESI II) sponsored by University College London in 2008. In DESI I, a wide array of individuals came together for perhaps the first time to foster engagement between e-discovery practitioners and a broad range of research communities who might contribute to the development of new technologies to support the e-discovery process. The DESI II and III workshops broadened the scope of this discussion to include comparisons of requirements between differing national settings and legal environments. DESI IV built on these efforts by holding a first-of-its-kind general discussion of standard-setting for the legal profession through contemplation of ISO 9001 frameworks as well as capability maturity models. The present workshop will greatly benefit from all of the past discussions.


Much has been published on E-Discovery generally, so no list of references could hope to be complete. Here are a few papers that we know of that we believe would be useful as background reading for the focus of this workshop. Please send recommended additions for this list to oard@umd.edu.

  1. K. Ashley, Can AI & Law Contribute to Managing Electronically Stored Information in Discovery Proceedings? Some Points of Tangency, DESI I Workshop, Palo Alto, June 4 (2007).
  2. K. Ashley and W. Bridewell, Emerging AI & Law approaches to automating analysis and retrieval of electronically stored information in discovery proceedings, Artificial Intelligence and Law 18(4), pp. 311-320 (2010).
  3. J. Baron, Law in the Age of Exabytes: Some Further Thoughts on 'Information Inflation' and Current Issues in E-Discovery Search, Richmond Journal of Law and Technology, 17(3), Spring (2011).
  4. J. Conrad, E-discovery revisited: the need for artificial intelligence beyond information retrieval, Artificial Intelligence and Law 18(4), pp. 321-345 (2010).
  5. J. Conrad, E-Discovery Revisited: A Broader Perspective for AI Researchers, DESI I Workshop, Palo Alto, June 4 (2007).
  6. M. Grossman and G. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, 17(3), Spring (2011).
  7. J. Krause, Human-Computer Assisted Search in EDD, Law Technology News, December 20 (2010).
  8. D. Lewis, Afterword: data, knowledge and e-discovery, Artificial Intelligence and Law 18(4), pp. 481-486 (2010).
  9. D. Oard, J. Baron, B. Hedin, D. Lewis and S. Tomlinson, Evaluation of information retrieval for e-discovery, Artificial Intelligence and Law 18(4), pp. 347-386 (2010).
  10. D. Oard and W. Webber, Information Retrieval and E-Discovery, Foundations and Trends in Information Retrieval, 7(2-3), pp. 99-237 (2013).
  11. N. Pace and L. Zakaras, Where The Money Goes: Understanding Litigant Expenditures for Producing E-Discovery, RAND Publication (2012).
  12. H. Roitblat, A. Kershaw and P. Oot, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, 61(1), pp. 70-80 (2010).
  13. The Sedona Conference®, Commentary on Achieving Quality in E-Discovery (2009).
  14. TREC Legal Track (containing TREC 2006 through TREC 2011 overview papers).
  15. W. Webber, Re-examining the Effectiveness of Manual Review, SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop, Beijing, China (2011).

Relevant Case Law

  1. da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012), approved and adopted, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).
  2. EORHB v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012).
  3. Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al., 2012 WL 1431215 (Va. Cir. Ct. Apr. 23, 2012).
  4. In re Actos (Pioglitazone) Products, 2012 WL 3899669 (W.D. La. July 27, 2012).
  5. Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.).

Important Dates

Submissions (archived)

We encourage the submission of research papers and position papers both on supporting technologies for e-discovery (search, text classification, etc.) and on efforts to develop best practices and standards for the use of these technologies. Accepted position papers and research papers will be made available on the Workshop's Web page and distributed to participants on the day of the event, and some speakers may be selected from among those submitting position papers. See the Call for Submissions for further details.

Organizing Committee

Jason R. Baron, University of Maryland, USA
Jack G. Conrad, Thomson Reuters, Switzerland
Dave Lewis, David D. Lewis Consulting, USA
Debra Logan, Gartner Research, UK
Douglas W. Oard, University of Maryland, USA
Fabrizio Sebastiani, Istituto di Scienza e Tecnologia dell'Informazione, Italy
Last Update: January 17, 2013