ICAIL 2013 Workshop on
Standards for Using Predictive Coding, Machine Learning,
and Other Advanced Search and Review Methods in E-Discovery
(DESI V Workshop)
June 14, 2013, Casa dell'Aviatore, viale dell'Universita 20, Rome, Italy
Purpose
The DESI V workshop is intended to focus on best practices and
standards for using predictive coding, machine learning, and other
advanced search and review methods in e-discovery. The scope of
methods to be considered includes applications of automation to any
aspect of e-discovery (e.g., early case assessment, review for
responsiveness, or review for privilege) with the goal of improving
accuracy, reducing cost, or both. The workshop is intended to build
upon past discussions in ICAIL/DESI forums in promoting the use of AI
and other advanced forms of search techniques in legal settings, as
cost-efficient alternatives to traditional Boolean and manual
searching.
Background
Lawyers around the world continue to face the problem of how to
effectively and efficiently conduct searches for relevant documents,
across increasingly complex, enterprise-wide collections within
corporate and institutional settings. As noted by The Sedona
Conference®, a leading legal think tank, "the legal profession is at a
crossroads" in terms of its willingness to embrace automated,
analytical, and statistical approaches that may help solve the twin
problems of volume and complexity of data (The Sedona Conference,
2009). Rules development over the past half-decade governing the
conduct and practice of civil litigation, particularly in the U.S. and
U.K., has fostered new approaches by encouraging parties to "meet and
confer" on matters involving "electronically stored
information" (ESI), including issues involving continued storage,
preservation, and most importantly, access to ESI in the
parties' respective custodies. Until this year, however, there had
been limited consideration in published opinions
of advanced search methods familiar to the AI
and information retrieval communities. That has now changed.
Indeed, the year 2012 marked a watershed in the development of legal
practice in the United States, with published opinions in a total of
three courts, and a multi-day evidentiary proceeding in a fourth
court, all concerning the propriety of lawyers' use of advanced search
techniques in civil litigation to find relevant documents in large
evidentiary data sets. In particular, the judicial ruling in the
federal case of da Silva Moore v. Publicis Groupe (U.S. District
Court, New York) has advanced the law in this area, where the
magistrate judge found that lawyers need no longer consider themselves
"guinea pigs" in coming into court to justify the use of advanced
search techniques such as predictive coding, as the court was willing
to give a judicial blessing to the use of such advanced techniques in
the name of greater efficiency and efficacy in results. The Moore
court specifically took judicial notice of past research in the
information retrieval area, including published reports arising out of
the TREC Legal Track and related endeavors. Other rulings have
followed, including a recent case in which the Court on its own
initiative ordered the use of predictive coding without a prior motion
from either party.
Notwithstanding these early judicial blessings as well as the
widespread attention presently being given to the use of advanced
search techniques during document review processes, there is no widely
agreed-upon set of standards or best practices with respect to how
lawyers actually go about using these new techniques. Indeed, the two
"joint protocols" entered into in the Moore case and in another
case, In re Actos (U.S. District Court, Louisiana), have proven to be
controversial in requiring a level of transparency in the sharing of
information between opposing parties to carry out quality control
sampling as well as in judging what constitutes relevant or
non-relevant evidence. Past DESI workshops, as well as leading
organizations such as The Sedona Conference®, have also recognized the
lack of best practice standards in this area (DESI IV, 2011; The
Sedona Conference, 2009).
The purpose of the DESI V workshop will be to continue to advance the
discussion by providing a platform for the consideration of best
practices in using predictive coding and other forms of machine
learning algorithms. We invite participation from e-discovery
stakeholders and practitioners from the law, government, and industry,
along with researchers on process quality, information retrieval,
human language technology, human-computer interaction, artificial
intelligence, and other fields connected with e-discovery. The
dialogue at the workshop will be expected to center around how lawyers
are currently using such techniques, how better protocols can be
developed that will satisfy the interests of the legal community, and
what open questions exist that would benefit from further research
into optimizing the use of these techniques in a variety of legal and
investigatory settings. (See Call for Submissions for further
details.)
Agenda
A final schedule and speaker bios are now available.
Papers
Research Papers
- Lawrence Chapin, Simon Attfield and Efeosasere Moibi Okoro,
Predictive Coding, Storytelling and God: Narrative Understanding
in e-Discovery
- Jianlin Cheng, Amanda Jones, Caroline Privault and Jean-Michel Renders,
Soft Labeling for Multi-Pass
Document Review
- Manfred Gabriel, Chris Paskach and David Sharpe,
The
Challenge and Promise of Predictive Coding for Privilege
- Michael Sperling, Rong Jin, Illya Rayvych, Jianghong Li and Jinfeng Yi,
Similar Document
Detection and Electronic Discovery: So
Many Documents, So Little Time
Additional Papers
- Thomas I. Barnett and Michael Sperling,
Predictive Coding:
Turning Knowledge into Power
- Deborah Baron, Angela Bunting and Brian J. Krupczak,
Turning Back Time:
The Application of Predictive Technology to Big Data
- Jason R. Baron and Jesse B. Freeman,
Cooperation,
Transparency, and the Rise of Support Vector Machines in E-Discovery:
Issues Raised by the Need to Classify Documents as Either
Responsive or Nonresponsive
- Macyl A. Burke, A
Cost-Effective Approach to Quality
- William P. Butterfield, Conor R. Crowley and Jeannine Kenney,
Reality Bites:
Why TAR's Promises Have Yet to be Fulfilled
- Jianlin Cheng and Amanda Jones,
Variability
in Technology Assisted Review and Implications for Standards
- David Graus, Zhaochun Ren, Maarten de Rijke, David van Dijk,
Hans Henseler and Nina van der Knaap,
Semantic Search in E-Discovery:
An Interdisciplinary Approach
- Ali Hadjarian, Jianping Zhang and Shuxing Cheng, An Empirical Analysis of the Training and Feature Set Size in Text Categorization for e-Discovery
- Bruce Hedin, Dan Brassil and Christopher Hogan,
Toward a Meaningful
E-Discovery Standard:
On the Place of Measurement in an E-Discovery Standard
- Gilbert S. Keteltas, Bridging
the Technical and Legal Divide: Information Retrieval
Process Quality Standards for Counsel
- Ben Klaber, Artificial
Intelligence and Transactional Law: Automated M&A
Due Diligence
- R. T. Oehrle and E. A. Johnson,
The Structure of Predictive Coding:
A Guide for the Perplexed
- Dan Regard and Tom Matzen,
A Re-Examination of
Blair & Maron (1985)
- Karl Schieneman, The
Fall of the Berlin Wall and its Parallels to E-Discovery
- Johannes C. Scholtes, Tim van Cann, Mary Mack,
The Impact of Incorrect
Training Sets and Rolling Collections on Technology-Assisted
Review
- Meaghan Zore, Dialing Back Disclosure:
Best Practices for Balancing Cooperation and Client Interests
Slides and Other Materials
Keynote Addresses
Discussant Notes
Research Presentations
Other Presentations
DESI History
At the beginning of the second decade of the 21st century, lawyers
continue to face the problem of how to effectively and
efficiently conduct searches for relevant documents across
increasingly complex, enterprise-wide collections within corporate and
institutional settings. Hundreds of millions of documents are now
routinely subject to searches across a wide spectrum of litigation and
investigatory contexts (e.g., the Lehman Brothers investigation
necessitated a review of 350 million pages or 3.5 petabytes of
material). In recognition of this situation, in 2006 the United
States adopted new rules governing civil litigation in federal
courts. These courts have recognized "electronically stored
information" (ESI) as a term of art, embracing all forms of electronic
documents made subject to the civil discovery process. Under the new
rules, opposing parties in federal court litigation now have an early
"meet and confer" duty to discuss a range of electronic discovery
("e-discovery") issues, including the continued storage and preservation
of, and access to, ESI in their respective physical and legal custodies.
DESI V follows four successful prior DESI (Discovery of Electronically
Stored Information) Workshops: at ICAIL 2007 (DESI I, Palo Alto),
ICAIL 2009 (DESI III, Barcelona), and ICAIL 2011 (DESI IV,
Pittsburgh), along with an intermediate workshop (DESI II)
sponsored by University College London in 2008. In DESI I, a
wide array of individuals came together for perhaps the first time to
foster engagement between e-discovery practitioners and a broad range
of research communities who might contribute to the development of new
technologies to support the e-discovery process. The DESI II and III
workshops broadened the scope of this discussion to include
comparisons of requirements between differing national settings and
legal environments. DESI IV built on these efforts with a
first-of-its-kind general discussion of standard-setting for the legal
profession through contemplation of ISO 9001 frameworks as well as
capability maturity models. The present workshop will greatly benefit
from all of the past discussions.
References
Much has been published on E-Discovery generally, so no list of
references could hope to be complete. Here are a few papers that we
know of that we believe would be useful as background reading for the
focus of this workshop. Please send recommended additions for this
list to oard@umd.edu.
- K. Ashley, Can
AI & Law Contribute to Managing Electronically Stored Information
in Discovery Proceedings? Some Points of Tangency, DESI I
Workshop, Palo Alto, June 4 (2007).
- K. Ashley and W. Bridewell,
Emerging AI & Law approaches to automating analysis and retrieval
of electronically stored information in discovery proceedings,
Artificial Intelligence and Law 18(4), pp. 311-320 (2010).
- J. Baron, Law in the Age
of Exabytes: Some Further Thoughts on 'Information Inflation' and
Current Issues in E-Discovery Search, Richmond Journal of Law
and Technology, 17(3), Spring (2011).
- J. Conrad,
E-discovery revisited: the need for artificial intelligence
beyond information retrieval, Artificial Intelligence and
Law 18(4), pp. 321-345 (2010).
- J. Conrad, E-Discovery
Revisited: A Broader Perspective for AI Researchers, DESI I
Workshop, Palo Alto, June 4 (2007).
- M. Grossman and G. Cormack, Technology-Assisted
Review in E-Discovery Can Be More Effective and More Efficient Than
Exhaustive Manual Review, Richmond Journal of Law and Technology,
17(3), Spring (2011).
- J. Krause, Human-Computer
Assisted Search in EDD, Law Technology News, December 20 (2010).
- D. Lewis,
Afterword: data, knowledge and e-discovery, Artificial
Intelligence and Law 18(4), pp. 481-486 (2010).
- D. Oard, J. Baron, B. Hedin, D. Lewis and S. Tomlinson,
Evaluation of information retrieval for e-discovery,
Artificial Intelligence and Law 18(4), pp. 347-386 (2010).
- D. Oard and W. Webber, Information
Retrieval and E-Discovery, Foundations and Trends in Information
Retrieval, 7(2-3), pp. 99-237 (2013).
- N. Pace and L. Zakaras, Where The Money
Goes: Understanding Litigant Expenditures for Producing
E-Discovery, RAND Publication (2012).
- H. Roitblat, A. Kershaw and P. Oot, Document
Categorization in Legal Electronic Discovery: Computer Classification
vs. Manual Review, Journal of the American Society for Information
Science and Technology, 61(1), pp. 70-80 (2010).
- The Sedona Conference®, Commentary
on Achieving Quality in E-Discovery (2009).
- TREC Legal Track
(containing TREC 2006 through TREC 2011 overview papers).
- W. Webber, Re-examining the
Effectiveness of Manual Review, SIGIR 2011 Information Retrieval
for E-Discovery (SIRE) Workshop, Beijing, China (2011).
Relevant Case Law
- da Silva Moore v. Publicis Groupe, 2012 WL 607412
(S.D.N.Y. Feb. 24, 2012), approved and adopted, 2012 WL 1446534
(S.D.N.Y. Apr. 26, 2012).
- EORHB v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15,
2012).
- Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al.,
2012 WL 1431215 (Va. Cir. Ct. Apr. 23, 2012).
- In re Actos (Pioglitazone) Products, 2012 WL 3899669
(W.D. La. July 27, 2012).
- Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711
(N.D. Ill.) (Nolan, M.J.).
Important Dates
- Research papers due: May 1, 2013
- Position papers due: May 8, 2013
- Accept/Reject notification for research papers: May 15, 2013
- Preliminary Agenda posted: May 22, 2013
- Camera-ready research papers due: May 22, 2013
- ICAIL Conference: June 10-13, 2013
- DESI V Workshop: June 14, 2013
Submissions (archived)
We encourage the submission of research papers and position papers
both on supporting technologies for e-discovery (search, text
classification, etc.) and on efforts to develop best practices and
standards for use of these technologies. Accepted position papers and
accepted research papers will be made available on the Workshop's Web
page and distributed to participants on the day of the event, and some
speakers may be selected from among those submitting position
papers. See the Call for Submissions
for further details.
Organizing Committee
Jason R. Baron, University of Maryland, USA
Jack G. Conrad, Thomson Reuters, Switzerland
Dave Lewis, David D. Lewis Consulting, USA
Debra Logan, Gartner Research, UK
Douglas W. Oard, University of Maryland, USA
Fabrizio Sebastiani, Istituto di Scienza e Tecnologia dell'Informazione, Italy
Last Update: January 17, 2013