ICAIL 2015 DESI VI Workshop
ICAIL 2015 Workshop on
Using Machine Learning and Other Advanced Techniques to Address
Legal Problems in E-Discovery and Information Governance
(DESI VI Workshop)
June 8, 2015, University of San Diego, San Diego, California
Purpose
The sixth workshop on Discovery of Electronically Stored Information
(DESI VI) aims to bring together researchers and
practitioners to explore innovation and the development of best
practices for the application of search, classification, language
processing, data management, visualization, and related techniques to
institutional and organizational records in e-discovery, information
governance, public records access, and other legal settings.
Questions addressed include:
- What combinations of machine learning and other techniques can
best categorize information in accordance with existing records
management policies?
- Do effective methods exist for performing sentiment analysis and
personal information identification in a legally useful way in e-mail
and other records of interpersonal communication?
- How well can we estimate the end-to-end costs of workflows that
combine artificial intelligence and human coding to accomplish legal
tasks on a broad range of content types?
- Can proactive insider threat detection leverage information
already being collected for records management purposes, and what
would be the ethical and legal fallout of such approaches?
- Are approaches available to reduce the perceived conflicts
between privilege and transparency in labeling data for
Technology-Assisted Review (TAR) in e-discovery and public records
access applications?
- What technical, procedural, and legal issues arise from recent
proposals to shift the focus of e-discovery from relevance to
materiality?
- Where do recent legal cases point to the need for new research to
better inform the decisions of courts and the practices of parties?
- What lessons can we draw from recent shared-task evaluations such
as TREC and EDI, and how can future shared-task evaluations best be
structured?
- How can current techniques for issue coding be applied to
compliance tasks (e.g., in regulatory, enforcement, and investigations
settings), and what capability gaps exist that call for new research?
- What implications do emerging technologies such as deep learning
and fine-grained access to behavioral traces have for e-discovery,
business intelligence, and records and information management
purposes?
Participation is invited from all interested parties, including those
with interests in:
- Archives
- Artificial intelligence
- Cognitive science
- Computational linguistics
- Corpus analysis
- Digital forensics
- Document image understanding
- E-government
- Human-computer interaction
- Human computation and crowdsourcing
- Information governance
- Information retrieval
- Information seeking behavior
- Knowledge management
- Law
- Legal informatics
- Litigation support
- Machine learning
- Privacy-preserving data mining
- Records management
- Speech processing
- Standards
- Statistical quality control
- Text mining
- Visual analytics
Outcomes
A report from the Workshop is now available.
Agenda
A final agenda (last updated June 5) is now
available.
Papers and Abstracts
Keynote Speakers
Refereed Papers
- William C. Dimm, Information Retrieval Performance Measurement Using Extrapolated Precision (slides: PPT)
- Amanda Jones, Marzieh Bazrafshan, Fernando Delgado, Tania Lihatsh and Tamara Schuyler, The Role of Metadata in Machine Learning for Technology Assisted Review (slides: PPTX)
- David J. Marcos, How a Bill Becomes a Bit: Engineering Compliance
- R. T. Oehrle and E. A. Johnson, Statistical Context Analysis and Search Quality
- Jeremy Pickens, An Exploratory Analysis of Control Sets for Measuring E-Discovery Progress
- James A. Sherer, Jenny Le and Amie Taal, Big Data Discovery, Privacy, and the Application of Differential Privacy Mechanisms
- David van Dijk, David Graus, Zhaochun Ren, Hans Henseler and Maarten de Rijke, Who is Involved? Semantic Search for E-Discovery
Position Papers
Important Dates
- April 10, 2015: Refereed paper submissions due
- April 28, 2015: Decisions on refereed papers returned [originally: April 27]
- May 1, 2015: Unrefereed position papers requested [late submissions will be accepted]
- May 15, 2015: Draft program posted
- June 8, 2015: DESI VI Workshop
Background
With data continuing to double worldwide every 18 months (EMC/IDC
Study 2014), institutions of all kinds (public and private) are
turning to the use of powerful analytics to provide greater visibility
into data sets for multiple purposes. Against this backdrop, lawyers
around the world continue to face the problem of how to effectively
and efficiently conduct searches for relevant documents, across
increasingly complex, enterprise-wide collections within corporate and
institutional settings. Lawyers want to do so both to respond to
litigation demands and to solve a variety of other legal issues
arising in the workplace. As noted by
The Sedona Conference®, a leading legal think tank, "the legal
profession has passed a crossroads" in terms of its willingness to
embrace automated, analytical, and statistical approaches that may
help solve the twin problems of volume and complexity of data (The
Sedona Conference, 2013). Spurred by a growing body of case law
acknowledging the legitimacy of using predictive coding and other
forms of advanced search in e-discovery, legal professionals are
increasingly interested in applying the tools and techniques developed
in litigation settings to solving other types of legal issues,
including but not limited to providing legal advice on employment
issues, mergers and acquisitions, and whistleblower allegations after
searching large collections of email, text messages, and other forms
of electronic communication, including social media.
The year 2012 marked a watershed in the development of legal practice
in the United States, with published opinions from three courts and a
multi-day evidentiary proceeding in a fourth, all concerning the
propriety of lawyers’ use of advanced search techniques in civil
litigation to find relevant documents in large evidentiary data sets.
In particular, the judicial ruling in the federal case of da Silva
Moore v. Publicis Groupe (U.S. District Court, New York) advanced the
law in this area by establishing that lawyers need no longer consider
themselves "guinea pigs" when coming into court to justify the use of
advanced search techniques such as predictive coding; the court gave
its judicial blessing to such techniques in the name of greater
efficiency and efficacy in results. The Moore court specifically took judicial notice of past
research in the information retrieval area, including published
reports arising out of the TREC Legal Track and similar
endeavors. Other rulings have followed, including recent cases in
which courts on their own initiative have ordered the use of
predictive coding without a prior motion from either party.
Notwithstanding these early judicial blessings, as well as the
widespread attention now being given to the use of advanced search
techniques during document review, there is no widely agreed-upon
set of standards or best practices governing how lawyers actually
use these new techniques, in e-discovery or otherwise. Indeed, the
"joint protocols" entered into in the Moore case and in another case,
In re Actos (U.S. District Court, Louisiana), have proven
controversial because they required transparency in the sharing of
information between opposing parties to carry out quality control
sampling and assessment of what constitutes relevant or non-relevant
evidence. More recently, a lively dialogue has arisen among
proponents of different machine learning techniques, particularly
variants of active learning, over the most efficient method for
"seeding" or "training" the software (E-discoveryteam Blog, 2014).
Past DESI workshops, as well as leading organizations such as The
Sedona Conference®, have recognized the lack of best practice
standards in this area (DESI IV, 2011; DESI V, 2013; The Sedona
Conference, 2013b).
DESI History
DESI VI follows five successful prior DESI (Discovery of Electronically
Stored Information) workshops: at ICAIL 2007 (DESI I, Palo Alto),
ICAIL 2009 (DESI III, Barcelona), ICAIL 2011 (DESI IV, Pittsburgh),
and ICAIL 2013 (DESI V, Rome), and an intermediate workshop (DESI II)
at University College London in 2008. At DESI I, a wide array of
individuals came together, perhaps for the first time, to foster
engagement between e-discovery practitioners and a broad range of
research communities who might contribute to the development of new
technologies to support the e-discovery process. The DESI II and III
workshops broadened the scope of this discussion to include
comparisons of requirements across national settings and legal
environments. DESI IV built on these efforts with a first-of-its-kind
general discussion of standard-setting for the legal profession,
considering ISO 9001 frameworks as well as capability maturity
models. Most recently, DESI V extended the discussion of standards to
the question of what standards could and should apply to the use of
predictive coding and other advanced techniques, which were then
beginning to be cited in U.S. case law. The DESI VI workshop in San
Diego will benefit from all of these past discussions, but will aim
to broaden the range of legal issues to which advanced data analysis
and classification technologies might credibly be applied, beyond
e-discovery to a fuller range of information governance applications.
The aim of the DESI workshop series has been to foster a continuing
dialogue leading to the adoption of further best practice guidelines
or standards for using machine learning, most notably in e-discovery.
Past DESI research papers have contributed to thought leadership,
including being cited in the academic literature (e.g., RAND 2012),
and have informed a current effort to craft an ISO standard on
e-discovery. The DESI VI workshop is intended to have an expanded
focus, of interest to actors in the technology sector seeking to
adopt machine learning and other advanced techniques not only for
e-discovery but also in a wider variety of legal settings where
working with and categorizing data in large data sets is increasingly
important.
Submissions (archived; deadlines have passed)
We invite refereed papers describing research or practice. After peer
review, accepted papers will be posted on the DESI VI website and
distributed to workshop participants. Authors of accepted refereed
papers will be invited to present their work either as an oral or a
poster presentation. Refereed papers should be 4 to 10 pages; longer
papers may be returned without review.
We also invite unrefereed position papers describing individual
interests for inclusion (without review) on the DESI VI Web site and
distribution to workshop participants. Position papers should
typically be on the order of 2-3 pages.
Participation in the DESI VI workshop is open. Submission of papers is
encouraged, but not required.
Submissions should be sent by email to Doug Oard (oard@umd.edu) with
the subject line DESI VI POSITION PAPER or DESI VI RESEARCH PAPER. All
submissions received will be acknowledged within 3 days.
The Call for Submissions is also available
as a PDF document.
References
Much has been published on E-Discovery generally, so no list of
references could hope to be complete. Here are a few papers and cases
that we believe would be useful as background reading
for the focus of this workshop. Please send recommended additions for
this list to oard@umd.edu.
Papers:
- Ashley, Kevin D., “Can AI & Law Contribute to Managing
Electronically Stored Information in Discovery Proceedings? Some
Points of Tangency,” paper presented at DESI Workshop,
- Ashley, Kevin D. & W. Bridewell, “Emerging AI & Law approaches to
automating analysis and retrieval of electronically stored
information in discovery proceedings,” 18 Artificial Intelligence
and Law 311 (2011)
- Baron, Jason R., “Law in the Age of Exabytes: Some Further Thoughts
on ‘Information Inflation’ and Current Issues in E-Discovery Search,”
17 Richmond J. of Law & Tech. 3 (2011),
http://jolt.richmond.edu/v17i3/article9.pdf
- Baron, Jason R., “Toward A Federal Benchmarking Standard for
Evaluating Information Retrieval Products Used in E-Discovery,” 6
Sedona Conference Journal 237-246 (2005) (available on Westlaw,
Lexis)
- Borden, Bennett B. & J.R. Baron, “Finding the Signal in the Noise:
Information Governance, Analytics, and the Future of Legal
Practice,” 20 Richmond J. of Law & Tech. 7 (2014),
http://jolt.richmond.edu/v20i2/article7.pdf
- Conrad, Jack G., “E-Discovery revisited: the need for artificial
intelligence beyond information retrieval,” 18 Artificial
Intelligence and Law 4 (2010).
- Conrad, Jack G., “E-Discovery Revisited: A Broader Perspective for
AI Researchers,” paper presented at DESI Workshop, Workshop on
Supporting Search and Sensemaking For Electronically Stored
Information in Discovery Proceedings, Eleventh International
Conference on Artificial Intelligence and Law, Palo Alto, June 4,
2007, http://www.umiacs.umd.edu/~oard/desiws/
- Cormack, Gordon V., M.R. Grossman, “Evaluation of Machine-Learning
Protocols for Technology Assisted Review in E-Discovery,” SIGIR
2014,
http://plg2.cs.uwaterloo.ca/~gvcormac/calstudy/study/sigir2014-cormackgrossman.pdf
- E-discoveryteam Blog, “Talking Turkey,” guest blog by M. Grossman &
G. Cormack (Sept. 7, 2014),
http://e-discoveryteam.com/2014/09/07/guest-blog-talking-turkey/
- EMC/IDC Digital Universe Study 2014,
http://www.emc.com/leadership/digitaluniverse/index.htm
- Grossman, M. and G. Cormack, “Technology-Assisted Review in E-Discovery
Can Be More Effective and More Efficient Than Exhaustive Manual
Review,” 17 Richmond J. of Law and Technology 3 (2011)
- Lewis, David D., “Afterword: data, knowledge and e-discovery,” 18
Artificial Intelligence and Law 481 (2010).
- TREC Legal Track web page, http://trec-legal.umiacs.umd.edu
(containing TREC 2006 through TREC 2011 overview papers)
- Oard, Douglas W., J.R. Baron, B. Hedin, D.D. Lewis, S. Tomlinson,
“Evaluation of information retrieval for E-Discovery,” 18 Artificial
Intelligence and Law 347 (2010).
- Oard, Douglas W., W. Webber, “Information Retrieval and E-Discovery,”
6 Foundations and Trends in Information Retrieval
http://ediscovery.umiacs.umd.edu/pub/ow12fntir.pdf
- Pace, Nicholas M., L. Zakaras, “Where The Money Goes: Understanding
Litigant Expenditures for Producing E-Discovery,” RAND Publication
(2012) http://www.rand.org/pubs/monographs/MG1208.html
- Paul, George L. and J.R. Baron, “Information Inflation: Can The Legal
System Cope?,” 13 Richmond Journal of Law and Technology (2007),
http://law.richmond.edu/jolt/v13i2/article10.pdf.
- The Sedona Conference, The Sedona Best Practices Commentary on the Use
of Search and Information Retrieval in E-Discovery (2013 revised
ed.) (2013a),
http://www.thesedonaconference.org/content/miscFiles/publications_html
- The Sedona Conference, Commentary on Achieving Quality in the
E-Discovery Process (2013 revised ed.) (2013b),
http://www.thesedonaconference.org/content/miscFiles/publications_html
- Webber, William, “Re-examining the Effectiveness of Manual Review,”
SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop,
Beijing, China (2011), http://www.umiacs.umd.edu/~oard/sire11/#Papers
Cases:
- da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24,
2012), approved and adopted, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012)
- Digicel (St. Lucia) Limited v. Cable & Wireless PLC (England and Wales
High Court (Chancery Division)), [2008] EWHC 2522 (Ch) (23 October
2008).
- EORHB v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012)
- Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al., 2012
WL 1431215 (Va. Cir. Ct. Apr. 23, 2012).
- In re Actos (Pioglitazone) Products, 2012 WL 3899669 (W.D. La. July
27, 2012)
- Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711
(N.D. Ill.) (Nolan, M.J.)
- United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008)
- Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008)
Organizing Committee
- Jason R. Baron, Drinker Biddle & Reath LLP; University of Maryland
- Jack G. Conrad, Thomson Reuters
- Amanda Jones, H5
- Dave Lewis, David D. Lewis Consulting
- Douglas W. Oard, University of Maryland
Doug Oard
Last modified: Wed Sep 9 12:03:18 2015