LREC 2020 Workshop on
Cross-Language Search and Summarization of Text and Speech
Originally Scheduled for May 16, 2020
Palais du Pharo, Marseilles, France
LREC has announced that the conference is cancelled. Reviewing for this workshop will continue, and the proceedings will be published.
Purpose
In today’s global world, people may need access to information that
only appears online in a language they do not speak. Cross-Language
Information Retrieval (CLIR) enables end users to issue queries in
their own language, but provides results from multiple languages
around the world, often using translation so that the end user can
quickly understand whether the retrieved results are relevant.
Cross-lingual summarization can make it easier for an end user to
determine if a document is relevant by providing a summary in English
of the foreign language document, highlighting the evidence for
relevance. When the foreign language is a low-resource language,
cross-lingual search and summarization are more difficult; translation
capabilities may be poor and the lack of resources makes it difficult
to train CLIR and summarization systems. When the collection contains
speech as well as text, producing accurate search results and
generating interpretable summaries is harder still.
This workshop aims to stimulate collection and provision of resources
that can improve systems that perform cross-lingual search and
summarization. To facilitate dissemination of information about
existing resources, the workshop will feature keynote speeches and
panels by people who have worked in this area, have cross-lingual
resources to share, or can describe ongoing research programs and
shared tasks. In addition, we will issue a call for papers that
describe recent and current research in these areas, that describe
relevant resources, or that stake out positions on the directions in
which the authors think the field should move.
The motivation of the workshop is to stimulate the sharing of
resources for the tasks of cross-lingual search and summarization over
low-resource languages. The lack of such resources hinders research
that focuses on development of such systems. While there have been
workshops on multi-lingual summarization, the languages addressed have
been quite limited, with a focus on English-Chinese. Much of the
summarization field now focuses on neural network approaches, which
require large amounts of data. While such data has been made available
for English news and a few other genres, large scale resources for
cross-lingual summarization are virtually non-existent.
Evaluation poses particular challenges for CLIR from low-resource
languages because representative and redistributable digital text or
speech can be difficult to obtain in the needed quantities, performing
relevance judgments requires specialized linguistic expertise, and the
resulting costs may be amortized across fewer research uses than for
high-resource languages.
Thus, there is a huge need for the development, sharing and use of
affordable cross-lingual resources in this space. To set the stage,
the organizers will provide two small spoken language test collections
that include waveforms, transcriptions, queries, and relevance
judgments. These collections cover conversational genres, one in
Somali (a very-low-resource language) and the other in Bulgarian (a
moderate-resource language). We welcome papers that provide results on
these test collections as well as on any datasets that are available
from ELDA,
LDC, or other repositories. Participants are also free to describe
other datasets that they have access to and to report results on
these.
We welcome papers on research that broadly relates to supporting
information access to lower-resource languages. Topics may include,
but are not limited to:
- Test collections for evaluating CLIR
- Development of new cross-lingual resources
- Datasets for cross-lingual summarization
- Methods for CLIR
- CLIR over speech
- Evidence generation for CLIR
- Methods for cross-lingual summarization
- Methods for cross-lingual query-focused summarization
- Snippet generation
- Speech summarization
- Multilingual language generation
- Zero-shot learning and domain adaptation
- Explainable methods for cross-lingual NLP
Data
IARPA has released some data in support of the workshop; the data is
described in a README file. To obtain this data, follow the
instructions below:
- Download the data agreement. Fill out all the information, sign it,
and return it to material_poc@nist.gov.
- Once the information is received, NIST will give access to a
Google Drive that contains the data. To be added to the Google Drive,
the email address given in the data agreement must be associated with
a Google account, but it does not have to be a Gmail address.
- Contact material_poc@nist.gov if you have any questions or
issues.
Accepted Papers
The Proceedings are now available as a PDF file.
- Piyush Arora, Dimitar Shterionov, Yasufumi Moriya, Abhishek
Kaushik, Daria Dzendzik and Gareth Jones, An Investigative Study of
Multi-Modal Cross-Lingual Retrieval
- Joel Barry, Elizabeth Boschee, Marjorie Freedman and Scott
Miller, SEARCHER: Shared Embedding Architecture for Effective
Retrieval
- Petra Galuscakova, Douglas Oard, Joe Barrow, Suraj Nair, Shing
Han-Chin, Elena Zotkina, Ramy Eskander and Rui Zhang, MATERIALizing
Cross-Language Information Retrieval: A Snapshot
- Zhuolin Jiang, Amro Jaroudi, William Hartmann, Damianos Karakos
and Lingjun Zhao, Cross-lingual Information Retrieval with BERT
- Damianos Karakos, Rabih Zbib, William Hartmann, Richard Schwartz
and John Makhoul, Reformulating Information Retrieval from Speech and
Text as a Detection Problem
- Carl Rubino, The Effect of Linguistic Parameters in CLIR
Performance
- Richard Schwartz, John Makhoul, Lee Tarlin and Damianos Karakos,
What Set of Documents to Present to an Analyst?
- David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan and
Peter Bell, Subtitles to Segmentation: Improving Low-Resource
Speech-to-Text Translation Pipelines
- Ilya Zavorin, Aric Bills, Cassian Corey, Michelle Morrison,
Audrey Tong and Richard Tong, Corpora for Cross-Language Information
Retrieval in Six Less-Resourced Languages
- Le Zhang, Damianos Karakos, William Hartmann, Manaj Srivastava,
Lee Tarlin, David Akodes, Sanjay Krishna Gouda, Numra Bathool, Lingjun
Zhao, Zhuolin Jiang, Richard Schwartz and John Makhoul, A
Cross-lingual Information Retrieval System for Low-Resource Languages
- Elaine Zosa, Mark Granroth-Wilding and Lidia Pivovarova, A
Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document
Retrieval
Important Dates
- February 28, 2020 (extended!): Paper submissions due
- March 23, 2020: Notification
- April 13, 2020: Camera ready paper deadline
Submissions
Blind submissions can be made using the START Conference Management Tool by February 28, 2020 (Anywhere on Earth).
LaTeX, Word and Open Office templates for submissions are available.
Organizing Committee
- James Allan, University of Massachusetts at Amherst (USA)
- Lu Wang, Northeastern University (USA)
- Kathy McKeown, Columbia University (USA)
- Douglas W. Oard, University of Maryland (USA)
- Steve Renals, University of Edinburgh (UK)
- Richard Schwartz, Raytheon BBN Technologies (USA)
Program Committee
- Eneko Agirre, University of the Basque Country (Spain)
- Piyush Arora, American Express Big Data Labs (India)
- Mohit Bansal, University of North Carolina (USA)
- Nicola Ferro, University of Padua (Italy)
- Petra Galuscakova, University of Maryland (USA)
- Jan Hajic, Charles University (Czech Republic)
- Gareth Jones, Dublin City University (Ireland)
- Damianos Karakos, Raytheon BBN Technologies (USA)
- Jonathan May, University of Southern California Information Sciences Institute (USA)
- Jessica Ouyang, University of Texas at Dallas (USA)
- Pavel Pecina, Charles University (Czech Republic)
- Kay Peterson, NIST (USA)
- Dragomir Radev, Yale University (USA)
- Hussein Suleman, University of Cape Town (South Africa)
- Audrey Tong, NIST (USA)
- Xabier Saralegi Urizar, Elhuyar Foundation (Spain)
- Ilya Zavorin, Bluemont Technology (USA)
- Rui Zhang, Yale University (USA)
Doug Oard
Last modified: Sun May 17 16:22:16 2020