LREC 2020 Workshop on
Cross-Language Search and Summarization of Text and Speech

Originally Scheduled for May 16, 2020
Palais du Pharo, Marseilles, France

LREC has announced that the conference is cancelled. Reviewing for this workshop will continue, and the proceedings will be published.


Purpose

In today’s globalized world, people may need information that appears online only in a language they do not speak. Cross-Language Information Retrieval (CLIR) enables end users to issue queries in their own language and retrieve results from many of the world's languages, often with translation so that the user can quickly judge whether the retrieved results are relevant. Cross-lingual summarization can make that judgment easier by providing an English summary of a foreign-language document that highlights the evidence for relevance. When the foreign language is a low-resource language, cross-lingual search and summarization are harder: translation quality may be poor, and the lack of resources makes it difficult to train CLIR and summarization systems. When the collection contains speech as well as text, producing accurate search results and generating interpretable summaries becomes more difficult still.

This workshop aims to stimulate the collection and provision of resources that can improve cross-lingual search and summarization systems. To help disseminate information about existing resources, the workshop will feature keynote speeches and panels by people who have worked in this area, have cross-lingual resources to share, or can describe ongoing research programs and shared tasks. In addition, we solicit papers that describe recent and current research in these areas, that describe relevant resources, or that stake out positions on the directions in which the authors think the field should move.

The motivation for the workshop is to stimulate the sharing of resources for cross-lingual search and summarization over low-resource languages, since the lack of such resources hinders the development of these systems. While there have been workshops on multilingual summarization, the languages addressed have been quite limited, with a focus on English-Chinese. Much of the summarization field now focuses on neural approaches, which require large amounts of training data. While such data has been made available for English news and a few other genres, large-scale resources for cross-lingual summarization are virtually non-existent.

Evaluation poses particular challenges for CLIR from low-resource languages: representative and redistributable digital text or speech can be difficult to obtain in the needed quantities, relevance judgments require specialized linguistic expertise, and the resulting costs may be amortized across fewer research uses than for high-resource languages.

Thus, there is a pressing need for the development, sharing, and use of affordable cross-lingual resources in this space. To set the stage, the organizers will provide two small spoken-language test collections that include waveforms, transcriptions, queries, and relevance judgments. Both consist of conversational speech, one in Somali (a very low-resource language) and the other in Bulgarian (a moderate-resource language). We welcome papers that report results on these test collections, as well as on any datasets available from ELDA, LDC, or other repositories. Participants are also free to describe other datasets to which they have access and to report results on them.

We welcome papers on research that broadly relates to supporting information access for lower-resource languages. Topics include, but are not limited to:

Test collections for evaluating CLIR
Development of new cross-lingual resources
Datasets for cross-lingual summarization
Methods for CLIR
CLIR over speech
Evidence generation for CLIR
Methods for cross-lingual summarization
Methods for cross-lingual query-focused summarization
Snippet generation
Speech summarization
Multilingual language generation
Zero-shot learning and domain adaptation
Explainable methods for cross-lingual NLP

Data

IARPA has released data in support of the workshop; the data is described in an accompanying README file. To obtain it, follow the instructions below:

Download the data agreement, fill in all the requested information, sign it, and return it to material_poc@nist.gov.
Once the agreement is received, NIST will give you access to a Google Drive folder that contains the data. To be added to the Google Drive folder, the email address given in the data agreement must be associated with a Google account, but it need not be a Gmail address.
Contact material_poc@nist.gov if you have any questions or issues.

Accepted Papers

The proceedings are now available as a PDF file.

Piyush Arora, Dimitar Shterionov, Yasufumi Moriya, Abhishek Kaushik, Daria Dzendzik and Gareth Jones, An Investigative Study of Multi-Modal Cross-Lingual Retrieval
Joel Barry, Elizabeth Boschee, Marjorie Freedman and Scott Miller, SEARCHER: Shared Embedding Architecture for Effective Retrieval
Petra Galuscakova, Douglas Oard, Joe Barrow, Suraj Nair, Shing Han-Chin, Elena Zotkina, Ramy Eskander and Rui Zhang, MATERIALizing Cross-Language Information Retrieval: A Snapshot
Zhuolin Jiang, Amro Jaroudi, William Hartmann, Damianos Karakos and Lingjun Zhao, Cross-lingual Information Retrieval with BERT
Damianos Karakos, Rabih Zbib, William Hartmann, Richard Schwartz and John Makhoul, Reformulating Information Retrieval from Speech and Text as a Detection Problem
Carl Rubino, The Effect of Linguistic Parameters in CLIR Performance
Richard Schwartz, John Makhoul, Lee Tarlin and Damianos Karakos, What Set of Documents to Present to an Analyst?
David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan and Peter Bell, Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines
Ilya Zavorin, Aric Bills, Cassian Corey, Michelle Morrison, Audrey Tong and Richard Tong, Corpora for Cross-Language Information Retrieval in Six Less-Resourced Languages
Le Zhang, Damianos Karakos, William Hartmann, Manaj Srivastava, Lee Tarlin, David Akodes, Sanjay Krishna Gouda, Numra Bathool, Lingjun Zhao, Zhuolin Jiang, Richard Schwartz and John Makhoul, A Cross-lingual Information Retrieval System for Low-Resource Languages
Elaine Zosa, Mark Granroth-Wilding and Lidia Pivovarova, A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Important Dates

February 28, 2020 (extended!): Paper submissions due
March 23, 2020: Notification
April 13, 2020: Camera ready paper deadline

Submissions

Blind submissions can be made using the START Conference Management Tool by February 28, 2020 (Anywhere on Earth).
LaTeX, Word and Open Office templates for submissions are available.

Organizing Committee

James Allan, University of Massachusetts at Amherst (USA)
Lu Wang, Northeastern University (USA)
Kathy McKeown, Columbia University (USA)
Douglas W. Oard, University of Maryland (USA)
Steve Renals, University of Edinburgh (UK)
Richard Schwartz, Raytheon BBN Technologies (USA)

Program Committee

Eneko Agirre, University of the Basque Country (Spain)
Piyush Arora, American Express Big Data Labs (India)
Mohit Bansal, University of North Carolina (USA)
Nicola Ferro, University of Padua (Italy)
Petra Galuscakova, University of Maryland (USA)
Jan Hajic, Charles University (Czech Republic)
Gareth Jones, Dublin City University (Ireland)
Damianos Karakos, Raytheon BBN Technologies (USA)
Jonathan May, University of Southern California Information Sciences Institute (USA)
Jessica Ouyang, University of Texas at Dallas (USA)
Pavel Pecina, Charles University (Czech Republic)
Kay Peterson, NIST (USA)
Dragomir Radev, Yale University (USA)
Hussein Suleman, University of Cape Town (South Africa)
Audrey Tong, NIST (USA)
Xabier Saralegi Urizar, Elhuyar Foundation (Spain)
Ilya Zavorin, Bluemont Technology (USA)
Rui Zhang, Yale University (USA)

Doug Oard

Last modified: Sun May 17 16:22:16 2020
