LREC Workshop on
Cross-Language Search and Summarization of Text and Speech

May 16, 2020
Palais du Pharo, Marseilles, France

Purpose | Important Dates | Data | Submissions | Program | Registration | Organizing Committee | Program Committee

Purpose

In today’s global world, people may need access to information that only appears online in a language they do not speak. Cross-Language Information Retrieval (CLIR) enables end users to issue queries in their own language but provides results from multiple languages around the world, often using translation so that the end user can quickly understand whether the retrieved results are relevant. Cross-lingual summarization can make it easier for an end user to determine if a document is relevant by providing a summary in English of the foreign-language document, highlighting the evidence for relevance. When the foreign language is a low-resource language, cross-lingual search and summarization are more difficult; translation capabilities may be poor and the lack of resources makes it difficult to train CLIR and summarization systems. To complicate matters further, when the collection contains speech as well as text, producing accurate search results and generating interpretable summaries is harder still.

This workshop aims to stimulate the collection and provision of resources that can improve systems that perform cross-lingual search and summarization. To facilitate dissemination of information about existing resources, the workshop will feature keynote speeches and panels by people who have worked in this area, have cross-lingual resources to share, or can describe ongoing research programs and shared tasks. In addition, we will have a call that solicits papers that describe recent and current research in these areas, that describe relevant resources, or that stake out positions on the directions in which the authors think the field should move.

The motivation of the workshop is to stimulate the sharing of resources for the tasks of cross-lingual search and summarization over low-resource languages. The lack of such resources hinders research that focuses on development of such systems. While there have been workshops on multilingual summarization, the languages addressed have been quite limited, with a focus on English-Chinese. Much of the summarization field now focuses on neural net approaches, which require large amounts of data. While such data has been made available for English news and a few other genres, large-scale resources for cross-lingual summarization are virtually non-existent.

Evaluation poses particular challenges for CLIR from low-resource languages because representative and redistributable digital text or speech can be difficult to obtain in the needed quantities, performing relevance judgments requires specialized linguistic expertise, and the resulting costs may be amortized across fewer research uses than for high-resource languages.

Thus, there is a huge need for the development, sharing, and use of affordable cross-lingual resources in this space. To set the stage, the organizers will provide two small spoken language test collections that include waveforms, transcriptions, queries, and relevance judgments. These are conversational genres, one in Somali (a very low-resource language) and the other in Bulgarian (a moderate-resource language). We will welcome papers that provide results on these test collections as well as on any datasets that are available from ELDA, LDC, or other repositories. Participants are also free to describe other datasets that they have access to and to report results on these.

We welcome papers on research that broadly relates to supporting information access to lower-resource languages. It may include, but is not limited to:

Important Dates

Data

IARPA has released some data in support of the workshop. To obtain this data, follow the instructions below:

Submissions

LaTeX, Word and Open Office templates for submissions are now available.

Program

TBA

Registration

Participants should register on the LREC Web site.

Organizing Committee

Program Committee


Doug Oard
Last modified: Sat Jan 18 20:21:56 2020