In today’s global world, people may need access to information that
only appears online in a language they do not speak. Cross-Language
Information Retrieval (CLIR) enables end users to issue queries in
their own language, but provides results from multiple languages
around the world, often using translation so that the end user can
quickly understand whether the retrieved results are relevant.
Cross-lingual summarization can make it easier for an end user to
determine if a document is relevant by providing a summary in English
of the foreign language document, highlighting the evidence for
relevance. When the foreign language is a low-resource language,
cross-lingual search and summarization are more difficult; translation
capabilities may be poor and the lack of resources makes it difficult
to train CLIR and summarization systems. To complicate matters even
more, when the collection contains speech as well as text, producing
accurate search results and generating interpretable summaries is even
more challenging.
This workshop aims to stimulate the collection and provision of resources that can improve systems for cross-lingual search and summarization. To facilitate dissemination of information about existing resources, the workshop will feature keynote speeches and panels by people who have worked in this area, have cross-lingual resources to share, or can describe ongoing research programs and shared tasks. In addition, we will issue a call for papers that describe recent and current research in these areas, that describe relevant resources, or that stake out positions on the directions in which the authors think the field should move.
The motivation of the workshop is to stimulate the sharing of resources for the tasks of cross-lingual search and summarization over low-resource languages. The lack of such resources hinders research on the development of such systems. While there have been workshops on multi-lingual summarization, the languages addressed have been quite limited, with a focus on English-Chinese. Much of the summarization field now focuses on neural network approaches, which require large amounts of data. While such data has been made available for English news and a few other genres, large-scale resources for cross-lingual summarization are virtually non-existent.
Evaluation poses particular challenges for CLIR from low-resource languages because representative and redistributable digital text or speech can be difficult to obtain in the needed quantities, performing relevance judgments requires specialized linguistic expertise, and the resulting costs may be amortized across fewer research uses than for high-resource languages.
Thus, there is a pressing need for the development, sharing, and use of affordable cross-lingual resources in this space. To set the stage, the organizers will provide two small spoken language test collections that include waveforms, transcriptions, queries, and relevance judgments. Both are conversational in genre, one in Somali (a very low-resource language) and the other in Bulgarian (a moderate-resource language). We welcome papers that report results on these test collections as well as on any datasets available from ELDA, LDC, or other repositories. Participants are also free to describe other datasets that they have access to and to report results on these.
We welcome papers on research that broadly relates to supporting information access to lower-resource languages. It may include, but is not limited to: