Leveling J.,Centre for Next Generation Localisation
ACM International Conference Proceeding Series | Year: 2013
This paper presents results for DCU's second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English and Hindi and the crosslingual English to Hindi subtasks. Compared to our experiments for FIRE 2011, our system was simplified by using a single retrieval engine (instead of three) and using a single approach for detection of out of domain queries (instead of three). In our approach, the SMS queries are transformed into a normalized, corrected form and submitted to a retrieval engine to obtain a ranked list of FAQ results. A classifier trained on features extracted from the training data then determines which queries are out of domain and which are not. For our crosslingual English to Hindi experiments, we trained a statistical machine translation system for Hindi to English translation to translate the full Hindi FAQ documents into English. The retrieval then operates on the corrected English input and retrieves results from the translated Hindi FAQ documents. Our best experiments achieved an MRR of 0.949 for the monolingual English subtask, 0.880 for the monolingual Hindi subtask, and 0.450 for the crosslingual subtask. © 2013 ACM.
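For readers unfamiliar with the metric reported above, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the rank at which the first correct result appears. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names and the toy FAQ identifiers are hypothetical, and a query with no correct result in its list is assumed to contribute 0.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked list, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical FAQ ids): correct answer at rank 1, at rank 2, and missing.
toy_runs = [
    (["faq1", "faq2", "faq3"], {"faq1"}),
    (["faq1", "faq2", "faq3"], {"faq2"}),
    (["faq1", "faq2", "faq3"], {"faq9"}),
]
toy_mrr = mean_reciprocal_rank(toy_runs)  # (1 + 1/2 + 0) / 3
```

Under this definition, an MRR close to 1 means the correct FAQ is usually ranked first.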
Larson M.,Technical University of Delft |
Jones G.J.F.,Centre for Next Generation Localisation
Foundations and Trends in Information Retrieval | Year: 2011
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR. © 2012 M. Larson and G. J. F. Jones.
Jones G.J.F.,Centre for Next Generation Localisation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013
Language and multimedia technology research often relies on large manually constructed datasets for training or evaluation of algorithms and systems. Constructing these datasets is often expensive, with significant challenges in terms of recruitment of personnel to carry out the work. Crowdsourcing methods using scalable pools of workers available on demand offer a flexible means of rapid low-cost construction of many of these datasets to support existing research requirements and potentially promote new research initiatives that would otherwise not be possible. © 2013 Springer-Verlag.
Morrissey S.,Centre for Next Generation Localisation
Machine Translation | Year: 2013
This article explores the application of data-driven machine translation (MT) to sign languages (SLs). The provision of an SL MT system can facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. In this paper we address data-driven SL MT predominantly for Irish SL (ISL) but also for German SL (DGS/Deutsche Gebärdensprache). We take two different purpose-built corpora to feed our MaTrEx MT system and in a set of experiments translating both to and from the SLs, we investigate the effects of SL data on statistical MT (SMT). Exploiting the bidirectionality of the MaTrEx system, we demonstrate how additional modules, such as recognition and SL animation, can potentially build a full SL MT model for spoken and SL communication in addition to promising evaluation scores. A secondary focus of the article is on the two main issues affecting SL MT, those of transcription and evaluation. We offer a discussion on both these common problems before concluding. © 2013 Springer Science+Business Media Dordrecht.
Deemter K.V.,University of Aberdeen |
Gatt A.,University of Tilburg |
Sluis I.V.D.,Centre for Next Generation Localisation |
Power R.,Open University Milton Keynes
Cognitive Science | Year: 2012
A substantial amount of recent work in natural language generation has focused on the generation of "one-shot" referring expressions whose only aim is to identify a target referent. Dale and Reiter's Incremental Algorithm (IA) is often thought to be the best algorithm for maximizing the similarity to referring expressions produced by people. We test this hypothesis by eliciting referring expressions from human subjects and computing the similarity between the expressions elicited and the ones generated by algorithms. It turns out that the success of the IA depends substantially on the "preference order" (PO) employed by the IA, particularly in complex domains. While some POs cause the IA to produce referring expressions that are very similar to expressions produced by human subjects, others cause the IA to perform worse than its main competitors; moreover, it turns out to be difficult to predict the success of a PO on the basis of existing psycholinguistic findings or frequencies in corpora. We also examine the computational complexity of the algorithms in question and argue that there are no compelling reasons for preferring the IA over some of its main competitors on these grounds. We conclude that future research on the generation of referring expressions should explore alternatives to the IA, focusing on algorithms, inspired by the Greedy Algorithm, which do not work with a fixed PO. © 2011 Cognitive Science Society, Inc.
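To illustrate why the preference order matters, here is a simplified sketch of the core loop of the Incremental Algorithm: attributes are tried in a fixed order, and an attribute is kept only if the target's value for it rules out at least one remaining distractor. This is an assumption-laden illustration, not the implementation evaluated in the article; it omits details of the published algorithm (such as special treatment of the type attribute), and the function name and the toy domain are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: Sequence[Entity],
    preference_order: Sequence[str],
) -> Optional[Dict[str, str]]:
    """Simplified IA: return attribute-value pairs that single out the target,
    or None if the listed attributes cannot exclude every distractor."""
    description: Dict[str, str] = {}
    remaining: List[Entity] = list(distractors)
    for attribute in preference_order:
        if not remaining:
            break  # the target is already uniquely identified
        value = target.get(attribute)
        if value is None:
            continue  # the target has no value for this attribute
        survivors = [d for d in remaining if d.get(attribute) == value]
        if len(survivors) < len(remaining):
            # The attribute rules out at least one distractor, so keep it.
            description[attribute] = value
            remaining = survivors
    return description if not remaining else None


# Hypothetical toy domain: one red large chair among two blue chairs.
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "small"},
    {"type": "chair", "colour": "blue", "size": "large"},
]

colour_first = incremental_algorithm(target, distractors, ["colour", "size"])
size_first = incremental_algorithm(target, distractors, ["size", "colour"])
```

In this toy domain, trying colour first yields a description with colour alone, whereas trying size first yields both size and colour, because size is added before colour is considered and is never retracted. The same target thus receives different descriptions under different preference orders.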