Centre for Next Generation Localisation

Dublin, Ireland

Deemter K.V.,University of Aberdeen | Gatt A.,University of Tilburg | Sluis I.V.D.,Centre for Next Generation Localisation | Power R.,Open University Milton Keynes
Cognitive Science | Year: 2012

A substantial amount of recent work in natural language generation has focused on the generation of "one-shot" referring expressions whose only aim is to identify a target referent. Dale and Reiter's Incremental Algorithm (IA) is often thought to be the best algorithm for maximizing the similarity to referring expressions produced by people. We test this hypothesis by eliciting referring expressions from human subjects and computing the similarity between the expressions elicited and the ones generated by algorithms. It turns out that the success of the IA depends substantially on the "preference order" (PO) employed by the IA, particularly in complex domains. While some POs cause the IA to produce referring expressions that are very similar to expressions produced by human subjects, others cause the IA to perform worse than its main competitors; moreover, it turns out to be difficult to predict the success of a PO on the basis of existing psycholinguistic findings or frequencies in corpora. We also examine the computational complexity of the algorithms in question and argue that there are no compelling reasons for preferring the IA over some of its main competitors on these grounds. We conclude that future research on the generation of referring expressions should explore alternatives to the IA, focusing on algorithms, inspired by the Greedy Algorithm, which do not work with a fixed PO. © 2011 Cognitive Science Society, Inc.
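
To illustrate why the preference order matters, here is a minimal sketch of an incremental attribute-selection procedure in the spirit of the IA. It is a simplification made for illustration, not the implementation evaluated in the paper: the function name, the dictionary representation of objects, and the toy domain are all assumptions, and refinements of the published algorithm (such as special handling of the type attribute) are left out.

```python
def incremental_algorithm(target, distractors, preference_order):
    """Select attributes of `target` in a fixed preference order.

    An attribute is added to the description only if it rules out at
    least one distractor that is still in play. Returns the description
    and a flag saying whether the target was uniquely identified.
    Objects are plain dicts (a hypothetical representation).
    """
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        if not remaining:
            break  # target already distinguished from every distractor
        value = target.get(attr)
        if value is None:
            continue  # target has no value for this attribute
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    return description, not remaining


# Toy domain (invented for this sketch).
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "large"},
    {"type": "table", "colour": "red", "size": "small"},
]

colour_first = incremental_algorithm(target, distractors, ["colour", "size", "type"])
type_first = incremental_algorithm(target, distractors, ["type", "colour", "size"])
print(colour_first)  # ({'colour': 'red', 'size': 'large'}, True)
print(type_first)    # ({'type': 'chair', 'colour': 'red'}, True)
```

Both orders identify the target, but they yield different descriptions, which is the kind of sensitivity to the PO that the abstract describes.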

Leveling J.,Centre for Next Generation Localisation
ACM International Conference Proceeding Series | Year: 2013

This paper presents results for DCU's second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English and Hindi and the crosslingual English to Hindi subtasks. Compared to our experiments for FIRE 2011, our system was simplified by using a single retrieval engine (instead of three) and using a single approach for detection of out of domain queries (instead of three). In our approach, the SMS queries are transformed into a normalized, corrected form and submitted to a retrieval engine to obtain a ranked list of FAQ results. A classifier trained on features extracted from the training data then determines which queries are out of domain and which are not. For our crosslingual English to Hindi experiments, we trained a statistical machine translation system for Hindi to English translation to translate the full Hindi FAQ documents into English. The retrieval then operates on the corrected English input and retrieves results from the translated Hindi FAQ documents. Our best experiments achieved an MRR of 0.949 for the monolingual English subtask, 0.880 for the monolingual Hindi subtask, and 0.450 for the crosslingual subtask. © 2013 ACM.
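
The MRR figures reported above are mean reciprocal ranks: for each query, the reciprocal of the rank of the first correct result, averaged over all queries. The sketch below shows the standard calculation on invented data; the function name, the input layout, and the example queries are assumptions for illustration and are not taken from the paper or from the FIRE evaluation tooling.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR over a set of queries.

    ranked_results: dict mapping query id -> ranked list of result ids
    relevant:       dict mapping query id -> set of correct result ids
    A query with no correct result in its list contributes 0.
    """
    total = 0.0
    for query_id, results in ranked_results.items():
        for rank, result_id in enumerate(results, start=1):
            if result_id in relevant[query_id]:
                total += 1.0 / rank
                break  # only the first correct result counts
    return total / len(ranked_results)


# Invented example: correct answer at rank 1, at rank 2, and missing.
ranked_results = {
    "q1": ["faq_a", "faq_b"],
    "q2": ["faq_x", "faq_y", "faq_z"],
    "q3": ["faq_m", "faq_n"],
}
relevant = {"q1": {"faq_a"}, "q2": {"faq_y"}, "q3": {"faq_k"}}

mrr = mean_reciprocal_rank(ranked_results, relevant)
print(mrr)  # 0.5, i.e. (1 + 1/2 + 0) / 3
```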

Pahl C.,Centre for Next Generation Localisation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

The essence of cloud computing is the provision of software and hardware services to a range of users in different locations. The aim of cloud service localisation is to facilitate the internationalisation and localisation of cloud services by allowing their adaptation to different locales. We address lingual localisation by providing service-level language translation techniques to adapt services to different languages, and regulatory localisation by providing standards-based mappings to achieve regulatory compliance with regionally varying laws, standards and regulations. The aim is to support and enforce the explicit modelling of aspects particularly relevant to localisation, and to provide runtime support consisting of tools and middleware services that automate deployment based on models of locales, driven by the two localisation dimensions. We focus here on an ontology-based conceptual information model that integrates locale specification in a coherent way. © 2012 Springer-Verlag.

Ghorab M.R.,Centre for Next Generation Localisation | Zhou D.,Centre for Next Generation Localisation | O'Connor A.,Centre for Next Generation Localisation | Wade V.,Centre for Next Generation Localisation
User Modelling and User-Adapted Interaction | Year: 2013

Information Retrieval (IR) systems assist users in finding information from the myriad of information resources available on the Web. A traditional characteristic of IR systems is that if different users submit the same query, the system would yield the same list of results, regardless of the user. Personalised Information Retrieval (PIR) systems take a step further to better satisfy the user's specific information needs by providing search results that are not only of relevance to the query but are also of particular relevance to the user who submitted the query. PIR has thereby attracted increasing research and commercial attention as information portals aim at achieving user loyalty by improving their performance in terms of effectiveness and user satisfaction. In order to provide a personalised service, a PIR system maintains information about the users and the history of their interactions with the system. This information is then used to adapt the users' queries or the results so that information that is more relevant to the users is retrieved and presented. This survey paper features a critical review of PIR systems, with a focus on personalised search. The survey provides an insight into the stages involved in building and evaluating PIR systems, namely: information gathering, information representation, personalisation execution, and system evaluation. Moreover, the survey provides an analysis of PIR systems with respect to the scope of personalisation addressed. The survey proposes a classification of PIR systems into three scopes: individualised systems, community-based systems, and aggregate-level systems. Based on the conducted survey, the paper concludes by highlighting challenges and future research directions in the field of PIR. © 2012 Springer Science+Business Media B.V.

Zhou D.,Centre for Next Generation Localisation | Lawless S.,Centre for Next Generation Localisation | Wade V.,Centre for Next Generation Localisation
Information Retrieval | Year: 2012

Social tagging systems have gained increasing popularity as a method of annotating and categorizing a wide range of different web resources. Web search that utilizes social tagging data suffers from an extreme example of the vocabulary mismatch problem encountered in traditional information retrieval (IR). This is due to the personalized, unrestricted vocabulary that users choose to describe and tag each resource. Previous research has proposed the utilization of query expansion to deal with search in this rather complicated space. However, non-personalized approaches based on relevance feedback and personalized approaches based on co-occurrence statistics only showed limited improvements. This paper proposes a novel query expansion framework based on individual user profiles mined from the annotations and resources the user has marked. The underlying theory is to regularize the smoothness of word associations over a connected graph using a regularizer function on terms extracted from top-ranked documents. The intuition behind the model is the prior assumption of term consistency: the most appropriate expansion terms for a query are likely to be associated with, and influenced by terms extracted from the documents ranked highly for the initial query. The framework also simultaneously incorporates annotations and web documents through a Tag-Topic model in a latent graph. The experimental results suggest that the proposed personalized query expansion method can produce better results than both the classical non-personalized search approach and other personalized query expansion methods. Hence, the proposed approach significantly benefits personalized web search by leveraging users' social media data. © 2012 Springer Science+Business Media, LLC.

Sah M.,Centre for Next Generation Localisation | Wade V.,Centre for Next Generation Localisation
HT 2011 - Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia | Year: 2011

Personalized search and browsing is increasingly vital, especially for enterprises that need to reach their customers. A key challenge in supporting personalization is the need for rich metadata, such as cognitive metadata about documents. Given the size of large knowledge bases, manual annotation is neither scalable nor feasible. On the other hand, automatic mining of cognitive metadata is challenging, since it is very difficult to understand the underlying intellectual knowledge about documents automatically. To alleviate this problem, we introduce a novel metadata extraction framework, based on fuzzy information granulation and a fuzzy inference system, for automatic cognitive metadata mining. The user evaluation study shows that our approach provides reasonable precision rates for difficulty, interactivity type, and interactivity level on the 100 documents examined. In addition, the proposed fuzzy inference system achieves improved results compared to a rule-based reasoner for document difficulty metadata extraction (11% improvement). © 2011 ACM.

Larson M.,Technical University of Delft | Jones G.J.F.,Centre for Next Generation Localisation
Foundations and Trends in Information Retrieval | Year: 2011

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR. © 2012 M. Larson and G. J. F. Jones.

Jones G.J.F.,Centre for Next Generation Localisation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Language and multimedia technology research often relies on large manually constructed datasets for training or evaluation of algorithms and systems. Constructing these datasets is often expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods using scalable pools of workers available on demand offer a flexible means of rapid, low-cost construction of many of these datasets to support existing research requirements and potentially promote new research initiatives that would otherwise not be possible. © 2013 Springer-Verlag.

Doherty S.,Centre for Next Generation Localisation | O'Brien S.,Centre for Next Generation Localisation
International Journal of Human-Computer Interaction | Year: 2014

This article reports on the results of a project that aimed to investigate the usability of raw machine translated technical support documentation for a commercial online file storage service. Adopting a user-centered approach, the ISO/TR 16982 definition of usability (goal completion, satisfaction, effectiveness, and efficiency) is utilized, and eye-tracking measures that are shown to be reliable indicators of cognitive effort are applied along with a post-task questionnaire. The study investigated these measures for the original user documentation written in English and in four target languages: Spanish, French, German, and Japanese, all of which were translated using a freely available online statistical machine translation engine. Using native speakers for each language, the study found several significant differences between the source and MT output, a finding that indicates a difference in usability between well-formed content and raw machine translated content. One target language in particular, Japanese, was found to have a considerably lower usability level when compared with the original English. © 2014 Copyright Taylor and Francis Group, LLC.
