Sanchez-Cartagena V.M.,Prompsit Language Engineering |
Perez-Ortiz J.A.,University of Alicante |
Sanchez-Martinez F.,University of Alicante
Journal of Artificial Intelligence Research | Year: 2016
We describe a hybridisation strategy that integrates linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the RBMT system uses its linguistic resources to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat the RBMT system as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing ones, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. Combining this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm yields significantly higher translation quality than a baseline PBSMT system; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by systems built with hand-crafted rules. © 2016 AI Access Foundation. All rights reserved.
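As a rough illustration of the phrase-table enrichment described in this abstract, the following minimal Python sketch adds dictionary-derived phrase pairs to corpus-extracted ones and marks their origin. It is not the authors' implementation: the function names, the single-token dictionary lookup, the relative-frequency scoring and the `from_rbmt` flag are all assumptions made for illustration, and transfer rules are left out entirely.

```python
from collections import Counter


def enrich_phrase_pairs(corpus_pairs, rbmt_dictionary, source_sentences):
    """Append RBMT dictionary matches to the corpus-extracted phrase pairs.

    Assumption: the dictionary maps single source tokens to one target
    token; a real shallow-transfer system also applies multi-word rules.
    """
    pairs = list(corpus_pairs)
    rbmt_origin = set()
    for sentence in source_sentences:
        for token in sentence.split():
            if token in rbmt_dictionary:
                pair = (token, rbmt_dictionary[token])
                pairs.append(pair)
                rbmt_origin.add(pair)
    return pairs, rbmt_origin


def score_phrase_table(pairs, rbmt_origin):
    """Relative-frequency estimate of p(target | source) plus an origin flag."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return {
        (src, tgt): {
            "p_tgt_given_src": count / source_counts[src],
            "from_rbmt": (src, tgt) in rbmt_origin,  # hypothetical binary feature
        }
        for (src, tgt), count in pair_counts.items()
    }
```

Only tokens that occur in the sentences to be translated trigger a dictionary lookup, which loosely mirrors the idea of generating phrase pairs from the segments the RBMT system would actually use rather than dumping the whole dictionary into the table.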
Forcada M.L.,University of Alicante |
Ginesti-Rosell M.,University of Alicante |
Nordfalk J.,Copenhagen University |
O'Regan J.,Eolaistriu Technologies |
And 5 more authors.
Machine Translation | Year: 2011
Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially in those cases (mainly with related-language pairs) where shallow transfer suffices to produce good-quality translations, although it has also proven useful in assimilation scenarios involving more distant pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. The present limitations of the platform and the challenges posed for the coming years are also discussed. Finally, evaluation results for some of the most active language pairs are presented. An appendix describes Apertium as a free/open-source project. © 2011 Springer Science+Business Media B.V.
Duran J.,University of Barcelona |
Villarejo L.,University of Barcelona |
Farrus M.,University of Barcelona |
Ortiz S.,Prompsit Language Engineering |
Ramirez G.,Prompsit Language Engineering
Studies in Computational Intelligence | Year: 2013
The Universitat Oberta de Catalunya (Open University of Catalonia, UOC) is a public university based in Barcelona. The UOC is characterised by three main factors: (a) it is a virtual university based on an e-learning model, (b) it is located in a strongly bilingual Spanish-Catalan region, and (c) its students come from around the world, so linguistic and cultural diversity is a crucial factor. Within this context, it is essential to meet the UOC's linguistic needs while taking its particular characteristics into account. One of the tools created to this end is an adaptation of Apertium, a free/open-source rule-based machine translation platform, available at http://apertium.uoc.edu/ and customised to the translation needs of the institution in order to offer the best possible service to its user community. To continue adapting and adding value to the existing tool for generalisable large-scale applications, the UOC's translation system has recently implemented a semantic filter based on subject fields, aimed at improving translation quality and at better fitting the university's needs. The paper explains all the steps of this adaptive process and presents a demonstration of the resulting tool: (a) the choice of the subject fields according to the university's studies, (b) the design and implementation of the dictionaries used to extract the information required to filter and disambiguate homonymous and polysemous terms, including source code in the dictionaries, and (c) the design and implementation of the corresponding web interface. © Springer-Verlag Berlin Heidelberg 2013.
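The subject-field filter described in this abstract can be pictured with a small Python sketch: a polysemous term gets the translation registered for the selected subject field, and falls back to a general translation otherwise. The function name, the dictionary layout, the field labels and the example entries are all hypothetical; the abstract does not specify how the UOC dictionaries are actually encoded.

```python
def translate_term(term, subject_field, field_dictionary, default_dictionary):
    """Pick a translation for `term`, preferring the chosen subject field.

    field_dictionary:   term -> {subject field -> translation}  (hypothetical layout)
    default_dictionary: term -> general-purpose translation
    Unknown terms are returned unchanged.
    """
    by_field = field_dictionary.get(term, {})
    if subject_field in by_field:
        return by_field[subject_field]
    return default_dictionary.get(term, term)


# Made-up example entries, for illustration only.
FIELD_DICTIONARY = {
    "banco": {"economics": "bank", "architecture": "bench"},
}
DEFAULT_DICTIONARY = {"banco": "bank"}
```

Keeping the field-specific entries separate from the general dictionary means that a text with no subject field selected is translated exactly as before, so the filter only changes the output where a field-specific sense has been registered.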