
Salway A.,Dublin City University | Kelly L.,Dublin City University | Skadina I.,Tilde | Jones G.J.F.,Dublin City University
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2010

A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful, partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two rather different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information; for example, the partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for their utility in enhancing image captions. © 2010 Springer-Verlag Berlin Heidelberg.
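
To illustrate how a partially structured fact can drive template-based text generation, here is a minimal sketch. The `Fact` fields (`entity`, `fact_type`, `filler`), the template strings and the fallback pattern are hypothetical. The abstract does not specify the paper's actual fact representation or templates.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Hypothetical partially structured fact: a typed slot plus free-text filler."""
    entity: str     # the place or object the fact is about
    fact_type: str  # coarse label for the kind of fact (hypothetical labels below)
    filler: str     # unstructured text extracted from the web


# Hypothetical templates keyed by fact type; not taken from the paper.
TEMPLATES = {
    "built_because": "{entity} was built {filler}.",
    "located_in": "{entity} is located in {filler}.",
}


def generate_sentence(fact: Fact) -> str:
    """Fill the template for the fact's type, falling back to a generic pattern."""
    template = TEMPLATES.get(fact.fact_type, "{entity}: {filler}.")
    return template.format(entity=fact.entity, filler=fact.filler)


print(generate_sentence(Fact("The old tower", "built_because", "to guard the harbour")))
```

In this sketch, the structured part (`fact_type`) selects the template, and the free-text part is inserted verbatim. This is the kind of division of labour that "partial structure" makes possible.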

Deksne D.,Tilde | Skadina I.,Tilde | Skadins R.,Tilde
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

This paper reports on the implementation of grammar checkers and parsers for highly inflected and under-resourced languages. Because the classical context-free grammar (CFG) formalism performs poorly on languages with a rich morphological feature system, we have extended it by adding syntactic roles, lexical constraints, and constraints on morpho-syntactic feature values. The formalism also allows morpho-syntactic feature values to be assigned to phrases and optional constituents to be specified. The paper describes how the grammar checker is implemented using two sets of rules: rules describing correct sentences and rules describing grammar errors. The same engine can serve either purpose, parsing text or finding grammar errors, depending on the rule set it is given. Finally, the paper describes the implementation of Latvian and Lithuanian parsers and grammar checkers and the methods used to assess their quality. © 2014 Springer-Verlag Berlin Heidelberg.
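
The extensions described above can be illustrated with a minimal sketch. The sketch covers syntactic roles, optional constituents, agreement constraints on feature values, and feature values assigned to the resulting phrase. The `Rule` and `Token` structures, the feature names and the backtracking matcher are all assumptions made for illustration. This is not the paper's rule syntax or engine, and it does not model the separate error-rule set.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    pos: str
    feats: dict  # e.g. {"case": "nom", "gender": "f", "number": "sg"}


@dataclass
class Rule:
    """Hypothetical extended CFG rule: lhs -> rhs, where rhs items are (POS, role) pairs."""
    lhs: str
    rhs: list
    agree_on: list  # feature names that all matched constituents must share
    optional: set = field(default_factory=set)  # roles that may be absent


def match(rhs, optional, tokens):
    """Return (role, token) assignments covering all tokens, or None (simple backtracking)."""
    if not rhs:
        return [] if not tokens else None
    (pos, role), rest = rhs[0], rhs[1:]
    if tokens and tokens[0].pos == pos:
        tail = match(rest, optional, tokens[1:])
        if tail is not None:
            return [(role, tokens[0])] + tail
    if role in optional:  # try skipping an optional constituent
        return match(rest, optional, tokens)
    return None


def apply_rule(rule, tokens):
    """Build a phrase if tokens match the rule and agree on the constrained features."""
    assignments = match(rule.rhs, rule.optional, tokens)
    if assignments is None:
        return None
    feats = {}
    for name in rule.agree_on:
        values = {tok.feats.get(name) for _, tok in assignments}
        if len(values) != 1:
            return None  # constituents disagree on this feature
        feats[name] = values.pop()  # the shared value is assigned to the phrase
    return {"label": rule.lhs,
            "roles": {role: tok.form for role, tok in assignments},
            "feats": feats}
```

With an NP rule requiring case, gender and number agreement, the Latvian pair "skaista māja" (feminine adjective + feminine noun) parses, while "skaists māja" (masculine adjective) is rejected. A bare noun still parses because the attribute role is optional.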

Pinnis M.,Tilde | Goba K.,Tilde
Communications in Computer and Information Science | Year: 2011

In this work we describe a statistical morphological tagger for Latvian, Lithuanian and Estonian based on morphological tag disambiguation. These languages have rich tagsets and very high rates of morphological ambiguity. We model the distribution of possible tags with an exponential probabilistic model, which allows features to be selected and used from the surrounding context. Results show a significant improvement in error rates over the baseline, comparable to the results reported for Czech. Compared with the simplified parameter estimation method applied for Czech, we show that maximum entropy weight estimation achieves considerably better results. © 2011 Springer-Verlag.
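
As a rough illustration of an exponential (maximum entropy) model over candidate tags, the sketch below scores each tag proposed by a morphological analyser as exp(Σ λᵢ fᵢ) and normalises over the candidates only. The feature strings, tag names and weights are hypothetical. Weight estimation, which the abstract identifies as the key difference from the Czech method, is not shown; the weights are simply given.

```python
import math


def maxent_tag_probs(candidates, context_features, weights):
    """Log-linear distribution over the analyser's candidate tags.

    weights maps (feature, tag) pairs to learned weights; missing pairs count as 0.
    """
    scores = {t: sum(weights.get((f, t), 0.0) for f in context_features)
              for t in candidates}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())      # normalise over the candidates only
    return {t: e / z for t, e in exps.items()}


probs = maxent_tag_probs(["N_nom_sg", "N_acc_sg"], ["prev_tag=PREP"],
                         {("prev_tag=PREP", "N_acc_sg"): 2.0})
best = max(probs, key=probs.get)
```

Restricting normalisation to the analyser's candidates reflects the disambiguation framing: the model never has to consider the full tagset for a word, only the readings the word actually admits.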

Salimbajevs A.,Tilde | Pinnis M.,Tilde
Frontiers in Artificial Intelligence and Applications | Year: 2014

In this paper, the authors present the results of ongoing research on Large Vocabulary Automatic Speech Recognition for the Latvian language. The paper describes the initial acoustic model, phoneme set, filler and noise models, and grapheme-to-phoneme modelling. The second part of this work is focused on language modelling. Different word and class-based n-gram models are evaluated in terms of perplexity and word error rate in a speech recognition task. The authors also train a recurrent neural network language model and use it for n-best rescoring. © 2014 The Authors and IOS Press.
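
Perplexity, one of the two evaluation measures mentioned above, is the exponentiated average negative log-probability a language model assigns to held-out text. The sketch below computes it for a word bigram model with add-one smoothing. The smoothing choice, the `<s>`/`</s>` boundary markers and the closed-vocabulary assumption are illustrative only. The abstract does not state how the paper's n-gram or class-based models were smoothed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count context unigrams and bigrams over tokenised sentences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])  # counts of words used as contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, len(vocab)


def perplexity(sentences, model):
    """exp of the mean negative log-probability per predicted token (closed vocabulary assumed)."""
    unigrams, bigrams, v = model
    log_prob, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)  # add-one smoothing
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)
```

Lower perplexity means the model is less "surprised" by the text. Word error rate, by contrast, has to be measured by running the full recogniser, which is why the two measures are reported separately.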

Pinnis M.,Tilde | Pinnis M.,University of Latvia
Frontiers in Artificial Intelligence and Applications | Year: 2014

Transliteration dictionaries are an important resource for the development of machine transliteration systems. The paper describes and analyses a large multilingual transliteration dictionary, containing approximately 1.25 million transliterated word pairs, that was extracted from probabilistic dictionaries for 24 European languages. The transliteration dictionary is evaluated: 1) manually for the Latvian-English language pair and 2) automatically within a statistical machine translation based transliteration task for all 23 language pairs. © 2014 The Authors and IOS Press.

Auzina I.,University of Latvia | Pinnis M.,Tilde | Dargis R.,University of Latvia
Frontiers in Artificial Intelligence and Applications | Year: 2014

Grapheme-to-phoneme modelling is one of the key components of automatic speech recognition and speech synthesis. In this paper, the authors compare two different approaches: a statistical machine translation based method using the phonetically transcribed Latvian Speech Recognition Corpus, and a rule-based method for the phonetic transcription of words from grammatically correct forms. The paper provides 10-fold cross-validation results and error analysis for both methods. © 2014 The Authors and IOS Press.
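
For readers unfamiliar with the evaluation protocol, 10-fold cross-validation splits the data into ten parts. Each part serves once as the test set while the other nine are used for training, and the ten scores are averaged. The sketch below uses round-robin fold assignment and generic `train_fn`/`eval_fn` callbacks. These are assumptions for illustration; the abstract does not say how the corpus was partitioned.

```python
from statistics import mean


def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; item i is placed in test fold i % k (round-robin)."""
    for fold in range(k):
        test = [x for i, x in enumerate(items) if i % k == fold]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test


def cross_validate(items, train_fn, eval_fn, k=10):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    return mean(eval_fn(train_fn(train), test) for train, test in k_fold_splits(items, k))
```

For a grapheme-to-phoneme task, `items` would be (word, transcription) pairs. `train_fn` would build a model from the training pairs, and `eval_fn` would return, for example, phoneme or word accuracy on the held-out pairs.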

Peisenieks J.,University of Latvia | Skadins R.,Tilde
Frontiers in Artificial Intelligence and Applications | Year: 2014

This paper reports on the viability of using machine translation (MT) to determine the original sentiment of tweets by translating tweets written in less widely used languages into more widely used ones. The results of the study show that it is possible to combine MT and sentiment analysis (SA) systems to produce SA results with significant precision. © 2014 The Authors and IOS Press.

Vira I.,Tilde | Vasiljevs A.,Tilde
Frontiers in Artificial Intelligence and Applications | Year: 2014

In this paper we present two prototypes of 3D-based virtual agents: a chatbot which, in addition to holding a conversation, can translate from English into Spanish, Russian, and French; and another which provides currency conversion (lats to euro and euro to lats) in Latvian. Both chatbots are voice-controlled, with natural mimicry and representations of human-like emotions. We describe the motivation, development process, design and architecture of these mobile applications. The evaluation of both applications and their usage in selected scenarios is also presented. © 2014 The Authors and IOS Press.

Pinnis M.,Tilde | Skadins R.,Tilde
Frontiers in Artificial Intelligence and Applications | Year: 2012

In this paper the authors present various techniques for achieving MT domain adaptation with limited in-domain resources. The paper gives a case study of what works and what does not when building a domain-specific machine translation system. Systems are adapted using in-domain comparable monolingual and bilingual corpora (crawled from the Web), as well as bilingual terms and named entities. The authors show how to efficiently integrate terms into statistical machine translation systems, thereby significantly improving upon the baseline. © 2012 The Authors and IOS Press.

Salimbajevs A.,Tilde | Strigins J.,Tilde
International Conference Recent Advances in Natural Language Processing, RANLP | Year: 2015

Developing a large vocabulary automatic speech recognition (ASR) system is a very difficult task due to high variability in both domain and acoustics. The task is even more difficult for Latvian, which is morphologically very rich and in which one word can have dozens of surface forms. Although there is some research on speech recognition for Latvian, Latvian ASR still lags behind ASR for "big" languages such as English and German. To improve the performance of Latvian ASR, it is important to understand what errors it makes and why. In this paper, the authors analyze the most common errors of Latvian ASR. Based on this analysis, the word error rate (WER) of the baseline system is improved from 30.94% to 28.43%.
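
WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recogniser output and the reference transcript, divided by the number of reference words. The sketch below implements that standard definition and also computes the relative reduction implied by the reported figures. The function names are illustrative, and the reference is assumed to be non-empty. The improvement from 30.94% to 28.43% is 2.51 points absolute, or about 8.1% relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length (reference must be non-empty)."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


print(relative_reduction(30.94, 28.43))  # roughly 0.081
```

For example, if the hypothesis "es eju mājām" is scored against the reference "es eju uz mājām", the result is one deletion out of four reference words, a WER of 0.25.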
