Ljubesic N.,University of Zagreb |
Stefanescu D.,Research Institute for Artificial Intelligence |
Tadic M.,University of Zagreb
Proceedings of the 10th Terminology and Knowledge Engineering Conference: New Frontiers in the Constructive Symbiosis of Terminology and Knowledge Engineering, TKE 2012 | Year: 2012
Although term extraction has been researched for more than 20 years, only a few studies focus on under-resourced languages. Moreover, bilingual term mapping from comparable corpora for these languages has attracted researchers only recently. This paper presents methods for term extraction, term tagging in documents, and bilingual term mapping from comparable corpora for four under-resourced languages: Croatian, Latvian, Lithuanian, and Romanian. The methods described in this paper are language-independent as long as the user provides language-specific parameter data and has access to a part-of-speech or morpho-syntactic tagger.
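As an illustration of the pattern-based side of term extraction, the sketch below collects candidate terms from text that a part-of-speech tagger has already annotated. It is not the authors' method: the single-letter coarse tags (`A` adjective, `N` noun, `V` verb), the regular expression `A*N+` standing in for the user-supplied language-specific parameter data, and the function name `extract_candidates` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical language-specific parameter: a regex over single-letter coarse
# POS tags (A = adjective, N = noun). Real systems use richer, per-language patterns.
TERM_PATTERN = re.compile(r"A*N+")


def extract_candidates(tagged_sentences, pattern=TERM_PATTERN, max_len=4):
    """Count every token span (up to max_len tokens) whose tag sequence
    fully matches `pattern`. Each sentence is a list of (token, tag) pairs."""
    counts = Counter()
    for sent in tagged_sentences:
        words = [w.lower() for w, _ in sent]
        tags = [t for _, t in sent]
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                if pattern.fullmatch("".join(tags[i:j])):
                    counts[tuple(words[i:j])] += 1
    return counts


if __name__ == "__main__":
    sents = [
        [("statistical", "A"), ("machine", "N"), ("translation", "N"), ("is", "V"), ("hard", "A")],
        [("machine", "N"), ("translation", "N"), ("works", "V")],
    ]
    for term, freq in extract_candidates(sents).most_common():
        print(" ".join(term), freq)
```

Raw frequency is only the simplest ranking signal; practical term extractors usually combine it with termhood or unithood statistics, which this sketch leaves out.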
Badea I.,Polytechnic University of Bucharest |
Trausan-Matu S.,Polytechnic University of Bucharest |
Trausan-Matu S.,Research Institute for Artificial Intelligence
Proceedings - RoEduNet IEEE International Conference | Year: 2014
The paper presents research towards combining text mining and time series analysis in order to develop tools for analysing sequences of time-related documents, such as chat logs. It describes an experiment applying these techniques to chats: determining correlations among the most frequently used words, taking their times of occurrence into account, and computing, with the time series model, the correlations between the rhythmicities of frequently occurring interventions. © 2014 IEEE.
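The sketch below illustrates, under simplifying assumptions, the kind of computation the abstract describes: each word's occurrences are turned into a time series by counting them in fixed-length windows, and two series are compared with the Pearson correlation coefficient. The message format `(timestamp_in_seconds, text)`, the window length and the function names are hypothetical, and the paper's own time series model and rhythmicity analysis may differ.

```python
import math


def occurrence_series(messages, word, window):
    """Count occurrences of `word` in consecutive windows of `window` seconds.

    `messages` is a list of (timestamp_in_seconds, text) pairs with timestamps >= 0.
    Tokenization is a plain lowercase whitespace split, so punctuation is not stripped.
    """
    t_end = max(t for t, _ in messages)
    series = [0] * (int(t_end // window) + 1)
    for t, text in messages:
        series[int(t // window)] += text.lower().split().count(word.lower())
    return series


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series; report none
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented chat log: (seconds since start, message text).
    chat = [(5, "wiki is useful"), (12, "yes wiki"), (65, "forum or wiki"),
            (70, "forum"), (130, "blog post"), (140, "blog again")]
    wiki = occurrence_series(chat, "wiki", 60)
    blog = occurrence_series(chat, "blog", 60)
    print(wiki, blog, pearson(wiki, blog))
```

The window length controls the trade-off: short windows preserve timing detail but yield sparse, noisy counts, while long windows smooth the series at the cost of temporal resolution.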
Jimenez-Madrid A.,CRN Consultores |
Carrasco F.,University of Malaga |
Martinez C.,Geological Survey of Spain |
Gogu R.C.,Technical University of Civil Engineering Bucharest |
Gogu R.C.,Research Institute for Artificial Intelligence
Quarterly Journal of Engineering Geology and Hydrogeology | Year: 2013
This paper proposes a new method, called DRISTPI, to evaluate the intrinsic vulnerability of different types of aquifers to contamination. Taking the DRASTIC method as a starting point, we highlight the need to define two scenarios in order to differentiate karst materials from the rest of the study area. The changes made in DRISTPI with respect to DRASTIC include the elimination of the factors mainly related to the movement of water through the saturated zone of the aquifer (the original A and C factors), because the aim of the new method is to protect the groundwater (the resource) rather than the water supply (the source). Furthermore, DRISTPI incorporates a new factor, called PI, to characterize areas of preferential infiltration. The vulnerability of two European aquifers with different geological, hydrogeological and climatic characteristics was evaluated using DRISTPI, and the results were compared with those obtained using the DRASTIC, PI, COP, Slovene Approach and PaPRIKa methods. These results were analysed statistically by comparing spatial autocorrelation coefficients that measure the cross-correlation between pairs of vulnerability maps. © 2013 The Geological Society of London.
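DRASTIC-type methods score each map cell as a weighted sum of factor ratings. The sketch below shows that arithmetic and how dropping the A and C factors and adding a PI factor changes it. The weights for D, R, A, S, T, I and C are the standard DRASTIC weights; the PI weight is a placeholder, because the abstract does not give the DRISTPI weights, and the ratings are invented. The separate karst and non-karst scenarios are not modelled.

```python
# Standard DRASTIC weights: D depth to water, R net recharge, A aquifer media,
# S soil media, T topography, I impact of the vadose zone, C hydraulic conductivity.
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}

# Placeholder weight for the PI factor: an assumption, NOT taken from the paper.
HYPOTHETICAL_PI_WEIGHT = 5


def weighted_index(ratings, weights):
    """Sum of rating * weight over the factors listed in `weights`."""
    return sum(ratings[factor] * weight for factor, weight in weights.items())


def dristpi_like_weights(pi_weight=HYPOTHETICAL_PI_WEIGHT):
    """DRASTIC weights without the saturated-zone factors A and C, plus PI."""
    weights = {f: w for f, w in DRASTIC_WEIGHTS.items() if f not in ("A", "C")}
    weights["PI"] = pi_weight
    return weights


if __name__ == "__main__":
    # Invented ratings (1-10 scale) for a single map cell.
    cell = {"D": 9, "R": 6, "A": 8, "S": 7, "T": 10, "I": 8, "C": 4, "PI": 10}
    print("DRASTIC-style index:", weighted_index(cell, DRASTIC_WEIGHTS))  # 169
    print("DRISTPI-style index:", weighted_index(cell, dristpi_like_weights()))  # 183
```

Applying `weighted_index` cell by cell over rasterized factor maps would yield the vulnerability map that the comparison in the abstract operates on.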
Ciuca S.,Polytechnic University of Bucharest |
Vlad A.,Polytechnic University of Bucharest |
Vlad A.,Research Institute for Artificial Intelligence |
Mitrea A.,Polytechnic University of Bucharest
UPB Scientific Bulletin, Series A: Applied Mathematics and Physics | Year: 2012
The paper focuses on a mathematical comparison of several single-author corpora, seeking to answer an open question in the literature: whether, and in what terms, one can speak of a general linguistic model, or whether author variability is so influential that only separate author models are possible. For the comparisons, an original procedure proposed by the authors in previous studies was used, here extended and adapted to various forms of the corpora. The procedure involves determining, for every investigated linguistic event in each analyzed corpus, its probability together with a representative confidence interval. The choice of the representative interval is based on probability estimation with statistical confidence intervals and on tests of the hypothesis that the probability belongs to a certain interval. The final decision is also supported by the accuracy of the results, considering the two types of error probability involved in the statistical tests. The experimental study uses five independently built corpora, each consisting of novels written by a single author. A detailed linguistic event analysis was carried out for each corpus.
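The interval-estimation step can be illustrated with the textbook normal-approximation (Wald) confidence interval for a probability estimated from counts. This is a stand-in, not the authors' procedure, which additionally relies on hypothesis tests and on the two types of error probability; the counts are invented, and the overlap check is only a rough way to compare two corpora.

```python
import math


def proportion_ci(k, n, z=1.96):
    """Estimate p = k/n with a normal-approximation (Wald) interval.

    z = 1.96 corresponds to roughly 95% confidence. The approximation is poor
    for small n or for p close to 0 or 1.
    """
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def intervals_overlap(a, b):
    """True if the intervals of two (p, low, high) triples intersect."""
    return a[1] <= b[2] and b[1] <= a[2]


if __name__ == "__main__":
    # Invented counts: occurrences of one linguistic event in two corpora.
    author_a = proportion_ci(1100, 10000)
    author_b = proportion_ci(1150, 10000)
    print(author_a, author_b, intervals_overlap(author_a, author_b))
```

Overlapping intervals are consistent with a shared probability for that event across authors, while clearly separated intervals point towards author-specific behaviour; a formal test, as in the paper, is needed to control both error types.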
Boros T.,Research Institute for Artificial Intelligence
International Conference Recent Advances in Natural Language Processing, RANLP | Year: 2013
General natural language processing and text-to-speech applications require certain (lexical-level) processing steps in order to solve frequent tasks such as lemmatization, syllabification, lexical stress prediction and phonetic transcription. These steps usually require knowledge of the word's lexical composition (derivational morphology, inflectional affixes, etc.). For known words, all applications use lexicons, but there are always out-of-vocabulary (OOV) words that impede the performance of NLP and speech synthesis applications. In such cases, either rule-based or data-driven techniques are used to process these OOV words automatically and generate the desired results. In this paper we describe how the above-mentioned tasks can be achieved using a Perceptron with the Margin Infused Relaxed Algorithm (MIRA) and sequence labeling.
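To make the MIRA idea concrete, the sketch below trains a syllabifier that decides, letter by letter, whether a syllable boundary precedes the current letter. It is a simplified, greedy variant: each position is classified independently with 1-best MIRA updates, whereas a full sequence labeller would also use previous labels as features and decode with Viterbi. The feature template, the `B`/`O` label names, the hyphenated training format and the toy words are hypothetical; the abstract does not describe the paper's features.

```python
from collections import defaultdict

LABELS = ("O", "B")  # O: no boundary before this letter; B: boundary before it (ties go to "O")


def features(word, i):
    """Hypothetical template: letters in a +/-2 window around position i, plus one bigram."""
    padded = "##" + word + "##"
    j = i + 2  # index of word[i] inside the padded string
    feats = [f"c{k}={padded[j + k]}" for k in range(-2, 3)]
    feats.append(f"bi={padded[j - 1]}{padded[j]}")
    return feats


class MiraTagger:
    def __init__(self, C=1.0):
        self.C = C  # cap on the step size tau
        self.w = defaultdict(float)  # weight per (feature, label) pair

    def score(self, feats, label):
        return sum(self.w[(f, label)] for f in feats)

    def predict_label(self, feats):
        return max(LABELS, key=lambda y: self.score(feats, y))

    def update(self, feats, gold):
        pred = self.predict_label(feats)
        if pred == gold:
            return
        margin = self.score(feats, gold) - self.score(feats, pred)
        loss = 1.0  # zero-one loss for a wrong label
        # phi(gold) and phi(pred) occupy disjoint (feature, label) blocks of
        # len(feats) distinct binary features, so ||phi(gold) - phi(pred)||^2 = 2 * len(feats).
        tau = min(self.C, (loss - margin) / (2 * len(feats)))
        for f in feats:
            self.w[(f, gold)] += tau
            self.w[(f, pred)] -= tau


def labels_from_hyphenated(h):
    """'ca-sa' -> ('casa', ['O', 'O', 'B', 'O'])."""
    word, labels, boundary = "", [], False
    for ch in h:
        if ch == "-":
            boundary = True
            continue
        word += ch
        labels.append("B" if boundary else "O")
        boundary = False
    return word, labels


def train(tagger, data, epochs=10):
    for _ in range(epochs):
        for h in data:
            word, gold = labels_from_hyphenated(h)
            for i, y in enumerate(gold):
                tagger.update(features(word, i), y)


def syllabify(tagger, word):
    out = ""
    for i, ch in enumerate(word):
        if i > 0 and tagger.predict_label(features(word, i)) == "B":
            out += "-"
        out += ch
    return out


if __name__ == "__main__":
    tagger = MiraTagger()
    train(tagger, ["ca-sa", "ma-sa", "pa-pa", "ca-pa"])
    print(syllabify(tagger, "masa"))
```

When `C` does not clip it, each update is the smallest weight change that makes the gold label outscore the wrong prediction by a margin of 1; this adaptive step size is what distinguishes MIRA from the plain perceptron's fixed step.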