HP Labs Russia

Saint Petersburg, Russia


Gareev R.,Kazan Federal University | Tkachenko M.,Saint Petersburg State University | Solovyev V.,Kazan Federal University | Simanovsky A.,HP Labs Russia | Ivanov V.,National University of Science and Technology "MISIS"
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Current research efforts in Named Entity Recognition deal mostly with the English language. Even though interest in multi-language Information Extraction is growing, only a few works report results for the Russian language. This paper introduces quality baselines for the Russian NER task. We propose a corpus which was manually annotated with organization and person names. The main purpose of this corpus is to provide a gold standard for evaluation. We implemented and evaluated two approaches to NER: knowledge-based and statistical. The first one comprises several components: dictionary matching, pattern matching and rule-based search of lexical representations of entity names within a document. We assembled a set of linguistic resources and evaluated their impact on performance. For the data-driven approach we utilized our implementation of a linear-chain CRF which uses a rich set of features. The performance of both systems is promising (62.17% and 75.05% F1 measure), although they do not employ morphological or syntactic analysis. © 2013 Springer-Verlag.


Sapozhnikov G.,Saint Petersburg State University | Ulanov A.,HP Labs Russia
Algorithms | Year: 2012

In this paper we present the PHOCS-2 algorithm, which extracts a "Predicted Hierarchy Of ClassifierS". The extracted hierarchy helps us to enhance the performance of flat classification. Nodes in the hierarchy contain classifiers. Each intermediate node corresponds to a set of classes and each leaf node corresponds to a single class. In PHOCS-2 we make an estimation for each node and achieve a more precise computation of false positives, true positives and false negatives. Stopping criteria are based on the results of the flat classification. The proposed algorithm is validated against nine datasets. © 2012 by the authors.
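
The abstract refers to counts of true positives, false positives and false negatives. As a minimal sketch of how such counts turn into an F1 score, the following uses the standard F1 definition; the function names and the choice of micro-averaging over classes are illustrative assumptions, not details taken from the paper.

```python
from typing import Iterable, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN).

    Returns 0.0 when all three counts are zero (a convention chosen here).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Micro-averaged F1: sum the (TP, FP, FN) counts over classes first.

    Assumption: micro-averaging is used only for illustration; the paper's
    own averaging scheme is not stated in the abstract.
    """
    tp = fp = fn = 0
    for c_tp, c_fp, c_fn in counts:
        tp += c_tp
        fp += c_fp
        fn += c_fn
    return f1_from_counts(tp, fp, fn)
```

For example, `micro_f1([(8, 2, 2), (4, 1, 1)])` pools the counts of two hypothetical classes before computing a single score.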


Tkachenko M.,Saint Petersburg State University | Simanovsky A.,HP Labs Russia
11th Conference on Natural Language Processing, KONVENS 2012: Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012 | Year: 2012

We propose a domain adaptation method for supervised named entity recognition (NER). Our NER uses conditional random fields, and we rank and filter out features of a new unknown domain based on the means of weights learned on known domains. We perform experiments on English texts from the OntoNotes version 4 benchmark and see statistically significantly better performance on a small number of features, and a convergence of performance to the maximum F1-measure that is faster than with conventional feature selection (information gain). We also compare with using the weights learned on a mixture of known domains.
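
The ranking step described above can be sketched roughly as follows. This is a simplified illustration under several assumptions that the abstract does not confirm: each known domain contributes one weight per feature (real CRF weights are per feature and label), the absolute value of a weight is used as its importance, and a feature missing from a domain counts as zero. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rank_features_by_mean_weight(domain_weights: List[Dict[str, float]]) -> List[str]:
    """Rank features by mean absolute weight across known domains.

    domain_weights: one {feature: weight} mapping per known domain.
    Assumption: |weight| is the importance measure, and a feature absent
    from a domain contributes 0 to the mean.
    """
    totals: Dict[str, float] = defaultdict(float)
    for weights in domain_weights:
        for feature, w in weights.items():
            totals[feature] += abs(w)
    n = len(domain_weights)
    means = {feature: total / n for feature, total in totals.items()}
    # Highest mean first; ties are broken alphabetically for determinism.
    return sorted(means, key=lambda f: (-means[f], f))


def top_k_features(domain_weights: List[Dict[str, float]], k: int) -> List[str]:
    """Keep only the k best-ranked features for the new domain."""
    return rank_features_by_mean_weight(domain_weights)[:k]
```

A model for the new domain would then be trained using only the features returned by `top_k_features`.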


Ulanov A.,HP Labs Russia | Shevlyakov G.,Saint Petersburg State Polytechnic University | Lyubomishchenko N.,HP Labs Russia | Mehra P.,Inlogy Inc. | Polutin V.,HP Labs Russia
HP Laboratories Technical Report | Year: 2010

The problems of taxonomy evaluation criteria comparison and corresponding benchmark creation are considered. The classes of Primitive Ideal Taxonomies (PITs), their WordNet and disrupted versions are proposed as the sets of benchmark taxonomies for the comparison of taxonomy evaluation methods. For WordNet PITs and their perturbations, the performances of the structure-based PageRank, FloorRank, and the corpus-based Information Content criteria are studied in a Monte Carlo experiment. It is shown that the proposed approach can be used for the ranking of taxonomy evaluation criteria. © Copyright WeBS 2010.


Ulanov A.,HP Labs Russia | Sapozhnikov G.,Saint Petersburg State Polytechnic University | Lyubomishchenko N.,HP Labs Russia | Polutin V.,HP Labs Russia | Shevlyakov G.,Saint Petersburg State Polytechnic University
TIR 2011 - 8th International Workshop on Text-Based Information Retrieval, in Conjunction with DEXA 2011 | Year: 2011

We present PHOCS, a novel algorithm that extracts hierarchies with the maximal F-measure to improve multilabel classification performance; it builds a Predicted Hierarchy Of ClassifierS. Nodes contain classifiers: each intermediate node corresponds to a set of labels, and each leaf node to a single label. Any classifier in the extracted hierarchy deals with a considerably smaller set of labels than the total number L of labels, and with a more balanced training distribution, which leads to improved classification performance. Our method has linear training and logarithmic testing complexity with respect to L. Experiments on 4 multilabel datasets confirmed the effectiveness of the PHOCS algorithm.
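
To make the node structure concrete, here is a minimal sketch of prediction through such a hierarchy. It covers only the traversal at test time, not the hierarchy extraction that PHOCS performs; the `Node` class, the `classify` callback and the `predict` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Set


@dataclass
class Node:
    # Labels covered by this node: several for an intermediate node,
    # exactly one for a leaf.
    labels: FrozenSet[str]
    children: List["Node"] = field(default_factory=list)
    # Hypothetical per-node classifier: given an instance, it returns the
    # indices of the children it predicts as relevant.
    classify: Optional[Callable[[object], Set[int]]] = None


def predict(node: Node, x: object) -> Set[str]:
    """Collect labels by descending only into children predicted relevant."""
    if not node.children:
        return set(node.labels)  # leaf: a single label
    if node.classify is None:
        raise ValueError("intermediate node needs a classifier")
    result: Set[str] = set()
    for i in node.classify(x):
        result |= predict(node.children[i], x)
    return result
```

Because each classifier chooses only among its node's children, it handles far fewer labels than a single flat classifier over all L labels would.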


Ulanov A.,HP Labs Russia | Shevlyakov G.,Saint Petersburg State Polytechnic University | Lyubomishchenko N.,Saint Petersburg State Polytechnic University | Mehra P.,HP Labs Russia | Polutin V.,HP Labs Russia
Proceedings - 21st International Workshop on Database and Expert Systems Applications, DEXA 2010 | Year: 2010

The problems of taxonomy evaluation criteria comparison and corresponding benchmark creation are considered. The classes of Primitive Ideal Taxonomies (PITs), their WordNet and disrupted versions are proposed as the sets of benchmark taxonomies for the comparison of taxonomy evaluation methods. For WordNet PITs and their perturbations, the performances of the structure-based PageRank, FloorRank, and the corpus-based Information Content criteria are studied in a Monte Carlo experiment. It is shown that the proposed approach can be used for the ranking of taxonomy evaluation criteria. © 2010 IEEE.


Kiseleva J.,HP Labs Russia | Simanovsky A.,HP Labs Russia
HP Laboratories Technical Report | Year: 2011

We describe the results of experiments on extracting synonyms from the query log of a large commercial site search engine. Our primary focus is product search queries. The resulting dictionary of synonyms can be plugged into a search engine in order to improve the quality of search results. We use a product database to extend the dictionary. © Copyright 2011 Hewlett-Packard Development Company.


Tkatchenko M.,HP Labs Russia | Ulanov A.,HP Labs Russia | Simanovsky A.,HP Labs Russia
Proceedings - International Conference on Data Engineering | Year: 2011

Recognition of named entities (people, companies, locations, etc.) is an essential task of text analytics. We address a subproblem of this task, namely, named entity classification. We propose a novel approach that constructs an effective fine-grained named entity classifier. Its key highlights are semi-automatic training set construction from Wikipedia articles and additional feature selection. We justify our solution by creating an 18-class classifier and demonstrating its effectiveness and efficiency. © 2011 IEEE.
