Manchester, United Kingdom

Miwa M.,University of Tokyo | Saetre R.,University of Tokyo | Kim J.-D.,University of Tokyo | Tsujii J.,University of Tokyo | And 2 more authors.
Journal of Bioinformatics and Computational Biology | Year: 2010

Biomedical Natural Language Processing (BioNLP) attempts to capture biomedical phenomena from texts by extracting relations between biomedical entities (i.e. proteins and genes). Traditionally, only binary relations have been extracted from large numbers of published papers. Recently, more complex relations (biomolecular events) have also been extracted. Such events may include several entities or other relations. To evaluate the performance of the text mining systems, several shared task challenges have been arranged for the BioNLP community. With a common and consistent task setting, the BioNLP'09 shared task evaluated complex biomolecular events such as binding and regulation. Finding these events automatically is important in order to improve biomedical event extraction systems. In the present paper, we propose an automatic event extraction system, which contains a model for complex events, by solving a classification problem with rich features. The main contributions of the present paper are: (1) the proposal of an effective bio-event detection method using machine learning, (2) provision of a high-performance event extraction system, and (3) the execution of a quantitative error analysis. The proposed complex (binding and regulation) event detector outperforms the best system from the BioNLP'09 shared task challenge. © 2010 The Authors.

Wu X.,University of Tokyo | Matsuzaki T.,University of Tokyo | Tsujii J.,University of Tokyo | Tsujii J.,University of Manchester | Tsujii J.,National Center for Text Mining
Machine Translation | Year: 2010

This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained syntactic property description but also a semantic representation. Considering the abundant information included in the deep syntactic structures, it is interesting to investigate whether or not they improve the traditional syntax-based translation models based on PCFG parsers. In order to use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree-string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, a minimum covering tree is defined by taking a predicate word and its lexical/phrasal arguments as the frontier nodes. Starting from this definition, a linear-time algorithm is proposed to extract translation rules through one-time traversal of the leaf nodes in an HPSG tree. Extensive experiments on a tree-to-string translation system confirmed the effectiveness of our proposal. © 2010 Springer Science+Business Media B.V.
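
As a rough illustration of the "minimum covering tree" idea, the sketch below finds the smallest subtree that connects a set of frontier nodes by locating their lowest common ancestor. This is only one plausible reading of the abstract, not the paper's linear-time algorithm; the toy tree, the node labels, and the function names `path_to_root` and `minimum_covering_tree` are all hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def minimum_covering_tree(frontier, parent):
    """Return (root, nodes) of the smallest subtree connecting all frontier nodes.

    `parent` maps each non-root node to its parent (an assumed encoding).
    """
    paths = [path_to_root(n, parent) for n in frontier]
    # Ancestors shared by every frontier node.
    common = set(paths[0])
    for p in paths[1:]:
        common &= set(p)
    # The lowest common ancestor is the first shared node on any upward path.
    root = next(n for n in paths[0] if n in common)
    # Collect every node on the way from each frontier node up to the root.
    nodes = set()
    for p in paths:
        for n in p:
            nodes.add(n)
            if n == root:
                break
    return root, nodes


# Hypothetical toy parse: TOP -> S, ADV; S -> NP_subj, VP; VP -> V, NP_obj
parent = {
    "S": "TOP", "ADV": "TOP",
    "NP_subj": "S", "VP": "S",
    "V": "VP", "NP_obj": "VP",
}
# Predicate "V" with both of its arguments as frontier nodes.
root, nodes = minimum_covering_tree(["V", "NP_subj", "NP_obj"], parent)
```

Here the predicate and its two arguments are covered by the subtree rooted at `S`; with only `V` and `NP_obj` as frontier nodes, the covering subtree shrinks to the one rooted at `VP`.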

Miwa M.,University of Tokyo | Pyysalo S.,University of Tokyo | Hara T.,University of Tokyo | Tsujii J.,University of Tokyo | And 2 more authors.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference | Year: 2010

The detailed analyses of sentence structure provided by parsers have been applied to address several information extraction tasks. In a recent bio-molecular event extraction task, state-of-the-art performance was achieved by systems building specifically on dependency representations of parser output. While intrinsic evaluations have shown significant advances in both general and domain-specific parsing, the question of how these translate into practical advantage is seldom considered. In this paper, we analyze how event extraction performance is affected by parser and dependency representation, further considering the relation between intrinsic evaluation and performance at the extraction task. We find that good intrinsic evaluation results do not always imply good extraction performance, and that the types and structures of different dependency representations have specific advantages and disadvantages for the event extraction task.

Miwa M.,University of Tokyo | Saetre R.,University of Tokyo | Miyao Y.,National Institute of Informatics | Tsujii J.,University of Tokyo | And 2 more authors.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference | Year: 2010

Relations between entities in text have been widely researched in the natural language processing and information extraction communities. The region connecting a pair of entities (in a parsed sentence) is often used to construct kernels or feature vectors that can recognize and extract interesting relations. Such regions are useful, but they can also incorporate unnecessary distracting information. In this paper, we propose a rule-based method to remove the information that is unnecessary for relation extraction. Protein-protein interaction (PPI) is used as an example relation extraction problem. A dozen simple rules are defined on output from a deep parser. Each rule specifically examines the entities in one target interaction pair. These simple rules were tested using several PPI corpora. The PPI extraction performance was improved on all the PPI corpora.

Andrade D.,University of Tokyo | Matsuzaki T.,University of Tokyo | Tsujii J.,University of Manchester | Tsujii J.,National Center for Text Mining
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

Existing dictionaries may be effectively enlarged by finding the translations of single words, using comparable corpora. The idea is based on the assumption that similar words have similar contexts across multiple languages. However, previous research suggests the use of a simple bag-of-words model to capture the lexical context, or assumes that sufficient context information can be captured by the successor and predecessor of the dependency tree. While the latter may be sufficient for a close language-pair, we observed that the method is insufficient if the languages differ significantly, as is the case for Japanese and English. Given a query word, our proposed method uses a statistical model to extract relevant words, which tend to co-occur in the same sentence; additionally our proposed method uses three statistical models to extract relevant predecessors, successors and siblings in the dependency tree. We then combine the information gained from the four statistical models, and compare this lexical-dependency information across English and Japanese to identify likely translation candidates. Experiments based on openly accessible comparable corpora verify that our proposed method yields a statistically significant increase in Top 1 accuracy of around 13 percentage points, to 53%, and raises Top 20 accuracy to 91%. © 2011 Springer-Verlag.
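
For readers unfamiliar with the Top 1 and Top 20 figures, the sketch below shows how a Top-k accuracy is commonly computed: a query word counts as a hit when its reference translation appears among the first k ranked candidates. The function name `top_k_accuracy` and the toy data are hypothetical and are not taken from the paper's evaluation.

```python
def top_k_accuracy(ranked, gold, k):
    """Fraction of query words whose gold translation is in the top k candidates.

    `ranked` maps each query word to its candidate translations, best first;
    `gold` maps each query word to its reference translation.
    """
    hits = sum(1 for word, candidates in ranked.items()
               if gold[word] in candidates[:k])
    return hits / len(ranked)


# Hypothetical ranked candidate lists for three query words.
ranked = {
    "inu": ["dog", "cat", "wolf"],
    "neko": ["dog", "cat", "tiger"],
    "tori": ["fish", "plane", "bird"],
}
gold = {"inu": "dog", "neko": "cat", "tori": "bird"}

top1 = top_k_accuracy(ranked, gold, 1)  # only "inu" is correct at rank 1
top2 = top_k_accuracy(ranked, gold, 2)  # "neko" is also recovered at rank 2
```

Top-k accuracy can only stay the same or grow as k increases, which is why a Top 20 figure is higher than the Top 1 figure for the same system.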