Nebot V.,Lenguajes y Sistemas Informaticos |
Berlanga R.,Lenguajes y Sistemas Informaticos
ACM International Conference Proceeding Series | Year: 2010
The Semantic Web has become a new environment that enables organizations to attach semantic annotations taken from ontologies to the information they generate. As a result, large, complex, semi-structured and heterogeneous repositories of semantic data are being made available, which calls for new data warehouse tools for analyzing the Semantic Web. In this paper, we present a semi-automatic method for identifying and extracting valid facts for the analysis of semantic data expressed as instance stores in RDF/OWL. The starting point of the method is a multidimensional (MD) star schema (i.e., subject of analysis, dimensions and measures) that the analyst designs by selecting concepts and properties from the ontology. The method exploits the semantics and theoretical foundations of Description Logics to derive valid combinations of instances into fact tuples. Moreover, specific index structures are applied to the ontology to achieve scalability and effectiveness. © 2010 ACM.
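To make the notion of a "fact tuple" concrete, the following minimal sketch assumes a toy instance store of (subject, property, object) triples and a star schema given as a subject-of-analysis class plus lists of dimension and measure properties. All names (`TRIPLES`, `extract_facts`, `Visit`, `hasPatient`, etc.) are hypothetical. It only performs a naive join over the triples; it does not reproduce the paper's Description Logics reasoning for deciding which combinations are valid, nor its index structures.

```python
from itertools import product

# Hypothetical toy instance store: (subject, property, object) triples.
TRIPLES = [
    ("visit1", "rdf:type", "Visit"),
    ("visit1", "hasPatient", "pat1"),
    ("visit1", "hasDiagnosis", "diabetes"),
    ("visit1", "cost", 120.0),
    ("visit2", "rdf:type", "Visit"),
    ("visit2", "hasPatient", "pat2"),
    ("visit2", "hasDiagnosis", "asthma"),
    ("visit2", "hasDiagnosis", "flu"),
    ("visit2", "cost", 80.0),
]


def extract_facts(triples, subject_class, dimensions, measures):
    """Naive sketch: for every instance of subject_class, emit one fact tuple
    per combination of its dimension and measure values (None if missing)."""
    # Group property values by subject (a trivial in-memory index).
    index = {}
    for s, p, o in triples:
        index.setdefault(s, {}).setdefault(p, []).append(o)

    facts = []
    for s, props in index.items():
        if subject_class not in props.get("rdf:type", []):
            continue  # not a subject of analysis
        columns = [props.get(p, [None]) for p in dimensions + measures]
        # Multi-valued properties (e.g. two diagnoses) yield several tuples.
        for combo in product(*columns):
            facts.append((s,) + combo)
    return facts
```

For example, `extract_facts(TRIPLES, "Visit", ["hasPatient", "hasDiagnosis"], ["cost"])` yields one tuple for `visit1` and two for `visit2` (one per diagnosis). Deciding whether such a cross-product combination is semantically valid is exactly the part the abstract attributes to Description Logics, and it is left out here.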
Nebot V.,Lenguajes y Sistemas Informaticos |
Berlanga R.,Lenguajes y Sistemas Informaticos
ACM International Conference Proceeding Series | Year: 2012
The increasing amount of biomedical scientific literature published on the Web demands new tools and methods to automatically process it and extract relevant information. Traditional information extraction has focused on recognizing well-defined entities such as genes or proteins, which forms the basis for extracting the relations between the recognized entities. Most work has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of biomedical relations from text. The method is not geared to any specific sub-domain (e.g. protein-protein interactions or drug-drug interactions) and does not require any manual input or deep processing. In addition, the method uses the extracted relations to infer a set of abstract semantic relations and their signature types, which constitutes a valuable source of knowledge when constructing formal knowledge bases. We enable seamless integration of the extracted relations with the available biomedical resources through semantic annotation. The proposed approach has been successfully applied to the CALBC corpus (almost a million text documents), using UMLS as the knowledge resource for semantic annotation. Copyright © 2011 ACM.
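The step of inferring abstract semantic relations and their signature types can be illustrated with a small sketch: each extracted (entity, relation, entity) triple is lifted to (type, relation, type) via a semantic-annotation map, and signatures are counted. This is an assumption-laden illustration, not the paper's method. The relations, the `SEMANTIC_TYPE` map (standing in for UMLS annotation), its type labels, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical extracted relations: (subject entity, relation phrase, object entity).
EXTRACTED = [
    ("aspirin", "inhibits", "COX-1"),
    ("ibuprofen", "inhibits", "COX-2"),
    ("insulin", "regulates", "glucose"),
    ("aspirin", "treats", "headache"),
]

# Hypothetical entity -> semantic type map; illustrative labels, not UMLS ones.
SEMANTIC_TYPE = {
    "aspirin": "Drug",
    "ibuprofen": "Drug",
    "COX-1": "Enzyme",
    "COX-2": "Enzyme",
    "insulin": "Hormone",
    "glucose": "Metabolite",
    "headache": "Symptom",
}


def abstract_signatures(relations, type_of, min_support=1):
    """Lift instance-level relations to type-level signatures and keep those
    observed at least min_support times."""
    counts = Counter(
        (type_of[s], rel, type_of[o])
        for s, rel, o in relations
        if s in type_of and o in type_of  # skip unannotated entities
    )
    return {sig: n for sig, n in counts.items() if n >= min_support}
```

With `min_support=2`, only `("Drug", "inhibits", "Enzyme")` survives, since it is supported by two distinct extracted relations; a real system would also need to normalize relation phrases and handle entities with several semantic types.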
Diez-Pastor J.F.,Lenguajes y Sistemas Informaticos |
Rodriguez J.J.,Lenguajes y Sistemas Informaticos |
Garcia-Osorio C.,Lenguajes y Sistemas Informaticos |
Kuncheva L.I.,Bangor University
Knowledge-Based Systems | Year: 2015
In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty of the approach is that the class proportions for each ensemble member are chosen randomly. The intuition behind the method is that this diversity heuristic ensures that the ensemble contains classifiers specialized for different operating points in ROC space, thereby leading to a larger AUC than other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself and in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost, which combines Random Balance with AdaBoost.M2. This combination enforces random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well-known repositories demonstrate the advantage of the Random Balance approach. © 2015 Elsevier B.V. All rights reserved.
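The per-member resampling can be sketched as follows, assuming each member keeps the original training-set size, the new majority-class size is drawn uniformly at random, the shrinking class is randomly undersampled, and the growing class is topped up with SMOTE. The function names (`smote`, `random_balance`, `random_balance_ensemble_data`) and the choice of a plain-Python SMOTE with Euclidean neighbours are assumptions for illustration; the base classifier and the RB-Boost combination with AdaBoost.M2 are not shown.

```python
import random
from math import dist


def smote(points, n_new, k=5, rng=random):
    """Minimal SMOTE: interpolate between a random point and one of its
    k nearest neighbours from the same class."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        x = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        if not neighbours:  # single point: nothing to interpolate with
            synthetic.append(tuple(x))
            continue
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def random_balance(majority, minority, k=5, rng=random):
    """Resample one member's training set to random class proportions
    while keeping the total size unchanged (assumes >= 4 instances)."""
    total = len(majority) + len(minority)
    new_maj = rng.randint(2, total - 2)  # inclusive bounds, both classes >= 2
    new_min = total - new_maj
    if new_maj < len(majority):
        # Shrink the majority class, grow the minority class with SMOTE.
        maj = rng.sample(majority, new_maj)
        mino = list(minority) + smote(minority, new_min - len(minority), k, rng)
    else:
        # Shrink the minority class, grow the majority class with SMOTE.
        mino = rng.sample(minority, new_min)
        maj = list(majority) + smote(majority, new_maj - len(majority), k, rng)
    return maj, mino


def random_balance_ensemble_data(majority, minority, n_members, k=5, seed=None):
    """One independently rebalanced training set per ensemble member."""
    rng = random.Random(seed)
    return [random_balance(majority, minority, k, rng) for _ in range(n_members)]
```

Each returned `(majority, minority)` pair would then be used to train one ensemble member with any base classifier. Because the proportions differ from member to member, individual members are biased toward different trade-offs between the two classes, which is the diversity the abstract links to different ROC operating points.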