Research Programme on Biomedical Informatics GRIB

Barcelona, Spain

Li T.S.,Scripps Research Institute | Bravo A.,Research Programme on Biomedical Informatics GRIB | Furlong L.I.,Research Programme on Biomedical Informatics GRIB | Good B.M.,Scripps Research Institute | Su A.I.,Scripps Research Institute
Database | Year: 2016

Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors in named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract), and the work took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd-cid-relex. © The Author(s) 2016. Published by Oxford University Press.
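The voting rule and the reported figures in this abstract can be checked with a short sketch. This is only an illustration under stated assumptions, not the authors' code: the function names, the relation identifiers and the sample votes are hypothetical, and the only values taken from the abstract are the four-of-five vote threshold, the precision and recall, and the cost figures.

```python
from collections import Counter


def aggregate_votes(judgments, threshold=4):
    """Predict a relation as true when it gets at least `threshold` yes votes.

    `judgments` is an iterable of (relation_id, vote) pairs, where `vote`
    is True if the worker judged that the text supports the relation.
    """
    yes_votes = Counter()
    seen = set()
    for relation_id, vote in judgments:
        seen.add(relation_id)
        if vote:
            yes_votes[relation_id] += 1
    return {rel: yes_votes[rel] >= threshold for rel in seen}


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical judgments: five workers per candidate relation.
sample = (
    [("chem_A|disease_X", v) for v in (True, True, True, True, False)]
    + [("chem_B|disease_Y", v) for v in (True, True, True, False, False)]
)
predictions = aggregate_votes(sample)

# Figures reported in the abstract.
reported_f = f_score(0.475, 0.540)
cost_per_abstract = 1290.67 / 500
```

With the sample above, the first relation (four yes votes) is predicted true and the second (three yes votes) is not. The reported precision and recall give an F-score that rounds to 0.505, and the total cost divided by 500 abstracts rounds to $2.58, matching the abstract.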


Kaczor A.A.,Research Programme on Biomedical Informatics GRIB | Kaczor A.A.,Medical University of Lublin | Guixa-Gonzalez R.,Research Programme on Biomedical Informatics GRIB | Carrio P.,Research Programme on Biomedical Informatics GRIB | And 3 more authors.
Journal of Molecular Modeling | Year: 2012

Protein surface roughness is a structural property associated with ligand-protein and protein-protein binding interfaces. In this work we apply for the first time the concept of surface roughness, expressed as the fractal dimension, to address the structure and function of G protein-coupled receptors (GPCRs), which are an important group of drug targets. We calculate the exposure ratio and the fractal dimension for helix-forming residues of the β2 adrenergic receptor (β2AR), a model system in GPCR studies, in different conformational states: in complex with agonist, antagonist and partial inverse agonists. We show that both exposure ratio and roughness exhibit periodicity, which results from the helical structure of GPCRs. The pattern of roughness and exposure ratio of a protein patch depends on its environment: the residues most exposed to the membrane are in general the roughest, whereas parts of receptors mediating interhelical contacts in a monomer or protein complex are much smoother. We also find that the intracellular ends (TM3, TM5, TM6 and TM7), which are relevant for G protein binding and thus receptor signaling, are exposed but smooth. Mapping the values of residual fractal dimension onto receptor 3D structures makes it possible to conclude that the binding sites of orthosteric ligands as well as of cholesterol are characterized by significantly higher roughness than the average for the whole protein. In summary, our study suggests that identification of specific patterns of roughness could be a novel approach to spot possible binding sites which could serve as original drug targets for GPCR modulation. © The Author(s) 2012.


PubMed | Research Programme on Biomedical Informatics GRIB and Scripps Research Institute
Journal: Database: the journal of biological databases and curation | Year: 2016

Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule-based and knowledge-based. This system was developed to address Task 3.B of the BioCreative V challenge (BC5), dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system, using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improving our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?lnen#.VsL3yDLWR_V.
