Lister Hill National Center for Biomedical Communications

Bethesda, United States


De Herrera A.G.S.,Lister Hill National Center for Biomedical Communications | Schaer R.,University of Applied Sciences and Arts Western Switzerland | Bromuri S.,Open University of the Netherlands | Muller H.,University of Applied Sciences and Arts Western Switzerland
CEUR Workshop Proceedings | Year: 2016

ImageCLEF is the image retrieval task of the Conference and Labs of the Evaluation Forum (CLEF). ImageCLEF has historically focused on the multimodal and language-independent retrieval of images. Many tasks are related to image classification and the annotation of image data as well. The medical task focused more on image retrieval in the beginning and then on retrieval and classification tasks in subsequent years. In 2016 a main focus was the creation of metadata for a collection of medical images taken from articles of the biomedical scientific literature. In total 8 teams participated in the four tasks and 69 runs were submitted. No team participated in the caption prediction task, a totally new task. Deep learning has now been used for several of the ImageCLEF tasks and by many of the participants, obtaining very good results. A majority of runs were submitted using deep learning, and this follows general trends in machine learning. In several of the tasks multimodal approaches clearly led to the best results.


Winnenburg R.,Lister Hill National Center for Biomedical Communications | Bodenreider O.,Lister Hill National Center for Biomedical Communications
Journal of Biomedical Semantics | Year: 2014

Background: The objective of this study is to develop a framework for assessing the consistency of drug classes across sources, such as MeSH and ATC. Our framework integrates and contrasts lexical and instance-based ontology alignment techniques. Moreover, we propose metrics for assessing not only equivalence relations, but also inclusion relations among drug classes. Results: We identified 226 equivalence relations between MeSH and ATC classes through the lexical alignment, and 223 through the instance-based alignment, with limited overlap between the two (36). We also identified 6,257 inclusion relations. Discrepancies between lexical and instance-based alignments are illustrated and discussed. Conclusions: Our work is the first attempt to align drug classes with sophisticated instance-based techniques, while also distinguishing between equivalence and inclusion relations. Additionally, it is the first application of aligning drug classes in ATC and MeSH. By providing a detailed account of similarities and differences between drug classes across sources, our framework has the prospect of effectively supporting the creation of a mapping of drug classes between ATC and MeSH by domain experts. © 2014 Winnenburg and Bodenreider; licensee BioMed Central Ltd.
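
The difference between equivalence and inclusion relations in an instance-based alignment can be illustrated with a minimal sketch. It treats each drug class as a set of member drugs and compares the sets. The Jaccard and containment scores, the thresholds, and the toy class contents are all assumptions chosen for illustration; the abstract does not state which metrics or cut-offs the authors used.

```python
def jaccard(a, b):
    """Shared members relative to the union of both classes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the members of class a that also belong to class b."""
    return len(a & b) / len(a) if a else 0.0


def classify_pair(a, b, eq_threshold=0.8, incl_threshold=0.9):
    """Label a pair of classes; both thresholds are hypothetical."""
    if jaccard(a, b) >= eq_threshold:
        return "equivalent"
    if containment(a, b) >= incl_threshold:
        return "a included in b"
    if containment(b, a) >= incl_threshold:
        return "b included in a"
    return "unrelated"


# Toy classes with placeholder drug names (not real MeSH or ATC content).
class_a = {"drug_1", "drug_2", "drug_3"}
class_b = {"drug_1", "drug_2", "drug_3", "drug_4", "drug_5"}

print(classify_pair(class_a, class_b))  # class_a is a subset of class_b
print(classify_pair(class_a, class_a))  # identical member sets
```

In this sketch equivalence is checked first, so a pair is only reported as an inclusion when the two member sets differ substantially in size.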


Gallagher M.E.,Lister Hill National Center for Biomedical Communications
Archiving 2013 - Final Program and Proceedings | Year: 2013

The Profiles in Science® digital library features digitized surrogates of historical items selected from the archival collections of the U.S. National Library of Medicine as well as collaborating institutions. In addition, it contains a database of descriptive, technical and administrative metadata. It also contains various software components that allow creation of the metadata, management of the digital items, and access to the items and metadata through the Profiles in Science Web site [1]. The choices made building the digital library were designed to maximize the sustainability and long-term survival of all of the components of the digital library [2]. For example, selecting standard and open digital file formats rather than proprietary formats increases the sustainability of the digital files [3]. Correspondingly, using non-proprietary software may improve the sustainability of the software - either through in-house expertise or through the open source community. Limiting our digital library software exclusively to open source software or to software developed in-house has not been feasible. For example, we have used proprietary operating systems, scanning software, a search engine, and office productivity software. We did this when either lack of essential capabilities or the cost-benefit trade-off favored using proprietary software. We also did so knowing that in the future we would need to replace or upgrade some of our proprietary software, analogous to migrating from an obsolete digital file format to a new format as the technological landscape changes. Since our digital library's start in 1998, all of its software has been upgraded or replaced, but the digitized items have not yet required migration to other formats. Technological changes that compelled us to replace proprietary software included the cost of product licensing, product support, incompatibility with other software, prohibited use due to evolving security policies, and product abandonment. Sometimes these changes happen on short notice, so we continually monitor our library's software for signs of endangerment. We have attempted to replace proprietary software with suitable in-house or open source software. When the replacement involves a standalone piece of software with a nearly equivalent version, such as replacing a commercial HTTP server with an open source HTTP server, the replacement is straightforward. Recently we replaced software that functioned not only as our search engine but also as the backbone of the architecture of our Web site. In this paper, we describe the lessons learned and the pros and cons of replacing this software with open source software. © Copyright 2013; Society for Imaging Science and Technology.


Bekhuis T.,University of Pittsburgh | Demner-Fushman D.,Lister Hill National Center for Biomedical Communications
Studies in Health Technology and Informatics | Year: 2010

Systematic review authors synthesize research to guide clinicians in their practice of evidence-based medicine. Teammates independently identify provisionally eligible studies by reading the same set of hundreds and sometimes thousands of citations during an initial screening phase. We investigated whether supervised machine learning methods can potentially reduce their workload. We also extended earlier research by including observational studies of a rare condition. To build training and test sets, we used annotated citations from a search conducted for an in-progress Cochrane systematic review. We extracted features from titles, abstracts, and metadata, then trained, optimized, and tested several classifiers with respect to mean performance based on 10-fold cross-validations. In the training condition, the evolutionary support vector machine (EvoSVM) with an Epanechnikov or radial kernel is the best classifier: mean recall=100%; mean precision=48% and 41%, respectively. In the test condition, EvoSVM performance degrades: mean recall=77%, mean precision ranges from 26% to 37%. Because near-perfect recall is essential in this context, we conclude that supervised machine learning methods may be useful for reducing workload under certain conditions. © 2010 IMIA and SAHIA. All rights reserved.
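
To make the reported figures concrete, the sketch below computes precision and recall per fold and then averages them, which is one plausible reading of "mean performance" across cross-validation folds. The two folds and their labels are invented toy data (1 = eligible citation, 0 = ineligible), and the function name is hypothetical; nothing here reproduces the EvoSVM classifier or the study's data.

```python
from statistics import mean


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means eligible."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two toy folds: (true labels, predicted labels). Real runs would use ten.
folds = [
    ([1, 0, 0, 1, 0], [1, 1, 0, 1, 0]),
    ([0, 1, 0, 0, 0], [1, 1, 1, 0, 0]),
]

scores = [precision_recall(y_true, y_pred) for y_true, y_pred in folds]
mean_precision = mean(p for p, _ in scores)
mean_recall = mean(r for _, r in scores)
print(f"mean precision={mean_precision:.2f}, mean recall={mean_recall:.2f}")
```

In the toy folds no eligible citation is missed (recall stays at 1.0) while several ineligible ones are flagged, which mirrors the trade-off described in the abstract: low precision is tolerable for screening as long as recall stays near perfect.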


Chen G.,Lister Hill National Center for Biomedical Communications | Cairelli M.J.,Lister Hill National Center for Biomedical Communications | Kilicoglu H.,Lister Hill National Center for Biomedical Communications | Shin D.,Lister Hill National Center for Biomedical Communications | Rindflesch T.C.,Lister Hill National Center for Biomedical Communications
PLoS Computational Biology | Year: 2014

Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.


Kilicoglu H.,Lister Hill National Center for Biomedical Communications | Shin D.,Lister Hill National Center for Biomedical Communications | Fiszman M.,Lister Hill National Center for Biomedical Communications | Rosemblat G.,Lister Hill National Center for Biomedical Communications | Rindflesch T.C.,Lister Hill National Center for Biomedical Communications
Bioinformatics | Year: 2012

Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support. © The Author 2012. Published by Oxford University Press. All rights reserved.
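
A subject-predicate-object predication can be represented as a small record, and chaining two predications is one simple way such a store might support hypothesis generation. The sketch below is an assumption-laden illustration: the `Predication` type, the `two_step_links` helper, the predicate labels, and the placeholder entities are all hypothetical and do not reflect the actual SemMedDB schema or contents.

```python
from typing import NamedTuple


class Predication(NamedTuple):
    """One subject-predicate-object triple (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str


def two_step_links(predications, start):
    """Find chains start -> B -> C where C is not directly linked to start."""
    by_subject = {}
    for p in predications:
        by_subject.setdefault(p.subject, []).append(p)
    direct = {p.obj for p in by_subject.get(start, [])}
    chains = []
    for first in by_subject.get(start, []):
        for second in by_subject.get(first.obj, []):
            if second.obj != start and second.obj not in direct:
                chains.append((first, second))
    return chains


# Toy triples with placeholder entities; not extracted from any citation.
toy = [
    Predication("drug_x", "INHIBITS", "gene_y"),
    Predication("gene_y", "CAUSES", "disease_z"),
    Predication("drug_x", "TREATS", "disease_w"),
]

for first, second in two_step_links(toy, "drug_x"):
    print(first.subject, "->", first.obj, "->", second.obj)
```

Each returned chain is only a candidate connection to inspect by hand; the sketch makes no claim about how strongly the underlying literature supports it.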


Simpson M.S.,Lister Hill National Center for Biomedical Communications
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium | Year: 2012

Image content is frequently the target of biomedical information extraction systems. However, the meaning of this content cannot be easily understood without some associated text. In order to improve the integration of textual and visual information, we are developing a visual ontology for biomedical image retrieval. Our visual ontology maps the appearance of image regions to concepts in an existing textual ontology, thereby inheriting relationships among the visual entities. Such a resource creates a bridge between the visual characteristics of important image regions and their semantic interpretation. We automatically populate our visual ontology by pairing image regions with their associated descriptions. To demonstrate the usefulness of this resource, we have developed a classification method that automatically labels image regions with appropriate concepts based solely on their appearance. Our results for thoracic imaging terms show that our methods are promising first steps towards the creation of a biomedical visual ontology.


Bekhuis T.,University of Pittsburgh | Demner-Fushman D.,Lister Hill National Center for Biomedical Communications | Crowley R.,University of Pittsburgh
Journal of the Medical Library Association | Year: 2013

Objectives: We analyzed the extent to which comparative effectiveness research (CER) organizations share terms for designs, analyzed coverage of CER designs in Medical Subject Headings (MeSH) and Emtree, and explored whether scientists use CER design terms. Methods: We developed local terminologies (LTs) and a CER design terminology by extracting terms in documents from five organizations. We defined coverage as the distribution over match type in MeSH and Emtree. We created a crosswalk by recording terms to which design terms mapped in both controlled vocabularies. We analyzed the hits for queries restricted to titles and abstracts to explore scientists' language. Results: Pairwise LT overlap ranged from 22.64% (12/53) to 75.61% (31/41). The CER design terminology (n=78 terms) consisted of terms for primary study designs and a few terms useful for evaluating evidence, such as opinion paper and systematic review. Patterns of coverage were similar in MeSH and Emtree (gamma=0.581, P=0.002). Conclusions: Stakeholder terminologies vary, and terms are inconsistently covered in MeSH and Emtree. The CER design terminology and crosswalk may be useful for expert searchers. For partially mapped terms, queries could consist of free text for modifiers such as nonrandomized or interrupted added to broad or related controlled terms.


Abhyankar S.,Lister Hill National Center for Biomedical Communications | Demner-Fushman D.,Lister Hill National Center for Biomedical Communications | McDonald C.J.,Lister Hill National Center for Biomedical Communications
Journal of Biomedical Informatics | Year: 2012

Clinical databases provide a rich source of data for answering clinical research questions. However, the variables recorded in clinical data systems are often identified by local, idiosyncratic, and sometimes redundant and/or ambiguous names (or codes) rather than unique, well-organized codes from standard code systems. This reality discourages research use of such databases, because researchers must invest considerable time in cleaning up the data before they can ask their first research question. Researchers at MIT developed MIMIC-II, a nearly complete collection of clinical data about intensive care patients. Because its data are drawn from existing clinical systems, it has many of the problems described above. In collaboration with the MIT researchers, we have begun a process of cleaning up the data and mapping the variable names and codes to LOINC codes. Our first step, which we describe here, was to map all of the laboratory test observations to LOINC codes. We were able to map 87% of the unique laboratory tests, which cover 94% of the total number of laboratory test results. Of the 13% of tests that we could not map, nearly 60% were due to test names whose real meaning could not be discerned and 29% represented tests that were not yet included in the LOINC table. These results suggest that LOINC codes cover most of the laboratory tests used in critical care. We have delivered this work to the MIMIC-II researchers, who have included it in their standard MIMIC-II database release so that researchers who use this database in the future will not have to do this work.
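
The two coverage figures in this abstract measure different things: the share of distinct test names that were mapped, and the share of individual result records that those mapped names account for. A minimal sketch of the distinction follows. The `mapping_coverage` function, the local test names, and the `CODE-n` targets are hypothetical placeholders, not real MIMIC-II names or LOINC codes.

```python
from collections import Counter


def mapping_coverage(result_names, mapping):
    """Return (coverage by unique test name, coverage by result volume)."""
    counts = Counter(result_names)
    if not counts:
        return 0.0, 0.0
    mapped = [name for name in counts if name in mapping]
    unique_cov = len(mapped) / len(counts)
    volume_cov = sum(counts[name] for name in mapped) / sum(counts.values())
    return unique_cov, volume_cov


# Toy data: one local test name per recorded result (placeholder names).
results = ["GLU"] * 6 + ["NA+"] * 3 + ["misc_17"] * 1
local_to_code = {"GLU": "CODE-1", "NA+": "CODE-2"}  # placeholder targets

unique_cov, volume_cov = mapping_coverage(results, local_to_code)
print(f"unique tests mapped: {unique_cov:.0%}, results covered: {volume_cov:.0%}")
```

Because frequently ordered tests tend to be the easiest to map, coverage by volume can exceed coverage by unique name, as it does both in this toy data and in the figures the abstract reports.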


Yepes A.J.,Lister Hill National Center for Biomedical Communications
BMC bioinformatics | Year: 2013

Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts.
