West Haven, CT, United States

Garla V.N., Yale University | Brandt C., Connecticut VA Healthcare System
Proceedings - 2012 IEEE 2nd Conference on Healthcare Informatics, Imaging and Systems Biology, HISB 2012 | Year: 2012

Motivation: Word Sense Disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text processing tasks. In this study, we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS), and we evaluated the contribution of WSD to clinical text classification. Results: We evaluated our system on biomedical WSD datasets; our system compares favorably to other knowledge-based methods. We evaluated the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts. Availability: We integrated our WSD system with MetaMap and cTAKES, two popular biomedical natural language processing systems. We released all code required to reproduce our results and all tools developed as part of this study as open source, available at http://code.google.com/p/ytex. © 2012 IEEE.
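The idea of knowledge-based WSD driven by semantic similarity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy is-a taxonomy is hypothetical and is not UMLS content, and the path-based similarity (1 / (1 + shortest path length)) is one common measure, not necessarily the one used in the paper. The candidate sense with the highest total similarity to the context concepts is selected.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical data, NOT real UMLS content.
PARENT = {
    "common_cold": "respiratory_infection",
    "influenza": "respiratory_infection",
    "pneumonia": "respiratory_infection",
    "respiratory_infection": "disease",
    "low_temperature": "physical_property",
    "disease": "entity",
    "physical_property": "entity",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, inclusive."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest is-a path between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(chain_b)}
    for i, c in enumerate(chain_a):
        if c in depth_in_b:
            # c is the lowest common ancestor; i and depth_in_b[c] are the
            # edge counts from a and from b up to it.
            return 1.0 / (1 + i + depth_in_b[c])
    return 0.0  # no common ancestor


def disambiguate(candidate_senses, context_concepts):
    """Pick the sense with the highest summed similarity to the context."""
    return max(
        candidate_senses,
        key=lambda sense: sum(path_similarity(sense, c) for c in context_concepts),
    )


# The ambiguous term "cold" near mentions of influenza and pneumonia.
best = disambiguate(["common_cold", "low_temperature"], ["influenza", "pneumonia"])
print(best)
```

In this toy setting the disease sense wins because it sits two edges from each context concept, while the temperature sense is connected only through the root.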

Garla V., Yale University | Taylor C., Connecticut VA Healthcare System | Brandt C., Yale University
Journal of Biomedical Informatics | Year: 2013

Objective: To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance. Background: The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts. Methods: We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow-up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports. Results: The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (macro-F1 0.773 vs. 0.741, sensitivity 0.943 vs. 0.911, positive predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ = 0.529 for correlation between number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification. © 2013 Elsevier Inc.
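For readers unfamiliar with the reported metrics, the following minimal sketch shows how sensitivity, positive predictive value, and macro-F1 are conventionally computed for a binary task. The function name and the toy labels are hypothetical and are not taken from the study; this is the standard textbook definition, not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, PPV, and macro-F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class never occurs.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall of the positive class
    ppv = ratio(tp, tp + fp)           # precision of the positive class

    # Per-class F1 written as 2*TP / (2*TP + FP + FN); for the negative
    # class the roles of TP/TN and FP/FN are swapped.
    f1_pos = ratio(2 * tp, 2 * tp + fp + fn)
    f1_neg = ratio(2 * tn, 2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2   # unweighted mean over both classes

    return {"sensitivity": sensitivity, "ppv": ppv, "macro_f1": macro_f1}


# Hypothetical toy labels, for illustration only.
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(metrics)
```

Because macro-F1 averages the two classes without weighting, it penalizes poor performance on the minority class, which matters when class prevalence is as skewed as the 77% reported here.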

Garla V.N., Yale University | Brandt C., Connecticut VA Healthcare System | Brandt C., Yale University
Journal of Biomedical Informatics | Year: 2012

In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the UMLS to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex. © 2012 Elsevier Inc.
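As a rough illustration of information-theoretic feature ranking, the sketch below scores each concept by the mutual information between its presence in a document and the document's class label. This is plain mutual information over hypothetical toy data; it does not reproduce the taxonomy-aware techniques described in the abstract, and all names are assumptions.

```python
import math
from collections import Counter


def mutual_information(docs, concept):
    """Mutual information (in bits) between concept presence and class label.

    docs is a list of (set_of_concepts, label) pairs.
    """
    n = len(docs)
    joint = Counter((concept in concepts, label) for concepts, label in docs)
    presence = Counter()
    labels = Counter()
    for (x, y), count in joint.items():
        presence[x] += count
        labels[y] += count
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), expressed with counts.
        mi += (count / n) * math.log2((count * n) / (presence[x] * labels[y]))
    return mi


def rank_concepts(docs):
    """Return concepts sorted by decreasing mutual information with the label."""
    vocabulary = set().union(*(concepts for concepts, _ in docs))
    return sorted(vocabulary, key=lambda c: (-mutual_information(docs, c), c))


# Hypothetical toy corpus: label 1 = obese, 0 = not obese.
docs = [
    ({"obesity", "diabetes"}, 1),
    ({"obesity"}, 1),
    ({"diabetes"}, 0),
    (set(), 0),
]
print(rank_concepts(docs))
```

Here the concept that perfectly separates the classes carries one full bit of information, while the concept that occurs equally often in both classes carries none.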

Farber S., Connecticut VA Healthcare System | Farber S., Yale University | Tate J., Connecticut VA Healthcare System | Tate J., Yale University | And 8 more authors.
AIDS and Behavior | Year: 2013

The role of financial incentives in HIV care is not well studied. We conducted a single-site study of monetary incentives for viral load suppression, using each patient as his own control. The incentive size ($100/quarter) was designed to be cost-neutral, offsetting estimated downstream costs averted through reduced HIV transmission. Feasibility outcomes were clinic workflow, patient acceptability, and patient comprehension. Although the study was not powered for effectiveness, we also analyzed viral load suppression. Of 80 eligible patients, 77 consented, and 69 had 12-month follow-up. Feasibility outcomes showed minimal impact on patient workflow, near-unanimous patient acceptability, and satisfactory patient comprehension. Among individuals with detectable viral loads pre-intervention, the proportion of undetectable viral load tests increased from 57% before the intervention to 69% after it. It is feasible to use financial incentives to reward ART adherence, and to specify the incentive by requiring cost-neutrality and targeting biological outcomes. © 2013 The Author(s).

Garla V., Yale University | Re III. V.L., University of Pennsylvania | Dorey-Stein Z., University of Pennsylvania | Kidwai F., Connecticut VA Healthcare System | And 8 more authors.
Journal of the American Medical Informatics Association | Year: 2011

Background: Open-source clinical natural-language processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language processing systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. Methods: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule and machine-learning based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation. Results and discussion: The F1-score of the system for the retrieval of abdominal radiology reports was 96%, and was 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
