McCart J.A.,Consortium for Healthcare Informatics Research CHIR | Berndt D.J.,University of South Florida | Jarman J.,East Tennessee State University | Finch D.K.,Consortium for Healthcare Informatics Research CHIR | Luther S.L.,Consortium for Healthcare Informatics Research CHIR
Journal of the American Medical Informatics Association | Year: 2013

Objective: To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter. Materials and Methods: 2241 patients were selected with a fall-related ICD-9-CM E-code or matched injury diagnosis code while being treated as an outpatient at one of four sites within the Veterans Health Administration. All clinical documents within a 48-h window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26 010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest-D). Results: All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest-D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944. Discussion: The STM models performed well across a large heterogeneous collection of document titles. The models also generalized across other sites, including a traditionally bilingual site that had distinctly different grammatical patterns. Conclusions: The results of this study suggest STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence shown here that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics.


Luther S.,Consortium for Healthcare Informatics Research CHIR | Berndt D.,Consortium for Healthcare Informatics Research CHIR | Berndt D.,University of South Florida | Finch D.,Consortium for Healthcare Informatics Research CHIR | And 5 more authors.
Journal of Biomedical Informatics | Year: 2011

Statistical text mining was used to supplement efforts to develop a clinical vocabulary for post-traumatic stress disorder (PTSD) in the VA. A set of outpatient progress notes was collected for a cohort of 405 unique veterans with PTSD and a comparison group of 392 with other psychological conditions at one VA hospital. Two methods were employed: (1) "multi-model term scoring", which used stepwise logistic regression to develop 21 separate models by varying three frequency weight and seven term weight options, and (2) "iterative term refinement", which used a standard stop list followed by clinical review to eliminate non-clinical terms and terms not related to PTSD. Combined results of the two methods were reviewed by two clinicians, resulting in 226 unique PTSD-related terms. Results of the statistical text mining methods were compared with ongoing efforts to identify terms based on literature review, focus groups with clinicians treating PTSD, and review of an existing vocabulary, lending support to the contributions of the STM analyses. © 2011.
