Bradford R., Agilex Technologies
Communications in Computer and Information Science | Year: 2015
Latent semantic indexing (LSI) is a well-established technique that provides broad capabilities for information search, categorization, clustering, and discovery. However, the technique has some limitations. One is that the classical implementation of LSI does not provide a flexible mechanism for dealing with phrases: a phrase can be used as a whole in a query only if it has been identified a priori and treated as a unit when the LSI index was created. This requirement has greatly hindered the use of phrases in LSI applications. This paper presents a method for dealing with phrases in LSI-based information systems on an ad hoc basis, at query time, without requiring any prior knowledge of the phrases of interest. The approach is fast enough to be used during real-time query execution. © Springer International Publishing Switzerland 2015.
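To make the a priori requirement concrete, the following is a minimal sketch of the conventional workaround the abstract describes, not the paper's query-time method (which the abstract does not detail). It assumes a fixed phrase list is known before indexing; the function name `merge_known_phrases`, the underscore-joined token format, and the phrase list itself are illustrative assumptions.

```python
from typing import List, Set, Tuple


def merge_known_phrases(tokens: List[str], phrases: Set[Tuple[str, ...]]) -> List[str]:
    """Replace each known phrase with a single underscore-joined token.

    Scans left to right and prefers the longest matching phrase at each position.
    """
    max_len = max((len(p) for p in phrases), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(tokens[i:i + n])
            if window in phrases:
                out.append("_".join(window))
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical phrase list, fixed before the index is built.
KNOWN_PHRASES = {("latent", "semantic", "indexing"), ("named", "entity")}

doc = "latent semantic indexing supports named entity search".split()
indexed_terms = merge_known_phrases(doc, KNOWN_PHRASES)

# "entity extraction" was not anticipated, so it cannot be queried as a unit.
query = "entity extraction software".split()
query_terms = merge_known_phrases(query, KNOWN_PHRASES)
```

Under this scheme, any phrase missing from the list at index time is split into independent words in both documents and queries, which is the inflexibility the paper sets out to remove.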
Bradford R.B., Agilex Technologies
ISI 2010 - 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security | Year: 2010
Automated extraction of named entities is an important text analysis task. In addition to recognizing the occurrence of entity names, it is important to be able to label those names by type. Most entity extraction techniques categorize extracted entities into a few basic types, such as PERSON, ORGANIZATION, and LOCATION. This paper presents an approach for generating more fine-grained subdivisions of entity type. The technique of latent semantic indexing (LSI) is used to provide semantic context as an indicator of likely entity subtype. Tests were carried out on a collection of 5.5 million English-language news articles. At modest levels of recall, the accuracy of sub-type assignment was comparable to the accuracy with which the gross type was assigned by a state-of-the-art commercial entity extraction software package. © 2010 IEEE.
Birisan M., Agilex Technologies | Beling P.A., University of Virginia
Environment Systems and Decisions | Year: 2014
This paper proposes an image filtering and retrieval system driven by the multi-instance learning (MIL) algorithm. This system is aimed at improving the mission effectiveness of human analysts in searching through imagery for environmental, defense, or other purposes. Thus, the system is tuned and the experimental results are measured in terms of the true positive rate in predicted labels. While MIL has been used in image retrieval before, this paper examines how different tasks and feature spaces impact the performance of the algorithm. Images are translated into the single blob with neighbors (SBN) feature space, a novel feature space called color, texture, and shape (CTS), and a combined SBN and CTS feature space, for processing by the MIL algorithm. The paper introduces a feature space selection step in the classification process and shows that the true positive rate can be increased through the addition of this step. © 2014 Springer Science+Business Media New York.
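As a rough illustration of the evaluation described above, the sketch below applies the standard multi-instance assumption (a bag, here an image, is positive if any of its instances is positive) and compares feature spaces by true positive rate. The per-region scores, the 0.5 threshold, and the rule of picking the feature space with the highest validation TPR are assumptions for illustration; the abstract does not specify how the paper's selection step works.

```python
from typing import Sequence


def bag_label(instance_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Standard MIL assumption: a bag is positive if any instance is positive."""
    return int(any(score >= threshold for score in instance_scores))


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FN); returns 0.0 when there are no positive examples."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical per-region scores for four images (bags) in two feature spaces.
scores_by_space = {
    "SBN": [[0.1, 0.7], [0.2, 0.3], [0.9], [0.4, 0.1]],
    "CTS": [[0.6, 0.2], [0.8, 0.1], [0.7], [0.2, 0.3]],
}
y_true = [1, 1, 1, 0]  # hypothetical ground-truth image labels

tpr_by_space = {
    name: true_positive_rate(y_true, [bag_label(bag) for bag in bags])
    for name, bags in scores_by_space.items()
}
# Assumed selection rule: keep the feature space with the highest TPR.
best_space = max(tpr_by_space, key=tpr_by_space.get)
```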
Bradford R.B., Agilex Technologies
IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics | Year: 2013
In many intelligence and security informatics applications, named entities constitute a particularly important element of queries and analytic operations. In such applications, variations in the rendering of entity names present a pervasive problem. The problem is most frequently encountered when dealing with names of persons. For person names, a wide variety of factors may lead to variations: use of nicknames, differences in given name / surname order, misspellings, phonetic renderings, use of different transliteration systems, etc. Historically, a number of methods have been developed for generating possible name variants. Most of these have been based on phonetic similarities, edit distance, or longest common substrings. However, in general, the larger the data collection, the less effective these techniques are. This paper presents an approach to attaining both high precision and high recall for name variant identification in large text collections. The approach exploits the technique of latent semantic indexing (LSI). In this approach, the contextual information provided by LSI allows likely true variants to be selected from multiple candidate variants generated by other techniques. This significantly improves the precision of candidate name variant results. This paper describes a basic LSI-augmented approach to name variant identification, as well as a new approach that yields additional precision improvements. © 2013 IEEE.
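The two-stage idea in this abstract (generate candidates with a string technique, then keep only those whose context agrees) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the candidate generator is plain Levenshtein distance, the 3-dimensional vectors are made-up stand-ins for LSI term vectors, and the thresholds `max_edits` and `min_cosine` are arbitrary.

```python
import math
from typing import Dict, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def name_variants(target: str, vectors: Dict[str, List[float]],
                  max_edits: int = 2, min_cosine: float = 0.8) -> List[str]:
    """Stage 1: string-similar candidates. Stage 2: keep those with similar context."""
    candidates = [n for n in vectors
                  if n != target and edit_distance(target, n) <= max_edits]
    return sorted(n for n in candidates
                  if cosine(vectors[target], vectors[n]) >= min_cosine)


# Hypothetical context vectors standing in for LSI term vectors.
vectors = {
    "mohammed": [0.90, 0.10, 0.20],
    "mohamed":  [0.88, 0.12, 0.25],
    "muhammed": [0.85, 0.20, 0.15],
    "mohammad": [0.10, 0.90, 0.30],  # similar spelling, different context
    "smith":    [0.20, 0.10, 0.90],
}
variants = name_variants("mohammed", vectors)
```

In this toy data, the string stage alone would admit three candidates; the context stage drops the one whose vector points elsewhere, which is the precision gain the abstract attributes to the LSI-based filter.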
Gurram P., U.S. Army | Kwon H., U.S. Army | Han T., Agilex Technologies
IEEE Geoscience and Remote Sensing Letters | Year: 2012
In this letter, a novel ensemble-learning approach for anomaly detection is presented. The proposed technique aims to optimize an ensemble of kernel-based one-class classifiers, such as support vector data description (SVDD) classifiers, by estimating optimal sparse weights of the subclassifiers. In this method, the features of a given multivariate data set representing normalcy are first randomly subsampled into a large number of feature subspaces. An enclosing hypersphere that defines the support of the normalcy data in the reproducing kernel Hilbert space (RKHS) of each respective feature subspace is estimated using standard SVDD. The joint hypersphere in the RKHS of the combined kernel is learned by optimally combining the weighted individual kernels while imposing the l1 constraint on the combining weights. The joint hypersphere representing the optimal compact support of the multivariate data in the joint RKHS is then used to test a new data point to determine if it belongs to the normalcy data or not. A performance comparison between the proposed algorithm and regular SVDD is reported using hyperspectral image data as well as general multivariate data. © 2012 IEEE.
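For orientation, the building blocks named in this abstract can be written out as follows. The first problem is the standard soft-margin SVDD formulation for one feature subspace $m$; the combined-kernel form with nonnegative weights summing to one is an assumed reading of "the l1 constraint on the combining weights", since the abstract does not give the exact objective. The symbols $R_m$, $a_m$, $\xi_i$, $C$, $d_m$, and $\alpha_i$ are notation introduced here.

```latex
% Standard SVDD in the RKHS of subspace kernel k_m (feature map \phi_m):
\min_{R_m,\, a_m,\, \xi}\; R_m^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
\lVert \phi_m(x_i) - a_m \rVert^2 \le R_m^2 + \xi_i, \qquad \xi_i \ge 0 .

% Assumed form of the combined kernel over M feature subspaces:
k(x, x') = \sum_{m=1}^{M} d_m\, k_m(x, x'),
\qquad d_m \ge 0, \qquad \lVert d \rVert_1 = \sum_{m=1}^{M} d_m = 1 .

% Test for a new point z against the joint hypersphere (center a = \sum_i \alpha_i \phi(x_i)):
k(z, z) - 2 \sum_{i} \alpha_i\, k(x_i, z) + \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j) \le R^2
\;\Longrightarrow\; z \text{ is assigned to the normalcy class.}
```

An l1 constraint of this kind tends to drive many $d_m$ to zero, which is consistent with the "optimal sparse weights of the subclassifiers" mentioned in the abstract.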