Health Discovery | Date: 2012-03-12
Gene expression data are analyzed using learning machines such as support vector machines (SVMs) and ridge regression classifiers to rank genes according to their ability to distinguish BPH (benign prostatic hyperplasia) from all other conditions. The results show the correlation between findings obtained from two independent studies conducted at different times using different microarrays. Genes are ranked according to area under the curve (AUC), false discovery rate and fold change.
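As an illustration of two of the ranking criteria named above, the sketch below scores each gene by AUC (the probability that a BPH sample has a higher expression value than a non-BPH sample, with ties counted as one half) and by log2 fold change of the group means. It is a minimal sketch under stated assumptions: the function names, the toy gene names and the choice to rank by distance of the AUC from 0.5 are hypothetical. It assumes strictly positive expression values and omits the false discovery rate, which would additionally require per-gene p-values. It does not reproduce the SVM or ridge regression ranking described in the abstract.

```python
import math

def auc(pos, neg):
    """Fraction of (positive, negative) pairs in which the positive value is higher; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def log2_fold_change(pos, neg):
    """log2 of the ratio of group means; assumes strictly positive expression values."""
    return math.log2((sum(pos) / len(pos)) / (sum(neg) / len(neg)))

def rank_genes(expr, labels):
    """expr: dict gene -> list of values; labels: list of bools (True = BPH, assumed encoding)."""
    rows = []
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y]
        neg = [v for v, y in zip(values, labels) if not y]
        a = auc(pos, neg)
        # Distance from 0.5 rewards both up- and down-regulated genes that separate the classes.
        rows.append((gene, a, log2_fold_change(pos, neg), abs(a - 0.5)))
    rows.sort(key=lambda r: r[3], reverse=True)  # stable sort keeps input order on ties
    return [(gene, a, fc) for gene, a, fc, _ in rows]

if __name__ == "__main__":
    expr = {  # hypothetical toy data, not from the studies described above
        "GENE_A": [9.0, 8.5, 9.2, 2.0, 2.5, 1.8],
        "GENE_B": [5.0, 4.0, 6.0, 5.5, 4.5, 5.0],
        "GENE_C": [1.0, 1.2, 0.9, 4.0, 3.8, 4.2],
    }
    labels = [True, True, True, False, False, False]
    for gene, a, fc in rank_genes(expr, labels):
        print(gene, round(a, 2), round(fc, 2))
```

Ranking by |AUC − 0.5| is one reasonable choice; a pipeline could instead rank by AUC directly and report down-regulated genes separately.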
Health Discovery | Date: 2011-04-04
A method for enhancing knowledge discovery from a dataset uses visualization of the subsets of features within a dataset that best separate the dataset into classes. One or more classifiers are trained on each subset of features, and the success rate of the classifiers in accurately classifying the dataset is calculated. The success rate is converted into a ranking that is represented by a visually distinguishable characteristic. One or more tree structures may be displayed with a node representing each feature, and the visually distinguishable characteristic indicates the score for each feature subset. Connectors between the nodes may indicate unconstrained and constrained feature sets. Nodes within a constrained path may be substituted for a feature within the preferred, unconstrained path if that feature is impractical to measure.
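The score-then-rank step can be sketched as follows: every feature subset of a given size is used to train a classifier, its leave-one-out success rate is computed, subsets are sorted by that rate, and each rate is mapped to a gray level as one possible visually distinguishable characteristic. The nearest-centroid classifier is a simple stand-in chosen for brevity, not the classifier the abstract specifies; the function names, the gray-level mapping and the toy data are hypothetical. Each class needs at least two samples so a centroid exists when one sample is held out. Drawing the tree structure itself is not shown.

```python
from itertools import combinations

def nearest_centroid_loo(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `features`."""
    labels = sorted(set(y))  # sorted so ties are broken deterministically
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in labels:
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        pred = min(labels, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def rank_subsets(X, y, size):
    """Score every feature subset of the given size; return subsets best-first plus the scores."""
    n_features = len(X[0])
    scores = {s: nearest_centroid_loo(X, y, s) for s in combinations(range(n_features), size)}
    return sorted(scores, key=scores.get, reverse=True), scores

def gray_level(score):
    """Map a success rate in [0, 1] to a gray level: 0 (black) = best, 255 (white) = worst."""
    return round(255 * (1 - score))

if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0, 1.0], [0.2, 1.0, 0.0], [0.1, 3.0, 1.0],
         [1.0, 4.0, 0.0], [0.9, 2.0, 1.0], [1.1, 3.0, 0.0]]
    y = ["a", "a", "a", "b", "b", "b"]
    ordered, scores = rank_subsets(X, y, 1)
    for subset in ordered:
        print(subset, f"{scores[subset]:.2f}", gray_level(scores[subset]))
```

Calling `rank_subsets` with `size=2` or larger scores feature combinations in the same way, which is what the nodes and connectors of a tree display would summarize.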
Health Discovery | Date: 2013-06-19
A system and method for computer-assisted karyotyping includes a processor that receives a digitized image of metaphase chromosomes and processes it in an image processing module and a classifier module. The image processing module may include a segmenting function for extracting individual chromosome images, a bend-correcting function for straightening images of chromosomes that are bent or curved, and a feature selection function for distinguishing between chromosome bands. The classifier module, which may comprise one or more trained kernel-based learning machines, receives the processed image and classifies it as normal or abnormal.
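The last two stages of that pipeline can be sketched under strong simplifying assumptions. The sketch assumes segmentation and bend correction have already produced a straightened chromosome image stored as a list of pixel rows along the long axis. It reduces the image to a banding profile (mean intensity per row, resampled to a fixed length) and labels it with a Gaussian-kernel similarity vote against labeled reference profiles. This kernel vote is a hypothetical stand-in for the trained kernel-based learning machine described above, not that machine; `band_profile`, `classify`, `gamma` and the toy images are illustrative names and data.

```python
import math

def band_profile(image, n_bins=8):
    """Average each row of a straightened chromosome image, then resample to n_bins values."""
    rows = [sum(r) / len(r) for r in image]
    profile = []
    for b in range(n_bins):
        start = b * len(rows) // n_bins
        end = max((b + 1) * len(rows) // n_bins, start + 1)  # never an empty chunk
        chunk = rows[start:end]
        profile.append(sum(chunk) / len(chunk))
    return profile

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel similarity between two profiles."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(profile, references, gamma=1.0):
    """Return the label whose reference profiles have the highest mean kernel similarity."""
    sums, counts = {}, {}
    for ref, label in references:
        sums[label] = sums.get(label, 0.0) + rbf(profile, ref, gamma)
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda lab: sums[lab] / counts[lab])

if __name__ == "__main__":
    def img(rows):
        # Toy straightened image: each row is 3 pixels wide with a uniform intensity.
        return [[v] * 3 for v in rows]

    references = [
        (band_profile(img([0, 1, 0, 1, 0, 1, 0, 1])), "normal"),
        (band_profile(img([0, 1, 0, 0, 0, 0, 0, 1])), "abnormal"),  # missing bands
    ]
    query = img([0.1, 0.9, 0, 1, 0, 1, 0.1, 1])
    print(classify(band_profile(query), references))
```

Averaging similarities per class, rather than summing them, keeps a class with more reference profiles from winning simply by having more of them.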
Health Discovery | Date: 2011-02-02
A method is provided for unsupervised clustering of gene expression data to identify co-regulation patterns. A clustering algorithm randomly divides the data into k different subsets and measures the similarity between pairs of data points within the subsets, assigning each pair a score based on its similarity, with the greatest similarity yielding the highest correlation score. A distribution of the scores is plotted for each k. The highest value of k whose distribution remains concentrated near the highest correlation score corresponds to the number of co-regulation patterns.
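One way to read this procedure is as a subsampling stability test, sketched below. For each k, two random subsamples are clustered into k clusters, and the two clusterings are compared on their shared points by the cosine similarity of their co-membership indicators (whether each pair of points lands in the same cluster). A well-supported k gives scores concentrated near 1. The use of k-means with farthest-point initialization, the subsample fraction, and the `threshold`/`quantile` rule for "concentrated near the highest score" are all assumptions made for illustration, not details taken from the abstract. Start k at 2, because k = 1 trivially scores 1.

```python
import math
import random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means with farthest-point initialization; returns one label per point."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def comembership_similarity(idx1, lab1, idx2, lab2):
    """Cosine similarity of the 'same cluster' indicators of two labelings on their shared points."""
    m1, m2 = dict(zip(idx1, lab1)), dict(zip(idx2, lab2))
    shared = sorted(set(m1) & set(m2))
    dot = n1 = n2 = 0
    for a in range(len(shared)):
        for b in range(a + 1, len(shared)):
            i, j = shared[a], shared[b]
            c1, c2 = m1[i] == m1[j], m2[i] == m2[j]
            dot += c1 and c2
            n1 += c1
            n2 += c2
    return dot / math.sqrt(n1 * n2) if n1 and n2 else 0.0

def stability_scores(points, k, n_pairs=20, frac=0.8, seed=0):
    """Distribution of similarity scores between clusterings of random subsample pairs."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        idx1, idx2 = rng.sample(range(n), m), rng.sample(range(n), m)
        lab1 = kmeans([points[i] for i in idx1], k, rng)
        lab2 = kmeans([points[i] for i in idx2], k, rng)
        scores.append(comembership_similarity(idx1, lab1, idx2, lab2))
    return scores

def choose_k(points, k_values, threshold=0.9, quantile=0.1, **kw):
    """Largest k whose low-quantile similarity score still reaches `threshold`."""
    best = None
    for k in k_values:
        s = sorted(stability_scores(points, k, **kw))
        if s[int(quantile * len(s))] >= threshold:
            best = k
    return best
```

Plotting a histogram of `stability_scores(points, k)` for each k gives the per-k score distributions the abstract refers to; `choose_k` is only one hypothetical way to turn those plots into a number.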
Health Discovery | Date: 2010-02-04
Support vector machines are used to classify data contained within a structured dataset, such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparing pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.
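The pre-processing and similarity steps can be sketched as follows: each spectrum is aligned to a reference by trying integer shifts and keeping the one with the largest dot product, and the aligned spectra are then compared pairwise with cosine similarity to form a kernel (Gram) matrix. Shift-by-cross-correlation and cosine similarity are assumed, illustrative choices; the abstract does not say which alignment method or similarity measure is used. The names `shift`, `align` and `max_shift` are hypothetical, and `max_shift` must be smaller than the spectrum length.

```python
import math

def shift(signal, s):
    """Shift a spectrum right by s bins (left if s < 0), zero-padding; requires |s| < len(signal)."""
    n = len(signal)
    if s >= 0:
        return [0.0] * s + signal[:n - s]
    return signal[-s:] + [0.0] * (-s)

def align(signal, reference, max_shift=3):
    """Return the shifted copy of `signal` with the largest dot product against `reference`."""
    best = max(range(-max_shift, max_shift + 1),
               key=lambda s: sum(a * b for a, b in zip(shift(signal, s), reference)))
    return shift(signal, best)

def cosine(u, v):
    """Cosine similarity; 0.0 if either spectrum is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def similarity_matrix(spectra):
    """Pairwise cosine similarities: a symmetric kernel matrix usable by a kernel SVM."""
    return [[cosine(u, v) for v in spectra] for u in spectra]

if __name__ == "__main__":
    reference = [0, 0, 1, 5, 1, 0, 0, 0]
    drifted = [0, 0, 0, 0, 1, 5, 1, 0]  # same peak, offset by two bins
    aligned = align(drifted, reference)
    for row in similarity_matrix([reference, drifted, aligned]):
        print([round(x, 3) for x in row])
```

Because cosine similarity is a normalized linear kernel, a matrix like this could be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`); the training and feature-selection steps themselves are not shown here.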