Scientific Research Group in Egypt SRGE



Tharwat A.,Suez Canal University | Tharwat A.,Scientific Research Group in Egypt SRGE | Moemen Y.S.,Scientific Research Group in Egypt SRGE | Moemen Y.S.,Menoufia University | And 2 more authors.
Journal of Biomedical Informatics | Year: 2017

Measuring toxicity is an important step in drug development. However, the current experimental methods used to estimate drug toxicity are expensive and time-consuming, which makes them unsuitable for large-scale evaluation of drug toxicity in the early stages of drug development. Hence, there is a high demand for computational models that can predict drug toxicity risks. In this study, we used a dataset of 553 drugs that are biotransformed in the liver. Four toxic effects were calculated for these data, namely mutagenic, tumorigenic, irritant and reproductive effects. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods, namely Random Under-Sampling, Random Over-Sampling, Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE and Safe Level SMOTE, are used to address the imbalanced dataset problem. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug as toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, the Whale Optimization Algorithm (WOA) is proposed to optimize the parameters of SVM, so that the classification error can be reduced. The experimental results showed that the proposed model achieved high sensitivity for all toxic effects. Overall, the high sensitivity of the WOA + SVM model indicates that it could be used for the prediction of drug toxicity in the early stages of drug development. © 2017 Elsevier Inc.
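
As a rough illustration of the oversampling phase described above, the following is a minimal sketch of the basic SMOTE idea: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation; the function name `smote`, its parameters, and the two-descriptor toy data are assumptions made for this example only.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical toy data: four "toxic" drugs described by two descriptors.
toxic = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote(toxic, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority points, the new samples stay inside the region already occupied by the minority class.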

El-Atta A.H.A.,Benha University | El-Atta A.H.A.,Scientific Research Group in Egypt SRGE | Hassanien A.E.,Cairo University | Hassanien A.E.,Scientific Research Group in Egypt SRGE
Information Sciences | Year: 2017

Information and computer science fields such as machine learning and graph theory are implemented in chemoinformatics to discover the properties of chemical compounds. This paper presents a new algorithm based on the two-class support vector machine (SVM) model, which has new kernel functions for paths of features, enabling the prediction of chemical compound activity. Initially, we extract all paths of features (star subgraphs) with certain lengths, and we encode them depending on their structure in the graphs. Then, we use these codes to construct two relationship matrices between those paths. These matrices contain common and different sub-paths between paths of stars. The number of sub-paths/paths for each compound is passed to the proposed kernel functions in the two-class SVM to predict the activity of chemical compounds. The relationship matrices created by the proposed algorithm help to reduce the number of features, which improves prediction accuracy. We apply the proposed algorithm with and without feature selection using two benchmark datasets, specifically, the monoamine oxidase (MAO) dataset and the AIDS antiviral screen database of active compound dataset, which have 68 and 2000 chemical compounds, respectively. We perform comparative experiments for the proposed kernel functions and many other two-class SVM prediction methods, and the results before feature selection show prediction accuracies of 94% and 99.5% for MAO and AIDS, respectively. After selection, the prediction accuracies are 96% and 99.5% for MAO and AIDS, respectively. © 2017 Elsevier Inc.

Amin I.I.,Cairo University | Amin I.I.,Scientific Research Group in Egypt SRGE | Kassim S.K.,Ain Shams University | Hassanien A.E.,Cairo University | And 2 more authors.
International Conference on Intelligent Systems Design and Applications, ISDA | Year: 2012

The main purpose of this paper is to show the use of formal concept analysis (FCA) as a data mining approach for mining the common hypermethylated genes between breast cancer subtypes. Formal concepts representing sets of significant hypermethylated genes are extracted for each breast cancer subtype; the formal context is then built, which leads to the construction of a concept lattice composed of formal concepts. This lattice can be used for knowledge discovery and knowledge representation, which makes it more interesting for biologists. © 2012 IEEE.
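
To make the notion of a formal concept more concrete, here is a small sketch that enumerates all formal concepts of a toy formal context by brute force. A formal concept is a pair (extent, intent) where the intent is exactly the set of attributes shared by the objects in the extent, and the extent is exactly the set of objects having all attributes in the intent. The subtype and gene names are hypothetical placeholders, and the brute-force enumeration is an assumption chosen for clarity; it is exponential in the number of objects and is not the method used in the paper.

```python
from itertools import combinations

# Hypothetical toy context: subtype -> set of hypermethylated genes.
context = {
    "subtype_A": {"g1", "g2"},
    "subtype_B": {"g2", "g3"},
    "subtype_C": {"g1", "g2", "g3"},
}

def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    sets = [context[o] for o in objects]
    if not sets:
        return set().union(*context.values())
    return set.intersection(*sets)

def objects_having(attrs, context):
    """Objects whose attribute set contains every attribute in `attrs`."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objs = sorted(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts(context)
```

Ordering these concepts by inclusion of their extents gives the concept lattice; in this toy context the top concept contains all three subtypes with the single shared gene `g2`.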

Amin I.I.,Cairo University | Amin I.I.,Scientific Research Group in Egypt SRGE | Kassim S.K.,Ain Shams University | Hassanien A.E.,Cairo University | And 2 more authors.
Proceedings of the 2013 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2013 | Year: 2013

Hypomethylation of DNA has been associated with cancer in several investigations. Hypomethylation of CpG islands associated with promoters can increase the expression of genes. The Illumina GoldenGate Methylation Cancer Panel I can measure DNA methylation at 1505 CpG loci of 806 cancer-related genes. Powerful tools to analyze DNA methylation data are needed. In this paper, formal concept analysis (FCA) is used as a data mining tool for mining the hypomethylated genes among breast cancer subtypes, by building formal concepts with significant hypomethylated genes for each breast cancer subtype. The concept lattice, which is composed of formal concepts, is constructed from a formal context. This lattice reflects the biological relationships among breast cancer subtypes. © 2013 IEEE.

Hafez A.I.,Minia University | Hafez A.I.,Scientific Research Group in Egypt SRGE | Al-Shammari E.T.,Kuwait University | Hassanien A.E.,Scientific Research Group in Egypt SRGE | And 2 more authors.
Studies in Computational Intelligence | Year: 2014

Community detection in complex networks has attracted a lot of attention in recent years. Communities play special roles in the structure-function relationship. Therefore, detecting communities (or modules) can be a way to identify substructures that could correspond to important functions. Community detection can be viewed as an optimization problem in which an objective function that captures the intuition of a community as a group of nodes with better internal connectivity than external connectivity is chosen to be optimized. Many single-objective optimization techniques have been used to solve the detection problem. However, those approaches have drawbacks because they attempt to optimize only one objective function, which results in a solution with a particular community structure property. More recently, researchers have viewed the community detection problem as a multi-objective optimization problem, and many approaches have been proposed. Genetic Algorithms (GA) have been used as an effective optimization technique to solve both single- and multi-objective community detection problems. However, the most appropriate objective functions to be used with each other are still under debate, since many similar objective functions have been proposed over the years. We show how those objectives correlate, investigate their performance when they are used in both the single- and multi-objective GA, and determine the community structure properties they tend to produce. © 2014 Springer International Publishing Switzerland.
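
The abstract does not name the objective functions it compares. As one widely used example of an objective that rewards dense internal connectivity, the sketch below computes Newman's modularity for a given partition of an undirected, unweighted graph. The choice of modularity, the function name, and the toy graph are assumptions for illustration.

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of (l_c / m - (d_c / 2m)^2),
    where m is the number of edges, l_c the number of edges inside
    community c, and d_c the total degree of the nodes in c."""
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the graph into its two triangles scores higher than placing every node in one community, which is the kind of signal a genetic algorithm would use as a fitness value when searching over partitions.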

Azar A.T.,Benha University | Azar A.T.,Scientific Research Group in Egypt SRGE | El-Said S.A.,Zagazig University | Hassanien A.E.,Cairo University
Computer Methods and Programs in Biomedicine | Year: 2013

Thyroid hormones produced by the thyroid gland help regulate the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not been applied to thyroid disease data sets so far. This paper proposes a comparison between hard and fuzzy clustering algorithms for a thyroid disease data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c = 3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high-dimensional pattern space. At the end of this study, some recommendations are formulated to improve the determination of the actual number of clusters present in the data set. © 2013 Elsevier Ireland Ltd.
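
The elbow criterion mentioned above looks for the cluster count after which adding more clusters yields only small improvements in a cost or validity measure. One simple way to locate that point, sketched below, is to pick the cluster count with the largest second difference of the cost curve. This particular heuristic and the cost values are assumptions for illustration; the abstract does not say how the elbow was identified, and the numbers are not taken from the paper.

```python
def elbow(cluster_counts, costs):
    """Return the cluster count where the cost curve bends most sharply,
    measured by the discrete second difference of the costs."""
    best_c, best_bend = None, float("-inf")
    # The first and last points have no neighbour on one side, so skip them.
    for i in range(1, len(costs) - 1):
        bend = costs[i - 1] - 2 * costs[i] + costs[i + 1]
        if bend > best_bend:
            best_c, best_bend = cluster_counts[i], bend
    return best_c

# Hypothetical cost values (e.g. within-cluster scatter) for c = 2..7.
counts = list(range(2, 8))
costs = [100.0, 40.0, 35.0, 31.0, 28.0, 26.0]
best = elbow(counts, costs)
```

With these made-up values the cost drops steeply from c = 2 to c = 3 and flattens afterwards, so the heuristic selects c = 3.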

Hassanien A.E.,Cairo University | Hassanien A.E.,Scientific Research Group in Egypt SRGE | Al-Shammari E.T.,Kuwait University | Ghali N.I.,Al - Azhar University of Egypt | Ghali N.I.,Scientific Research Group in Egypt SRGE
Computational Biology and Chemistry | Year: 2013

Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. © 2013 Elsevier Ltd.

Ayeldeen H.,Cairo University | Hassanien A.E.,Cairo University | Fahmy A.A.,Scientific Research Group in Egypt SRGE
ICET 2014 - 2nd International Conference on Engineering and Technology | Year: 2015

Knowledge extraction and text representation are considered the main concepts concerning organizations nowadays. The estimation of the semantic similarity between words provides a valuable method to enable the understanding of texts. In biomedical domains, ontologies have been very effective due to their scalability and efficiency. In this paper, we aim to cluster and classify medical thesis data to better discover the commonalities between theses and hence improve the accuracy of the similarity estimation, which in turn improves the scientific research sector. Experimental evaluations using a data set of 4,878 theses in the medical sector at Cairo University indicate that the proposed approach yields results that correlate more closely with human assessments than other approaches by using the standard ontology (MeSH). Two different algorithms were used: the first applies lexical similarity followed by K-means clustering, and the second is a fuzzy Euclidean distance clustering algorithm applied after using the MeSH ontology on the medical theses data for better categorization of the keywords within the data. © 2014 IEEE.

Azar A.T.,Benha University | Elshazly H.I.,Cairo University | Elshazly H.I.,Scientific Research Group in Egypt SRGE | Hassanien A.E.,Cairo University | And 2 more authors.
Computer Methods and Programs in Biomedicine | Year: 2014

Machine learning-based classification techniques provide support for the decision-making process in many areas of health care, including diagnosis, prognosis, screening, etc. Feature selection (FS) is expected to improve classification performance, particularly in situations characterized by the high data dimensionality problem caused by relatively few training examples compared to a large number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. Focusing on feature selection, the first stage of the proposed system aims at constructing diverse feature selection algorithms such as genetic algorithm (GA), Principal Component Analysis (PCA), Relief-F, Fisher, Sequential Forward Floating Search (SFFS) and the Sequential Backward Floating Search (SBFS) for reducing the dimension of lymph diseases dataset. Switching from feature selection to model construction, in the second stage, the obtained feature subsets are fed into the RFC for efficient classification. It was observed that GA-RFC achieved the highest classification accuracy of 92.2%. The dimension of input feature space is reduced from eighteen to six features by using GA. © 2013 Elsevier Ireland Ltd.
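
Of the feature selection methods listed above, the Fisher criterion is the simplest to show in a few lines. The sketch below scores each feature by the ratio of between-class scatter to within-class scatter, so that features whose class means are far apart relative to their spread rank highest. The exact formula variant, the function name, and the toy data are assumptions; the paper's own implementation and the lymph dataset are not reproduced here.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Fisher score per feature: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k,
    where k runs over the class labels."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = within = 0.0
        for label in set(y):
            values = [row[j] for row, lab in zip(X, y) if lab == label]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Keeping only the top-ranked features and passing them to a classifier mirrors the two-stage structure described in the abstract, with the classifier itself left out of this sketch.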

Azar A.T.,Benha University | Hassanien A.E.,Cairo University | Hassanien A.E.,Scientific Research Group in Egypt SRGE
Soft Computing | Year: 2015

Massive and complex data are generated every day in many fields. Complex data refer to data sets that are so large that conventional database management and data analysis tools are insufficient to deal with them. Managing and analyzing medical big data involves many different issues regarding their structure, storage and analysis. In this paper, a linguistic hedges neuro-fuzzy classifier with selected features (LHNFCSF) is presented for dimensionality reduction, feature selection and classification. Four real-world data sets are provided to demonstrate the performance of the proposed neuro-fuzzy classifier. The new classifier is compared with other classifiers on different classification problems. The results indicated that applying LHNFCSF not only reduces the dimensions of the problem, but also improves classification performance by discarding redundant, noise-corrupted, or unimportant features. The results strongly suggest that the proposed method not only helps reduce the dimensionality of large data sets but can also speed up the computation time of a learning algorithm and simplify the classification tasks. © 2014, Springer-Verlag Berlin Heidelberg.
