Benghabrit A., Moulay Ismail University |
Ouhbi B., Moulay Ismail University |
Zemmouri E.M., Moulay Ismail University |
Frikh B., University Sidi Mohammed Ben Abdellah |
Behja H., Hassan University
International Conference on Next Generation Networks and Services, NGNS | Year: 2014
Not all features in a dataset are important: some are redundant or irrelevant. Feature selection, an effective dimensionality reduction technique, is therefore essential for web document clustering, where it amounts to selecting the features that matter for the underlying clusters. To guide the web document clustering process, we propose a hybrid feature selection algorithm that simultaneously selects the most statistically and semantically informative features through a weighting model. The clustering process selects relevant features and clusters the documents iteratively until it stabilizes. The experimental results demonstrate the practicality of our algorithm and show that it produces more efficient clustering than other existing algorithms. © 2014 IEEE.
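
The loop described in the abstract (weight features, keep the best ones, recluster, repeat until stable) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the statistical measure, the semantic measure, or the weighting model, so everything below is an assumption made for illustration. The statistical score is taken to be the between-cluster variance of each feature, the semantic score is a user-supplied list, the two are mixed linearly with a hypothetical parameter `alpha`, the clusterer is plain k-means, and "stability" means the selected feature set no longer changes.

```python
from typing import List, Sequence, Tuple


def kmeans(rows: Sequence[Sequence[float]], k: int, max_iter: int = 20) -> List[int]:
    """Plain k-means; the first k rows seed the centroids so runs are repeatable."""
    centroids = [list(r) for r in rows[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def statistical_scores(docs: Sequence[Sequence[float]], labels: List[int], k: int) -> List[float]:
    """Assumed statistical score: variance of per-cluster feature means, scaled to [0, 1]."""
    scores = []
    for j in range(len(docs[0])):
        means = []
        for c in range(k):
            vals = [d[j] for d, lab in zip(docs, labels) if lab == c]
            if vals:
                means.append(sum(vals) / len(vals))
        centre = sum(means) / len(means)
        scores.append(sum((m - centre) ** 2 for m in means) / len(means))
    top = max(scores) or 1.0  # avoid dividing by zero when no feature separates clusters
    return [s / top for s in scores]


def hybrid_select_and_cluster(
    docs: Sequence[Sequence[float]],
    semantic: Sequence[float],  # hypothetical per-feature semantic relevance in [0, 1]
    k: int,
    n_keep: int,
    alpha: float = 0.5,  # hypothetical mixing weight between the two scores
    max_rounds: int = 10,
) -> Tuple[List[int], List[int]]:
    """Alternate clustering and feature selection until the selected set stops changing."""
    selected = list(range(len(docs[0])))  # start from all features
    labels: List[int] = []
    for _ in range(max_rounds):
        reduced = [[d[j] for j in selected] for d in docs]
        labels = kmeans(reduced, k)
        stat = statistical_scores(docs, labels, k)
        weight = [alpha * s + (1 - alpha) * t for s, t in zip(stat, semantic)]
        ranked = sorted(range(len(weight)), key=lambda j: weight[j], reverse=True)
        new_selected = sorted(ranked[:n_keep])
        if new_selected == selected:
            break
        selected = new_selected
    return selected, labels


# Toy term-frequency matrix: terms 0 and 1 separate the two groups, terms 2 and 3 do not.
docs = [
    [3.0, 0.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 1.0],
    [2.0, 0.0, 1.0, 1.0],
    [0.0, 2.0, 1.0, 1.0],
]
semantic = [0.9, 0.8, 0.1, 0.6]  # made-up values, for illustration only
selected, labels = hybrid_select_and_cluster(docs, semantic, k=2, n_keep=2)
```

On this toy input the loop keeps terms 0 and 1 and stops in the second round, when the selection repeats. Term 3 has a fairly high made-up semantic score but no discriminating power across clusters, so the combined weight ranks it below the two informative terms.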