Sponsored Projects
Project 1: Developing a Music-Based Application using EEG Plots of the Human Brain as Input
Publications
Nature is a great source of inspiration for solving complex real-world problems. In this paper, a hybrid nature-inspired algorithm is proposed for the feature selection problem. Real-world datasets typically contain features of all kinds, informative as well as non-informative. Non-informative features not only increase the computational complexity of the underlying algorithm but also deteriorate its performance. Hence, there is an urgent need for feature selection methods that select an informative subset of features from high-dimensional data without compromising the performance of the underlying algorithm. In this paper, we select an informative subset of features and perform cluster analysis using a hybrid of binary particle swarm optimization (BPSO) and the sine cosine algorithm (SCA), named hybrid binary particle swarm optimization and sine cosine algorithm (HBPSOSCA). Here, we employ a V-shaped transfer function to compute the likelihood of a position change for each particle. First, the effectiveness of the proposed method is tested on ten benchmark test functions. Second, HBPSOSCA is applied to the data clustering problem on seven real-life datasets taken from the UCI machine learning repository and the gene expression model selector. The performance of the proposed method is compared against the original BPSO, modified BPSO with chaotic inertia weight (C-BPSO), the binary moth flame optimization algorithm, the binary dragonfly algorithm, the binary whale optimization algorithm, SCA, and the binary artificial bee colony algorithm. The conducted analysis demonstrates that the proposed method, HBPSOSCA, attains better performance than the competitive methods in most cases.
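The core binary-update step described above can be sketched as follows. This is a minimal illustration of a V-shaped transfer function driving bit flips in a binary PSO; the choice of |tanh(v)| as the transfer function, and all names and parameters, are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def v_shaped_transfer(v):
    """V-shaped transfer function: maps a real-valued velocity to a
    bit-flip probability. |tanh(v)| is one common V-shaped choice
    (an assumption here, not necessarily the variant in the paper)."""
    return np.abs(np.tanh(v))

def update_binary_position(position, velocity, rng):
    """Flip each bit with probability given by the transfer function,
    as in standard V-shaped binary PSO position updates."""
    flip_prob = v_shaped_transfer(velocity)
    flips = rng.random(position.shape) < flip_prob
    return np.where(flips, 1 - position, position)

rng = np.random.default_rng(0)
pos = rng.integers(0, 2, size=10)   # binary feature-selection mask
vel = rng.normal(0.0, 1.0, size=10) # real-valued particle velocities
new_pos = update_binary_position(pos, vel, rng)
```

Large velocity magnitudes give flip probabilities near 1, so particles far from consensus change state often, which is what distinguishes V-shaped from S-shaped transfer functions.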
Highlights:
- A feature selection method based on binary particle swarm optimization is presented.
- A fitness-based adaptive inertia weight is integrated with the binary particle swarm optimization to dynamically control the exploration and exploitation of the particles in the search space.
- Opposition and mutation are integrated with the binary particle swarm optimization to improve its search capability.
- The performance of the clustering algorithm improves with the features selected by the proposed method.

Due to the ever-increasing number of documents in digital form, automated text clustering has become a promising method for text analysis in the last few decades. A major issue in text clustering is the high dimensionality of the feature space. Most of these features are irrelevant, redundant, and noisy, and mislead the underlying algorithm. Therefore, feature selection is an essential step in text clustering to reduce the dimensionality of the feature space and to improve the accuracy of the underlying clustering algorithm. In this paper, a hybrid intelligent algorithm, which combines binary particle swarm optimization (BPSO) with opposition-based learning, a chaotic map, a fitness-based dynamic inertia weight, and mutation, is proposed to solve the feature selection problem in text clustering. Here, the fitness-based dynamic inertia weight is integrated with the BPSO to control the movement of the particles based on their current status, and the mutation and chaotic strategies are applied to enhance the global search capability of the algorithm. Moreover, an opposition-based initialization is used to start with a set of promising and well-diversified solutions to achieve a better final solution. In addition, the opposition-based learning method is also used to generate the opposite position of the gbest particle to escape stagnation in the swarm.
To demonstrate the effectiveness of the proposed method, an experimental analysis is conducted on three benchmark text datasets: Reuters-21578, Classic4, and WebKB. The experimental results demonstrate that the proposed method selects a more informative feature set than the competitive methods, as it attains higher clustering accuracy. Moreover, it also improves the convergence speed of the BPSO.
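The opposition-based initialization mentioned above can be sketched as follows. For a binary position, the opposite is simply the bitwise complement; the swarm keeps whichever of each particle and its opposite is fitter. The toy fitness function and all parameter names below are illustrative assumptions.

```python
import numpy as np

def opposite_binary(position):
    """Opposition-based learning for binary vectors: the opposite
    of each bit is its complement (1 - x)."""
    return 1 - position

def opposition_init(n_particles, n_features, fitness, rng):
    """Initialize a swarm, evaluate each particle and its opposite,
    and keep the fitter of the two (higher fitness assumed better)."""
    swarm = rng.integers(0, 2, size=(n_particles, n_features))
    opposites = opposite_binary(swarm)
    keep = fitness(swarm) >= fitness(opposites)
    return np.where(keep[:, None], swarm, opposites)

rng = np.random.default_rng(1)
# Toy fitness for the demo: prefer masks that select fewer features.
fit = lambda s: -s.sum(axis=1)
swarm = opposition_init(5, 8, fit, rng)
```

Evaluating both a random swarm and its complement doubles the initial coverage of the search space at the cost of one extra fitness evaluation per particle.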
Text clustering is widely used to create clusters of digital documents. The selection of cluster centers plays an important role in clustering. In this paper, we use the artificial bee colony (ABC) algorithm to select appropriate cluster centers for creating clusters of text documents. The ABC is a population-based nature-inspired algorithm that simulates the intelligent foraging behavior of real honey bees and has been shown to be effective in solving many search and optimization problems. However, a major drawback of the algorithm is that it provides good exploration of the search space at the cost of exploitation. In this paper, we improve the search equation of the ABC and embed two local search paradigms, namely chaotic local search and gradient search, in the basic ABC to improve its exploitation capability. The proposed algorithm is named chaotic gradient artificial bee colony. Its effectiveness is tested on three benchmark text datasets, namely Reuters-21578, Classic4, and WebKB. The obtained results are compared with the ABC, a recent variant of the ABC namely gbest-guided ABC, a variant of the proposed methodology namely chaotic artificial bee colony, memetic ABC, and the conventional clustering algorithm K-means. The empirical evaluation reveals very encouraging results in terms of solution quality and convergence speed.
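The chaotic local search idea can be sketched as below: a logistic map generates a chaotic, non-repeating sequence of perturbations around a candidate solution, and the best neighbour found replaces it. The search radius, step count, seed value, and toy objective are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def chaotic_local_search(center, radius, n_steps, objective, x0=0.7):
    """Chaotic local search sketch: perturb `center` with values drawn
    from a logistic map and keep the best (lowest-objective) candidate."""
    best, best_val = center, objective(center)
    x = x0
    for _ in range(n_steps):
        x = 4.0 * x * (1.0 - x)                 # logistic map in (0, 1)
        candidate = center + radius * (2.0 * x - 1.0)
        val = objective(candidate)
        if val < best_val:
            best, best_val = candidate, val
    return best

# Toy objective for the demo: distance of a 1-D "center" to 3.0.
best = chaotic_local_search(np.array([2.5]), 1.0, 50,
                            lambda c: abs(c[0] - 3.0))
```

Because the logistic map is dense in (0, 1), the perturbations cover the neighbourhood more evenly than a short run of uniform random samples, which is the usual motivation for chaotic over purely random local search.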
High dimensionality of the feature space is one of the major concerns in text clustering, owing to computational complexity and accuracy considerations. Therefore, various dimension reduction methods have been introduced in the literature to select an informative subset (or sublist) of features. As each dimension reduction method uses a different strategy (aspect) to select a subset of features, different methods yield different feature sublists for the same dataset. Hence, a hybrid approach, which encompasses different aspects of feature relevance for feature subset selection, has received considerable attention. Traditionally, union or intersection is used to merge the feature sublists selected by different methods. The union approach selects all features, which increases the total number of features, while the intersection approach selects only the features common to the considered sublists, which loses some important features. Therefore, to take advantage of one method and lessen the drawbacks of the other, a novel integration approach named modified union is proposed. This approach applies union to the top-ranked features and intersection to the remaining feature sublists. Hence, it ensures the selection of top-ranked as well as common features without greatly increasing the dimensionality of the feature space. In this study, the feature selection methods term variance (TV) and document frequency (DF) are used to compute the features' relevance scores. Next, a feature extraction method, principal component analysis (PCA), is applied to further reduce the dimensionality of the feature space without losing much information. The effectiveness of the proposed method is tested on three benchmark datasets, namely Reuters-21578, Classic4, and WebKB. The obtained results are compared with TV, DF, and variants of the proposed hybrid dimension reduction method. The experimental studies clearly demonstrate that the proposed method improves clustering accuracy compared to the competitive methods.
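The modified-union merge described above can be sketched in a few lines: union over the top-ranked features of each sublist, intersection over the rest. The cut-off `top_k`, the feature ids, and the assumption that sublists arrive sorted best-first are all illustrative.

```python
def modified_union(list_a, list_b, top_k):
    """Modified-union sketch: union of the top_k ranked features from
    each sublist, plus the intersection of the remaining features.
    Sublists are assumed sorted by relevance (best first)."""
    top = set(list_a[:top_k]) | set(list_b[:top_k])
    rest = set(list_a[top_k:]) & set(list_b[top_k:])
    return sorted(top | rest)

# Toy example with hypothetical feature ids ranked by TV and by DF.
tv_ranked = [3, 1, 7, 2, 9, 5]
df_ranked = [1, 4, 7, 9, 2, 8]
selected = modified_union(tv_ranked, df_ranked, top_k=2)
# → [1, 2, 3, 4, 7, 9]
```

Plain union of these sublists would keep all 8 distinct features and plain intersection only 4; the merge keeps 6, retaining every top-ranked feature while still filtering the tail.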
Dimension reduction is a well-known pre-processing step in text clustering that removes irrelevant, redundant, and noisy features without sacrificing the performance of the underlying algorithm. Dimension reduction methods are primarily classified as feature selection (FS) methods and feature extraction (FE) methods. Though FS methods are robust against irrelevant features, they occasionally fail to retain important information present in the original feature space. On the other hand, though FE methods reduce the dimensionality of the feature space without losing much information, they are significantly affected by irrelevant features. The one-stage models (FS or FE methods alone) and the two-stage models (a combination of FS and FE methods) proposed in the literature are not sufficient to fulfil all the above-mentioned requirements of dimension reduction. Therefore, we propose three-stage dimension reduction models to remove irrelevant, redundant, and noisy features from the original feature space without much loss of valuable information. These models incorporate the advantages of the FS and FE methods to create a low-dimensional feature subspace. Experiments on three well-known benchmark text datasets of different characteristics show that the proposed three-stage models significantly improve the performance of the clustering algorithm as measured by micro F-score, macro F-score, and total execution time.
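A three-stage FS+FS+FE pipeline of the kind described above can be sketched as follows. The specific stage order (term variance, then document frequency, then PCA) and all keep-counts are illustrative assumptions; the paper's models may combine different methods.

```python
import numpy as np

def three_stage_reduce(X, var_keep, df_keep, n_components):
    """Three-stage dimension reduction sketch on a document-term
    matrix X: two FS stages followed by a PCA-style FE stage."""
    # Stage 1 (FS): keep the var_keep highest-variance terms.
    idx1 = np.argsort(X.var(axis=0))[::-1][:var_keep]
    X1 = X[:, idx1]
    # Stage 2 (FS): keep the df_keep terms with highest document frequency.
    df = (X1 > 0).sum(axis=0)
    idx2 = np.argsort(df)[::-1][:df_keep]
    X2 = X1[:, idx2]
    # Stage 3 (FE): PCA via SVD of the mean-centered matrix.
    Xc = X2 - X2.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(2)
X = rng.random((20, 30))          # toy 20-document, 30-term matrix
X_low = three_stage_reduce(X, var_keep=15, df_keep=10, n_components=5)
```

Running the cheap FS stages first means the comparatively expensive SVD in stage 3 operates on a much smaller matrix, which is the main practical argument for this ordering.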