My most significant research has been in the fields of Complex Networks and Machine Learning/Data Mining.
## A Graph-based Topic Extraction Method enabling Simple Interactive Customization

At the Machine Learning and Networked Information Spaces (MALNIS) Laboratory, Dalhousie University, Canada.

It is often desirable to identify the concepts that are present in a corpus. A popular way to do so is to discover clusters of words, or topics, for which many algorithms exist in the literature. Yet most of these methods lack the interpretability that would enable interaction with a user not familiar with their inner workings. The paper proposes a graph-based topic extraction algorithm, which can also be viewed as a soft clustering of the words present in a given corpus. Each topic, in the form of a set of words, represents an underlying concept in the corpus. The method allows easy interpretation of the clustering process, and hence allows user involvement at various steps. For a quantitative evaluation of the extracted topics, we use them as features to obtain a compact representation of documents for classification tasks. We compare the classification accuracy achieved with the reduced feature set obtained by our method against that of other topic extraction techniques, namely Latent Dirichlet Allocation and Non-negative Matrix Factorization. While the results from all three algorithms are comparable, the speed and easy interpretability of our algorithm make it more appropriate for interactive use by lay users. The user interface for the algorithm can be found at:

This work has been accepted at DocEng 2013.

A. Srivastava, A. J. Soto, E. Milios, "A Graph-based Topic Extraction Method enabling Simple Interactive Customization", 13th ACM Symposium on Document Engineering (DocEng 2013). [Accepted]

## Evaluation of Topic Extraction Algorithms

At the Machine Learning and Networked Information Spaces (MALNIS) Laboratory, Dalhousie University, Canada.

There are various word clustering/topic extraction algorithms in the literature.
Few techniques exist for evaluating them, however, and those are sometimes specific to the algorithm used. Labels are usually associated with documents rather than words, which makes supervised evaluation even more difficult. With these problems in mind, we propose five different evaluation techniques that use the available document labels to evaluate the quality of topics. We present the evaluation of LDA, NMF and Fuzzy c-Means using these metrics. The work will soon be submitted to a journal.

## Text Clustering

At the Machine Learning and Networked Information Spaces (MALNIS) Laboratory, Dalhousie University, Canada.

Many real-life networks have an underlying bipartite structure based on which the similarity between two nodes or data instances can be defined. For example, in a document corpus, the similarity between a pair of documents can be assumed to arise from the words that co-occur in them, and this document-word co-occurrence relationship can be modeled as a bipartite graph. A document similarity graph can be obtained by taking a one-mode projection of the bipartite graph, which is a popular technique for studying networks that arise from a bipartite structure. A graph-based clustering algorithm can then be applied to this projection graph to obtain clusters of documents. In this paper we study the use of the one-mode projection of the document-word bipartite graph and the subsequent application of a modularity optimization algorithm to cluster the documents. In particular, we propose an alternative and faster algorithm, which works in two steps: first, finding the documents that are easy to cluster, and then assigning the remaining documents to existing or new clusters. We show that the algorithms based on one-mode projections perform significantly better than traditional clustering approaches.
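
The one-mode projection described above can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's implementation: the function name `one_mode_projection`, the toy corpus, and the choice to weight each document pair by its number of shared words are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(doc_words):
    """Project a document-word bipartite graph onto the document side.

    doc_words maps each document id to the set of words it contains.
    Returns a dict mapping (doc_a, doc_b) pairs to the number of shared
    words (an assumed weighting; other weightings are possible).
    """
    # Invert the bipartite graph: word -> documents that contain it.
    word_docs = defaultdict(set)
    for doc, words in doc_words.items():
        for word in words:
            word_docs[word].add(doc)

    # Each word shared by two documents adds one unit of edge weight.
    weights = defaultdict(int)
    for docs in word_docs.values():
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy corpus: d1 and d2 share two words, d3 shares none.
corpus = {
    "d1": {"graph", "cluster", "topic"},
    "d2": {"graph", "cluster", "network"},
    "d3": {"kernel", "filter"},
}
projection = one_mode_projection(corpus)
print(projection)
```
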
In addition, our method has similar or better clustering performance than the most popular algorithm for modularity optimization, while also running four times faster. The work has been accepted at SAC 2013. As future work, we wish to use similar ideas for topic modelling and document classification.

More details in: A. Srivastava, A. J. Soto, E. Milios, "Text Clustering using One-Mode Projection of Document-Word Bipartite Graphs", 28th Symposium On Applied Computing, Portugal (2013). [url: http://dl.acm.org/citation.cfm?id=2480539]

## Efficient Application of Gabor Filters with Nonlinear Support Vector Machines

At the Central Electronics Engineering Research Institute (CEERI), Pilani, India.

Both Gabor filters and Support Vector Machines (SVMs) are widely used in computer vision tasks, for feature extraction and classification respectively. However, this combination is usually plagued by high computational complexity and memory usage, owing to the high dimensionality of the Gabor filter responses. Methods have been proposed to mitigate this problem by truncating the responses or computing a gist of them, but such approaches also lead to loss of information. Ashraf et al. reinterpreted the whole method and proposed a way to eliminate the need for such approximations, but they analyzed only the linear SVM. We extend their work and provide an analysis for nonlinear kernels within the same framework. The class of nonlinear kernels compatible with this framework is derived, and experimental results on the facial expression recognition task are reported.

The work has been accepted for publication in the IEEE Computer Society Digital Library (CSDL): A. Srivastava, P. Mohapatra, A. S. Mandal, "Efficient Application of Gabor Filters with Non-linear Support Vector Machines", Proceedings of the 2012 International Conference on Computing Sciences, pp. 7-10.
ISBN 978-0-7695-4817-3/12 IEEE, DOI 10.1109/ICCS.2012.31 [url: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6391637&isnumber=6391635]

## Analytical Model of the Twitter Network
## Degree Distribution of One-mode-Projection Networks

Many real-world systems are modeled as evolving bipartite networks and their one-mode projections. In particular, Discrete Combinatorial Systems (DCSs), which consist of a finite set of elementary units and different combinations of these units, can be modeled by a subclass of bipartite networks known as Alphabetic Bipartite Networks or alpha-BiNs, where the bottom partite set contains a fixed number of nodes (the elementary units) and the top partite set grows unboundedly with time through the addition of nodes (the combinations). Using an exact correspondence between the preferential growth of alpha-BiNs and the Polya urn scheme, we analytically solve the model to compute exact degree distributions of the bottom (fixed) set and the bottom projection. To the best of our knowledge, this is the first work that proposes and solves such a generalized growth model for alpha-BiNs.

More details can be found in:

N. Ganguly, S. Ghosh, T. Krueger, A. Srivastava, "Degree distributions of evolving alphabetic bipartite networks and their projections", Theoretical Computer Science, Available online 21 August 2012, ISSN 0304-3975, 10.1016/j.tcs.2012.08.007.

S. Ghosh, S. Saha, A. Srivastava, N. Ganguly, A. Mukherjee, "Understanding Evolution of Inter-Group Relationships using Bipartite Networks", IEEE Journal on Selected Areas in Communications - 2012 Special Issue on Emerging Technologies in Communications (2012 JSAC-SI-ET) [url: http://dx.doi.org/10.1109/JSAC.2013.SUP.0513051]
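
To illustrate the kind of preferential growth discussed in this section, the following Python sketch simulates a toy alpha-BiN and records the degrees of the fixed bottom set. It is a simulation under stated assumptions, not the analytical solution from the paper: the function name `grow_alpha_bin`, the parameters `mu` (edges per new top node) and `gamma` (strength of preference), and the attachment weight `gamma * k + 1` are all choices made for this example.

```python
import random
from collections import Counter


def grow_alpha_bin(n_bottom, n_steps, mu=1, gamma=1.0, seed=0):
    """Simulate preferential growth of a toy alpha-BiN.

    The bottom set has n_bottom fixed nodes. At each step one new top node
    arrives and attaches mu edges to bottom nodes, each chosen with
    probability proportional to gamma * degree + 1 (an assumed kernel).
    Returns the list of bottom-node degrees after n_steps arrivals.
    """
    rng = random.Random(seed)
    degree = [0] * n_bottom
    for _ in range(n_steps):
        # Weights are fixed for the whole step, so the mu edges of one
        # top node are drawn independently and may repeat a bottom node.
        weights = [gamma * k + 1.0 for k in degree]
        targets = rng.choices(range(n_bottom), weights=weights, k=mu)
        for node in targets:
            degree[node] += 1
    return degree


degrees = grow_alpha_bin(n_bottom=20, n_steps=1000, mu=2)
# Empirical degree distribution of the bottom set: degree -> node count.
histogram = Counter(degrees)
for k in sorted(histogram):
    print(k, histogram[k])
```

With `mu=1`, each step resembles a single draw from an urn whose colours are the bottom nodes, which is the intuition behind the Polya urn correspondence mentioned above. Averaging the histogram over many seeds gives an empirical curve that could be compared against an analytical degree distribution.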