Research framework

Big data analytics and smart machine learning

This research deals with big data that can support the detection and discovery of knowledge that characterizes an interdisciplinary big data environment. Current scientific research is highly interdisciplinary and data intensive, but the machine learning models currently developed for big data analytics are not intended for interdisciplinary applications, and they are disadvantaged by two major problems. The first is the lack of approaches for estimating the hyperparameters of machine learning models. The second is the difficulty of discovering transformative knowledge that is suitable for developing smart machine learning (machine learning that is automatic, adaptive, and cognitive) to characterize big data. I contributed to all of these studies as the primary investigator. One finding is that signal processing approaches may be deployed to develop computational models that estimate the hyperparameters of machine learning models. Another finding is that an adaptive elliptical model may be developed to detect transformative knowledge and characterize data sources in multidisciplinary settings.


  • Shan Suthaharan. "Deep Learning Models." In Machine Learning Models and Algorithms for Big Data Classification, pp. 289-307. Springer, Boston, MA, 2016.

  • Shan Suthaharan. 2016. "A Cognitive Random Forest: An intra- and inter-cognitive computing for big data classification under cune-condition," Eds. V. Raghavan, V. Gudivada, V. Govindaraju, and C. R. Rao, Cognitive Computing: Theory and Applications, vol. 35, pp. 207-227, Elsevier.

  • Shan Suthaharan, Weining Shen, Elliptical modeling and pattern analysis for perturbation models and classification. Int. J. Data Sci. Anal. 7(2): 103-113 (2019).

  • Shan Suthaharan, Big data analytics: Machine learning and Bayesian learning perspectives - What is done? What is not? Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9(1) (2019).


Big data analytics and machine learning

This research focuses on big data that can support the detection and discovery of knowledge that characterizes a big data environment (i.e., an environment monitored using different data-capture techniques and technologies). The focus is on three areas: defining a new metric space for big data, building a network model for big data processing, and exploring machine learning (ML) techniques using the new metric space and network model. The current definition of big data on a 3D metric space, V3, formed by three parameters (volume, variety, and velocity), cannot provide a suitable platform for the early detection of big data characteristics. Therefore, an in-depth analysis is required. To alleviate this problem, I recently proposed a new metric space, C3, which is defined by three new parameters: cardinality, continuity, and complexity. This definition has been published in the ACM SIGMETRICS Performance Evaluation Review and cited multiple times. Research with big data and the C3 metric space requires a network model that can help investigate ML techniques. To support this effort, I also proposed a hybrid network model that consists of four units: a user interaction and learning system (UILS), a network traffic recording system (NTRS), the Hadoop distributed file system (HDFS), and a cloud computing storage system (CCSS). This network model has also been published in the ACM SIGMETRICS Performance Evaluation Review. I served as the main investigator and contributed to the research activities.


  • V. Jeyakumar, G. Li, and Shan Suthaharan. 2014. Support vector machine classifiers with uncertain knowledge sets via robust convex optimization. Optimization 63(7). pp. 1099-1116.

  • S. Suthaharan. 2014. Big Data Classification: Problems and challenges in network intrusion prediction with machine learning, ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 4, pp. 70-73.

  • K. Kotipalli and S. Suthaharan. 2014. Modeling of class imbalance using an empirical approach with spambase dataset and random forest classification, ACM SIGITE/RIIT 2014, pp. 75-80.

  • S. Suthaharan. Machine learning models and algorithms for big data classification: thinking with examples for effective learning, vol. 36, Springer US, 2015.


Data analytics and network security

This research deals with data whose characteristics differ significantly from those of digital image or video sources. Knowledge discovery from network traffic data to detect and characterize network security (as anomaly detection or network traffic classification) is a distinct and challenging problem because the data are uniquely unstructured, imbalanced, and randomized. In 2010, one of my master's students and I collected wireless sensor network (WSN) datasets at UNCG in collaboration with three researchers from the University of Melbourne (UniMelb), Australia. We have disseminated the datasets for the benefit of the research community via http://www.uncg.edu/cmp/downloads/ (Labeled Wireless Sensor Network Data Repository at UNCG) and http://issnip.unimelb.edu.au/research_program/downloads (ARC Research Network at UniMelb - ISSNIP) in Australia. We have also published the theory and methods that we used to collect the WSN data, along with the associated anomaly detection approaches, in two peer-reviewed conference papers. These papers have been cited multiple times, and the datasets have been used by researchers in this field. In this collaborative research, I served as the main investigator and contributed to the research accordingly.


  • S. Suthaharan. 2007. Reduction of queue oscillation in the next generation Internet routers. Computer Communications, Elsevier, vol. 30, no. 18. pp. 3881-3891.

  • S. Suthaharan, M. Alzahrani, S. Rajasegarar, C. Leckie, and M. Palaniswami. 2010. Labelled data collection for anomaly detection in wireless sensor networks. In Proceedings of the 6th International Conference on Intelligent Sensors, Sensor Networks and Information Processing. pp. 269-274.

  • S. Suthaharan, C. Leckie, M. Moshtaghi, S. Karunasekara, and S. Rajasegarar. 2010. Sensor data boundary estimation for anomaly detection in wireless sensor networks. IEEE 7th International Conference on Mobile Ad-hoc and Sensor Systems. pp. 546-551.

  • S. Suthaharan, L. Sunkara, and S. Keshapagu. 2013. Lamé curve-based signature discovery learning technique for network traffic classification. Workshop on Signature Discovery for Intelligence and Security. IEEE International Conference on Intelligence and Security Informatics. pp. 321-326.


Data analytics and image/video quality

This research studies digital image sequences and digital video data, with visual phenomenon detection as its main focus. Computational models (descriptive models) that measure the quality of digital video by quantifying the visual relationships between the edge objects in a scene have been extensively studied. The outcome of this research was the development of the perceptually-significant block impairment metric (PSBIM), which was published in the IEE Electronics Letters. This work was later evaluated by researchers from the AT&T Research Labs, the University of California San Diego, and Dolby Laboratories, who highlighted its strengths and weaknesses. They evaluated 13 quality metrics using six expectation criteria (A-F) and published their findings in the IEEE Transactions on Image Processing. They presented the advantages of PSBIM and ranked it high. Some of the phrases they used to describe the contributions of PSBIM were: “In terms of individual metrics, we found that GBIM, PSBIM, DCT-Step, and BAM performed the best,” “Expectation A was satisfied by GBIM, PSBIM, and MCEAM,” and “Expectation F was satisfied only by one metric: PSBIM.” This peer evaluation highlights the benefits of this research. I contributed to this research as the primary investigator.


  • S. Suthaharan. 2000. Image and edge detail detection algorithm for object based coding. Pattern Recognition Letters, vol. 21, no. 6-7. pp. 549-557.

  • S. Suthaharan. 2003. A perceptual quality metric for digital video coding. IEE Electronics Letters, vol. 39 no. 5. pp. 431-433.

  • A. Reibman and S. Suthaharan. 2008. A no-reference spatial aliasing measure for digital image resizing. In Proceedings of the IEEE International Conference on Image Processing. pp. 1184-1187.

  • S. Suthaharan. 2009. No-reference visually significant blocking artifact metric for natural scene images. Signal Processing 89. pp. 1647-1652.


Data analytics and image security

Digital images are heavily used in digital transactions in many applications, including health sciences, biological sciences, and geographical information sciences. Analyzing such images to discover knowledge that is suitable for developing computational models to protect the ownership of these materials is very important. The main focus of this research is discovering the pattern in the amount of alteration that a data source can make to the digital images it produces through purpose-oriented modifications (e.g., compression, perturbation, and randomization). In this research, computational models (descriptive models) have been studied with a focus on the protection of digital images and tamper detection: various aspects of watermarking techniques were explored, and fragile watermarking techniques were developed for digital images. The theory, results, and findings were published in international conferences and journals, and these publications have been cited by peers. I contributed to this research as the primary investigator. Additionally, as a byproduct of this watermarking research, a system of encryption and key management techniques was developed and patented (in Australia, Japan, and Singapore), and it led to commercialization activities.


  • S. W. Kim, S. Suthaharan, H. K. Lee, and K. R. Rao. 1999. Image watermarking scheme using visual model and BN distribution. IEE Electronics Letters, vol. 35, no. 3. pp. 212-214.

  • S. Suthaharan, S. W. Kim, H. K. Lee, and K. R. Rao. 2000. Perceptually tuned robust watermarking scheme for digital images. Pattern Recognition Letters, vol. 21, no. 2. pp. 145-149.

  • S. Suthaharan. 2004. Fragile image watermarking using a gradient image for improved localization and security. Pattern Recognition Letters 25. pp. 1893-1903.

  • S. Suthaharan. 2010. Logistic map-based fragile watermarking for pixel-level tamper detection and resistance. EURASIP Journal on Information Security. DOI:10.1155/2010/829516. EURASIP JIS/829516, vol. 2010. 7 pages.