Partially labeled data

Multi-Class Semi-Supervised Kernel Spectral Clustering (MSS-KSC) [IEEE TNNLS, 2015] 

In many contexts, learning from a few labeled and a large amount of unlabeled data points is highly desirable. Here, the MSS-KSC approach is proposed. It is based on an optimization problem formulated in the primal-dual setting, where the available labeled data points are incorporated into the core model via a regularization term. The core model is Kernel Spectral Clustering (KSC), a completely unsupervised algorithm. MSS-KSC is a kernel-based model with the out-of-sample extension property. It addresses both multi-class semi-supervised classification and semi-supervised clustering. It uses a low embedding dimension to reveal the existing number of clusters. Thanks to a proper model selection scheme, it can detect hidden micro-clusters. The method has been successfully applied in several areas, including classification, clustering, image segmentation and community detection.
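To make the core model concrete, here is a minimal sketch of plain, unsupervised KSC in its dual form: the eigenproblem D^{-1} M_D Ω α = λ α with the weighted centering matrix M_D, followed by the out-of-sample extension e(x) = Σ_i α_i K(x_i, x) + b and a sign-based codebook. It does not include the MSS-KSC label regularization term. The RBF kernel, the bandwidth `sigma2`, the toy data and all helper names are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, sigma2):
    """K[i, j] = exp(-||x_i - y_j||^2 / sigma2)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def ksc_train(X, k, sigma2):
    """Unsupervised KSC: solve D^{-1} M_D Omega alpha = lambda alpha for k - 1 vectors."""
    N = X.shape[0]
    Omega = rbf_kernel(X, X, sigma2)
    d_inv = 1.0 / Omega.sum(axis=1)              # diagonal of D^{-1} (inverse degrees)
    # Weighted centering matrix M_D = I - 1 1^T D^{-1} / (1^T D^{-1} 1)
    M_D = np.eye(N) - np.outer(np.ones(N), d_inv) / d_inv.sum()
    A = d_inv[:, None] * (M_D @ Omega)           # D^{-1} M_D Omega
    eigvals, eigvecs = np.linalg.eig(A)
    top = np.argsort(-eigvals.real)[: k - 1]     # k - 1 leading eigenvectors
    alpha = eigvecs[:, top].real
    # Bias terms b = -(1^T D^{-1} Omega alpha) / (1^T D^{-1} 1)
    b = -(d_inv @ Omega @ alpha) / d_inv.sum()
    return alpha, b

def ksc_scores(X_train, alpha, b, X_new, sigma2):
    """Out-of-sample extension: e(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

def ksc_codebook(e_train, k):
    """The k most frequent sign patterns of the training scores form the codebook."""
    codes, counts = np.unique(np.sign(e_train), axis=0, return_counts=True)
    return codes[np.argsort(-counts)[:k]]

def ksc_assign(e, codebook):
    """Assign each point to the codeword at minimum Hamming distance."""
    dist = (np.sign(e)[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
alpha, b = ksc_train(X, k=2, sigma2=2.0)
e_train = ksc_scores(X, alpha, b, X, sigma2=2.0)
labels = ksc_assign(e_train, ksc_codebook(e_train, k=2))
```

The same `ksc_scores` call applied to unseen points is what the out-of-sample extension property refers to: new points are embedded with the trained α and b, without retraining.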

 [PDF]

Given a few user-labeled data points, the initial model is learned; the class memberships of the remaining data points at the current and subsequent time instants are then estimated and propagated in an online fashion. Furthermore, the tracking capabilities of the Kalman filter are used to provide the labels of objects in motion, thus regularizing the solution obtained by the MSS-KSC algorithm. [PDF][datasets]
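The paragraph does not state the motion model, so the following is only a sketch under an assumed constant-velocity model: a standard Kalman filter tracks the 2-D centroid of a moving object, and its one-step-ahead prediction is the kind of position estimate that could be turned into labels for the object in the next frame. How those labels enter the MSS-KSC regularization term is not shown; the noise levels `q` and `r`, the initial covariance and all function names are assumptions.

```python
import numpy as np

def constant_velocity_model(dt=1.0, q=1e-2, r=1e-1):
    """State [x, y, vx, vy]; only the position (x, y) is measured (assumed model)."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    Q = q * np.eye(4)   # process noise covariance
    R = r * np.eye(2)   # measurement noise covariance
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle given the measured centroid z."""
    x = F @ x                               # predict state
    P = F @ P @ F.T + Q                     # predict covariance
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)                 # correct with the measurement
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

F, H, Q, R = constant_velocity_model()
x, P = np.zeros(4), 10.0 * np.eye(4)
for t in range(1, 31):                      # object moving with velocity (1, 0.5)
    z = np.array([1.0 * t, 0.5 * t])
    x, P = kalman_step(x, P, z, F, H, Q, R)
next_position = H @ F @ x                   # predicted centroid for the next frame
```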


Multi-label semi-supervised learning using regularized kernel spectral clustering [IJCNN 2016][PDF][Code]

Often in real-world applications such as web page categorization, automatic image annotation and protein function prediction, each instance is associated with multiple labels (categories) simultaneously. In addition, due to the labeling cost, one usually deals with a large amount of unlabeled data, while the fraction of labeled data points is typically small. In this paper, we propose a multi-label semi-supervised kernel spectral clustering learning algorithm that learns from both labeled and unlabeled instances. The kernel spectral clustering algorithm (KSC) serves as the core model, and the information of the labeled data points is integrated into the model via regularization terms. The propagation of the multiple labels to unlabeled data points is achieved by incorporating the mutual correlation between (similarity across) labels, as well as by encouraging the model output to be as close as possible to the given ground truth of the labeled data points. Thanks to the Nyström approximation method, an explicit feature map is constructed and the optimization problem is solved in the primal. Experimental results demonstrate the effectiveness of the proposed approach on real multi-label datasets.
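The Nyström step can be sketched as follows, assuming an RBF kernel and landmark points drawn as a random subset of the data (the landmark selection used in the paper is not stated in this summary). With K_mm = U Λ U^T, the explicit map is Φ = K_nm U Λ^{-1/2}, so that Φ Φ^T = K_nm K_mm^{-1} K_mn approximates the full kernel matrix while only N × m kernel entries are computed. The multi-label correlation terms are not shown, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, sigma2):
    """K[i, j] = exp(-||x_i - y_j||^2 / sigma2)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def nystrom_features(X, landmarks, sigma2, eps=1e-10):
    """Explicit feature map Phi = K_nm U Lambda^{-1/2}, so Phi @ Phi.T ~= K(X, X)."""
    K_mm = rbf_kernel(landmarks, landmarks, sigma2)
    lam, U = np.linalg.eigh(K_mm)                 # K_mm = U diag(lam) U^T
    keep = lam > eps                              # drop numerically zero directions
    K_nm = rbf_kernel(X, landmarks, sigma2)
    return (K_nm @ U[:, keep]) / np.sqrt(lam[keep])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
landmarks = X[rng.choice(len(X), size=50, replace=False)]   # m = 50 landmarks
Phi = nystrom_features(X, landmarks, sigma2=2.0)             # shape (500, <= 50)
```

Because `Phi` has at most m columns, a primal problem written in terms of `Phi` has a size governed by the number of landmarks rather than by N.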

In many real-life applications, one encounters a huge amount of unlabeled data points, whereas the labeled portion is small. Two approaches have been proposed to make the Multi-class Semi-Supervised Kernel Spectral Clustering (MSSKSC) model scalable.

Non-parallel semi-supervised classification based on kernel spectral clustering [IEEE-IJCNN, 2013][PDF]

A non-parallel semi-supervised algorithm based on kernel spectral clustering is formulated. The prior knowledge about the labels is incorporated into the kernel spectral clustering formulation by adding regularization terms. The proposed method generates two non-parallel hyperplanes, which are then used for the out-of-sample extension. Thanks to a proper decision function, the method can learn complex structure using a linear kernel and the available labeled data points. The left figure shows the result of a purely unsupervised algorithm (KSC) when the RBF kernel is used. The figure on the right shows the result of the proposed semi-supervised method with a linear kernel, when the labeled data points are incorporated into the core model (KSC).
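The summary does not give the decision function itself. A common rule for classifiers built on two non-parallel hyperplanes, used for instance by twin SVMs, assigns a point to the class whose hyperplane is nearest; the sketch below assumes that rule and uses two hypothetical hyperplanes. It also shows why two linear hyperplanes can produce a non-linear partition: the class boundary is the set of points equidistant from both hyperplanes, here the two diagonals |x| = |y|, not a single line.

```python
import numpy as np

def nonparallel_predict(X, w1, b1, w2, b2):
    """Assign each row of X to the class (1 or 2) whose hyperplane is nearer (assumed rule)."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)   # distance to hyperplane 1
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)   # distance to hyperplane 2
    return np.where(d1 <= d2, 1, 2)

# Two hypothetical hyperplanes: class 1 lies along y = 0, class 2 along x = 0.
w1, b1 = np.array([0.0, 1.0]), 0.0
w2, b2 = np.array([1.0, 0.0]), 0.0
X = np.array([[5.0, 0.1], [-4.0, -0.3], [0.2, 4.0], [-0.1, -3.0]])
pred = nonparallel_predict(X, w1, b1, w2, b2)
```

Each class here occupies two opposite wedges, a configuration no single separating line can reproduce.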

Scalable semi-supervised kernel spectral learning using random Fourier features [IEEE SSCI, 2016][PDF]

We live in the era of big data, with dataset sizes growing steadily over the past decades. In addition, obtaining expert labels for all the instances is time-consuming and in many cases may not even be possible. This necessitates the development of advanced semi-supervised models that can learn from both labeled and unlabeled data points and also scale at worst linearly with the number of examples. In the context of kernel-based semi-supervised models, constructing the training kernel matrix for a large training dataset is expensive and memory-inefficient. This paper investigates the scalability of the recently proposed multi-class semi-supervised kernel spectral clustering model (MSSKSC) by means of random Fourier features. The proposed model maps the input data into an explicit low-dimensional feature space. Thanks to the explicit feature maps, one can then solve the MSSKSC optimization formulation in the primal, making the complexity of the method linear in the number of training data points. The performance of the proposed model is compared with that of recently introduced reduced kernel techniques and Nyström-based MSSKSC approaches. Experimental results on large-scale real-life datasets demonstrate the scalability, efficiency and faster training times of the proposed model compared with conventional large-scale semi-supervised models.
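A minimal sketch of the two ingredients, under stated assumptions. First, random Fourier features for the RBF kernel exp(-||x - y||²/σ²): frequencies are drawn from N(0, (2/σ²) I) and phases from U[0, 2π], so that z(x)^T z(y) approximates the kernel. Second, a primal solve for an objective assumed here to have the form ½ Σ_l w_l^T w_l − (γ1/2) Σ_l e_l^T V e_l + (γ2/2) Σ_l (e_l − c_l)^T A (e_l − c_l) with e_l = Φ w_l + b_l 1, V = D^{-1} the inverse degree matrix and A a diagonal indicator of labeled points. Setting the gradient to zero (derived here, not quoted from the paper) gives (P − γ1 Z^T V Z + γ2 Z^T A Z) θ = γ2 Z^T A C with Z = [Φ, 1], θ = [w; b] and P = diag(I, 0); every product is linear in N. The one-vs-all ±1 codes, argmax decoding, parameter values and function names are assumptions.

```python
import numpy as np

def sample_rff(d_in, D, sigma2, rng):
    """Spectral samples for k(x, y) = exp(-||x - y||^2 / sigma2)."""
    W = rng.normal(scale=np.sqrt(2.0 / sigma2), size=(d_in, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

def rff_features(X, W, b):
    """z(x) = sqrt(2 / D) cos(W^T x + b), so z(x) @ z(y) ~= k(x, y)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def mss_ksc_primal(Phi, C, labeled, gamma1, gamma2):
    """Solve the stationarity conditions of the assumed primal objective.

    Phi: (N, D) features; C: (N, Q) codes (unlabeled rows ignored); labeled: (N,) bool.
    """
    N, D = Phi.shape
    deg = Phi @ (Phi.T @ np.ones(N))        # degrees of Omega ~= Phi Phi^T, O(N D)
    v = 1.0 / deg                           # diagonal of V = D^{-1}
    a = labeled.astype(float)               # diagonal of A
    Z = np.hstack([Phi, np.ones((N, 1))])   # Z = [Phi, 1], theta = [w; b]
    P = np.eye(D + 1)
    P[D, D] = 0.0                           # the bias is not regularized
    M = P - gamma1 * Z.T @ (v[:, None] * Z) + gamma2 * Z.T @ (a[:, None] * Z)
    theta = np.linalg.solve(M, gamma2 * Z.T @ (a[:, None] * C))
    return theta[:D], theta[D]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(6.0, 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
labeled = np.zeros(100, dtype=bool)
labeled[[0, 1, 50, 51]] = True              # two labels per class
C = np.where(np.eye(2)[y] == 1, 1.0, -1.0)  # one-vs-all codes (+1 / -1)
Wf, bf = sample_rff(2, 500, sigma2=4.0, rng=rng)
Phi = rff_features(X, Wf, bf)
w, b = mss_ksc_primal(Phi, C, labeled, gamma1=1e-3, gamma2=1.0)
pred = np.argmax(Phi @ w + b, axis=1)
```

The linear system has size (D + 1) × (D + 1), set by the number of random features; N enters only through matrix products.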

A Real-time PCB Defect Detector Based on Supervised and Semi-supervised Learning [ESANN 2020]

This paper designs a deep model to detect PCB defects from an input pair of a defect-free template and a defective tested image. A novel group pyramid pooling module is proposed to efficiently extract features at various resolutions in order to predict defects at different scales. To train the deep model, a dataset covering 6 common types of PCB defects, named DeepPCB, is established; it contains 1,500 image pairs with annotations. In addition, a semi-supervised learning scheme is examined to effectively utilize unlabeled images for training the PCB defect detector. Experimental results validate the effectiveness and efficiency of the proposed model, which achieves 98.6% mAP at 62 FPS on the DeepPCB dataset. DeepPCB is now available at: https://github.com/tangsanli5201/DeepPCB. [PDF]

This paper introduces a methodology to incorporate label information in discovering the underlying clusters in a hierarchical setting using a multi-class semi-supervised clustering algorithm. The method aims at revealing the relationship between clusters given a few labels associated with some of the clusters. The problem is formulated as a regularized kernel spectral clustering algorithm in the primal-dual setting. The available labels are incorporated at different levels of the hierarchy, from top to bottom. As we advance towards the lower levels of the tree, all the previously added labels are used in generating the new levels of the hierarchy. The model is trained on a subset of the data and then applied to the rest of the data in a learning framework. Thanks to the out-of-sample extension property, the previously learned model can then predict the memberships of new points. A combination of an internal clustering quality index and classification accuracy is used for model selection. Experiments are conducted on synthetic data and real image segmentation problems to show the applicability of the proposed approach.
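The model selection criterion can be illustrated with a small sketch. The summary does not say which internal index or weighting is used; the mean silhouette coefficient and an equal-weight combination `eta = 0.5` are assumptions, the toy partitions are hypothetical, and the function names are illustrative. Candidate hyperparameters (for example the kernel bandwidth at a given level of the hierarchy) would each be scored this way and the highest-scoring one kept.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue                        # singleton cluster: s_i = 0 by convention
        a_i = dist[i, same].mean()          # mean distance within own cluster
        b_i = min(dist[i, labels == c].mean()
                  for c in np.unique(labels) if c != labels[i])
        s[i] = (b_i - a_i) / max(a_i, b_i)
    return s.mean()

def selection_score(X, labels, labeled_idx, labeled_y, eta=0.5):
    """Hypothetical criterion: eta * silhouette + (1 - eta) * accuracy on labeled points."""
    acc = np.mean(labels[labeled_idx] == labeled_y)
    return eta * silhouette(X, labels) + (1.0 - eta) * acc

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (15, 2)), rng.normal(5.0, 0.3, (15, 2))])
labeled_idx, labeled_y = np.array([0, 15]), np.array([0, 1])
good = np.repeat([0, 1], 15)                # partition matching the two blobs
bad = np.tile([0, 1], 15)                   # partition that mixes both blobs
score_good = selection_score(X, good, labeled_idx, labeled_y)
score_bad = selection_score(X, bad, labeled_idx, labeled_y)
```

In this toy case both partitions agree with the two labeled points, so the internal index is what separates them, which is the motivation for combining the two terms.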