Event Understanding through Multimodal Social Stream Interpretation (EUMSSI)
The main objective of EUMSSI is to develop technologies for identifying and aggregating data presented as unstructured information in sources of very different nature (video, image, audio, speech, text and social context), including both online media (e.g., YouTube) and traditional media (e.g., audiovisual repositories), and for dealing with information at very different degrees of granularity. The multimodal analytics will help organize, classify and cluster cross-media streams by enriching their associated metadata. A core idea is that content from different media sources is integrated interactively, so that the data resulting from one medium helps reinforce the aggregation of information from other media, in a cross-modal interoperable semantic representation framework. This will be accomplished by integrating state-of-the-art information extraction and analysis techniques from the different fields involved into a multimodal platform. Interoperability, interactive reinforcement of the data aggregation, and a high-level semantic, conceptual and eventive representation will distinguish this proposal from others that incorporate multimodal search. The resulting platform will be potentially useful for any application in need of cross-media data analysis and interpretation, such as intelligent content management systems, personalized recommendation, real-time event tracking and content filtering.
Website: https://www.eumssi.eu/
Our online demo is available at: http://demo.eumssi.eu/demo/
In its final review, the European Commission rated the EUMSSI project as excellent. The project's result is an advanced multimodal platform for automatic metadata enrichment.
Learning efficient representation using unsupervised networks
This project consists of two major works. In the first, we propose a paradigm for composing many learning networks to learn efficient features. We also improve the way these features are incorporated across spatial regions using convolutional pooling. The architecture is verified on two standard datasets, CIFAR-10 and STL-10. Experimental results show that spatial boosting networks with a careful choice of parameters help increase accuracy over standard approaches. Even with an algorithm as simple as K-Means, we still achieve results competitive with methods in the literature. Moreover, since our framework is independent of the choice of unsupervised learning algorithm, the final results can be boosted further using more powerful algorithms such as sparse coding or sparse autoencoder … These improvements can also be used in conjunction with other systems to further enhance the performance of deep networks for unsupervised feature learning.
While expanding pooling regions using autoencoders, we observed that the final representation is potentially noisy because more feature vectors participate in each pooling unit. For K-Means, feature vectors are absolutely sparse. Sparse autoencoders, by contrast, do not create genuine “sparse vectors”, in which most values are zero. Hence, to make autoencoders compatible with convolutional pooling, we must enforce a further sparsity constraint on the activation function. Therefore, in our second work, we propose inhibitory pseudo-interconnections in neural networks to form an inhibitory sparse autoencoder. These interconnections can further constrain the sparse activations in the feature vectors.
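The interaction between sparse K-Means codes and pooling can be illustrated with a minimal sketch. It assumes that "absolutely sparse" means a hard-assignment (one-hot) code and that pooling sums the codes in a region; both are assumptions made for illustration, and the centroids and patches are made-up toy values, not data from the project.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_hard(patch: Sequence[float],
                centroids: Sequence[Sequence[float]]) -> List[float]:
    """One-hot code: 1 at the nearest centroid, 0 everywhere else (assumed encoding)."""
    distances = [squared_distance(patch, c) for c in centroids]
    nearest = distances.index(min(distances))
    return [1.0 if k == nearest else 0.0 for k in range(len(centroids))]


def sum_pool(codes: Sequence[Sequence[float]]) -> List[float]:
    """Sum the codes of all patches that fall into one pooling region (assumed operator)."""
    return [sum(column) for column in zip(*codes)]


# Toy values: three centroids and four patches from one pooling region.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
region_patches = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1], [0.1, 0.9]]

codes = [encode_hard(p, centroids) for p in region_patches]
pooled = sum_pool(codes)
print(pooled)
```

Because each code has exactly one non-zero entry, the pooled vector is a histogram of centroid assignments, so enlarging the region adds counts rather than dense noise. A dense autoencoder activation would not have this property, which is the motivation for the extra sparsity constraint described above.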
This project is also my master’s thesis, supervised by Dr. Tran Minh Triet. I was the principal investigator and the main contributor. We received a grant from Vietnam National University. The project resulted in two publications, at ICMLA and ICMV, in 2013.
Link-based Document Classification
In this project, I was supervised by Dr. Tran Thai Son and Dr. Tran Minh Triet.
Document classification is a well-known problem in machine learning with many real-life applications. Many classical methods, such as SVMs, Bayesian classifiers and neural networks, have been applied to the content of documents. However, with the rise of the internet, documents no longer exist as independent instances. They are linked with each other to form richly structured data sets. Hence, there is a need to classify documents using the link information between them.
In this project, we propose an approach that uses both the content and the link information of a document in the classification process. There are two types of model in the framework. First, the normal model uses the content of the document to classify it. Second, the individual link model uses the content and the category of the linking document to classify the linked document. These two types of model are combined to give the final result. The approach is evaluated on three standard data sets: CiteSeer, Cora, and Pubmed-Diabetes.
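One way to picture the combination step is the sketch below. The page does not state how the two models are combined, so the product-of-probabilities rule here is only an assumption, and the function names, labels and probability values are hypothetical.

```python
from typing import Dict, List


def combine(content_probs: Dict[str, float],
            link_probs_list: List[Dict[str, float]]) -> Dict[str, float]:
    """Multiply the content model's class probabilities by each link model's
    probabilities, then renormalize (assumed combination rule)."""
    scores = dict(content_probs)
    for link_probs in link_probs_list:
        for label in scores:
            scores[label] *= link_probs[label]
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all combined scores are zero")
    return {label: score / total for label, score in scores.items()}


def predict(combined: Dict[str, float]) -> str:
    """Return the label with the highest combined probability."""
    return max(combined, key=lambda label: combined[label])


# Hypothetical outputs: the content model is undecided, while two incoming
# links (one individual link model output per link) both favor "AI".
content = {"AI": 0.5, "DB": 0.5}
links = [{"AI": 0.8, "DB": 0.2}, {"AI": 0.6, "DB": 0.4}]

combined = combine(content, links)
predicted = predict(combined)
print(predicted, combined)
```

In this toy case the link evidence breaks the tie left by the content model, which is the situation where link information is expected to help.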
This project won the First Prize for Scientific Research Projects of Students from the University of Science, HCMC in 2012.
This project received an award from the Ministry of Education and Training for Excellent Student Projects in 2012.
Here are the slides of my presentations at IPMU (Italy) and KSE (Vietnam).
[IPMU slide] [KSE slide]
A Hierarchical Approach for Handwritten Digit Recognition Using Sparse Autoencoder
This is my joint work with Duong Thien An and Phan Thanh Hai. We were supervised by Dr. Tran Thai Son.
We propose a new method to learn higher-level features from specific characteristics of the data using sparse autoencoders. The key idea of our approach is to divide the handwritten digits into subsets corresponding to specific characteristics. The choice of characteristic is crucial to the performance of this approach. We propose a new feature based on linear regression to extract geometrical characteristics of handwritten digits. The linear regression-based features are used to cluster digit images into different sets in a preprocessing stage. After that, each set of clustered digit images is used as input for a corresponding hierarchical sparse autoencoder to extract higher-level features automatically. This approach is evaluated on the MNIST data set. The experiments show that efficient data clustering can yield promising results.
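A linear regression-based geometric feature could look like the sketch below. The exact feature definition is not given on this page, so fitting a least-squares line through the foreground pixels and using its slope and intercept is an assumption, as are the function name and the threshold parameter.

```python
from typing import List, Sequence, Tuple


def regression_line(image: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Fit column = slope * row + intercept through the foreground pixels.

    Columns are regressed on rows (an assumption) because digit strokes are
    mostly vertical, which keeps the slope finite for upright digits.
    """
    points: List[Tuple[int, int]] = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two foreground pixels")
    mean_r = sum(r for r, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in points)
    if var_r == 0:
        raise ValueError("all foreground pixels lie in one row")
    cov = sum((r - mean_r) * (c - mean_c) for r, c in points)
    slope = cov / var_r
    intercept = mean_c - slope * mean_r
    return slope, intercept


# Toy 4x4 "image" with a single diagonal stroke from top-right to bottom-left.
diagonal = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
slope, intercept = regression_line(diagonal)
print(slope, intercept)
```

Features like the slope could then be fed to any clustering step to group digits with a similar slant before training one autoencoder per group.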
Duong Thien An and Phan Thanh Hai received the Olympia Award of Scientific Research Projects for Students in 2012.
3D Face Alignment Using Stereo Vision
This is my joint work with Le Minh Hieu. We were supervised by Dr. Tran Thai Son.
Face alignment is an important phase in tasks such as face recognition and face manipulation. In this project, we extend the problem to face alignment in 3D space using two cameras. Active Shape Model (ASM) is the most well-known algorithm for aligning faces in 2D images. Using ASM, the corresponding landmarks of the face in the two images can be found. With two calibrated cameras, one can then estimate the depth of each corresponding pair using stereo vision calculations.
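The depth calculation for one landmark pair can be sketched as follows. This assumes an idealized rectified setup (parallel cameras, identical focal length in pixels, known baseline), which may differ from the calibration used in the project. The function name and all numeric values are hypothetical.

```python
from typing import Tuple


def triangulate_landmark(left_xy: Tuple[float, float],
                         right_xy: Tuple[float, float],
                         focal_px: float,
                         baseline: float,
                         cx: float,
                         cy: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) of a landmark in the left camera frame.

    Uses the standard rectified-stereo relation Z = f * B / d, where d is
    the horizontal disparity between the two image positions.
    """
    xl, yl = left_xy
    xr, _ = right_xy
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline / disparity
    x = (xl - cx) * z / focal_px
    y = (yl - cy) * z / focal_px
    return x, y, z


# Hypothetical landmark (e.g. a nose tip found by ASM in both images).
point = triangulate_landmark(
    left_xy=(420.0, 240.0),
    right_xy=(380.0, 240.0),
    focal_px=800.0,   # focal length in pixels (assumed)
    baseline=0.1,     # distance between the cameras in meters (assumed)
    cx=320.0,
    cy=240.0,
)
print(point)
```

Applying the same function to every ASM landmark pair gives a set of 3D points describing the face shape.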
There is a slideshow describing the process. Here is a screenshot of the program:
Low cost 3D Scanner Using Structured-Light
This project was supervised by Dr. Tran Minh Triet.
This is an implementation of the method proposed by Jean-Yves Bouguet and Pietro Perona at ICCV'98 [source]. Using this method, one can build a 3D point cloud of an object using a pencil, a chessboard and a camera. With structured light, the light ray from the light source to an object point is calculated from the movement of the pencil's shadow. Given a calibrated camera, the intersection between this light ray and the corresponding line from the image point to the object point gives the depth of that object point.
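The intersection step described above can be sketched as a generic two-line triangulation. With noisy measurements two 3D lines rarely meet exactly, so this sketch returns the midpoint of the shortest segment between them. That choice, the function names and the toy coordinates are assumptions for illustration and are not taken from the original implementation.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def closest_point_between_lines(p1: Vec3, d1: Vec3, p2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between lines p1 + s*d1 and p2 + t*d2."""
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique closest point")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * v for p, v in zip(p1, d1))  # closest point on line 1
    q2 = tuple(p + t * v for p, v in zip(p2, d2))  # closest point on line 2
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))


# Toy setup: camera at the origin looking along +Z, light source one unit to the side.
camera_center: Vec3 = (0.0, 0.0, 0.0)
camera_ray: Vec3 = (0.0, 0.0, 1.0)
light_source: Vec3 = (1.0, 0.0, 0.0)
light_ray: Vec3 = (-1.0, 0.0, 2.0)

object_point = closest_point_between_lines(camera_center, camera_ray,
                                           light_source, light_ray)
depth = object_point[2]
print(object_point, depth)
```

Repeating this for every pixel crossed by the shadow edge would yield the point cloud.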
This project was awarded a Prize for Junior Research Projects by the University of Science, HCMC in 2011.
Here is a demo of the software:
Solving Course Timetabling Problem Using Metaheuristic
This project was supervised by Dr. Nguyen T.T.M. Khang. The other members were Vo Dang Nguyen and Le Hong Ngoc.
The goal of the University Course Timetabling Problem (UCTP) is to find an optimal arrangement of courses, rooms and lecturers which satisfies a set of various constraints. Searching for the optimal solution is an NP-hard problem. Therefore, metaheuristic searches are applied to find a "near optimal" solution. In the scope of our project, two algorithms were studied:
Email Secure for PocketPC
This is my joint work with Le Trung Nghia. We were supervised by Dr. Tran Minh Triet.
This project was initiated to address the need for security software on mobile devices. Although there were many applications for encrypting emails on PCs, there were very few for PocketPCs. An encryption application on a PocketPC must be not only reliable but also efficient, given the limited resources of mobile devices.
To tackle these challenges, we studied two encryption methods, one symmetric and one asymmetric: Rijndael and Elliptic Curve Cryptography (ECC). The result of this project is the "PocketPC Secure Email" application, which runs on Windows Mobile 6.0 or 6.5.
This project was awarded the First Prize for Junior Research Projects by the University of Science, HCMC in 2010.
Here are the slides describing the application and here is a demo of the software: