Built 4 models (KNN, Softmax classifier, 3-layer CNN, and 6-layer CNN) for image classification on The Simpsons dataset with 20 characters.
In the KNN model, the L2 distance measures the distance between each test point and all training points. For computing L2 distances, the running speed ranks no-loop > two-loop > one-loop, so we use the no-loop (fully vectorized) version to save time; the prediction accuracy is 0.537.
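A minimal NumPy sketch of the no-loop computation, using the expansion ||a − b||² = ||a||² − 2a·b + ||b||² (function and variable names are illustrative):

```python
import numpy as np

def compute_l2_no_loop(X_test, X_train):
    """Pairwise L2 distances without explicit Python loops, via
    ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 and broadcasting."""
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)                # (num_train,)
    cross = X_test @ X_train.T                             # (num_test, num_train)
    return np.sqrt(test_sq - 2 * cross + train_sq)
```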
In the Softmax classifier, the vectorized implementation again proves faster, so training finishes quickly. Using the best learning rate, 1.2e-01, and regularization strength, 1.05e-02, over 800 iterations, the prediction accuracy is 0.634 with a lowest loss of 1.567.
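A hedged sketch of a fully vectorized softmax loss and gradient in its standard form (the project's actual training code may differ):

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Vectorized softmax loss and gradient.
    W: (D, C) weights, X: (N, D) data, y: (N,) labels, reg: L2 strength."""
    N = X.shape[0]
    scores = X @ W                                   # (N, C)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean() + reg * np.sum(W * W)
    dscores = probs
    dscores[np.arange(N), y] -= 1                    # gradient of cross-entropy
    dW = X.T @ dscores / N + 2 * reg * W
    return loss, dW
```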
The 3-layer CNN is trained for only 10 epochs because of laptop hardware limits; epoch 9 gives the lowest loss, 0.0506, and the best accuracy, 0.985.
The 6-layer CNN reaches a prediction accuracy of 0.958.
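A minimal sketch of the 3-layer CNN described above; the framework (PyTorch), the conv–conv–fc layout, and the 64×64 input size are assumptions, since the summary does not specify them:

```python
import torch.nn as nn

class ThreeLayerCNN(nn.Module):
    """Sketch of a 3-layer CNN (conv-conv-fc) for 20 Simpsons characters.
    Architecture and input size are assumptions, not the report's spec."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)  # assumes 64x64 input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```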
The top-3 prediction results for each character under each method are plotted in 4×4 charts.
CONCLUSION: 3-layer CNN > 6-layer CNN > Softmax > KNN, so the 3-layer CNN is the best model. Vectorization is an important coding technique that reduces CPU load and saves running time.
Coded bilinear interpolation and a proposed interpolation method to recover color images from Bayer-pattern grayscale images.
The project is implemented in Python; the dataset includes 4 images: office, onion, pears, and peppers.
Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are used to evaluate the interpolation results.
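A short NumPy sketch of the two metrics (the peak value assumes 8-bit images):

```python
import numpy as np

def mse(ref, test):
    """Mean squared error between reference and interpolated images."""
    return np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit images."""
    err = mse(ref, test)
    return float('inf') if err == 0 else 10 * np.log10(peak ** 2 / err)
```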
CONCLUSION: for onion and peppers, bilinear interpolation gives lower MSE and PSNR than the proposed method; for office and pears, the proposed method gives lower MSE and PSNR than bilinear.
Coded histogram equalization and histogram matching functions and applied them to self-taken bright and dark images.
The project is done in MATLAB; the dataset includes 2 sample images and 4 target images.
The matched images' histograms are plotted to show the matching effect (a sketch of the equalization step follows).
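The project itself uses MATLAB; for illustration, a minimal NumPy sketch of histogram equalization:

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization for an 8-bit grayscale image:
    map each gray level through the normalized CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Standard equalization mapping, scaled back to [0, 255].
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```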
CONCLUSION: the bright and dark images look more natural, and their histograms are well equalized.
Blended Ted's face and Hillary's with different weights to generate new images.
Coded in MATLAB. Grab feature points and build a Delaunay triangulation, generate a bounding box for each triangle, define an affine transformation to warp each sample image onto the intermediate shape, and finally fuse the 2 warped images with a chosen weight (see the sketch below).
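A hedged Python/OpenCV sketch of the per-triangle warp-and-fuse step (the project is in MATLAB; the float-image assumption and all names here are illustrative):

```python
import cv2
import numpy as np

def morph_triangle(img1, img2, out, t1, t2, tm, w):
    """Warp triangle t1 of img1 and t2 of img2 onto intermediate triangle tm,
    then fuse the two warps with weight w. Images are float32 arrays."""
    # Bounding box around each triangle, as in the pipeline above.
    r1 = cv2.boundingRect(np.float32([t1]))
    r2 = cv2.boundingRect(np.float32([t2]))
    rm = cv2.boundingRect(np.float32([tm]))

    # Triangle vertices relative to their bounding boxes.
    t1r = np.float32([(x - r1[0], y - r1[1]) for x, y in t1])
    t2r = np.float32([(x - r2[0], y - r2[1]) for x, y in t2])
    tmr = np.float32([(x - rm[0], y - rm[1]) for x, y in tm])

    # Affine transforms mapping each source triangle to the intermediate one.
    M1 = cv2.getAffineTransform(t1r, tmr)
    M2 = cv2.getAffineTransform(t2r, tmr)
    patch1 = img1[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    patch2 = img2[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]
    warp1 = cv2.warpAffine(patch1, M1, (rm[2], rm[3]))
    warp2 = cv2.warpAffine(patch2, M2, (rm[2], rm[3]))

    # Weighted fusion of the two warped patches.
    blended = (1.0 - w) * warp1 + w * warp2

    # Paste only the triangle's interior into the output image.
    mask = np.zeros((rm[3], rm[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(tmr), (1.0, 1.0, 1.0))
    roi = out[rm[1]:rm[1] + rm[3], rm[0]:rm[0] + rm[2]]
    out[rm[1]:rm[1] + rm[3], rm[0]:rm[0] + rm[2]] = roi * (1 - mask) + blended * mask
```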
CONCLUSION: the result is shown in the video.
Implemented 2 image-stitching methods, blending 3 images together into a panoramic view.
Coded in both MATLAB and Python. The key techniques include the Harris corner detector, SIFT, RANSAC, and backward warping.
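A hedged sketch of the OpenCV (Python) variant, using SIFT matching plus RANSAC homography estimation; the Harris-with-best-matching variant differs only in the detection and matching steps:

```python
import cv2
import numpy as np

def stitch_pair(img1, img2):
    """Warp img1 into img2's frame via a RANSAC-estimated homography.
    Canvas size and thresholds are illustrative assumptions."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)

    # Ratio-test matching of SIFT descriptors.
    matcher = cv2.BFMatcher()
    matches = [m for m, n in matcher.knnMatch(d1, d2, k=2)
               if m.distance < 0.75 * n.distance]

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects outlier matches while estimating the homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Back-warp img1 into img2's frame and paste img2 on top.
    h2, w2 = img2.shape[:2]
    pano = cv2.warpPerspective(img1, H, (w2 * 2, h2))
    pano[:h2, :w2] = img2
    return pano
```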
CONCLUSION: the method using the Harris corner detector with best matching produces more natural-looking images, while the OpenCV implementation in Python produces more regular ones.
Implemented the original brightness-constancy model and the Lucas-Kanade (LK) motion estimation method to track vehicles in highway traffic video.
Coded in Python; the Harris corner detector is chosen to extract key points.
With the brightness-constancy model we track pixels across the whole image; with the LK method we track only the key points detected by the Harris corner detector (see the sketch below).
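A minimal sketch of the LK branch, assuming OpenCV's pyramidal LK with Harris-style corner extraction (the file name and all parameters are illustrative assumptions):

```python
import cv2

cap = cv2.VideoCapture('highway.mp4')   # illustrative path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# goodFeaturesToTrack with useHarrisDetector=True gives Harris corners.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                              minDistance=7, useHarrisDetector=True, k=0.04)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track the key points from the previous frame into the current one.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    for (a, b), (c, d) in zip(pts[status == 1], new_pts[status == 1]):
        cv2.line(frame, (int(a), int(b)), (int(c), int(d)), (0, 255, 0), 2)
    cv2.imshow('LK tracking', frame)
    if cv2.waitKey(30) & 0xFF == 27:     # Esc to quit
        break
    prev_gray, pts = gray, new_pts[status == 1].reshape(-1, 1, 2)
```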
CONCLUSION: see video1, video2, and video3. In video1 we can observe pixel directions changing as objects pass over them; in video2 and video3 we can observe each object's motion trajectory in detail.
Trained a YOLO model to recognize handwritten letters and digits.
The dataset is generated from EMNIST: 10,000 training images and 1,000 testing images, trained for 200 epochs.
There are 62 classes: 10 digits, 26 uppercase letters, and 26 lowercase letters.
CONCLUSION: the loss stabilizes around 6 to 8. The average accuracy is 84.7%, the average recall is 0.862, and the average F1 score is 0.414. The recognition video is here.
This project recognizes 8 human facial expressions via statistical models.
The dataset is AffectNet, consisting of more than 450,000 images across 8 expression categories, including happy, sad, fear, and surprise.
The data is analyzed with KNN, LDA, and MobileNet V2 (the two classical baselines are sketched below).
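A hedged scikit-learn sketch of the two classical baselines; the flattened-feature inputs and hyperparameters are assumptions, and MobileNet V2 would be fine-tuned separately:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def run_baselines(X_train, y_train, X_test, y_test):
    """Fit KNN and LDA on flattened image features (8 expression labels)
    and report test accuracy. n_neighbors=5 is an illustrative choice."""
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    return knn.score(X_test, y_test), lda.score(X_test, y_test)
```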
CONCLUSION: the accuracy ranking is MobileNet V2 > LDA > KNN.
This project covers tasks such as Text Detection, Text Recognition, Attention, and Classification.
Used packages: PyTorch
This project proposes two new fully connected layers, XODR and XIDR, by fusing the CapsFC layer with Xnorization.
To validate them, we insert XODR and XIDR into both a lightweight model, MobileNet V2, and a heavyweight model, ResNet-50.
The experiments run on three datasets: MNIST, CIFAR-10, and MultiMNIST. ArXiv | Github
Published in the Journal of Intelligent & Robotic Systems in August 2023.
CONCLUSION: both XODR and XIDR improve accuracy while costing fewer FLOPs and parameters.
This project proposes the Simple-DAG model to explore the feasibility of GNNs for facial landmark point detection.
We run experiments on the 300W dataset and record NME and FR@0.1 (both metrics are sketched below).
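For reference, the two metrics in NumPy; the normalizing distance (e.g., inter-ocular) follows the standard 300W convention:

```python
import numpy as np

def nme(pred, gt, d_norm):
    """Normalized Mean Error for one image: mean landmark distance divided
    by a normalizing distance (e.g., inter-ocular), as is standard on 300W."""
    return np.mean(np.linalg.norm(pred - gt, axis=1)) / d_norm

def failure_rate(nmes, threshold=0.1):
    """FR@0.1: fraction of images whose NME exceeds the threshold."""
    return np.mean(np.asarray(nmes) > threshold)
```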
CONCLUSION: the visualizations and metric values validate the feasibility of GNNs for facial landmark point detection.
This project presents code to parse classic datasets: ImageNet, Multi-MNIST, COCO-Text, and 300W.
Prepares image folders and related labels to make training on ImageNet with TensorFlow easy.
Provides two ways to generate the Multi-MNIST dataset for training.
Creates different methods to parse the COCO-Text dataset for training in PyTorch.
Modifies the 300W code so it can generate the challenging, common, and full sets without extra code; easy to use in PyTorch (a minimal Dataset skeleton is sketched below).
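An illustrative PyTorch Dataset skeleton of the parsing pattern; the sample layout and all names here are assumptions, not the repository's actual API:

```python
import torch
from torch.utils.data import Dataset
from PIL import Image

class LandmarkDataset(Dataset):
    """Hypothetical skeleton for the 300W-style splits: `samples` is a list
    of (image_path, landmarks) pairs built by the split-specific parser
    (challenging / common / full)."""
    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, landmarks = self.samples[idx]
        img = Image.open(path).convert('RGB')
        if self.transform:
            img = self.transform(img)
        return img, torch.as_tensor(landmarks, dtype=torch.float32)
```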
This project proposes MC-ViViT to predict early-stage Mild Cognitive Impairment from I-CONECT Study videos.
We design a Multi-branch Classifier module to enrich the extracted spatio-temporal features and capture visual features from different perspectives.
We develop the HP loss by combining Focal loss and AD-CORRE loss; it addresses inter- and intra-class imbalance and helps the model pay attention to classes with fewer samples and to subjects with short videos (the Focal component is sketched below).
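A sketch of the Focal-loss component only; the AD-CORRE term and the exact combination in the HP loss are omitted, and the `gamma`/`alpha` values are illustrative:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=1.0):
    """Focal-loss component of the HP loss (AD-CORRE term omitted):
    down-weights easy examples so rare classes receive more gradient."""
    ce = F.cross_entropy(logits, targets, reduction='none')
    pt = torch.exp(-ce)                      # probability of the true class
    return (alpha * (1 - pt) ** gamma * ce).mean()
```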
CONCLUSION: MC-ViViT achieves over 90% accuracy. We plan to submit the paper to a journal after a final check.