Me posing with the sunset in Kyoto, Japan, in 2023.
Me with my Ph.D. supervisor, Assoc Prof Poenar Daniel Puiu, after the Ph.D. oral examination presentation.
Research Works
These are some of the research works (along with the relevant papers and github code repository links) that my main collaborators (Dr Tanmoy Dam: https://sites.google.com/view/tanmoy-dam/ , Dr Meftahul Ferdaus: https://mferdaus.com/ ) and I have completed thus far:
I) Research and Conference Papers
1) WATT-EFFNet: A Wider ATTention EFFicientnet for Effective and Efficient Aerial Imagery Disaster Classification
We proposed WATT-EffNet to address key shortcomings of UAV aerial disaster classification, which is essential for planning and executing UAV search-and-rescue operations. Our model takes an existing architecture already designed for efficiency (i.e., EfficientNet) and makes it more efficient, while retaining its effectiveness, by widening its constituent (MBConv) blocks and reducing the number of layers required. Additionally, we incorporated a channel-spatial attention network to better attend to the key feature traits of each disaster class. WATT-EffNet is evaluated on the AIDER image dataset, which comprises four disaster classes (fire, flood, collapsed infrastructure, and traffic accident) along with a non-disaster image class (normal). The dataset is imbalanced in the sense that normal-class images dominate the class distribution, simulating the real-world scenario in which disaster images are encountered far less often than non-disaster images. Nevertheless, we reported SOTA performance on AIDER.
Link to the published paper: https://ieeexplore.ieee.org/abstract/document/10108062
ArXiv preprint edition: https://arxiv.org/abs/2304.10811
github repository: https://github.com/GreedYLearner1146/WATT-EffNet-for-aerial-disaster-scene-classification
Fig.1: The algorithmic structure of our WATT-EffNet, as shown on the left of the figure. Our modification to the MBConv block layer using EfficientNet as the backbone is shown on the top right of the figure, as highlighted by the blue dotted box. We also illustrate the original MBConv block layer for comparison (red dotted box). The attention mechanism architecture is illustrated in the dotted orange box on the bottom right of the figure.
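To give a flavour of the channel-spatial attention idea, here is a minimal NumPy sketch of the channel-then-spatial gating pattern. It is an illustrative simplification, not our exact implementation (which is in the repository above): the paper's attention block uses learned weights, while the gates here are computed directly from pooled statistics.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap):
    # fmap: (H, W, C). Squeeze the spatial dims into per-channel
    # statistics, then gate each channel with a value in (0, 1).
    avg = fmap.mean(axis=(0, 1))              # (C,)
    mx = fmap.max(axis=(0, 1))                # (C,)
    return fmap * sigmoid(avg + mx)           # broadcast over H, W

def spatial_attention(fmap):
    # Pool across channels to get a per-pixel saliency gate.
    avg = fmap.mean(axis=-1, keepdims=True)   # (H, W, 1)
    mx = fmap.max(axis=-1, keepdims=True)     # (H, W, 1)
    return fmap * sigmoid(avg + mx)

def channel_spatial_attention(fmap):
    # Channel gating first, then spatial gating.
    return spatial_attention(channel_attention(fmap))

feat = np.random.rand(8, 8, 16)               # a dummy feature map
out = channel_spatial_attention(feat)         # same shape, re-weighted
```

Because every gate lies in (0, 1), the output keeps the feature map's shape while down-weighting uninformative channels and pixels.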
2) DRACO-DehazeNet: A Detail Recovery Attention-based Contrastive Dehazing Paradigm
We proposed DRACO-DehazeNet to address some challenges of image dehazing, in particular the low emphasis prior works place on detail recovery networks for removing dehazing artifacts, as well as their inefficiency and ineffectiveness in attaining high performance on small haze datasets such as O-HAZE. We addressed the first issue with an attention-based detail recovery network (ATTDRN), and the second via a combination of an inverted-residual-block-based dilated dense network (DDIRB) and a novel quadruplet contrastive learning network that extracts and computes the distances between the intermediate features of the hazy and clear images, as well as between the intermediate features of the DDIRB and ATTDRN outputs. More details of the architecture can be found in the paper linked below:
ArXiv preprint edition: https://arxiv.org/abs/2410.14595
github repository: https://github.com/GreedYLearner1146/DRACO-DehazeNet
Fig.2: Illustration of the overall architecture of our DRACO-DehazeNet. C denotes the number of channels, K denotes the kernel size, and D denotes the dilation rate. All strides are of value 1. DDIRB denotes the Dense Dilated Inverted Residual Blocks that serve as the main dehazing network, and ATTDRN denotes the Attention Detail Recovery Network that serves as the detail recovery network for removing dehazing artifacts.
Fig.3: Comparative visual illustration of the dehazed outputs on a selected O-HAZE image for the SOTAs (abbreviated above each image), including our approach. The original ground-truth and hazy images are also depicted for reference.
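As a rough illustration of the quadruplet idea, the sketch below compares the two restored feature sets (intermediate DDIRB output and final ATTDRN output) against both the clear and the hazy features in ratio form. This is a simplified stand-in using mean-L1 feature distances, not the exact loss from the paper:

```python
import numpy as np

def l1(a, b):
    # Mean absolute distance between two feature tensors.
    return np.abs(a - b).mean()

def quadruplet_contrastive_loss(f_clear, f_hazy, f_ddirb, f_attdrn, eps=1e-8):
    # Pull both restored feature sets toward the clear image's
    # features while pushing them away from the hazy image's
    # features; the ratio form makes the two terms compete.
    pull = l1(f_ddirb, f_clear) + l1(f_attdrn, f_clear)
    push = l1(f_ddirb, f_hazy) + l1(f_attdrn, f_hazy)
    return pull / (push + eps)

rng = np.random.default_rng(0)
f_clear = rng.random((4, 4, 8))
f_hazy = f_clear + 0.5    # "hazy" features offset from the clear ones
good = quadruplet_contrastive_loss(f_clear, f_hazy, f_clear, f_clear)
bad = quadruplet_contrastive_loss(f_clear, f_hazy, f_hazy, f_hazy)
```

A perfect restoration drives the loss to zero, while a restoration that still resembles the hazy input is penalized heavily.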
3) HELA-VFA: A Hellinger Distance-Attention-based Feature Aggregation Network for Few-Shot Classification
We proposed HELA-VFA, which performs variational few-shot image classification using the Hellinger distance instead of the commonly utilized Kullback-Leibler (KL) divergence. The upshot of the Hellinger distance is that it avoids the divergence problem the KL divergence faces when one of the posterior probability distributions approaches 0, while also permitting a more direct computational implementation due to its close resemblance to the Euclidean distance. We obtained new SOTA performance on common few-shot image classification benchmarks such as CIFAR-FS, CIFAR-100, miniImageNet, and tieredImageNet. More details of the architecture can be found in the paper linked below:
Link to the published conference paper:
github repository: https://github.com/GreedYLearner1146/HELA-VFA
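For diagonal Gaussians, the squared Hellinger distance has a simple closed form through the Bhattacharyya coefficient (H² = 1 − BC), and unlike the KL divergence it stays bounded in [0, 1] even when the distributions barely overlap. A small NumPy sketch of this computation (illustrative, not our training code):

```python
import numpy as np

def hellinger_sq_diag_gauss(mu1, var1, mu2, var2):
    # Closed-form squared Hellinger distance between two diagonal
    # Gaussians, via the Bhattacharyya coefficient BC: H^2 = 1 - BC.
    s = var1 + var2
    bc = np.prod(np.sqrt(2.0 * np.sqrt(var1 * var2) / s)
                 * np.exp(-((mu1 - mu2) ** 2) / (4.0 * s)))
    return 1.0 - bc

mu, var = np.zeros(4), np.ones(4)
same = hellinger_sq_diag_gauss(mu, var, mu, var)        # identical -> 0.0
far = hellinger_sq_diag_gauss(mu, var, mu + 3.0, var)   # well separated, yet < 1
```

The boundedness is the practical point: as the two distributions separate, H² saturates at 1 instead of blowing up the way KL does.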
3.5) ANROT-HELANet: an Adversarially and Naturally RObusT Hellinger Aggregation Network
We extended HELA-VFA to include adversarial-noise and natural-noise training (using Gaussian noise as an example), culminating in the ANROT-HELANet model. (This paper has recently been published in Springer Nature's International Journal of Multimedia Information Retrieval.)
Link to the Springer paper: https://link.springer.com/article/10.1007/s13735-025-00390-8
github repository: https://github.com/GreedYLearner1146/ANROT-HELANet/tree/main
Fig.4: HELA-VFA algorithmic architecture (top) and the attention mechanism architecture (bottom). In the top diagram, S and S′ denote the original and reconstructed images respectively, while Q and Q′ denote the corresponding quantities for the query set. ŷ and y denote the predicted label after training and the ground-truth label respectively. The network allows general N-way-k-shot training and evaluation.
Fig.5: ANROT-HELANet algorithmic architecture. The major difference between this model and the HELA-VFA is the addition of the FGSM adversarial perturbations and the Gaussian natural noise injection and training step.
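The two perturbation steps can be sketched as follows: FGSM steps in the sign of the input gradient of the loss, while the natural-noise branch simply adds Gaussian noise. This is an illustrative toy using a linear scorer with an analytic gradient, not the ANROT-HELANet training loop; the `eps` and `sigma` values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def fgsm_perturb(x, grad, eps=0.03):
    # FGSM: move each input entry by eps in the direction that
    # increases the loss, then clip back to the valid [0, 1] range.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def gaussian_perturb(x, sigma=0.10):
    # Natural-noise counterpart: additive zero-mean Gaussian noise.
    return np.clip(x + rng.normal(0.0, sigma, size=x.shape), 0.0, 1.0)

# Toy scorer: logistic loss on a linear model, whose input gradient
# is available in closed form as (p - y) * w.
w = rng.normal(size=4)
x = rng.random(4)
y = 1.0
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - y) * w
x_adv = fgsm_perturb(x, grad)     # adversarial copy of x
x_nat = gaussian_perturb(x)       # naturally-noised copy of x
```

Training on `x_adv` and `x_nat` alongside the clean inputs is what gives the model its adversarial and natural robustness.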
Some minor errata in the final version of the ANROT-HELANet paper:
In Fig.3 (page 12 of 23) (Fig.5 above), the scenario should be 5-way-1-shot, not 5-way-5-shot.
In page 10 of 23, BC = 1- D_{H}^{2}, not BC = \sqrt{1- D_{H}^{2}}.
In page 16 of 23, "Figure 7 and 8 depicts the GRAD-CAM for...", not "Figure 6 and 7 depicts the GRAD-CAM for...".
In page 17 of 23, "while natural noise with σ = 0.10 only causes a 1.4% reduction from 88.4% to 87.0%" should be "while natural noise with σ = 0.10 only causes a 0.2% reduction from 89.6% to 89.4%". The former is for the injection of Gaussian noise of σ = 0.15.
4) Enhancing Few-Shot Classification of Benchmark and Disaster Imagery with ATTBHFA-Net
We proposed ATTBHFA-Net, which provides a novel approach to variational few-shot image classification through the combined use of the Hellinger distance and the Bhattacharyya coefficient, training the model via an analogous contrastive-learning-like paradigm that utilizes class distributions instead of feature points in the embedding space. Our model improves upon HELA-VFA, which utilized only the Hellinger distance: here the Hellinger distance regularizes same-class alignment, while the Bhattacharyya coefficient serves as a contrastive margin that enhances inter-class separability. This follows from the mathematical definition of the Bhattacharyya coefficient, which decays rapidly toward zero as two distributions diverge from each other, hence serving as the "push" term in the contrastive-learning-like paradigm. Apart from obtaining SOTA performance on few-shot image classification benchmarks such as CIFAR-FS, CIFAR-100, miniImageNet, and tieredImageNet, we also evaluated our algorithm on AIDER and CDD, two UAV-based disaster image classification datasets. Once again, our approach generally outperformed the SOTAs, shedding light on its extension and feasibility for UAV disaster image classification.
Link to the arXiv preprint:
https://arxiv.org/pdf/2510.18326
github repository: https://github.com/GreedYLearner1146/ABHFA-Net/tree/main
Fig.6: ATTBHFA-Net algorithmic architecture. It receives a set of support and query images as input, which are processed by an attention-based encoder to extract their features. These features are encoded into latent representations from which the respective Gaussian probability distributions are obtained. The Bhattacharyya coefficient and Hellinger distance computation involves deriving confidence spaces for each class prototype's distribution and comparing their degree of overlap with the feature point distribution (represented by a red dot within a red dotted circle). The classes' maximum probabilities are leveraged to generate a confidence score, indicating the predicted label for a given sample. Training combines the Bhattacharyya-Hellinger Softmax loss and the categorical cross-entropy loss into the overall ATTBHFA-Net loss. The above illustrates a 5-way-1-shot FSL training method. For more information on the meaning of the terms in the diagram, please see our preprint.
Fig.7: A pictorial comparison of point-feature-based contrastive learning (left) and the analogous distribution-based "contrastive learning" (right) using a three-class scenario, where the classes are denoted by purple circles, maroon diamonds, and green hexagons. As mentioned, the Bhattacharyya coefficient (as a similarity measure) can serve as the "push" term, since the corresponding distance increases rapidly the more dissimilar the feature distributions are, while the Hellinger distance can serve as the "pull" term, since similar embeddings are clustered closer together in a Euclidean-like manner, but in the square-root probability space.
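A toy sketch of how the Bhattacharyya coefficient ("push") and the Hellinger distance ("pull") can be combined into a softmax-style loss over class-prototype distributions. The exact ATTBHFA-Net loss is defined in the preprint; everything below (diagonal-Gaussian prototypes, the `lam` weight) is a simplified assumption:

```python
import numpy as np

def bc_diag_gauss(mu1, var1, mu2, var2):
    # Bhattacharyya coefficient (overlap in [0, 1]) between two
    # diagonal Gaussians; the squared Hellinger distance is 1 - BC.
    s = var1 + var2
    return np.prod(np.sqrt(2.0 * np.sqrt(var1 * var2) / s)
                   * np.exp(-((mu1 - mu2) ** 2) / (4.0 * s)))

def bh_softmax_loss(query, protos, label, lam=0.5, eps=1e-12):
    # "Push": softmax over BC overlaps with all class prototypes,
    # suppressing overlap with wrong-class prototypes.
    # "Pull": Hellinger^2 to the correct prototype as a regularizer.
    mu_q, var_q = query
    sims = np.array([bc_diag_gauss(mu_q, var_q, mu, var)
                     for mu, var in protos])
    push = -np.log(sims[label] / (sims.sum() + eps) + eps)
    pull = lam * (1.0 - sims[label])
    return push + pull

protos = [(np.zeros(2), np.ones(2)), (np.full(2, 3.0), np.ones(2))]
query = (np.zeros(2), np.ones(2))            # matches class 0's prototype
right = bh_softmax_loss(query, protos, label=0)
wrong = bh_softmax_loss(query, protos, label=1)
```

Assigning the query its true class yields a much smaller loss than assigning it the distant class, which is exactly the push-pull behaviour the figure depicts.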
II) Review Papers
I have also co-authored two review papers with my collaborators:
5) Unlocking the capabilities of explainable few-shot learning in remote sensing
Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, few-shot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of few-shot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from Unmanned Aerial Vehicles (UAVs), which are increasingly used in remote sensing studies. In this review, we provide an up-to-date overview of both existing and newly proposed few-shot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. We demonstrate that few-shot learning can effectively handle the diverse perspectives in remote sensing data. As an example application, we evaluate state-of-the-art approaches on a UAV disaster scene dataset, yielding promising results. Furthermore, we highlight the significance of incorporating explainable AI (XAI) techniques into few-shot models. In remote sensing, where decisions based on model predictions can have significant consequences, such as in natural disaster response or environmental monitoring, the transparency provided by XAI is crucial. Techniques like attention maps and prototype analysis can help clarify the decision-making processes of these complex models, enhancing their reliability. We identify key challenges, including the development of flexible few-shot methods that handle diverse remote sensing data effectively.
This review aims to equip researchers with an improved understanding of few-shot learning’s capabilities and limitations in remote sensing, while pointing out open issues to guide progress in efficient, reliable and interpretable data-efficient techniques.
Link to the published paper: https://link.springer.com/article/10.1007/s10462-024-10803-5
The github repository for the relevant implementation in the paper will be made available soon.
Fig.8: Overview of Explainable Few-Shot Learning in Remote Sensing. This illustration provides a high-level summary of the scope of our review on Explainable Few-Shot Learning techniques, applications, and challenges within Remote Sensing.
6) Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches
High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
ArXiv preprint edition: https://arxiv.org/abs/2405.07520
Fig.9: A schematic diagram depicting the summary of our discussions on the open challenges and possible solutions for current image dehazing research.
Services/Reviewing
In recent years I have also been involved in scientific journal reviewing. Most of the journal papers I've reviewed are from Springer Nature:
[2025]
Served as a reviewer for Wiley’s IET Image Processing Journal. Reviewed 2 papers. (IF:2.064)
Served as a reviewer for Springer’s Earth Science Informatics Journal. Reviewed 2 papers. (IF:2.7)
Served as a reviewer for Springer’s Scientific Reports. Reviewed 2 papers. (IF:3.8)
Served as a reviewer for IEEE Transactions on Geoscience and Remote Sensing. Reviewed 1 paper. (IF:7.5)
Served as a reviewer for Springer’s Archives of Computational Methods in Engineering. Reviewed 1 paper. (IF:12.1)
Served as a reviewer for Springer’s Machine Vision and Applications. Reviewed 1 paper. (IF:2.3)
Served as a reviewer for Springer’s Signal, Image and Video Processing. Reviewed 1 paper. (IF:2.1)
Served as a reviewer for Springer’s Multimedia Tools and Applications. Reviewed 1 paper. (IF:3.6)
[2026]
Served as a reviewer for Springer’s The Visual Computer. Reviewed 1 paper. (IF:2.9)
More about me:
I completed my B.Sc. in Science (Physics), with my FYP thesis on theoretical particle physics ("Baryogenesis via Leptogenesis") under my supervisor Dr Leek Meng Lee (https://www.ntu.edu.sg/research/faculty-directory/detail/rp00396) at SPMS, NTU Singapore.
I completed my Ph.D. on the topic of "Overcoming Efficiency and Low-Data Challenges in UAV Disaster Classification and Dehazing: From Inverted Residual Block to Novel Contrastive and Few-Shot Approaches" under my main supervisors (1st half) Prof Ken-Tye Yong (now at the University of Sydney, Australia: https://www.sydney.edu.au/engineering/about/our-people/academic-staff/ken-yong.html) and (2nd half) Assoc Prof Daniel Puiu Poenar (https://www.ntu.edu.sg/research/faculty-directory/detail/rp00294), as well as my co-supervisor Prof Vu Duong (now at VinUniversity: https://vinuni.edu.vn/people/duong-nguyen-vu-2/), at the School of EEE and ATMRI, NTU Singapore.
I shifted my interest from physics to AI between my bachelor's and my doctorate, which was a rough move (since I knew that, deep down, physics is what I am most passionate about). Although I still love reading about physics, I felt that AI has recently become dominant in many aspects of academia, and research works now more often than not incorporate deep or machine learning to solve domain-specific problems. With the emergence of generative AI, the amount of data available for effective deep/machine learning has increased drastically, allowing further progress in deep/machine learning research, since most state-of-the-art models are still data-hungry. As the impact of AI on society has rapidly become more noticeable, I believe that regardless of the field one eventually works in, having some knowledge of AI, including the ethical issues surrounding it, is valuable not only for career searching and advancement, but also for being a vigilant citizen, given the emergence of malicious AI usage. Lastly, the skills obtained from AI tools transfer into many domains, and the role of multidisciplinary research in recent decades cannot be emphasized enough, which justifies equipping oneself with AI knowledge and skills in current times.
In my free time, I love reading about science (particularly astronomy and physics) and AI, stargazing (I own a 4-inch f/5 Celestron refractor), going to the gym, drawing, and cooking. I also travel with my family occasionally.
The constellation of Ursa Major (containing the Big Dipper, left), the constellation of Scorpius (middle), and the constellations of Orion (middle right) and Canis Major (bottom right, partially blocked by a tree), taken with my phone on the island of Hawaii. The awe and beauty of the starry night sky can only be fully appreciated in dark locations, which unfortunately have been decreasing due to the global increase in light pollution; many dimmer constellations cannot be seen at all in urban cities.