Me posing with the sunset in Kyoto, Japan, in 2023.
Me with my Ph.D. supervisor, Assoc Prof Poenar Daniel Puiu, after the Ph.D. oral examination presentation.
Research Works
These are some of the research works (along with the relevant papers and github code repository links) that my main collaborators (Dr Tanmoy Dam: https://sites.google.com/view/tanmoy-dam/ , Dr Meftahul Ferdaus: https://mferdaus.com/ ) and I have completed thus far:
I) Research and Conference Papers
1) WATT-EFFNet: A Wider ATTention EFFicientnet for Effective and Efficient Aerial Imagery Disaster Classification
We proposed WATT-EffNet to address key shortcomings of UAV aerial disaster classification, which is essential for planning and executing UAV search-and-rescue operations. Our model takes an existing architecture already designed for efficiency (i.e., EfficientNet) and makes it more efficient, while retaining its effectiveness, by widening its constituent (MBConv) blocks and reducing the number of layers required. Additionally, we incorporated a channel-spatial attention network to better attend to the key feature traits of each disaster class. WATT-EffNet is evaluated on the AIDER image dataset, which comprises four disaster classes (fire, flood, collapsed infrastructure, and traffic accident) along with a non-disaster image class (normal). The dataset is imbalanced in the sense that normal-class images dominate the class distribution, simulating the real-world scenario in which disaster images are encountered far less often than non-disaster images. Nevertheless, we reported SOTA performance on AIDER.
Link to the published paper: https://ieeexplore.ieee.org/abstract/document/10108062
ArXiv preprint edition: https://arxiv.org/abs/2304.10811
github repository: https://github.com/GreedYLearner1146/WATT-EffNet-for-aerial-disaster-scene-classification
Fig.1: The algorithmic structure of our WATT-EffNet, as shown on the left of the figure. Our modification to the MBConv block layer using EfficientNet as the backbone is shown on the top right of the figure, as highlighted by the blue dotted box. We also illustrate the original MBConv block layer for comparison (red dotted box). The attention mechanism architecture is illustrated in the dotted orange box on the bottom right of the figure.
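To give a flavour of the channel-spatial attention idea, here is a minimal NumPy sketch of the channel-then-spatial gating pattern. It is an illustrative simplification, not our exact implementation (which is in the repository above): the paper's attention block uses learned weights, while the gates here are computed directly from pooled statistics.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap):
    # fmap: (H, W, C). Squeeze the spatial dims into per-channel
    # statistics, then gate each channel with a value in (0, 1).
    avg = fmap.mean(axis=(0, 1))              # (C,)
    mx = fmap.max(axis=(0, 1))                # (C,)
    return fmap * sigmoid(avg + mx)           # broadcast over H, W

def spatial_attention(fmap):
    # Pool across channels to get a per-pixel saliency gate.
    avg = fmap.mean(axis=-1, keepdims=True)   # (H, W, 1)
    mx = fmap.max(axis=-1, keepdims=True)     # (H, W, 1)
    return fmap * sigmoid(avg + mx)

def channel_spatial_attention(fmap):
    # Channel gating first, then spatial gating.
    return spatial_attention(channel_attention(fmap))

feat = np.random.rand(8, 8, 16)               # a dummy feature map
out = channel_spatial_attention(feat)         # same shape, re-weighted
```

Because every gate lies in (0, 1), the output keeps the feature map's shape while down-weighting uninformative channels and pixels.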
2) DRACO-DehazeNet: A Detail Recovery Attention-based Contrastive Dehazing Paradigm
We proposed DRACO-DehazeNet to address some challenges of image dehazing, in particular the low emphasis prior works place on detail recovery networks for removing dehazing artifacts, as well as their inefficiency and ineffectiveness in attaining high performance on small haze datasets such as O-HAZE. We addressed the first issue with an attention-based detail recovery network (ATTDRN), and the second via a combination of an inverted-residual-block-based dilated dense network (DDIRB) and a novel quadruplet contrastive learning network that extracts and computes the distances between the intermediate features of the hazy and clear images, as well as between the intermediate features of the DDIRB and ATTDRN outputs. More details of the architecture can be found in the paper linked below:
ArXiv preprint edition: https://arxiv.org/abs/2410.14595
github repository: https://github.com/GreedYLearner1146/DRACO-DehazeNet
Fig.2: Illustration of the overall architecture of our DRACO-DehazeNet. C denotes the number of channels, K denotes the kernel size, and D denotes the dilation rate. All strides are of value 1. DDIRB denotes the Dense Dilated Inverted Residual Blocks that serve as the main dehazing network, and ATTDRN denotes the Attention Detail Recovery Network that serves as the detail recovery network for removing dehazing artifacts.
Fig.3: Comparative visual illustration of the dehazed outputs on a selected O-HAZE image for the SOTAs (abbreviated above each image), including our approach. The original ground-truth and hazy images are also depicted for reference.
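As a rough illustration of the quadruplet idea, the sketch below compares the two restored feature sets (intermediate DDIRB output and final ATTDRN output) against both the clear and the hazy features in ratio form. This is a simplified stand-in using mean-L1 feature distances, not the exact loss from the paper:

```python
import numpy as np

def l1(a, b):
    # Mean absolute distance between two feature tensors.
    return np.abs(a - b).mean()

def quadruplet_contrastive_loss(f_clear, f_hazy, f_ddirb, f_attdrn, eps=1e-8):
    # Pull both restored feature sets toward the clear image's
    # features while pushing them away from the hazy image's
    # features; the ratio form makes the two terms compete.
    pull = l1(f_ddirb, f_clear) + l1(f_attdrn, f_clear)
    push = l1(f_ddirb, f_hazy) + l1(f_attdrn, f_hazy)
    return pull / (push + eps)

rng = np.random.default_rng(0)
f_clear = rng.random((4, 4, 8))
f_hazy = f_clear + 0.5    # "hazy" features offset from the clear ones
good = quadruplet_contrastive_loss(f_clear, f_hazy, f_clear, f_clear)
bad = quadruplet_contrastive_loss(f_clear, f_hazy, f_hazy, f_hazy)
```

A perfect restoration drives the loss to zero, while a restoration that still resembles the hazy input is penalized heavily.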
3) HELA-VFA: A Hellinger Distance-Attention-based Feature Aggregation Network for Few-Shot Classification
We proposed HELA-VFA, which performs variational few-shot image classification using the Hellinger distance instead of the commonly utilized Kullback-Leibler (KL) divergence. The upshot of the Hellinger distance is that it avoids the divergence problem the KL divergence faces when one of the posterior probability distributions approaches 0, while also permitting a more direct computational implementation due to its close resemblance to the Euclidean distance. We obtained new SOTA performance on common few-shot image classification benchmarks such as CIFAR-FS, CIFAR-100, miniImageNet, and tieredImageNet. More details of the architecture can be found in the paper linked below:
Link to the published conference paper:
github repository: https://github.com/GreedYLearner1146/HELA-VFA
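For diagonal Gaussians, the squared Hellinger distance has a simple closed form through the Bhattacharyya coefficient (H² = 1 − BC), and unlike the KL divergence it stays bounded in [0, 1] even when the distributions barely overlap. A small NumPy sketch of this computation (illustrative, not our training code):

```python
import numpy as np

def hellinger_sq_diag_gauss(mu1, var1, mu2, var2):
    # Closed-form squared Hellinger distance between two diagonal
    # Gaussians, via the Bhattacharyya coefficient BC: H^2 = 1 - BC.
    s = var1 + var2
    bc = np.prod(np.sqrt(2.0 * np.sqrt(var1 * var2) / s)
                 * np.exp(-((mu1 - mu2) ** 2) / (4.0 * s)))
    return 1.0 - bc

mu, var = np.zeros(4), np.ones(4)
same = hellinger_sq_diag_gauss(mu, var, mu, var)        # identical -> 0.0
far = hellinger_sq_diag_gauss(mu, var, mu + 3.0, var)   # well separated, yet < 1
```

The boundedness is the practical point: as the two distributions separate, H² saturates at 1 instead of blowing up the way KL does.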
3.5) ANROT-HELANet: an Adversarially and Naturally RObusT Hellinger Aggregation Network
We extended HELA-VFA to include adversarial-noise and natural-noise training (using Gaussian noise as an example), culminating in the ANROT-HELANet model. (This paper has recently been published in Springer Nature's International Journal of Multimedia Information Retrieval.)
Link to the Springer paper: https://link.springer.com/article/10.1007/s13735-025-00390-8
github repository: https://github.com/GreedYLearner1146/ANROT-HELANet/tree/main
Fig.4: HELA-VFA algorithmic architecture (top) and the attention mechanism architecture (bottom). In the top diagram, S and S′ denote the original and reconstructed images respectively, while Q and Q′ denote the corresponding quantities for the query set. ŷ and y denote the predicted label after training and the ground-truth label respectively. The network allows general N-way-k-shot training and evaluation.
Fig.5: ANROT-HELANet algorithmic architecture. The major difference between this model and the HELA-VFA is the addition of the FGSM adversarial perturbations and the Gaussian natural noise injection and training step.
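The two perturbation steps can be sketched as follows: FGSM steps in the sign of the input gradient of the loss, while the natural-noise branch simply adds Gaussian noise. This is an illustrative toy using a linear scorer with an analytic gradient, not the ANROT-HELANet training loop; the `eps` and `sigma` values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def fgsm_perturb(x, grad, eps=0.03):
    # FGSM: move each input entry by eps in the direction that
    # increases the loss, then clip back to the valid [0, 1] range.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def gaussian_perturb(x, sigma=0.10):
    # Natural-noise counterpart: additive zero-mean Gaussian noise.
    return np.clip(x + rng.normal(0.0, sigma, size=x.shape), 0.0, 1.0)

# Toy scorer: logistic loss on a linear model, whose input gradient
# is available in closed form as (p - y) * w.
w = rng.normal(size=4)
x = rng.random(4)
y = 1.0
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - y) * w
x_adv = fgsm_perturb(x, grad)     # adversarial copy of x
x_nat = gaussian_perturb(x)       # naturally-noised copy of x
```

Training on `x_adv` and `x_nat` alongside the clean inputs is what gives the model its adversarial and natural robustness.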
Some minor errata in the final version of the ANROT-HELANet paper:
In Fig.3 (page 12 of 23) (Fig.5 above), the scenario should be 5-way-1-shot, not 5-way-5-shot.
In page 10 of 23, BC = 1- D_{H}^{2}, not BC = \sqrt{1- D_{H}^{2}}.
In page 16 of 23, "Figure 7 and 8 depicts the GRAD-CAM for...", not "Figure 6 and 7 depicts the GRAD-CAM for...".
In page 17 of 23, "while natural noise with σ = 0.10 only causes a 1.4% reduction from 88.4% to 87.0%" should be "while natural noise with σ = 0.10 only causes a 0.2% reduction from 89.6% to 89.4%". The former is for the injection of Gaussian noise of σ = 0.15.
4) Enhancing Few-Shot Classification of Benchmark and Disaster Imagery with ATTBHFA-Net
We proposed ATTBHFA-Net, which provides a novel approach to variational few-shot image classification through the combined use of the Hellinger distance and the Bhattacharyya coefficient, training the model via an analogous contrastive-learning-like paradigm that utilizes class distributions instead of feature points in the embedding space. Our model improves upon HELA-VFA, which utilized only the Hellinger distance: here the Hellinger distance regularizes same-class alignment, while the Bhattacharyya coefficient serves as a contrastive margin that enhances inter-class separability. This follows from the mathematical definition of the Bhattacharyya coefficient, which decays rapidly toward zero as two distributions diverge from each other, hence serving as the "push" term in the contrastive-learning-like paradigm. Apart from obtaining SOTA performance on few-shot image classification benchmarks such as CIFAR-FS, CIFAR-100, miniImageNet, and tieredImageNet, we also evaluated our algorithm on AIDER and CDD, two UAV-based disaster image classification datasets. Once again, our approach generally outperformed the SOTAs, shedding light on its extension and feasibility for UAV disaster image classification.
Link to the arXiv preprint:
https://arxiv.org/pdf/2510.18326
github repository: https://github.com/GreedYLearner1146/ABHFA-Net/tree/main
Fig.6: ATTBHFA-Net algorithmic architecture. It receives a set of support and query images as input, which are processed by an attention-based encoder to extract their features. These features are encoded into latent representations from which the respective Gaussian probability distributions are obtained. The Bhattacharyya coefficient and Hellinger distance computation involves deriving confidence spaces for each class prototype's distribution and comparing their degree of overlap with the feature point distribution (represented by a red dot within a red dotted circle). The classes' maximum probabilities are leveraged to generate a confidence score, indicating the predicted label for a given sample. Training combines the Bhattacharyya-Hellinger Softmax loss and the categorical cross-entropy loss into the overall ATTBHFA-Net loss. The above illustrates a 5-way-1-shot FSL training method. For more information on the meaning of the terms in the diagram, please see our preprint.
Fig.7: A pictorial comparison of point-feature-based contrastive learning (left) and the analogous distribution-based "contrastive learning" (right) using a three-class scenario, where the classes are denoted by purple circles, maroon diamonds, and green hexagons. As mentioned, the Bhattacharyya coefficient (as a similarity measure) can serve as the "push" term, since the corresponding distance increases rapidly the more dissimilar the feature distributions are, while the Hellinger distance can serve as the "pull" term, since similar embeddings are clustered closer together in a Euclidean-like manner, but in the square-root probability space.
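A toy sketch of how the Bhattacharyya coefficient ("push") and the Hellinger distance ("pull") can be combined into a softmax-style loss over class-prototype distributions. The exact ATTBHFA-Net loss is defined in the preprint; everything below (diagonal-Gaussian prototypes, the `lam` weight) is a simplified assumption:

```python
import numpy as np

def bc_diag_gauss(mu1, var1, mu2, var2):
    # Bhattacharyya coefficient (overlap in [0, 1]) between two
    # diagonal Gaussians; the squared Hellinger distance is 1 - BC.
    s = var1 + var2
    return np.prod(np.sqrt(2.0 * np.sqrt(var1 * var2) / s)
                   * np.exp(-((mu1 - mu2) ** 2) / (4.0 * s)))

def bh_softmax_loss(query, protos, label, lam=0.5, eps=1e-12):
    # "Push": softmax over BC overlaps with all class prototypes,
    # suppressing overlap with wrong-class prototypes.
    # "Pull": Hellinger^2 to the correct prototype as a regularizer.
    mu_q, var_q = query
    sims = np.array([bc_diag_gauss(mu_q, var_q, mu, var)
                     for mu, var in protos])
    push = -np.log(sims[label] / (sims.sum() + eps) + eps)
    pull = lam * (1.0 - sims[label])
    return push + pull

protos = [(np.zeros(2), np.ones(2)), (np.full(2, 3.0), np.ones(2))]
query = (np.zeros(2), np.ones(2))            # matches class 0's prototype
right = bh_softmax_loss(query, protos, label=0)
wrong = bh_softmax_loss(query, protos, label=1)
```

Assigning the query its true class yields a much smaller loss than assigning it the distant class, which is exactly the push-pull behaviour the figure depicts.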
II) Review Papers
I have also co-authored two review papers with my collaborators:
5) Unlocking the capabilities of explainable few-shot learning in remote sensing
Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, few-shot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of few-shot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from Unmanned Aerial Vehicles (UAVs), which are increasingly used in remote sensing studies. In this review, we provide an up-to-date overview of both existing and newly proposed few-shot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. We demonstrate that few-shot learning can effectively handle the diverse perspectives in remote sensing data. As an example application, we evaluate state-of-the-art approaches on a UAV disaster scene dataset, yielding promising results. Furthermore, we highlight the significance of incorporating explainable AI (XAI) techniques into few-shot models. In remote sensing, where decisions based on model predictions can have significant consequences, such as in natural disaster response or environmental monitoring, the transparency provided by XAI is crucial. Techniques like attention maps and prototype analysis can help clarify the decision-making processes of these complex models, enhancing their reliability. We identify key challenges, including the development of flexible few-shot methods that handle diverse remote sensing data effectively.
This review aims to equip researchers with an improved understanding of few-shot learning’s capabilities and limitations in remote sensing, while pointing out open issues to guide progress in efficient, reliable and interpretable data-efficient techniques.
Link to the published paper: https://link.springer.com/article/10.1007/s10462-024-10803-5
The github repository for the relevant implementation in the paper will be made available soon.
Fig.8: Overview of Explainable Few-Shot Learning in Remote Sensing. This illustration provides a high-level summary of the scope of our review on Explainable Few-Shot Learning techniques, applications, and challenges within Remote Sensing.
6) Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches
High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
ArXiv preprint edition: https://arxiv.org/abs/2405.07520
Fig.9: A schematic diagram depicting the summary of our discussions on the open challenges and possible solutions for current image dehazing research.
Services/Reviewing
In recent years I have also been involved in scientific journal reviewing. Most of the journal papers I've reviewed are from Springer Nature:
[2025]
Served as a reviewer for Wiley’s IET Image Processing Journal. Reviewed 2 papers. (IF:2.064)
Served as a reviewer for Springer’s Earth Science Informatics Journal. Reviewed 2 papers. (IF:2.7)
Served as a reviewer for Springer’s Scientific Reports. Reviewed 2 papers. (IF:3.8)
Served as a reviewer for IEEE Transactions on Geoscience and Remote Sensing. Reviewed 1 paper. (IF:7.5)
Served as a reviewer for Springer’s Archives of Computational Methods in Engineering. Reviewed 1 paper. (IF:12.1)
Served as a reviewer for Springer’s Machine Vision and Applications. Reviewed 1 paper. (IF:2.3)
Served as a reviewer for Springer’s Signal, Image and Video Processing. Reviewed 1 paper. (IF:2.1)
Served as a reviewer for Springer’s Multimedia Tools and Applications. Reviewed 1 paper. (IF:3.6)
[2026]
Served as a reviewer for Springer’s The Visual Computer. Reviewed 1 paper. (IF:2.9)
More about me:
I completed my B.Sc. in Science (Physics), with my FYP thesis on theoretical particle physics ("Baryogenesis via Leptogenesis") under my supervisor Dr Leek Meng Lee (https://www.ntu.edu.sg/research/faculty-directory/detail/rp00396) at SPMS, NTU Singapore.
I completed my Ph.D. on the topic of "Overcoming Efficiency and Low-Data Challenges in UAV Disaster Classification and Dehazing: From Inverted Residual Block to Novel Contrastive and Few-Shot Approaches" under my main supervisors (1st half) Prof Ken-Tye Yong (now at the University of Sydney, Australia: https://www.sydney.edu.au/engineering/about/our-people/academic-staff/ken-yong.html) and (2nd half) Assoc Prof Daniel Puiu Poenar (https://www.ntu.edu.sg/research/faculty-directory/detail/rp00294), as well as my co-supervisor Prof Vu Duong (now at VinUniversity: https://vinuni.edu.vn/people/duong-nguyen-vu-2/), at the School of EEE and ATMRI, NTU Singapore.
I shifted my interest from physics to AI between my bachelor's and my doctorate, which was a rough move (since I knew that, deep down, physics is what I am most passionate about). Although I still love reading about physics, I felt that AI has recently become dominant in many aspects of academia, and research works now more often than not incorporate deep or machine learning to solve domain-specific problems. With the emergence of generative AI, the amount of data available for effective deep/machine learning has increased drastically, allowing further progress in deep/machine learning research, since most state-of-the-art models are still data-hungry. As the impact of AI on society has rapidly become more noticeable, I believe that regardless of the field one eventually works in, having some knowledge of AI, including the ethical issues surrounding it, is valuable not only for career searching and advancement, but also for being a vigilant citizen, given the emergence of malicious AI usage. Lastly, the skills obtained from AI tools transfer into many domains, and the role of multidisciplinary research in recent decades cannot be emphasized enough, which justifies equipping oneself with AI knowledge and skills in current times.
In my free time, I love reading about science (particularly astronomy and physics) and AI, stargazing (I own a 4-inch f/5 Celestron refractor), going to the gym, drawing, and cooking. I also travel with my family occasionally.
The constellation of Ursa Major (containing the Big Dipper, left), the constellation of Scorpius (middle), and the constellations of Orion (middle right) and Canis Major (bottom right, partially blocked by a tree), taken with my phone on the island of Hawaii. The awe and beauty of the starry night sky can only be fully appreciated in dark locations, which unfortunately have been decreasing due to the global increase in light pollution; many dimmer constellations cannot be seen at all in urban cities.