Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing
Mika Feng, Koichi Ito, Takafumi Aoki, Tetsushi Ohki, Masakatsu Nishigaki
Document Image Rectification using Stable Diffusion Transformer
Pooja Kumari, Sukhendu Das
Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment
Omid Halimi Milani, Amanda N Nikho, Marouane Tliba, Lauren Mills, Rashid Ansari, Ahmet Enis Cetin, Mohammed Hossameldeen Saad Elnagar
Leveraging Fixed and Dynamic Pseudo-Labels in Cross-Supervision Framework for Semi-Supervised Medical Image Segmentation
Suruchi Kumari, Pravendra Singh
Domain Adaptation for Skin Lesion: Evaluating Real-World Generalisation
Nurjahan Sultana, Wenqi Lu, Xinqi Fan, Moi Hoon Yap
Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume
Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut
Uncertainty-guided Style-aware Perceptual Quality Assessment for AI-Generated Images
Tushar Shinde, Shivaanee Eswaran
LLaVA-SCo: Teach Vision Language Models to Self-Correct
Zixuan Liu, Guangkai Jiang, Khajavi Siavash
Exploiting Adversarial Learning and Topology Augmentation for Open-Set Visual Recognition
Rosa Zuccarà, Georgia Fargetta, Alessandro Ortis, Sebastiano Battiato
Improved Out-of-Distribution Detection with Additive Angular Margin Loss
Deepak Ravikumar, Efstathia Soufleri, Kaushik Roy
Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions
Kavitha Viswanathan, Amit Sethi, Shashwat Pathak, Piyush Bharambe, Harsh Choudhary
IdolDanceNet: Indian Heritage Idol Dance Pose Classification
S Kanimozhi, Sabari Nathan, Sasithradevi A, P Prakash, S Mohamed Mansoor Roomi
Dust to Detail: Restoring Sand-dust Images with Frequency-Guided Attention and Multi-Scale Features
Romala Mishra, Sobhan Dhara
Training Data Reconstruction: Privacy due to Uncertainty?
Christina Runkel, Kanchana Vaishnavi Gandikota, Jonas Geiping, Carola-Bibiane Schönlieb, Michael Moeller
Consistent Amortized Image Clustering via Generative Flow Networks
Irit Chelly (Ben Gurion University of the Negev)*; Roy Uziel (Ben Gurion University of the Negev); Oren Freifeld (Ben Gurion University of the Negev); Ari Pakman (Ben Gurion University of the Negev)
A Class- and Data-Agnostic Regularization Strategy for Rehearsal-Based Continual Learning
Lama Alssum (King Abdullah University of Science and Technology)*; Hasan Abed Al Kader Hammoud (King Abdullah University of Science and Technology); Motasem Alfarra (King Abdullah University of Science and Technology); Juan Leon (King Abdullah University of Science and Technology); Bernard Ghanem (King Abdullah University of Science and Technology)
Diffusion Classifiers Understand Compositionality, but Conditions Apply
Yujin Jeong (TU Darmstadt)*; Arnas Uselis (Tübingen AI Center, University of Tübingen); Seong Joon Oh (Tübingen AI Center, University of Tübingen); Anna Rohrbach (TU Darmstadt, hessian.AI)
SKULPT Yourself: A Data-Driven Facial Reconstruction Pipeline
Maida Aizaz (KAIST)*; Khadija Rajabova (KAIST); Seon Gyeom Kim (KAIST); Won Joon Lee (NFS Korea); Joon Yeol Ryu (NFS Korea); Hyobong Jang (NFS Korea); Soojung Park (NFS Korea); Kiwan Jeon (NIMS); Hyoung Suk Park (NIMS); Sung Ho Kang (NIMS); Tak Yeon Lee (KAIST)
Explainable AI for Enhancing Image Transformations in Cellular Neural Networks
Georgia Fargetta (University of Catania)*; Francesco Rundo (University of Catania); Sebastiano Battiato (University of Catania)
DN-CBM: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Sukrut Rao (Max Planck Institute for Informatics); Sweta Mahajan (MPI for Informatics)*; Moritz Böhle (Kyutai); Bernt Schiele (Max Planck Institute for Informatics)
Emuru: An Autoregressive Paradigm for Generalizable Text Image Generation
Vittorio Pippi (University of Modena and Reggio Emilia); Fabio Quattrini (University of Modena and Reggio Emilia); Silvia Cascianelli (Università di Modena e Reggio Emilia)*; Alessio Tonioni (Google); Rita Cucchiara (University of Modena and Reggio Emilia)
Generative Video Editing: From unconfident to confident
Kseniia Buzko (University of Waterloo)*; David Clausi (University of Waterloo); Yuhao Chen (University of Waterloo)
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni (University of Modena and Reggio Emilia); Sara Sarto (University of Modena and Reggio Emilia)*; Marcella Cornia (University of Modena and Reggio Emilia); Lorenzo Baraldi (University of Modena and Reggio Emilia); Rita Cucchiara (University of Modena and Reggio Emilia)
Robust Training for Sinusoidal Neural Networks
Diana Aldana Moreno (Instituto de Matematica Pura e Aplicada)*; Andre Araujo (Google DeepMind); Luiz Velho (IMPA); Tiago Novello (IMPA)
The Deepfake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Carlotta Segna (TU Darmstadt)*; Marcel Klemt (TU Darmstadt); Anna Rohrbach (TU Darmstadt)
Towards Controlling HTG Style with Latent Diffusion Models
Konstantina Nikolaidou (Luleå University of Technology)*
Knowledge-Tracing Machine Unlearning For Large Vision-Language Models
Yuwen Tan (Boston University); Boqing Gong (Boston University)*
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan (IIIT Hyderabad)*; Varun Gupta (IIIT Hyderabad); Darshan Singh (IIIT Hyderabad); Zeeshan Khan (Inria, Paris); Vineet Gandhi (IIIT Hyderabad); Makarand Tapaswi (IIIT Hyderabad)
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Deepti Rawat (International Institute of Information Technology)*
HAIKYU: Hockey Action Identification and Keypose Understanding
Kseniia Buzko (University of Waterloo)*; David Clausi (University of Waterloo); Yuhao Chen (University of Waterloo)
Unmasking Evaluation Artifacts: Rethinking Gender Bias Metrics in Vision-Language Retrieval
Sadie Askari (Indiana University)*; David Crandall (Indiana University)
3D-WAG: Wavelet-Guided Autoregressive Generation for 3D Shapes
Tejaswini Medi (University of Mannheim)*; Arianna Rampini (Autodesk); Pradyumna Reddy (Autodesk); Pradeep Kumar Jayaraman (Autodesk); Margret Keuper (University of Mannheim)
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Jovana Videnović (University of Ljubljana)*; Alan Lukežič (University of Ljubljana); Matej Kristan (University of Ljubljana)
PRISM-FL: Privacy-preserving Image Synthesis Mechanism for Federated Learning
Efstathia Soufleri (Purdue University)*
ReKon3D: Relation-Knowledge Aware Multi-Modal Embedding and Contrastive GAN for Zero-Shot 3D Recognition
Sejuti Rahman (University of Dhaka)*
How many classes do we need to see for novel class discovery?
Akanksha Sarkar (Cornell University)*; Been Kim (Google DeepMind); Jennifer Sun (Cornell University)
L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers
Sofia Casarin (Free University of Bozen-Bolzano)*
Beyond STE: A Curriculum-Driven Backpropagation Method for Quantization-Aware Training
Kaiqi Zhao (Arizona State University)*
HD-VILA-Caption: A Diverse Video-Text Dataset Derived from ASR Narrations
Maheen Saleh (MPI-Informatics Saarbrucken)*; Nina Shvetsova (Goethe University Frankfurt); Anna Kukleva (MPI-Informatics Saarbrucken); Hilde Kuehne (Goethe University Frankfurt); Bernt Schiele (MPI-Informatics Saarbrucken)
Parallel ViT-Based Lung Severity Quantification from CXRs and CT Scans
Bouthaina Slika (University of the Basque Country)*; Fadi Dornaika (IKERBASQUE, Basque Foundation for Science); Karim Hammoudi (Department of Computer Science, IRIMAS, Université de Haute-Alsace)
Local Learning in Low-Rank Space: A Feedback Alignment Perspective
Arani Roy (Purdue University)*; Kaushik Roy (Purdue University)
Multimodal Selective State Space Learning for Efficient Action Quality Assessment
Yaoxin Li (University of Waterloo)*; Alexander Wong (University of Waterloo)
Network Inversion for Uncertainty-Aware Out-of-Distribution Detection
Pirzada Suhail (IIT Bombay)*; Rehna Afroz (IIT Bombay); Amit Sethi (IIT Bombay)
DeepTextile: Open Dataset for Textile Recycling via Machine Learning and Near Infrared Spectroscopy
Danika Gupta (The Harker Upper School)*
Towards AI-Driven Adaptive Lab Evolution: Fine Tuning Protein Language Models for Mutational Effects
Silba Dowell (University of Wyoming)*; Shivanand Sheshappanavar (University of Wyoming)
Beyond Vision: Language-Guided Unsupervised Learning for Robust Semantic Segmentation Across Domains
Chang Liu (University of Waterloo)*
TLPath: Leveraging the Digital Pathology Foundational Model for Telomere Length Prediction
Anamika Yadav (Sanford Burnham Prebys Medical Discovery Institute)*; Kyle Alvarez (Sanford Burnham Prebys Medical Discovery Institute); Akanimoh Adeleye (Sanford Burnham Prebys Medical Discovery Institute); Sanju Sinha (Sanford Burnham Prebys Medical Discovery Institute)
Shortcut Learning Susceptibility in Vision Classifiers
Pirzada Suhail (IIT Bombay)*; Vrinda Goel (IIT Bombay); Amit Sethi (IIT Bombay)
Lighting the Way: NFL-BA for Robust Endoscopic SLAM
Andrea Dunn Beltran (UNC)*; Roni Sengupta (UNC)
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi (University of Modena and Reggio Emilia); Tejaswi Kasarla (University of Amsterdam)*; Pascal Mettes (University of Amsterdam); Lorenzo Baraldi (University of Modena and Reggio Emilia); Rita Cucchiara (University of Modena and Reggio Emilia)