Tamaki-Ding Laboratory (玉木・丁研究室) - Paper Introductions

Paper Introductions

We hold online paper reading sessions on an irregular basis. For details, see our connpass page.

2026/06/18

LLM introduction: Gemma 4 12B: Encoder-Free Unified Multimodal Model for Local Execution
Olivier Lacombe, Gus Martins, "Introducing Gemma 4 12B: a unified, encoder-free multimodal model", https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/
Google Developers Blog, "Gemma 4 12B: The Developer Guide", https://developers.googleblog.com/gemma-4-12b-the-developer-guide/
Maarten Grootendorst, "A Visual Guide to Gemma 4 12B", https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b

Paper introduction: Video understanding and verbalization as seen through SoccerMaster, RoadTones, and SVAgent
Zhongyu Yang, Zuhao Yang, Shuo Zhan, Tan Yue, Wei Pang, Yingfang Yuan, "SVAgent: Storyline-guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration", CVPR2026 https://openaccess.thecvf.com/content/CVPR2026/html/Yang_SVAgent_Storyline-guided_Long_Video_Understanding_via_Cross-Modal_Multi-Agent_Collaboration_CVPR_2026_paper.html
Haolin Yang, Jiayuan Rao, Haoning Wu, Weidi Xie, "SoccerMaster: A Vision Foundation Model for Soccer Understanding", CVPR2026 https://openaccess.thecvf.com/content/CVPR2026/html/Yang_SoccerMaster_A_Vision_Foundation_Model_for_Soccer_Understanding_CVPR_2026_paper.html
Chirag Parikh, Siddhi Pravin Lipare, Ravi Kiran Sarvadevabhatla, "RoadTones: Tone Controllable Text Generation from Road Event Videos", CVPRF2026 https://openaccess.thecvf.com/content/CVPR2026F/html/Parikh_RoadTones_Tone_Controllable_Text_Generation_from_Road_Event_Videos_CVPRF_2026_paper.html

2026/06/04

The importance and design principles of the vision encoder in MLLMs: visual shortcomings, multi-encoder integration, and optimal encoder selection (paper introduction)
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie, "Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs", CVPR2024 https://openaccess.thecvf.com/content/CVPR2024/html/Tong_Eyes_Wide_Shut_Exploring_the_Visual_Shortcomings_of_Multimodal_LLMs_CVPR_2024_paper.html
Mozhgan Nasr Azadani, James Riddell, Sean Sedwards, Krzysztof Czarnecki, "Rethinking the Mixture of Vision Encoders Paradigm for Enhanced Visual Understanding in Multimodal LLMs", TMLR2026 https://openreview.net/forum?id=tgnTVmRybs
Muyang Li, Yucheng Liu, Jianbo Ma, Elliot Osborne, Bo Han, Tongliang Liu, "Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance", CVPR2026 https://openaccess.thecvf.com/content/CVPR2026/html/Li_Rethinking_Model_Selection_in_VLM_Through_the_Lens_of_Gromov-Wasserstein_CVPR_2026_paper.html

Paper introduction: Learning Streaming Video Representation via Multitask Training
Yibin Yan, Jilan Xu, Shangzhe Di, Yikun Liu, Yudi Shi, Qirui Chen, Zeqian Li, Yifei Huang, Weidi Xie, "Learning Streaming Video Representation via Multitask Training", ICCV2025 https://openaccess.thecvf.com/content/ICCV2025/html/Yan_Learning_Streaming_Video_Representation_via_Multitask_Training_ICCV_2025_paper.html

2026/05/21

Paper introduction: Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
This presentation introduces a method that combines spatio-temporal refinement with a Soft-IC loss for precise event spotting in sports videos, addressing long-range dependencies and class imbalance to improve accuracy across multiple datasets.

2026/05/07

Paper introduction: Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation, Revisiting Weight Regularization for Low-Rank Continual Learning, Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning
Brady Steele, "Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation", arXiv2026 https://arxiv.org/abs/2603.02224
Yaoyue Zheng, Yin Zhang, Joost van de Weijer, Gido M van de Ven, Shaoyi Du, Xuetao Zhang, Zhiqiang Tian, "Revisiting Weight Regularization for Low-Rank Continual Learning", ICLR2026 https://openreview.net/forum?id=pZj2DhfaVD
Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, Xinbo Gao, "Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning", arXiv2026 https://arxiv.org/abs/2603.00191v1

Paper introduction: TTOM Test-Time Optimization and Memorization for Compositional Video Generation
Leigang Qu, Ziyang Wang, Na Zheng, Wenjie Wang, Liqiang Nie, Tat-Seng Chua, "TTOM Test-Time Optimization and Memorization for Compositional Video Generation", ICLR2026 https://openreview.net/forum?id=wqCwcTZsrv

2026/04/23

Paper introduction: Action Detail Matters: Refining Video Recognition with Local Action Queries, Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Mengmeng Wang, Zeyi Huang, Xiangjie Kong, Guojiang Shen, Guang Dai, Jingdong Wang, Yong Liu, "Action Detail Matters: Refining Video Recognition with Local Action Queries", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Wang_Action_Detail_Matters_Refining_Video_Recognition_with_Local_Action_Queries_CVPR_2025_paper.html
Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang, "Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition", ACM MM 2024 https://dl.acm.org/doi/10.1145/3664647.3680690

Paper introduction: Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE, OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning
Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Jing Shao, "Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE", ICLR 2024 https://openreview.net/forum?id=rTDyN8yajn
Jinyuan Feng, Zhiqiang Pu, Tianyi Hu, Dongmin Li, Xiaolin Ai, Huimu Wang, "OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning", ECAI 2025 https://arxiv.org/abs/2501.10062v2

2026/04/09

Paper introduction: "RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives", "VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection"
Chirag Parikh, Deepti Rawat, Rakshitha R. T., Tathagata Ghosh, Ravi Kiran Sarvadevabhatla, "RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Parikh_RoadSocial_A_Diverse_VideoQA_Dataset_and_Benchmark_for_Road_Event_CVPR_2025_paper.html
Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu, "VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Han_VideoEspresso_A_Large-Scale_Chain-of-Thought_Dataset_for_Fine-Grained_Video_Reasoning_via_CVPR_2025_paper.html

Paper introduction: "Video2LoRA: Unified Semantic-Controlled Video Generation via Per-Reference-Video LoRA", "Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA"
Zexi Wu, Baolu Li, Jing Dai, Yiming Zhang, Yue Ma, Qinghe Wang, Xu Jia, Hongming Xu, "Video2LoRA: Unified Semantic-Controlled Video Generation via Per-Reference-Video LoRA", arXiv2026 https://arxiv.org/abs/2603.08210
Rameen Abdal, Or Patashnik, Ekaterina Deyneka, Hao Chen, Aliaksandr Siarohin, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman, "Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA", SIGGRAPH Asia 2025 https://dl.acm.org/doi/10.1145/3757377.3763987

2025/12/6

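ICCV2025 participation report: How to choose workshops where papers are more likely to be accepted
An introduction to choosing workshops with a higher chance of acceptance, based on the experience of attending an ICCV2025 workshop.

ICCV2025 paper summary: Token reduction in VLMs (KVTP, STTM, METEOR)
Summary of papers on token reduction in VLMs: Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing; Multi-Gra…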

2025/11/25

Paper introduction: HiLoRA: Adaptive Hierarchical LoRA Routing for Training-Free Domain Generalization
Ziyi Han, Huanyu Wang, Zeyu Zhang, Xiangxiang Dai, Xutong Liu, John C.S. Lui, "HiLoRA: Adaptive Hierarchical LoRA Routing for Training-Free Domain Generalization", arXiv2025 https://arxiv.org/abs/2510.12266

Paper introduction: DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen, "DiffusionRet: Generative Text-Video Retrieval with Diffusion Model", ICCV2023 https://openaccess.thecvf.com/content/ICCV2023/html/Jin_DiffusionRet_Generative_Text-Video_Retrieval_with_Diffusion_Model_ICCV_2023_paper.html

Paper introduction: MotionMatcher: Cinematic Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching
Yen-Siang Wu, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang, "MotionMatcher: Cinematic Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching", ICCV2025 Workshop https://openaccess.thecvf.com/content/ICCV2025W/P13N/html/Wu_MotionMatcher_Cinematic_Motion_Customization_of_Text-to-Video_Diffusion_Models_via_Motion_ICCVW_2025_paper.html

2025/11/13

Paper introduction: InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, SongZe Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang, "InternVideo2: Scaling Foundation Models for Multimodal Video Understanding", ECCV2024 https://eccv.ecva.net/virtual/2024/poster/1476

2025/10/31

Paper introduction: "Reflexion: language agents with verbal reinforcement learning", "MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding"
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao, "Reflexion: language agents with verbal reinforcement learning", NeurIPS2023 https://proceedings.neurips.cc/paper_files/paper/2023/hash/1b44b878bb782e6954cd888628510e90-Abstract-Conference.html
Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim, "MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding", CVPR2024 https://openaccess.thecvf.com/content/CVPR2024/html/He_MA-LMM_Memory-Augmented_Large_Multimodal_Model_for_Long-Term_Video_Understanding_CVPR_2024_paper.html

Paper introduction: "MM-Tracker: Motion Mamba for UAV-platform Multiple Object Tracking", "MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model"
Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson, "MM-Tracker: Motion Mamba for UAV-platform Multiple Object Tracking", AAAI2025 https://ojs.aaai.org/index.php/AAAI/article/view/33019
Changcheng Xiao, Qiong Cao, Zhigang Luo, Long Lan, "MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model", arXiv2024 https://arxiv.org/abs/2408.09178

2025/10/9

Paper introduction: Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li, "Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition", TMM2025 https://ieeexplore.ieee.org/document/10891539

Paper introduction: "Locality-Aware Zero-Shot Human-Object Interaction Detection", "Disentangled Pre-training for Human-Object Interaction Detection", "Discovering Syntactic Interaction Clues for Human-Object Interaction Detection"
Sanghyun Kim, Deunsol Jung, Minsu Cho, "Locality-Aware Zero-Shot Human-Object Interaction Detection", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Kim_Locality-Aware_Zero-Shot_Human-Object_Interaction_Detection_CVPR_2025_paper.html
Zhuolong Li, Xingao Li, Changxing Ding, Xiangmin Xu, "Disentangled Pre-training for Human-Object Interaction Detection", CVPR2024 https://openaccess.thecvf.com/content/CVPR2024/html/Li_Disentangled_Pre-training_for_Human-Object_Interaction_Detection_CVPR_2024_paper.html
Jinguo Luo, Weihong Ren, Weibo Jiang, Xi'ai Chen, Qiang Wang, Zhi Han, Honghai Liu, "Discovering Syntactic Interaction Clues for Human-Object Interaction Detection", CVPR2024 https://openaccess.thecvf.com/content/CVPR2024/html/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.html

2025/9/25

Paper introduction: "RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models", "Measuring What Matters: Evaluating Ensemble LLMs with Label Refinement in Inductive Coding", "Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification"
Ziyi Kou, Shichao Pei, Meng Jiang, Xiangliang Zhang, "RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models", EMNLP2024 https://aclanthology.org/2024.emnlp-main.1144/
Angelina Parfenova, Jürgen Pfeffer, "Measuring What Matters: Evaluating Ensemble LLMs with Label Refinement in Inductive Coding", ACL2025 https://aclanthology.org/2025.findings-acl.563/
Gyutae Park, Ingeol Baek, Byeongjeong Kim, Joongbo Shin, Hwanhee Lee, "Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification", ACL2025 https://aclanthology.org/2025.acl-short.3/

Paper introduction: "Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing", KVTP, METEOR, STTM
Yudong Liu, Jingwei Sun, Yueqian Lin, Jingyang Zhang, Ming Yin, Qinsi Wang, Jianyi Zhang, Hai Li, Yiran Chen, "Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing", ICCV2025 https://www.arxiv.org/abs/2503.10742
Yuchen Liu, Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian, "METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models", ICCV2025 https://arxiv.org/abs/2507.20842
Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim, "Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs", ICCV2025 https://arxiv.org/abs/2507.07990

2025/9/11

Paper introduction: SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
Adrien Deliege, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck, "SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos", CVPR2021W https://openaccess.thecvf.com/content/CVPR2021W/CVSports/html/Deliege_SoccerNet-v2_A_Dataset_and_Benchmarks_for_Holistic_Understanding_of_Broadcast_CVPRW_2021_paper.html

Paper introduction: LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning
Chang Che, Ziqi Wang, Pengwan Yang, Qi Wang, Hui Ma, Zenglin Shi, "LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning", arXiv2025 https://arxiv.org/abs/2508.06202

2025/8/9

CVPRW2025 on-site participation report (髙間)

2025/7/10

CVPR2025 paper introduction: Unboxed

CVPR2025 paper introduction: OVO-Bench

2025/6/26

Paper introduction: Segment Anything, SAM2: Segment Anything in Images and Videos
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick, "Segment Anything", ICCV2023 https://openaccess.thecvf.com/content/ICCV2023/html/Kirillov_Segment_Anything_ICCV_2023_paper.html
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer, "SAM2: Segment Anything in Images and Videos", arXiv2024 https://arxiv.org/abs/2408.00714

Paper introduction: HOTR: End-to-End Human-Object Interaction Detection With Transformers, Human-Object Interaction Detection via Disentangled Transformer, QPIC
Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim, "HOTR: End-to-End Human-Object Interaction Detection With Transformers", CVPR2021 https://openaccess.thecvf.com/content/CVPR2021/html/Kim_HOTR_End-to-End_Human-Object_Interaction_Detection_With_Transformers_CVPR_2021_paper.html
Desen Zhou, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, Jingdong Wang, "Human-Object Interaction Detection via Disentangled Transformer", CVPR2022 https://openaccess.thecvf.com/content/CVPR2022/html/Zhou_Human-Object_Interaction_Detection_via_Disentangled_Transformer_CVPR_2022_paper.html
Masato Tamura, Hiroki Ohashi, Tomoaki Yoshinaga, "QPIC: Query-Based Pairwise Human-Object Interaction Detection With Image-Wide Contextual Information", CVPR2021 https://openaccess.thecvf.com/content/CVPR2021/html/Tamura_QPIC_Query-Based_Pairwise_Human-Object_Interaction_Detection_With_Image-Wide_Contextual_Information_CVPR_2021_paper.html

2025/6/12

Paper introduction: Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks, Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing, and others
Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht, "Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Shvetsova_Unbiasing_through_Textual_Descriptions_Mitigating_Representation_Bias_in_Video_Benchmarks_CVPR_2025_paper.html
Yanjun Li, Zhaoyang Li, Honghui Chen, Lizhi Xu, "Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Li_Unbiased_Video_Scene_Graph_Generation_via_Visual_and_Semantic_Dual_CVPR_2025_paper.html
Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan, "MotionMAE: Self-Supervised Video Representation Learning with Motion-Aware Masked Autoencoders", BMVC2024 https://bmvc2024.org/proceedings/499/

Paper introduction: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?, and others
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh, "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts", EMNLP2020 https://aclanthology.org/2020.emnlp-main.346/
Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer, "Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?", EMNLP2023 https://aclanthology.org/2023.findings-emnlp.733/
Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein, "Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery", NeurIPS 2023 https://proceedings.neurips.cc/paper_files/paper/2023/hash/a00548031e4647b13042c97c922fadf1-Abstract-Conference.html

2025/5/22

Paper introduction: "mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models", "SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries"

Paper introduction: "Amodal Completion via Progressive Mixed Context Diffusion", "Amodal Instance Segmentation with Diffusion Shape Prior Estimation"

2025/5/8

Paper introduction: "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations", "Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs"

Paper introduction: "InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning", "Adaptive Plasticity Improvement for Continual Learning"

2025/4/24

Paper introduction: PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics

Paper introduction: What, when, and where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

2025/4/8

Paper introduction: ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos

Paper introduction: Make Pixels Dance: High-Dynamic Video Generation

2024/11/19

Paper introduction: On Feature Normalization and Data Augmentation

Paper introduction: T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

2024/11/05

Paper introduction: MS-DETR: Efficient DETR Training with Mixed Supervision

Paper introduction: CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection

2024/10/22

Paper introduction: 2D Pose-guided Complete Silhouette Estimation of Human Body in Occlusion

Paper introduction: Synergy of Sight and Semantics: Visual Intention Understanding with CLIP

2024/10/08

Paper introduction: DEVIAS: Learning Disentangled Video Representations of Action and Scene

Paper introduction: Multi-class Video Co-segmentation with a Generative Multi-video Model

2024/9/19

Paper introduction: QLoRA: Efficient Finetuning of Quantized LLMs

Paper introduction: TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval

2024/9/12

Paper introduction: Is Appearance Free Action Recognition Possible

Paper introduction: DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

2024/8/01

Paper introduction: Image amodal completion: A survey (CVIU)

Paper introduction: MaPLe: Multi-Modal Prompt Learning (CVPR)

Paper introduction: AutoSoccerPose: Automated 3D Posture Analysis of Soccer Shot Movements

2024/7/17

Paper introduction: Can I Trust Your Answer? Visually Grounded Video Question Answering

Paper introduction: Rugby Scene Classification Enhanced by Vision Language Model

Paper introduction: Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment

2024/7/3

Paper introduction: Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Paper introduction: BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

2024/6/19

Paper introduction: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

Paper introduction: Coarse-to-Fine Amodal Segmentation with Shape Prior

Paper introduction: Learning from One Continuous Video Stream

2024/6/6

Paper introduction: Deep Learning-Based Human Pose Estimation: A Survey

Paper introduction: A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

2024/5/30

Paper introduction: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Paper introduction: Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Paper introduction: When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation

2024/5/16

Paper introduction: Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers

Paper introduction: ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

Paper introduction: ArcFace: Additive Angular Margin Loss for Deep Face Recognition

2024/5/02

Paper introduction: Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Paper introduction: Selective Structured State-Spaces for Long-Form Video Understanding

2024/4/18

Paper introduction: Automated Classification of Model Errors on ImageNet

Paper introduction: Semantic segmentation using Vision Transformers: A survey

Paper introduction: Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers

Paper introduction: Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

2024/3/25

Paper introduction: MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition

Paper introduction: Tracking Anything with Decoupled Video Segmentation

Paper introduction: MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

2024/1/11

Paper introduction: Real-Time Evaluation in Online Continual Learning: A New Hope

Paper introduction: Multitask Vision-Language Prompt Tuning

Paper introduction: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Paper introduction: MovieCLIP: Visual Scene Recognition in Movies

2023/11/30

Paper introduction: Efficient Video Action Detection with Token Dropout and Context Refinement

Paper introduction: Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization

Paper introduction: Vision Transformer Adapter for Dense Predictions

Paper introduction: Revealing the unseen: Benchmarking video action recognition under occlusion

Paper introduction: Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

Paper introduction: Spatio-Temporal Action Detection Under Large Motion

Paper introduction: MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

Paper introduction: Discovering Universal Geometry in Embeddings with ICA

2023/10/30

Paper introduction: Masked Vision and Language Modeling for Multi-modal Representation Learning

Paper introduction: Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

Paper introduction: Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning

Paper introduction: ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

Paper introduction: Video Test-Time Adaptation for Action Recognition

Paper introduction: Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

2023/09/27

Paper introduction: STMixer: A One-Stage Sparse Action Detector

Paper introduction: OneFormer: One Transformer To Rule Universal Image Segmentation

Paper introduction: InternVideo: General Video Foundation Models via Generative and Discriminative Learning

2023/06/30

Paper introduction: Learning With Neighbor Consistency for Noisy Labels
Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid, CVPR2022. 橋口凌大 (Nagoya Institute of Technology), 2023/6/30. Overview: learning from noisy labels...

Paper introduction: Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona...

Paper introduction: Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang, Aixin Sun, Wei Jing, and Joey Tianyi Zhou, TPAMI. 仁田智也 (Nagoya Institute of Technology). Overview: Temporal Senten...

2023/06/08

Paper introduction: DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heu...

Paper introduction: Temporal Action Segmentation From Timestamp Supervision
Zhe Li, Yazan Abu Farha, Jurgen Gall, CVPR2021. 加藤樹 (Tamaki Lab, Nagoya Institute of Technology), 2023/6/8. Research overview: Temporal Action Segmentation (...

Paper introduction: Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova, arXiv2022. 2023/6/8. Vision Transfo...

Paper introduction: End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić...

Paper introduction: Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
Tomas Soucek, Jean-Baptiste Alayrac, Antoine Miech, Ivan Lapt...

Paper introduction: Video Panoptic Segmentation
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon, CVPR2020. 水野翼 (Tamaki Lab, Nagoya Institute of Technology), 2023/6/8. Overview. Goal: extend the concept of panoptic segmentation from the image domain to the video domain...

Paper introduction: Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc,...

Paper introduction: LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta, Piotr Dollar, Ross Girshick, CVPR2019. 2023/06/08. LVIS dataset: instance segmentation...

Paper introduction: VLP: A Survey on Vision-Language Pre-training
Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu, MIR 2023. 福沢匠 (Tamaki Lab, Nagoya Institute of Technology), 2023/6/8 ...

2023/05/11

Paper introduction: End-to-End Object Detection with Transformers
2023/5/11. DETR (DEtection TRansformer); end-to-end; Non-Maximum Suppression (NMS); bounding box...

Paper introduction: Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani, ICML2021. 2023/5/11. Transformer: TimeSformer...

Paper introduction: Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid, NeurIPS2022...

Paper introduction: A Survey of Vision-Language Pre-Trained Models
Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao, IJCAI2022. 福沢匠 (Tamaki Lab, Nagoya Institute of Technology), 2023/5/11. Overview: Pre-Trained Models: huge models...

Paper introduction: Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers
Tsung-Wei Ke, Jyh-Jing Hwang, Yunhui Guo, Xudong Wang...

Paper introduction: Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding, Fadime Sener, and Angela Yao, arXiv2022. 加藤樹, 神谷広大 (Tamaki Lab, Nagoya Institute of Technology), 2023/5/11. Introduction: T...

Paper introduction: The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, ...

2023/04/06

Paper introduction: Transformers in Action: Weakly Supervised Action Segmentation
John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari, "Transformers in Action: Weakly Supervised Action Segmentation" arXiv2022 https:…

Paper introduction: Human Hands As Probes for Interactive Object Understanding
Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta, "Human Hands As Probes for Interactive Object Understanding" CVPR2022 https://openaccess.thecvf.com/cont…

Paper introduction: DramaQA: Character-Centered Video Story Understanding with Hiera…
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang, "DramaQA: Character-Centered Video Story Understanding with H…

Paper introduction: Rethinking Zero-shot Video Classification: End-to-end Training f…
Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka, "Rethinking Zero-shot Video Classification: End-to-end Training for Realistic …

Paper introduction: Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra, "Omnivore: A Single Model for Many Visual Modalities" CVPR2022 h…

Paper introduction: Beyond Short Clips: End-to-End Video-Level Learning With Collabo…
Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry S. Davis, Heng Wang, "Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories" CVPR202…

Paper introduction: Panoptic-aware Image-to-Image Translation
Liyun Zhang, Photchara Ratsamee, Bowen Wang, Zhaojie Luo, Yuki Uranishi, Manabu Higashida, Haruo Takemura, "Panoptic-aware Image-to-Image Translation" WACV2023…

Paper introduction: Multimodal Learning with Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton, "Multimodal Learning with Transformers: A Survey" arXiv2022 https://arxiv.org/abs/2206.06488

2022/11/25

Paper introduction: Learn2Augment: Learning to Composite Videos for Data Augmentatio…
Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara, "Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition" …

Paper introduction: Deep Mutual Learning
Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, "Deep Mutual Learning" CVPR2018 https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Deep_Mutua…

Paper introduction: Towards Robust Adaptive Object Detection Under Noisy Annotations
Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan, "Towards Robust Adaptive Object Detection Under Noisy Annotations" CVPR2022 https://openaccess.thecvf…

Paper introduction: TubeDETR: Spatio-Temporal Video Grounding With Transformers
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid, "TubeDETR: Spatio-Temporal Video Grounding With Transformers" CVPR2022 https://openacce…

2022/11/11

Literature introduction: PolyViT: Co-training Vision Transformers on Images, Videos and A…
Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani, "PolyViT: Co-training Vision Transformers on …

Literature introduction: VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Und…
Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer, "VideoCLIP: Contrastive Pre-train…

Literature introduction: Multi-dataset Training of Transformers for Robust Action Recogni…
Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen, "Multi-dataset Training of Transformers for Robust Action Recognition" NeurIPS2022 https://arxiv.org/abs/22…

Literature introduction: Length-Controllable Image Captioning
Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu, "Length-Controllable Image Captioning" ECCV2020 https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/2035_ECCV_2…

2022/10/28

Literature introduction: Temporal Convolutional Networks for Action Segmentation and Dete…
Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager, "Temporal Convolutional Networks for Action Segmentation and Detection", CVPR2017 htt…

Literature introduction: Unsupervised Domain Adaptation for Spatio-Temporal Action Locali…
Nakul Agarwal, Yi-Ting Chen, Behzad Dariush and Ming-Hsuan Yang, "Unsupervised Domain Adaptation for Spatio-Temporal Action Localization", BMVC2020 https://www…

Literature introduction: Toward Multimodal Image-to-Image Translation
Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman, "Toward Multimodal Image-to-Image Translation", NeurIPS…

Literature introduction: Learning From Noisy Labels With Deep Neural Networks: A Survey
H. Song, M. Kim, D. Park, Y. Shin and J. -G. Lee, "Learning From Noisy Labels With Deep Neural Networks: A Survey", in IEEE Transactions on Neural Networks and…

2022/10/14

Literature introduction: Elaborative Rehearsal for Zero-Shot Action Recognition
Shizhe Chen, Dong Huang, "Elaborative Rehearsal for Zero-Shot Action Recognition", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICC…

Literature introduction: Temporal Alignment Networks for Long-Term Video
Tengda Han, Weidi Xie, Andrew Zisserman, "Temporal Alignment Networks for Long-Term Video", Proceedings of the IEEE/CVF Conference on Computer Vision and Patte…

Literature introduction: Multi-Task Learning for Dense Prediction Tasks: A Survey
Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, Luc Van Gool, "Multi-Task Learning for Dense Prediction Tasks: A Sur…

Literature introduction: Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra, "Omnivore: A Single Model for Many Visual Modalities", Proceedin…

2022/06/17

Introduction to the Activity-Net Challenge 2021
http://activity-net.org http://activity-net.org/challenges/2021/index.html http://activity-net.org/challenges/2022/index.html

Literature introduction: A Survey of Deep Learning-Based Object Detection
Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu, "A Survey of Deep Learning-Based Object Detection", IEEE Access, Vol.7, pp. …

Literature introduction: Image Segmentation Using Deep Learning: A Survey
Shervin Minaee, Yuri Boykov, Fatih Porikli, Antonio Plaza, Nasser Kehtarnavaz, Demetri Terzopoulos, Image Segmentation Using Deep Learning: A Survey, IEEE Tran…

Literature introduction: Image-to-Image Translation: Methods and Applications
Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen, Image-to-Image Translation: Methods and Applications, IEEE Transactions on Multimedia, doi: 10.1109/TMM.2021.31…

Literature introduction: YOLO series: v1-v5, X, F, and YOWO
20220617_You_Only_Look_Once_Series.pdf You Only Look Once: Unified, Real-Time Object Detection https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/…

2022/04/15

Literature introduction: EfficientDet: Scalable and Efficient Object Detection
Mingxing Tan, Ruoming Pang, Quoc V. Le; EfficientDet: Scalable and Efficient Object Detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pa…

Literature introduction: Learning Motion-Appearance Co-Attention for Zero-Shot Video Obje…
Shu Yang, Lu Zhang, Jinqing Qi, Huchuan Lu, Shuo Wang, Xiaoxing Zhang; Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation, Proceed…

Literature introduction: Spatially-Adaptive Pixelwise Networks for Fast Image Translation
Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli; Spatially-Adaptive Pixelwise Networks for Fast Image Translation, Proceedings …

Literature introduction: You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi; You Only Look Once: Unified, Real-Time Object Detection, Proceedings of the IEEE Conference on Comp…

Literature introduction: Swin Transformer: Hierarchical Vision Transformer Using Shifted …
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo; Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows…

Literature introduction: Video Description: A Survey of Methods, Datasets, and Evaluation…
Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, and Mubarak Shah. 2019. Video Description: A Survey of Methods, Datasets, and Evaluation Metrics. AC…

Literature introduction: Simpler Is Better: Few-Shot Semantic Segmentation With Classifie…
Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang; Simpler Is Better: Few-Shot Semantic Segmentation With Classifier Weight Transformer, Proceedi…

2021/12/16

Literature introduction: Adversarial Cross-Domain Action Recognition with Co-Attention
Boxiao Pan, Zhangjie Cao, Ehsan Adeli, Juan Carlos Niebles, Adversarial Cross-Domain Action Recognition with Co-Attention, AAAI2020. https://doi.org/10.1609/aa…

Literature introduction: Extreme Low-Resolution Activity Recognition Using a Super-Resolu…
Mingzheng Hou, Song Liu, Jiliu Zhou, Yi Zhang, Ziliang Feng, Extreme Low-Resolution Activity Recognition Using a Super-Resolution-Oriented Generative Adversari…

Literature introduction: 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Vi…
Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis; 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition, Proceedings of the…

Literature introduction: Rethinking Data Augmentation for Image Super-resolution: A Compr…
Jaejun Yoo, Namhyuk Ahn, Kyung-Ah Sohn; Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy, Proceedings of th…

Literature introduction: Token Shift Transformer for Video Classification
Hao Zhang, Yanbin Hao, Chong-Wah Ngo, Token Shift Transformer for Video Classification, ACM MM '21: Proceedings of the 29th ACM International Conference on Mul…

2021/12/3

Literature introduction: Shuffle and Attend: Video Domain Adaptation
Jinwoo Choi, Gaurav Sharma, Samuel Schulter, Jia-Bin Huang, Shuffle and Attend: Video Domain Adaptation, ECCV2020. https://www.ecva.net/papers/eccv_2020/papers…

Literature introduction: TinyVIRAT: Low-resolution Video Action Recognition
Ugur Demir, Yogesh S Rawat, Mubarak Shah, TinyVIRAT: Low-resolution Video Action Recognition, ICPR2021, pp. 7387-7394 doi: 10.1109/ICPR48806.2021.9412541 https…

Literature introduction: CutDepth: Edge-aware Data Augmentation in Depth Estimation
CutDepth: Edge-aware Data Augmentation in Depth Estimation, arXiv:2107.07684 https://arxiv.org/abs/2107.07684

Literature introduction: Deep Analysis of CNN-Based Spatio-Temporal Representations for A…
Chun-Fu Richard Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan; Deep Analysis of CNN-Based Spatio-Temporal Represe…

Literature introduction: Video Transformer Network
Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann; Video Transformer Network, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV…

2021/11/19

Literature introduction: VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim, VideoMix: Rethinking Data Augmentation for Video Classification, arXiv:2012.03457 https:/…

Literature introduction: An Image is Worth 16x16 Words: Transformers for Image Recognitio…
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, …

Literature introduction: Why do deep convolutional networks generalize so poorly to small…
Aharon Azulay, Yair Weiss, Why do deep convolutional networks generalize so poorly to small image transformations?, JMLR 20(184):1−25, 2019. https://jmlr.org/p…

Literature introduction: RESOUND: Towards Action Recognition without Representation Bias
Yingwei Li, Yi Li, Nuno Vasconcelos; RESOUND: Towards Action Recognition without Representation Bias, Proceedings of the European Conference on Computer Vision…

Literature introduction: Tell Me Where to Look: Guided Attention Inference Network
Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu; Tell Me Where to Look: Guided Attention Inference Network, Proceedings of the IEEE Conference on Comp…

2021/11/05

Literature introduction: Learnable Gated Temporal Shift Module for Free-form Video Inpain…
Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu, Learnable Gated Temporal Shift Module for Free-form Video Inpainting, BMVC2019 DOI: https://dx.doi.org/…

Literature introduction: Selective Feature Compression for Efficient Activity Recognition…
Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe; Selective Feature Compression for Efficient Activity Recognition Inference, Proceedings of the IE…

Literature introduction: CutMix: Regularization Strategy to Train Strong Classifiers With…
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo; CutMix: Regularization Strategy to Train Strong Classifiers With Localizab…

Literature introduction: Efficient Multi-Domain Learning by Covariance Normalization
Yunsheng Li, Nuno Vasconcelos; Efficient Multi-Domain Learning by Covariance Normalization, Proceedings of the IEEE/CVF Conference on Computer Vision and Patte…

Literature introduction: SegFormer: Simple and Efficient Design for Semantic Segmentation…
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, SegFormer: Simple and Efficient Design for Semantic Segmentation with Transform…

2021/10/29

Literature introduction: Simple Copy-Paste Is a Strong Data Augmentation Method for Insta…
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph; Simple Copy-Paste Is a Strong Data Augmentation Metho…

Literature introduction: Understanding How Image Quality Affects Deep Neural Networks
Samuel Dodge, Lina Karam, Understanding How Image Quality Affects Deep Neural Networks, IEEE Xplore in the Proceedings of the Conference on the Quality of Mult…

Literature introduction: Gate-Shift Networks for Video Action Recognition
Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz; Gate-Shift Networks for Video Action Recognition, Proceedings of the IEEE/CVF Conference on Computer Visi…

Literature introduction: X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer; X3D: Expanding Architectures for Efficient Video Recognition, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern R…

Literature introduction: Efficient Parametrization of Multi-Domain Deep Neural Networks
Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi; Efficient Parametrization of Multi-Domain Deep Neural Networks, Proceedings of the IEEE Conference on Co…

2021/10/15

Literature introduction: Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance D…
Debidatta Dwibedi, Ishan Misra, Martial Hebert; Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection, Proceedings of the IEEE International…

Literature introduction: SlowFast Networks for Video Recognition
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, SlowFast Networks for Video Recognition, Proceedings of the IEEE/CVF International Conference o…

Literature introduction: Learning multiple visual domains with residual adapters
Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi, Learning multiple visual domains with residual adapters, Advances in Neural Information Processing Syste…

Literature introduction: TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin, Chuang Gan, Song Han; TSM: Temporal Shift Module for Efficient Video Understanding, Proceedings of the IEEE/CVF International Conference on Computer Vi…

Literature introduction: Benchmarking Neural Network Robustness to Common Corruptions and…
Dan Hendrycks, Thomas Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR2019 https://openreview.net/forum?id=HJz6…

2021/5/31

Literature introduction: Attention-Based Spatial Guidance for Image-to-Image Translation
Yu Lin, Yigong Wang, Yifan Li, Yang Gao, Zhuoyi Wang, Latifur Khan; Attention-Based Spatial Guidance for Image-to-Image Translation, Proceedings of the IEEE/CV…

Literature introduction: Text-to-Image Generation Grounded by Fine-Grained User Attention
Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang; Text-to-Image Generation Grounded by Fine-Grained User Attention, Proceedings of the IEEE/CVF Winter Co…

Literature introduction: BlockGAN: Learning 3D Object-aware Scene Representations from Un…
Thu H. Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, Niloy Mitra, BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images…

Literature introduction: R-MNet: A Perceptual Adversarial Network for Image Inpainting
Jireh Jam, Connah Kendrick, Vincent Drouard, Kevin Walker, Gee-Sern Hsu, Moi Hoon Yap, R-MNet: A Perceptual Adversarial Network for Image Inpainting Proceedin…

Literature introduction: Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahm…

2021/5/18

Literature introduction: Learning Video Stabilization Using Optical Flow
Jiyang Yu, Ravi Ramamoorthi; Learning Video Stabilization Using Optical Flow, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition…

Literature introduction: Bringing Old Photos Back to Life
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen; Bringing Old Photos Back to Life, Proceedings of the IEEE/CVF Conference on Compu…

Literature introduction: Iterative Answer Prediction With Pointer-Augmented Multimodal Tr…
Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach, Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA, Proceedi…

2021/5/11

Literature introduction: Future Video Synthesis With Object Motion Prediction
Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5539-5548 ht…

Literature introduction: Prior Guided GAN Based Semantic Inpainting
Avisek Lahiri, Arnav Kumar Jain, Sanskar Agrawal, Pabitra Mitra, Prabir Kumar Biswas; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Rec…

Literature introduction: Efficient Attention: Attention With Linear Complexities
Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021…
