Tamaki-Ding Laboratory (玉木・丁研究室)
Paper Introductions
We hold online paper-reading sessions on an irregular basis. For details, see the connpass page.
2025/8/9
CVPRW2025 on-site participation report (髙間)
2025/7/10
CVPR2025 paper introduction: Unboxed
CVPR2025 paper introduction: OVO-Bench
2025/6/26
Paper introduction: Segment Anything, SAM2: Segment Anything in Images and Videos
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick, "Segment Anything", ICCV2023 https://openaccess.thecvf.com/content/ICCV2023/html/Kirillov_Segment_Anything_ICCV_2023_paper.html
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer, "SAM2: Segment Anything in Images and Videos", arXiv2024 https://arxiv.org/abs/2408.00714
Paper introduction: HOTR: End-to-End Human-Object Interaction Detection With Transformers, Human-Object Interaction Detection via Disentangled Transformer, QPIC
Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim, "HOTR: End-to-End Human-Object Interaction Detection With Transformers", CVPR2021 https://openaccess.thecvf.com/content/CVPR2021/html/Kim_HOTR_End-to-End_Human-Object_Interaction_Detection_With_Transformers_CVPR_2021_paper.html
Desen Zhou, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, Jingdong Wang, "Human-Object Interaction Detection via Disentangled Transformer", CVPR2022 https://openaccess.thecvf.com/content/CVPR2022/html/Zhou_Human-Object_Interaction_Detection_via_Disentangled_Transformer_CVPR_2022_paper.html
Masato Tamura, Hiroki Ohashi, Tomoaki Yoshinaga, "QPIC: Query-Based Pairwise Human-Object Interaction Detection With Image-Wide Contextual Information", CVPR2021 https://openaccess.thecvf.com/content/CVPR2021/html/Tamura_QPIC_Query-Based_Pairwise_Human-Object_Interaction_Detection_With_Image-Wide_Contextual_Information_CVPR_2021_paper.html
2025/6/12
Paper introduction: Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks, Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing, and more
Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht, "Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Shvetsova_Unbiasing_through_Textual_Descriptions_Mitigating_Representation_Bias_in_Video_Benchmarks_CVPR_2025_paper.html
Yanjun Li, Zhaoyang Li, Honghui Chen, Lizhi Xu, "Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing", CVPR2025 https://openaccess.thecvf.com/content/CVPR2025/html/Li_Unbiased_Video_Scene_Graph_Generation_via_Visual_and_Semantic_Dual_CVPR_2025_paper.html
Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan, "MotionMAE: Self-Supervised Video Representation Learning with Motion-Aware Masked Autoencoders", BMVC2024 https://bmvc2024.org/proceedings/499/
Paper introduction: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?, and more
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh, "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts", EMNLP2020 https://aclanthology.org/2020.emnlp-main.346/
Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer, "Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?", EMNLP2023 https://aclanthology.org/2023.findings-emnlp.733/
Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein, "Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery", NeurIPS 2023 https://proceedings.neurips.cc/paper_files/paper/2023/hash/a00548031e4647b13042c97c922fadf1-Abstract-Conference.html
2025/5/22
Paper introduction: "mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models", "SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries"
Paper introduction: "Amodal Completion via Progressive Mixed Context Diffusion", "Amodal Instance Segmentation with Diffusion Shape Prior Estimation"
2025/5/8
Paper introduction: "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations", "Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs"
Paper introduction: "InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning", "Adaptive Plasticity Improvement for Continual Learning"
2025/4/24
Paper introduction: PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics
Paper introduction: What, when, and where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
2025/4/8
Paper introduction: ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
Paper introduction: Make Pixels Dance: High-Dynamic Video Generation
2024/11/19
Paper introduction: On Feature Normalization and Data Augmentation
Paper introduction: T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos
2024/11/05
Paper introduction: MS-DETR: Efficient DETR Training with Mixed Supervision
Paper introduction: CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
2024/10/22
Paper introduction: 2D Pose-guided Complete Silhouette Estimation of Human Body in Occlusion
Paper introduction: Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
2024/10/08
Paper introduction: DEVIAS: Learning Disentangled Video Representations of Action and Scene
Paper introduction: Multi-class Video Co-segmentation with a Generative Multi-video Model
2024/9/19
Paper introduction: QLoRA: Efficient Finetuning of Quantized LLMs
Paper introduction: TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
2024/9/12
Paper introduction: Is Appearance Free Action Recognition Possible
Paper introduction: DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
2024/8/01
Paper introduction: Image amodal completion: A survey (CVIU)
Paper introduction: MaPLe: Multi-Modal Prompt Learning (CVPR)
Paper introduction: AutoSoccerPose: Automated 3D Posture Analysis of Soccer Shot Movements
2024/7/17
Paper introduction: Can I Trust Your Answer? Visually Grounded Video Question Answering
Paper introduction: Rugby Scene Classification Enhanced by Vision Language Model
Paper introduction: Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment
2024/7/3
Paper introduction: Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Paper introduction: BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
2024/6/19
Paper introduction: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
Paper introduction: Coarse-to-Fine Amodal Segmentation with Shape Prior
Paper introduction: Learning from One Continuous Video Stream
2024/6/6
Paper introduction: Deep Learning-Based Human Pose Estimation: A Survey
Paper introduction: A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
2024/5/30
Paper introduction: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Paper introduction: Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Paper introduction: When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation
2024/5/16
Paper introduction: Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
Paper introduction: ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Paper introduction: ArcFace: Additive Angular Margin Loss for Deep Face Recognition
2024/5/02
Paper introduction: Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Paper introduction: Selective Structured State-Spaces for Long-Form Video Understanding
2024/4/18
Paper introduction: Automated Classification of Model Errors on ImageNet
Paper introduction: Semantic segmentation using Vision Transformers: A survey
Paper introduction: Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers
Paper introduction: Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
2024/3/25
Paper introduction: MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition
Paper introduction: Tracking Anything with Decoupled Video Segmentation
Paper introduction: MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
2023/1/11
Paper introduction: Real-Time Evaluation in Online Continual Learning: A New Hope
Paper introduction: Multitask Vision-Language Prompt Tuning
Paper introduction: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Paper introduction: MovieCLIP: Visual Scene Recognition in Movies
2023/11/30
Paper introduction: Efficient Video Action Detection with Token Dropout and Context Refinement
Paper introduction: Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization
Paper introduction: Vision Transformer Adapter for Dense Predictions
Paper introduction: Revealing the unseen: Benchmarking video action recognition under occlusion
Paper introduction: Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
Paper introduction: Spatio-Temporal Action Detection Under Large Motion
Paper introduction: MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Paper introduction: Discovering Universal Geometry in Embeddings with ICA
2023/10/30
Paper introduction: Masked Vision and Language Modeling for Multi-modal Representation Learning
Paper introduction: Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Paper introduction: Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
Paper introduction: ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models
Paper introduction: Video Test-Time Adaptation for Action Recognition
Paper introduction: Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
2023/09/27
Paper introduction: STMixer: A One-Stage Sparse Action Detector
Paper introduction: OneFormer: One Transformer To Rule Universal Image Segmentation
Paper introduction: InternVideo: General Video Foundation Models via Generative and Discriminative Learning
2023/06/30
Paper introduction: Learning With Neighbor Consistency for Noisy Labels
Learning With Neighbor Consistency for Noisy Labels Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid, CVPR2022 橋口凌大 (Nagoya Institute of Technology) 2023/6/30 Overview: learning from noisy labels...
Paper introduction: Parameter-Efficient Transfer Learning for NLP
Parameter-Efficient Transfer Learning for NLP Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona...
Paper introduction: Temporal Sentence Grounding in Videos: A Survey and Future Directions
Temporal Sentence Grounding in Videos: A Survey and Future Directions Hao Zhang, Aixin Sun, Wei Jing, and Joey Tianyi Zhou TPAMI 仁田智也 (Nagoya Institute of Technology) Overview: Temporal Senten...
2023/06/08
Paper introduction: DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heu...
Paper introduction: Temporal Action Segmentation From Timestamp Supervision
Temporal Action Segmentation From Timestamp Supervision Zhe Li, Yazan Abu Farha, Jurgen Gall CVPR2021 加藤樹 (Tamaki Lab, Nagoya Institute of Technology) 2023/6/8 Research overview: Temporal Action Segmentation (...
Paper introduction: Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning AJ Piergiovanni, Weicheng Kuo, Anelia Angelova arXiv2022 2023/6/8 ◼Vision Transfo...
Paper introduction: End-to-End Spatio-Temporal Action Localisation with Video Transformers
End-to-End Spatio-Temporal Action Localisation with Video Transformers Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Luci c...
Paper introduction: Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos Tomas Soucek, Jean-Baptiste Alayrac, Antoine Miech, Ivan Lapt...
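Paper introduction: Video Panoptic Segmentation
Video Panoptic Segmentation Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon CVPR2020 水野翼 (Tamaki Lab, Nagoya Institute of Technology) 2023/6/8 Overview. Objective: extend the concept of panoptic segmentation from the image domain to the video domain...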
Paper introduction: Flamingo: a Visual Language Model for Few-Shot Learning
🦩 Flamingo: a Visual Language Model for Few-Shot Learning Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc,...
Paper introduction: LVIS: A Dataset for Large Vocabulary Instance Segmentation
LVIS: A Dataset for Large Vocabulary Instance Segmentation Agrim Gupta, Piotr Dollar, Ross Girshick, CVPR2019 2023/06/08 ◼LVIS dataset • Instance segmentation...
Paper introduction: VLP: A Survey on Vision-Language Pre-training
VLP: A Survey on Vision-Language Pre-training Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu, MIR 2023 福沢匠 (Tamaki Lab, Nagoya Institute of Technology) 2023/6/8 ...
2023/05/11
Paper introduction: End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers 2023/5/11 ◼ DETR (DEtection TRansformer) ◼ ◼End-to-end • ◼ • NMS • Non-Maximum Suppression (NMS) • Bounding box...
Paper introduction: Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius, Heng Wang, Lorenzo Torresani, ICML2021 2023/5/11 ◼Transformer : TimeSformer • •...
Paper introduction: Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid, NeurIPS2022...
Paper introduction: A Survey of Vision-Language Pre-Trained Models
A Survey of Vision-Language Pre-Trained Models Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao, IJCAI2022 福沢匠 (Tamaki Lab, Nagoya Institute of Technology) 2023/5/11 Overview: Pre-Trained Models • Huge models...
Paper introduction: Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers
Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers Tsung-Wei Ke, Jyh-Jing Hwang, Yunhui Guo, Xudong Wang...
Paper introduction: Temporal Action Segmentation: An Analysis of Modern Techniques
Temporal Action Segmentation: An Analysis of Modern Techniques Guodong Ding, Fadime Sener, and Angela Yao arXiv2022 加藤樹, 神谷広大 (Tamaki Lab, Nagoya Institute of Technology) 2023/5/11 Introduction T...
Paper introduction: The Cityscapes Dataset for Semantic Urban Scene Understanding
The Cityscapes Dataset for Semantic Urban Scene Understanding Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, ...
2023/04/06
Paper introduction: Transformers in Action: Weakly Supervised Action Segmentation
John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari, "Transformers in Action: Weakly Supervised Action Segmentation" arXiv2022 https:…
Paper introduction: Human Hands As Probes for Interactive Object Understanding
Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta, "Human Hands As Probes for Interactive Object Understanding" CVPR2022 https://openaccess.thecvf.com/cont…
Paper introduction: DramaQA: Character-Centered Video Story Understanding with Hiera…
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang, "DramaQA: Character-Centered Video Story Understanding with H…
Paper introduction: Rethinking Zero-shot Video Classification: End-to-end Training f…
Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka, "Rethinking Zero-shot Video Classification: End-to-end Training for Realistic …
Paper introduction: Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra, "Omnivore: A Single Model for Many Visual Modalities" CVPR2022 h…
Paper introduction: Beyond Short Clips: End-to-End Video-Level Learning With Collabo…
Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry S. Davis, Heng Wang, "Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories" CVPR202…
Paper introduction: Panoptic-aware Image-to-Image Translation
Liyun Zhang, Photchara Ratsamee, Bowen Wang, Zhaojie Luo, Yuki Uranishi, Manabu Higashida, Haruo Takemura, "Panoptic-aware Image-to-Image Translation" WACV2023…
Paper introduction: Multimodal Learning with Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton, "Multimodal Learning with Transformers: A Survey" arXiv2022 https://arxiv.org/abs/2206.06488
2022/11/25
Paper introduction: Learn2Augment: Learning to Composite Videos for Data Augmentatio…
Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara, "Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition" …
Paper introduction: Deep Mutual Learning
Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, "Deep Mutual Learning" CVPR2018 https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Deep_Mutua…
Paper introduction: Towards Robust Adaptive Object Detection Under Noisy Annotations
Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan, "Towards Robust Adaptive Object Detection Under Noisy Annotations" CVPR2022 https://openaccess.thecvf…
Paper introduction: TubeDETR: Spatio-Temporal Video Grounding With Transformers
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid, "TubeDETR: Spatio-Temporal Video Grounding With Transformers" CVPR2022 https://openacce…
2022/11/11
Literature introduction: PolyViT: Co-training Vision Transformers on Images, Videos and A…
Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani, "PolyViT: Co-training Vision Transformers on …
Literature introduction: VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Und…
Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer, "VideoCLIP: Contrastive Pre-train…
Literature introduction: Multi-dataset Training of Transformers for Robust Action Recogni…
Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen, "Multi-dataset Training of Transformers for Robust Action Recognition" NeurIPS2022 https://arxiv.org/abs/22…
Literature introduction: Length-Controllable Image Captioning
Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu, "Length-Controllable Image Captioning" ECCV2020 https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/2035_ECCV_2…
2022/10/28
Literature introduction: Temporal Convolutional Networks for Action Segmentation and Dete…
Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager, "Temporal Convolutional Networks for Action Segmentation and Detection", CVPR2017 htt…
Literature introduction: Unsupervised Domain Adaptation for Spatio-Temporal Action Locali…
Nakul Agarwal, Yi-Ting Chen, Behzad Dariush and Ming-Hsuan Yang, "Unsupervised Domain Adaptation for Spatio-Temporal Action Localization", BMVC2020 https://www…
Literature introduction: Toward Multimodal Image-to-Image Translation
Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman, "Toward Multimodal Image-to-Image Translation", NeurIPS…
Literature introduction: Learning From Noisy Labels With Deep Neural Networks: A Survey
H. Song, M. Kim, D. Park, Y. Shin and J. -G. Lee, "Learning From Noisy Labels With Deep Neural Networks: A Survey", in IEEE Transactions on Neural Networks and…
2022/10/14
Literature introduction: Elaborative Rehearsal for Zero-Shot Action Recognition
Shizhe Chen, Dong Huang, "Elaborative Rehearsal for Zero-Shot Action Recognition", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICC…
Literature introduction: Temporal Alignment Networks for Long-Term Video
Tengda Han, Weidi Xie, Andrew Zisserman, "Temporal Alignment Networks for Long-Term Video", Proceedings of the IEEE/CVF Conference on Computer Vision and Patte…
Literature introduction: Multi-Task Learning for Dense Prediction Tasks: A Survey
Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, Luc Van Gool, "Multi-Task Learning for Dense Prediction Tasks: A Sur…
Literature introduction: Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra, "Omnivore: A Single Model for Many Visual Modalities", Proceedin…
2022/06/17
Introduction to the Activity-Net Challenge 2021
Introduction to the Activity-Net Challenge 2021 http://activity-net.org http://activity-net.org/challenges/2021/index.html http://activity-net.org/challenges/2022/index.html
Literature introduction: A Survey of Deep Learning-Based Object Detection
Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu, "A Survey of Deep Learning-Based Object Detection", IEEE Access, Vol.7, pp. …
Literature introduction: Image Segmentation Using Deep Learning: A Survey
Shervin Minaee, Yuri Boykov, Fatih Porikli, Antonio Plaza, Nasser Kehtarnavaz, Demetri Terzopoulos, Image Segmentation Using Deep Learning: A Survey, IEEE Tran…
Literature introduction: Image-to-Image Translation: Methods and Applications
Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen, Image-to-Image Translation: Methods and Applications, IEEE Transactions on Multimedia, doi: 10.1109/TMM.2021.31…
Literature introduction: YOLO series: v1-v5, X, F, and YOWO
20220617_You_Only_Look_Once_Series.pdf You Only Look Once: Unified, Real-Time Object Detection https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/…
2022/04/15
Literature introduction: EfficientDet: Scalable and Efficient Object Detection
Mingxing Tan, Ruoming Pang, Quoc V. Le; EfficientDet: Scalable and Efficient Object Detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pa…
Literature introduction: Learning Motion-Appearance Co-Attention for Zero-Shot Video Obje…
Shu Yang, Lu Zhang, Jinqing Qi, Huchuan Lu, Shuo Wang, Xiaoxing Zhang; Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation, Proceed…
Literature introduction: Spatially-Adaptive Pixelwise Networks for Fast Image Translation
Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli; Spatially-Adaptive Pixelwise Networks for Fast Image Translation, Proceedings …
Literature introduction: You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi; You Only Look Once: Unified, Real-Time Object Detection, Proceedings of the IEEE Conference on Comp…
Literature introduction: Swin Transformer: Hierarchical Vision Transformer Using Shifted …
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo; Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows…
Literature introduction: Video Description: A Survey of Methods, Datasets, and Evaluation…
Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, and Mubarak Shah. 2019. Video Description: A Survey of Methods, Datasets, and Evaluation Metrics. AC…
Literature introduction: Simpler Is Better: Few-Shot Semantic Segmentation With Classifie…
Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang; Simpler Is Better: Few-Shot Semantic Segmentation With Classifier Weight Transformer, Proceedi…
2021/12/16
Literature introduction: Adversarial Cross-Domain Action Recognition with Co-Attention
Boxiao Pan, Zhangjie Cao, Ehsan Adeli, Juan Carlos Niebles, Adversarial Cross-Domain Action Recognition with Co-Attention, AAAI2020. https://doi.org/10.1609/aa…
Literature introduction: Extreme Low-Resolution Activity Recognition Using a Super-Resolu…
Mingzheng Hou, Song Liu, Jiliu Zhou, Yi Zhang, Ziliang Feng, Extreme Low-Resolution Activity Recognition Using a Super-Resolution-Oriented Generative Adversari…
Literature introduction: 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Vi…
Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis; 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition, Proceedings of the…
Literature introduction: Rethinking Data Augmentation for Image Super-resolution: A Compr…
Jaejun Yoo, Namhyuk Ahn, Kyung-Ah Sohn; Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy, Proceedings of th…
Literature introduction: Token Shift Transformer for Video Classification
Hao Zhang, Yanbin Hao, Chong-Wah Ngo, Token Shift Transformer for Video Classification, ACM MM '21: Proceedings of the 29th ACM International Conference on Mul…
2021/12/3
Paper introduction: Shuffle and Attend: Video Domain Adaptation
Jinwoo Choi, Gaurav Sharma, Samuel Schulter, Jia-Bin Huang, Shuffle and Attend: Video Domain Adaptation, ECCV2020. https://www.ecva.net/papers/eccv_2020/papers…
Paper introduction: TinyVIRAT: Low-resolution Video Action Recognition
Ugur Demir, Yogesh S Rawat, Mubarak Shah, TinyVIRAT: Low-resolution Video Action Recognition, ICPR2021, pp. 7387-7394 doi: 10.1109/ICPR48806.2021.9412541 https…
Paper introduction: CutDepth: Edge-aware Data Augmentation in Depth Estimation
CutDepth: Edge-aware Data Augmentation in Depth Estimation, arXiv:2107.07684 https://arxiv.org/abs/2107.07684
Paper introduction: Deep Analysis of CNN-Based Spatio-Temporal Representations for A…
Chun-Fu Richard Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan; Deep Analysis of CNN-Based Spatio-Temporal Represe…
Paper introduction: Video Transformer Network
Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann; Video Transformer Network, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV…
2021/11/19
Paper introduction: VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim, VideoMix: Rethinking Data Augmentation for Video Classification, arXiv:2012.03457 https:/…
Paper introduction: An Image is Worth 16x16 Words: Transformers for Image Recognitio…
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, …
Paper introduction: Why do deep convolutional networks generalize so poorly to small…
Aharon Azulay, Yair Weiss, Why do deep convolutional networks generalize so poorly to small image transformations?, JMLR 20(184):1−25, 2019. https://jmlr.org/p…
Paper introduction: RESOUND: Towards Action Recognition without Representation Bias
Yingwei Li, Yi Li, Nuno Vasconcelos; RESOUND: Towards Action Recognition without Representation Bias, Proceedings of the European Conference on Computer Vision…
Paper introduction: Tell Me Where to Look: Guided Attention Inference Network
Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu; Tell Me Where to Look: Guided Attention Inference Network, Proceedings of the IEEE Conference on Comp…
2021/11/05
Paper introduction: Learnable Gated Temporal Shift Module for Free-form Video Inpain…
Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu, Learnable Gated Temporal Shift Module for Free-form Video Inpainting, BMVC2019 DOI: https://dx.doi.org/…
Paper introduction: Selective Feature Compression for Efficient Activity Recognition…
Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe; Selective Feature Compression for Efficient Activity Recognition Inference, Proceedings of the IE…
Paper introduction: CutMix: Regularization Strategy to Train Strong Classifiers With…
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo; CutMix: Regularization Strategy to Train Strong Classifiers With Localizab…
Paper introduction: Efficient Multi-Domain Learning by Covariance Normalization
Yunsheng Li, Nuno Vasconcelos; Efficient Multi-Domain Learning by Covariance Normalization, Proceedings of the IEEE/CVF Conference on Computer Vision and Patte…
Paper introduction: SegFormer: Simple and Efficient Design for Semantic Segmentation…
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, SegFormer: Simple and Efficient Design for Semantic Segmentation with Transform…
2021/10/29
Paper introduction: Simple Copy-Paste Is a Strong Data Augmentation Method for Insta…
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph; Simple Copy-Paste Is a Strong Data Augmentation Metho…
Paper introduction: Understanding How Image Quality Affects Deep Neural Networks
Samuel Dodge, Lina Karam, Understanding How Image Quality Affects Deep Neural Networks, IEEE Xplore in the Proceedings of the Conference on the Quality of Mult…
Paper introduction: Gate-Shift Networks for Video Action Recognition
Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz; Gate-Shift Networks for Video Action Recognition, Proceedings of the IEEE/CVF Conference on Computer Visi…
Paper introduction: X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer; X3D: Expanding Architectures for Efficient Video Recognition, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern R…
Paper introduction: Efficient Parametrization of Multi-Domain Deep Neural Networks
Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi; Efficient Parametrization of Multi-Domain Deep Neural Networks, Proceedings of the IEEE Conference on Co…
2021/10/15
Paper introduction: Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance D…
Debidatta Dwibedi, Ishan Misra, Martial Hebert; Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection, Proceedings of the IEEE International…
Paper introduction: SlowFast Networks for Video Recognition
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, SlowFast Networks for Video Recognition, Proceedings of the IEEE/CVF International Conference o…
Paper introduction: Learning multiple visual domains with residual adapters
Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi, Learning multiple visual domains with residual adapters, Advances in Neural Information Processing Syste…
Paper introduction: TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin, Chuang Gan, Song Han; TSM: Temporal Shift Module for Efficient Video Understanding, Proceedings of the IEEE/CVF International Conference on Computer Vi…
Paper introduction: Benchmarking Neural Network Robustness to Common Corruptions and…
Dan Hendrycks, Thomas Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR2019 https://openreview.net/forum?id=HJz6…
2021/5/31
Paper introduction: Attention-Based Spatial Guidance for Image-to-Image Translation
Yu Lin, Yigong Wang, Yifan Li, Yang Gao, Zhuoyi Wang, Latifur Khan; Attention-Based Spatial Guidance for Image-to-Image Translation, Proceedings of the IEEE/CV…
Paper introduction: Text-to-Image Generation Grounded by Fine-Grained User Attention
Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang; Text-to-Image Generation Grounded by Fine-Grained User Attention, Proceedings of the IEEE/CVF Winter Co…
Paper introduction: BlockGAN: Learning 3D Object-aware Scene Representations from Un…
Thu H. Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, Niloy Mitra, BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images…
Paper introduction: R-MNet: A Perceptual Adversarial Network for Image Inpainting
Jireh Jam, Connah Kendrick, Vincent Drouard, Kevin Walker, Gee-Sern Hsu, Moi Hoon Yap, R-MNet: A Perceptual Adversarial Network for Image Inpainting, Proceedin…
Paper introduction: Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahm…
2021/5/18
Paper introduction: Learning Video Stabilization Using Optical Flow
Jiyang Yu, Ravi Ramamoorthi; Learning Video Stabilization Using Optical Flow, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition…
Paper introduction: Bringing Old Photos Back to Life
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen; Bringing Old Photos Back to Life, Proceedings of the IEEE/CVF Conference on Compu…
Paper introduction: Iterative Answer Prediction With Pointer-Augmented Multimodal Tr…
Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach, Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA, Proceedi…
2021/5/11
Paper introduction: Future Video Synthesis With Object Motion Prediction
Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5539-5548 ht…
Paper introduction: Prior Guided GAN Based Semantic Inpainting
Avisek Lahiri, Arnav Kumar Jain, Sanskar Agrawal, Pabitra Mitra, Prabir Kumar Biswas; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Rec…
Paper introduction: Efficient Attention: Attention With Linear Complexities
Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021…