News in Pictures
Dear colleagues and friends,
We want to thank you for today's memorable meeting, when our minds and hearts smiled at the beauty of doing research! It was more than a presentation: a beautiful brainstorming session about the meaning of intelligence, looking towards the future of AI.
Let us always meet in this beautiful place where theory and practice meet in person!
The paper presented by Drd. Ing. Mihai Masala:
Wang, Z., Li, M., Xu, R., Zhou, L., Lei, J., Lin, X., ... & Ji, H. (2022). Language models with image descriptors are strong few-shot video-language learners. Advances in Neural Information Processing Systems, 35, 8483-8497.
Link to the paper: https://arxiv.org/pdf/2205.10747.pdf
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
See you next Tuesday with another interesting proposal!
20 February 2024
Dear colleagues and friends,
Tomorrow I propose to discuss a work that aims to build video-language models capable of solving various tasks (video captioning, video question answering, text-video retrieval and event prediction).
The authors propose a general method that does not require training and combines image-language models with an LLM. In the few-shot setting, the method obtains very good results, close even to those of heavily pre-trained models.
The paper is:
Wang, Z., Li, M., Xu, R., Zhou, L., Lei, J., Lin, X., ... & Ji, H. (2022). Language models with image descriptors are strong few-shot video-language learners. Advances in Neural Information Processing Systems, 35, 8483-8497.
Link to the paper: https://arxiv.org/pdf/2205.10747.pdf
Drd. Ing. Mihai Masala
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
19 February 2024
Consistency is the key and research is our passion! Today our dear colleague Drd. Ing. Mihai Masala presented another interesting paper, which led us to many captivating ideas, from vision to language.
The paper presented:
Wang, J., Yang, Z., Hu, X., Li, L., Lin, K., Gan, Z., ... & Wang, L. (2022). GIT: A generative image-to-text transformer for vision and language. Transactions on Machine Learning Research, 11/2022.
Link to the paper: https://arxiv.org/pdf/2205.14100.pdf
See you next meeting with a very interesting topic coming up!
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
13 February 2024
Dear colleagues and students,
Tomorrow I propose to continue the discussion about models that try to unify vision and language. The work I will present proposes a relatively simple model that can successfully solve tasks such as image/video captioning and image/video question answering.
The paper is:
Wang, J., Yang, Z., Hu, X., Li, L., Lin, K., Gan, Z., ... & Wang, L. (2022). GIT: A generative image-to-text transformer for vision and language. Transactions on Machine Learning Research, 11/2022.
Link to the paper: https://arxiv.org/pdf/2205.14100.pdf
Drd. Ing. Mihai Masala
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
12 February 2024
Always working for excellence with consistency, dedication and passion for research! Today, at the weekly meeting of the Bucharest Computer Vision Reading Group, our colleague Drd. Ing. Mihai Masala presented a very interesting paper from the vision-and-language domain. See you next time with new ideas in the same area of research!
The paper presented:
He, X., Chen, S., Ma, F., Huang, Z., Jin, X., Liu, Z., ... & Feng, J. (2023).
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending. arXiv preprint arXiv:2305.13167.
Link to the paper: https://arxiv.org/pdf/2305.13167.pdf
See you next meeting with the same curiosity and enthusiasm!
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
30 January 2024
Dear colleagues and students,
After discussing multimodal transformers and unsupervised learning, it's time to explore in more detail methods that try to bring information from video and text into the same space. The work that Drd. Ing. Mihai Masala will present tomorrow, 30 January, proposes adapting the methods used for aligning images and text (CLIP) to understand the temporal information in videos. Thus, using a relatively small model, the authors obtain state-of-the-art results on three tasks (captioning, video question answering and retrieval), ahead of much larger models that were trained on significantly more data.
The paper is:
He, X., Chen, S., Ma, F., Huang, Z., Jin, X., Liu, Z., ... & Feng, J. (2023).
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending. arXiv preprint arXiv:2305.13167.
Link to the paper: https://arxiv.org/pdf/2305.13167.pdf
We are waiting for you tomorrow,
Drd. Ing. Mihai Masala
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
29 January 2024
In these cold days, let science, research and discovery warm us up!
Prof. Univ. Dr. Marius Leordeanu presented the paper:
Cho, Jang Hyun, Utkarsh Mall, Kavita Bala, and Bharath Hariharan.
"PiCIE: Unsupervised semantic segmentation using invariance and equivariance in clustering."
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16794-16804. 2021.
Link to the paper: https://openaccess.thecvf.com/.../Cho_PiCIE_Unsupervised...
See you next Tuesday!
If it is Tuesday it is Bucharest Computer Vision Reading Group!
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
23 January 2024
Dear colleagues and students,
The time has come to resume the discussions at our Bucharest Computer Vision Reading Group, which, judging by the feedback we received from you and by what I felt myself, turned out to be really useful and interesting.
I now propose to go in a slightly different direction, still within the area of unsupervised learning, this time on the topic of unsupervised semantic segmentation. There are very few works in the literature that seriously attack the problem of unsupervised segmentation at the semantic level; it is also one of the most difficult problems in learning.
How do we learn about object classes without having human annotations?
How do we learn about new classes of objects that we did not know about before?
The problem is certainly general enough to go beyond the field of computer vision.
The work that I will present, and that will lay the foundations for this discussion (which I hope will be as captivating as possible), is:
Cho, Jang Hyun, Utkarsh Mall, Kavita Bala, and Bharath Hariharan.
"PiCIE: Unsupervised semantic segmentation using invariance and equivariance in clustering."
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16794-16804. 2021.
Link to the paper: https://openaccess.thecvf.com/.../Cho_PiCIE_Unsupervised...
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
We are waiting for you with the same enthusiasm!
Prof. Univ. Dr. Marius Leordeanu
22 January 2024
The joy of doing science and research is our passion! What an amazing first meeting of the new year we had at our Bucharest Computer Vision Reading Group, with great new ideas ready to be turned into reality!
Prof. Univ. Dr. Marius Leordeanu presented the paper:
Akbari, H., Yuan, L., Qian, R., Chuang, W.H., Chang, S.F., Cui, Y. and Gong, B., 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems, 34, pp. 24206-24221.
The paper can be found here: https://proceedings.neurips.cc/.../cb3213ada48302953cb0f1...
See you next Tuesday with the same enthusiasm!
If it is Tuesday it is Bucharest Computer Vision Reading Group!
Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
9 January 2024
My dear students and colleagues,
First of all, I want to wish you a warm Happy New Year! May you have a 2024 full of health, joy of life and fulfillment on all levels!
And now the time has come to continue our discussions at the Bucharest Computer Vision Reading Group.
I will present an impactful paper, also from the area of Transformers for multiple modalities, which this time also addresses the problem of self-supervised learning:
Akbari, H., Yuan, L., Qian, R., Chuang, W.H., Chang, S.F., Cui, Y. and Gong, B., 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems, 34, pp. 24206-24221.
Link to the paper:
https://proceedings.neurips.cc/.../cb3213ada48302953cb0f1...
When: Tomorrow, 9 January 2024, at 10:00 AM
Where: Precis building, floor 3, room 303, Universitatea POLITEHNICA din București
We look forward to seeing you. Have a nice day, everyone!
Prof. Univ. Dr. Marius Leordeanu
8 January 2024
What an amazing meeting we had today at the Bucharest Computer Vision Reading Group, the last one of this year!
Thank you all and see you next year with the same enthusiasm and new ideas!
Prof. Univ. Dr. Marius Leordeanu presented the paper:
"4M: Massively Multimodal Masked Modeling"
David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir. NeurIPS 2023
Link to the paper: https://arxiv.org/abs/2312.06647
Have a wonderful Christmas and a Happy New Year!
21 December 2023
Dear students and colleagues,
The time has come for a new meeting of the Bucharest Computer Vision Reading Group, the last of this year, where we will continue the discussion about multi-modal Transformers with a last-minute paper, presented at NeurIPS 2023 just last week (when we held the previous reading group) and, amazingly, published by the same research group!
"4M: Massively Multimodal Masked Modeling"
David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir. NeurIPS 2023
Link to the paper: https://arxiv.org/abs/2312.06647
There is no such thing as a coincidence! So, I am very excited about tomorrow's discussion, where we will definitely learn and discover many interesting things!
So, we look forward to seeing you tomorrow, Tuesday, December 19, 2023, at Universitatea POLITEHNICA din București, in the Precis building, 10:00 AM, room 303, floor 3 ... as usual!
Have a beautiful day!
Prof. Univ. Dr. Marius Leordeanu
19 December 2023
What a fantastic day of connecting to real science at Bucharest Computer Vision Reading Group!
Today Prof. Univ. Dr. Marius Leordeanu presented a recent work in a very interesting direction, which connects Transformer models to multi-task and self-supervised learning.
Bachmann, Roman, David Mizrahi, Andrei Atanov, and Amir Zamir. "MultiMAE: Multi-modal multi-task masked autoencoders." In European Conference on Computer Vision, pp. 348-367. Cham: Springer Nature Switzerland, 2022.
Link to the paper: https://arxiv.org/pdf/2204.01678.pdf
See you next Tuesday!
If it is Tuesday, it is Bucharest Computer Vision Reading Group time! Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.
12 December 2023
Very excited about the second meeting of our Bucharest Computer Vision Reading Group, together with dear students, professors and colleagues, with fascinating discussions and new research ideas!
Prof. Univ. Dr. Marius Leordeanu presented a very interesting paper from CVPR 2023: Kang, Dahyun, Piotr Koniusz, Minsu Cho, and Naila Murray. "Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19627-19638. 2023.
You can find the paper below:
https://openaccess.thecvf.com/.../Kang_Distilling_Self...
If it is Tuesday, it is Bucharest Computer Vision Reading Group time! Find us every Tuesday at the same address: 10:00 AM, in room 303, 3rd floor, in the PRECIS building at the University Politehnica of Bucharest, entrance from Bulevardul Maniu.
28 November 2023
Dear friends, we are very excited to restart the Bucharest Computer Vision Reading Group that we started together in 2016!
Our meetings are open to all those who want to be up to date with the new discoveries in computer vision, deep learning, natural language processing and everything related to artificial intelligence! We invite and welcome all those who are passionate about science to discover the best scientific works in the field, to take part in interesting discussions, and to develop new ideas for future research and new projects for the AI community in Romania!
Let's all gather around some wonderful ideas that can make our world better!
We are waiting for you every Tuesday starting at 10:00 AM, in room 303, 3rd floor, in the PRECIS building at the University Politehnica of Bucharest!
Here we are at our first meeting after more than 3 years, happy to be together in large numbers, curious and passionate about ideas and interesting discussions.
21 November 2023
Professor Univ. Dr. Marius Leordeanu after today's presentation
21 November 2023
One of our first meetings at The Institute of Mathematics of the Romanian Academy!