News in Pictures

Dear colleagues and friends,


We want to thank you for today's memorable meeting, when our minds and hearts smiled at the beauty of doing research! It was more than a presentation: a beautiful brainstorming session about the meaning of intelligence and the future of AI.

Let us always meet in this beautiful place where theory and practice meet in person!


The paper presented by Drd. Ing. Mihai Masala:

Wang, Z., Li, M., Xu, R., Zhou, L., Lei, J., Lin, X., ... & Ji, H. (2022). Language models with image descriptors are strong few-shot video-language learners. Advances in Neural Information Processing Systems, 35, 8483-8497.

Link to the paper: https://arxiv.org/pdf/2205.10747.pdf

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


See you next Tuesday with another interesting proposal!

20 February 2024



Dear colleagues and friends,


Tomorrow I propose to discuss a work that aims to build video-language models capable of solving various tasks: video captioning, video question answering, text-video retrieval and event prediction.

The authors propose a general method that requires no training and combines image-language models with an LLM. In the few-shot setting, the method obtains very good results, even close to those of other, heavily pre-trained models.

The paper is:

Wang, Z., Li, M., Xu, R., Zhou, L., Lei, J., Lin, X., ... & Ji, H. (2022). Language models with image descriptors are strong few-shot video-language learners. Advances in Neural Information Processing Systems, 35, 8483-8497.

Link to the paper: https://arxiv.org/pdf/2205.10747.pdf

Drd. Ing. Mihai Masala


Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


19 February 2024


Consistency is the key and research is our passion! Today our dear colleague Drd. Ing. Mihai Masala presented another interesting paper, which led us to many captivating ideas from vision to language.


The paper presented:

Wang, J., Yang, Z., Hu, X., Li, L., Lin, K., Gan, Z., ... & Wang, L. (2022). Git: A generative image-to-text transformer for vision and language. Transactions on Machine Learning Research 11/2022

Link to the paper: https://arxiv.org/pdf/2205.14100.pdf


See you next meeting with a very interesting topic coming up!

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


13 February 2024

Dear colleagues and students,


Tomorrow I propose to continue the discussion about models that try to unify vision and language. The work I will present proposes a relatively simple model that can successfully solve tasks such as image/video captioning and image/video question answering.

The paper is:

Wang, J., Yang, Z., Hu, X., Li, L., Lin, K., Gan, Z., ... & Wang, L. (2022). Git: A generative image-to-text transformer for vision and language. Transactions on Machine Learning Research 11/2022

Link to the paper: https://arxiv.org/pdf/2205.14100.pdf


Drd. Ing. Mihai Masala

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


12 February 2024

Always working for excellence with consistency, dedication and passion for research! Today, at the weekly meeting of the Bucharest Computer Vision Reading Group, our colleague Drd. Ing. Mihai Masala presented a very interesting paper from the vision-and-language domain. See you next time with new ideas in the same area of research!

The paper presented:

He, X., Chen, S., Ma, F., Huang, Z., Jin, X., Liu, Z., ... & Feng, J. (2023). VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending. arXiv preprint arXiv:2305.13167.

Link to the paper: https://arxiv.org/pdf/2305.13167.pdf

See you next meeting with the same curiosity and enthusiasm! 

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.

30 January 2024

Dear colleagues and students,


After discussing multimodal transformers and unsupervised learning, it's time to explore in more detail methods that try to bring information from video and text into the same space. The work that Drd. Ing. Mihai Masala will present tomorrow, 30 January, proposes adapting the methods used for aligning images and text (CLIP) to understand the temporal information in videos. Thus, using a relatively small model, the authors manage to obtain state-of-the-art results on three tasks (captioning, video question answering and retrieval), ahead of much larger models that were trained on significantly more data.

The paper is:

He, X., Chen, S., Ma, F., Huang, Z., Jin, X., Liu, Z., ... & Feng, J. (2023). VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending. arXiv preprint arXiv:2305.13167.

Link to the paper: https://arxiv.org/pdf/2305.13167.pdf

We are waiting for you tomorrow,


Drd. Ing. Mihai Masala

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


29 January 2024

On these cold days, let science, research and discovery warm us up!


Prof. Univ. Dr. Marius Leordeanu presented the paper:

Cho, Jang Hyun, Utkarsh Mall, Kavita Bala, and Bharath Hariharan. "Picie: Unsupervised semantic segmentation using invariance and equivariance in clustering." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16794-16804. 2021.


Link to the paper: https://openaccess.thecvf.com/.../Cho_PiCIE_Unsupervised...

See you next Tuesday!


If it is Tuesday, it is Bucharest Computer Vision Reading Group!

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


23 January 2024

Dear colleagues and students,


The time has come to resume the discussions at our Bucharest Computer Vision Reading Group, which, judging from your feedback and from my own impression, have turned out to be really useful and interesting.

I now propose to go in a slightly different direction, still within the area of unsupervised learning: this time, the topic is unsupervised semantic segmentation. Very few works in the literature seriously tackle the problem of unsupervised segmentation at the semantic level; it is also one of the most difficult problems in learning.


How do we learn about object classes without having human annotations?

How do we learn about new classes of objects that we did not know about before?

The problem is certainly general enough to go beyond the field of computer vision.


The work that I will present, which will lay the foundation for what I hope will be a captivating discussion, is:

Cho, Jang Hyun, Utkarsh Mall, Kavita Bala, and Bharath Hariharan. "Picie: Unsupervised semantic segmentation using invariance and equivariance in clustering." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16794-16804. 2021.

Link to the paper: https://openaccess.thecvf.com/.../Cho_PiCIE_Unsupervised...


Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


We are waiting for you with the same enthusiasm!


Prof. Univ. Dr. Marius Leordeanu

22 January 2024



The joy of doing science and research is our passion! What an amazing first meeting of the new year we had at our Bucharest Computer Vision Reading Group, with great new ideas ready to be made a reality!


Prof. Univ. Dr. Marius Leordeanu presented the paper:

Akbari, H., Yuan, L., Qian, R., Chuang, W.H., Chang, S.F., Cui, Y. and Gong, B., 2021. Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems, 34, pp. 24206-24221.

The paper can be found here: https://proceedings.neurips.cc/.../cb3213ada48302953cb0f1...


See you next Tuesday with the same enthusiasm! 

If it is Tuesday, it is Bucharest Computer Vision Reading Group!

Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


9 January 2024

My dear students and colleagues,


First of all, I want to wish you a warm Happy New Year! May you have a 2024 full of health, joy of life and fulfillment on all levels!

And now the time has come to continue our discussions at the Bucharest Computer Vision Reading Group.

I will present an impactful paper, also from the area of multimodal Transformers, which this time also addresses the problem of self-supervised learning:

Akbari, H., Yuan, L., Qian, R., Chuang, W.H., Chang, S.F., Cui, Y. and Gong, B., 2021. Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems, 34, pp. 24206-24221.

Link to the paper:

https://proceedings.neurips.cc/.../cb3213ada48302953cb0f1...


When: Tomorrow, 9 January 2024, at 10:00 AM

Where: PRECIS building, 3rd floor, room 303, Universitatea POLITEHNICA din București

We look forward to seeing you. Have a nice day, everyone!


Prof. Univ. Dr. Marius Leordeanu

8 January 2024

What an amazing meeting we had today at the Bucharest Computer Vision Reading Group, the last one of this year!

Thank you all and see you next year with the same enthusiasm and new ideas!


Prof. Univ. Dr. Marius Leordeanu presented the paper:

"4M: Massively Multimodal Masked Modeling"

David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir. NeurIPS 2023


Link to the paper: https://arxiv.org/abs/2312.06647


Have a wonderful Christmas and a Happy New Year!


21 December 2023

Dear students and colleagues,


The time has come for a new meeting at the Bucharest Computer Vision Reading Group, the last of this year, where we will continue the discussion about multi-modal Transformers with a last-minute paper, presented at NeurIPS 2023 just last week (when we held the last reading group) and, amazingly, published by the same research group!


"4M: Massively Multimodal Masked Modeling"

David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir. NeurIPS 2023

Link to the paper: https://arxiv.org/abs/2312.06647

There is no such thing as a coincidence! So, I am very excited about tomorrow's discussion, where we will definitely learn and discover many interesting things!

So, we look forward to seeing you tomorrow, Tuesday, December 19, 2023, at 10:00 AM, in room 303, 3rd floor, in the PRECIS building at Universitatea POLITEHNICA din București ... as usual!


Have a beautiful day!

Prof. Univ. Dr. Marius Leordeanu


19 December 2023

What a fantastic day of connecting to real science at the Bucharest Computer Vision Reading Group!


Today Prof. Univ. Dr. Marius Leordeanu presented a recent work in a very interesting direction, which connects Transformer models to multi-task and self-supervised learning.

Bachmann, Roman, David Mizrahi, Andrei Atanov, and Amir Zamir. "Multimae: Multi-modal multi-task masked autoencoders." In European Conference on Computer Vision, pp. 348-367. Cham: Springer Nature Switzerland, 2022.

Link to the paper: https://arxiv.org/pdf/2204.01678.pdf

See you next Tuesday!

If it is Tuesday, it is Bucharest Computer Vision Reading Group time! Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, Universitatea POLITEHNICA din București, entrance from Bulevardul Maniu.


12 December 2023

We are very excited about the second meeting of our Bucharest Computer Vision Reading Group, together with dear students, professors and colleagues, with fascinating discussions and new research ideas!


Prof. Univ. Dr. Marius Leordeanu presented a very interesting paper from CVPR 2023: Kang, Dahyun, Piotr Koniusz, Minsu Cho, and Naila Murray. "Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19627-19638. 2023.


You can find the paper below:

https://openaccess.thecvf.com/.../Kang_Distilling_Self...

If it is Tuesday, it is Bucharest Computer Vision Reading Group time! Find us every Tuesday at 10:00 AM at the same address: room 303, 3rd floor, PRECIS building, University Politehnica of Bucharest, entrance from Bulevardul Maniu.


28 November 2023

Dear friends, we are very excited to restart the Bucharest Computer Vision Reading Group that we started together in 2016!


Our meetings are open to all those who want to be up to date with new discoveries in computer vision, deep learning, natural language processing and everything related to artificial intelligence! We invite and welcome all those who are passionate about science to discover the best scientific works in the field, to join interesting discussions, and to find new ideas for research that can lead to future scientific works and new projects for the AI community in Romania!

Let's all gather around some wonderful ideas that can make our world better!


We are waiting for you every Tuesday starting at 10:00 AM, in room 303, 3rd floor, in the PRECIS building at the University Politehnica of Bucharest!


Here we are at our first meeting after more than 3 years, happy to be together in large numbers, curious and passionate about ideas and interesting discussions.


21 November 2023

Prof. Univ. Dr. Marius Leordeanu after today's presentation

21 November 2023


One of our first meetings at The Institute of Mathematics of the Romanian Academy!