PhD position in Machine Learning and Computer Vision at Queen Mary University of London


1 PhD position in the area of Computer Vision and Machine Learning for the analysis of multimedia data is available in the Multimedia and Vision group of the School of Electronic Engineering and Computer Science at Queen Mary University of London. At a methodological level, the work will focus on the development of novel Machine Learning methods for learning from multimodal data, on learning with efficient architectures, and on learning from few or no annotations. For more details please see the end of this page.


For further information, please contact Prof. Patras (i.patras@qmul.ac.uk) and/or Dr Tzimiropoulos (g.tzimiropoulos@qmul.ac.uk) with an email whose subject includes the string: [PhD-2023]


About the School of Electronic Engineering and Computer Science at Queen Mary

The PhD Studentship will be based in the School of Electronic Engineering and Computer Science (EECS) at Queen Mary University of London, a research-intensive university and a member of the Russell Group. The School of EECS is 8th in the UK for the quality of its computer science research and 7th in the UK for the quality of its electronic engineering research (REF 2021), and Queen Mary is in the top 100 universities worldwide for Computer Science (Times Higher Education ranking). The School is a dynamic community of approximately 350 PhD students and 80 research assistants.

Team

The student will be based in the Multimedia and Vision group in the School of EECS. The School has one of the largest Computer Vision teams in the UK and a very strong team in Computational Linguistics.


For further information about research in the school of Electronic Engineering and Computer Science, please visit: http://eecs.qmul.ac.uk/research/.


Eligibility

  • The candidate should hold, or be expected to obtain, an MSc in Electronic Engineering, Computer Science, or a closely related discipline.

  • The position is available to China Scholarship Council applicants.



Computing Infrastructure

The team has a Deep Learning computing infrastructure with over 256 CPU cores, 6 large GPU servers with a total of 175,248 CUDA (GPU) cores, and 36 TB of storage.



Projects

Multimodal Machine Learning for the detection of misinformation/disinformation


In this project we will address the problem of detecting and localising multimedia content (i.e., images, video, and text) that spreads misinformation and/or disinformation. We will treat this as an incremental retrieval problem in which a human expert (e.g., a journalist) searches multimodal data (e.g., videos, images and/or text) and, in collaboration with the retrieval engine, annotates retrieved items as positive (i.e., fake, manipulated, or out of context) so as to refine the search results. This is in contrast to the vast majority of works in the field, which consider the problem using a single modality (predominantly text) and treat it as a supervised learning problem that assumes the existence of a large annotated dataset of items labelled as spreading misinformation or not. At a methodological level, the work will focus on the development of novel Machine Learning methods at the crossroads of vision and language, building on the work of Patras and Kordopatis-Zilos on fine-grained video similarity [2].
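To make the human-in-the-loop idea concrete, the sketch below shows one form such an incremental retrieval loop could take. It is a minimal illustration under stated assumptions: items and the query are assumed to be pre-embedded in a shared multimodal space (e.g., by a vision-language encoder), and the Rocchio-style query update is a textbook relevance-feedback baseline, not the method the project will develop.

# Minimal sketch of human-in-the-loop retrieval with relevance feedback.
# Assumptions (illustrative, not the project's actual method): items and the
# query are already embedded in a shared multimodal space; the expert's
# labels drive a Rocchio-style update of the query vector.
import numpy as np

def search(query_vec, item_vecs, k=10):
    # Cosine similarity between the query and every item; return top-k indices.
    q = query_vec / np.linalg.norm(query_vec)
    X = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    return np.argsort(-(X @ q))[:k]

def refine(query_vec, item_vecs, positives, negatives,
           alpha=1.0, beta=0.75, gamma=0.15):
    # Rocchio-style update: move the query towards items the expert marked as
    # positive (fake/manipulated/out-of-context) and away from negatives.
    q = alpha * query_vec
    if positives:
        q += beta * item_vecs[positives].mean(axis=0)
    if negatives:
        q -= gamma * item_vecs[negatives].mean(axis=0)
    return q

rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 512))   # stand-in for multimodal item embeddings
query = rng.normal(size=512)           # stand-in for the initial query embedding
for _ in range(3):
    top = search(query, items)
    # In practice the journalist labels these; here the first two are
    # marked positive and the next two negative, purely for illustration.
    query = refine(query, items, positives=list(top[:2]), negatives=list(top[2:4]))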


Synthetic Data: Learning and Generation

In this project we will address the problem of learning from, and generating, synthetic data. The project will build on recent work by Patras, Tzelepis, Oldfield and Tzimiropoulos published at BMVC, ICCV and ICLR.


Multi-modal Understanding of Emotions

This project is in the area of Computer Vision and Machine Learning for recognizing human non-verbal behaviour and emotions. Despite advances in Deep Learning, emotion recognition technology is not yet good enough to be part of a real-world human-machine interaction system. This project will go beyond the bulk of existing research efforts, which focus on single-modal emotion perception (e.g., using the face only or audio only), by undertaking fundamental research in multi-modal video perception and deep learning in order to advance the state of the art in emotion recognition, building upon prior work by Tzimiropoulos & Patras (the supervisors).
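As a point of reference for what multi-modal perception involves, the sketch below shows a simple late-fusion baseline combining face and audio features for emotion classification. The encoder stand-ins, feature dimensions and fusion-by-concatenation design are illustrative assumptions only, not the supervisors' architecture; the project will develop well beyond such baselines.

# Minimal late-fusion sketch for multi-modal emotion recognition (PyTorch).
# The linear layers stand in for pretrained face and audio encoders; all
# names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionEmotionNet(nn.Module):
    def __init__(self, face_dim=512, audio_dim=128, n_emotions=7):
        super().__init__()
        self.face_enc = nn.Linear(face_dim, 256)    # stand-in face encoder
        self.audio_enc = nn.Linear(audio_dim, 256)  # stand-in audio encoder
        # Classify from the concatenated per-modality embeddings.
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(512, n_emotions))

    def forward(self, face_feats, audio_feats):
        z = torch.cat([self.face_enc(face_feats),
                       self.audio_enc(audio_feats)], dim=-1)
        return self.head(z)

model = LateFusionEmotionNet()
logits = model(torch.randn(8, 512), torch.randn(8, 128))  # batch of 8 clips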



References

[1] M. Bishay, G. Zoumpourlis, I. Patras, "TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition", British Machine Vision Conference (BMVC), 2019.

[2] G. Kordopatis-Zilos, C. Tzelepis, S. Papadopoulos, I. Kompatsiaris, I. Patras, "DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval", International Journal of Computer Vision, 2022.