Short CV

I'm a Full Professor at the University of Bonn, leading the group on Agricultural Engineering and Robotics. My research focuses on vision systems that can automate the detection and identification of plants and crops in highly challenging environments such as agriculture. I've worked on a range of pattern recognition and computer vision topics, including deep learning, session variability modelling, factor analysis models, and local feature modelling. I have applied these general methods to a variety of tasks, including fine-grained (species-level) classification, object (fruit, plant and people) segmentation, face and bi-modal recognition, and action recognition. Below is a brief summary of some of the work I've been involved with. None of this work would have been possible without the fantastic people that I've worked with.

Efficient Deep Learning Approaches: applied to agricultural robotics and automation

We developed multiple deep learning approaches that can be deployed in resource-limited environments, such as on robots. These deep learning systems have been used for a variety of tasks, including weed classification and semantic segmentation. This work exploited the idea of model distillation to learn a less complex and faster student network from a more complicated teacher network, using low-resolution images for the student network and high-resolution images for the teacher network; see the top image on the right. When combined with the earlier idea of MixDCNNs, this allowed us to develop a framework to trade off complexity and speed against accuracy for the final student networks. Systems trained using these procedures have been deployed and demonstrated on Harvey for sweet pepper harvesting (see the images on the right for an example of peduncle segmentation) and have the potential to be deployed on other robotic platforms such as AgBot II.
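The distillation idea can be illustrated with a minimal sketch. It is not the training code used in this work: the temperature-softened loss is the generic textbook form of distillation, and the function names, the `temperature` and `alpha` parameters, and the average-pooling `downsample` helper are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def downsample(image, factor=2):
    """Average-pool an H x W x C image so the student sees a lower resolution.
    (Hypothetical helper; the resizing used in the original work is not stated.)"""
    h, w = image.shape[:2]
    h2, w2 = h - h % factor, w - w % factor  # crop so both sides divide evenly
    img = image[:h2, :w2]
    return img.reshape(h2 // factor, factor, w2 // factor, factor, -1).mean(axis=(1, 3))

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term (KL divergence from the teacher's
    softened outputs to the student's) and a hard-label cross-entropy term."""
    eps = 1e-12
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=-1)
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps)
    # The temperature**2 factor keeps the soft term on a comparable scale.
    return float(np.mean(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce))
```

In such a setup the teacher logits would come from the high-resolution image and the student logits from `downsample(image)`, so the student learns to reproduce the teacher's outputs from less input data.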

Mixtures and Subset Learning for Deep Networks

This work explored methods to improve classification using deep learning by dividing the data into subsets, with a particular emphasis on the task of fine-grained (species-level) classification. The underlying idea was that classification is often made challenging by small inter-class variations and large intra-class variations. To overcome this, we explored methods to group visually similar classes together and then learn a network for each sub-group (see the top image on the right). This formed the basis of a range of work in subset learning and feature learning that finally led to a formulation termed a “Mixture of Deep Convolutional Neural Networks” (MixDCNNs). This formulation of parallel deep networks enables them to be trained jointly, end-to-end, in a single large network structure (see the bottom image on the right) to achieve state-of-the-art performance. This idea was then employed with compressed DCNNs to provide a trade-off between classification performance and speed.
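As a rough illustration of how several parallel networks might be combined into one output, the sketch below weights each component network by how confident it is about a sample. The gating rule (a softmax over each component's highest class score) and the name `mixture_forward` are assumptions for illustration; the summary above does not spell out the exact formulation.

```python
import numpy as np

def mixture_forward(component_logits):
    """Combine K parallel networks' class scores into one prediction.

    component_logits: array of shape (K, N, C) holding the class scores of
    K component networks for N samples and C classes.
    Returns the mixed scores of shape (N, C) and the weights of shape (K, N).
    """
    # Assumed gating: a component's confidence is its highest class score.
    confidence = component_logits.max(axis=-1)                        # (K, N)
    confidence = confidence - confidence.max(axis=0, keepdims=True)   # stability
    weights = np.exp(confidence)
    weights = weights / weights.sum(axis=0, keepdims=True)            # sums to 1 over K
    # Weighted sum of the component scores for each sample.
    mixed = (weights[..., None] * component_logits).sum(axis=0)       # (N, C)
    return mixed, weights
```

Because every step is differentiable, a combination of this kind can sit on top of the component networks and be trained jointly with them, so each component tends to specialise on the samples it is most confident about.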

Occluded Crop Detection, Quality Estimation and Tracking

Detecting crops in the field is a key first step for enabling automation (e.g. harvesting) and real-time situational awareness for farmers. It is also a very challenging robotic vision problem, as the crop can be occluded by leaves (see the top image on the right for a sweet pepper example) and can be similar in colour to the background (green on green). To solve this, we made use of multi-spectral information in conjunction with a range of methods, from conditional random fields using traditional features (colour, local binary patterns, etc.) through to deep learning. Recently, we have been exploring how deep learning can be used to quickly and efficiently train a detection system for new crops (fruits), which we named the DeepFruits system. It was shown that annotating as few as 25 images was sufficient to obtain competitive performance. We also explored novel methods to combine multi-spectral information; see the bottom images on the right. Current work is exploring how to extend the DeepFruits approach to estimate fruit ripeness using a parallel network structure and, given the system's high accuracy, to demonstrate how it could be used to perform individual fruit tracking from video only, using a tracking-via-detection framework.
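A tracking-via-detection loop can be sketched in a few lines: detections in each new frame are associated with existing tracks by bounding-box overlap. This is a generic greedy intersection-over-union (IoU) matcher written for illustration, not the tracker used in this work; the function names, the `(x1, y1, x2, y2)` box format and the `iou_threshold` value are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def update_tracks(tracks, detections, next_id, iou_threshold=0.3):
    """Associate one frame's detections with existing tracks.

    tracks: dict mapping track id -> last known box.
    Returns (new_tracks, next_id). Tracks without a match are dropped in this
    simplified sketch; unmatched detections start new tracks.
    """
    # Score every (track, detection) pair and consider the best overlaps first.
    candidates = sorted(
        ((iou(box, det), tid, d)
         for tid, box in tracks.items()
         for d, det in enumerate(detections)),
        reverse=True,
    )
    new_tracks, used_tracks, used_dets = {}, set(), set()
    for score, tid, d in candidates:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same fruit
        if tid in used_tracks or d in used_dets:
            continue
        new_tracks[tid] = detections[d]
        used_tracks.add(tid)
        used_dets.add(d)
    for d, det in enumerate(detections):
        if d not in used_dets:
            new_tracks[next_id] = det  # a fruit seen for the first time
            next_id += 1
    return new_tracks, next_id
```

Calling `update_tracks` once per video frame keeps a stable id for each fruit as long as its box overlaps sufficiently between consecutive frames, which is why an accurate detector matters so much for this kind of tracking.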

Factor Analysis Models and Session Variability Modelling

This work extended the use of session variability modelling to face recognition and fine-grained classification. Session variability modelling assumes that a signal is composed of the underlying identity (class or face identity) along with noise. The aim is to model this noise without explicitly defining its type; examples of different sessions can be seen in the images on the right, which show the same person in different conditions. Adapting this idea from speaker recognition to face recognition led to a best journal paper award. Later work then explored how to derive a scalable model for probabilistic linear discriminant analysis, which uses a similar formulation, to achieve linear rather than quadratic complexity.
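One common way to write the "identity plus noise" assumption is as an additive model with a low-rank session subspace. The notation below is introduced here for illustration and is not taken from the papers themselves:

```latex
\mathbf{x}_{i,j} = \mathbf{m} + \mathbf{d}_{i} + \mathbf{U}\,\mathbf{w}_{i,j}
```

Here \mathbf{x}_{i,j} represents the j-th observation (session) of identity i, \mathbf{m} is a global mean, \mathbf{d}_{i} is the identity-specific offset, \mathbf{U} is a low-rank matrix spanning the directions of session variation, and \mathbf{w}_{i,j} is the session factor for that observation. Under this reading, the session term \mathbf{U}\,\mathbf{w}_{i,j} absorbs changes in conditions, so that comparisons can rely on \mathbf{d}_{i} alone.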