Realistic Synthetic Data Generation for Fruit Detection and Precise Fruit Pose Estimation

Precision agriculture enables farmers to collect high-resolution data about the status of their crops and helps them make crucial decisions about irrigation, harvesting, sales, and so on. Phenotyping methods, in turn, enable accurate characterization of plant traits, leading to the development of better plant varieties. Specialty crops (fruits, vegetables, flowers, etc.) are particularly well suited to both precision farming and phenotyping studies because of their high value, high management costs, and variability in growth. The goal of my research is to develop computer vision and deep learning algorithms that form a general infrastructure for precision farming and phenotyping tasks in specialty crops.

One of the main bottlenecks for both of these applications is the lack of a convenient yield monitoring system. In practice, yield estimation relies on images of the trees in each row, captured along an arbitrary or predetermined path from 1 to 3 meters away. Extracting the intended information from these images poses several computer vision challenges. These include detecting fruits, counting fruits in large clusters, recovering the underlying 3D geometry in order to track fruits across image frames, and coping with continuously changing illumination. In contrast, phenotyping studies and other precision farming tasks such as pruning and picking require close-up images. Consequently, in addition to the computer vision problems, we need to address planning (where to take the image from) and manipulation (how to pick fruits). Existing computer vision algorithms do not generalize well to specialty crops. My previous work has laid the foundation for most of these tasks: fruit detection, tracking, counting, recovering the underlying geometry, and view planning for covering the fruits. Our findings indicate that fruit detection is the most fundamental of these tasks and a precursor to all the others. Existing detection methods, including my previous work, perform well given an adequate amount of training data. However, precisely labeling fruit boundaries in images is very laborious. Even for graduate students experienced in computer vision, it takes 10 minutes or more to label all the fruits on an average apple tree.

My idea is to use synthetic data to train deep networks. I can create synthetic data for any fruit with 3D rendering software. Because we design the model ourselves, the labels come for free. Such data, however, has a significant domain gap with real data, so a network trained only on synthetic data is unlikely to perform well on real data. My plan is to reduce this domain gap with deep neural networks. Generative Adversarial Networks (GANs), a system of two neural networks competing against each other, are a deep learning technique capable of transforming an input distribution into a target distribution (in our case, from a set of synthetic images to a set of real images). Recently, they have been very successful in applications such as image stylization and colorization. Existing GANs, however, do not preserve the underlying labels, so their output cannot be used directly for training. My idea is to design a GAN that preserves these labels. State-of-the-art GANs formulate realistic data generation as an image-to-image translation problem. With synthetic data, since we have the exact model of each fruit, we can obtain the pixels belonging to all the fruits (class), the bounding boxes (instance), and the pixels belonging to each fruit (mask). Instead of translating the whole image at once, I plan to translate each instance individually and composite the transformed results into the new translated image, thereby preserving the labels. This network will enable us to stylize the synthetic data to any fruit, lighting condition, and environment. Consequently, we will no longer need to label fruit boundaries by hand.
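The instance-wise compositing step can be illustrated with a minimal sketch. It is not the proposed network: `translate_fn` is a hypothetical stand-in for a trained generator, and the function name, argument layout, and box format are assumptions made for illustration. The sketch only shows why the labels survive: each fruit is translated inside its own bounding box, and only the pixels under its mask are written back, so the masks and boxes of the output are those of the input.

```python
import numpy as np

def translate_and_composite(image, instance_masks, translate_fn):
    """Translate each instance separately and paste it back under its mask.

    image          : H x W x 3 float array (synthetic render)
    instance_masks : list of H x W boolean arrays, one per fruit
    translate_fn   : hypothetical stand-in for a generator; it must return
                     an array with the same shape as the crop it receives
    Returns the composited image and the (x0, y0, x1, y1) box of each instance.
    """
    out = image.copy()
    boxes = []
    for mask in instance_masks:
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            continue  # empty mask: nothing to translate
        y0, y1 = int(ys.min()), int(ys.max()) + 1
        x0, x1 = int(xs.min()), int(xs.max()) + 1
        boxes.append((x0, y0, x1, y1))

        crop = image[y0:y1, x0:x1]
        translated = translate_fn(crop)

        # Write back only the pixels that belong to this instance, so the
        # background and the other fruits are left untouched.
        local = mask[y0:y1, x0:x1]
        out[y0:y1, x0:x1][local] = translated[local]
    return out, boxes

# Toy usage with a placeholder "generator" that just brightens the crop.
image = np.zeros((8, 8, 3))
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:6] = True
result, boxes = translate_and_composite(image, [mask], lambda crop: crop + 1.0)
```

In this toy form the background is left as rendered; in practice it would presumably need its own translation as well, which the sketch does not attempt.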

In addition to the novel domain adaptation for fruit detection, I plan to perform precise fruit localization and reconstruction. 3D reconstructions from existing methods are not accurate enough for precise estimation of fruit pose and size. My previous work provides the ability to track fruits across multiple frames and to obtain an initial 3D reconstruction. I now intend to find a precise geometric representation of the fruits within a cluster. Many fruits (such as apples, oranges, and peaches) can be parameterized as quadric surfaces (ellipsoids, spheres, cylinders, etc.). Consequently, a fruit cluster can be represented by a collection of quadrics. My idea is to use a recurrent neural network (RNN), a type of neural network that operates on sequences, which takes the tracked images of a cluster of fruits as input and outputs a collection of quadrics representing them. At the crux of the method is a novel loss function (a measure of how well the learning method is doing) that uses the camera position and orientation, the initial 3D reconstruction, and the tracked images of the fruits in an indirect manner. Apart from the tracked images, these inputs (the camera poses, depths, and masks of the detected fruits) will be used only at training time. The network will therefore learn a latent representation of the fruits as well as of the camera poses and depth. Afterward, I intend to formulate a novel quadric-based bundle adjustment (a type of optimization for refining visual reconstructions) that fine-tunes the quadric representations, using the network output as the initial solution.
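As background for the quadric representation, the sketch below shows the standard projective-geometry relation between an ellipsoid and its image: a dual quadric Q* projects under a 3 x 4 camera matrix P to the dual conic C* = P Q* P^T, which is an ellipse. This is not the proposed loss function or bundle adjustment; the function names and the example camera values are assumptions chosen for illustration. A relation of this kind is what would allow a predicted quadric to be compared against a fruit's outline in an image.

```python
import numpy as np

def ellipsoid_dual_quadric(center, semi_axes, rotation=None):
    """4 x 4 dual quadric of an ellipsoid with the given center (3,),
    semi-axis lengths (3,), and optional 3 x 3 rotation matrix."""
    T = np.eye(4)
    if rotation is not None:
        T[:3, :3] = rotation
    T[:3, 3] = center
    a, b, c = semi_axes
    canonical = np.diag([a**2, b**2, c**2, -1.0])  # axis-aligned, at origin
    return T @ canonical @ T.T

def project_to_ellipse(Q_dual, P):
    """Project a dual quadric with camera matrix P (3 x 4).
    Returns the ellipse center (2,) and its semi-axes (2,), ascending."""
    C = P @ Q_dual @ P.T          # dual conic of the image outline
    C = C / -C[2, 2]              # scale so that C[2, 2] == -1
    center = -C[:2, 2]
    shape = C[:2, :2] + np.outer(center, center)
    semi_axes = np.sqrt(np.linalg.eigvalsh(shape))
    return center, semi_axes

# Example values (assumed): a 4 cm sphere, 2 m in front of a pinhole camera.
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

radius, depth = 0.04, 2.0
Q = ellipsoid_dual_quadric([0.0, 0.0, depth], [radius] * 3)
center, axes = project_to_ellipse(Q, P)

# For a sphere on the optical axis the outline is a circle around the
# principal point with radius f * r / sqrt(z^2 - r^2).
expected_radius = f * radius / np.sqrt(depth**2 - radius**2)
```

Because the projection is a closed-form function of the quadric and the camera, it can be differentiated with respect to both, which is one reason quadrics are a convenient primitive for a learned prediction followed by an optimization-based refinement.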

My ultimate goal is to advance the state of the art in automation for precision agriculture and data science, and this research is a significant step toward that goal. I envision instance-preserving GANs as the main workhorse for generating training data, including in domains beyond agriculture, as long as we can obtain 3D models that are close to reality. Precise fruit size and pose estimation will lead to better yield estimates and help picking robots plan effectively. In short, the successful completion of this research will contribute to a more sustainable infrastructure for intelligent automation in precision farming and phenotyping.