Semantics-Aware Image to Image Translation and Domain Transfer (Paper)

Image-to-image translation is the problem of transferring an image from a source domain to a target domain. We present a new method to transfer the underlying semantics of an image even when there are geometric changes across the two domains. Specifically, we present a Generative Adversarial Network (GAN) that can transfer semantic information presented as segmentation masks. Our main technical contribution is an encoder-decoder based generator architecture that jointly encodes the image and its underlying semantics and translates both simultaneously to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve the underlying semantic label maps. We demonstrate the effectiveness of our approach in multiple object transfiguration and domain transfer tasks through qualitative and quantitative experiments. The results show that our method is better at transferring image semantics than state-of-the-art image-to-image translation methods.

Semantic Mapping for Orchard Environments by Merging Two-Sides Reconstructions of Tree Rows (Paper)

Measuring semantic traits for phenotyping is an essential but labor-intensive activity in horticulture. To improve the accuracy of such measurements and to automate the process, we consider the problem of building coherent three dimensional (3D) reconstructions of orchard rows. Even though 3D reconstructions of side views can be obtained using standard mapping techniques, merging the two side views is difficult due to the lack of overlap between the two partial reconstructions. Our first main contribution in this paper is a novel method that utilizes global features and semantic information to obtain an initial solution aligning the two sides. Our mapping approach then refines the 3D model of the entire tree row by integrating semantic information that is common to both sides and extracted using our novel robust detection and fitting algorithms. Next, we present a vision system to measure semantic traits from the optimized 3D model, which is built from RGB or RGB-D data captured using only a camera. Specifically, we show how canopy volume, trunk diameter, tree height and fruit count can be automatically obtained in real orchard environments.


A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple Orchards (Paper)

We present new methods for apple detection and counting based on recent deep learning approaches and compare them with state-of-the-art results based on classical methods. Our goal is to quantify performance improvements by neural network-based methods compared to methods based on classical approaches. We evaluate the performance of three fruit detection methods and two fruit counting methods on six datasets. Results indicate that the classical detection approach still outperforms the deep learning based methods in the majority of the datasets. For fruit counting though, the deep learning based approach performs better for all of the datasets.

Vision-Based Preharvest Yield Mapping for Apple Orchards (Paper)

We present an end-to-end computer vision system for mapping yield in an apple orchard using images captured from a single camera. Our main technical contributions are (i) a semi-supervised clustering algorithm that utilizes colors to identify apples, and (ii) an unsupervised clustering method that utilizes spatial properties to estimate fruit counts from apple clusters having arbitrarily complex geometry. Additionally, we utilize camera motion to merge the counts across multiple views. Results indicate that the detection method achieves F1-measures of 0.95–0.97 across multiple color varieties and lighting conditions. The counting method achieves an accuracy of 89%–98%. Additionally, we report merged fruit counts from both sides of the tree rows. Our yield estimation method achieves an overall accuracy of 91.98%–94.81% across different datasets.
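The color-based identification step can be illustrated with a minimal nearest-centroid sketch. This is not the paper's algorithm: the seed labels, example colors, and plain RGB distance below are illustrative assumptions showing how a few hand-labelled pixels can seed a semi-supervised pixel classifier.

```python
import numpy as np

def classify_pixels(pixels, seeds):
    """Assign each pixel the label of its nearest seed-color centroid.

    pixels: (n, 3) array of RGB values.
    seeds:  dict mapping a label to an (m, 3) array of hand-labelled colors.
    """
    labels = list(seeds)
    # One centroid per label, averaged from the user-provided examples.
    centroids = np.array([seeds[label].mean(axis=0) for label in labels])
    # Squared Euclidean distance from every pixel to every centroid.
    d2 = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return [labels[i] for i in d2.argmin(axis=1)]
```

A few labelled "apple" and "background" pixels define the centroids; every remaining pixel is then assigned to whichever centroid is closest in color space.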


Registering Reconstructions of the Two Sides of Fruit Tree Rows (Paper)

We consider the problem of building accurate three dimensional (3D) reconstructions of orchard rows. This problem arises in many applications including yield mapping and measuring traits (e.g. trunk diameters) for phenotyping. While 3D reconstructions of side views can be obtained using standard methods, merging the two side-views is difficult due to the lack of overlap between the two partial reconstructions. We present a novel method that utilizes global features to constrain the solution. Specifically, we use information from the silhouettes and the ground plane for alignment.


Apple Counting using Convolutional Neural Networks (Paper)

Estimating accurate and reliable fruit and vegetable counts from images in real-world settings such as orchards is a challenging problem that has received significant recent attention. In practice, fruits are often clustered together. Therefore, methods that only detect apples fail to offer general solutions to estimate accurate fruit counts. In this work, we formulate fruit counting from images as a multi-class classification problem and solve it by training a Convolutional Neural Network. We first evaluate the per-image accuracy of our method and compare it with a state-of-the-art method based on Gaussian Mixture Models (GMMs) over four test datasets. Our network outperforms it in three out of four datasets with a maximum of 94% accuracy. Next, we use the method to estimate yield for two datasets for which we have ground truth. The estimates are 96–97% accurate.
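The classification framing can be sketched as follows: the network (omitted here) would output, for each detected cluster, a probability over count classes 1..K, and a per-image estimate sums the most likely class of each cluster. This is a schematic of the counting rule only, under that assumed setup, not the paper's network:

```python
import numpy as np

def image_count(cluster_probs):
    """Sum the most likely count class over all clusters in an image.

    cluster_probs: (n_clusters, K) array; column j holds the predicted
    probability that a cluster contains j + 1 apples.
    """
    probs = np.asarray(cluster_probs)
    per_cluster = probs.argmax(axis=1) + 1   # class index 0 means 1 apple
    return int(per_cluster.sum())
```

With two detected clusters classified as holding 2 and 1 apples, the image total is 3.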

Active View Planning for Counting Apples in Orchards (Paper)

We consider an agricultural automation scenario where a robot, equipped with a camera mounted on a manipulator, is charged with counting the number of apples in an orchard. We focus on the subtask of planning views so as to accurately estimate the number of apples in an apple cluster. We present a method to efficiently enumerate combinatorially distinct world models and to compute the most likely model from one or more views. These are incorporated into single and multi-step planners. We evaluate these planners in simulation as well as with experiments on a real robot.
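The most-likely-model computation can be illustrated with a deliberately simplified observation model. The binomial visibility model and the `p_vis` parameter below are hypothetical stand-ins, not the paper's enumeration of combinatorially distinct world models; the sketch only shows how per-view observations combine into a MAP count.

```python
from math import comb

def view_likelihood(obs, true_count, p_vis=0.7):
    """P(seeing `obs` apples in one view | cluster holds `true_count`),
    assuming each apple is independently visible with probability p_vis."""
    if obs > true_count:
        return 0.0
    return comb(true_count, obs) * p_vis ** obs * (1 - p_vis) ** (true_count - obs)

def map_count(observations, max_count=10, p_vis=0.7):
    """Most likely apple count given the visible counts from several views,
    under a uniform prior over 1..max_count."""
    posterior = {k: 1.0 for k in range(1, max_count + 1)}
    for obs in observations:
        for k in posterior:
            posterior[k] *= view_likelihood(obs, k, p_vis)
    return max(posterior, key=posterior.get)
```

Under this model, three views showing 3, 4, and 3 apples make a true count of 4 the MAP estimate: the count must be at least 4, and higher counts are penalized for the apples that were never seen.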

Vision-Based Apple Counting and Yield Estimation (Paper)

We present a fruit counting algorithm which takes segmented and registered images of apple clusters as input. It outputs the number and locations of individual apples in each cluster. Our primary technical contributions are a representation based on a mixture of Gaussians, and a novel selection criterion to choose the number of components in the mixture. The method is experimentally verified on four different datasets using images acquired by a vision platform mounted on an aerial robot, a ground vehicle and a hand-held device. The accuracy of the counting algorithm itself is 91%. Coupled with segmentation and registration, it achieves 81–85% accuracy, which is significantly higher than existing image-based methods.
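A minimal version of the mixture-of-Gaussians idea can be sketched with EM and BIC standing in for the paper's selection criterion. The isotropic components, farthest-point initialization, variance floor, and the use of BIC are all simplifying assumptions for illustration:

```python
import numpy as np

def _farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the point
    farthest from the current set of means."""
    means = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(means)[None]) ** 2).sum(-1).min(axis=1)
        means.append(X[d2.argmax()])
    return np.array(means)

def fit_isotropic_gmm(X, k, iters=60):
    """EM for a k-component isotropic Gaussian mixture; returns (means, BIC)."""
    n, d = X.shape
    mu = _farthest_point_init(X, k)
    var = np.full(k, X.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)               # (n, k)
        log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        r = np.exp(log_p - log_p.max(axis=1, keepdims=True))          # E-step
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)                                            # M-step
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = np.maximum((r * d2).sum(axis=0) / (nk * d), 1e-2)  # variance floor
        pi = nk / n
    log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
    ll = np.logaddexp.reduce(log_p, axis=1).sum()
    n_params = k * (d + 1) + (k - 1)          # means, variances, weights
    return mu, n_params * np.log(n) - 2.0 * ll

def count_by_bic(X, max_k=5):
    """Choose the number of components (apples) that minimizes BIC."""
    bics = [fit_isotropic_gmm(X, k)[1] for k in range(1, max_k + 1)]
    return int(np.argmin(bics)) + 1
```

Given pixel coordinates sampled from an apple cluster, each Gaussian component models one apple, and the selection criterion decides how many apples best explain the data.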

Surveying Apple Orchards with a Monocular Vision System (Paper)

We present computer vision algorithms to collect yield related information in an apple orchard using images collected from a single camera. The goal of our system is to give farmers the capability to use their phones or digital cameras to record images and obtain yield related parameters. There are two challenges in this setup which necessitate novel methods: (i) It is very difficult to generate dense matches using standard image features. (ii) The constrained geometry of the setup causes existing structure from motion algorithms to fail. We present a novel piece-wise incremental structure from motion technique to register and reconstruct the apples which is used for extracting count and diameter information. We validate our approach by presenting results from multiple field trials.

Semantic Mapping of Orchards (Paper)

We present a method to construct a semantic map of an apple orchard using a LIDAR and a camera rigidly attached to each other. The system can capture the map as a standalone sensor unit that is lightweight and can be mounted on a variety of platforms. At the geometry level, we present a new method to associate image features captured by the camera with 3D points captured by the LIDAR. We then use this method to register 3D point-clouds onto a common frame. We show that our association method yields superior registration performance compared to common methods which work in indoor or urban settings. At the semantic level, the apples are identified as distinct objects. Their locations and diameters are extracted as relevant attributes. As an example, a semantic map of an orchard row is constructed.
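Associating LIDAR points with image features presupposes projecting the 3D points into the camera. A textbook pinhole projection suffices to illustrate that step; the intrinsics K and the LIDAR-to-camera extrinsics R, t below are generic placeholders, not the paper's calibration:

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project LIDAR points into the image with a pinhole camera model.

    points_3d: (n, 3) points in the LIDAR frame.
    K: (3, 3) camera intrinsics; R, t: LIDAR-to-camera rotation/translation.
    """
    cam = R @ points_3d.T + t[:, None]        # (3, n) camera-frame points
    uv = K @ cam                              # homogeneous pixel coordinates
    return (uv[:2] / uv[2]).T                 # (n, 2) pixel coordinates
```

Once each 3D point has pixel coordinates, nearby image features can be matched to it, which is the prerequisite for the cross-modal registration described above.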

Robotic Surveying of Apple Orchards (Paper)

We present a novel system for surveying apple orchards by counting apples and estimating apple diameters. Existing surveying systems resort to active sensors, or high-resolution close-up images under controlled lighting conditions. The main novelty of our system is the use of a traditional low-resolution stereo system mounted on a small aerial vehicle. Vision processing in this setup is challenging because apples occupy a small number of pixels and are often occluded by either leaves or other apples. After presenting the system setup and our view-planning methodology, we present a method to match and combine multiple views of each apple to circumvent these challenges and report results from field trials. We conclude the paper with an experimental analysis of the diameter estimation error.