Abstract. We propose a novel framework to learn the spatiotemporal variability in longitudinal 3D shape data sets, which contain observations of objects that evolve and deform over time. This problem is challenging since surfaces come with arbitrary parameterizations and thus need to be spatially registered. Moreover, different deforming objects, also called 4D surfaces, evolve at different speeds and thus need to be temporally aligned. We solve this spatiotemporal registration problem using a Riemannian approach. We treat a 3D surface as a point in a shape space equipped with an elastic Riemannian metric that measures the amount of bending and stretching that the surfaces undergo. A 4D surface can then be seen as a trajectory in this space. With this formulation, the statistical analysis of 4D surfaces can be cast as the problem of analyzing trajectories embedded in a nonlinear Riemannian manifold. However, performing the spatiotemporal registration, and subsequently computing statistics, on such nonlinear spaces is not straightforward since these tasks rely on complex nonlinear optimizations. Our core contribution is the mapping of the surfaces to the space of Square-Root Normal Fields (SRNFs), where the L2 metric is equivalent to the partial elastic metric in the space of surfaces. Thus, by solving the spatial registration in the SRNF space, the problem of analyzing 4D surfaces becomes the problem of analyzing trajectories embedded in the SRNF space, which has a Euclidean structure. In this paper, we develop the building blocks that enable such analysis. These include: (1) the spatiotemporal registration of arbitrarily parameterized 4D surfaces in the presence of large elastic deformations and large variations in their execution rates; (2) the computation of geodesics between 4D surfaces; (3) the computation of statistical summaries; and (4) the synthesis of random 4D surfaces.
Publisher site | PDF (Arxiv) | Project Page and Code | Video Tutorial
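For context, the SRNF map underlying this framework has a standard closed form in the literature; the notation below (a parametrized surface f with normal field n) is chosen here for illustration and does not appear in the abstract:

```latex
% SRNF map (standard definition from the SRNF literature; notation ours):
% f : D -> R^3 is a parametrized surface with normal field n(u), u in D.
\[
  Q(f)(u) \;=\; \frac{n(u)}{\sqrt{\lVert n(u) \rVert}}
\]
% The partial elastic distance between two surfaces then reduces to the
% flat L^2 distance between their SRNFs:
\[
  d(f_1, f_2) \;=\; \lVert Q(f_1) - Q(f_2) \rVert_{L^2}
\]
```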
Abstract. Modern deep learning methods constitute incredibly powerful tools to tackle a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial provides an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian Neural Networks, i.e., Stochastic Artificial Neural Networks trained using Bayesian methods.
PDF (Publisher) | PDF (Arxiv) with Supplementary Material | Project Page
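As a flavour of the toolset this tutorial covers, below is a minimal sketch of a Bayesian linear layer trained by variational inference (the "Bayes by Backprop" approach). The layer, its hyperparameters, and the recipe in the comments are illustrative assumptions, not the tutorial's reference code:

```python
# Minimal sketch of a Bayesian linear layer with a diagonal-Gaussian
# variational posterior over its weights (illustrative, not reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        # Variational posterior q(w) = N(mu, sigma^2), with sigma = softplus(rho).
        self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.prior_std = prior_std

    def forward(self, x):
        sigma = F.softplus(self.w_rho)
        # Reparameterization trick: sample weights while keeping gradients
        # with respect to the variational parameters mu and rho.
        w = self.w_mu + sigma * torch.randn_like(sigma)
        return x @ w.t()

    def kl(self):
        # Closed-form KL(q || p) between diagonal Gaussians, p = N(0, prior_std^2).
        sigma = F.softplus(self.w_rho)
        return (torch.log(self.prior_std / sigma)
                + (sigma ** 2 + self.w_mu ** 2) / (2 * self.prior_std ** 2)
                - 0.5).sum()
```

Training minimizes the negative ELBO, i.e., the data negative log-likelihood plus the summed kl() terms; at test time, averaging several stochastic forward passes yields a prediction, and their spread provides the uncertainty estimate.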
Abstract. Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed by matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by the growing success of deep learning in solving various 2D and 3D vision problems, deep learning-based stereo depth estimation has attracted growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this article, we provide a comprehensive survey of this new and continuously growing field of research, summarize the commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we conjecture what the future may hold for deep learning-based stereo matching for depth estimation.
PDF (Publisher) | PDF (Arxiv) | Code | Project Page
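Many of the surveyed pipelines regress disparity from a cost volume using a differentiable soft argmin. A minimal sketch of that step, with tensor shapes and names assumed here rather than taken from any particular paper:

```python
# Soft-argmin disparity regression over a matching cost volume, as used by
# many deep stereo networks. Shapes and names are illustrative assumptions.
import torch

def soft_argmin_disparity(cost_volume):
    """cost_volume: (batch, max_disparity, height, width) matching costs.
    Returns a sub-pixel disparity map of shape (batch, height, width)."""
    # Lower cost = better match, so a softmax over the negated costs gives a
    # probability distribution over candidate disparities at each pixel.
    prob = torch.softmax(-cost_volume, dim=1)
    disparities = torch.arange(cost_volume.shape[1],
                               dtype=prob.dtype, device=prob.device)
    # Expected disparity under that distribution: differentiable and sub-pixel.
    return (prob * disparities.view(1, -1, 1, 1)).sum(dim=1)
```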
Abstract. We propose a deep reinforcement learning-based solution for the 3D reconstruction of objects of complex topologies from a single RGB image. We use a template-based approach. However, unlike previous template-based methods, which are limited to the reconstruction of 3D objects of fixed topology, our approach simultaneously learns the geometry and the topology of the target 3D shape in the input image. To this end, we propose a neural network that learns to deform a template to fit the geometry of the target object. Our key contribution is a novel reinforcement learning framework that enables the network to also learn how to adjust, using pruning operations, the topology of the template to best fit the topology of the target object. We train the network in a supervised manner using a loss function that enforces smoothness and penalizes long edges in order to ensure the high visual plausibility of the reconstructed 3D meshes. We evaluate the proposed approach on standard benchmarks such as ShapeNet, and in the wild using unseen real-world images. We show that the proposed approach outperforms the state-of-the-art in terms of the visual quality of the reconstructed 3D meshes, and also generalizes well to out-of-category images.
PDF (Publisher) | PDF (Arxiv) | Project Page
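By way of illustration, the "penalizes long edges" term in the training loss can be realized as a simple mesh regularizer of the following kind; this is a generic sketch, not the paper's exact loss:

```python
# Generic edge-length regularizer for mesh reconstruction: penalizing long,
# stretched edges encourages visually plausible meshes. Names are ours.
import torch

def edge_length_loss(vertices, edges):
    """vertices: (V, 3) float tensor; edges: (E, 2) long tensor of vertex indices."""
    v0 = vertices[edges[:, 0]]
    v1 = vertices[edges[:, 1]]
    # Mean squared edge length over the mesh.
    return ((v0 - v1) ** 2).sum(dim=1).mean()
```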
Abstract. Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large cropping areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challenging task because weed and crop plants can be similar in colour, texture and shape, a similarity that can be exacerbated further by the imaging conditions and the geographic or weather conditions under which the images are recorded. Advanced machine learning techniques can be used to recognise weeds from imagery. In this paper, we have investigated five state-of-the-art deep neural networks, namely VGG16, ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their performance for weed recognition. We have used several experimental settings and multiple dataset combinations. In particular, we constructed a large weed-crop dataset by combining several smaller datasets, mitigated class imbalance through data augmentation, and used this dataset to benchmark the deep neural networks. We investigated the use of transfer learning by preserving the pre-trained weights for feature extraction and fine-tuning them using the images of crop and weed datasets. We found that VGG16 performed better than the others on small-scale datasets, while ResNet-50 performed better than the other deep networks on the large combined dataset.
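A minimal sketch of the two-stage transfer-learning recipe described above (pre-trained weights as a feature extractor, followed by fine-tuning); the backbone choice, input size, class count, and learning rates are illustrative assumptions:

```python
# Two-stage transfer learning with a pre-trained backbone (illustrative sketch).
import tensorflow as tf

NUM_CLASSES = 10  # illustrative; the combined weed-crop dataset differs

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # stage 1: pre-trained weights act as a fixed feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Stage 2: after training the new head, unfreeze the backbone and fine-tune
# it end-to-end at a much lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
```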
Abstract. Australia has a reputation for producing a reliable supply of high-quality barley in a contaminant-free climate. As a result, Australian barley is highly sought after by malting, brewing, distilling, and feed industries worldwide. Barley is traded as a variety-specific commodity on the international market for food, brewing and distilling end-use, as the intrinsic quality of the variety determines its market value. Manual identification of barley varieties by the naked eye is challenging and time-consuming for all stakeholders, including growers, grain handlers and traders. Current industrial methods for identifying barley varieties rely on molecular protein-weight or DNA-based technologies, which are not only time-consuming and costly but also require specialised laboratory equipment. On grain receival, there is a need for efficient and low-cost solutions for barley classification to ensure accurate and effective variety segregation. This paper proposes an efficient deep learning-based technique that can classify barley varieties from RGB images. The proposed technique takes only four milliseconds to classify an RGB image, outperforms the baseline method, and achieves a barley classification accuracy of 94% across 14 commercial barley varieties (some highly genetically related).
PDF | Code | Project Page
Abstract. 3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities. Since 2015, image-based 3D reconstruction using convolutional neural networks (CNNs) has attracted increasing interest and demonstrated impressive performance. Given this new era of rapid evolution, this article provides a comprehensive survey of the recent developments in this field. We focus on the works which use deep learning techniques to estimate the 3D shape of generic objects from either single or multiple RGB images. We organize the literature based on the shape representations, the network architectures, and the training mechanisms they use. While this survey is intended for methods that reconstruct generic objects, we also review some of the recent works which focus on specific object classes such as human body shapes and faces. We provide an analysis and comparison of the performance of some key papers, summarize some of the open problems in this field, and discuss promising directions for future research.
PDF | Code | Project Page
Abstract. Cost-based image patch matching is at the core of various techniques in computer vision, photogrammetry, and remote sensing. When the subpixel disparity between the reference patch in the source and target images is required, either the cost function or the target image has to be interpolated. While cost-based interpolation is the easiest to implement, multiple works have shown that image-based interpolation can increase the accuracy of subpixel matching, but usually at the cost of expensive search procedures. This is problematic, especially for computation-intensive applications such as stereo matching or optical flow computation. In this paper, we show that closed-form formulae for subpixel disparity computation exist for the case of one-dimensional matching, e.g., rectified stereo images where the search space is one-dimensional, when using the standard NCC, SSD, and SAD cost functions. We then demonstrate how to generalize the proposed formulae to high-dimensional search spaces, which is required for unrectified stereo matching and optical flow extraction. We also compare our results with traditional cost-volume interpolation formulae as well as with state-of-the-art cost-based refinement methods, and show that the proposed formulae bring a small improvement over the state-of-the-art cost-based methods in the case of one-dimensional search spaces, and a significant improvement when the search space is two-dimensional.
PDF | Code | Project Page
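For reference, the traditional cost-volume interpolation that the paper compares against fits a parabola through the costs at the best integer disparity and its two neighbours; the paper's own closed-form, image-based formulae are derived in the text and not reproduced here:

```latex
% Standard parabola-fit refinement over the cost volume: c_{-1}, c_0, c_{+1}
% are the matching costs at the best integer disparity d and its neighbours.
\[
  \hat{d} \;=\; d \;+\; \frac{c_{-1} - c_{+1}}{2\left(c_{-1} - 2c_0 + c_{+1}\right)}
\]
```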
Abstract. How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has a main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to existing metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). They completely avoid the shrinkage that results from the edge-collapse and node-split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variation) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.
PDF | Code | Project Page
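The SRVF that this representation extends has a standard form for a Euclidean curve β : [0, 1] → R³ (notation chosen here for illustration, not taken from the abstract):

```latex
% Standard SRVF of a curve beta : [0,1] -> R^3 (the representation extended
% in the paper to tree-shaped 3D objects):
\[
  q(t) \;=\; \frac{\dot{\beta}(t)}{\sqrt{\lVert \dot{\beta}(t) \rVert}}
\]
% Under this map, the elastic (bending + stretching) distance between two
% curves becomes the L^2 distance between their SRVFs, optimized over
% rotations and reparametrizations.
```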
Abstract. We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images. Prior works that use mesh representations are template-based and are thus limited to the reconstruction of objects that have the same topology as the template. Methods that use volumetric grids as intermediate representations are computationally expensive, which limits their application in real-time scenarios. In this paper, we propose a novel end-to-end method that reconstructs 3D objects of arbitrary topology from a monocular image. It is composed of (1) a Vertex Generation Network (VGN), which predicts the initial 3D locations of the object's vertices from an input RGB image, (2) a differentiable triangulation layer, which learns, in an unsupervised manner and using a novel reinforcement learning algorithm, the best triangulation of the object's vertices, and (3) a hierarchical mesh refinement network that uses graph convolutions to refine the initial mesh. Our key contribution is the learnable triangulation process, which recovers the topology of the input shape in an unsupervised manner. Our experiments on the ShapeNet and Pix3D benchmarks show that the proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time.
PDF | Code | Project Page
Abstract. The rapid advances in Deep Learning (DL) techniques have enabled rapid detection, localization, and recognition of objects from images or videos. DL techniques are now being used in many applications related to agriculture and farming. Automatic detection and classification of weeds can play an important role in weed management and so contribute to higher yields. Weed detection in crops from imagery is inherently a challenging problem because both weeds and crops have similar colors (‘green-on-green’), and their shapes and textures can be very similar during the growth phase. Also, a crop in one setting can be considered a weed in another. In addition to their detection, the recognition of specific weed species is essential so that targeted controlling mechanisms (e.g. appropriate herbicides and correct doses) can be applied. In this paper, we review existing deep learning-based weed detection and classification techniques. We cover the detailed literature on four main procedures, i.e., data acquisition, dataset preparation, DL techniques employed for the detection, localization and classification of weeds in crops, and evaluation metrics. We found that most studies applied supervised learning techniques and achieved high classification accuracy by fine-tuning pre-trained models on plant datasets, and that past experiments have already achieved high accuracy when a large amount of labelled data is available.
PDF | Code | Project Page
Abstract. Generating textual descriptions of images has been an important topic in computer vision and natural language processing. A number of deep learning-based techniques have been proposed on this topic. These techniques use human-annotated images for training and testing the models, and they require a large amount of training data to perform at their full potential. Collecting images with human-generated captions is expensive and time-consuming. In this paper, we propose an image captioning method that uses both real and synthetic data for training and testing the model. We use a Generative Adversarial Network (GAN)-based text-to-image generator to generate synthetic images, and an attention-based image captioning method trained on both real and synthetic images to generate the captions. We demonstrate the results of our models using both qualitative and quantitative analysis on commonly used evaluation metrics. Our experimental results demonstrate the two-fold benefit of the proposed work: i) it shows the effectiveness of image captioning for synthetic images, and ii) it further improves the quality of the generated captions for real images, understandably because we use additional images for training.
PDF | Code | Project Page
Abstract. The root is an important organ of a plant since it is responsible for water and nutrient uptake. Analyzing and modeling variabilities in the geometry and topology of roots can help in assessing the plant’s health, understanding its growth patterns, and modeling relations between plant species and between plants and their environment. In this article, we develop a framework for the statistical analysis and modeling of the geometry and topology of plant roots. We represent root structures as points in a tree-shape space equipped with a metric that quantifies geometric and topological differences between pairs of roots. We then use these building blocks to compute geodesics, i.e., optimal deformations under the metric between root structures, and to perform statistical analysis on root populations. We demonstrate the utility of the proposed framework through an application to a dataset of wheat roots grown in different environmental conditions. We also show that the framework can be used in various applications including classification and regression.
PDF (Publisher) | PDF (Arxiv) | Code (Github) | Project Page
Abstract. In this article, we introduce a family of elastic metrics on the space of parametrized surfaces in 3D space using a corresponding family of metrics on the space of vector-valued one-forms. We provide a numerical framework for the computation of geodesics with respect to these metrics. The family of metrics is invariant under rigid motions and reparametrizations; hence, it induces a metric on the “shape space” of surfaces. This new class of metrics generalizes a previously studied family of elastic metrics and includes in particular the Square Root Normal Field (SRNF) metric, which has been proven successful in various applications. We demonstrate our framework by showing several examples of geodesics and comparing our results with earlier results obtained from the SRNF framework.
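In generic Riemannian terms (notation chosen here, not the article's), the numerical computation of geodesics with respect to such metrics amounts to minimizing the path energy between two fixed surfaces:

```latex
% Geodesic boundary-value problem solved by the numerical framework:
% among paths of surfaces F : [0,1] -> M with F(0) = f_0 and F(1) = f_1,
% minimize the path energy with respect to the chosen elastic metric g,
\[
  E(F) \;=\; \int_0^1 g_{F(t)}\!\big(\partial_t F(t),\, \partial_t F(t)\big)\, dt
\]
% The minimizer is the geodesic, and its length is the induced distance
% between the two shapes.
```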