Locomotion, Navigation, and Classification with Autonomous Intelligent Microrobots

Nathan O. Lambert, Farhan Toddywala, Brian Liao, Eric Zhu, Lydia Lee, and Kristofer S. J. Pister

Contact: nol@berkeley.edu

Paper Link.

Abstract:

Building intelligent autonomous systems at any scale is challenging. The sensing and computation constraints of a microrobot platform make the problems harder. We present improvements to learning-based methods for on-board learning of locomotion, classification, and navigation of microrobots. We show how simulated locomotion can be achieved with model-based reinforcement learning via on-board sensor data distilled into control. Next, we introduce a sparse, linear detector and a Dynamic Thresholding method to FAST Visual Odometry for improved navigation capable of better performance in the noisy regime of mm-scale imagery. We end with a new image classifier capable of classification with fewer than one million multiply-and-accumulate (MAC) operations by combining fast downsampling, efficient layer structures and hard activation functions. These are promising steps toward using state-of-the-art algorithms in the power-limited world of edge-intelligence and microrobots.

Experiments:

SLAM

To evaluate SLAM, we train on the KITTI Odometry Dataset of $1200 \times 375$ 8-bit greyscale images. We use sequences 3, 6, and 0 to represent easy, medium, and hard trajectories, respectively. Note that a "difficult" trajectory is defined as one with more turns, sharper turns, and longer stretches of straight-line motion immediately after a turn, since these instances tend to be where estimates diverge.

We fix the FAST threshold at 50 for all experiments, chosen by cross-validating accuracy across multiple thresholds. We fix the scale of pose estimation for all experiments to the ground-truth scale.
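For context, the sketch below shows one step of the kind of monocular visual-odometry loop evaluated here, assuming the standard FAST + pyramidal Lucas-Kanade + essential-matrix pipeline; the function names and parameters are illustrative, and the translation is rescaled by the ground-truth scale as described above.

```python
import cv2
import numpy as np

def detect_fast(gray, threshold=50):
    """FAST corners as an (N, 1, 2) float32 array for optical flow."""
    fast = cv2.FastFeatureDetector_create(threshold=threshold)
    kps = fast.detect(gray, None)
    return np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)

def vo_step(prev_gray, cur_gray, K, gt_scale, threshold=50):
    """One monocular VO step: detect, track, recover pose up to scale."""
    prev_pts = detect_fast(prev_gray, threshold)
    # Track FAST corners into the current frame with pyramidal Lucas-Kanade.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                  prev_pts, None)
    good = status.ravel() == 1
    p0, p1 = prev_pts[good], cur_pts[good]
    # Relative pose up to scale from the essential matrix.
    E, mask = cv2.findEssentialMat(p1, p0, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p0, K, mask=mask)
    # Monocular VO is scale-ambiguous; we fix the scale to ground truth.
    return R, gt_scale * t
```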

Our Dynamic Thresholding tunes the FAST threshold online for improved performance, as shown below. For Dynamic Thresholding, we target a range of 1000 to 2000 interest points, and we pick rates of increase and decrease of 1.1 and 0.9, respectively. The base FAST threshold is also 50.
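A minimal sketch of how such an online threshold update can be implemented with OpenCV follows; the class name and exact update rule are illustrative, using the bounds and rates quoted above.

```python
import cv2

class DynamicFast:
    """Sketch of Dynamic Thresholding for FAST (hypothetical helper)."""
    def __init__(self, base_threshold=50, lo=1000, hi=2000,
                 increase=1.1, decrease=0.9):
        self.threshold = float(base_threshold)
        self.lo, self.hi = lo, hi
        self.increase, self.decrease = increase, decrease

    def detect(self, gray_frame):
        fast = cv2.FastFeatureDetector_create(
            threshold=int(round(self.threshold)))
        keypoints = fast.detect(gray_frame, None)
        # Online update: too few points -> lower the threshold so the next
        # frame yields more detections; too many points -> raise it.
        if len(keypoints) < self.lo:
            self.threshold *= self.decrease
        elif len(keypoints) > self.hi:
            self.threshold *= self.increase
        return keypoints
```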

Our experiments focus on the scenario of images with additive i.i.d. zero-mean Gaussian noise across pixels, simulating on-chip variation of millimeter-scale cameras. For our first set of experiments, we vary the standard deviation of the sampled additive Gaussian noise from 5 to 60 (in 8-bit pixel-intensity units). The noise used in our experiments is higher than most found in millimeter-scale photography, but could account for other process and computation errors. We measure the Euclidean error between the predicted trajectory and the given ground truth for each sequence.
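For concreteness, the trajectory error and the ratio reported in the plots below can be computed as in this sketch; the function names are illustrative, assuming trajectories are given as arrays of positions.

```python
import numpy as np

def trajectory_mse(pred, gt):
    """Mean squared Euclidean error between predicted and ground-truth
    trajectories, given as N x 3 (or N x 2) arrays of positions."""
    return np.mean(np.sum((pred - gt) ** 2, axis=1))

def error_ratio(mse_standard, mse_dynamic_runs):
    """Standard FAST MSE divided by the mean Dynamic Thresholding MSE
    over repeated runs (n = 10 in the plots below); ratios above 1
    favor Dynamic Thresholding."""
    return mse_standard / float(np.mean(mse_dynamic_runs))
```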

In a second set of experiments, we model a sequence in which images become dynamically corrupted as the robot moves at different velocities, giving a time-varying noise level. Similar to a random walk, we update the Gaussian noise standard deviation at each frame by adding -1, 0, or 1, drawn from a discrete uniform distribution.

The noise standard deviation is capped between 0 and an upper limit (we test upper limits of 15, 30, and 45) to see how performance varies with different ranges of noise intensity during the sequence, as sketched below.
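A minimal sketch of both noise models, assuming 8-bit images and the parameters above (`add_gaussian_noise` and `walk_sigma` are hypothetical helper names):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(image, sigma):
    """Additive i.i.d. zero-mean Gaussian pixel noise on an 8-bit image."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def walk_sigma(sigma, upper):
    """Random-walk update of the noise level: add -1, 0, or +1 drawn from
    a discrete uniform distribution, then cap sigma to [0, upper]."""
    sigma += int(rng.integers(-1, 2))  # discrete uniform over {-1, 0, 1}
    return int(np.clip(sigma, 0, upper))
```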

The ratio of standard FAST MSE over average Dynamic Thresholding FAST MSE (with n = 10) as the upper limit of the standard deviation of the added dynamic i.i.d. Gaussian noise increases from 5 to 60, over KITTI sequences 0, 3, and 6. Here, a higher ratio means Dynamic Thresholding has lower error. Dynamic Thresholding shows a clear trend of improvement as the noise levels increase beyond an upper-limit standard deviation of 30. The standard FAST trajectory MSE is on average 7.3, 44.5, and 5.5 times higher for sequences 0, 3, and 6 than the MSE of the Dynamic Thresholding trajectory when the upper limit of the dynamic noise standard deviation is 30 or greater.

The ratio of standard FAST MSE over average Dynamic Thresholding FAST MSE (with n = 10) as the standard deviation of the added static i.i.d. Gaussian noise increases from 5 to 60, over KITTI sequences 0, 3, and 6. Here, a higher ratio means Dynamic Thresholding has lower error. Results at lower noise levels are less consistent, but still show an improvement with Dynamic Thresholding when the noise level is constant. Dynamic Thresholding shows a clear trend of improvement as the noise levels increase beyond a standard deviation of 25. When the static noise exceeds a standard deviation of 20, the standard FAST trajectory MSE is on average 23.1, 82.9, and 4.7 times higher for sequences 0, 3, and 6 than the MSE of the Dynamic Thresholding trajectory.

[Figure: mapped trajectories, Standard FAST (left) vs. Dynamic Thresholding (right).]

An example mapped trajectory from sequence 0 (about halfway through) with and without Dynamic Thresholding. The red trajectory in both images represents the ground truth and the green trajectory represents the estimate. On the left is the standard FAST algorithm with a fixed threshold of 50; on the right is FAST with Dynamic Thresholding.

Unsupervised Learning for Interest Point Detection (SLIPD)

[Figure: example trajectories, FAST (left) vs. SLIPD (right).]
Our experiments with a Sparse Linear Interest Point Detector (SLIPD) showed that a sparse linear function is not expressive enough to compete with FAST. We show an example of how SLIPD performs on sequence 3 under static Gaussian noise with a standard deviation of 40; it does not yet handle sharp turns of 90 degrees or greater. At this noise level, SLIPD has an MSE of 8860, whereas FAST with a threshold of 50 has an MSE of around 1220. A more expressive function, such as a neural network, will likely be necessary for this method to be a viable replacement for FAST.
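For intuition, a sparse linear detector can be sketched as a single learned sparse filter whose top-scoring responses become interest points; the weights `w_sparse`, the `num_points` budget, and the top-k selection rule below are illustrative stand-ins, not the trained SLIPD.

```python
import cv2
import numpy as np

def slipd_detect(gray, w_sparse, num_points=1500):
    """Score every pixel with a sparse linear filter (2-D correlation)
    and keep the top-k responses as interest points.

    w_sparse: a small float32 kernel with mostly-zero weights, standing
    in for a learned sparse linear function of the local patch.
    """
    scores = cv2.filter2D(gray.astype(np.float32), -1, w_sparse)
    idx = np.argsort(scores.ravel())[::-1][:num_points]
    rows, cols = np.unravel_index(idx, scores.shape)
    return np.stack([cols, rows], axis=1)  # (x, y) coordinates
```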

Low Power Classification

We introduce convolutional layers with larger kernel sizes early on to achieve fast downsampling, shrinking the feature maps early and thereby decreasing the MAC count. Similar to MobileNetV3, we utilize squeeze-and-excite bottlenecks and hard activation functions.
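These building blocks can be sketched in PyTorch as follows; this is a minimal re-implementation of the standard MobileNetV3-style hard sigmoid, hard swish, and squeeze-and-excite modules, not our exact layers.

```python
import torch.nn as nn
import torch.nn.functional as F

class HSigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, a cheap piecewise-linear sigmoid."""
    def forward(self, x):
        return F.relu6(x + 3.0) / 6.0

class HSwish(nn.Module):
    """Hard swish: x * hsigmoid(x), as popularized by MobileNetV3."""
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0

class SqueezeExcite(nn.Module):
    """Squeeze-and-excite: channel-wise gating from globally pooled features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            HSigmoid(),                                    # excite
        )
    def forward(self, x):
        return x * self.gate(x)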

MicroBotNet

MicroBotNet applies fast downsampling to MobileNetV3, including squeeze-and-excite layers, h-swish and h-sigmoid activations, and inverted-residual and linear-bottleneck layers. Here, we show a new downsampling schedule of bottleneck layers to meet microrobot computing-capacity goals. MicroBotNet performs 8× fast downsampling in the first 6 layers because high-resolution layers account for the majority of forward-pass computation cost. We include the width multiplier α, which allows the model to generalize based on one's MAC computation needs; width multipliers of ×0.25, ×0.32, and ×1.00 are included as reference. A minimum feature dimension of 4 × 4 is enforced during downsampling to maintain suitable information capacity in our network, differentiating our downsampling schedule from what is done in larger network designs.
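The sketch below illustrates the fast-downsampling idea with a width multiplier; the channel counts, layer count, and `fast_downsampling_stem` name are placeholders rather than the exact MicroBotNet specification from the table below.

```python
import torch.nn as nn

def conv_bn_hswish(c_in, c_out, stride):
    """3x3 convolution -> batch norm -> hard swish."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.Hardswish(),
    )

def fast_downsampling_stem(alpha=0.32):
    """8x spatial downsampling within the first few layers, where most of
    the forward-pass MACs live; alpha scales every channel count."""
    c = lambda ch: max(8, int(ch * alpha))
    return nn.Sequential(
        conv_bn_hswish(3, c(32), stride=2),       # e.g. 32x32 -> 16x16
        conv_bn_hswish(c(32), c(64), stride=2),   # 16x16 -> 8x8
        conv_bn_hswish(c(64), c(128), stride=2),  # 8x8 -> 4x4 (floor at 4x4)
    )
```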

Model specification of MicroBotNet (SE indicates whether a squeeze-and-excite block is used; s indicates the stride).

Comparison of MicroBotNet ×0.32 and ×0.25 with similar architectures; Top-1 is accuracy on CIFAR-10.

Comparison of MicroBotNet ×1.00 with similar architectures; Top-1 is accuracy on CIFAR-10.

Tradeoff of MAC count and accuracy for different models on CIFAR-10. MicroBotNet ×0.32 achieves 79.35% accuracy while using only 932,886 MACs, improving on FD-MobileNet ×0.25 while continuing the trend of accuracy toward low MAC counts.

Training Details

We evaluated different neural network models on CIFAR-10 using a 12 GB Nvidia K80 GPU. The number of parameters and MAC operations is calculated with THOP.
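Counting MACs and parameters with THOP looks like the following; the small `nn.Sequential` model here is only a stand-in for MicroBotNet.

```python
import torch
import torch.nn as nn
from thop import profile

# Stand-in model; substitute MicroBotNet(alpha) here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1),  # 32x32 -> 16x16
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
)
# THOP profiles one forward pass on a dummy CIFAR-10-sized input.
macs, params = profile(model, inputs=(torch.randn(1, 3, 32, 32),))
print(f"MACs: {macs:,.0f}  Params: {params:,.0f}")
```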

For pre-processing, we use random cropping, random horizontal flipping, and normalization of training and testing images.

For training, we use stochastic gradient descent with 0.9 momentum, 5e-4 weight decay, and a learning schedule with an initial learning rate of 0.1 decayed by 0.1 every 50 epochs, for 200 epochs. We process images with a batch size of 256.
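Putting the pre-processing and optimization details together, here is a minimal PyTorch training sketch. The normalization constants are the commonly used CIFAR-10 statistics and the crop padding is the usual choice of 4 (both assumptions, since the text does not give them); the model is again a stand-in for MicroBotNet.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Random cropping (padding=4 assumed), horizontal flipping, normalization.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 10))  # stand-in; substitute MicroBotNet

# SGD, momentum 0.9, weight decay 5e-4, lr 0.1 decayed by 0.1 every 50 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
    scheduler.step()
```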