Summary: We trained the proposed model on 4 of 5 video sequences of a manipulator performing tabletop manipulation. The remaining sequence, which contained previously unseen counterclockwise motion, was held out from training and used for testing. Each sequence places the gears at different locations while the arm visits them in the same order, so the task measures the capacity of the proposed approach to relate visual cues to its predictions. In such a setting, standard prediction models would be expected to overfit to the training data and fail to predict the intended motion correctly unless they capture environmental cues.
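The leave-one-sequence-out protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the sequence identifiers and the index of the held-out sequence are assumptions for illustration only.

```python
# Hypothetical sketch of the leave-one-sequence-out split described above.
# Sequence names and the held-out choice are placeholders, not the paper's data.
sequences = ["seq_0", "seq_1", "seq_2", "seq_3", "seq_4"]

# Assumed: the last sequence is the one containing the unseen
# counterclockwise motion, so it is reserved for testing.
held_out = "seq_4"

train_sequences = [s for s in sequences if s != held_out]
test_sequences = [held_out]

print(train_sequences)  # the 4 sequences used for training
print(test_sequences)   # the single held-out test sequence
```

Because only one sequence is withheld, a model that memorizes arm trajectories rather than attending to gear positions would still fit the training set yet fail on the held-out counterclockwise motion.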