Conditional Driving from Natural Language Instructions
Junha Roh, Chris Paxton, Andrzej Pronobis, Ali Farhadi, Dieter Fox
Conference on Robot Learning (CoRL 2019) [pdf][code][bibtex]
Widespread adoption of self-driving cars will depend not only on their safety but largely on their ability to interact with human users. Just like human drivers, self-driving cars will be expected to understand and safely follow natural-language directions that suddenly alter the pre-planned route according to user's preference or in presence of ambiguities, particularly in locations with poor or outdated map coverage. To this end, we propose a language-grounded driving agent implementing a hierarchical policy using recurrent layers and gated attention. The hierarchical approach enables us to reason both in terms of high-level language instructions describing long time horizons and low-level, complex, continuous state/action spaces required for real-time control of a self-driving car. We train our policy with conditional imitation learning from realistic language data collected from human drivers and navigators. Through quantitative and interactive experiments within the CARLA framework, we show that our model can successfully interpret language instructions and follow them safely, even when generalizing to previously unseen environments.
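The gated attention mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which an instruction embedding is mapped to one sigmoid gate per visual feature channel, and each channel is scaled by its gate. The function names, shapes, and weights here are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List

Vector = List[float]
FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_attention(features: FeatureMaps, instruction: Vector,
                    weights: List[Vector], bias: Vector) -> FeatureMaps:
    """Scale each visual channel by a gate computed from the instruction.

    `weights` and `bias` stand in for a learned linear layer (hypothetical
    here) that maps the instruction embedding to one value per channel.
    """
    if not (len(features) == len(weights) == len(bias)):
        raise ValueError("need one weight row and one bias per feature channel")
    gated = []
    for channel, w_row, b in zip(features, weights, bias):
        # One scalar gate in (0, 1) per channel, driven by the language input.
        gate = sigmoid(sum(w * e for w, e in zip(w_row, instruction)) + b)
        gated.append([[gate * value for value in row] for row in channel])
    return gated
```

With zero weights and bias, every gate is 0.5 and all channels are halved; large positive weights on an active embedding dimension leave the matching channel nearly unchanged.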
Using the Google Cloud Speech API, we transcribed the user's speech into language instructions, which our agent then follows while driving. All example videos with speech-to-text were recorded in an unseen environment (Town02). We used a segmentation model to generate the road segmentation from a color image. The agent starts to move after the user's first instruction.
Interactive driving example with learned segmentation images. The user gave the initial sentence and three extra sentences during driving:
"you ll take a left turn",
"take your right here then a right",
"you can continue going straight for a while",
"you re going to make a left turn here."
The sentences were randomly sampled from the user's keyboard input. The model followed the language instructions in an unseen environment. (We added audio to the video to make it clear when each new instruction is given; the audio is not part of the paper.) The video was generated at 30 fps.
We conducted preliminary experiments with a two-player driving game for data collection. The two players drive a car to random goals. Both players see the same view from the driver's seat and communicate through headsets. One player is the navigator, who reads the map and gives the driver directions. The other player is the driver, who listens to the navigator's directions. We transcribed the raw speech data into sentences and used them to generate templates for the project. Below, we show three examples of the preliminary experiments with videos and transcripts.
01.002 --> 03.575, oh there's a map all right go straight
07.228 --> 08.392, and you're going to turn right
14.241 --> 16.199, that's good keep going straight
16.200 --> 17.493, and take your first left
22.651 --> 23.600, and slow down
26.724 --> 28.000, all right can you see the green square
29.629 --> 30.200, great
33.665 --> 36.000, okay so now you want to go straight
39.692 --> 41.900, and you'll take a left at the first building
51.096 --> 53.360, that's good that's good keep going straight
55.148 --> 56.200, and take a left
59.424 --> 60.120, and take a right
63.605 --> 64.163, now straight
66.916 --> 68.112, and take a left
70.414 --> 73.951, went a little too far so reverse and back it up
80.617 --> 81.524, all right you doing good
89.755 --> 90.871, go a little bit forward
92.814 --> 93.585, yep there it is
94.322 --> 94.900, you got it
95.405 --> 97.929, okay so now you're going to want to turn around
105.337 --> 107.105, you're going to back it up a little bit
114.309 --> 116.208, looking good no collisions so far
117.027 --> 118.233, all right now you'll take a right
121.448 --> 121.910, yep
124.764 --> 125.539, now go straight
128.621 --> 129.374, now take a left
133.354 --> 133.822, take a right
137.243 --> 139.194, go straight as fast as you can
143.807 --> 144.726, and you'll take a left
147.803 --> 148.328, now right
152.071 --> 154.292, and the exit is right up here
159.202 --> 160.600, congratulations
01.436 --> 02.686, go straight
03.654 --> 04.900, slow down a little bit
04.900 --> 05.862, make a right turn
07.426 --> 10.460, it's going to be a narrow street so go straight
13.328 --> 17.795, and then you're going to make a left turn when you see the first
19.371 --> 20.333, go straight
20.906 --> 21.975, and make a left turn here
25.939 --> 27.006, make a left turn
27.487 --> 28.297, and go straight
28.667 --> 30.273, and do you see the green spot
31.408 --> 31.958, park there
33.779 --> 34.193, okay
35.045 --> 36.220, go straight
37.721 --> 40.817, turn left turn here
42.932 --> 44.479, and another left turn
46.936 --> 49.288, and you're going to make a right turn here
51.556 --> 53.344, and make another left turn
55.400 --> 56.708, go straight
57.618 --> 58.481, just go straight
60.249 --> 61.609, and make another left turn
62.746 --> 63.195, left turn
67.580 --> 69.512, make another right turn right turn
70.684 --> 71.206, go straight
75.408 --> 76.030, skip this
76.030 --> 77.896, and then make a left turn here left turn
78.396 --> 78.843, left turn
80.917 --> 81.490, left
82.957 --> 83.844, and park there
86.909 --> 87.986, wait for me
88.444 --> 89.713, can you go back
90.759 --> 91.500, reverse
98.408 --> 100.262, and then left turn
102.336 --> 104.646, go little more little more
105.679 --> 106.862, go back back
108.059 --> 109.555, back it out a little more
110.692 --> 111.450, good job
112.259 --> 113.926, okay go straight
117.139 --> 119.615, to your left side to your left side
122.544 --> 123.058, go straight
129.097 --> 130.747, keep going go straight
132.994 --> 135.878, pass the street intersection and then go
136.465 --> 137.013, go straight
140.089 --> 141.828, yeah can you go little faster
144.813 --> 146.419, and then make a left turn here
149.249 --> 150.864, okay try your best
152.825 --> 153.799, make a left turn
155.196 --> 155.908, left
161.045 --> 163.951, and you're going to make another right turn right turn here right right
164.314 --> 164.768, okay
166.104 --> 167.801, go straight just keep going
170.000 --> 171.300, pass this
171.300 --> 172.717, okay slow down a little bit
173.472 --> 175.050, and you going to make a left turn okay
175.050 --> 175.800, go straight
175.800 --> 177.786, and then make a left turn
179.808 --> 182.931, left here and then left
183.646 --> 184.983, make a right turn right away
187.008 --> 188.390, right here right here
189.794 --> 191.156, and then another right
191.837 --> 193.213, right slow down slow down
194.075 --> 195.437, okay go straight
196.348 --> 199.563, and then the green will be on your left side left side
203.673 --> 206.480, cool we are done
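The transcript lines above follow a `start --> end, text` layout with times in seconds. As a minimal sketch, assuming only that layout, the following parses such lines into records and splits a concatenated listing into separate recordings wherever the start time jumps backwards. The names `Utterance`, `parse_transcript`, and `split_sessions` are illustrative and are not part of the project's code.

```python
import re
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


# Matches lines such as "07.228 --> 08.392, and you're going to turn right".
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*-->\s*(\d+(?:\.\d+)?)\s*,\s*(.*\S)\s*$"
)


def parse_transcript(lines: Iterable[str]) -> List[Utterance]:
    """Parse timestamped lines, skipping any line that does not match."""
    utterances = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        start, end = float(match.group(1)), float(match.group(2))
        if end < start:
            raise ValueError(f"end before start in line: {line!r}")
        utterances.append(Utterance(start, end, match.group(3)))
    return utterances


def split_sessions(utterances: List[Utterance]) -> List[List[Utterance]]:
    """Start a new session whenever the start time decreases."""
    sessions: List[List[Utterance]] = []
    for utt in utterances:
        if not sessions or utt.start < sessions[-1][-1].start:
            sessions.append([])
        sessions[-1].append(utt)
    return sessions
```

For example, passing the lines "159.202 --> 160.600, congratulations" and "01.436 --> 02.686, go straight" through `parse_transcript` and then `split_sessions` yields two sessions with one utterance each, because the second start time is earlier than the first.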