AR navigation on Android

Jing Yan / Zhenyu Yang / Junxiang Yao

Overview

The project is about AR navigation on Android devices. Our goal is to create a navigation application that provides the user with a more intuitive wayfinding experience on the street and a more immersive display of store information. We expect this application to run on a head-mounted 3D device such as Google Cardboard. Thus, besides our two main functions, we implemented a voice control and hand-tracking user interface to provide better user interaction on a head-mounted device.

The project mainly consists of three parts: navigation (by Zhenyu), location-based information display (by Jing), and user interface (by Junxiang).

Videos 1-3. Demo

1. Navigation

/ Zhenyu Yang

The navigation component provides geographic information to the AR application. It consists of four main steps: obtaining the destination via voice input from the user, requesting direction data from the Google Maps servers, decoding the location data, and visualizing the direction by rendering an arrow and multiple waypoints in the 3D virtual space.


1.1 Voice recognition

The voice recognition feature was implemented by directly calling the built-in speech-to-text (STT) function of the Android system. This function waits for a trigger event to start the listening process. While listening, it monitors the volume of the sound collected from the microphone. When the volume drops, it stops listening and extracts features from the sampled audio. These features are sent to an online server, which returns the recognition results to the function. The results are received as a String array containing the best guesses of the user's speech.
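A minimal sketch of how the built-in Android STT can be invoked through RecognizerIntent, assuming the call is made from inside an Activity; the request code REQUEST_SPEECH and the prompt text are our own placeholders.

```java
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

// Inside an Activity:
private static final int REQUEST_SPEECH = 1001;  // arbitrary request code for this example

private void startListening() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Please indicate your destination");
    startActivityForResult(intent, REQUEST_SPEECH);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK) {
        // The recognizer returns its best guesses, ordered by confidence.
        ArrayList<String> guesses =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        // e.g. pass the top three guesses to the UI for the user to confirm.
    }
}
```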


1.2 Google maps API

The navigation part depends heavily on the Google Maps APIs, since they provide a huge amount of geographic information and respond quickly. The Google Maps Place ID API and the Google Maps Directions API are the two APIs used most in this part. The Place ID API allows us to convert a conventional place name into a place ID, a unique string that exclusively identifies a place. The place ID is then used by the Directions API to obtain a route from the user's current location to the destination. For each route, a string called a polyline is returned by the Google Maps directions server. This string encodes all the GPS locations along the route and is normally decoded inside the Google Maps application itself. To extract the waypoint locations, a Java decoder was devised and implemented in this application (see the sketch below).
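Below is a sketch of the standard encoded-polyline decoding algorithm that such a Java decoder implements; the method name and variable names are ours.

```java
import java.util.ArrayList;
import java.util.List;

/** Decodes a Google Maps encoded polyline into a list of [latitude, longitude] pairs. */
public static List<double[]> decodePolyline(String encoded) {
    List<double[]> path = new ArrayList<>();
    int index = 0, lat = 0, lng = 0;
    while (index < encoded.length()) {
        // Each coordinate delta is zig-zag encoded in 5-bit chunks offset by 63.
        int result = 0, shift = 0, b;
        do {
            b = encoded.charAt(index++) - 63;
            result |= (b & 0x1f) << shift;
            shift += 5;
        } while (b >= 0x20);
        lat += ((result & 1) != 0) ? ~(result >> 1) : (result >> 1);

        result = 0;
        shift = 0;
        do {
            b = encoded.charAt(index++) - 63;
            result |= (b & 0x1f) << shift;
            shift += 5;
        } while (b >= 0x20);
        lng += ((result & 1) != 0) ? ~(result >> 1) : (result >> 1);

        // Coordinates are stored as integers scaled by 1e5.
        path.add(new double[]{lat / 1e5, lng / 1e5});
    }
    return path;
}
```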


1.3 Visualization

The navigation is visualised through an arrow and landmarks in the virtual space. Landmarks indicate waypoints. The arrow is fixed in front of the camera and rotates so that it always points to the next landmark the user should reach. Upon acquisition of the waypoint locations, these locations are cast into the virtual space of our application. Each waypoint contains a latitude and a longitude, which are scaled by a factor of 200,000 and used as the landmark's x and y values in the virtual space. The z values of all landmarks are set to zero so that they lie on the same plane (the ground plane). The rotation of the arrow is limited to yaw, which is based on the fused readings of the built-in compass and gyroscope sensors.
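A minimal sketch of the mapping and arrow yaw described above; the scale factor 200,000 comes from the text, while the helper names and the exact handling of the fused device heading are our own assumptions.

```java
/** Maps a GPS waypoint into the virtual ground plane as described above. */
public static float[] toVirtualSpace(double latitude, double longitude) {
    final float SCALE = 200000f;            // scale factor from the text
    float x = (float) (longitude * SCALE);
    float y = (float) (latitude * SCALE);
    float z = 0f;                           // all landmarks lie on the ground plane
    return new float[]{x, y, z};
}

/** Heading (degrees) from the user toward the next landmark in the virtual plane. */
public static float arrowYaw(float[] userPos, float[] nextLandmark) {
    float dx = nextLandmark[0] - userPos[0];
    float dy = nextLandmark[1] - userPos[1];
    // The fused compass/gyroscope device heading is subtracted from this value
    // elsewhere before the arrow is rendered.
    return (float) Math.toDegrees(Math.atan2(dx, dy));
}
```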


2. Location-based Information Display

/ Jing Yan

The second function of our AR navigation application is to display location-based information as an overlay on the real scene. Among all the information available on the street, we chose to display store information, in particular restaurant information, which is closely related to our daily life.


2.1 Scenario

Imagine a common scenario: you are walking down a street with lots of restaurants, but you have no idea what they are serving or which one you should go to. In the past, people would look at menus and window displays of sample dishes. Nowadays, we search on Yelp for photos uploaded directly by other users.

Our application intends to go one step further by skipping that search process. The first time a restaurant is detected, all the related information is rendered automatically using the web-based Google Place API. Ideally, we also hope that users can customize their augmented-world display by deciding whether or not to show the information.


2.2 Mixed Reality Mechanisms & Technical Implementation

Here is the basic pipeline of the implementation:

Step 1. Get the geographical location (sent from the GPS module) and track with Vuforia target sets (logo images are uploaded as the dataset).

Step 2. Get the store location (longitude, latitude) and store name (keyword).

Step 3. Get the real-time JSON response for that store using the Google Place API, and parse the JSON into basic data (store name, rating, price level, image reference, place ID).

Step 4. Render the basic data as text (using the Texample library) and images (as textures) on screen in OpenGL ES.


The mixed reality mechanism of this part consists of three sections: tracking stores using Vuforia target sets and geographical location, getting and parsing real-time data using the Google Place API, and rendering text and images on the Android device using OpenGL ES.

1. Using Vuforia targets to track store logos is relatively straightforward. I uploaded three logos, from The Habit Burger Grill, Panda Express, and Starbucks, to the Vuforia Target Manager, downloaded the xml and dat files, and loaded them as the target dataset in the code. Right now this serves as a prototype of the function, but users could potentially upload their own logo targets and create a real target set. Although Vuforia tracking is easy to implement, I found that there are still limitations on distance (<0.7 m) and requirements on environmental lighting for the tracking to work outdoors.
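A sketch of how such a downloaded target set is typically loaded and activated, following the pattern of the Vuforia Java samples; the file name Logos.xml stands in for the xml/dat pair exported from the Target Manager and placed in the app's assets.

```java
import com.vuforia.DataSet;
import com.vuforia.ObjectTracker;
import com.vuforia.STORAGE_TYPE;
import com.vuforia.TrackerManager;

// Loads and activates a device database exported from the Vuforia Target Manager.
boolean loadLogoDataSet() {
    TrackerManager trackerManager = TrackerManager.getInstance();
    ObjectTracker tracker =
            (ObjectTracker) trackerManager.getTracker(ObjectTracker.getClassType());
    if (tracker == null) return false;

    DataSet logoDataSet = tracker.createDataSet();
    if (logoDataSet == null) return false;

    // "Logos.xml" is a placeholder for the downloaded target set in assets/.
    if (!logoDataSet.load("Logos.xml", STORAGE_TYPE.STORAGE_APPRESOURCE)) return false;

    return tracker.activateDataSet(logoDataSet);
}
```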

2. Using the Google Place API, I collect real-time store information online. Basically, I obtain an API access key and combine the keyword, location data, and radius to generate a request URL. Using this URL, I fetch the JSON response and then parse it into basic information strings. The tricky part of downloading information through the API is that the request cannot be placed inside the render function, which runs every frame: that would slow down computation, blow up memory, and quickly exhaust the API key's quota. So I build my own database array from newly arrived data and retrieve information from that database instead (see the sketch below).
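A sketch of this request-parse-cache flow, assuming the Places Nearby Search web service is used; YOUR_API_KEY, the radius value, and the cache map are placeholders rather than the project's exact code.

```java
import org.json.JSONArray;
import org.json.JSONObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

// Inside the class that owns the data: a simple cache so the network request is
// issued once per store, never from inside the per-frame render loop.
private final Map<String, JSONObject> storeCache = new HashMap<>();

JSONObject fetchStoreInfo(String keyword, double lat, double lng) throws Exception {
    if (storeCache.containsKey(keyword)) return storeCache.get(keyword);

    // Nearby Search request of the Google Places web service.
    String url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
            + "?location=" + lat + "," + lng
            + "&radius=100"
            + "&keyword=" + URLEncoder.encode(keyword, "UTF-8")
            + "&key=YOUR_API_KEY";               // placeholder access key

    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new URL(url).openStream()))) {
        String line;
        while ((line = in.readLine()) != null) body.append(line);
    }

    // Fields used for the card layout: name, rating, price_level, place_id,
    // and the first photo_reference (if any).
    JSONArray results = new JSONObject(body.toString()).getJSONArray("results");
    JSONObject first = results.getJSONObject(0);
    storeCache.put(keyword, first);
    return first;
}
```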

3. Rendering text in OpenGL ES cannot be done directly because the Android SDK does not provide an easy way to draw text on OpenGL views. There are several ways to accomplish this: place a TextView over the SurfaceView, render strings to textures and then draw them, write custom text rendering code based on sprites, or use an open-source library. In this project, I render the text using an open-source library named Texample, which is efficient. However, there are still limitations on the design of the layout.


2.3 Information Displacement

As a graphic designer, I feel it is not easy to make a nice layout in an augmented view. Because the design is drawn over frames of real-world video, the background is complicated and uncontrollable. It is difficult to make the information stand out clearly while also creating a unique aesthetic. In this case, I use a card-based design, which is clear, easy to understand, and modular. The shape of a card also gives the user better affordance in the interaction design.


Additionally, the 2D cards could be improved by constructing them into dynamic 3D objects. They could fold up automatically when the user pays no attention to them, and open up automatically when the user focuses on them. Furthermore, in order to create a better information hierarchy and a more customized display, virtual buttons could be added to the 3D cards to allow basic selection.

Figure 1. concept sketch

Figure 2. outdoor testing

Figure 3. indoor testing

3. User Interface

/ Junxiang Yao

For the user interface and interaction part, my goal was to experiment with button control in an augmented reality environment. Since Vuforia's library provides a modelview matrix that gives the position of the marker in camera coordinates, I decided to use that marker to track the position of my hand in camera coordinates by attaching it to my hand. By moving my hand with the marker, I am able to trigger several events within the AR world.


3.1 Button Control

First of all, I defined a virtual operating plane in the camera coordinate system on which to place all my UI elements. This plane is parallel to the screen of the Android device, at a distance of 60 in the camera's coordinate system.

Because the interface needs to show even when the marker is not found, I drew almost all of my UI elements, except the cursor, outside the marker-finding process using OpenGL ES 2.0. Also, in order to eliminate shaking and increase precision when touching a button, I place a torus around the cursor and the hovered button to represent the selection progress.

As for the buttons, in order to get the best visual result, I drew all of them from scratch and attached them as textures to the circles drawn with OpenGL ES 2.0 (see the sketch below). For the colors, since we are using the Android voice control API, Google Maps, and Vuforia, I used the green from Android's logo, the red from Google's logo, and the green from Vuforia's logo for the microphone button, navigation button, and information detail button, respectively. The usage of these buttons is described later.
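A sketch of the standard Android way to upload such a hand-drawn button image as an OpenGL ES 2.0 texture; R.drawable.mic_button is a placeholder resource name, not one from the project.

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.opengl.GLES20;
import android.opengl.GLUtils;

// Uploads a hand-drawn button image (e.g. R.drawable.mic_button) as an
// OpenGL ES 2.0 texture to be mapped onto the button circle.
static int loadButtonTexture(Context context, int resourceId) {
    int[] textureIds = new int[1];
    GLES20.glGenTextures(1, textureIds, 0);

    Bitmap bitmap = BitmapFactory.decodeResource(context.getResources(), resourceId);
    GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureIds[0]);
    GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
            GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
    GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
            GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
    GLUtils.texImage2D(GLES20.GL_TEXTURE_2D, 0, bitmap, 0);
    bitmap.recycle();

    return textureIds[0];
}
```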

3.2 Voice Control

The first interface I made is for voice control. After launching the app, the user sees this page directly. There are three elements in this layout: text, a microphone icon for launching the voice control function, and a transparent cursor. The cursor is the center of the marker projected onto the virtual plane I defined in the camera's coordinate system, so it follows the movement of the marker attached to the hand. The cursor is transparent so that it does not cover the text; the user needs to see the complete string even during the waiting process, and I think it is important to reassure users of what they are choosing. To set off the visual elements, I also added a transparent black mask over the background, which is the frame captured by the camera. For the text part, the first string the user sees is "Please indicate your destination". This string is not selectable and shows no response when the cursor hovers over it. But when the cursor hovers over the microphone button, the button enlarges slightly as a response, and once the progress torus around the cursor has finished drawing, the microphone changes from gray to green, telling the user to speak the name of the destination.
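A sketch of how the cursor position can be derived from the marker pose, assuming a column-major modelview matrix (as Vuforia returns) whose translation column gives the marker center in camera coordinates, with the marker in front of the camera at positive z; the plane distance of 60 comes from the text.

```java
// Projects the marker's center onto the virtual operating plane
// (parallel to the screen, 60 units in front of the camera).
static final float PLANE_DISTANCE = 60f;

static float[] cursorOnPlane(float[] modelViewMatrix) {
    // Translation part of the column-major 4x4 matrix = marker center in camera coordinates.
    float x = modelViewMatrix[12];
    float y = modelViewMatrix[13];
    float z = modelViewMatrix[14];

    // Scale the ray from the camera through the marker center
    // so that it intersects the plane z = PLANE_DISTANCE.
    // (If the convention places the marker at negative z, the sign flips.)
    float s = PLANE_DISTANCE / z;
    return new float[]{x * s, y * s, PLANE_DISTANCE};
}
```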

After the voice recognition process is complete, the microphone button automatically turns gray again, and three strings are sent to the text area for the user to choose from. When the cursor hovers over a string, it enlarges, and once the progress torus is complete, that string is treated as the destination: the whole interface moves toward the camera until it can no longer be seen, and the black transparent mask fades away. If none of the three strings is correct, the user can touch the microphone button again to relaunch the voice control function.

3.3 Navigation & Store Information

After the app receives the destination, both the navigation and the store information parts are automatically turned on. When the camera finds the marker, the cursor on the marker shows up; it intuitively indicates the marker's position in space. Although the cursor currently has a dark green texture, I did try to make it a phantom object by turning on glColorMask(), as we had done in homework 2. But in homework 2 the phantom object represented the head, which occluded the teapot rotating around it. In this situation, the phantom object not only has to move a lot, it also has to show the relationship between the marker and the buttons explicitly. If the phantom object is small, it does not make much sense, since only part of the marker would cover a button when the marker is in front of it. If the phantom object is large enough to cover the whole marker, it can be confusing, since it can also cover all the buttons at once and make them hard to manipulate.

For this interface, I used a golden rectangle as a reference to place the two buttons. Unlike before, where manipulation was limited to the virtual operating plane I created in the camera coordinate system, the cursor at this stage can move in three dimensions. So I calculate the distance between the marker and each of the two buttons, and only the shorter one is used as the distance that may trigger the following event. Again, if the distance is smaller than a threshold, the corresponding button enlarges as a response and the waiting torus starts to draw. After the drawing process is complete, the button rotates 180 degrees, its gray back side shows up, and the corresponding functionality is turned off (see the sketch below).
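A sketch of the nearest-button test and dwell-based trigger described above; the distance threshold and the number of frames needed to fill the torus are placeholder values, and the helper names are ours.

```java
// Picks the button nearer to the cursor and triggers it once the cursor has
// dwelled within the threshold distance for enough frames (while the torus fills).
static final float THRESHOLD = 8f;       // placeholder distance threshold
static final int DWELL_FRAMES = 60;      // placeholder dwell time in frames
int dwellCounter = 0;

int updateSelection(float[] cursor, float[] navButton, float[] infoButton) {
    float dNav = distance(cursor, navButton);
    float dInfo = distance(cursor, infoButton);
    int candidate = (dNav < dInfo) ? 0 : 1;     // 0 = navigation, 1 = information
    float d = Math.min(dNav, dInfo);

    if (d < THRESHOLD) {
        dwellCounter++;                         // torus progress = dwellCounter / DWELL_FRAMES
        if (dwellCounter >= DWELL_FRAMES) {
            dwellCounter = 0;
            return candidate;                   // flip this button 180 degrees, toggle its function
        }
    } else {
        dwellCounter = 0;                       // cursor left both buttons; reset the torus
    }
    return -1;                                  // nothing triggered this frame
}

static float distance(float[] a, float[] b) {
    float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
}
```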

After that, the gray buttons are also touchable and achieve the opposite result: they rotate another 180 degrees to show the colored icon and turn the corresponding functionality back on.

Most of the time, the user does not need to manipulate the buttons, which means the marker, and thus the hand, is not supposed to be seen by the camera all the time. Thus, I hide the buttons when the marker is not found by the camera. Before making this decision, I tried to implement a feature where the user controls whether to show or hide the buttons through the speed and direction of the hand movement: if the camera sees the marker moving from left to right faster than usual while the buttons are shown, the buttons move out of view, and vice versa. But I realized that if the marker moves too fast, the camera loses tracking of it. As a result, I replaced this idea with the current one and also added an out-of-view counter that counts how many frames have passed since the marker was last found. If the counter is smaller than a threshold, the buttons do not move out of view, since the loss may be caused by the marker moving too fast rather than it truly being out of view. With this change, the jittering of the buttons disappeared and their movements became smoother (see the sketch below).
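A sketch of the out-of-view counter; the frame threshold is a placeholder value.

```java
// Hides the buttons only after the marker has been lost for several consecutive
// frames, so brief tracking dropouts (e.g. fast hand movement) do not make the
// buttons jitter in and out of view.
static final int LOST_FRAME_THRESHOLD = 15;   // placeholder value
int outOfViewCounter = 0;
boolean buttonsVisible = true;

void updateButtonVisibility(boolean markerFound) {
    if (markerFound) {
        outOfViewCounter = 0;
        buttonsVisible = true;
    } else if (++outOfViewCounter > LOST_FRAME_THRESHOLD) {
        buttonsVisible = false;               // marker is genuinely out of view
    }
}
```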

CS291A_Final_Zhenyu_Jing_Junxiang.zip

File 1. source code