Week 1 - We worked on completing the presentation by exploring what APIs would be useful and defining different levels of success.
Week 2 - We're trying to implement voice control using the Alexa Voice Service (AVS). A preliminary Android app has been created and we are working our way through immersing ourselves in the Amazon ecosystem. Their system seems quite complicated to us.
Week 3 - Scrapping the Amazon ecosystem. The AVS API used to be built in Java but was recently deprecated in favor of a C++ SDK, so we would have to use an unfamiliar tool, the Android NDK, to get it to work. Even with the existing Java code, the sample app only authenticates on the virtual device and not on a real device. Many hours were put into just downloading, installing, and configuring different tools on Jose's computer; however, that computer has no microphone support, so the virtual device can't be used to test any commands. Also, the sample app requires a companion sample server to run, which looks rudimentary (we assume this contributes to why the API was deprecated). The goal would be to use the new C++ API together with Amazon's Lambda serverless functions, but using any of their tools has been extremely cumbersome so far.
Week 4 - Moving on from voice, we started working on the Google Cloud Vision API. Plenty of time is still being devoted to simply configuring different packages and SDKs. Their sample code seems pretty robust; we're trying it with a static image located in a resource folder, with no camera implementation yet. Even though Google gives us lots of sample code to work with, we are having no luck getting authenticated. Google suggests authenticating using one of three methods: a service account file, OAuth2 tokens, or an API key. We cannot make any calls to the API until we get authenticated, and a big problem we're encountering with their suggested authentication method (the service account file) is that Android Studio does not load system environment variables from Windows! The code clearly calls System.getenv to find the environment variable I set, "GOOGLE_AUTHENTICATOR_CREDENTIALS", but that variable lives on my Windows machine and never makes it to Android Studio. We can't tell it to look in a different path either, because the API function I'm invoking handles authentication on its own. OAuth2 was also a pain to work with: we ask the user to sign in to a Google account and approve the permissions, but even though I end up with a token, the API function isn't looking at my token; it's looking in predetermined locations. No progress can be made until we get authenticated!
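For context, here's a minimal sketch (plain Java using the google-auth-library, not our app code) of the default-credentials path we believe the client library takes. The whole approach hinges on an environment variable pointing at the service-account JSON file, which is exactly the thing that never reaches the Android Studio / device environment on our setup.

```java
import java.io.IOException;
import com.google.auth.oauth2.GoogleCredentials;

public class DefaultCredentialsSketch {
    public static void main(String[] args) throws IOException {
        // getApplicationDefault() looks for an environment variable that points
        // at a service-account JSON file (among other fallbacks). We set that
        // variable in Windows, but it never reaches Android Studio or the device,
        // so on our setup this call fails instead of returning credentials.
        GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
        System.out.println("Loaded credentials: " + credentials);
    }
}
```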
Week 5 - Other people have had to use the Cloud API, right? How did they get authenticated? It turns out that getting authenticated is as easy as creating a string with the API key, despite the numerous attempts Google's developer sites made to steer me away from that method. Curse you, Google developer sites. Anyway, finally some progress has been made. We first made the app get tags of items seen in a gallery image, then added text recognition and displayed it all on the screen. Then we tried using the camera code from HW 2 as a CameraSource to get the image to pass to the API; however, that code has lots of tools we won't use, like the graphics overlay and face detectors. After a while of trying to strip those items out, the code wouldn't run without them, so we decided to use the regular Camera API, hoping it wouldn't be bogged down with the extra tools from HW 2. Then we realized that the Camera API is deprecated, so we're currently using the Camera2 API, which is a lot more code, but it works! We can push a button, it takes a picture, and it then displays tags of items and text seen in the image. The odd part is that it is very slow at doing this (~8 seconds). Calling the API with a gallery image was quick, and looking at the logcat the image is ready within milliseconds. We'll investigate this further at a later point.
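For the record, here's a minimal sketch of the API-key approach that finally worked (plain Java against the Vision REST endpoint, not our exact app code; the key and image path are placeholders). LABEL_DETECTION gives the tags and TEXT_DETECTION gives the text we display.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Scanner;

public class VisionApiKeySketch {
    private static final String API_KEY = "YOUR_API_KEY"; // placeholder

    // POST a base64-encoded image to the Vision REST endpoint with ?key=... appended.
    public static String annotate(byte[] imageBytes) throws Exception {
        String body = "{\"requests\":[{"
                + "\"image\":{\"content\":\""
                + Base64.getEncoder().encodeToString(imageBytes) + "\"},"
                + "\"features\":[{\"type\":\"LABEL_DETECTION\"},{\"type\":\"TEXT_DETECTION\"}]}]}";

        URL url = new URL("https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // The JSON response contains labelAnnotations (tags) and textAnnotations (text).
        try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] image = Files.readAllBytes(Paths.get("sample.jpg")); // any local test image
        System.out.println(annotate(image));
    }
}
```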
Week 5 Part 2 - We added the plain Google speech recognizer to our project, so on the click of a button I can say "describe my surroundings" and the app will call takePicture(), which eventually should call the API. However, even though takePicture() is called after the app recognizes the dictated instruction, it says the camera is offline. Maybe it has something to do with threads? Anyway, we don't like having to click the button, so we found an open-source project called Porcupine which allows for low-power hotword detection, such as "Alexa". We think we're close to getting Porcupine implemented; if it works, we'll wake the app with the "Alexa" hotword (built into Porcupine) and then trigger the listening feature on Google that we're currently triggering with the button. Things are coming together!
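Since the thread theory is just a guess, here's a minimal sketch of the fix we'd try first (class and method names are ours, not confirmed app code): keep the Camera2 work on a dedicated background HandlerThread and have the speech-recognition callback post the capture onto that handler instead of calling takePicture() directly.

```java
import android.os.Handler;
import android.os.HandlerThread;

public class CameraThreadingSketch {
    private HandlerThread cameraThread;
    private Handler cameraHandler;

    // Called when the camera is opened; all Camera2 callbacks would use cameraHandler.
    void startCameraThread() {
        cameraThread = new HandlerThread("CameraBackground");
        cameraThread.start();
        cameraHandler = new Handler(cameraThread.getLooper());
    }

    // Called from the speech recognizer's result callback (which may run on a
    // different thread) once "describe my surroundings" is heard.
    void onDescribeSurroundingsHeard() {
        // Post the capture onto the camera's own thread instead of calling it here.
        cameraHandler.post(() -> takePicture());
    }

    private void takePicture() {
        // ... existing Camera2 capture code (createCaptureRequest, capture, etc.)
    }
}
```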