Edge Computing using Raspberry Pi and Cloud Computing on AWS
Goal: Design and implement an architecture on AWS with a response latency within 3 seconds and a minimum of 60% recognition accuracy.
The architecture of the facial recognition application, built from serverless components (Lambda) and persistent storage systems (DynamoDB and S3), is as shown below.
Explanation of the overall architecture:
Raspberry Pi:
Video Capture: A video is captured by the Pi camera attached to the Raspberry Pi. Clips of 0.5 seconds in length are uploaded to S3 with keys of the form <timestamp_in_ms>.h264.
Invoke Lambda: Along with storing the video in S3, we invoke a Lambda function whose job is to extract the frames. The bucket and key of the uploaded S3 video are sent to the Lambda function in the event payload.
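The Pi-side upload-and-invoke step could be sketched as below. The helper and function names (`video_key`, `upload_and_invoke`, `extractor_fn`) are illustrative, not the project's actual identifiers:

```python
import json
import time

def video_key(timestamp_ms: int) -> str:
    """Object key for a captured clip: <timestamp_in_ms>.h264."""
    return f"{timestamp_ms}.h264"

def upload_and_invoke(video_path: str, bucket: str, extractor_fn: str) -> str:
    """Upload the clip to S3, then invoke the extractor Lambda with the
    bucket/key pair in the event payload."""
    import boto3  # imported here so the pure helper above has no AWS dependency

    key = video_key(int(time.time() * 1000))
    boto3.client("s3").upload_file(video_path, bucket, key)
    boto3.client("lambda").invoke(
        FunctionName=extractor_fn,
        InvocationType="Event",  # asynchronous: the Pi does not wait for a reply
        Payload=json.dumps({"bucket": bucket, "key": key}),
    )
    return key
```

The asynchronous `Event` invocation type lets the capture loop keep recording while Lambda processes the previous clip.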
Lambda 1: Extractor: As soon as this function is invoked from the Raspberry Pi, it extracts the frames using the ffmpeg Python library and stores them in the S3 bucket, in a folder named after the timestamp at which the video was recorded. The bucket and key are then passed on to the next Lambda, which performs face recognition.
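A minimal sketch of such an extractor handler, assuming an ffmpeg binary is bundled in the container image and the downstream function is named "recognizer" (both assumptions):

```python
import os
import subprocess

def frames_prefix(video_key: str) -> str:
    """Frames for <ts>.h264 are stored under the folder <ts>/ in S3."""
    timestamp = os.path.splitext(os.path.basename(video_key))[0]
    return f"{timestamp}/"

def handler(event, context):
    """Extractor Lambda: download the clip, split it into frames with
    ffmpeg, upload the frames, then invoke the recognizer directly."""
    import json
    import boto3

    bucket, key = event["bucket"], event["key"]
    s3 = boto3.client("s3")
    local = f"/tmp/{os.path.basename(key)}"  # /tmp is Lambda's writable path
    s3.download_file(bucket, key, local)

    # One JPEG per frame, numbered frame-001.jpg, frame-002.jpg, ...
    subprocess.run(["ffmpeg", "-i", local, "/tmp/frame-%03d.jpg"], check=True)

    prefix = frames_prefix(key)
    for name in sorted(os.listdir("/tmp")):
        if name.startswith("frame-"):
            s3.upload_file(f"/tmp/{name}", bucket, prefix + name)

    boto3.client("lambda").invoke(
        FunctionName="recognizer",  # hypothetical function name
        InvocationType="Event",
        Payload=json.dumps({"bucket": bucket, "key": prefix + "frame-001.jpg"}),
    )
```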
Lambda 2: Face Recognizer: A new Lambda function is invoked from the extractor Lambda to perform the face recognition.
The recognizer Lambda downloads the image and performs recognition on it.
Once we get the recognition result, a query is executed to fetch the matching record from DynamoDB.
Once the record is fetched, the person's attributes (name, major, year) and the timestamp at which the video was taken are sent to the SQS queue.
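The recognizer's flow could look like the sketch below. The encodings file name, queue environment variable, and key schema are all assumptions; only `face_recognition.load_image_file`, `face_encodings`, and `compare_faces` are real library calls:

```python
import json
import os

def result_message(attrs: dict, frame_key: str) -> str:
    """SQS message body: the student's attributes plus the timestamp at which
    the video was taken, recovered from the frame's folder name <ts>/...jpg."""
    timestamp_ms = int(frame_key.split("/")[0])
    return json.dumps({**attrs, "timestamp_ms": timestamp_ms})

def handler(event, context):
    """Recognizer Lambda: download the frame, identify the face, look up the
    student in DynamoDB, and push the result to SQS."""
    import pickle
    import boto3
    import face_recognition  # baked into the container image

    bucket, key = event["bucket"], event["key"]
    local = f"/tmp/{os.path.basename(key)}"
    boto3.client("s3").download_file(bucket, key, local)

    # Known encodings are assumed to ship with the image as a
    # name -> encoding dict (hypothetical file name and format).
    with open("encodings.pkl", "rb") as f:
        known = pickle.load(f)
    encoding = face_recognition.face_encodings(
        face_recognition.load_image_file(local))[0]
    matches = face_recognition.compare_faces(list(known.values()), encoding)
    name = next(n for n, hit in zip(known, matches) if hit)

    # Fetch the student's attributes from DynamoDB, then queue the result.
    attrs = boto3.resource("dynamodb").Table("StudentDB").get_item(
        Key={"name": name}).get("Item", {})
    boto3.client("sqs").send_message(
        QueueUrl=os.environ["RESULT_QUEUE_URL"],
        MessageBody=result_message(attrs, key),
    )
```

Carrying the recording timestamp inside the message is what lets the Pi compute end-to-end latency later.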
DynamoDB: Stores the rows of student data. We query the StudentDB table using the recognition result from Lambda 2.
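A minimal sketch of that lookup, assuming `name` is the table's partition key (the real key schema may differ):

```python
ATTRIBUTES = ("name", "major", "year")

def pick_attrs(item: dict) -> dict:
    """Keep only the attributes the Pi needs from a DynamoDB item."""
    return {k: item[k] for k in ATTRIBUTES if k in item}

def query_student(name: str, table_name: str = "StudentDB") -> dict:
    """Fetch the recognized student's row from DynamoDB by name."""
    import boto3

    table = boto3.resource("dynamodb").Table(table_name)
    return pick_attrs(table.get_item(Key={"name": name}).get("Item", {}))
```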
Raspberry Pi: Results are fetched continuously from the SQS queue. Latency is calculated as the difference between the timestamp at which the video was recorded (received in the SQS message) and the time at which the result is fetched from the queue.
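The polling loop and latency calculation might be sketched as follows; the message field `timestamp_ms` and the printed format are assumptions:

```python
import json
import time

def latency_seconds(recorded_ms: int, fetched_ms: int) -> float:
    """End-to-end latency: time the result is fetched minus time the video
    was recorded (both in milliseconds since the epoch)."""
    return (fetched_ms - recorded_ms) / 1000.0

def poll_results(queue_url: str):
    """Continuously fetch results from SQS and print per-clip latency."""
    import boto3

    sqs = boto3.client("sqs")
    while True:
        # Long polling (WaitTimeSeconds) avoids tight-looping on an empty queue.
        resp = sqs.receive_message(
            QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=10)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            now_ms = int(time.time() * 1000)
            print(body.get("name"),
                  f"latency={latency_seconds(body['timestamp_ms'], now_ms):.2f}s")
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```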
AWS ECR is used to host the Docker images from which the Lambda functions' containers are created.
1. S3 Trigger: We initially used S3 triggers for both Lambda functions. The video is uploaded to S3, and a trigger is created on that bucket folder for the extractor Lambda, so the extractor fires as soon as a video is uploaded, extracts the frames, and stores them in the S3 bucket. A second trigger on the frames folder was created for the face recognition Lambda. With these two PUT-object triggers we observed a 4-5 second latency, so we removed the S3 triggers provided by AWS and instead invoked the Lambdas directly from code after each upload.
2. SQS Trigger: We also tried SQS triggers instead of S3 triggers, but there was still noticeable latency from sending the messages and using them to trigger the Lambda.
3. Direct invocation: In this approach, we removed the S3 trigger for the recognizer Lambda and invoked it directly from the extractor Lambda. The bucket name and file name are passed in the event, and the image is downloaded inside the recognizer Lambda. This reduced latency by 0.5-0.8 seconds.
4. Direct invocation overall: Building on the above, we also removed the first S3 trigger for the extractor function and invoked the extractor Lambda directly from the Pi, sending the video name and bucket name in the event. This reduced latency by a further 0.2-0.3 seconds.
In the end, we achieved a latency of 2-3 seconds using two direct Lambda invocations instead of S3 triggers.
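Both direct-invocation hops (Pi to extractor, extractor to recognizer) come down to the same call; a minimal sketch, where the payload shape is an assumption:

```python
import json

def direct_invoke_payload(bucket: str, key: str) -> str:
    """Event payload passed on both hops: just the S3 location of the object."""
    return json.dumps({"bucket": bucket, "key": key})

def invoke_next(function_name: str, bucket: str, key: str) -> None:
    """Replace an S3 trigger with a direct asynchronous Lambda invocation."""
    import boto3

    boto3.client("lambda").invoke(
        FunctionName=function_name,
        InvocationType="Event",  # fire-and-forget: the caller is not blocked
        Payload=direct_invoke_payload(bucket, key),
    )
```

Skipping the S3 event-notification path removes the delivery delay between the PUT and the function start, which is where the 4-5 second trigger latency came from.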
1. We implemented a multi-stage Dockerfile, which reduces the image size. This shortens the image load time in Lambda, decreasing the cold-start latency of the recognizer Lambda function.
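A minimal sketch of such a multi-stage Dockerfile, assuming the official AWS Lambda Python base image and pip-installable dependencies (heavy packages like dlib may need extra build tools in the first stage):

```dockerfile
# Stage 1: install dependencies with the full toolchain available
FROM public.ecr.aws/lambda/python:3.9 AS build
RUN pip install --target /deps face_recognition boto3

# Stage 2: copy only the installed packages into the slim runtime image
FROM public.ecr.aws/lambda/python:3.9
COPY --from=build /deps ${LAMBDA_TASK_ROOT}
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
```

Build artifacts, compilers, and pip caches stay in the first stage, so only the runtime payload ends up in the image Lambda has to pull.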