The infrastructure behind DSP is a complex circuit. Starting from tools on our local machines and then moving to AWS, we had to create many iterations of the design while addressing problems along the way, until we settled on this final architecture.
Pictured above is a bird's-eye view of how everything is held together. Nearly every portion of the flowchart has undergone some change as we encountered problems and worked out solutions, with details on some more significant issues described later on this page.
First, introducing our toolkit, starting with tools we used locally:
A large portion of our development was done in Google Colab for its ease of isolated environments and convenient access to fairly powerful compute units.
Our sentiment model uses FinBERT from HuggingFace as a basis. Almost all of the sentiment model's work was done in Colab with HuggingFace libraries, and the outputs were saved for use by the deep learning model.
Moving to the cloud, where we used AWS services, the two main pillars holding everything up are SageMaker AI and EC2:
We worked out of a JupyterLab notebook in SageMaker Studio, with the primary intention to successfully train the model after conforming the code to the SageMaker Estimator framework. After the training job was successful, we further utilized SageMaker's capability to run hyperparameter tuning jobs to tune the deep learning model. The weights of the chosen model were taken to then be used in inference.
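The training and tuning setup described above can be sketched with the SageMaker Python SDK. This is an illustrative configuration, not our exact code: the entry-point script name, instance type, metric regex, and hyperparameter ranges are all assumptions.

```python
# Sketch of a SageMaker training job plus hyperparameter tuning.
# All names (train.py, val_loss, ranges) are illustrative placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

estimator = PyTorch(
    entry_point="train.py",              # script conforming to the Estimator contract
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 20, "lr": 1e-3},
)

# Tune the learning rate, minimizing a validation loss that the
# training script logs in a format the Regex below can capture.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="val_loss",
    objective_type="Minimize",
    hyperparameter_ranges={"lr": ContinuousParameter(1e-4, 1e-2)},
    metric_definitions=[{"Name": "val_loss", "Regex": "val_loss=([0-9\\.]+)"}],
    max_jobs=8,
    max_parallel_jobs=2,
)
# tuner.fit({"train": "s3://<private-bucket>/train/"})
```

The best job's model artifact (weights) is what we then pull out for inference.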
We started using an EC2 instance for the benchmark model, which can't be run through SageMaker training, and later added the final model's inference as well. Upon launch, the instance spins up a FastAPI app containing two endpoints (benchmark and model), each producing outputs given a payload sent from Streamlit.
Lastly, three components to access our models from the Streamlit application: S3, DynamoDB, and Lambda/API Gateway.
In S3, we have two buckets set up: a private one for general use cases, where we store our input data, and a second public bucket intended for sharing files via HTTPS. The public files are the outputs from the models, which Streamlit will retrieve based on a URL.
We use DynamoDB to store linked references between inputs from the API and their output objects in S3. This saves the compute effort of repeated identical requests, and allows the correct output to be queried and returned to the API later even if the original calculation request times out.
API Gateway is used to provide the endpoints for the Streamlit application to send inputs to. We use Lambda functions as a middleman between the API Gateway endpoint and the FastAPI endpoint, which also gives us another opportunity for input validation.
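The middleman role can be sketched as a Lambda handler that validates the payload before forwarding it to the instance. The required fields, endpoint names, and the EC2 address are assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical fields we expect from Streamlit; the real payload differs.
REQUIRED_FIELDS = {"model", "ticker", "horizon"}
EC2_API_URL = "http://<ec2-instance-ip>:8000"  # FastAPI app on the instance

def validate_payload(payload: dict) -> list[str]:
    """Return a list of validation errors (empty if the payload is usable)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - payload.keys()]
    if payload.get("model") not in (None, "benchmark", "model"):
        errors.append("model must be 'benchmark' or 'model'")
    return errors

def lambda_handler(event, context):
    payload = json.loads(event.get("body") or "{}")
    errors = validate_payload(payload)
    if errors:
        # Reject bad input here, before it ever reaches the EC2 instance
        return {"statusCode": 400, "body": json.dumps({"errors": errors})}
    # Forward the validated payload to the matching FastAPI endpoint
    req = urllib.request.Request(
        f"{EC2_API_URL}/{payload['model']}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=25) as resp:
        return {"statusCode": 200, "body": resp.read().decode()}
```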
Brick-by-brick...
Our initial draft of our architecture was drastically different. The simple plan was to host all models (benchmark, sentiment, deep learning) to SageMaker endpoints, and call them through a Lambda function to return the result.
How did we get so far from that? There were...several issues. We ended up at our current design after several iterations of a cycle: encounter a problem, some thinking time, take apart the structure, reassemble. Here are some of the problems that had major influence on the final design:
1. Our benchmark model doesn't include machine learning components, so we can't put it through a SageMaker training job.
Solution: Host the model through another service.
Result: An API within an EC2 instance can have an endpoint that runs the benchmark model. Since we have the model weights from the deep learning model, we can host that model on the same API (a cost decision).
We started using an EC2 instance primarily to have a way to run our benchmark model, since that had immediately thrown a wrench in our plans to just make SageMaker endpoints.
We did this via a FastAPI application, and assigned an endpoint to the benchmark model that we could send requests to with a JSON payload. The API had to be accessed with the IP address of the instance, which we just routed through a Lambda function to connect to the API Gateway endpoint.
Initially, the instance was only going to be used for the benchmark model, but since we already had the weights of the deep learning model from the SageMaker training job, we elected to serve the final model through the instance as well, bypassing the SageMaker model object and endpoint. This was primarily a decision of efficiency: it was much easier to integrate the model into a simple script with FastAPI than to adapt the inference script to fit SageMaker's framework. It also saved a lot of funds, given the cost of active SageMaker endpoints compared to a t3.large instance.
2. The models take more time to run and produce a result than the 30 second timeout window of API Gateway.
Solution: The outputs need to be saved somewhere to be accessed later.
Result: Use a database like DynamoDB to match inputs to outputs. The inputs and outputs need to be linked somehow, so that the correct output is returned to the correct call. Perhaps a unique key based on the input?
The original plan with the database was to assign each input payload a unique key (like datetime). The Lambda function would need to return the key to the requester, who would wait for a bit and then send another API call with the key to retrieve the output.
The strategy evolved into assigning each unique input a key based on that input to deal with identical calls. The script would assign the key, query the database for that key, and either fetch the result or run the model and store the output with the key. The associated result in the table would be a JSON structure. However...
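The input-derived key idea can be sketched by hashing a canonicalized form of the payload, so identical requests always map to the same item. Here a plain dict stands in for the DynamoDB table; the function names are illustrative.

```python
import hashlib
import json

def input_key(payload: dict) -> str:
    """Derive a deterministic key from the payload so that identical
    requests map to the same database item, regardless of field order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# A plain dict stands in for the DynamoDB table in this sketch.
table: dict[str, dict] = {}

def get_or_run(payload: dict, run_model) -> dict:
    key = input_key(payload)
    if key in table:                 # repeated identical call: skip inference
        return table[key]
    result = run_model(payload)      # expensive model run
    table[key] = result              # store output under the input-derived key
    return result
```

Hashing the sorted JSON means two payloads with the same fields in a different order still hit the same cached entry.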
3. The output of multiple tables and figures is too large to be returned through Lambda and API Gateway.
Solution: Instead of storing the output directly in the database, or trying to return it through the API, store the JSON file in a public S3 bucket, accessible via HTTPS.
Result: A second S3 bucket with public read permissions, for retrieving the appropriate output.
The user has a limited combination of inputs in the application interface, so it's not necessary to assign every call a unique key: there's a finite set of inputs/outputs. In the database, we'll use parts of the input as the key. We can save the large JSON output as a file, match inputs to the HTTPS link of the corresponding output, and return a URL from which the Streamlit application can retrieve the file. Furthermore, API calls will query the database before running inference, so the wall time of repeated calls is massively reduced.
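The return-a-URL pattern amounts to deriving an object key from the input selections and pointing at the public bucket over HTTPS. The bucket name, region, and key scheme below are assumptions for illustration.

```python
PUBLIC_BUCKET = "example-public-outputs"  # assumed bucket name
REGION = "us-east-1"                      # assumed region

def output_url(input_parts: dict) -> str:
    """Build the HTTPS URL of the output object whose key is derived
    from the user's input selections (a hypothetical naming scheme)."""
    # e.g. {"model": "benchmark", "ticker": "AAPL"} -> "benchmark_AAPL.json"
    object_key = "_".join(str(input_parts[k]) for k in sorted(input_parts)) + ".json"
    return f"https://{PUBLIC_BUCKET}.s3.{REGION}.amazonaws.com/{object_key}"

# Streamlit can then fetch the JSON file directly from the returned URL.
```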