Mobile Internet traffic has surged in recent years, driven by increased mobile app usage and content consumption. With 5G and media-rich applications such as VR and AR, this growth is set to accelerate. Ensuring a high Quality of Experience (QoE) for users is vital but challenging due to the volatility of wireless connections. Predicting bandwidth ahead of time is therefore crucial, as it allows companies to distribute internet resources effectively.
We want to look at previous bandwidth data and predict one or more future bandwidth values. Specifically, the task can be modeled as
X(τ, k)
where X is the prediction function, for example LSTM, τ is the history length, and k is the prediction length. LSTM (10, 3) means that we use an LSTM, look back at the previous 10 seconds, and predict the next 3 seconds.
Since bandwidth data arrives over time, our prediction is also made in real time. For LSTM (4, 1), once the first 4 bandwidth samples (4 seconds) have arrived, our model produces the first prediction. Then, when the fifth sample comes in, our model uses the second through fifth samples to output the second prediction, and so on. This way, we obtain a series of predictions. Here is an illustration:
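In code, this rolling scheme looks roughly like the sketch below. The trained `model` object and its Keras-style `predict` call are assumptions, not our actual model code; tau = 4 matches the LSTM (4, 1) example.

```python
import numpy as np

def rolling_predict(model, stream, tau=4):
    """Emit one prediction per incoming bandwidth sample once tau samples exist."""
    window, predictions = [], []
    for value in stream:                       # one bandwidth sample per second
        window.append(value)
        if len(window) >= tau:
            x = np.asarray(window[-tau:], dtype="float32").reshape(1, tau, 1)
            predictions.append(float(model.predict(x, verbose=0)[0, 0]))
    return predictions
```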
Now that we have our input bandwidth series and our prediction series, we can measure how accurate our predictions are using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Below are the formulas:
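With y_t the actual bandwidth, ŷ_t the prediction, and N the number of predicted seconds, the standard definitions are:

```latex
\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{y}_t - y_t\right|
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{y}_t - y_t\right)^{2}}
```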
Our work continues from PhD student Lifan's LSTM and multivariate prediction. Our research process can be separated into 4 stages:
Try out untested models
Draw conclusions
Tackle the issues found in the previous models
Collect new datasets and draw new conclusions that will help us train our models
Here is the abstract of our Summer Research program, which introduces the problem and the results from the first stage:
Here is the video that explains the content of our first stage.
We chose the Transformer because it can serve as a strong sequence-to-sequence model for multi-second prediction. LSTM (10, 5) outputs all 5 future bandwidth values from its last hidden layer at once. In contrast, Transformer (10, 5) outputs the first prediction, then uses input [2:10] + output [1] to produce the second prediction, and so on.
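To make the contrast concrete, here is a small sketch of that autoregressive rollout. The `one_step_model` callable stands in for whatever one-step-ahead Transformer is used; it is an assumption, not our actual model code.

```python
def autoregressive_rollout(one_step_model, history, k=5, tau=10):
    """Transformer(tau, k)-style decoding: predict one value, append it to the
    input window, and repeat, instead of emitting all k values at once."""
    window = list(history[-tau:])            # last tau observed bandwidth values
    outputs = []
    for _ in range(k):
        next_value = one_step_model(window[-tau:])   # one-step-ahead prediction
        outputs.append(next_value)
        window.append(next_value)            # feed the prediction back as input
    return outputs
```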
Google's Time Foundation (TF) models are particularly well-suited for bandwidth prediction due to their ability to leverage a vast range of information and contextual understanding. These models are trained on extensive and diverse datasets, encompassing a wide variety of temporal patterns and scenarios.
Although Google's TF model improves general performance by 2 percent, its training time is 30x that of LSTM, and its prediction time is 2.5x that of LSTM. Since this pretrained model is so expensive at inference time, it might not fit the real-time purpose we want to achieve.
I tried many Transformer configurations, yet the results are slightly worse than LSTM's. By visualizing the bandwidth data, we found a problem that both our results and Lifan's previous results share.
Here is the upper half of a prediction curve from LSTM. The prediction appears to lag the actual data by 1 second: the peak of the yellow curve comes after the peak of the blue curve. The prediction curves from TF and Transformer all show the same problem.
We suspect there are two ways this error could occur. First, the model neglects the contribution of all time steps except the most recent one; in other words, it copies the previous second. Second, it simply predicts changes in the wrong direction, which creates the illusion of a delay.
Since models like LSTM are fairly complicated, with all the gates and intricate weights, instrumenting them and visualizing what happened is very hard. I decided to use experiments to diagnose the problem.
One way to determine whether the model is copying is to remove the time-series aspect of the prediction. Specifically, I tried LSTM (1, 1). Since we look at only one past second, there is no notion of a pattern: the hidden state is refreshed every step and only the weights are kept. Here is the result:
We see that the delay effect still exists. This supports the hypothesis that LSTM can indeed copy the previous second. I then took it to the extreme with LSTM (1, 5). Predicting the next 5 seconds from only the previous 1 second should be impossible, but here is what I obtained:
The left picture above shows the result. It still outputs a clean but delayed prediction. The right picture compares LSTM (1, 1) and LSTM (1, 5). They are almost identical, each with a 1 s delay, which implies that the fifth prediction copies the first prediction.
However, there is also evidence that the LSTM does perform actual prediction. For example, if I input a completely random number series, the expected output should be the mean of the inputs.
The output on the left is indeed a flat line.
The picture on the right shows what happened when I trained the LSTM on 5 repetitions of a random sequence and tested it with noise interpolated into the sequence. The LSTM performs almost perfectly: the model remembers the patterns of the input, so the prediction is unaffected by the noise.
These results suggest that LSTM's behavior is heavily influenced by the randomness in the input bandwidth. A plain LSTM, or any pure time-series model, is definitely not enough given how random bandwidth data can be. We need a better model that pushes the LSTM to capture more general patterns instead of copying, and we also need other cellular signal data that is better correlated with bandwidth.
Since the bandwidth sequence appears to be predominantly random, predicting from this information alone is nearly impossible. Intuitively, the copying behavior makes sense, as the most predictive input is the most recent value, which sits closest to the LSTM output layer.
To tackle this challenge, we need more data: more diverse cellular-signal data that relates to bandwidth and has more structure.
But even before that, we want the LSTM to make more decisive predictions rather than simply copying the previous value. We do that by narrowing the problem down to predicting only the sudden changes. Since a sudden change in signal often suggests a handoff between base stations, I call our model handoff prediction.
Instead of giving only the bandwidth signal to the LSTM, we give it an additional sudden-change input. This is a binary sequence in which 1 indicates a sudden change and 0 indicates no sudden change. Here is a code example that illustrates how to build this sequence:
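A minimal sketch of this step; the function name and the NumPy array input are assumptions, and the threshold is in the same units as the bandwidth values.

```python
import numpy as np

def build_handoff_sequence(bandwidth, threshold):
    """handoff[i] = 1 if the change from second i-1 to second i exceeds the threshold."""
    handoff = np.zeros(len(bandwidth))
    for i in range(1, len(bandwidth)):
        if abs(bandwidth[i] - bandwidth[i - 1]) > threshold:
            handoff[i] = 1
    return handoff
```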
We set a threshold value; if the new value differs from the previous value by more than the threshold, we put a 1 in the output array.
However, the results show that simply giving this information to the LSTM produces nothing exciting, as shown:
The blue curve is the actual data, and the orange curve is the prediction. I let the LSTM predict both future bandwidth and handoff. A red dot means the LSTM predicts a sudden drop for the next value. The MAE is 27%, even worse than the 23% obtained without the sudden-change input.
However, if we cheat a little, something happens:
If we compare this image closely with the previous one, we see that the orange line gets much closer to the actual blue line, in some places overlapping it perfectly. Despite a similar copying problem and some overshoot, the model is better at making actual predictions. The MAE here is 21%, a significant improvement over 23%, given that 23% is already a hard bottleneck for this bandwidth sequence.
Here is how I cheated: when creating the sudden-change array, instead of comparing the current bandwidth value with the previous one, I compared the current value with the future one, which is possible because we are in post-processing. We live in a causal world and cannot know future values, so this is cheating. Here is the code:
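A sketch of the cheated version, under the same assumptions as the previous block:

```python
import numpy as np

def build_handoff_sequence_cheated(bandwidth, threshold):
    """handoff[i] = 1 if the change from second i to second i+1 exceeds the
    threshold, i.e. it peeks one step into the future (post-processing only)."""
    handoff = np.zeros(len(bandwidth))
    for i in range(len(bandwidth) - 1):
        if abs(bandwidth[i + 1] - bandwidth[i]) > threshold:
            handoff[i] = 1
    return handoff
```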
Compared with the previous code block, this one uses the difference between values i+1 and i.
Here is the improvement from cheated handoff prediction on three different bandwidth traces.
The behavioral change is exciting because, even though I cannot cheat in real time, I can cheat during the training stage of our LSTM, which is post-processing.
Therefore, I developed a handoff prediction pipeline that leverages this. Here is the basic flow chart:
Here is the overall process (a schematic code sketch follows the list):
The bandwidth data is split into 80% training and 20% testing
The training data is passed into the cheated handoff sequence maker, which uses the future and current values
Both are passed into the LSTM to induce it to place importance on the binary handoff sequence, since this improves general prediction accuracy
The last hidden layer from each time step of the LSTM is saved
The 20% testing data is fed into a handoff prediction model that takes in bandwidth and predicts handoff behavior
The predicted handoff sequence is then passed into the pretrained LSTM, which uses the saved weights
The pretrained LSTM then makes the final prediction using both the bandwidth and the handoff information
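This is not our exact research code, but an end-to-end sketch of the flow above, assuming a TensorFlow/Keras setup. The window size, threshold, layer sizes, epoch counts, and the synthetic placeholder trace are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

TAU, THRESHOLD = 10, 5.0

def cheated_handoff(bw, threshold):
    # handoff[i] = 1 if the jump from second i to second i+1 exceeds the threshold
    h = np.zeros(len(bw), dtype="float32")
    h[:-1] = (np.abs(np.diff(bw)) > threshold).astype("float32")
    return h

def windows(features, target, tau):
    # X[t] = features over seconds t..t+tau-1, y[t] = target at second t+tau
    X = np.stack([features[t:t + tau] for t in range(len(target) - tau)])
    y = np.asarray([target[t + tau] for t in range(len(target) - tau)])
    return X, y

# placeholder bandwidth trace; substitute a real trace here
rng = np.random.default_rng(0)
bw = np.abs(rng.normal(20.0, 8.0, size=1200)).astype("float32")

# 1) 80/20 split
split = int(0.8 * len(bw))
train_bw, test_bw = bw[:split], bw[split:]

# 2) cheated handoff sequence on the training data only
handoff_tr = cheated_handoff(train_bw, THRESHOLD)

# 3) main LSTM trained on [bandwidth, cheated handoff]
X_tr, y_tr = windows(np.stack([train_bw, handoff_tr], axis=-1), train_bw, TAU)
main_lstm = keras.Sequential([keras.Input(shape=(TAU, 2)),
                              keras.layers.LSTM(64),
                              keras.layers.Dense(1)])
main_lstm.compile(optimizer="adam", loss="mae")
main_lstm.fit(X_tr, y_tr, epochs=20, verbose=0)

# 4) handoff prediction model: from the TAU bandwidth values up to second i,
#    predict whether a sudden change happens between second i and i+1
Xh_tr = np.stack([train_bw[i - TAU + 1:i + 1].reshape(TAU, 1)
                  for i in range(TAU - 1, len(train_bw) - 1)])
yh_tr = handoff_tr[TAU - 1:len(train_bw) - 1]
handoff_lstm = keras.Sequential([keras.Input(shape=(TAU, 1)),
                                 keras.layers.LSTM(32),
                                 keras.layers.Dense(1, activation="sigmoid")])
handoff_lstm.compile(optimizer="adam", loss="binary_crossentropy")
handoff_lstm.fit(Xh_tr, yh_tr, epochs=20, verbose=0)

# 5) test time: predict the handoff sequence from bandwidth alone, then feed
#    [bandwidth, predicted handoff] into the pretrained main LSTM
Xh_te = np.stack([test_bw[i - TAU + 1:i + 1].reshape(TAU, 1)
                  for i in range(TAU - 1, len(test_bw))])
handoff_te = np.zeros(len(test_bw), dtype="float32")
handoff_te[TAU - 1:] = (handoff_lstm.predict(Xh_te, verbose=0)[:, 0] > 0.6)
X_te, y_te = windows(np.stack([test_bw, handoff_te], axis=-1), test_bw, TAU)
pred = main_lstm.predict(X_te, verbose=0)[:, 0]
print("MAE:", float(np.mean(np.abs(pred - y_te))))
```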
This loop may seem redundant and even circular, as we use the data itself to train a model and then use that model's output to train again. However, this way we indeed get a better prediction.
Here is the final predicted result:
This result is robust, as I tested it several times with adjusted parameters. An absolute improvement of 0.7% over the baseline LSTM might not seem impressive, but given the bottleneck, the relative improvement averages 10% across datasets, which is very significant.
One important catch here is this block:
We need to predict a handoff sequence. This block is nothing but another LSTM that takes in bandwidth and outputs both a bandwidth prediction and a handoff sequence. One might object that if we cannot predict bandwidth well, we surely cannot predict sudden changes well. Indeed, the predicted result is poor.
Here I visualize how well the handoff sequence is predicted. I treat it as binary classification, with a threshold of 0.6 on the output to decide between 0 and 1. Only 14% is correct, with an equal number of wrong predictions.
If I adjust the sudden-change threshold to 1 instead of 5, I get the result above. The accuracy does go up, but there are many false positives: 210 predicted sudden changes where no sudden change occurs.
To address this issue, I updated my loss function to punish false positives, where the prediction is 1 while the actual value is 0, using the code below.
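A minimal sketch of the idea, assuming a Keras/TensorFlow handoff model with a sigmoid output; the false-positive weight of 4.0 and the `handoff_model` name are illustrative assumptions, and the exact loss used in our experiments may differ.

```python
import tensorflow as tf

def handoff_loss(fp_weight=4.0):
    """Binary cross-entropy that penalizes false positives (actual 0, prediction
    pushed towards 1) more heavily than other errors."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        bce = -(y_true * tf.math.log(y_pred)
                + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        # weight exceeds 1 only where the label is 0 and the prediction leans towards 1
        weight = 1.0 + (fp_weight - 1.0) * (1.0 - y_true) * y_pred
        return tf.reduce_mean(weight * bce)
    return loss

# usage: handoff_model.compile(optimizer="adam", loss=handoff_loss(4.0))
```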
After numerous tries and much parameter tuning, I obtained one set of parameters that produces a pretty good handoff sequence.
Here is a prediction result on the bus57 dataset. Despite many wrong predictions and some copying, in many places the prediction overlaps the actual bandwidth remarkably well, meaning that the LSTM is dynamically adjusting its prediction based on the sudden-drop information. I do not know why the baseline LSTM cannot learn this by itself and has to be induced this way.
The handoff prediction method can reduce the excessive dependency of RNN-based models on the previous bandwidth value, but it is not enough in the long run.
To build better and stronger models, we need a large amount of data. Our current endeavor is to expand our dataset. We aim to record many fixed routes, such as the commute between NYU Tandon and the NYU main campus. We have also been in contact with the SpeedTest company in a quest for more diverse bandwidth data recorded over long spans of time.
As for the data itself, we collect multiple signals at once, such as ping, signal strength, and mobile speed, along with bandwidth. Below is a graph showing all the multivariate data together:
The blue curve is the bandwidth. I cannot eyeball any significant pattern. Here I calculate the correlation of each type of signal with bandwidth under different time shifts.
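A small sketch of how such a shifted correlation can be computed; the function name is illustrative, and `signal` and `bw` are assumed to be equal-length NumPy arrays sampled once per second.

```python
import numpy as np

def lagged_correlation(signal, bw, max_lag=10):
    """Pearson correlation between a cellular signal at second t and the
    bandwidth at second t + lag, for lag = 0 .. max_lag."""
    corrs = {}
    for lag in range(max_lag + 1):
        a = signal[:len(signal) - lag] if lag else signal
        b = bw[lag:]
        corrs[lag] = float(np.corrcoef(a, b)[0, 1])
    return corrs
```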
Most of them show weak to no correlation. However, correlations of 20%-40% might still be enough to influence bandwidth prediction. I built a multivariate LSTM-based model and used the multivariate data to predict bandwidth, but the result is around 27% MAE, much worse than the simple LSTM. However, I came across something interesting:
If I use only the multivariate data, without any bandwidth information, as input to the LSTM to predict future bandwidth, we see that when the actual bandwidth drops suddenly, the prediction also shows a drop. That means the multivariate data can tell us something important about the trend.
Collecting data is a difficult process. We went to different locations and collected many datasets under different speeds and network traffic conditions. Below are some highlights of the app, the method, and the results of our data collection.
Here is the GitHub link to our research models: