By combining a few techniques, namely YOLO object detection, k-means-style clustering (via a Gaussian mixture model), and momentum vector calculations, one can build a program that measures movement speed and tracks objects across frames so that they are not double counted. You will see some failures in the video below, but given that this was a first proof of concept, it still does remarkably well.
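To make the "no double counting" idea concrete, here is a minimal sketch in plain Python. It is not the code behind the video: the `Track` class, the `assign_detections` helper, and the `max_dist` threshold are all hypothetical names and values chosen for illustration. It assumes each frame yields a list of (x, y) detection centers, for example the centers of YOLO bounding boxes, and matches them greedily to each track's momentum-predicted position.

```python
import math


class Track:
    """One tracked object: last known center plus a momentum vector (pixels per frame)."""

    def __init__(self, track_id, center):
        self.id = track_id
        self.center = center
        self.momentum = (0.0, 0.0)

    def predict(self):
        # Expected center on the next frame if the object keeps its momentum.
        return (self.center[0] + self.momentum[0], self.center[1] + self.momentum[1])

    def update(self, center):
        # Momentum is the displacement since the previous frame.
        self.momentum = (center[0] - self.center[0], center[1] - self.center[1])
        self.center = center

    def speed(self):
        return math.hypot(*self.momentum)


def assign_detections(tracks, detections, max_dist=50.0):
    """Greedily match detections to predicted track positions.

    Detections with no track within max_dist (an assumed threshold) start a
    new track. Returns the number of newly counted objects in this frame.
    """
    unmatched = list(tracks)  # each existing track may be claimed once per frame
    next_id = max((t.id for t in tracks), default=-1) + 1
    new_count = 0
    for det in detections:
        best = min(unmatched, key=lambda t: math.dist(t.predict(), det), default=None)
        if best is not None and math.dist(best.predict(), det) <= max_dist:
            best.update(det)
            unmatched.remove(best)
        else:
            tracks.append(Track(next_id, det))
            next_id += 1
            new_count += 1
    return new_count
```

Summing the return value over all frames gives the object count, and `speed()` gives pixels per frame, which can be converted to real units if the camera scale and frame rate are known. Greedy matching is the simplest choice; it can mis-assign objects that cross paths, which is one plausible source of the failures visible in a proof of concept like this.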
By controlling a few parameters during recording and post-editing, or by tuning the train/test variables, there is no reason why near-100% accuracy should be out of reach. Hope you enjoy the video.
Using a Gaussian mixture, it is possible to verify the clusters currently in frame. (This is different from YOLO's image isolation.) This makes it possible to determine center locations, shrink the blue blobs to clean up the predicted next coordinates, and update the momentum values accordingly. In the video example, clustering cleanup and momentum updates happen every 10 frames; this interval is but one hyperparameter that can be fine-tuned.
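The cleanup step can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: it uses scikit-learn's `GaussianMixture`, assumes the blob pixels are available as an array of (x, y) coordinates, and assumes the number of clusters is known in advance. The helper names `cluster_centers` and `refresh_momentum` are hypothetical; only the 10-frame interval comes from the video example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

UPDATE_EVERY = 10  # frames between cleanup passes, as in the video example


def cluster_centers(points, n_clusters, seed=0):
    """Fit a Gaussian mixture to (x, y) pixel coordinates and return the component means.

    n_clusters is assumed to be known; the source does not say how it is chosen.
    """
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(points)
    return gmm.means_


def refresh_momentum(prev_centers, new_centers, frames_elapsed=UPDATE_EVERY):
    """Pair each previous center with its nearest new center and recompute momentum.

    Returns the matched new centers and the per-frame momentum vectors.
    """
    # Pairwise distances: rows are previous centers, columns are new centers.
    dists = np.linalg.norm(prev_centers[:, None, :] - new_centers[None, :, :], axis=2)
    matched = new_centers[dists.argmin(axis=1)]
    momentum = (matched - prev_centers) / frames_elapsed
    return matched, momentum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic blobs standing in for the blue blobs in the video.
    frame_0 = np.vstack([rng.normal((50, 50), 2, (200, 2)),
                         rng.normal((200, 120), 2, (200, 2))])
    frame_10 = frame_0 + np.array([20.0, -10.0])  # both blobs drift over 10 frames

    centers_0 = cluster_centers(frame_0, n_clusters=2)
    centers_10 = cluster_centers(frame_10, n_clusters=2)
    centers, momentum = refresh_momentum(centers_0, centers_10)
    print("predicted next centers:", centers + momentum)
```

A shorter interval keeps predictions tighter at the cost of more mixture fits per second, while a longer one is cheaper but lets momentum errors accumulate between corrections.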
Who knows, you'll just have to come back and find out ;)