The video on the right is an initial demonstration of custom object detection on an edge platform, the Jetson Nano. The idea is to recognize actions like "putting milk in the fridge" or "taking milk out of the fridge" using object tracking.
Detections are still noisy due to the small custom dataset and the lightweight neural network (chosen for faster inference). However, getting the tracking algorithm to work with noisy detections will be a good real-world challenge. More data can be collected once the project is validated.
A potential application is autonomous inventory control and purchasing. Further work could predict demand for products before they run out.
https://github.com/conq44/Where-to-sire.git
Dijkstra
A_Star Search
RRT
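As a quick illustration of the first of these planners (this is a minimal sketch on a toy graph, not the repo's code), Dijkstra expands the closest unexplored node first using a priority queue:

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path distances from start on a weighted graph.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    Returns a dict of node -> shortest distance from start.
    """
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

# Toy graph: the best route A -> D goes through B and C.
grid = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(grid, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```

A* is the same loop with a heuristic added to the queue priority, and RRT drops the graph entirely in favor of randomly sampled tree growth through continuous space.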
Why can't I have a tube amp too?
https://github.com/conq44/Guitar-emulation.git
Link to the paper -
https://arxiv.org/pdf/1811.07258.pdf
An attempt was made to apply TrackletNet to the VisDrone dataset, but local training/fine-tuning became too expensive to continue, so I plan to pursue a simpler MOT algorithm coded from scratch soon.
In any case, here's how it works in simple words:
A tracklet is an association between what the model thinks is the same object across multiple frames; this is what allows the same object to be tracked over a video sequence. In this paper, the authors essentially propose a graph model in which the tracklets are the nodes and the similarities between tracklets are the edges.

First, the objects in every frame are cropped using the ground-truth labels and an appearance model is trained. This repo uses the FaceNet architecture trained with a triplet loss: every cropped object (the anchor) is paired with other objects that are either the same object in a different frame (a positive) or a different object (a negative). The model is then trained to tell which objects are the same and which are different. An important consideration when training this type of model on sequence data is that the validation data must be taken from a different video sequence, to check the model's performance on an unseen object set.

The Intersection over Union (IoU) of the tracklets, combined with their epipolar geometry, is also used. Epipolar geometry helps differentiate between different objects occupying the same spot in consecutive frames, a case where simple IoU would identify them as the same object. This step allows tracklets to be built from the video sequences; these tracklets form the set V. Keep in mind that there may be multiple tracklets derived from the same starting object, and many of these might be very similar.
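The triplet loss mentioned above can be sketched in a few lines. This is the standard formulation on plain embedding vectors (the actual repo trains a FaceNet-style network to produce the embeddings; the values here are made up for illustration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on embedding vectors.

    Pulls the anchor toward the positive (same object, different frame)
    and pushes it away from the negative (a different object), until the
    negative is at least `margin` farther away than the positive.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to same object
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to other object
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-D embeddings: the positive sits close to the anchor, the negative far.
anchor   = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])
negative = np.array([0.0, 1.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # 0.0 -> triplet already satisfied
```

When the loss is zero the triplet contributes no gradient, which is why training pipelines typically mine "hard" triplets where the negative is still too close to the anchor.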
Next, the custom CNN called TrackletNet is used to estimate the similarity between pairs of tracklets from the set V; these similarities form the edges of the graph, the set E. Clustering is then carried out on the graph, and the final tracklets are output as a result. These final tracklets identify the same objects across each video sequence.
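The clustering step can be sketched with thresholded connected components over the edge set E. This is a simplified stand-in (the paper uses a dedicated graph clustering procedure, and the similarity scores below are invented), but it shows how tracklet fragments get merged into one identity:

```python
def cluster_tracklets(num_tracklets, similarities, threshold=0.5):
    """Group tracklets whose pairwise similarity exceeds a threshold.

    similarities: dict mapping (i, j) tracklet-index pairs -> similarity
    score (the graph edges, set E). Returns a list of clusters (sets of
    tracklet indices), each cluster representing one object identity.
    """
    parent = list(range(num_tracklets))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), sim in similarities.items():
        if sim >= threshold:          # keep only strong edges
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(num_tracklets):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())

# Tracklets 0, 1, 2 look like fragments of the same object; 3 is separate.
edges = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1}
print(cluster_tracklets(4, edges))  # [{0, 1, 2}, {3}]
```

Note that tracklets 0 and 2 end up in the same cluster even though they share no strong edge directly; merging fragments transitively like this is what stitches a broken track back into a single identity.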