Code & Software
CORDS: COResets and Data Subset selection
Github Link: https://github.com/decile-team/cords
Achieve 3x to 30x speedups for a number of ML tasks and domains by using informative data subsets in each epoch of training
Algorithms implemented: GLISTER, CRAIG, Grad-Match, Submodular Selection (Facility Location, Feature Based Functions, Coverage, Diversity etc.), and Random Selection.
Scenarios: Supervised, semi-supervised learning, and AutoML. Domains: Image Classification, NLP, Speech Recognition, and Tabular Data
172 ⭐ | 25 Forks
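The idea of training on a data subset in each epoch can be illustrated with a minimal, library-free sketch. It does not use the CORDS API: `random_subset` and `train_on_subsets` are hypothetical names, the model is a one-parameter linear fit, and the selection strategy is plain random selection (the baseline listed above). A strategy such as GLISTER, CRAIG, or Grad-Match would take the place of the `select` callable.

```python
import random

def random_subset(n, fraction, rng):
    """Hypothetical selection strategy: pick a random fraction of the n indices."""
    k = max(1, int(n * fraction))
    return rng.sample(range(n), k)

def train_on_subsets(xs, ys, select, fraction=0.1, epochs=50, lr=0.05, seed=0):
    """Fit y = w * x with SGD, visiting only a freshly selected subset each epoch."""
    rng = random.Random(seed)
    w = 0.0
    grad_evals = 0  # number of per-example gradient computations
    for _ in range(epochs):
        for i in select(len(xs), fraction, rng):
            err = w * xs[i] - ys[i]
            w -= lr * err * xs[i]  # gradient of 0.5 * err**2 with respect to w
            grad_evals += 1
    return w, grad_evals

# Toy data generated from y = 3x (an assumption made purely for illustration).
xs = [i / 100 for i in range(1, 201)]
ys = [3.0 * x for x in xs]

w, grad_evals = train_on_subsets(xs, ys, select=random_subset)
print(w, grad_evals)
```

With a 10% subset, each epoch computes a tenth of the per-example gradients that a full pass would. Whether accuracy holds up at that budget depends on how informative the selected subset is, which is the problem the listed selection algorithms address.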
DISTIL: Deep dIverSified inTeractIve Learning
Github Link: https://github.com/decile-team/distil
DISTIL implements a number of state-of-the-art active learning algorithms.
Some of the algorithms currently implemented with DISTIL include: Uncertainty Sampling, Margin Sampling, Least Confidence Sampling, FASS, BADGE, GLISTER-Active, CoreSetAL, Random Sampling, and Submodular Sampling
83 ⭐ | 14 Forks
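Three of the simpler strategies named above can be sketched in a few lines of plain Python. This is not the DISTIL API: the function names and the toy `pool_probs` values are assumptions, and entropy is used here as one common way to score "uncertainty". Each function maps a model's predicted class probabilities for one unlabeled point to a score, and the points with the highest scores are sent for labeling.

```python
import math

def least_confidence(probs):
    """1 - max class probability; higher means the model is less sure."""
    return 1.0 - max(probs)

def margin(probs):
    """Negative gap between the two most likely classes; higher means a closer call."""
    top, second = sorted(probs, reverse=True)[:2]
    return -(top - second)

def entropy(probs):
    """Shannon entropy of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(pool_probs, score_fn, budget):
    """Return the indices of the `budget` highest-scoring unlabeled points."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score_fn(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical softmax outputs for four unlabeled points and three classes.
pool_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
]

for fn in (least_confidence, margin, entropy):
    print(fn.__name__, select_batch(pool_probs, fn, budget=2))
```

The three scores do not always agree: on this toy pool, margin sampling ranks the 0.50 / 0.49 point first, while least confidence and entropy rank the flatter 0.40 / 0.35 / 0.25 point first.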
SPEAR: Semi-supervised Data Programming Based Weak Supervision
Github Link: https://github.com/decile-team/spear
SPEAR implements a number of state-of-the-art data programming approaches including SPEAR, Snorkel, Imply Loss, Learning to Reweight, etc.
Enables writing labeling functions for programmatic data labeling and semi-supervision.
72 ⭐ | 7 Forks
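To make the notion of a labeling function concrete, here is a minimal sketch in plain Python. It does not use the SPEAR API: the `lf_*` functions, the `ABSTAIN` convention, and the spam/ham task are all hypothetical. The votes are combined by simple majority vote, which is only a stand-in for the learned aggregation models that data programming approaches use.

```python
from collections import Counter

ABSTAIN = None     # a labeling function may decline to vote
SPAM, HAM = 1, 0   # hypothetical label set

def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    lowered = text.lower()
    return SPAM if "prize" in lowered or "winner" in lowered else ABSTAIN

def lf_short_greeting(text):
    words = text.lower().split()
    return HAM if len(words) <= 4 and "hi" in words else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]

def apply_lfs(text):
    """Collect one (possibly abstaining) vote per labeling function."""
    return [lf(text) for lf in LABELING_FUNCTIONS]

def majority_vote(votes):
    """Most common non-abstain vote; abstain when nothing fired or on a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

examples = [
    "You are a winner, claim your prize at http://example.com",
    "hi there",
    "Meeting moved to 3pm",
]
weak_labels = [majority_vote(apply_lfs(t)) for t in examples]
print(weak_labels)
```

Each labeling function encodes one noisy heuristic and is free to abstain, so unlabeled text can be labeled programmatically and the resulting weak labels used for training.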
SubModLib: A Submodular Optimization Toolkit in C++
Github Link: https://github.com/decile-team/submodlib
A general-purpose C++ toolkit for large-scale submodular function optimization, with a Python API, covering a large class of algorithms and commonly used submodular functions
Uses memoization and other implementation tricks to speed up the algorithms, including implementations of Lazy Greedy and Lazier than Lazy Greedy
Algorithms scale to massive datasets involving ground set sizes of several million instances.
Enables building summarization (document/image/video) and data selection applications in a few lines of code!
21 ⭐ | 8 Forks
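The Lazy Greedy idea mentioned above can be sketched in plain Python for the facility location function f(S) = sum_i max_{j in S} sim[i][j]. This is not the SubModLib API: the function names, the toy 1-D points, and the similarity 1 / (1 + distance) are assumptions. The sketch relies on submodularity: a candidate's marginal gain can only shrink as the selected set grows, so a stale gain is a valid upper bound and most candidates never need re-evaluation.

```python
import heapq

def facility_location_gain(sim, covered, j):
    """Marginal gain of adding j, given covered[i] = max similarity of i to the selected set."""
    return sum(max(0.0, sim[i][j] - covered[i]) for i in range(len(sim)))

def lazy_greedy(sim, budget):
    n = len(sim)
    covered = [0.0] * n  # memoized per-point coverage, updated after each pick
    # Max-heap (via negation) of upper bounds on each candidate's marginal gain.
    heap = [(-facility_location_gain(sim, covered, j), j) for j in range(n)]
    heapq.heapify(heap)
    selected, evaluations = [], n
    while heap and len(selected) < budget:
        _, j = heapq.heappop(heap)
        gain = facility_location_gain(sim, covered, j)  # refresh the stale bound
        evaluations += 1
        if not heap or gain >= -heap[0][0]:
            # The fresh gain beats every other upper bound, so j is the greedy choice.
            selected.append(j)
            covered = [max(covered[i], sim[i][j]) for i in range(n)]
        else:
            heapq.heappush(heap, (-gain, j))
    return selected, evaluations

# Toy ground set: 1-D points forming two clusters plus one isolated point.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]

selected, evaluations = lazy_greedy(sim, budget=3)
print(selected, evaluations)
```

On this toy input the selection is `[1, 4, 5]`: one representative from each cluster plus the isolated point. The `covered` list plays the role of memoization here, since it lets each gain be computed without rescanning the selected set.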
Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization
Github Link: https://github.com/decile-team/jensen
A modular framework for Convex optimization including several common convex functions and algorithms used in Machine Learning
Implements several convex functions like Logistic Loss, Hinge Loss etc. and most convex optimization algorithms including LBFGS, Trust Region Newton, LBFGS-Owl, Stochastic Gradient Descent, Nesterov’s optimal algorithm, Gradient Descent with various update rules, Conjugate gradient descent etc.
Implements several basic Machine Learning classifiers such as L1/L2 regularized Logistic Regression, SVMs, Probit Regression etc.
42 ⭐ | 20 Forks
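As an illustration of the kind of problem such a toolkit solves, here is a minimal plain-Python sketch of L2-regularized logistic regression trained with fixed-step gradient descent. It does not reflect Jensen's C++ interface: the function names, the toy data, the {0, 1} label convention, and the step size are assumptions chosen for readability.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loss_and_grad(w, X, y, lam):
    """Mean logistic loss (labels in {0, 1}) plus (lam / 2) * ||w||^2, with its gradient."""
    n, d = len(X), len(w)
    loss = 0.5 * lam * sum(wj * wj for wj in w)
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        # log(1 + exp(z)) - y * z, written to avoid overflow for large |z|
        loss += (max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z) / n
        p = sigmoid(z)
        for j in range(d):
            grad[j] += (p - yi) * xi[j] / n
    return loss, grad

def gradient_descent(X, y, lam=0.1, step=0.5, iters=200):
    """Plain gradient descent with a fixed step size, starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        _, g = loss_and_grad(w, X, y, lam)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w

def predict(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Toy data: first column is a bias feature, second is the input.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

initial_loss, _ = loss_and_grad([0.0, 0.0], X, y, lam=0.1)
w = gradient_descent(X, y)
final_loss, _ = loss_and_grad(w, X, y, lam=0.1)
print(initial_loss, final_loss, w)
```

Keeping the objective (`loss_and_grad`) separate from the solver (`gradient_descent`) mirrors the modular split between convex functions and optimization algorithms described above, so another solver such as L-BFGS could be applied to the same function.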
Sanjaya: A Scalable C++ Deep Video Analytics Engine
Implements a scalable real-time and post-mortem video analytics engine with functionality including object detection, face detection and recognition, human detection and attribute recognition, vehicle detection and attribute recognition, face age/gender recognition, and video summarization.
Integrates several open-source libraries, including OpenCV, Caffe, DarkNet, DLib, and LibCCV, in a single engine!
Ability to train customized object detection models and image classification models
Enables model finetuning and transfer learning
Supports live streams from surveillance cameras and several video file formats
Enables creating video analytics applications with a few lines of code!
Led to the development of two products: Surakshavyuh (real-time analytics and alerting) and Jigyasa (video search analytics).