Using Machine Learning to solve business problems has now become more of a norm than an exception. And yet, we see a large number of AI/ML projects failing to achieve what they set out to do. This has a lot to do with the birthing pangs of a nascent field and the inherent complexity of the technology. But I believe a number of these failures can be avoided, or at least detected early, with a more product-centric development process.
The Task-Dataset gap
Products solve tasks for their users. Or at least the good ones do! Sometimes the tasks are simple enough that a team of smart programmers can codify a set of rules for carrying them out and package that neatly as software. But other times the task is too complex to be codified manually. For example, say the task you have to solve is recognizing faces in a digital album. You could try writing traditional software for this task, and people have tried in the past. But it soon became clear that the variations of input that lead to the same output are enormous. There are just too many low-level variables to keep track of: the background, the make of the camera, the angle of the shot, the lighting in the room, to name a few.
And this is where Machine Learning excels, especially in its modern avatar of deep learning. Neural nets do a ridiculously effective job of “summarizing” these lower-level variations such that the output of the network is robust to them, at least to an extent. The modern supervised deep-learning recipe that has emerged in the past few years goes something like this (a minimal code sketch follows the list):
1. Collect input-output pairs that represent the input and output for the task at hand. Additionally, you may also collect input-output pairs for closely related tasks on the same data (multi-task learning) or on a related dataset (transfer learning). Create train/test/val splits.
2. Define a metric to optimize and decide on the threshold value of this metric for release.
3. Use a pre-trained model from your domain and tune it to your training set.
4. Iterate on the data representation, model, and hyperparameters till you find a model that passes the threshold on your validation set.
5. Confirm the results on your test set. If all looks good, release. Else go back to step 3.
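To make the recipe concrete, here is a minimal sketch of steps 1 through 5 in Python. The data, classifier, metric, and threshold are all placeholders; in a real project, step 3 would fine-tune a pre-trained model from your domain rather than fit a fresh classifier on synthetic data.

```python
# A minimal sketch of the recipe above (all names and numbers are placeholders).
# A real project would fine-tune a pre-trained model from its domain in step 3;
# a plain logistic regression on synthetic data stands in for it here.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 1: collect input-output pairs and create train/val/test splits.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Step 2: define a metric to optimize and a release threshold (assumed value).
RELEASE_THRESHOLD = 0.90

# Steps 3-4: fit (or fine-tune) a model and iterate until the validation metric clears the bar.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_score = accuracy_score(y_val, model.predict(X_val))

# Step 5: confirm on the held-out test set before releasing.
if val_score >= RELEASE_THRESHOLD:
    test_score = accuracy_score(y_test, model.predict(X_test))
    print(f"val={val_score:.3f} test={test_score:.3f} release={test_score >= RELEASE_THRESHOLD}")
else:
    print(f"val={val_score:.3f} below threshold; go back to step 3")
```

Notice that everything here is expressed in terms of a single aggregate metric, with no notion of who the model is supposed to work for. That is exactly where the trouble starts.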
Looks done, right? If only life were that easy. You release the model to staging, and the next day the testing team comes back with a litany of issues. “The model does not work well for older people.” “It makes too many errors for people with darker skin.” “There is a sequence of shots captured in a burst, but it identifies people in some of them and not in others. How can it work for one image but not the other? I thought this thing was smart.”
Sigh! And now you try to explain that this is a statistical model. For examples that are not common in the training set, it is expected to perform worse. And while it is robust to a lot of variations, certain types of small variations can throw it off. And no, this thing is not “smart”. At least not in any way that you would call a human smart. The performance overall is above the threshold we decided, isn’t it? Why are we hung up on these specific errors?
This is what I call the Task-Dataset gap: the gap between fulfilling all the requirements of the task versus merely optimizing a seemingly good metric on a training set. It stems from not thinking through the product requirements in steps 1 and 2 above. By the time you realize this mistake, you have already spent a lot of time on steps 3, 4, and 5. Not only have you lost time in the process, but you may have also lost the trust of the product team, especially if your organization has just started trying to use ML to solve problems. And this is where product-centric AI can be of great use.
Product-centric AI
The core tenet of product-centric AI is straightforward: keep the product requirements front and center while defining, designing, and deploying AI systems. This is closely related to, but slightly distinct from, the data-centric approach that has been picking up steam, championed by the AI pioneer Andrew Ng. While the data-centric approach places the emphasis on data, the product-centric approach advocates contextualizing the data in terms of product requirements. The modified recipe looks something like this:
1. Along with the product team, users, and any other stakeholders, define the concrete requirements of the task your AI model is supposed to learn. These can take the form of use cases (“The model should work comparably well for all age groups”), user stories (“The user should be able to take a burst of images sequentially and the model should give consistent outputs for the images in the sequence”), or any other requirement-specification format that works best for the problem.
2. Collect input-output pairs such that you have sufficient data covering all the requirements. If you have data collected in bulk, annotate it such that you can create slices of the data that correspond to each requirement. Sometimes you can do this programmatically or by using metadata; other times you may need to do the annotation manually, in which case you may get only a subset annotated to meet time and cost constraints. Finally, define an appropriate metric and threshold on each data slice, which serves as the go/no-go for the corresponding requirement being fulfilled (a minimal sketch of this slice-level evaluation follows below).
Steps 3, 4, and 5 are largely unchanged, except that in some cases you may want to design your loss function to explicitly model these requirements; in other cases, you may expect the requirements to be satisfied given the data you have collected.
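Here is a minimal sketch of what step 2 could look like in code, assuming each test example carries metadata (an age group, in this example) that maps it to a requirement slice. The requirement names, slices, and thresholds are purely illustrative:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# One entry per product requirement: the data slice it maps to and its go/no-go threshold.
# These names and numbers are illustrative, not prescriptive.
REQUIREMENTS = {
    "works_for_young_adults": {"slice": "18-35", "threshold": 0.92},
    "works_for_middle_aged":  {"slice": "36-60", "threshold": 0.92},
    "works_for_seniors":      {"slice": "60+",   "threshold": 0.90},
}

def evaluate_requirements(model, X_test, y_test, age_groups):
    """Return a go/no-go verdict per requirement, computed on its data slice.

    Assumes X_test and y_test are NumPy arrays and `age_groups` holds one
    slice label per test example.
    """
    preds = model.predict(X_test)
    report = {}
    for name, spec in REQUIREMENTS.items():
        mask = np.array([g == spec["slice"] for g in age_groups])
        if not mask.any():
            report[name] = {"score": None, "passed": False}  # no data for this slice
            continue
        score = accuracy_score(y_test[mask], preds[mask])
        report[name] = {"score": score, "passed": score >= spec["threshold"]}
    return report
```

The release decision then becomes “every requirement passes on its slice” rather than “one aggregate number clears a bar”.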
Note that I am not recommending a waterfall-like process where the requirements are set in stone and the only iteration happens in the subsequent steps. It is very much expected that the data-science or product team discovers more scenarios while building the model; these should be captured as requirements with their own metrics, added to the list, and tracked accordingly.
Advantages of the product-centric approach
The obvious advantage of the above modification is that it forces the data-science and product teams to think through the various scenarios in which the product is used. This comes naturally in traditional software projects, as they are typically built feature by feature and user story by user story. In data-science projects, there is sometimes a temptation to lower the design rigor and “let the data do the talking”.
Second, it gives the organization visibility into the performance of the models in a context that it understands. This requires some tooling, but imagine a dashboard with product-level requirements and a green/red label indicating whether the current model satisfies each requirement on the test dataset. You can, of course, add more detail to such a dashboard, and it can be invaluable for managers and executives to quickly monitor the health of ML models and even prioritize areas of improvement.
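As a sketch of the kind of tooling this needs, the per-requirement report from the earlier snippet could be rendered as a rudimentary text dashboard; the layout here is just illustrative:

```python
def print_requirement_dashboard(report):
    """Print one green/red row per product-level requirement."""
    for name, result in report.items():
        label = "GREEN" if result["passed"] else "RED"
        score = "n/a" if result["score"] is None else f"{result['score']:.3f}"
        print(f"[{label:>5}] {name:<26} metric={score}")

# Example usage, with the report produced by evaluate_requirements() above:
# print_requirement_dashboard(evaluate_requirements(model, X_test, y_test, age_groups))
```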
There are a couple of other advantages to the product-centric approach. It fosters better communication between the data-science and product teams. At times, data scientists can be guilty of viewing the world in terms of data points rather than the human actions that generate them. Putting the product requirements prominently in the process can break this silo.
The other potential advantage of this approach is that it can build towards a more meaningful measure of what it means for the test and production distributions to differ, or for data to drift in production. Instead of relying on blind statistical measures of drift, we could detect drift in terms of measures that matter to the product, for example, the context-specific drift detection method here.
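As one simple illustration of product-relevant drift (a toy sketch, not the method linked above), you could track how much production traffic each requirement slice receives compared to the test set, and flag slices whose share shifts beyond a tolerance:

```python
from collections import Counter

def slice_share_drift(test_slices, prod_slices, tolerance=0.10):
    """Flag requirement slices whose share of examples shifts by more than `tolerance`.

    `test_slices` and `prod_slices` are lists of slice labels, one per example.
    The 10% tolerance is an arbitrary placeholder.
    """
    test_counts, prod_counts = Counter(test_slices), Counter(prod_slices)
    n_test, n_prod = len(test_slices), len(prod_slices)
    flags = {}
    for s in set(test_counts) | set(prod_counts):
        delta = prod_counts[s] / n_prod - test_counts[s] / n_test
        flags[s] = abs(delta) > tolerance
    return flags
```

An alert of the form “the 60+ slice now makes up 30% of traffic but only 5% of the test set” is far more actionable than a generic divergence score.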
Granted, all this might take more work, especially before you get started with the “fun” work of playing with the latest models. But hey, no one said it would be easy. And no one said it would be cheap. First, we get it right. Then we do it cheap.