My Summary Of Hidden Technical Debt in Machine Learning Systems

Jacky
4 min read · Apr 27, 2020


The following article is my summary of a popular machine learning systems paper that I decided to analyse because it introduces a variety of concepts that try to capture a holistic understanding of how difficult it is to build machine learning systems, especially scalable or unusual ones. These problems are only exacerbated in NLP and computer vision contexts, where the cost of analysing model distributions soars significantly. I also add a few thoughts of my own on the concepts mentioned (there is a large amount of overlap between the concepts in the paper, or perhaps the nuances are subtle and I failed to notice them) and briefly mention open-source packages that deal with the problems raised in this 2015 paper. Overall, I found the paper quite eye-opening about the problems being faced, even though the solutions it offers are fairly vague. Being made aware of these problems in the first place made the paper worth the read.

Link to paper here: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Problem statement: Machine learning systems are inherently complex because they combine all the technical issues of maintaining a code-base with machine learning related issues (more to do with data and modelling). Changes in the external world can also introduce a variety of further issues.

ML systems can produce a number of issues previously not seen in the software development landscape. Examples of such issues are:

  • Entanglement: This is the idea that everything is intertwined. The principle CACE (Changing Anything Changes Everything) holds true in ML systems and is a major difference between these systems and traditional software systems. For example, changing the inputs means revisiting hyperparameters, learning settings, sampling methods and convergence thresholds (a toy sketch after this list illustrates the effect).
  • Correction cascades: Building a model as a quick correction layer on top of another model's output creates a cascade of dependencies that becomes hard to improve later, which is one more reason to minimise data and model dependencies (more below).
  • Undeclared consumers (visibility debt): Other people using model output without the original creators knowing. This can create dependency issues like no tomorrow if the model's output were to change, because nobody knows who will break.
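
To make CACE concrete, here is a toy sketch (my own, not from the paper) using scikit-learn: rescaling a single input feature changes not only that feature's learned coefficient but typically shifts the others as well, because the loss and regularisation couple all the weights together.

```python
# Toy illustration of CACE (Changing Anything Changes Everything).
# Rescaling one input feature shifts the whole set of learned weights,
# so downstream thresholds and consumers of those weights shift too.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=1000) > 0).astype(int)

base = LogisticRegression(max_iter=1000).fit(X, y)

X_changed = X.copy()
X_changed[:, 0] *= 100.0  # "change anything": rescale a single input feature

changed = LogisticRegression(max_iter=1000).fit(X_changed, y)

print("original coefficients:   ", base.coef_.round(3))
print("after rescaling feature 0:", changed.coef_.round(3))
# The other coefficients typically move as well, not just the rescaled one.
```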

Data dependencies also need to be understood, as they can be quite expensive. They are broken down into two main types:

  • Unstable data dependencies (data sources that are volatile in nature): This is the idea that the input data itself can change (think of what happened during COVID-19 and the noise this introduced into our models). As a result, unstable inputs need to be handled explicitly, for example by taking a frozen, versioned copy of the signal until an updated version has been vetted; the paper recommends adopting such versioning systems.
  • Underutilised data dependencies (these can creep in over time): Dependencies on data that adds little value (for example features that are merely correlated with more useful ones, or legacy features from old data schemas). These can be managed with static analysis of data dependencies and leave-one-feature-out evaluations (I think data versioning tools such as Kedro or DVC can be quite useful here); a rough sketch follows below.
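
One rough way to surface underutilised dependencies, in the spirit of the leave-one-feature-out evaluations the paper mentions, is to retrain without each feature and compare validation scores. This is my own sketch; the dataset, model and 0.005 cut-off are made up for illustration.

```python
# Leave-one-feature-out check for underutilised data dependencies:
# if dropping a feature barely changes validation accuracy, it may be
# legacy or redundant and is a candidate for removal.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           n_redundant=2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_val, y_val)

for i in range(X.shape[1]):
    keep = [j for j in range(X.shape[1]) if j != i]
    score = LogisticRegression(max_iter=1000).fit(X_tr[:, keep], y_tr).score(X_val[:, keep], y_val)
    drop = baseline - score
    flag = "  <- possibly underutilised" if drop < 0.005 else ""
    print(f"feature {i}: accuracy drop {drop:+.4f}{flag}")
```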

Then there are hidden feedback loops: ways in which the current system influences its own future training data or that of other machine learning models. For example, separate models for related products and for relevant side-ads might each consume signals influenced by the other's output, so improving one can quietly change the other's behaviour.

Another concern for developers is the number of unfamiliar abstractions in machine learning systems, which can create issues with code readability and transferring of ownership (internal packages with their own custom abstractions should try their best to fit best-practice open-source conventions). A common example would be conforming to scikit-learn's fit/predict interface rather than inventing a new one, as illustrated below.
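
As a sketch of what "fit the familiar abstraction" can look like, here is a trivial in-house model wrapped in scikit-learn's standard estimator interface (this example is mine, not from the paper):

```python
# Keeping an in-house algorithm behind scikit-learn's fit/predict abstraction
# instead of a bespoke interface, so ownership can be transferred without
# readers having to learn a new API.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassBaseline(BaseEstimator, ClassifierMixin):
    """Trivial in-house model exposed through the standard sklearn interface."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)
```

Because it follows the convention, it plugs into pipelines, cross-validation and grid search like any other estimator, which makes handover much easier.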

It is important to also understand trade-offs in machine learning systems:

  • Trade-off between using another language to improve performance and ease of transferring ownership
  • Time spent prototyping vs time spent scaling: if a significant amount of work is happening in a prototyping environment, there will be pressure to use that environment in production, which rarely scales well, so we need to watch for this drift
  • Configuration debt (I think this largely overlaps with understanding different features): to be honest, I initially failed to understand how this differed from data dependencies on old schemas in the example provided; as I read it, configuration debt is about the sprawl of settings around the model (which features are used, how data is sampled, thresholds, pre- and post-processing) rather than about the data itself. A tiny sketch of keeping configuration verifiable follows this list.
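
As a hypothetical illustration of fighting configuration debt, the knobs around a model (features, sampling, thresholds, schema flags) can live in one validated object rather than being scattered across scripts; all names and values below are made up.

```python
# Treating configuration as verifiable code: the settings live in one place
# and are validated, so stale or mutually inconsistent options fail loudly
# instead of silently degrading the system.
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    features: tuple = ("age", "tenure_days", "num_purchases")  # made-up names
    sampling_rate: float = 0.1        # fraction of raw events kept for training
    decision_threshold: float = 0.5   # score above which the system acts
    use_legacy_schema: bool = False   # old data schema, kept only for backfills

    def validate(self):
        assert 0.0 < self.sampling_rate <= 1.0, "sampling_rate out of range"
        assert 0.0 < self.decision_threshold < 1.0, "decision_threshold out of range"
        assert not (self.use_legacy_schema and "num_purchases" in self.features), \
            "num_purchases does not exist in the legacy schema"
        return self

config = TrainingConfig().validate()
print(config)
```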

Dealing with changes in the external world is also a consideration:

  • Fixed decision thresholds may no longer apply over time and may need to be learned automatically (for example on held-out validation data) in order to keep a good trade-off
  • Monitoring and testing can be difficult, as unit tests and integration tests are insufficient on their own. Hence, we need to adapt by also understanding where biases can come from and testing for those. The paper offers the following starting points: prediction bias (the distribution of predicted labels should match the distribution of observed labels), action limits (automated alerts should fire and trigger manual investigation or intervention when the system takes too many actions), and upstream producers (the processes feeding data into your system should be monitored and tested, with their alerts propagated downstream). A minimal monitoring sketch follows this list.
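
Here is a minimal monitoring sketch along those lines (my own assumptions, not code from the paper): compare the distribution of predicted labels against observed labels and enforce a crude action limit, raising an alert when either check fails.

```python
# Prediction-bias and action-limit checks, with made-up limits and data.
import numpy as np

def check_prediction_bias(predicted, observed, max_bias=0.05):
    """Alert if the predicted positive rate drifts away from the observed rate."""
    bias = abs(predicted.mean() - observed.mean())
    if bias > max_bias:
        raise RuntimeError(f"prediction bias {bias:.3f} exceeds limit {max_bias}")
    return bias

def check_action_limit(n_actions_today, max_actions=10_000):
    """Crude action limit: too many automated actions triggers manual review."""
    if n_actions_today > max_actions:
        raise RuntimeError("action limit hit; trigger manual investigation")

# Toy usage with fabricated numbers:
predicted = np.random.default_rng(0).random(5000) > 0.48  # model's positive calls
observed = np.random.default_rng(1).random(5000) > 0.50   # what actually happened
print("prediction bias:", round(check_prediction_bias(predicted, observed), 4))
check_action_limit(n_actions_today=3_200)
```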

The paper then briefly considers other types of debt (the importance of reproducibility, process management debt). Then there is cultural debt. Cultural debt in this case comes from managing research, engineering and business interests in order to maximise reward for effort. This can be difficult to balance if you have a non-technical manager who prefers not to code in order to maintain a high-level overview of the business. In these cases, one needs to manage the interests of such managers with respect to the others.

A useful summary of questions to consider prior to coding appears at the end of the paper:

  • How easily can an entirely new algorithmic approach be tested at full scale?
  • What is the transitive closure of all data dependencies?
  • How precisely can the impact of a new change to the system be measured?
  • Does improving one model or signal degrade others?
  • How quickly can new members of the team be brought up to speed?

One thing the paper does not discuss is debt from technical infrastructure. Improper planning of data capture can lead to analysis that is wrong, not useful and simply costly. I think organisations need a proper data-capture strategy, along with proper formatting, to allow data scientists to be at their most useful.
