My Learnings From My First Kaggle Silver Medal

Jacky
5 min read · Mar 23, 2020

Before I explore what I have learnt, I think it is important to share what Kaggle is an accurate representation of and what it is not. Kaggle is, first and foremost, a competition platform. While the domain is data science, it simply offers data science problems to those who are interested in solving them with a fairly linear problem-solving methodology. Many Kaggle Grandmasters have already built pipelines that let them adapt that methodology to a large number of problems!

It focuses on one component (although an incredibly important component) of data science. It differs from practical data science because it misses the following marks:

  • Engineering requirements to build the dataset
  • Business understanding to determine appropriate labels within your dataset (how to define your response variable)
  • Productionisation of models

What it will teach/give you is:

  • How to prototype damn fast
  • What current state-of-the-art in AutoML (Automated Machine Learning) can do (and how you compare against them)
  • An opportunity to refine modelling approaches, explore proper statistical modelling ideas
  • A way to benchmark your learning speed against others (understanding what others did/have done/how they optimised their testing)
  • An exciting addition to your resume that is respected in some data science circles (not all, but some do recognise your hard work!)

However, despite its shortcomings, I do want to stress why I think this is important. As modelling becomes more and more automated, AutoML packages such as MLBox can already beat a large number of data scientists in competitions (see: https://mlbox.readthedocs.io/en/latest/). It is therefore safe to assume that data scientists who focus purely on modelling will inevitably struggle to stay employed. Ideally, we want to build models that are not only better than current machines, but better by a large enough margin that our salaries are worth it!

My Learnings

The following learnings are a mix of ideas about deep learning/improving competitive performance in data science competitions. I apologise for the lack of structure here but I think the ideas were too broad/general/few to figure out an intelligent way to structure these thoughts. So without further ado:

  1. Find fast ways to prototype early and quickly. You always start every competition with a billion ideas. The question is how you can set up a competition framework quickly, so that you spend most of your time on ideas and not on figuring out why your submission failed. Our schedule was as follows: Day 1, run a few kernels. Day 2/3, get a submission for a baseline model going. Day 4, try pre-processing, and so forth. For NLP-specific tasks, prototyping is tougher because one pass through your network can take a really long time. One idea shared by Dieter (a Kaggle competitor) was to use distilled models to test ideas faster (see the first sketch after this list).
  2. Use public kernels as much as possible to avoid re-inventing the wheel. I used FastAI and transformers in my models. One of the first things I searched for when my submissions failed was other kernels that used FastAI and transformers, and I copied their submission steps. Why is this important? Because Kaggle does not tell you what caused a submission error. (Also, be sure to give a thumbs up to kernels! People open-source their work for others to learn from, and the least you can do is acknowledge them while borrowing their code!) One inspiring solution shared by yuva reina (another Kaggle competitor) was to debug the code by wrapping each portion in try-except blocks to determine which part was failing (a rough sketch of this idea follows this list). The kernel in question can be found below¹.
  3. Tweaking neural network architectures was a waste of time unless someone had stated they had already had success with a certain architecture. This was surprising to me, especially as so many winning solutions from other competitions seem to have nuanced architectures that appear to be the result of careful experimentation. Stability does, however, seem to be a bit of an issue with recent NLP models².
  4. Deep learning seems not to make sense in a lot of ways. Vladimir Iglovikov (a data scientist whom I respect) suggests it is more alchemy than science or art, a statement that rings true to me even after the short run of experiments and competitions I have done with neural networks. I am not sure why things work sometimes, but implementing papers does help when the results have already been proven. One such example was testing the different ways to use BERT layers for classification: was it the second-to-last layer, the last layer, or perhaps a concatenation of max and average pooling of the layers? (A sketch of this comparison follows this list.)
  5. GroupKFold worked better than MultilabelStratifiedKFold. This was really interesting. For those who are not aware, these are techniques for splitting your data between training and cross-validation folds. The former splits on a specific column and ensures that the entirety of one group lands on one side of the split, whereas the latter tries to ensure all the labels are evenly distributed between the training and validation sets (see the splitting sketch after this list). The problem we explored was a multi-label problem (in other words, you had multiple response columns as opposed to one). This is an idea that I will continue to experiment with. I think multi-label stratification can have huge difficulties because the boundaries at which to split are difficult to determine, but it presents an interesting statistical problem.
  6. (Try tricks?) In our competition, there were a few post-processing tricks (i.e. small mathematical functions applied to your predictions to improve your overall result) that worked incredibly well. For context, the trick we applied to optimise Spearman's rho correlation was rounding. I put a question mark on this because it does not appear to be good practice in data science, but for the metric we were optimising it worked wonders and was simply amazing. (A toy illustration follows this list.)
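
On the distilled-model idea from item 1, here is a minimal sketch of how I would set it up, assuming the standard Hugging Face checkpoints; this is illustrative, not the exact code we ran:

```python
# Hypothetical sketch: iterate on ideas with a distilled checkpoint,
# then flip back to the full model for the final training runs.
from transformers import AutoModel, AutoTokenizer

FAST_CHECKPOINT = "distilbert-base-uncased"  # smaller and substantially faster per pass
FULL_CHECKPOINT = "bert-base-uncased"

checkpoint = FAST_CHECKPOINT  # switch to FULL_CHECKPOINT once the idea looks promising
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
```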
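For the try-except debugging trick from item 2, a rough sketch of the general shape of the idea follows; the stage names (load_test_data, build_model and so on) are placeholders of my own, not from the original kernel:

```python
# Hypothetical sketch of the try-except debugging trick: wrap each stage of
# the inference pipeline so a failed Kaggle submission at least records which
# stage died, since the platform does not surface the traceback itself.
import traceback

def run_stage(name, fn, *args, **kwargs):
    """Run one pipeline stage; log the failure point and re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        with open("debug_log.txt", "a") as f:
            f.write(f"Stage '{name}' failed:\n{traceback.format_exc()}\n")
        raise

# Placeholder stages -- substitute your own loading, modelling and submission code.
# test_df = run_stage("load data", load_test_data, "test.csv")
# model = run_stage("build model", build_model)
# preds = run_stage("predict", model.predict, test_df)
# run_stage("write submission", write_submission, preds, "submission.csv")
```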
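For item 4, here is a small sketch of the kind of comparison described there, using the Hugging Face transformers API: the [CLS] token from the last layer, the [CLS] token from the second-to-last layer, and a max/mean concatenation over tokens. It is illustrative only, not the competition setup:

```python
# Illustrative comparison of ways to pool BERT hidden states for classification.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

enc = tokenizer("How should I pool BERT layers?", return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

hidden_states = out.hidden_states      # tuple: embeddings + one tensor per layer
last = hidden_states[-1]               # last layer, shape (batch, seq, hidden)
second_last = hidden_states[-2]        # second-to-last layer

cls_last = last[:, 0]                  # [CLS] token from the last layer
cls_second_last = second_last[:, 0]    # [CLS] token from the second-to-last layer
max_mean_concat = torch.cat(           # max + mean pooling over tokens, concatenated
    [last.max(dim=1).values, last.mean(dim=1)], dim=-1
)
print(cls_last.shape, cls_second_last.shape, max_mean_concat.shape)
```

Each of these candidates would then be fed into a classification head; the surprising part (per item 3 and 4) was how little the choice mattered unless someone had already shown it worked.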
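For item 5, a small sketch of the two splitting strategies being compared: scikit-learn's GroupKFold keeps every row that shares a group key in the same fold, while MultilabelStratifiedKFold (from the iterative-stratification package) balances the label distribution across folds. The features, labels and group column here are made up for illustration:

```python
# Hedged sketch of the CV comparison; the data and group column are invented.
import numpy as np
from sklearn.model_selection import GroupKFold
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold  # pip install iterative-stratification

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # features
y = rng.integers(0, 2, size=(100, 3))      # multi-label targets (3 response columns)
groups = rng.integers(0, 20, size=100)     # e.g. an ID shared by related rows

# GroupKFold: an entire group lands on one side of each split.
for tr_idx, va_idx in GroupKFold(n_splits=5).split(X, y, groups=groups):
    assert set(groups[tr_idx]).isdisjoint(groups[va_idx])

# MultilabelStratifiedKFold: label distributions are balanced across folds,
# but rows from the same group can leak across the split.
for tr_idx, va_idx in MultilabelStratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    pass
```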
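Finally, a toy illustration of the rounding trick from item 6. Spearman's rho only cares about rank order, so snapping noisy continuous predictions onto a coarse grid (creating ties) can line up better with discretised targets. The grid and noise level below are arbitrary, not the values we actually used:

```python
# Toy illustration only: rounding predictions before scoring Spearman's rho.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
y_true = rng.choice(np.linspace(0, 1, 5), size=200)       # discretised targets
y_pred = y_true + rng.normal(scale=0.1, size=200)          # noisy continuous predictions

rho_raw, _ = spearmanr(y_true, y_pred)

# Post-process: snap predictions onto a coarse grid, creating ties. In this toy
# setup most predictions snap back into the correct bucket, which tends to
# remove spurious within-bucket rank noise and lift rho.
y_rounded = np.round(y_pred * 4) / 4
rho_rounded, _ = spearmanr(y_true, y_rounded)

print(f"raw rho: {rho_raw:.3f}, rounded rho: {rho_rounded:.3f}")
```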

Finally

I was lucky to be able to partner up with my friend, Matthew Olsen, who was in Sydney at the time! So huge thanks to him for being a great team-mate during this competition and for tossing around a bunch of awesome ideas!

Links from annotations:

¹ Kaggle kernel describing a competitor’s rigorous approach to solving a submission error: https://www.kaggle.com/c/google-quest-challenge/discussion/120368

² On Model Stability As A Function Of Random Seed (https://www.aclweb.org/anthology/K19-1087.pdf)
