Kaggle is a popular platform that hosts many data science competitions, often offering monetary prizes to the winners who propose the best solutions to problems posted by a wide range of organisations.
Kaggle can seem intimidating, with competitions that attract hundreds of entrants of varying experience. Even so, entering a competition is often one of the first suggestions made to current and aspiring Data Scientists who want to gain valuable experience and develop their skills.
This quick guide explores which categories to look out for, the benefits of entering Kaggle competitions and some handy tips from our Senior Data Scientist, Kevin Lemagnen.
Kaggle competitions typically fall under the following categories:
- Featured — Usually sponsored by organisations and governments. These have the largest prize pools up for grabs.
- Research — As the title suggests, these are research-orientated. They also have non-traditional submission processes and offer small monetary prizes or none at all.
- Recruitment — These are run by organisations looking to recruit Data Scientists.
- Getting Started — These are aimed at beginners and, whilst structured like competitions, have no prize pool. They are recommended as a starting point because they feature easier datasets, tutorials and rolling submission windows, easing you into the Kaggle ways.
What are the benefits of entering Kaggle competitions?
- Experience — As with everything in life, the best way to learn is by doing. If you take away the stress and competitive nature of the competitions, you have a great opportunity to hone and practise your coding, analytical and communication skills on interesting challenges.
- Insight — Every competition has its own discussion boards and debriefs with the competition winners (example below), enabling you to understand the thought processes of experienced data scientists.
- Portfolio building — When you go to job interviews, you’ll want evidence of the skills employers are looking for. Competitions are a great way to apply the skills you’ve learnt to real problems and, over time, build up a portfolio of projects and competitions.
Tips from our Senior Data Scientist, Kevin Lemagnen:
1. Choose a language and stick to it, at least in the beginning, so you can focus on learning the concepts. You can learn new languages later.
Whilst Python and R are both popular on Kaggle and in the wider Data Science community, we recommend Python as it can be used for many other tasks such as building a website, automating tasks, and more.
4. Pick a ‘getting started’ competition. We recommend the Titanic competition, which is beginner-friendly and has a lot of resources to learn from.
5. Download the data. Check the available ‘Kernels’ in the language of your choice, read them, try to understand them and then do the same on your own.
6. Train your first model and keep it simple. Make your first submission.
7. Read about model evaluation and make sure you properly test your model locally.
8. Improve your model with new features, new algorithms and better tuning. Iterate.
9. Once you’re beginning to feel more confident, try reading more advanced kernels.
10. Once you’re bored with your first ‘getting started’ competition, go for a real one. Don’t focus on the leaderboard too much: most of the value is in reading other people’s code (Kernels), learning from it and applying it to your problem.
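As a rough illustration, steps 6 to 8 above (train a simple model, test it locally, then tune it) could look something like the Python sketch below. It rests on several assumptions that are ours rather than Kevin’s: it uses scikit-learn and pandas (the tips do not name a library), it generates a synthetic Titanic-like table in code instead of loading the real competition files, the helpers `make_fake_passengers` and `prepare` are hypothetical, and the `PassengerId`/`Survived` layout is assumed to mirror the Titanic submission format.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the competition files (NOT real Kaggle data).
rng = np.random.default_rng(0)


def make_fake_passengers(n, with_target):
    """Hypothetical helper: build a small Titanic-like table."""
    df = pd.DataFrame({
        "PassengerId": np.arange(1, n + 1),
        "Pclass": rng.integers(1, 4, size=n),
        "Sex": rng.choice(["male", "female"], size=n),
        "Age": rng.uniform(1, 80, size=n).round(),
    })
    if with_target:
        # Made-up rule plus noise, just so the model has something to learn.
        signal = (df["Sex"] == "female").astype(int) * 2 - df["Pclass"] * 0.5
        noise = rng.normal(0, 1, size=n)
        df["Survived"] = ((signal + noise) > 0).astype(int)
    return df


def prepare(df):
    """Hypothetical helper: turn raw columns into numeric features."""
    features = df[["Pclass", "Age"]].copy()
    features["IsFemale"] = (df["Sex"] == "female").astype(int)
    return features


train = make_fake_passengers(200, with_target=True)
test = make_fake_passengers(50, with_target=False)
X_train, y_train = prepare(train), train["Survived"]
X_test = prepare(test)

# Steps 6 and 7: a simple baseline, scored locally with 5-fold cross-validation.
baseline = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(baseline, X_train, y_train, cv=5, scoring="accuracy")
print(f"Baseline CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Step 8: a small grid search over the regularisation strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(f"Best C: {search.best_params_['C']}, CV accuracy: {search.best_score_:.3f}")

# Build the submission table from the tuned model (refit on all training rows).
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": search.best_estimator_.predict(X_test),
})
csv_text = submission.to_csv(index=False)  # pass a filename to write to disk
```

With the real competition data you would replace the two `make_fake_passengers` calls with `pd.read_csv` on the downloaded files and pass a filename to `to_csv` before uploading. Scoring locally with cross-validation before submitting gives you a sense of whether a change actually helps, without relying on the leaderboard.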
Get started here
You can click here to view popular kernels authored by various users to comprehensively guide you whilst you’re still getting to grips with the site.
That’s it from us. Feel free to leave comments below to let us know how your Kaggle ventures go! We’d love to hear about them.