Kaggling
I have been working in the Machine Learning and Data Science domain for the past 1 to 1.5 years now. If I got another chance to go back to my 1st year and fix one of my biggest mistakes, it would be to participate in Kaggle competitions a lot more.
So, to make sure I don't have the same regret in the final year of my college, I started participating aggressively on Kaggle from the start of my 3rd year. Here's how you can get started with it as a beginner.
Since you are new to the website, I'd recommend not jumping into a very hard problem statement (assuming you are at a beginner or intermediate skill level in ML). The best place to start is the Playground section.
Follow these steps:
1.) Head over to Kaggle's official website.
2.) Go to the left panel and select Competitions.
3.) You will see some options pop up below the search bar: All Competitions, Featured, Getting Started, etc.
4.) Click on the right arrow to find the Playground section, then open it.
Now we are in the Playground section. This is probably the best place to start your Kaggle journey (in my opinion). I started here too, after one of my seniors suggested it to me.
Once you are in, you will see many problem statements. You can choose one that is currently running, or even one that has already ended. The only drawback of picking an ended competition is that people won't be actively submitting solutions to it. It will be just you submitting your solutions and seeing how you rank against others, and it won't benefit you awards-wise since the contest is over. But yeah, it is good for experience.
Now I will explain the rest with the example of the competition I am currently participating in. This contest is named Predicting the Beats-per-Minute of Songs. Let's say I have just entered this competition. Here are the usual steps that I take.
1.) Read the Overview to understand what the competition is all about. It tells us whether the problem is Regression or Classification, and also gives a rough idea of the models we might end up using.
2.) I go to the dataset section to understand what kind of features (columns) I have and how large the data is. This helps me get an idea of what kind of pre-processing techniques I might have to use.
3.) These datasets are synthetically generated so that beginners don't have to deal with messy data. They are generated from an already existing dataset of a similar kind, which is processed to fit the competition's needs. Going back to the original dataset gives us a much better idea of the features, as it usually has clear descriptions of them (sometimes the descriptions are missing; in that case, use Google or any LLM of your choice).
4.) Now the most important part is to check the Evaluation criteria. In my case, it is Root Mean Squared Error (RMSE).
5.) There are other sections that we can skim, such as the Timeline, Prizes, etc., but they do not give as much information.
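Since the metric from step 4 decides your leaderboard position, it helps to be able to compute it yourself. Here is a minimal sketch of RMSE in plain Python. The BPM numbers below are made up for illustration and are not taken from the competition data, and the function name `rmse` is my own.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up BPM values, purely for illustration (assumption, not real data).
actual_bpm = [120.0, 100.0, 140.0, 90.0]
predicted_bpm = [118.0, 104.0, 135.0, 93.0]

print(f"Model RMSE: {rmse(actual_bpm, predicted_bpm):.3f}")

# Sanity check: predicting the mean for every row is the simplest baseline.
# Here the mean is taken over the same values we score on, just to show the idea.
mean_bpm = sum(actual_bpm) / len(actual_bpm)
baseline_pred = [mean_bpm] * len(actual_bpm)
print(f"Mean-baseline RMSE: {rmse(actual_bpm, baseline_pred):.3f}")
```

A lower RMSE is better. If your model does not beat the mean baseline, it is probably not learning much from the features yet.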
Now, I generally head over to either Jupyter Notebook or Google Colab (using Colab is a good habit). If you are just starting out, you can use Jupyter, but make sure to switch to Colab soon. You can also use VS Code if you have a dedicated GPU on your system; if not, prefer Google Colab.
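A first notebook cell usually just looks at the columns, the row count and any missing values, which is what step 2 above is about. Here is a rough sketch using only Python's standard library. The inline CSV, its column names and its values are all hypothetical stand-ins for the real train file, and `summarize` is my own helper. In a real notebook you would more likely load the file with pandas (`pd.read_csv`) and call `describe()` on it.

```python
import csv
import io
import statistics

# Tiny stand-in for a training CSV. Column names and values are hypothetical.
SAMPLE_CSV = """id,RhythmScore,AudioLoudness,BeatsPerMinute
0,0.60,-8.2,118.5
1,0.45,-5.1,
2,0.82,-11.7,127.3
3,0.51,-6.4,101.9
"""


def summarize(csv_text):
    """Return (row_count, {column: {"missing": int, "mean": float or None}})."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Empty strings are treated as missing values.
        present = [float(v) for v in values if v != ""]
        summary[column] = {
            "missing": len(values) - len(present),
            "mean": statistics.mean(present) if present else None,
        }
    return len(rows), summary


n_rows, summary = summarize(SAMPLE_CSV)
print(f"rows: {n_rows}")
for column, stats in summary.items():
    print(f"{column}: missing={stats['missing']}, mean={stats['mean']:.2f}")
```

The missing value here is planted on purpose to show what the check catches. Playground data is usually clean, so on the real file this count will often be zero.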
Many people also suggest taking part in the Titanic and House Prices competitions, which run indefinitely and are made for people who are complete beginners in Kaggle and ML / DS. But I found Playground to be a better way to learn in general. There is no messy data to deal with, so the barrier to entry is pretty low.
I hope you were able to understand a little bit about Kaggle. It's going to take some time to get well versed with everything on Kaggle, so get started right away.
In the next blog, I'll cover taking part in your first competition.