Stage 3 Talent

Noodling Around with Data

Bowl of noodles with chopsticks and soup spoon

One of my favorite things about tech is the ability to work with data. It’s everywhere, and it has stories to tell. Data analysts, engineers, and scientists can peek at the data, shape the data, and create visualizations that tell those stories.

Kaggle is one of the many resources I use as I work on building a practical data engineering curriculum. Recently, I discovered the Ramen Ratings dataset. I grew up with Maruchan, and I married into a Chinese family, so I’ve been exposed to various other brands as well.

I wanted to poke around the data and generate some visualizations to see what stories I could find about ramen.

Brands in the Ramen Ratings Dataset

The dataset has 2,580 records, and I was curious to see which brands had the most entries. Using pandas, I created a bar graph of the 10 brands that appear most frequently in the dataset.

Bar Graph of Brands with Most Entries in Ramen Ratings Dataset

Looking at this graph, Nissin had the most ratings by a long shot. Do we know why this is? No. But Nissin has a few hundred more ratings than any of the other brands. Also, keep in mind that this graph shows only the top 10 Brand values, and there are many others. In fact, there are 355 different Brand values in this dataset.
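Here is a minimal sketch of how a top-10 count and bar graph like this could be put together with pandas. It is not necessarily the exact code behind the graph above. The sample rows are made up for illustration, and the `ramen-ratings.csv` file name mentioned in the comment is an assumption about how the Kaggle download is named.

```python
import pandas as pd

# Made-up sample rows for illustration only. With the real dataset you would
# load it instead, e.g. pd.read_csv("ramen-ratings.csv") (assumed file name).
ramen = pd.DataFrame({
    "Brand": ["Nissin", "Nissin", "Nissin", "Maruchan", "Maruchan", "Nongshim"],
})

# Count the rows for each brand (most frequent first) and keep the top 10.
top_brands = ramen["Brand"].value_counts().head(10)

# Number of distinct Brand values in the data.
brand_count = ramen["Brand"].nunique()

# Draw the counts as a vertical bar graph (pandas plots through matplotlib).
ax = top_brands.plot(kind="bar", title="Brands with Most Entries")
ax.set_xlabel("Brand")
ax.set_ylabel("Number of ratings")
```

`value_counts()` already sorts from most to least frequent, so `head(10)` is all it takes to get the top 10, and `nunique()` gives the kind of "how many different brands" figure quoted above.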

Countries with 5-Star Rated Ramen

I was curious to see if the country mattered when it came to 5-star rated ramen. Many countries in the list have 5-star rated ramen. The horizontal bar graph below shows that Japan had the most 5-star ratings, and that the top countries are mostly Asian, with the USA as the exception.

Horizontal bar graph of Top Countries with 5-Star Rated Ramen
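A chart like this comes down to filtering for 5-star rows and counting by country. The sketch below shows one way to do that, assuming the columns are named `Country` and `Stars` and that `Stars` may contain non-numeric text that has to be cleaned first. Both are assumptions about the file, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows for illustration only; column names are assumptions.
reviews = pd.DataFrame({
    "Country": ["Japan", "Japan", "Malaysia", "USA", "Japan", "USA"],
    "Stars": ["5", "5.0", "5", "3.5", "4.25", "Unrated"],
})

# Turn the ratings into numbers; anything non-numeric becomes NaN.
reviews["Stars"] = pd.to_numeric(reviews["Stars"], errors="coerce")

# Keep only the 5-star rows, then count them per country (top 10).
five_star = reviews[reviews["Stars"] == 5]
five_star_by_country = five_star["Country"].value_counts().head(10)

# barh draws the first row at the bottom, so sort ascending to put the
# country with the most 5-star ratings at the top of the chart.
ax = five_star_by_country.sort_values().plot(
    kind="barh", title="Top Countries with 5-Star Rated Ramen"
)
ax.set_xlabel("Number of 5-star ratings")
```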

Star Ratings Distribution

To see how the ratings were distributed in the dataset, I created a histogram:

Histogram of Star Ratings in Ramen Ratings dataset

Looking at the histogram, the ratings are concentrated between 3.5 and 5.0.
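As a rough sketch of the idea, a histogram like this can be drawn straight from a numeric ratings column, and the "how much sits at 3.5 or above" question can be checked with a one-line calculation. The `Stars` column name and the need to coerce text values are assumptions, and the sample values are made up.

```python
import pandas as pd

# Made-up sample ratings for illustration only; "Stars" is an assumed column name.
ratings = pd.DataFrame({"Stars": ["5", "3.75", "4.5", "Unrated", "2", "4", "3.5"]})

# Convert to numbers and drop anything that could not be parsed.
stars = pd.to_numeric(ratings["Stars"], errors="coerce").dropna()

# Fraction of ratings that are 3.5 stars or higher.
share_high = (stars >= 3.5).mean()

# Histogram of the ratings with 10 bins.
ax = stars.plot(kind="hist", bins=10, title="Distribution of Star Ratings")
ax.set_xlabel("Stars")
```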

Does Style of Ramen Matter?

Since Style is one of the fields we can access, I was curious to see how the ratings stacked up for each style. I created a boxplot, styled with Seaborn, to compare the ratings by Style:

Boxplot of Ramen Style

Looking at this box plot, most of the styles were rated between 3 and 4.5, with Box ramen having a lot of 5-star ratings. There is one Can entry in this dataset, with a 3.5-star rating – Pringles Nissin Top Ramen Chicken Flavor Potato Crisps. And that Bar… for ramen eaters who are thinking… “Ramen Bar?!? What…?!?” Komforte Chockolate released a Ramen Bar that has a 5-star rating in this dataset.
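For anyone curious what a Seaborn boxplot by category looks like in code, here is a small sketch. It assumes a numeric `Stars` column alongside `Style` (the `Stars` name is an assumption), uses made-up sample rows, and also counts entries per style, since a style with a single entry (like Can above) shows up as just a flat line in a box plot.

```python
import pandas as pd
import seaborn as sns

# Made-up sample rows for illustration only; "Stars" is an assumed column name.
styles = pd.DataFrame({
    "Style": ["Pack", "Pack", "Cup", "Cup", "Bowl", "Bowl", "Can"],
    "Stars": [3.5, 4.0, 3.0, 4.5, 5.0, 3.75, 3.5],
})

# Apply Seaborn's styling, then draw one box per Style value.
sns.set_theme(style="whitegrid")
ax = sns.boxplot(data=styles, x="Style", y="Stars")
ax.set_title("Star Ratings by Style")

# Supporting numbers: how many entries each style has, and its median rating.
style_counts = styles["Style"].value_counts()
median_by_style = styles.groupby("Style")["Stars"].median()
```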

Conclusion

This gives you a high-level overview of what to expect if you ever visit The Ramen Rater. It looks like he reviews a variety of brands, though Nissin is the most common by a long shot. Ramen comes in a variety of forms, and the form doesn’t seem to matter: the reviews are still across the board in the average to awesome range, as opposed to less than 3.

This also gives you some insight on creating visualizations – these were created using Python, pandas, and Seaborn.

If you want to learn more about how I created these visualizations, reach out to my teammates at Stage 3 Talent via email at contact@stage3talent.com and ask them about the data engineering course that is coming soon!