🍜🌟 Ramen Reviews EDA w/ R 📈
Context
Ramen is a Japanese noodle soup dish that has gained popularity worldwide. It typically consists of a flavorful broth, thin wheat noodles, and a variety of toppings such as braised pork, eggs, scallions, and bamboo shoots.
The dish originated as a Japanese adaptation of Chinese wheat noodle soups and has evolved into numerous regional variations with different broths, noodles, and toppings. Ramen is a significant part of Japanese cuisine and is enjoyed in various forms, including fresh and instant varieties. 1,2,3
The dish is known for its rich flavors and has become a cultural icon in Japan, with a wide range of regional styles and toppings. While ramen noodles themselves provide limited nutritional value, the overall healthiness of ramen depends on the specific ingredients and preparation methods used.4,5
The ramen market is expected to grow over the next few years. According to a report on the global instant noodles and ramen market, the market was estimated at USD 34,220 million in 2022 and is forecast to reach USD 43,600 million by 2028, a compound annual growth rate (CAGR) of 4.1% over the review period.6
Another report anticipates a considerable rise in the global instant noodles and ramen market between 2023 and 2029, with steady growth expected over the forecast horizon (Latest Market Research Updates).
The demand for instant noodles, including ramen, has been influenced by factors such as changing consumer preferences, urbanization, and the convenience of on-the-go consumption, contributing to the market’s growth7. The ramen noodles market is also expected to witness significant growth, with a considerable portion of the annual revenue attributed to regions such as India, Southeast Asia, Brazil, and others.
Business analysis is important for several reasons. It helps in understanding consumer preferences, market trends, and competitive forces, which are crucial for making informed business decisions. By conducting a thorough analysis, companies can identify growth opportunities, assess potential risks, and develop effective marketing and product strategies to meet consumer demand and stay competitive in the market.
Let’s dive right in.
Data description
The Ramen Rater is a product review website for the hardcore ramen enthusiast (or “ramenphile”), with over 2500 reviews to date. This dataset is an export of “The Big List” (of reviews), converted to a CSV format.
Each record in the dataset is a single ramen product review. Review numbers are contiguous: more recently reviewed ramen varieties have higher numbers. Brand, Variety (the product name), Country, and Style (Cup? Bowl? Tray?) are pretty self-explanatory. Stars indicate the ramen quality, as assessed by the reviewer, on a 5-point scale; this is the most important column in the dataset!
Note that this dataset does not include the text of the reviews themselves. For that, you should browse through https://www.theramenrater.com/ instead!
Data Import and Initial Processing
- Importing the Data
  - The core of our analysis begins with the importation of the ramen reviews dataset. Utilizing the `readr` package, we efficiently load the `ramen-ratings.csv` file. This package is chosen for its speed and ability to handle larger datasets more effectively than the base R functions. Additionally, we handle missing values at this stage by specifying empty strings and the term “Unrated” as NA (Not Available) during the import process.
  - Alongside the ramen reviews, we import a second dataset, `Countries-Continents.csv`, using the same method. This dataset is anticipated to provide valuable geographical context for our analysis, linking each ramen review to its corresponding continent.
- Initial Exploration of the Data
  - A primary inspection of both datasets is conducted using the `glimpse()` function. This function offers a quick overview of the data structure, including the types of variables and the first few entries in each column. This step is crucial for familiarizing ourselves with the dataset’s layout, identifying any apparent inconsistencies, and planning subsequent data processing steps.
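# Loading the packages used throughout the analysis
library(tidyverse) # readr, dplyr, tidyr, stringr, ggplot2
library(tidytext) # unnest_tokens()
# Reading the data
ramen <- readr::read_csv("../data/ramen-ratings.csv", na = c("", "Unrated"))
country_continent <- readr::read_csv("../data/Countries-Continents.csv")
# Initial glimpse (or str) at the data
glimpse(ramen)
glimpse(country_continent)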
## Rows: 2,580
## Columns: 7
## $ `Review #` <dbl> 2580, 2579, 2578, 2577, 2576, 2575, 2574, 2573, 2572, 2571,~
## $ Brand <chr> "New Touch", "Just Way", "Nissin", "Wei Lih", "Ching's Secr~
## $ Variety <chr> "T's Restaurant Tantanmen", "Noodles Spicy Hot Sesame Spicy~
## $ Style <chr> "Cup", "Pack", "Cup", "Pack", "Pack", "Pack", "Cup", "Tray"~
## $ Country <chr> "Japan", "Taiwan", "USA", "Taiwan", "India", "South Korea",~
## $ Stars <dbl> 3.75, 1.00, 2.25, 2.75, 3.75, 4.75, 4.00, 3.75, 0.25, 2.50,~
## $ `Top Ten` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## Rows: 194
## Columns: 2
## $ Continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa", ~
## $ Country <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burund~
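To illustrate how the `na` argument behaves at import, here is a minimal sketch. The three-row inline sample and the name `ramen_sample` are hypothetical and only mimic the layout of the real file; passing literal data through `I()` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical three-row sample mimicking the layout of ramen-ratings.csv
sample_csv <- I("Brand,Stars,Top Ten\nNissin,3.75,\nMama,Unrated,\nMyojo,5,2016 #1\n")

# Empty cells and the string "Unrated" are both read as NA
ramen_sample <- read_csv(sample_csv, na = c("", "Unrated"), show_col_types = FALSE)

# With "Unrated" mapped to NA, Stars is parsed as a numeric column
stars_is_numeric <- is.numeric(ramen_sample$Stars)
stars_missing <- sum(is.na(ramen_sample$Stars))
```

Because "Unrated" never reaches the parser as text, the `Stars` column should already be numeric after import, which matches the `<dbl>` type shown in the `glimpse()` output.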
Overview of Ramen Reviews
Take a look at the first rows:
## # A tibble: 6 x 7
## `Review #` Brand Variety Style Country Stars `Top Ten`
## <dbl> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 2580 New Touch T's Restaurant Tantan~ Cup Japan 3.75 <NA>
## 2 2579 Just Way Noodles Spicy Hot Ses~ Pack Taiwan 1 <NA>
## 3 2578 Nissin Cup Noodles Chicken V~ Cup USA 2.25 <NA>
## 4 2577 Wei Lih GGE Ramen Snack Tomat~ Pack Taiwan 2.75 <NA>
## 5 2576 Ching's Secret Singapore Curry Pack India 3.75 <NA>
## 6 2575 Samyang Foods Kimchi song Song Ramen Pack South ~ 4.75 <NA>
The ramen dataset contains 2,580 reviews and 7 columns.
Each review provides the following information: Review #, Brand, Variety, Style, Country, Stars, Top Ten.
Count of Reviews by Brand
# Calculate the number of unique brands
unique_brands_count <- length(unique(ramen$Brand))
# Count the number of reviews for each brand
brand_reviews_count <- ramen %>%
  group_by(Brand) %>%
  summarize(ReviewCount = n()) %>%
  arrange(desc(ReviewCount))
# Calculate the number of brands with less than 5 reviews
Brand_reviews_less_5 <- sum(brand_reviews_count$ReviewCount < 5)
# Calculate the percentage of brands with less than 5 reviews
Brand_reviews_less_5_percent <- (Brand_reviews_less_5 / unique_brands_count) * 100
There are a total of 355 unique brands, but 233 of them have fewer than 5 reviews, which is about 65.6% of all brands.
Data Cleaning and Transformation
This section outlines the key steps taken in the data preparation phase, essential for accurate and insightful analysis.
By carefully cleaning the data, removing irrelevant or sparse columns, and handling missing values, we have ensured the integrity and reliability of our dataset.
The transformation of the `Stars` rating into a numeric format is particularly important for quantitative analyses, such as statistical testing or trend identification.
The enrichment of our data with geographic information opens new avenues for analysis, enabling us to investigate potential regional preferences or differences in ramen ratings. This aspect is particularly valuable for understanding the global ramen market and consumer trends.
The text analysis of the `Variety` column is a pivotal part of our project. By breaking down these descriptions into individual words and focusing on the most common terms, we gain insights into prevalent flavors, ingredients, or other notable characteristics that define popular ramen products.
Data Cleaning and Preprocessing
- Removal of Sparse Columns
  - The `Top Ten` column, characterized by a high frequency of missing values, was identified as a candidate for removal. This step is crucial to enhance the dataset’s quality and focus on more reliable variables.
- Filtering Out Uncommon Ramen Styles
  - We filtered out ramen styles that are rare in our dataset (‘Bar’, ‘Box’, and ‘Can’). This decision aligns with our objective to concentrate on more common and representative ramen styles for a more focused analysis.
- Conversion of Rating Scale
  - The `Stars` column, representing the ramen quality rating, was transformed from its original format to a numeric scale. This transformation is vital for enabling quantitative analysis and comparison across different ramen products.
- Exclusion of Incomplete Records
  - Our dataset cleansing included the removal of any records with missing data in any column, ensuring completeness and reliability in the subsequent stages of our analysis.
Data Enrichment and Integration
- Augmentation with Geographic Data
  - By performing an inner join with the `country_continent` dataset, we have enriched our ramen reviews with continent information. This enrichment allows us to explore geographical trends and patterns in ramen preferences and ratings.
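An inner join silently drops every review whose country name has no exact match in the lookup table, so it can be worth checking what is lost. The sketch below uses hypothetical toy data; the mismatched spellings ("USA" vs "US", "South Korea" vs "Korea, South") are illustrative assumptions, not values confirmed from `Countries-Continents.csv`.

```r
library(dplyr)

# Hypothetical toy data; the country spellings are illustrative assumptions
reviews <- tibble(
  Brand   = c("Nissin", "Samyang Foods", "Mama"),
  Country = c("USA", "South Korea", "Thailand")
)
lookup <- tibble(
  Continent = c("North America", "Asia", "Asia"),
  Country   = c("US", "Korea, South", "Thailand")
)

# inner_join() keeps only rows whose Country appears in both tables
joined <- inner_join(reviews, lookup, by = "Country")

# anti_join() lists the reviews that the inner join would drop
unmatched <- anti_join(reviews, lookup, by = "Country")
```

Running the same `anti_join()` on the real `ramen` and `country_continent` data before joining would show which countries, if any, need to be recoded.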
Text Data Transformation
- Unpacking the `Variety` Column
  - A key step in our text analysis involved breaking down the `Variety` column, which contains descriptive text about each ramen product, into individual words. This process was essential for uncovering common themes and descriptors used in the product variety.
- Identification of Common Words
  - We filtered the words to retain only those appearing 100 or more times, categorizing them as “common words.” This focus helps us understand the most prevalent terms and descriptors in our dataset, shedding light on popular characteristics or ingredients in ramen varieties.
- Pattern Creation for Advanced Filtering
  - Using the identified common words, we crafted a regex pattern to filter our dataset further. This pattern allows us to retain reviews where the `Variety` description includes any of these common terms, thereby focusing our analysis on the most representative and frequently mentioned attributes in ramen varieties.
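One caveat with a pattern built by joining words with `|`: tokenizers such as `unnest_tokens()` return lowercase words by default, while `str_detect()` is case-sensitive and also matches inside longer words. The sketch below contrasts that behavior with a stricter variant; the word list and variety names are hypothetical, and the stricter pattern is a possible alternative rather than what the analysis used.

```r
library(stringr)

# Hypothetical common words (lowercase, as a tokenizer would return them)
top_words <- c("chicken", "mi", "spicy")
varieties <- c("Chicken Flavor Ramen", "Premium Beef Noodles",
               "Hot spicy Tom Yum", "Mi Goreng")

# Plain pattern: case-sensitive, and "mi" matches inside "Premium"
loose <- paste(top_words, collapse = "|")
loose_hits <- str_detect(varieties, loose)

# Stricter variant: whole words only, ignoring case
strict <- regex(
  paste0("\\b(", paste(top_words, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)
strict_hits <- str_detect(varieties, strict)
```

Here the plain pattern keeps "Premium Beef Noodles" and drops "Chicken Flavor Ramen", while the stricter one does the opposite. Which behavior is wanted depends on the analysis, but the choice can change how many reviews survive the filter.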
Determining the Third Quartile of Reviews
- Counting Reviews per Brand
  - The analysis begins with the creation of a new data frame, `ramen_nb_commentaire`, which focuses on the count of reviews for each ramen brand. This is achieved by selecting the `Brand` column from the `ramen` data, grouping the data by `Brand`, and then tallying the number of occurrences for each brand. This process helps in understanding how many reviews each brand has received, which is crucial for identifying popular or frequently reviewed brands.
- Calculating the Third Quartile of Review Counts
  - The script calculates the third quartile (75th percentile) of the review counts across all brands using the `quantile` function. The value `nb_commentaire_075` represents the threshold above which the top 25% of brands fall in terms of the number of reviews. Brands that have a number of reviews greater than or equal to this threshold are considered to be more commonly reviewed or possibly more popular.
Filtering Brands Based on Review Counts
- Identifying Brands Above the Threshold
  - A second data frame, `ramen_brand`, is created, which once again tallies the number of reviews per brand but also filters out those brands that do not meet the third quartile threshold. This is a key step as it narrows down the dataset to brands that are more commonly reviewed, ensuring that the analysis focuses on brands that have a significant presence in the dataset.
- Refining the Main Dataset
  - The final step involves filtering the main `ramen` dataset to include only those brands that are identified in the `ramen_brand` data frame. This is done using the `%in%` operator, which checks if the `Brand` in the `ramen` dataset is present in the `ramen_brand` dataset. This filtering step ensures that subsequent analyses are concentrated on the most frequently reviewed brands, providing a more focused and potentially more insightful examination of the data.
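The quartile and filtering steps described above can be sketched on a small example. The names `ramen_nb_commentaire`, `nb_commentaire_075`, and `ramen_brand` are taken from the description; the toy data `ramen_toy` and the result `ramen_filtered` are hypothetical, and this is a reconstruction under those assumptions rather than the exact script.

```r
library(dplyr)

# Hypothetical toy reviews: brand A has 4 reviews, B has 2, C and D have 1 each
ramen_toy <- tibble(Brand = c("A", "A", "A", "A", "B", "B", "C", "D"))

# Step 1: number of reviews per brand
ramen_nb_commentaire <- ramen_toy %>% count(Brand)

# Step 2: third quartile (75th percentile) of those counts
nb_commentaire_075 <- quantile(ramen_nb_commentaire$n, 0.75)

# Step 3: keep the brands at or above the threshold
ramen_brand <- ramen_nb_commentaire %>% filter(n >= nb_commentaire_075)

# Step 4: keep only the reviews of those brands
ramen_filtered <- ramen_toy %>% filter(Brand %in% ramen_brand$Brand)
```

With counts of 4, 2, 1, and 1, the default `quantile()` interpolation gives a threshold of 2.5, so only brand A and its four reviews remain. When most brands have very few reviews, the threshold is low in absolute terms, so it is worth printing it before filtering.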
Rationale Behind Excluding These Styles
- Focus on Mainstream Ramen Varieties
- The decision to exclude these particular styles—Bar, Box, and Can—likely stems from an analytical focus on more traditional or mainstream types of ramen. These excluded styles might represent a minor portion of the market, specialty products, or formats that are not central to the primary interest of the analysis.
- Enhancing Data Relevance
- By removing less common or less relevant styles, the analysis becomes more streamlined and focused on the types of ramen that are more commonly consumed or reviewed. This helps in ensuring that the findings and insights derived from the dataset are applicable to a broader audience and reflect more general trends in ramen consumption and preference.
# Handling missing values and rare styles
ramen <- ramen %>%
  select(-`Top Ten`) %>% # Removing Top Ten due to its high share of NA values
  filter(Style != "Bar", Style != "Box", Style != "Can") %>% # Filtering out rare styles
  mutate(Stars = as.numeric(as.character(Stars))) %>% # Convert Stars to numeric
  drop_na() # Drop rows with NA
# Joining with country_continent data
ramen <- inner_join(ramen, country_continent, by = "Country")
# Transforming text data in Variety
ramen_words <- ramen %>%
  unnest_tokens(word, Variety) %>%
  count(word, sort = TRUE)
# ramen <- ramen %>% unnest_tokens(word, Variety) %>%
# dcast(Review.. + Brand + Style + Country + Stars ~ word)
# Filter for common words (n >= 100 occurrences)
top_words <- ramen_words %>% filter(n >= 100)
# Create a regular expression pattern that matches any of the top words
pattern <- top_words$word %>%
  paste(collapse = "|") # Collapse into a single string separated by '|'
# Filter ramen reviews where Variety contains any of the top words
# Note: unnest_tokens() lowercases tokens by default and str_detect() is case-sensitive,
# so this pattern only matches lowercase text, including fragments inside longer words
ramen <- ramen %>%
  filter(str_detect(Variety, pattern))
# Determining the 3rd quartile of review counts by brand
ramen_brand_reviews <- ramen %>%
  count(Brand) %>%
  filter(n >= quantile(n, 0.75))
# Filtering the main dataset to include only brands above the 3rd quartile of review counts
ramen <- ramen %>%
  filter(Brand %in% ramen_brand_reviews$Brand)
# Focus on Mainstream Ramen Varieties (repeats the Style filter applied above)
ramen <- ramen %>%
  filter(!Style %in% c("Bar", "Box", "Can"))
The number of reviews drops sharply, from over 2,500 to just 49, after applying the cleaning, common-word, and third-quartile brand filters above.
Exploratory Data Analysis
In this section of the script, several steps are undertaken to analyze the ramen dataset both at individual variable levels (univariate analysis) and in terms of relationships between two variables (bivariate analysis).
## Review # Brand Variety Style
## Min. : 126 Length:49 Length:49 Length:49
## 1st Qu.: 929 Class :character Class :character Class :character
## Median :1460 Mode :character Mode :character Mode :character
## Mean :1455
## 3rd Qu.:1948
## Max. :2552
## Country Stars Continent
## Length:49 Min. :0.500 Length:49
## Class :character 1st Qu.:3.500 Class :character
## Mode :character Median :3.750 Mode :character
## Mean :3.837
## 3rd Qu.:4.750
## Max. :5.000
Characteristic | N = 49¹ |
---|---|
Review # | 1,460 (929, 1,948) |
Brand | |
Chewy | 4 (8.2%) |
Lucky Me! | 4 (8.2%) |
Mama | 6 (12%) |
MAMA | 3 (6.1%) |
MyKuali | 6 (12%) |
Myojo | 4 (8.2%) |
Nissin | 8 (16%) |
Paldo Vina | 3 (6.1%) |
Sichuan Baijia | 3 (6.1%) |
Vifon | 3 (6.1%) |
Vina Acecook | 5 (10%) |
Variety | |
Artificial Pickled Cabbage Fish Flavor Instant Vermicelli | 1 (2.0%) |
Bestcook Hot spicy Tom Yum Shrimp | 1 (2.0%) |
Cup Rice Vermicelli Shrimp Creamy Tom Yum | 1 (2.0%) |
Cup Rice Vermicelli With Clear Soup | 1 (2.0%) |
Good Artificial Minced Pork Bean Vermicelli | 1 (2.0%) |
Good Chicken Abalone Bean Vermicelli | 1 (2.0%) |
Good Chicken Bean Vermicelli | 1 (2.0%) |
Good Tomyum Kung Bean Vermicelli | 1 (2.0%) |
GooTa Demi Hamburg-Men | 1 (2.0%) |
Hot spicy Flavor Instant Vermicelli | 1 (2.0%) |
Instant Noodles chicken Green Curry Flavour | 1 (2.0%) |
Instant Rice Vermicelli Bihun Goreng Original Flavour | 1 (2.0%) |
Instant Rice Vermicelli Clear Soup | 1 (2.0%) |
Instant Rice Vermicelli Yentafo Tom Yam Mohfai | 1 (2.0%) |
Ippei-chan Yomise-No Yakisoba Oriental | 1 (2.0%) |
Ippei-chan Yomise No Yakisoba Teriyaki Mayo Flavor | 1 (2.0%) |
Koreno Premium Ginseng Flavor | 1 (2.0%) |
Koreno Premium Mushroom Flavor | 1 (2.0%) |
Koreno Premium Shrimp Flavor | 1 (2.0%) |
Lomi Seafood Vegetable | 1 (2.0%) |
MeeKuali spicy Fried Noodle | 1 (2.0%) |
Mennippon Oumi Chanpon | 1 (2.0%) |
Moo Nam Tok Rice Vermicelli | 1 (2.0%) |
Nippon Onomichi Ramen | 1 (2.0%) |
Oriental Kitchen Instant Rice Vermicelli In Gravy | 1 (2.0%) |
Oriental Style Instant Vermicelli Sour Crab Flavour Soup | 1 (2.0%) |
Penang Hokkien Prawn Soup Rice Vermicelli (Bihun) | 1 (2.0%) |
Penang Red tom Yum Goong Noodle (New Version) | 1 (2.0%) |
Penang Red tom Yum Goong Noodle Authentic Taste | 1 (2.0%) |
Penang Red Tom Yum Goong Rice Vermicelli Soup | 1 (2.0%) |
Penang White Curry Rice Vermicelli Soup | 1 (2.0%) |
Pickled Cabbage Flavor Instant Vermicelli | 1 (2.0%) |
Pomidorowa (Mild Tomato) | 1 (2.0%) |
Premium Instant Noodles Roasted Beef Flavour | 2 (4.1%) |
Premium Instant Noodles Spicy Beef Flavour | 2 (4.1%) |
Premium Instant Noodles XO Sauce Seafood Flavour | 1 (2.0%) |
Rice Vermicelli Satay Chicken | 1 (2.0%) |
Rice Vermicelli Spicy Beef With Chilli Flavour | 1 (2.0%) |
Spicy Beef Mami Instant Noodle Soup | 1 (2.0%) |
Stir Rice Vermicelli Indonesian Gado Gado | 1 (2.0%) |
Stir Rice Vermicelli Singaporean Laksa | 1 (2.0%) |
Supreme Instant Mami Noodles With Free Crackers | 1 (2.0%) |
Supreme Sotanghon Artificial Chicken Vermicelli | 1 (2.0%) |
Tom Yam Koong Rice Vermicelli | 1 (2.0%) |
Viet Cuisine Bun Rieu Cua Sour Crab Soup Instant Rice Vermicelli | 1 (2.0%) |
Yomise No Yakisoba Karashi Mentaiko Flavor | 1 (2.0%) |
Yomise No Yakisoba Shiodare Flavor With Black Pepper Mayonnaise | 1 (2.0%) |
Style | |
Bowl | 13 (27%) |
Cup | 3 (6.1%) |
Pack | 27 (55%) |
Tray | 6 (12%) |
Country | |
Cambodia | 1 (2.0%) |
China | 7 (14%) |
Japan | 7 (14%) |
Malaysia | 6 (12%) |
Philippines | 4 (8.2%) |
Poland | 1 (2.0%) |
Singapore | 5 (10%) |
Thailand | 8 (16%) |
Vietnam | 10 (20%) |
Stars | 3.75 (3.50, 4.75) |
Continent | |
Asia | 48 (98%) |
Europe | 1 (2.0%) |
¹ Median (IQR); n (%) |
The summary table provides a comprehensive breakdown of the characteristics of the 49 remaining ramen reviews. Here’s an analysis of the various components:
- Brand Distribution:
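- The brands are fairly varied, with the most reviewed brand being ‘Nissin’ (16% of the reviews), followed by ‘Mama’ and ‘MyKuali’ (each with 12%). Note that ‘Mama’ and ‘MAMA’ are counted as separate brands; combined, they would account for 9 reviews (18%).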
- Other brands like ‘Lucky Me!’, ‘Myojo’, and ‘Chewy’ contribute to a smaller fraction of the reviews (around 8.2% each).
- This distribution gives an idea of which brands are more commonly reviewed, suggesting their popularity or prevalence in the market.
- Variety of Ramen:
- The varieties of ramen are highly diverse, with most varieties being reviewed only once (2% each).
- The most reviewed varieties are ‘Premium Instant Noodles Roasted Beef Flavour’ and ‘Premium Instant Noodles Spicy Beef Flavour’, each having 2 reviews (4.1%).
- This diversity indicates a wide range of ramen types and flavors under consideration.
- Style Distribution:
- The majority of the ramen reviews are for ‘Pack’ style (55%), followed by ‘Bowl’ (27%), ‘Tray’ (12%), and ‘Cup’ (6.1%).
- This suggests that ‘Pack’ and ‘Bowl’ are the most common styles among the reviewed ramen.
- Country Distribution:
- The reviews cover a variety of countries, with the most reviews coming from Vietnam (20%) and Thailand (16%).
- Other significant contributors include China, Japan, Malaysia, and Singapore.
- The reviews that remain after filtering come almost entirely from East and Southeast Asia, with Poland as the only exception.
- Star Ratings:
- The median star rating is 3.75, with an IQR from 3.50 to 4.75.
- This suggests that most of the remaining products are rated well above the midpoint of the 5-point scale, indicating general satisfaction on the reviewer’s part.
- Continent Distribution:
- A vast majority of the reviews (98%) are for ramen from Asia, with a small percentage (2%) from Europe.
- This aligns with ramen’s origins and popularity in Asian cuisine.
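The ‘Mama’ and ‘MAMA’ rows in the table suggest that some brand names differ only by letter case. A minimal sketch of how such variants could be flagged is shown below; the toy data and the names `brands`, `brand_key`, and `case_variants` are hypothetical, and this check was not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data reproducing the 'Mama' / 'MAMA' split seen in the table
brands <- tibble(Brand = c("Mama", "MAMA", "Mama", "Nissin", "Nissin"))

# Group on a lowercase key and flag keys that map to more than one spelling
case_variants <- brands %>%
  mutate(brand_key = tolower(Brand)) %>%
  group_by(brand_key) %>%
  summarize(spellings = n_distinct(Brand), reviews = n()) %>%
  filter(spellings > 1)
```

Any key returned here is a candidate for recoding to a single spelling before counting reviews per brand, since split spellings lower each variant's count and can change which brands pass a frequency threshold.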
# Bivariate Analysis - Visualizing relationships
ggplot(ramen, aes(x = Continent, y = Stars)) +
  geom_boxplot() +
  theme_minimal()
# Calculating median Stars for each brand
median_stars_per_brand <- ramen %>%
  group_by(Brand) %>%
  summarize(median_stars = median(Stars, na.rm = TRUE)) %>%
  arrange(median_stars)
# Reordering the Brand factor levels based on median Stars
ramen$Brand <- factor(ramen$Brand, levels = median_stars_per_brand$Brand)
# Creating the plot
ggplot(ramen, aes(x = Brand, y = Stars)) +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Summary of Findings from the Ramen Reviews Analysis
Our comprehensive analysis of the ramen reviews dataset has led to several insightful findings:
Brand Diversity: The dataset features a significant number of unique ramen brands, indicating a diverse and competitive market.
Review Concentration: A substantial proportion of brands have fewer than 5 reviews, suggesting market dominance by a few popular brands.
Variety in Flavors and Styles: The wide array of ramen flavors and styles, primarily from Asia, highlights the culinary diversity and broad consumer preferences in the ramen industry.
High Median Ratings: The generally high median star ratings across the dataset suggest overall customer satisfaction with the ramen products.
Brand Performance Insights: Sorting brands by median ratings revealed differences in brand performance, providing valuable insights for consumers and manufacturers about quality and preferences.
Market Segmentation Potential: The analysis points to possible market segments, useful for targeted marketing and product development in the ramen industry.
In conclusion, this analysis not only sheds light on the diverse and complex landscape of the ramen market but also opens avenues for further in-depth studies on consumer behavior, market trends, and competitive strategies within the food industry.
References
Wikipedia: Ramen - Wikipedia
The Spruce Eats: What Is Ramen?
Kikkoman: What is Ramen?
EatingWell: Are Ramen Noodles Bad for You? Here’s What a Dietitian Has to Say
Latest Market Research Updates: Instant Noodles and Ramen Market Outlook: Opportunities and Threats in the Decade Ahead | Industry Research Biz.
Polaris Market: Instant Noodles Market Share, Size, Trends, Industry Analysis Report, By Product (Cup, Packet); By Distribution Channel; By Region; Segment Forecast, 2022-2030