🍜🌟 Ramen Reviews EDA w/ R 📈

Context

Ramen is a Japanese noodle soup dish that has gained popularity worldwide. It typically consists of a flavorful broth, thin wheat noodles, and a variety of toppings such as braised pork, eggs, scallions, and bamboo shoots.

The dish originated as a Japanese adaptation of Chinese wheat noodle soups and has evolved into numerous regional variations with different broths, noodles, and toppings. Ramen is a significant part of Japanese cuisine and is enjoyed in various forms, including fresh and instant varieties [1,2,3].

Ramen has become a cultural icon in Japan. While ramen noodles themselves provide limited nutritional value, the overall healthiness of a bowl depends on the specific ingredients and preparation methods used [4,5].

The ramen market is expected to grow over the next few years. According to a report on the global instant noodles and ramen market, the market was estimated at USD 34,220 million in 2022 and is forecast to reach USD 43,600 million by 2028, a compound annual growth rate (CAGR) of 4.1% over the review period [6].
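
As a quick sanity check, the quoted growth rate can be recomputed from the two market-size figures. This is a minimal sketch that assumes the review period runs from 2022 to 2028 (six years); the variable names are illustrative.

```r
# Figures quoted in the paragraph above (USD million)
start_value <- 34220        # 2022 estimate
end_value   <- 43600        # 2028 forecast
n_years     <- 2028 - 2022  # assumption: six-year review period

# CAGR = (end / start)^(1 / years) - 1
cagr     <- (end_value / start_value)^(1 / n_years) - 1
cagr_pct <- round(cagr * 100, 1)
print(cagr_pct)
```

Under that assumption the result rounds to 4.1%, which agrees with the reported CAGR.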

Another report anticipates a considerable rise in the global instant noodles and ramen market between 2023 and 2029, with steady growth over the forecast horizon (Latest Market Research Updates).

The demand for instant noodles, including ramen, has been influenced by factors such as changing consumer preferences, urbanization, and the convenience of on-the-go consumption, contributing to the market’s growth [7]. The ramen noodles market is also expected to witness significant growth, with a considerable portion of the annual revenue attributed to regions such as India, Southeast Asia, Brazil, and others.

Business analysis is important for several reasons. It helps in understanding consumer preferences, market trends, and competitive forces, which are crucial for making informed business decisions. By conducting a thorough analysis, companies can identify growth opportunities, assess potential risks, and develop effective marketing and product strategies to meet consumer demand and stay competitive in the market.

Let’s dive right in.

Data Description

The Ramen Rater is a product review website for the hardcore ramen enthusiast (or “ramenphile”), with over 2500 reviews to date. This dataset is an export of “The Big List” (of reviews), converted to a CSV format.

Each record in the dataset is a single ramen product review. Review numbers are contiguous: more recently reviewed ramen varieties have higher numbers. Brand, Variety (the product name), Country, and Style (Cup? Bowl? Tray?) are pretty self-explanatory. Stars indicate the ramen quality, as assessed by the reviewer, on a 5-point scale; this is the most important column in the dataset!

Note that this dataset does not include the text of the reviews themselves. For that, you should browse through https://www.theramenrater.com/ instead!

Data Import and Initial Processing

  1. Importing the Data
    • The core of our analysis begins with the importation of the ramen reviews dataset. Utilizing the readr package, we efficiently load the ramen-ratings.csv file. This package is chosen for its speed and ability to handle larger datasets more effectively than the base R functions. Additionally, we handle missing values at this stage by specifying empty strings and the term “Unrated” as NA (Not Available) during the import process.
    • Alongside the ramen reviews, we import a second dataset, `Countries-Continents.csv`, using the same method. This dataset provides geographical context for our analysis, linking each ramen review to its corresponding continent.

  2. Initial Exploration of the Data
    • A primary inspection of both datasets is conducted using the glimpse() function. This function offers a quick overview of the data structure, including the types of variables and the first few entries in each column. This step is crucial for familiarizing ourselves with the dataset’s layout, identifying any apparent inconsistencies, and planning subsequent data processing steps.
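# Loading the packages used throughout the analysis
library(tidyverse) # readr, dplyr, tidyr, stringr, ggplot2
library(tidytext)  # unnest_tokens()
library(gtsummary) # tbl_summary()

# Reading the data
ramen <- readr::read_csv("../data/ramen-ratings.csv", na = c("", "Unrated"))
country_continent <- readr::read_csv("../data/Countries-Continents.csv")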

# Initial glimpse (or str) at the data
glimpse(ramen)
## Rows: 2,580
## Columns: 7
## $ `Review #` <dbl> 2580, 2579, 2578, 2577, 2576, 2575, 2574, 2573, 2572, 2571,~
## $ Brand      <chr> "New Touch", "Just Way", "Nissin", "Wei Lih", "Ching's Secr~
## $ Variety    <chr> "T's Restaurant Tantanmen", "Noodles Spicy Hot Sesame Spicy~
## $ Style      <chr> "Cup", "Pack", "Cup", "Pack", "Pack", "Pack", "Cup", "Tray"~
## $ Country    <chr> "Japan", "Taiwan", "USA", "Taiwan", "India", "South Korea",~
## $ Stars      <dbl> 3.75, 1.00, 2.25, 2.75, 3.75, 4.75, 4.00, 3.75, 0.25, 2.50,~
## $ `Top Ten`  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
glimpse(country_continent)
## Rows: 194
## Columns: 2
## $ Continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "Africa", ~
## $ Country   <chr> "Algeria", "Angola", "Benin", "Botswana", "Burkina", "Burund~
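
The effect of mapping “Unrated” to NA at import can be illustrated on a tiny inline CSV. This is a minimal sketch in base R, where `na.strings` plays the role of readr's `na` argument; the three rows are made up for illustration and are not taken from the real file.

```r
# Toy CSV text: hypothetical rows, not from ramen-ratings.csv
csv_text <- "Brand,Stars\nA,3.5\nB,Unrated\nC,\n"

# Treat empty strings and "Unrated" as missing, as in the import step above
toy <- read.csv(text = csv_text, na.strings = c("", "Unrated"))

# With the placeholder text mapped to NA, Stars is parsed as a number
print(is.numeric(toy$Stars))
print(sum(is.na(toy$Stars)))
```

Without the `na.strings` setting, the “Unrated” entry would force the whole column to be read as text.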

Overview of Ramen Reviews

Take a look at the first rows:

head(ramen)
## # A tibble: 6 x 7
##   `Review #` Brand          Variety                Style Country Stars `Top Ten`
##        <dbl> <chr>          <chr>                  <chr> <chr>   <dbl> <chr>    
## 1       2580 New Touch      T's Restaurant Tantan~ Cup   Japan    3.75 <NA>     
## 2       2579 Just Way       Noodles Spicy Hot Ses~ Pack  Taiwan   1    <NA>     
## 3       2578 Nissin         Cup Noodles Chicken V~ Cup   USA      2.25 <NA>     
## 4       2577 Wei Lih        GGE Ramen Snack Tomat~ Pack  Taiwan   2.75 <NA>     
## 5       2576 Ching's Secret Singapore Curry        Pack  India    3.75 <NA>     
## 6       2575 Samyang Foods  Kimchi song Song Ramen Pack  South ~  4.75 <NA>
var_names <- colnames(ramen)

The ramen dataset contains 2,580 reviews and 7 columns.

Each review records the following information: Review #, Brand, Variety, Style, Country, Stars, Top Ten.

Count of Reviews by Brand

# Calculate the number of unique brands
unique_brands_count <- length(unique(ramen$Brand))

# Count the number of reviews for each brand
brand_reviews_count <- ramen %>%
  group_by(Brand) %>%
  summarize(ReviewCount = n()) %>% 
  arrange(desc(ReviewCount))


# Calculate the number of brands with less than 5 reviews
Brand_reviews_less_5 <- sum(brand_reviews_count$ReviewCount < 5)

# Calculate the percentage of brands with less than 5 reviews
Brand_reviews_less_5_percent <- (Brand_reviews_less_5 / unique_brands_count) * 100

There are a total of 355 unique brands, but 233 of these brands have fewer than 5 reviews, which constitutes 65.6% of all brands.
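
The share can be reproduced from the two counts reported above and formatted to one decimal place before it is printed. A minimal sketch, using illustrative variable names:

```r
# Counts reported in the text above
n_brands       <- 355
n_small_brands <- 233

# Percentage of brands with fewer than 5 reviews, formatted to one decimal
share_pct   <- n_small_brands / n_brands * 100
share_label <- sprintf("%.1f%%", share_pct)
print(share_label)
```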

Data Cleaning and Transformation

This section outlines the key steps taken in the data preparation phase, essential for accurate and insightful analysis.

By carefully cleaning the data, removing irrelevant or sparse columns, and handling missing values, we have ensured the integrity and reliability of our dataset.

The transformation of the Stars rating into a numeric format is particularly important for quantitative analyses, such as statistical testing or trend identification.

The enrichment of our data with geographic information opens new avenues for analysis, enabling us to investigate potential regional preferences or differences in ramen ratings. This aspect is particularly valuable for understanding the global ramen market and consumer trends.

The text analysis of the Variety column is a pivotal part of our project. By breaking down these descriptions into individual words and focusing on the most common terms, we gain insights into prevalent flavors, ingredients, or other notable characteristics that define popular ramen products.

Data Cleaning and Preprocessing

  1. Removal of Sparse Columns
    • The `Top Ten` column is missing for almost every review, so it was removed. This keeps the analysis focused on variables that are populated reliably.
  2. Filtering Out Uncommon Ramen Styles
    • We filtered out ramen styles that are rare in our dataset (‘Bar’, ‘Box’, and ‘Can’). This decision aligns with our objective to concentrate on more common and representative ramen styles for a more focused analysis.
  3. Conversion of Rating Scale
    • The Stars column holds the ramen quality rating. Because “Unrated” was mapped to NA at import, readr already parses the column as numeric; the explicit conversion is kept as a safeguard so that quantitative comparisons across products always operate on numbers.
  4. Exclusion of Incomplete Records
    • Our dataset cleansing included the removal of any records with missing data across any columns, ensuring completeness and reliability in the subsequent stages of our analysis.

Data Enrichment and Integration

  1. Augmentation with Geographic Data
    • By performing an inner join with the country_continent dataset, we have enriched our ramen reviews with continent information. This enrichment allows us to explore geographical trends and patterns in ramen preferences and ratings. Because this is an inner join, any review whose Country value has no exact match in country_continent is dropped.
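
Before relying on an inner join, it can help to list the country names that would fail to match. The sketch below uses base R on made-up data; the brands, the spellings, and the mismatches are hypothetical and only illustrate the check.

```r
# Toy data: hypothetical rows, not taken from the real files
reviews <- data.frame(
  Brand   = c("A", "B", "C", "D"),
  Country = c("Japan", "USA", "Vietnam", "South Korea"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Continent = c("Asia", "Asia", "North America"),
  Country   = c("Japan", "Vietnam", "United States"),
  stringsAsFactors = FALSE
)

# Countries present in the reviews but absent from the lookup table
unmatched <- setdiff(unique(reviews$Country), lookup$Country)
print(unmatched)

# merge() performs an inner join by default, so unmatched rows disappear
joined <- merge(reviews, lookup, by = "Country")
print(nrow(joined))
```

In this toy case two of the four reviews are lost because of spelling differences alone, which is why a check like this is worth running before the join.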

Text Data Transformation

  1. Unpacking the Variety Column
    • A key step in our text analysis involved breaking down the Variety column, which contains descriptive text about each ramen product, into individual words. This process was essential for uncovering common themes and descriptors used in the product variety.
  2. Identification of Common Words
    • We filtered the words to retain only those appearing 100 or more times, categorizing them as “common words.” This focus helps us understand the most prevalent terms and descriptors in our dataset, shedding light on popular characteristics or ingredients in ramen varieties.
  3. Pattern Creation for Advanced Filtering
    • Using the identified common words, we crafted a regex pattern to filter our dataset further. This pattern retains reviews where the Variety description includes any of these common terms. Note that unnest_tokens() lowercases words by default, whereas str_detect() is case-sensitive and matches substrings, so the pattern as written matches lowercase fragments anywhere in the Variety text.
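
The behavior of a plain alternation pattern can be compared with a whole-word, case-insensitive variant. This is a minimal sketch in base R on made-up product names and a made-up word list; it is one possible alternative, not the filter used in this analysis.

```r
# Toy data: hypothetical product names and common words
variety   <- c("Spicy Beef Noodles", "Premium Vermicelli",
               "Mi Goreng", "Hot spicy Ramen")
top_words <- c("spicy", "mi")

# Plain alternation: case-sensitive, matches substrings
loose      <- paste(top_words, collapse = "|")
loose_hits <- grepl(loose, variety, perl = TRUE)
print(loose_hits)   # FALSE TRUE FALSE TRUE

# Variant: whole words only, ignoring case
strict      <- paste0("\\b(", paste(top_words, collapse = "|"), ")\\b")
strict_hits <- grepl(strict, variety, ignore.case = TRUE, perl = TRUE)
print(strict_hits)  # TRUE FALSE TRUE TRUE
```

The plain pattern misses “Spicy” and “Mi” because of their capital letters, yet it accepts “Premium Vermicelli” because “mi” occurs inside both words. If the word list could contain regex metacharacters, they would need escaping first.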

Determining the Third Quartile of Reviews

  1. Counting Reviews per Brand
    • The analysis begins by counting the reviews for each brand with count(Brand), which groups the ramen data by Brand and stores the number of rows per brand in a column n. This shows how many reviews each brand has received, which is needed to identify frequently reviewed brands.
  2. Calculating the Third Quartile of Review Counts
    • The third quartile (75th percentile) of the review counts is computed with quantile(n, 0.75). This value is the threshold at or above which roughly the top 25% of brands fall in terms of review count. Brands that meet it are treated as commonly reviewed.

Filtering Brands Based on Review Counts

  1. Identifying Brands Above the Threshold
    • The ramen_brand_reviews data frame holds the per-brand review counts and keeps only the brands that meet the third quartile threshold; the count and the filter happen in a single pipeline. This narrows the dataset to brands that are more commonly reviewed, so the analysis focuses on brands with a significant presence in the data.
  2. Refining the Main Dataset
    • The final step filters the main ramen dataset to include only the brands listed in ramen_brand_reviews. This is done using the %in% operator, which checks whether each Brand in ramen appears in ramen_brand_reviews$Brand. Subsequent analyses are therefore concentrated on the most frequently reviewed brands.
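
The third-quartile brand filter described above can be traced by hand on a small example. This is a minimal sketch in base R with hypothetical brands; quantile() is called with its default interpolation method.

```r
# Toy data: hypothetical brands with 8, 4, 2, 1 and 1 reviews
brand <- c(rep("A", 8), rep("B", 4), rep("C", 2), "D", "E")

# Reviews per brand
counts <- table(brand)
n      <- as.vector(counts)

# Third quartile of the per-brand counts (here: 4)
threshold <- unname(quantile(n, 0.75))

# Keep brands at or above the threshold, then filter the reviews
keep     <- names(counts)[n >= threshold]
filtered <- brand[brand %in% keep]

print(threshold)
print(keep)
print(length(filtered))
```

Because the comparison uses `>=`, a brand sitting exactly on the threshold (B in this example) is kept.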

Rationale Behind Excluding These Styles

  1. Focus on Mainstream Ramen Varieties
    • We exclude the Bar, Box, and Can styles to keep the analysis focused on mainstream ramen formats. These styles are rare in the dataset and may represent specialty products or formats that are not central to the analysis.
  2. Enhancing Data Relevance
    • By removing less common or less relevant styles, the analysis becomes more streamlined and focused on the types of ramen that are more commonly consumed or reviewed. This helps in ensuring that the findings and insights derived from the dataset are applicable to a broader audience and reflect more general trends in ramen consumption and preference.
# Handling missing values and rare styles
ramen <- ramen %>%
  select(-`Top Ten`) %>% # Removing `Top Ten` because it is almost entirely NA
  filter(Style != "Bar", Style != "Box", Style != "Can") %>% # Filtering out rare styles
  mutate(Stars = as.numeric(as.character(Stars))) %>% # Ensure Stars is numeric (already <dbl> after import)
  drop_na() # Drop rows with NA

# Joining with country_continent data
ramen <- inner_join(ramen, country_continent, by = "Country")

# Transforming text data in Variety
ramen_words <- ramen %>% 
  unnest_tokens(word, Variety) %>% 
  count(word, sort = TRUE)

# ramen <- ramen %>% unnest_tokens(word, Variety) %>% 
#   dcast(Review.. + Brand + Style + Country + Stars  ~ word)

# Filter for common words (e.g., n >= 100 most frequent)
top_words <- ramen_words %>% filter(n >= 100)

# Create a regular expression pattern that matches any of the top words
pattern <- top_words$word %>% 
  paste(collapse = "|") # Collapse into a single string separated by '|'

# Filter ramen reviews where Variety contains any of the top words
ramen <- ramen %>%
  filter(str_detect(Variety, pattern))

# Determining the 3rd quartile of review counts by brand
ramen_brand_reviews <- ramen %>%
  count(Brand) %>%
  filter(n >= quantile(n, 0.75))

# Filtering the main dataset to include only brands above the 3rd quartile of review counts
ramen <- ramen %>%
  filter(Brand %in% ramen_brand_reviews$Brand)

# Focus on mainstream styles (already removed above; this filter is a no-op kept as a safeguard)
ramen <- ramen %>% 
  filter(!Style %in% c("Bar", "Box", "Can"))

After all filtering steps, only 49 of the original 2,580 reviews remain. The reduction is the combined effect of the inner join, the common-word filter, and the third-quartile brand filter, not of the brand filter alone.
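
To see which step removes how many rows, the row count can be recorded after each filter. The sketch below does this in base R on made-up data with only two of the steps; the names `row_log`, `step1`, and `step2` are illustrative.

```r
# Toy data: hypothetical reviews
reviews <- data.frame(
  Brand = c("A", "A", "A", "B", "C", "C"),
  Style = c("Pack", "Cup", "Bar", "Pack", "Pack", "Can"),
  stringsAsFactors = FALSE
)

# Record the row count after each named step
row_log <- c(start = nrow(reviews))

# Step 1: drop rare styles
step1 <- reviews[!reviews$Style %in% c("Bar", "Box", "Can"), ]
row_log["after style filter"] <- nrow(step1)

# Step 2: keep brands at or above the third quartile of review counts
brand_n <- table(step1$Brand)
n       <- as.vector(brand_n)
keep    <- names(brand_n)[n >= unname(quantile(n, 0.75))]
step2   <- step1[step1$Brand %in% keep, ]
row_log["after brand filter"] <- nrow(step2)

print(row_log)
```

Applied to the real pipeline, a log like this would show how much of the drop from 2,580 to 49 rows is due to each step.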

Exploratory Data Analysis

In this section of the script, several steps are undertaken to analyze the ramen dataset both at individual variable levels (univariate analysis) and in terms of relationships between two variables (bivariate analysis).

# Univariate analysis
summary(ramen)
##     Review #       Brand             Variety             Style          
##  Min.   : 126   Length:49          Length:49          Length:49         
##  1st Qu.: 929   Class :character   Class :character   Class :character  
##  Median :1460   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :1455                                                           
##  3rd Qu.:1948                                                           
##  Max.   :2552                                                           
##    Country              Stars        Continent        
##  Length:49          Min.   :0.500   Length:49         
##  Class :character   1st Qu.:3.500   Class :character  
##  Mode  :character   Median :3.750   Mode  :character  
##                     Mean   :3.837                     
##                     3rd Qu.:4.750                     
##                     Max.   :5.000
# Creating a summary table
statistics_table <- tbl_summary(ramen)
statistics_table
Characteristic    N = 49¹
Review # 1,460 (929, 1,948)
Brand
    Chewy 4 (8.2%)
    Lucky Me! 4 (8.2%)
    Mama 6 (12%)
    MAMA 3 (6.1%)
    MyKuali 6 (12%)
    Myojo 4 (8.2%)
    Nissin 8 (16%)
    Paldo Vina 3 (6.1%)
    Sichuan Baijia 3 (6.1%)
    Vifon 3 (6.1%)
    Vina Acecook 5 (10%)
Variety
    Artificial Pickled Cabbage Fish Flavor Instant Vermicelli 1 (2.0%)
    Bestcook Hot spicy Tom Yum Shrimp 1 (2.0%)
    Cup Rice Vermicelli Shrimp Creamy Tom Yum 1 (2.0%)
    Cup Rice Vermicelli With Clear Soup 1 (2.0%)
    Good Artificial Minced Pork Bean Vermicelli 1 (2.0%)
    Good Chicken Abalone Bean Vermicelli 1 (2.0%)
    Good Chicken Bean Vermicelli 1 (2.0%)
    Good Tomyum Kung Bean Vermicelli 1 (2.0%)
    GooTa Demi Hamburg-Men 1 (2.0%)
    Hot spicy Flavor Instant Vermicelli 1 (2.0%)
    Instant Noodles chicken Green Curry Flavour 1 (2.0%)
    Instant Rice Vermicelli Bihun Goreng Original Flavour 1 (2.0%)
    Instant Rice Vermicelli Clear Soup 1 (2.0%)
    Instant Rice Vermicelli Yentafo Tom Yam Mohfai 1 (2.0%)
    Ippei-chan Yomise-No Yakisoba Oriental 1 (2.0%)
    Ippei-chan Yomise No Yakisoba Teriyaki Mayo Flavor 1 (2.0%)
    Koreno Premium Ginseng Flavor 1 (2.0%)
    Koreno Premium Mushroom Flavor 1 (2.0%)
    Koreno Premium Shrimp Flavor 1 (2.0%)
    Lomi Seafood Vegetable 1 (2.0%)
    MeeKuali spicy Fried Noodle 1 (2.0%)
    Mennippon Oumi Chanpon 1 (2.0%)
    Moo Nam Tok Rice Vermicelli 1 (2.0%)
    Nippon Onomichi Ramen 1 (2.0%)
    Oriental Kitchen Instant Rice Vermicelli In Gravy 1 (2.0%)
    Oriental Style Instant Vermicelli Sour Crab Flavour Soup 1 (2.0%)
    Penang Hokkien Prawn Soup Rice Vermicelli (Bihun) 1 (2.0%)
    Penang Red tom Yum Goong Noodle (New Version) 1 (2.0%)
    Penang Red tom Yum Goong Noodle Authentic Taste 1 (2.0%)
    Penang Red Tom Yum Goong Rice Vermicelli Soup 1 (2.0%)
    Penang White Curry Rice Vermicelli Soup 1 (2.0%)
    Pickled Cabbage Flavor Instant Vermicelli 1 (2.0%)
    Pomidorowa (Mild Tomato) 1 (2.0%)
    Premium Instant Noodles Roasted Beef Flavour 2 (4.1%)
    Premium Instant Noodles Spicy Beef Flavour 2 (4.1%)
    Premium Instant Noodles XO Sauce Seafood Flavour 1 (2.0%)
    Rice Vermicelli Satay Chicken 1 (2.0%)
    Rice Vermicelli Spicy Beef With Chilli Flavour 1 (2.0%)
    Spicy Beef Mami Instant Noodle Soup 1 (2.0%)
    Stir Rice Vermicelli Indonesian Gado Gado 1 (2.0%)
    Stir Rice Vermicelli Singaporean Laksa 1 (2.0%)
    Supreme Instant Mami Noodles With Free Crackers 1 (2.0%)
    Supreme Sotanghon Artificial Chicken Vermicelli 1 (2.0%)
    Tom Yam Koong Rice Vermicelli 1 (2.0%)
    Viet Cuisine Bun Rieu Cua Sour Crab Soup Instant Rice Vermicelli 1 (2.0%)
    Yomise No Yakisoba Karashi Mentaiko Flavor 1 (2.0%)
    Yomise No Yakisoba Shiodare Flavor With Black Pepper Mayonnaise 1 (2.0%)
Style
    Bowl 13 (27%)
    Cup 3 (6.1%)
    Pack 27 (55%)
    Tray 6 (12%)
Country
    Cambodia 1 (2.0%)
    China 7 (14%)
    Japan 7 (14%)
    Malaysia 6 (12%)
    Philippines 4 (8.2%)
    Poland 1 (2.0%)
    Singapore 5 (10%)
    Thailand 8 (16%)
    Vietnam 10 (20%)
Stars 3.75 (3.50, 4.75)
Continent
    Asia 48 (98%)
    Europe 1 (2.0%)
¹ Median (IQR); n (%)

The summary table provides a comprehensive breakdown of the characteristics of the 49 ramen reviews that remain after filtering. Here’s an analysis of the various components:

  1. Brand Distribution:
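    • The brands are fairly varied, with the most reviewed brand being ‘Nissin’ (16% of the reviews), followed by ‘Mama’ and ‘MyKuali’ (each with 12%).
    • ‘Mama’ and ‘MAMA’ are counted as separate brands because of inconsistent capitalization; if they refer to the same brand, it would total 9 reviews (18%), ahead of ‘Nissin’.
    • Other brands like ‘Lucky Me!’, ‘Myojo’, and ‘Chewy’ account for a smaller share of the reviews (8.2% each).
    • This distribution shows which brands are most frequently reviewed in the filtered dataset; review counts reflect the reviewer’s coverage and are only a rough proxy for market popularity.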
  2. Variety of Ramen:
    • The varieties of ramen are highly diverse, with most varieties being reviewed only once (2% each).
    • The most reviewed varieties are ‘Premium Instant Noodles Roasted Beef Flavour’ and ‘Premium Instant Noodles Spicy Beef Flavour’, each having 2 reviews (4.1%).
    • This diversity indicates a wide range of ramen types and flavors under consideration.
  3. Style Distribution:
    • The majority of the ramen reviews are for ‘Pack’ style (55%), followed by ‘Bowl’ (27%), ‘Tray’ (12%), and ‘Cup’ (6.1%).
    • This suggests that ‘Pack’ and ‘Bowl’ are the most common styles among the reviewed ramen.
  4. Country Distribution:
    • The reviews cover a variety of countries, with the most reviews coming from Vietnam (20%) and Thailand (16%).
    • Other significant contributors include China and Japan (14% each), Malaysia (12%), and Singapore (10%).
    • After filtering, the reviews span nine countries, eight of them in Asia.
  5. Star Ratings:
    • The median star rating is 3.75, with an IQR from 3.50 to 4.75.
    • At least half of the reviewed products score 3.75 or higher, well above the midpoint of the 5-point scale, indicating that the reviewer is generally satisfied with them.
  6. Continent Distribution:
    • A vast majority of the reviews (98%) are for ramen from Asia, with a small percentage (2%) from Europe.
    • This aligns with ramen’s origins and popularity in Asian cuisine.
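
The ‘Mama’ and ‘MAMA’ rows in the summary table suggest that brand names may need case normalization. The sketch below merges the brand counts copied from the table using an uppercase key; it assumes the two spellings refer to the same brand, which the data alone does not confirm.

```r
# Brand counts copied from the summary table above
brand_counts <- c(Chewy = 4, "Lucky Me!" = 4, Mama = 6, MAMA = 3, MyKuali = 6,
                  Myojo = 4, Nissin = 8, "Paldo Vina" = 3, "Sichuan Baijia" = 3,
                  Vifon = 3, "Vina Acecook" = 5)

# Group brands by a case-insensitive key and sum their counts
key    <- toupper(names(brand_counts))
merged <- vapply(split(brand_counts, key), sum, numeric(1))

mama_total <- merged[["MAMA"]]                          # 9
top_brand  <- names(which.max(merged))                  # "MAMA"
mama_share <- round(100 * mama_total / sum(merged), 1)  # 18.4
print(c(mama_total, mama_share))
print(top_brand)
```

In practice the normalization would be applied to the Brand column before counting, so that the third-quartile filter also sees the merged brand.
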
# Bivariate Analysis - Visualizing relationships
ggplot(ramen, aes(x = Continent, y = Stars)) +
  geom_boxplot() +
  theme_minimal()

ggplot(ramen, aes(x = Style, y = Stars)) +
  geom_boxplot() +
  theme_minimal()

# Calculating median Stars for each brand
median_stars_per_brand <- ramen %>%
  group_by(Brand) %>%
  summarize(median_stars = median(Stars, na.rm = TRUE)) %>%
  arrange(median_stars)

# Reordering the Brand factor levels based on median Stars
ramen$Brand <- factor(ramen$Brand, levels = median_stars_per_brand$Brand)

# Creating the plot
ggplot(ramen, aes(x = Brand, y = Stars)) +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Summary of Findings from the Ramen Reviews Analysis

Our comprehensive analysis of the ramen reviews dataset has led to several insightful findings:

  1. Brand Diversity: The dataset features 355 unique ramen brands, indicating a diverse and competitive market.

  2. Review Concentration: 233 of these brands (65.6%) have fewer than 5 reviews, so most reviews are concentrated in a small number of brands.

  3. Variety in Flavors and Styles: The wide array of ramen flavors and styles, primarily from Asia, highlights the culinary diversity and broad consumer preferences in the ramen industry.

  4. High Median Ratings: The median rating of 3.75 stars in the filtered dataset suggests that the reviewer is generally satisfied with the ramen products.

  5. Brand Performance Insights: Sorting brands by median ratings revealed differences in brand performance, providing valuable insights for consumers and manufacturers about quality and preferences.

  6. Market Segmentation Potential: The analysis points to possible market segments, useful for targeted marketing and product development in the ramen industry.

In conclusion, this analysis not only sheds light on the diverse and complex landscape of the ramen market but also opens avenues for further in-depth studies on consumer behavior, market trends, and competitive strategies within the food industry.

References