Listening to Online Reviews

Part A

Part A of the "Listening to Online Reviews" project analyzes customer feedback from Sephora to understand trends in star ratings, review length, and sentiment using data analytics and visualization techniques in RStudio. By examining review distributions, sentiment scores (using VADER), and customer behavior patterns, the project provides actionable insights for brand managers to enhance marketing strategies and customer engagement.

Star Rating Analysis

CODE:

# Load plotting library
library(ggplot2)

# Calculate mean star rating
mean_rating <- mean(reviews$rating, na.rm = TRUE)
print(paste("Average Star Rating:", round(mean_rating, 2)))

# Plot histogram of star ratings
ggplot(reviews, aes(x = rating)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Distribution of Star Ratings", x = "Star Rating", y = "Count")

OUTPUT:

"Average Star Rating: 4.04"

Analysis

Average Rating = 4.04
- The average star rating is 4.04, indicating that customers are generally satisfied with the products.
- However, it's not a perfect 5.0, indicating some areas for improvement.
Star Rating Distribution (Histogram)
- The majority of reviews are 5-star, showing that most customers had a positive experience.
- There are fewer 1-, 2-, and 3-star reviews, but they still exist, meaning some customers were dissatisfied.
- 4-star ratings are present but noticeably lower than 5-star ratings.

Marketing Implication: Sephora should highlight positive reviews in marketing while addressing concerns from lower ratings to improve customer satisfaction.
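
The histogram reading above can be backed with exact shares per star level. The following is a minimal base-R sketch, assuming a numeric vector of ratings; the `ratings` values here are made-up stand-ins for `reviews$rating`, not the Sephora data.

```r
# Hypothetical ratings standing in for reviews$rating (not the real data)
ratings <- c(5, 5, 4, 5, 1, 3, 5, 2, 5, 4)

# Count reviews per star level, keeping all five levels even if one is empty
rating_counts <- table(factor(ratings, levels = 1:5))

# Convert counts to percentage shares of all reviews
rating_share <- round(100 * prop.table(rating_counts), 1)

print(rating_counts)
print(rating_share)
```

Reporting the share of 5-star and 1-star reviews next to the mean makes it easier to see whether a 4.04 average comes from broadly good ratings or from a mix of very high and very low ones.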

Review Length and Sentiment Analysis

CODE:

# Load packages: dplyr for the pipeline, vader for sentiment scoring
library(dplyr)
library(vader)

# Calculate review length
reviews$review_length <- nchar(as.character(reviews$review_text))

# Plot histogram of review lengths
ggplot(reviews, aes(x = review_length)) +
  geom_histogram(binwidth = 50, fill = "purple", color = "black") +
  labs(title = "Distribution of Review Lengths", x = "Review Length (characters)", y = "Count")

# Define long and short reviews based on median length
median_length <- median(reviews$review_length, na.rm = TRUE)
reviews <- reviews %>%
  mutate(review_type = ifelse(review_length > median_length, "Long", "Short"))

# Apply VADER sentiment analysis (compound score ranges from -1 to 1)
reviews$sentiment <- vader_df(reviews$review_text)$compound

# Function to compare sentiment of long vs. short reviews at one star rating
compare_sentiment <- function(data, rating_value) {
  subset_data <- data %>% filter(rating == rating_value)
  long_reviews <- subset_data %>% filter(review_type == "Long")
  short_reviews <- subset_data %>% filter(review_type == "Short")

  long_mean <- mean(long_reviews$sentiment, na.rm = TRUE)
  short_mean <- mean(short_reviews$sentiment, na.rm = TRUE)

  # Welch two-sample t-test; t.test() drops missing values itself
  t_test_result <- t.test(long_reviews$sentiment, short_reviews$sentiment)

  list(
    long_mean = long_mean,
    short_mean = short_mean,
    p_value = t_test_result$p.value
  )
}

# Compare sentiments for one-star reviews
one_star_comparison <- compare_sentiment(reviews, 1)
print(one_star_comparison)

# Compare sentiments for five-star reviews
five_star_comparison <- compare_sentiment(reviews, 5)
print(five_star_comparison)

OUTPUT:

$long_mean

[1] 0.2847692

$short_mean

[1] 0.1069118

$p_value

[1] 0.05727717

$long_mean

[1] 0.8346481

$short_mean

[1] 0.6845736

$p_value

[1] 3.180911e-12

Analysis

Review Length Distribution
- The histogram shows that most reviews are short (around 200-500 characters), with fewer very long reviews.
- There are some extremely long reviews, but they are rare.
Sentiment Analysis (VADER Scores)
- One-Star Reviews
  1. Long reviews have an average sentiment score of 0.28 (slightly positive).
  2. Short reviews have an average sentiment score of 0.11 (closer to neutral).
  3. p-value = 0.057 → The difference is not statistically significant at the 0.05 level, although it comes close. The data hint that longer one-star reviews may score slightly more positive, but this sample does not confirm it.
- Five-Star Reviews
  1. Long reviews have an average sentiment score of 0.83 (highly positive).
  2. Short reviews have an average sentiment score of 0.68 (still positive but lower).
  3. p-value = 3.18e-12 (very close to 0) → This is statistically significant, meaning longer five-star reviews tend to have significantly more positive sentiment than shorter ones.

Marketing Implication: Longer reviews score more positive than shorter ones at both rating levels, though the gap is statistically significant only for five-star reviews. Sephora can leverage detailed positive reviews in marketing while reading lengthy negative reviews closely to identify key areas for product improvement.
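
A p-value alone does not say how large the long-versus-short gap is. As a rough sketch, the same Welch t-test can also report a confidence interval for the difference, and a standardized effect size can be computed by hand. The scores below are simulated placeholders (`long_scores`, `short_scores` are hypothetical names), not the project's VADER output.

```r
set.seed(42)

# Simulated compound scores clipped to [-1, 1]; placeholders for real VADER scores
long_scores  <- pmin(pmax(rnorm(60, mean = 0.28, sd = 0.5), -1), 1)
short_scores <- pmin(pmax(rnorm(60, mean = 0.11, sd = 0.5), -1), 1)

# Welch two-sample t-test (t.test() uses var.equal = FALSE by default)
welch <- t.test(long_scores, short_scores)

# Raw difference in means and its 95% confidence interval
mean_diff <- mean(long_scores) - mean(short_scores)
ci_95 <- welch$conf.int

# Cohen's d using the average of the two variances (appropriate for equal group sizes)
pooled_sd <- sqrt((var(long_scores) + var(short_scores)) / 2)
cohens_d <- mean_diff / pooled_sd

print(list(mean_diff = mean_diff, ci_95 = ci_95, p_value = welch$p.value, cohens_d = cohens_d))
```

A confidence interval that includes zero corresponds to a p-value above 0.05, which is a more direct way to describe a borderline result such as the one-star comparison.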

Part B

This analysis explores Sephora’s customer reviews to identify key sentiment trends and areas for improvement. Most reviews are positive, and longer reviews tend to score more positive on sentiment than shorter ones. Using these insights, the "Real Results, Real Reviews" campaign leverages user-generated content and influencer collaborations to boost engagement and trust, ensuring Sephora continues to connect with its audience effectively.

Word Cloud: Common Words in Reviews

CODE:

# Load text processing libraries (dplyr supplies %>% and filter())
library(dplyr)
library(tidytext)
library(tm)
library(wordcloud)

# Create word cloud for positive reviews (4 & 5 stars)
positive_reviews <- reviews_clean %>% filter(rating >= 4)
wordcloud(positive_reviews$review_text, max.words = 100, colors = "blue")

# Create word cloud for negative reviews (1 & 2 stars)
negative_reviews <- reviews_clean %>% filter(rating <= 2)
wordcloud(negative_reviews$review_text, max.words = 100, colors = "red")

Common Words in Positive Reviews

Common words in positive reviews highlight what customers love about Sephora’s products.

Common Words in Negative Reviews

Negative review words reveal potential areas for improvement (e.g., “greasy,” “dry,” “expensive”).

Analysis:

Positive Reviews

Common words include “skin,” “cream,” “moisturizer,” “hydrated,” “love,” and “smooth”.
Customers highlight the hydration benefits, with words like “moisturizer,” “hydrated,” and “smooth” appearing prominently.
“Love” and “amazing” show that many customers are highly enthusiastic about their purchases.
The word “Mer” suggests that the brand La Mer is frequently mentioned in positive reviews, possibly indicating strong brand loyalty.

Managerial Implication: Marketing should lean into the strengths customers love—hydration, smooth application, and overall quality. Leveraging user-generated content (e.g., customer testimonials, influencer collaborations) to emphasize these strengths can help reinforce the brand's reputation. Additionally, Sephora can use “hydration” and “smooth skin” messaging in ad campaigns to align with what customers value most.

Negative Reviews

Common words include “dry,” “price,” “thick,” “greasy,” “expensive,” and “hype”.
Pricing is a recurring issue, indicating that some customers feel the product is overpriced for its effectiveness.
Texture-related complaints (“thick,” “greasy,” “dry”) suggest a mismatch between product feel and customer expectations.
The word “hype” appearing frequently may indicate that customers expected more based on marketing but were disappointed with the actual results.

Managerial Implication: Sephora should adjust marketing messaging to better set expectations—if a product has a thick consistency, emphasizing “rich hydration” or “luxurious feel” could shift the perception positively. Additionally, offering sample sizes or mini versions could reduce price-related complaints, allowing customers to try the product before committing to a full purchase.
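
The "common words" listed above can also be tabulated directly, which gives exact counts to cite next to the word clouds. This is a small base-R sketch on three invented review strings; the stop-word list `stop_words_small` is a deliberately tiny, hypothetical stand-in for a full list.

```r
# Invented example reviews (not from the Sephora data)
review_text <- c(
  "Love this moisturizer, my skin feels so smooth",
  "Too greasy and too expensive for the hype",
  "My skin is hydrated and smooth, love it"
)

# Tiny illustrative stop-word list; a real analysis would use a fuller one
stop_words_small <- c("this", "my", "so", "and", "too", "for", "the", "is", "it")

# Lowercase, split on anything that is not a letter or apostrophe
words <- unlist(strsplit(tolower(review_text), "[^a-z']+"))

# Drop empty tokens and stop words
words <- words[nzchar(words) & !(words %in% stop_words_small)]

# Frequency table, most common words first
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 5))
```

A table like `word_freq` could then be passed to `wordcloud()` as separate word and frequency vectors, so the cloud and the reported counts come from the same cleaned tokens.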

Review Length vs. Sentiment Score

CODE:

# Scatter plot: Review Length vs. Sentiment Score
ggplot(reviews_clean, aes(x = review_length, y = sentiment)) +
  geom_point(alpha = 0.4, color = "purple") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Review Length vs. Sentiment Score", x = "Review Length (characters)", y = "Sentiment Score")

Analysis

The upward-sloping trend line indicates a positive correlation: longer reviews tend to carry more positive sentiment scores.
Shorter reviews cluster closer to neutral or slightly negative scores, which may reflect quick feedback or dissatisfaction.

Managerial Implication: Encouraging longer, detailed reviews from satisfied customers could strengthen brand credibility and provide deeper insights into product benefits. At the same time, analyzing lengthy negative reviews can help pinpoint specific areas for improvement.
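
One way to check whether review length goes with more positive sentiment or with more intense sentiment in either direction is to correlate length with both the signed score and its absolute value. The sketch below uses simulated data; `review_length` and `sentiment` are placeholders for the columns in `reviews_clean`.

```r
set.seed(1)

# Simulated review lengths (characters) and compound scores clipped to [-1, 1]
review_length <- round(runif(200, min = 20, max = 1500))
sentiment <- pmin(pmax(0.3 + 0.0003 * review_length + rnorm(200, sd = 0.4), -1), 1)

# Signed correlation: do longer reviews score more positive?
signed_cor <- cor.test(review_length, sentiment)

# Intensity correlation: do longer reviews sit further from neutral, in either direction?
intensity_cor <- cor.test(review_length, abs(sentiment))

print(c(signed = unname(signed_cor$estimate), intensity = unname(intensity_cor$estimate)))
```

A positive signed correlation supports "longer means more positive", while only the correlation with `abs(sentiment)` speaks to "longer means stronger in both directions".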

Average Sentiment per Star Rating

CODE:

# Calculate average sentiment per star rating
avg_sentiment <- reviews_clean %>%
  group_by(rating) %>%
  summarize(avg_sentiment = mean(sentiment, na.rm = TRUE))

# Bar chart of Average Sentiment by Rating
ggplot(avg_sentiment, aes(x = factor(rating), y = avg_sentiment, fill = factor(rating))) +
  geom_col() +
  labs(title = "Average Sentiment by Star Rating", x = "Star Rating", y = "Average Sentiment Score")

Analysis

As expected, higher-star ratings have higher sentiment scores, confirming that VADER sentiment analysis aligns with numerical ratings.
Interestingly, even 1- and 2-star reviews have some degree of positive sentiment, meaning customers often acknowledge some good aspects of the product despite their dissatisfaction.
The jump from 3-star to 4-star sentiment scores is notable, suggesting that most neutral reviewers lean positive rather than negative.

Managerial Implication: This confirms that not all negative reviews are fully critical—some dissatisfied customers still mention positive aspects of their experience. Highlighting these mixed reviews in marketing (e.g., "Not for me, but still a quality product!") could help build trust and manage expectations for new customers. Sephora can also proactively address common issues in FAQs or product descriptions to prevent misunderstandings.
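
Averages per star level are easier to compare when the number of reviews and a standard error accompany each mean. A minimal base-R sketch, using invented values in place of `reviews_clean$rating` and `reviews_clean$sentiment`:

```r
# Invented ratings and compound scores (not the real data)
ratings   <- c(1, 1, 2, 3, 3, 4, 4, 5, 5, 5)
sentiment <- c(-0.4, 0.2, 0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.7, 0.8)

# Mean, count, and standard error of sentiment within each star level
mean_by <- tapply(sentiment, ratings, mean)
n_by    <- tapply(sentiment, ratings, length)
se_by   <- tapply(sentiment, ratings, function(x) sd(x) / sqrt(length(x)))

summary_by_rating <- data.frame(
  rating         = as.integer(names(mean_by)),
  mean_sentiment = as.numeric(mean_by),
  n              = as.integer(n_by),
  se             = as.numeric(se_by)  # NA when a level has a single review
)
print(summary_by_rating)
```

Star levels with few reviews will show large or missing standard errors, which is a cue to treat differences between neighboring levels with caution.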

Histogram of Star Ratings

CODE:

# Load necessary libraries
library(ggplot2)
library(dplyr)

# Histogram of Star Ratings
ggplot(reviews_clean, aes(x = rating)) +
  geom_bar(fill = "blue", color = "black") +
  labs(title = "Distribution of Star Ratings", x = "Star Rating", y = "Count")

Analysis

The majority of reviews are 5 stars, indicating that most customers are highly satisfied with their purchases.
Counts drop sharply for 1- to 4-star ratings, with 1-star reviews being slightly more common than 2- and 3-star reviews.
This suggests that while most customers love the product, those who dislike it tend to give it the lowest possible rating rather than a neutral 2- or 3-star rating.

Managerial Implication: Sephora is positively perceived overall, but the sharp contrast between 5-star and 1-star reviews suggests a polarizing experience for some customers. This could indicate that expectations are not always met, leading to some extreme negative feedback. A closer look at 1-star reviews (via word clouds and sentiment analysis) can help identify common complaints, allowing Sephora to adjust product messaging or improve formulations where necessary.

Social Media Campaign Strategy (Dragonfly & STEPPS Frameworks)

Campaign Theme:

“Real Results, Real Reviews” – Showcasing authentic customer experiences.

Content Plan:

UGC (User-Generated Content): Encourage customers to share their before-and-after skincare results.
Influencer Partnerships: Partner with beauty influencers to create tutorial videos.
Social Proof: Feature long, positive reviews in Instagram Reels & TikTok videos.

Platforms & Timing:

Instagram, TikTok, and YouTube Shorts
Best Posting Time: Evenings (6-9 PM, Tues & Thurs)

Metrics to Track:

Engagement Rate (likes, shares, comments)
Sentiment Analysis (before vs. after campaign)
Increase in 4-5 Star Reviews
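
As an illustration of the first metric, engagement rate is often computed as total interactions divided by impressions. That definition is an assumption here (some teams divide by followers or reach instead), and every number below is invented.

```r
# Invented per-platform campaign totals (placeholders, not real results)
posts <- data.frame(
  platform    = c("Instagram", "TikTok", "YouTube Shorts"),
  likes       = c(1200, 3400, 800),
  shares      = c(150, 600, 90),
  comments    = c(90, 250, 40),
  impressions = c(40000, 100000, 25000)
)

# Assumed definition: (likes + shares + comments) / impressions
posts$engagement_rate <- with(posts, (likes + shares + comments) / impressions)

print(posts[, c("platform", "engagement_rate")])
```

Whichever denominator is chosen, it should stay fixed across the before and after periods so the campaign comparison is like for like.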

Part C

This analysis explores how customer reviews reflect differences in sentiment between L’Occitane and La Mer. I used text analysis, star ratings, and sentiment scoring to understand what customers consistently praise or criticize. The results helped uncover patterns in satisfaction based on product features and customer characteristics.

Sentiment Analysis vs. Star Ratings

CODE:

# Score each review with VADER (compound score from the vader package)
library(vader)

vader_sentiments <- full_data_filtered %>%
  mutate(sentiment = vader_df(review_text)$compound)

# Compare with rating
ggplot(vader_sentiments, aes(x = factor(rating), y = sentiment)) +
  geom_boxplot() +
  labs(title = "Sentiment Score by Star Rating", x = "Star Rating", y = "VADER Sentiment")

Analysis

The VADER sentiment scores showed a clear trend: the higher the star rating, the more positive the sentiment. However, even 5-star reviews varied widely in sentiment, indicating that numerical ratings alone don’t always capture the full emotional tone of a review.

Positive-to-Negative Review Ratio

CODE:

# Bar chart showing average ratio per brand
brand_ratios <- ratio_data %>%
  group_by(brand_name) %>%
  summarize(mean_ratio = mean(pos_neg_ratio, na.rm = TRUE))

ggplot(brand_ratios, aes(x = brand_name, y = mean_ratio, fill = brand_name)) +
  geom_col(width = 0.5) +
  labs(
    title = "Average Positive/Negative Review Ratio by Brand",
    x = "Brand",
    y = "Mean Positive/Negative Ratio"
  ) +
  theme_minimal()

Analysis

La Mer has a higher average ratio of positive to negative reviews compared to L’Occitane, suggesting more consistent customer satisfaction. This quick metric highlights La Mer’s stronger performance in meeting customer expectations.
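
The code above reads a precomputed `pos_neg_ratio` column whose construction is not shown. One plausible definition, stated here purely as an assumption, is the number of 4- and 5-star reviews divided by the number of 1- and 2-star reviews. The sketch computes that per brand on invented ratings; `reviews_demo` and the helper name `pos_neg_ratio` are hypothetical, and the original may instead compute the ratio per product before averaging.

```r
# Invented ratings for two brands (not the real data)
reviews_demo <- data.frame(
  brand_name = c(rep("La Mer", 6), rep("L'Occitane", 6)),
  rating     = c(5, 5, 4, 5, 1, 3,  5, 4, 2, 1, 3, 5)
)

# Assumed definition: count of ratings >= 4 divided by count of ratings <= 2
pos_neg_ratio <- function(rating) {
  n_pos <- sum(rating >= 4, na.rm = TRUE)
  n_neg <- sum(rating <= 2, na.rm = TRUE)
  if (n_neg == 0) NA_real_ else n_pos / n_neg  # undefined without negative reviews
}

# Apply the ratio within each brand
brand_ratio <- tapply(reviews_demo$rating, reviews_demo$brand_name, pos_neg_ratio)
print(brand_ratio)
```

Three-star reviews are left out of both counts in this version, so the ratio reflects only clearly positive against clearly negative reviews.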

Word Clouds: High vs. Low Ratings

CODE:

# Plot wordclouds by rating group and brand
plot_wordcloud <- function(data, brand, group) {
  data %>%
    filter(brand_name == brand, rating_group == group) %>%
    count(word, sort = TRUE) %>%
    with(wordcloud(word, n, max.words = 100, colors = brewer.pal(8, "Dark2")))
}

# Examples:
plot_wordcloud(review_words, "L'Occitane", "High")
plot_wordcloud(review_words, "L'Occitane", "Low")
plot_wordcloud(review_words, "La Mer", "High")
plot_wordcloud(review_words, "La Mer", "Low")

"L'Occitane", "High"

Analysis:

Customers praised the cream, hydration, luxury feel, and effectiveness, often describing the products as a “miracle” or “worth the money.”

"L'Occitane", "Low"

Analysis:

Common complaints centered around price, hype, and acne breakouts. Words like "didn't," "thick," and "greasy" were frequent.

"La Mer", "High"

Analysis:

Words like skin, amazing, smooth, and soft appeared most often, reflecting satisfaction with texture, smell, and hydration.

"La Mer", "Low"

Analysis:

Negative reviews emphasized smell, sensitivity, and oily texture, with some referencing ineffective results or product irritation.

Consumer Type Preferences

CODE:

# Average rating by skin type (the same pattern works for other attributes, e.g. eye color)
full_data_filtered %>%
  group_by(skin_type) %>%
  summarize(avg_rating = mean(rating, na.rm = TRUE), n = n()) %>%
  ggplot(aes(x = reorder(skin_type, avg_rating), y = avg_rating)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Average Rating by Skin Type", x = "Skin Type", y = "Average Rating")

Analysis

When breaking down average ratings by skin type, customers with combination and normal skin gave the highest ratings overall, while oily skin types tended to give slightly lower ratings. This could suggest formulation mismatches or differing expectations based on skin type.
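
Because the gaps between skin types are described as slight, it may help to test whether they exceed what chance would produce. The following sketch applies a Kruskal-Wallis test, a rank-based choice suited to 1-5 ratings and a technique swapped in here rather than taken from the original analysis. The data are simulated and `demo` is a hypothetical stand-in for `full_data_filtered`.

```r
set.seed(7)

# Simulated ratings for four skin types, 25 reviews each (placeholder data)
demo <- data.frame(
  skin_type = factor(rep(c("combination", "dry", "normal", "oily"), each = 25)),
  rating    = sample(1:5, 100, replace = TRUE, prob = c(0.08, 0.05, 0.07, 0.20, 0.60))
)

# Mean rating per skin type, for reference alongside the test
group_summary <- aggregate(rating ~ skin_type, data = demo, FUN = mean)

# Kruskal-Wallis rank-sum test: do rating distributions differ across skin types?
kw <- kruskal.test(rating ~ skin_type, data = demo)

print(group_summary)
print(kw$p.value)
```

A large p-value would suggest the ordering of skin types in the bar chart should not be over-interpreted, especially for groups with few reviews.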

Summary

Both the VADER sentiment scores and the positive-to-negative ratio method yielded consistent brand comparisons. However, VADER provided a more nuanced understanding of emotional tone, while the ratio method was better for quick, interpretable benchmarking.

Managerial Implications: This analysis highlights the importance of looking beyond star ratings to understand the language customers use when describing their experiences. While both L’Occitane and La Mer receive praise for hydration and texture, customers who give lower ratings often mention issues like greasiness, breakouts, or skin irritation, especially among those with oily or sensitive skin. These patterns suggest that product formulas or marketing strategies should be adjusted to directly address these concerns. Additionally, the demographic insights show that customers with certain skin types consistently give higher ratings, which can guide more targeted campaigns or inspire new product lines. The comparison between L’Occitane and La Mer shows that even small differences in review sentiment and satisfaction ratios can impact how brands are perceived in the market. By combining sentiment analysis with customer attributes, brands can make more strategic decisions about product development, positioning, and competitive differentiation.

Part D

Part D explores a seasonal collaboration between L’Occitane and Starbucks to create a fast beauty subscription box inspired by Starbucks’ drink themes. I used ChatGPT to help brainstorm the Spring box, which centers on lavender, a calming ingredient that aligns with wellness trends and seasonal routines. This concept blends scent, skincare, and lifestyle to deliver a sensory experience that fits both brands and resonates with their audiences.

Starbucks x L’Occitane: Customer-Centric Fast Beauty Subscription Box

This spring, L’Occitane partners with Starbucks to launch a limited-edition fast beauty subscription box inspired by Starbucks’ seasonal drink themes. The first edition centers on lavender, a calming, spring-forward ingredient that appeals to skincare users seeking relaxation, wellness, and sensory experience. This collaboration connects two lifestyle brands to meet customers where they already are—driven by routine, ritual, and seasonality.

Applying the Customer-Centric Framework

Customer Acquisition:
1. This box offers a way to reach both Starbucks loyalists and beauty enthusiasts through joint email campaigns, co-branded packaging, and digital ad targeting. Launching pop-up stations in select Starbucks stores with QR codes for sign-ups could boost conversions and create real-world buzz.
Customer Retention:
1. By releasing a new box each season tied to a Starbucks drink theme (e.g., Pumpkin Spice in Fall, Peppermint Mocha in Winter), the partnership creates a subscription cadence tied to customers' natural habits. Incentives like early access, loyalty rewards, or personalized item selection could encourage renewals.
Customer Profitability:
1. Data from repeat purchases, seasonal engagement, and social sharing behavior would help identify high-value segments. Targeting these customers with personalized upsells—like full-size versions or exclusive seasonal drops—would increase lifetime value.
Customer Experience:
1. The packaging and unboxing should evoke the Starbucks drink experience through color, scent, and product selection. Calming lavender aromas, pastel tones, and a short, relaxing product routine build emotional connection and delight.

Additional Analytics and Data Collection

To strengthen customer insights and refine offerings, the following data should be gathered:
Search trends for lavender-related skincare across platforms like Google Trends and TikTok
Customer reviews from past calming or spring-themed products for language analysis
Survey data on scent preferences and skincare habits by season
Engagement data from Starbucks and L’Occitane’s social media and loyalty programs

Social listening could reveal rising seasonal ingredients or wellness trends that inform future boxes.

Calculations to Perform

Customer Acquisition Cost (CAC) vs. Customer Lifetime Value (CLV) by channel
Churn rate of seasonal subscribers
Net Promoter Score (NPS) by box
A/B test results for product pairings or scent intensity
Sentiment analysis on user reviews and social comments related to the lavender theme

These metrics would guide what to keep, change, or expand in future boxes.
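
Several of the calculations listed above reduce to short formulas. The sketch below uses common textbook definitions as assumptions: NPS as percent promoters (scores 9-10) minus percent detractors (scores 0-6), churn as subscribers lost over subscribers at the start of the period, CAC as spend per new customer, and a simplified CLV equal to margin per box times expected boxes, with expected boxes taken as 1 / churn. All inputs are invented.

```r
# Net Promoter Score from 0-10 survey answers
nps <- function(scores) {
  100 * (mean(scores >= 9) - mean(scores <= 6))
}

# Share of subscribers lost during one box cycle
churn_rate <- function(subscribers_start, subscribers_lost) {
  subscribers_lost / subscribers_start
}

# Customer acquisition cost: marketing spend per new subscriber
cac <- function(marketing_spend, new_customers) {
  marketing_spend / new_customers
}

# Simplified lifetime value: margin per box times expected number of boxes (1 / churn)
clv <- function(margin_per_box, churn) {
  margin_per_box / churn
}

# Invented inputs for one channel
survey_scores <- c(10, 9, 8, 7, 6, 10, 3, 9, 9, 5)
box_nps   <- nps(survey_scores)
box_churn <- churn_rate(subscribers_start = 200, subscribers_lost = 50)
box_cac   <- cac(marketing_spend = 6000, new_customers = 300)
box_clv   <- clv(margin_per_box = 12, churn = box_churn)

print(c(nps = box_nps, churn = box_churn, cac = box_cac, clv = box_clv,
        clv_to_cac = box_clv / box_cac))
```

Computing the CLV-to-CAC ratio separately for each acquisition channel shows which channels bring in subscribers who are worth more than they cost to acquire.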

Recommended Products for the Spring “Lavender” Box

To bring the lavender theme to life, the box should focus on calming, easy-to-use beauty products that fit into customers’ existing routines without feeling overwhelming or overly complex. Each product should feel like a moment of self-care, just like a Starbucks drink break.

Lavender Hydrating Face Mist or Toner
1. A refreshing spray that can be used morning or night to prep the skin or reset during the day. This product is ideal for busy customers who want a sensory boost without disrupting makeup or skincare.
Lightweight Calming Moisturizer
1. A gel-cream or light lotion that absorbs quickly and soothes redness or irritation. Formulated with lavender extract and minimal fragrance, this product would be safe for sensitive skin and perfect for spring weather transitions.
Under-Eye Masks or Cooling Patches
1. A pack of individually wrapped eye gels infused with lavender and caffeine to reduce puffiness and provide a calming experience. These would be a hit with customers who want visible results with minimal effort.
Lavender Pillow Spray
1. A crossover wellness product that connects beauty and sleep. Customers could spritz this on their pillows or sheets as part of a relaxing nighttime ritual. The lavender scent ties directly to the theme and encourages daily use.
Travel-Size Lavender Hand Cream
1. Practical, portable, and on-brand. A non-greasy formula with a soft lavender scent supports hand care on the go, and the mini size feels perfect for gifting or tossing into a purse.
Starbucks-Inspired Lip Balm (Lavender Vanilla)
1. This bonus product could be shaped like a tiny coffee cup or feature seasonal Starbucks branding. A subtle flavor like lavender vanilla connects the two brands and gives the box a playful, collectible item that stands out.

All items should reflect simplicity, softness, and serenity to embody both the spring season and the essence of lavender. Packaging should be pastel, minimal, and aligned with Starbucks’ seasonal aesthetic to create a cohesive unboxing experience.

Questions? Email me at jle323@lehigh.edu ☺