This project used statistical analysis, A/B testing, and visualization to decide whether the new landing page of an online news portal (E-news Express) is effective at gathering new subscribers.
The simulated dataset includes key metrics, such as converted status and time spent on the page, that help assess the effectiveness of the new landing page. The project also analyzes whether conversion depends on the user's preferred language.
Skills Covered
Hypothesis Testing
A/B Testing
Data Visualization
Statistical Inference
Tools Used:
Python: Jupyter Notebook
Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn
Executive Summary
● Users spend more time on the new page, indicating that the outline & recommended content of the new page are more likely to keep customers engaged long enough to make a decision to subscribe.
● The conversion rate for the new page is greater than the conversion rate for the old page, indicating that the new page is more likely to gather new subscribers than the existing page.
● The tests found no evidence that the conversion status depends on the preferred language.
● The time spent on the new page does not differ significantly across languages, indicating that irrespective of the language, the outline & recommended content of the new page are engaging.
● It is recommended that the news company use the new landing page to gather more subscribers, as the tests conducted support the business logic: design a page that people spend time on, and conversion will follow.
Business Problem Overview and Solution Approach
● E-news Express is an online news portal that aims to expand its business by acquiring new subscribers based on their analysis of user actions to understand user interests and determine how to drive better engagement.
● The executives at E-news Express are of the opinion that there has been a decline in new monthly subscribers compared to the past year because the current webpage is not designed well enough in terms of the outline & recommended content to keep customers engaged long enough to make a decision to subscribe.
● The design team of the company has researched and created a new landing page that has a new outline & more relevant content shown compared to the old page. The Data Science team experimented by displaying the old and the new landing pages to two sets of users and recording their activity. They want to test the effectiveness of the new landing page based on the data collected from the experiment.
● The task is to explore the data and perform a statistical analysis (at 5% significance) to determine the effectiveness of the new landing page in gathering new subscribers for the news portal by testing certain hypotheses.
● The time spent on the page appears to be approximately normally distributed.
● There are no outliers in the time-spent column.
Group & Landing page
● There are 2 unique groups: control and treatment.
● Users are split evenly across the two groups.
● There are 2 landing_pages: new and old.
● Users are split evenly across the two landing pages.
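A balance check of this kind can be sketched with pandas. The small frame below is made-up illustration data, not the project dataset; the column names `group` and `landing_page`, and the pairing of the control group with the old page, are assumptions.

```python
import pandas as pd

# Hypothetical rows for illustration only (NOT the project data).
# Assumption: control users saw the old page, treatment users the new one.
df = pd.DataFrame({
    "group": ["control"] * 3 + ["treatment"] * 3,
    "landing_page": ["old"] * 3 + ["new"] * 3,
})

# Number of users in each group and on each landing page
group_counts = df["group"].value_counts()
page_counts = df["landing_page"].value_counts()

# Cross-tabulation to confirm how groups map onto pages
cross = pd.crosstab(df["group"], df["landing_page"])

print(group_counts)
print(page_counts)
print(cross)
```

Equal counts in both summaries, with all control users in one page column and all treatment users in the other, would indicate a balanced, cleanly separated experiment.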
Converted status & Preferred language
● The median time spent on the page is approximately equal across users of the different languages.
● Overall, the users who get converted seem to spend more time on the page.
● From the sample data, it is observed that the median time spent on the new landing page is higher than that on the old landing page.
● The hypothesis that users spend more time on the new landing page than on the existing landing page was tested.
● The test results showed enough statistical evidence to conclude that the mean time spent by the users on the new page is greater than the mean time spent by the users on the old page.
Conversion rate of old vs new landing page
● From the sample data, it is observed that the proportion of users who convert to subscribers (the conversion rate) is higher for the new page than for the old page.
● The hypothesis that the conversion rate for the new page is greater than that for the old page was tested.
● The test results showed enough statistical evidence to conclude that the conversion rate for the new page is greater than the conversion rate for the old page.
Dependence of conversion on preferred language
● From the sample data, it is observed that among the converted users, the count of English users is highest, and among the non-converted users, the count of French users is the highest.
● The hypothesis that the converted status is dependent on the preferred language was tested.
● The test results showed that there isn’t enough statistical evidence to say that the converted status depends on the preferred language.
Time spent on new landing page across different languages
● From the sample data, it is observed that the median time spent on the new page by English users is a bit higher than that by French and Spanish users.
● The hypothesis that at least one of the mean times spent on the new page by English, French, and Spanish users is unequal was tested.
● The test results showed that there isn’t enough statistical evidence to say that at least one of the mean times spent on the new page by English, French and Spanish users is unequal.
Hypothesis Testing Details
The null hypothesis
H0: μ1 = μ2
was tested against the alternative hypothesis
Ha: μ1 > μ2, where μ1 and μ2 denote the mean time spent by the users on the new and the old landing pages respectively.
● As the test compares the means of two independent populations whose standard deviations are unknown and not assumed to be equal, a two-sample independent t-test was performed.
● A p-value of 0.0001 was obtained, which is lower than the significance level of 0.05, so the null hypothesis was rejected.
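A minimal sketch of such a one-sided test is shown below. It assumes SciPy (not listed among the tools above) and uses the unequal-variance (Welch) form of the t-test; the helper name and the sample values are made up for illustration and are not the project data.

```python
from scipy import stats

def one_sided_welch_ttest(new_times, old_times, alpha=0.05):
    """Test H0: mu_new = mu_old against Ha: mu_new > mu_old."""
    t_stat, p_value = stats.ttest_ind(
        new_times,
        old_times,
        equal_var=False,        # do not assume equal standard deviations
        alternative="greater",  # one-sided: new page mean is larger
    )
    return t_stat, p_value, p_value < alpha

# Hypothetical minutes spent on each page (illustration only)
new_times = [5, 6, 7, 8, 9]
old_times = [1, 2, 3, 4, 5]

t_stat, p_value, reject = one_sided_welch_ttest(new_times, old_times)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

The `alternative="greater"` argument matters here: the default two-sided test would answer a different question than the one posed by Ha.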
The null hypothesis
H0: p1 = p2
was tested against the alternative hypothesis
Ha: p1 > p2, where p1 and p2 denote the conversion rates for the new and the old page respectively.
● As we have to test for the equality of proportions from two independent populations, a two-sample proportion z-test was performed.
● A p-value of 0.0080 was obtained, which is lower than the significance level of 0.05. So, the null hypothesis was rejected.
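The arithmetic behind a one-sided two-proportion z-test can be sketched with the standard library alone. The function name and the counts below are hypothetical and chosen only to make the calculation easy to follow; they are not the project data.

```python
import math

def one_sided_proportion_ztest(conv_new, n_new, conv_old, n_old):
    """Test H0: p_new = p_old against Ha: p_new > p_old."""
    p_new = conv_new / n_new
    p_old = conv_old / n_old
    # Pooled proportion under H0 (both pages share one conversion rate)
    p_pool = (conv_new + conv_old) / (n_new + n_old)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 30 of 50 converted on the new page, 20 of 50 on the old
z, p_value = one_sided_proportion_ztest(30, 50, 20, 50)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With these made-up counts the pooled rate is 0.5 and the standard error is 0.1, so a difference of 0.2 gives z = 2 and a one-sided p-value of about 0.023.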
The null hypothesis
H0: The converted status is independent of the preferred language.
was tested against the alternative hypothesis
Ha: The converted status is not independent of the preferred language.
● As we have to test the independence between two categorical variables, a chi-square test for independence was performed.
● A p-value of 0.2130 was obtained, which is higher than the significance level of 0.05. So, we fail to reject the null hypothesis.
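A chi-square test of independence on a converted-by-language contingency table might look like the sketch below. It assumes SciPy (not listed among the tools above); the counts are invented for illustration and are not the project data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (illustration only).
# Columns: English, French, Spanish
observed = np.array([
    [21, 15, 18],  # converted
    [11, 19, 16],  # not converted
])

# Returns the test statistic, p-value, degrees of freedom,
# and the expected counts under independence
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
print("Fail to reject H0" if p_value >= 0.05 else "Reject H0")
```

For a 2 x 3 table the test has (2 - 1) x (3 - 1) = 2 degrees of freedom. In practice the table would typically be built from the raw data, for example with a pandas cross-tabulation.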
The null hypothesis
H0: The mean times spent on the new page by English, French, and Spanish users are equal.
was tested against the alternative hypothesis
Ha: At least one of the mean times spent on the new page by English, French, and Spanish users is unequal.
● As we have to test the equality of means from three independent populations, a one-way ANOVA test was performed.
● A p-value of 0.4320 was obtained, which is higher than the significance level of 0.05. So, we fail to reject the null hypothesis.
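A one-way ANOVA across three language groups can be sketched as follows, assuming SciPy (not listed among the tools above). The three samples are made-up values picked so the result is easy to verify by hand; they are not the project data.

```python
from scipy.stats import f_oneway

# Hypothetical minutes spent on the new page (illustration only)
english = [3, 4, 5]
french = [2, 3, 4]
spanish = [1, 2, 3]

# H0: all three group means are equal
# Ha: at least one group mean differs
f_stat, p_value = f_oneway(english, french, spanish)

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
print("Fail to reject H0" if p_value >= 0.05 else "Reject H0")
```

For these values the between-group mean square is 3 and the within-group mean square is 1, giving F = 3 on (2, 6) degrees of freedom and a p-value of 0.125, so the null hypothesis would not be rejected at the 5% level.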
Conclusion & Recommendations
The users spend more time on the new page:
This indicates that the outline & recommended content of the new page are more likely to keep customers engaged long enough to make a decision to subscribe.
The conversion rate for the new page is greater than the conversion rate for the old page:
This indicates that the new page is more likely to gather new subscribers than the existing page.
The tests found no evidence that the conversion status depends on the preferred language.
The time spent on the new page does not differ significantly with the language of the content:
This indicates that irrespective of the language, the outline & recommended content of the new page are engaging.
It is recommended that the news company use the new landing page to gather more subscribers.
The business logic is to design a page that people spend time on, and conversion will follow.