Trade&Ahead is a financial consultancy firm which provides its customers with personalized investment strategies.
Skills Covered:
EDA
K-means Clustering
Hierarchical Clustering
Cluster Profiling
Tools Used:
Python: Jupyter Notebook
Libraries: NumPy, pandas, Matplotlib, Seaborn, scikit-learn, SciPy.
Executive Summary
This project provides insights derived from a cluster analysis of company stocks based on their financial performance. The analysis aims to assist Trade&Ahead in offering more personalized investment strategies to their clients. The analysis identifies four distinct clusters:
Cluster 0: Poor performers, primarily from the Energy sector, requiring deeper financial statement analysis.
Cluster 1: High-growth, high-priced stocks (Healthcare & IT) with strong earnings justifying valuations. High P/E ratios suggest potential overvaluation or high growth expectations.
Cluster 2: Low-priced, strong performers with high trading volumes. Low P/E ratios might indicate undervaluation, requiring further investigation.
Cluster 3: Moderately priced, less volatile stocks with good performance and moderate trading volumes. Moderate EPS indicates good potential returns.
Business Problem Overview
Trade&Ahead aims to provide personalized investment advice but struggles with overwhelming financial data. Manually analyzing numerous stocks across various metrics is time-consuming and inefficient. This analysis addresses this challenge by leveraging cluster analysis to:
Group stocks with similar characteristics.
Identify low-correlation stocks for portfolio diversification.
Offer insights into each cluster for tailored investment recommendations.
Solution Approach
Analyze the stock data, group the stocks based on the attributes provided, and share insights about the characteristics of each group.
We aim to use clustering algorithms to categorize stocks based on:
Current price
Price change
Volatility
Financial ratios (ROE, P/E, P/B, etc.)
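Because these attributes sit on very different scales (a share price in dollars versus a ratio), distance-based clustering is usually run on standardized values. The following is a minimal sketch, assuming z-score scaling is applied before clustering; the column names and values are hypothetical and are not taken from the project data.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative only.
stocks = pd.DataFrame(
    {
        "current_price": [42.0, 135.5, 18.2, 260.0],
        "price_change": [5.1, 12.4, -14.8, 8.3],
        "volatility": [1.2, 1.9, 3.4, 1.5],
        "pe_ratio": [18.0, 45.0, 90.0, 25.0],
    }
)

# Z-score scaling: subtract each column's mean and divide by its
# population standard deviation, so every feature ends up with
# mean 0 and standard deviation 1.
scaled = (stocks - stocks.mean()) / stocks.std(ddof=0)

print(scaled.round(2))
```

Without a step like this, a feature with large raw values (such as price) would likely dominate the Euclidean distances used by the clustering algorithms.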
This data-driven approach allows for:
Improved Stock Analysis: Invest in a more targeted manner by understanding the characteristics of each cluster.
Enhanced Diversification: Select low-correlation stocks from different clusters to minimize portfolio risk.
Next steps to be considered
Incorporating More Indicators: Include additional technical and fundamental indicators for a more comprehensive analysis.
Sector-Specific Clustering: Conduct cluster analysis for individual economic sectors to provide even more precise investment recommendations.
EDA RESULTS
The Exploratory Data Analysis is based on answering the following major questions:
What does the distribution of stock prices look like?
Which economic sector's stocks have seen the largest price increase on average?
How are the different variables correlated with each other?
Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?
The distribution of stock prices is heavily right-skewed, and very few stocks have a price of more than 200 dollars.
The distribution of percentage change in stock price is roughly symmetric, although not strictly normal.
The distribution of volatility is slightly right-skewed, and a few stocks show very high volatility.
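Skewness observations like these can be checked numerically as well as visually. A minimal sketch, assuming synthetic log-normal prices as a stand-in for the real price column (the data and the variable names are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a price column: log-normal draws have a
# long right tail, similar in shape to what is described above.
prices = pd.Series(rng.lognormal(mean=4.0, sigma=0.8, size=500))

# Sample skewness: positive values indicate a right-skewed distribution.
skewness = prices.skew()

# Fraction of stocks priced above 200 dollars.
share_above_200 = (prices > 200).mean()

print(f"skewness: {skewness:.2f}")
print(f"share of prices above 200: {share_above_200:.1%}")
```

A mean noticeably above the median is another quick sign of right skew.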
30% of the companies in the data are from Financials and Industrials sectors.
Telecommunication Services has the smallest share of companies in the data.
The Oil and Gas Exploration and Production sub-industry has more companies in the data than any other sub-industry.
Stocks from the Healthcare sector witnessed the maximum price increase, with an ~9% increase, on average.
Stocks from the Energy sector were the only ones to witness a price drop on average.
IT sector companies have the highest cash ratio on average, so they are more likely to be able to cover their short-term obligations using only cash and cash equivalents.
Utilities sector companies have the lowest cash ratio on average, so they are less likely to be able to cover their short-term obligations using only cash and cash equivalents.
The P/E ratio is highest (on average) for the Energy sector companies but they also witness an average price decrease over the past quarter.
This might indicate that the stocks of this sector are overvalued.
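Sector-level averages like the ones above can be computed with a single group-by. A minimal sketch on a tiny hypothetical table; the column names and numbers are invented for illustration and do not reproduce the project's figures:

```python
import pandas as pd

# Hypothetical rows; values are illustrative only.
df = pd.DataFrame(
    {
        "sector": ["Energy", "Energy", "Healthcare", "Healthcare", "Utilities"],
        "price_change": [-12.0, -8.0, 10.0, 8.0, 1.0],
        "cash_ratio": [40.0, 60.0, 100.0, 120.0, 15.0],
        "pe_ratio": [70.0, 80.0, 40.0, 42.0, 18.0],
    }
)

# Average of each metric per economic sector.
sector_means = df.groupby("sector")[["price_change", "cash_ratio", "pe_ratio"]].mean()

# Sector with the largest average price increase.
top_gainer = sector_means["price_change"].idxmax()

# Sectors whose average price change is negative.
decliners = sector_means.index[sector_means["price_change"] < 0].tolist()

print(sector_means)
print("Top gainer:", top_gainer)
print("Decliners:", decliners)
```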
None of the variables show a very high correlation with each other.
The net income of a company is positively correlated with the earnings per share and number of outstanding shares.
This makes sense as the more money a company makes, the more each shareholder will earn and more people will be willing to invest in the company.
Percentage change in price and price volatility are negatively correlated.
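The correlation findings above come from a pairwise correlation matrix. A minimal sketch, assuming Pearson correlation and synthetic data in which net income and earnings per share are deliberately built to move together; all names and values are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Synthetic data: eps is constructed from net_income plus noise,
# volatility is drawn independently.
net_income = rng.normal(500, 200, n)
eps = 0.01 * net_income + rng.normal(0, 1, n)
volatility = rng.normal(1.5, 0.5, n)

df = pd.DataFrame({"net_income": net_income, "eps": eps, "volatility": volatility})

# Pearson correlation matrix (the pandas default).
corr = df.corr()

# Keep each pair once (upper triangle) and rank by absolute correlation.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)

print(corr.round(2))
print(pairs.round(2))
```

In a notebook, the same matrix is commonly passed to a heatmap for visual inspection.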
Selecting k with the Elbow Method
For 4 clusters, there is a visible kink in the elbow plot, and the silhouette score is also high.
So, we will move ahead with k=4.
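The selection of k can be sketched as a loop that records the within-cluster sum of squares (for the elbow plot) and the silhouette score for each candidate value. This is an illustrative example on synthetic data with four well-separated groups, not the project's notebook code; the data, the range of k, and the random seeds are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)

# Synthetic data: four tight, well-separated groups of 50 points each.
centers = np.array([[0, 0], [10, 10], [-10, 10], [10, -10]])
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score, higher is better

for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Candidate k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("best k by silhouette:", best_k)
```

On real data the two criteria do not always agree, so the elbow plot and the silhouette scores are usually read together, as was done here.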
Insights:
Cluster 0
There are 277 companies in this cluster.
The stocks of the companies in this cluster have a moderate price, low volatility, and have witnessed a 5% price rise on average.
These companies have a low cash ratio and low positive net income.
These companies have a low P/E ratio and a low number of outstanding shares.
Cluster 1
There are 11 companies in this cluster.
The stocks of the companies in this cluster have a low price, low volatility, and have witnessed a 6% price rise on average.
These companies have a low cash ratio and high positive net income.
These companies have a low P/E ratio and a high number of outstanding shares.
Cluster 2
There are 27 companies in this cluster.
The stocks of the companies in this cluster have a low price, high volatility, and have witnessed a 15% price drop on average.
These companies have a low cash ratio and negative net income.
These companies have a high P/E ratio and a low number of outstanding shares.
Cluster 3
There are 25 companies in this cluster.
The stocks of the companies in this cluster have a high price, moderate volatility, and have witnessed a 13.5% price rise on average.
These companies have a high cash ratio and low positive net income.
These companies have a moderate P/E ratio and low to moderate number of outstanding shares.
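Cluster profiles like the ones above are typically built by attaching the cluster label to each company and averaging every numeric attribute per cluster. A minimal sketch with hypothetical tickers, values, and labels:

```python
import pandas as pd

# Hypothetical companies with an already assigned cluster label.
df = pd.DataFrame(
    {
        "ticker": ["A", "B", "C", "D", "E", "F"],
        "current_price": [50.0, 60.0, 70.0, 20.0, 300.0, 340.0],
        "price_change": [4.0, 5.0, 6.0, 6.0, 12.0, 15.0],
        "cluster": [0, 0, 0, 1, 2, 2],
    }
)

# Mean of every numeric attribute per cluster (the ticker column is skipped).
profile = df.groupby("cluster").mean(numeric_only=True)

# Number of companies in each cluster.
profile["n_companies"] = df.groupby("cluster").size()

print(profile)
```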
Let's move ahead with 3 clusters, Euclidean distance, and average linkage.
This yields one cluster with a single company, one cluster with two companies, and a third cluster containing all the other companies. This clustering is not useful, because almost every company falls into the same cluster.
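The behaviour described here, where average linkage isolates a few extreme points and leaves everything else in one group, can be reproduced on synthetic data. The sketch below uses SciPy's `linkage` and `fcluster`; the data points are hypothetical and chosen to make the effect obvious.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic data: a dense bulk of 30 points, one far outlier,
# and a pair of points far away from everything else.
bulk = rng.normal(0.0, 1.0, size=(30, 2))
outlier = np.array([[50.0, 50.0]])
pair = np.array([[-40.0, 40.0], [-41.0, 40.0]])
X = np.vstack([bulk, outlier, pair])

# Agglomerative clustering with Euclidean distance and average linkage.
Z = linkage(X, method="average", metric="euclidean")

# Cut the tree into at most 3 clusters (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")

# Cluster sizes, smallest first.
sizes = sorted(np.bincount(labels)[1:].tolist())
print("cluster sizes:", sizes)
```

A size profile like this is a signal to try a different linkage method or number of clusters.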
Cluster 0
There are 29 companies in this cluster.
The stocks of the companies in this cluster have a low price, high volatility, and have witnessed an 11% price drop on average.
These companies have a low cash ratio and negative net income.
These companies have a high P/E ratio and a low number of outstanding shares.
Cluster 1
There are 15 companies in this cluster.
The stocks of the companies in this cluster have a high price, moderate volatility, and have witnessed a 10.5% price rise on average.
These companies have a high cash ratio and low positive net income.
These companies have a high P/E ratio and a low number of outstanding shares.
Cluster 2
There are 11 companies in this cluster.
The stocks of the companies in this cluster have a low price, low volatility, and have witnessed a 6% price rise on average.
These companies have a moderate cash ratio and high positive net income.
These companies have a low P/E ratio and a high number of outstanding shares.
Cluster 3
There are 285 companies in this cluster.
The stocks of the companies in this cluster have a moderate price, low volatility, and have witnessed a 5% price rise on average.
These companies have a moderate cash ratio and moderate positive net income.
These companies have a moderate P/E ratio and a low number of outstanding shares.
Both K-means and Hierarchical clustering yield two similar clusters and two slightly dissimilar clusters.
A few companies were swapped between the clusters obtained using the two techniques.
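One way to quantify how closely two clusterings agree is a cross-tabulation of the two label sets, together with the adjusted Rand index, which ignores how the clusters happen to be numbered. A minimal sketch with hypothetical labels for ten companies (the label values are invented, not the project's results):

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same ten companies.
kmeans_labels = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
hier_labels = [3, 3, 3, 0, 2, 2, 0, 0, 1, 1]

# Rows: K-means clusters, columns: hierarchical clusters.
overlap = pd.crosstab(
    pd.Series(kmeans_labels, name="kmeans"),
    pd.Series(hier_labels, name="hierarchical"),
)

# Adjusted Rand index: 1.0 means identical groupings up to relabeling.
ari = adjusted_rand_score(kmeans_labels, hier_labels)

print(overlap)
print(f"adjusted Rand index: {ari:.3f}")
```

In the cross-tabulation, a row with a single large entry marks a cluster that both techniques recovered, while a row spread over several columns marks companies that moved between clusters.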
The scikit-learn implementations of both techniques take nearly the same amount of time to execute. However, the dendrograms used in Hierarchical clustering take more time to render.
Cluster 0: These companies performed poorly in the previous quarter. Most of the companies in this cluster are from the Energy sector, which as a whole performed poorly in the previous quarter. Investors will have to be careful and dig deeper into the financial statements of these companies for a better analysis.
Cluster 1: These companies have highly priced stocks and have shown the most growth in the previous quarter. A large proportion of the stocks in this cluster are from the Healthcare and IT sectors. The high earnings per share suggest that these stocks may justify their high price tag with good returns. The high P/E ratio indicates that these stocks are either overvalued or that investors expect good growth from these companies.
Cluster 2: These companies have a low stock price and had a great previous quarter. They are less risky investments and are traded in large volumes. The low P/E ratio might also indicate that these stocks are undervalued, but we will have to dig deeper to conclude this with confidence.
Cluster 3: These companies' stocks are moderately priced and less volatile. They had a good previous quarter, and their stocks are traded in moderate volumes. The moderate earnings per share suggest that these stocks may yield good returns.
Trade&Ahead should look into more financial indicators (both fundamental and technical) to better predict stock price movements and assess company valuations.
They should also conduct cluster analysis separately for each of the economic sectors as it will help them to provide better investment recommendations to their clients.