The details of the data collection and analysis steps are listed below. The full dataset, as well as the code used for processing, is available here.
Price data for the benchmark index NIFTY 50 covering the entire 11-year period was downloaded from the NSE website.
Daily Price Volume Archive files (providing information about every single stock traded in a day, such as price range, volume, and turnover) for the period were also downloaded from NSE's website using a bot. This was done to ensure that the analysis was not affected by survivorship bias. These files are generated by the exchange every day after trading hours and are not edited afterwards; they therefore contain the entire set of stocks that were traded on any particular day. Getting price data from any other source would have risked excluding the companies that did not survive the period of analysis.
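For illustration, the core of such a bot might look like the sketch below. The URL pattern is the historical NSE bhavcopy layout and is an assumption; NSE has since reorganised its archives, so the path may need adjusting.

```python
import datetime
import os

import requests

# A sketch of the download bot. The URL pattern below is the historical
# NSE bhavcopy layout and is an assumption; adjust if NSE's archive
# structure has changed.
def fetch_bhavcopy(day: datetime.date, out_dir: str = "bhavcopies") -> None:
    os.makedirs(out_dir, exist_ok=True)
    stamp = day.strftime("%d%b%Y").upper()  # e.g. 02JAN2012
    url = ("https://archives.nseindia.com/content/historical/EQUITIES/"
           f"{day.year}/{day.strftime('%b').upper()}/cm{stamp}bhav.csv.zip")
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    if resp.ok:  # weekends and holidays simply return an error
        with open(os.path.join(out_dir, f"cm{stamp}bhav.csv.zip"), "wb") as f:
            f.write(resp.content)
```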
A file containing the list of companies that had changed their symbol on the exchange was downloaded from here.
A file with the details of every stock split during the period was generated from this webpage using a screen-scraping program.
Files with profit/loss and assets data for each company (the company list was found by running a SELECT DISTINCT query on the price volume archive dataset and then eliminating financial and utility companies, which are excluded from the model because of their unique financial structure) were generated by running a screen-scraping program on the website screener.in. This website was used specifically because it lists data even for companies that have been merged or delisted.
The price volume archive files were appended into a single file and imported into SQL.
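A minimal sketch of this step, assuming the daily files have been unzipped to CSVs and using SQLite as a stand-in for whichever SQL database was actually used (the table name `pva` is hypothetical):

```python
import glob
import sqlite3

import pandas as pd

con = sqlite3.connect("analysis.db")  # hypothetical database file

# Read every daily archive and load the concatenated result into one table.
frames = [pd.read_csv(path) for path in sorted(glob.glob("bhavcopies/*.csv"))]
pd.concat(frames, ignore_index=True).to_sql("pva", con, index=False)
```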
The new symbol field in the symbol change data was updated (using a Python program) to the latest symbol for companies whose symbol had changed more than once. This data was then imported into SQL.
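The chain-resolution logic might look like the following sketch (the symbols are hypothetical):

```python
# Resolve chained symbol changes: if OLDCO -> MIDCO and later
# MIDCO -> NEWCO, records under OLDCO must map straight to NEWCO.
def resolve_latest(changes: dict) -> dict:
    """changes maps old symbol -> new symbol; returns old -> latest."""
    latest = {}
    for old, new in changes.items():
        seen = {old}
        while new in changes and new not in seen:  # guard against cycles
            seen.add(new)
            new = changes[new]
        latest[old] = new
    return latest

print(resolve_latest({"OLDCO": "MIDCO", "MIDCO": "NEWCO"}))
# {'OLDCO': 'NEWCO', 'MIDCO': 'NEWCO'}
```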
The splits file was processed manually before importing into SQL. For each split, a factor was calculated by which to divide the share price in records dated before the split. For shares that were split multiple times, each split's factor was multiplied by the factors of all subsequent splits.
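As a worked example, a hypothetical stock split 1:5 in 2010 and again 1:2 in 2013 needs prices before 2010 divided by 10 and prices between the two splits divided by 2. A sketch of the factor calculation:

```python
def adjustment_factors(splits):
    """splits: list of (date, ratio) pairs, ratio = old face / new face.
    Returns (date, factor) pairs where factor is the split's own ratio
    multiplied by the ratios of every later split."""
    splits = sorted(splits, key=lambda s: s[0])
    factors = []
    running = 1.0
    for date, ratio in reversed(splits):
        running *= ratio
        factors.append((date, running))
    return list(reversed(factors))

print(adjustment_factors([("2010-06-01", 5), ("2013-04-01", 2)]))
# [('2010-06-01', 10.0), ('2013-04-01', 2.0)]
```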
The profit/loss files were combined into one file, with each record containing the symbol of the company, a date representing the financial year for which the report was filed, and financial data such as sales, cost of goods sold, interest on debt, profit before tax, and earnings per share, among others. Because the source webpage gives only the month and year, the date in each record was set to the first of the month as a proxy. This was kept in mind when joining the tables later on.
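For instance, assuming the source text looks like "Mar 2015", the proxy date could be derived as follows:

```python
from datetime import datetime

# The source shows only month and year, so the proxy date becomes the
# first of that month (strptime defaults the missing day to 1).
def proxy_date(month_year: str) -> datetime:
    return datetime.strptime(month_year, "%b %Y")

print(proxy_date("Mar 2015").date())  # 2015-03-01
```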
The assets files were likewise combined into one file, with each record containing the symbol of the company, the financial year of the report, and the total assets of the company.
The price volume archive and symbol change tables were left joined and the symbol field in the former table was updated to the latest symbol for all records where the new symbol field was not empty.
The price volume archive and splits tables were left joined and the price in the former table was divided by the split factor for all records where the factor field was not empty.
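The two update steps above might look like this sketch in SQL (SQLite syntax; table and column names such as `pva`, `symbol_changes`, and `splits` are assumptions):

```python
import sqlite3

con = sqlite3.connect("analysis.db")  # hypothetical database file
cur = con.cursor()

# Rewrite old symbols to the latest symbol wherever a mapping exists.
cur.execute("""
    UPDATE pva
    SET symbol = (SELECT new_symbol FROM symbol_changes sc
                  WHERE sc.old_symbol = pva.symbol)
    WHERE symbol IN (SELECT old_symbol FROM symbol_changes)
""")

# Divide each pre-split price by the combined factor of the earliest
# split that falls after the record's trade date.
cur.execute("""
    UPDATE pva
    SET close = close / (SELECT factor FROM splits s
                         WHERE s.symbol = pva.symbol
                           AND s.split_date > pva.trade_date
                         ORDER BY s.split_date
                         LIMIT 1)
    WHERE EXISTS (SELECT 1 FROM splits s
                  WHERE s.symbol = pva.symbol
                    AND s.split_date > pva.trade_date)
""")
con.commit()
```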
The price volume archive table was then queried to find the median price of each stock in a given month. The median was chosen specifically to ensure that the stock could realistically have been sold at that price during the month. The resulting table (henceforth called the PVA table), containing the symbol, month, year, median price for the month, and a proxy date (the first of the month) representing that month, was used as the price reference for all analysis thereafter.
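A pandas equivalent of the median query, under the same naming assumptions as the earlier sketches:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("analysis.db")  # hypothetical database file
pva = pd.read_sql("SELECT symbol, trade_date, close FROM pva", con,
                  parse_dates=["trade_date"])

# Median close per stock per calendar month; the first of the month
# doubles as the proxy date used in later joins.
pva["month_date"] = pva["trade_date"].dt.to_period("M").dt.to_timestamp()
pva_monthly = (pva.groupby(["symbol", "month_date"])["close"]
                  .median()
                  .reset_index()
                  .rename(columns={"close": "median_price"}))
pva_monthly.to_sql("pva_monthly", con, index=False)
```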
The assets and profit/loss tables were joined into a single table named reports.
A company has two months from the end of a financial year to file its financial reports, so each report becomes available only two months after the financial year ends. Since the proxy date for each record in reports was set to the first day of the last month of the financial year, the report was available three months after this proxy date. Each report was also assumed to remain in effect for one year after its availability (i.e., until 14 months from the proxy date in the report record), after which either a new report was available or the stock was excluded from the analysis for the subsequent period until the next report was filed.
The above constraints were used to left join the PVA and reports tables where each record in PVA was joined with a record in reports if an applicable report was found for it. This table was used for the rest of the analysis.
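Expressed in SQL, the join might look like the sketch below, where a report with proxy date d is usable from d + 3 months up to (but not including) d + 14 months (SQLite date arithmetic; column names are assumptions):

```python
import sqlite3

con = sqlite3.connect("analysis.db")  # hypothetical database file

# Join each monthly price record with the report in effect that month:
# available 3 months after the proxy date, valid for one year after that.
con.execute("""
    CREATE TABLE pva_reports AS
    SELECT p.symbol, p.month_date, p.median_price,
           r.eps, r.pbt, r.interest, r.total_assets
    FROM pva_monthly p
    LEFT JOIN reports r
      ON r.symbol = p.symbol
     AND date(p.month_date) >= date(r.report_date, '+3 months')
     AND date(p.month_date) <  date(r.report_date, '+14 months')
""")
con.commit()
```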
The Price/Earnings (P/E) ratio was calculated by dividing the median price for the month by the EPS figure from the report.
Return on assets (ROA) was calculated by adding the profit before tax and interest columns and dividing the result by total assets.
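Both ratios as a sketch, continuing with the assumed column names:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("analysis.db")  # hypothetical database file
df = pd.read_sql("SELECT * FROM pva_reports", con)

# P/E: monthly median price over the report's earnings per share.
df["pe"] = df["median_price"] / df["eps"]
# ROA: profit before tax plus interest, divided by total assets.
df["roa"] = (df["pbt"] + df["interest"]) / df["total_assets"]
df.to_sql("pva_ratios", con, index=False, if_exists="replace")
```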
P/E and ROA values for all stocks were then assigned ranks within each month: P/E in increasing order (lower is better) and ROA in decreasing order (higher is better). Records with P/E values below 5 were ignored, since such abnormally low values indicate unusual circumstances at the company.
These two ranks were then added, and a row number (also partitioned by month) was assigned in increasing order of this sum. This row number is the final rank used to grade the relative value proposition of the stocks.
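A sketch of the two ranking passes using SQL window functions (again with assumed names):

```python
import sqlite3

con = sqlite3.connect("analysis.db")  # hypothetical database file

# Rank P/E ascending (cheaper is better) and ROA descending (more
# profitable is better) within each month, then order stocks by the
# sum of the two ranks; P/E below 5 is filtered out first.
con.execute("""
    CREATE TABLE ranked AS
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY month_date
               ORDER BY pe_rank + roa_rank
           ) AS final_rank
    FROM (
        SELECT *,
               RANK() OVER (PARTITION BY month_date ORDER BY pe)       AS pe_rank,
               RANK() OVER (PARTITION BY month_date ORDER BY roa DESC) AS roa_rank
        FROM pva_ratios
        WHERE pe >= 5
    )
""")
con.commit()
```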
The NIFTY data was also queried for the median daily closing price grouped by month. Each resulting record was assigned a row number of zero and appended to the table above.
A column was created for the price of each stock 12 months later and populated using VLOOKUP.
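An equivalent of that VLOOKUP in pandas, as a sketch: self-join the monthly price table against a copy shifted back 12 months.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("analysis.db")  # hypothetical database file
monthly = pd.read_sql(
    "SELECT symbol, month_date, median_price FROM pva_monthly",
    con, parse_dates=["month_date"])

# Shift each record back 12 months so that, after the merge, every row
# carries the median price from one year later.
future = monthly.rename(columns={"median_price": "price_12m"})
future["month_date"] = future["month_date"] - pd.DateOffset(months=12)
with_future = monthly.merge(future, on=["symbol", "month_date"], how="left")
```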
This final table was used to generate the Power BI report.