Panel Data Creation from Unstructured Data

A fully automated algorithm that creates panel data from unstructured data with a single click

Panel data is widely used in research and data analysis, but getting it into the proper format is complicated. Datasets are usually available separately for each year, whereas panel data analysis needs the data arranged in the following format:

demo.xlsx

Data in this format is much easier to handle as panel data. Here I introduce an algorithm that produces it easily. The only manual task is to create a dataset in a particular input format. Suppose we have 5 years of data, 3 cross-sections, and 3 variables. Such a dataset is easy to build in the following format:

panel.xlsx
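As a rough illustration of how separate yearly datasets might be combined into one such table, here is a minimal pandas sketch. The variable names (`gdp`, `area`), the `<variable>_<year>` column naming, and the sample values are assumptions made for the example; only `cross_section_id` comes from this article. It uses 2 years and 2 variables to stay short.

```python
import pandas as pd

# Hypothetical yearly tables: one row per cross-section, one column per variable.
yearly = {
    2015: pd.DataFrame({"cross_section_id": ["A", "B", "C"],
                        "gdp": [10, 20, 30],
                        "area": [1.0, 2.0, 3.0]}),
    2016: pd.DataFrame({"cross_section_id": ["A", "B", "C"],
                        "gdp": [11, 21, 31],
                        "area": [1.1, 2.1, 3.1]}),
}

frames = []
for year, frame in yearly.items():
    # Index by cross-section so rows align, then tag every variable with its year.
    indexed = frame.set_index("cross_section_id")
    frames.append(indexed.add_suffix(f"_{year}"))

# Place the yearly blocks side by side; rows are matched on cross_section_id.
wide = pd.concat(frames, axis=1).reset_index()
print(wide)
```

The columns come out grouped by year rather than by variable, which is why a reordering step is still needed afterwards.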

The columns of this dataset should be ordered so that, for each variable, its yearly columns sit together. Rearranging the columns in Python gives this ordering, but the cross_section_id column ends up in between the variable columns, so we have to move it to the front. The resulting data can be used directly by the proposed algorithm, and looks like this:

datanew.xlsx
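The reordering step can be sketched in pandas as follows. This is an illustration under assumptions, not the code from the repository: the column names other than `cross_section_id` are hypothetical, and plain alphabetical sorting stands in for the rearrange step.

```python
import pandas as pd

# Hypothetical wide table with columns in arbitrary order.
df = pd.DataFrame({
    "gdp_2016": [11, 21, 31],
    "area_2015": [1.0, 2.0, 3.0],
    "cross_section_id": ["A", "B", "C"],
    "gdp_2015": [10, 20, 30],
    "area_2016": [1.1, 2.1, 3.1],
})

# Sorting the labels groups each variable's years together,
# but cross_section_id lands between the "area" and "gdp" columns.
sorted_cols = sorted(df.columns)

# Move the identifier to the front and keep the remaining columns in sorted order.
ordered_cols = ["cross_section_id"] + [c for c in sorted_cols if c != "cross_section_id"]
df_ordered = df[ordered_cols]
print(list(df_ordered.columns))
```

Alphabetical sorting only orders the years correctly when they all have the same number of digits, which holds for four-digit years.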

Now we can pass this to the proposed algorithm to get the final panel data, which looks like this:

panel_data.xlsx
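For readers who want to see this kind of wide-to-long reshaping in code, here is a minimal sketch that uses pandas' built-in `pd.wide_to_long` in place of the proposed algorithm, whose actual implementation lives in the repository. The column names `area_<year>` and `gdp_<year>` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical ordered wide table: identifier first, then each variable's years together.
wide = pd.DataFrame({
    "cross_section_id": ["A", "B", "C"],
    "area_2015": [1.0, 2.0, 3.0],
    "area_2016": [1.1, 2.1, 3.1],
    "gdp_2015": [10, 20, 30],
    "gdp_2016": [11, 21, 31],
})

# Split each "<variable>_<year>" column into a variable column and a year key.
panel = pd.wide_to_long(
    wide,
    stubnames=["area", "gdp"],  # variable names (assumed)
    i="cross_section_id",       # cross-section identifier
    j="year",                   # name of the new time column
    sep="_",
)

# One row per (cross-section, year), sorted the way panel models usually expect.
panel = (panel.reset_index()
              .sort_values(["cross_section_id", "year"])
              .reset_index(drop=True))
print(panel)
```

Each cross-section then appears once per year, with one column per variable.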

The complete code can be found on GitHub.