Plotting raw input data. Here the high low open close data rows are in range of 1000s while elements in volume ranges from 100000. Because of that we can only see a flat line at the bottom where we don't see changes compared to volume data.
After normalizing we can see all the data elements are in a common range
Code for the reinforcement learning model.