Christopher S. Vieira, 18 January 2021
During our work on the Nanoindentation Project, we encountered an issue involving null values in our data set. Our data was a series of data points, each with an x value, a y value, and a data value. Unfortunately, this format meant that we could not simply remove the null values and use the remaining data as is. If an x,y coordinate (say 1,2) was missing, then every point that came after it would be off as far as our algorithms were concerned. To solve this problem, we looked into SciPy’s interpolate library.
SciPy’s interpolate library is invaluable for interpolating data. It is best described by its documentation at https://docs.scipy.org/doc/scipy/reference/interpolate.html: “this sub-package contains spline functions and classes, 1-D and multidimensional (univariate and multivariate) interpolation classes, Lagrange and Taylor polynomial interpolators, and wrappers for FITPACK and DFITPACK functions”. In our case, we used it to interpolate two-dimensional data, so that each missing data point is estimated from its neighbors and their values. We found the “griddata” function especially useful. It can be run with different methods, two of which we used: a “cubic” method, which was more accurate but did not always fill every null value (griddata returns NaN by default for points outside the convex hull of the known points), and a “nearest” method, which was less accurate but always filled every null value. We combined the two, first running the cubic method and then running the nearest method on whatever was still null, so that every null value is filled and most are filled with the more accurate estimate.
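The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not our project code: the helper name `fill_missing`, the 5x5 sample grid, and the choice to represent null values as NaN are all assumptions made for the example. Only `scipy.interpolate.griddata` and its `method="cubic"` and `method="nearest"` options come from SciPy itself.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_missing(x, y, values):
    """Fill NaN entries of `values` sampled at (x, y) points.

    Hypothetical helper: cubic pass first, nearest-neighbor pass for leftovers.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)

    points = np.column_stack([x, y])
    known = ~np.isnan(values)
    filled = values.copy()
    if known.all():
        return filled  # nothing to fill

    # Pass 1: cubic interpolation at the missing points. Points outside the
    # convex hull of the known points come back as NaN (griddata's default).
    filled[~known] = griddata(
        points[known], values[known], points[~known], method="cubic"
    )

    # Pass 2: nearest-neighbor fill, from the original known values only,
    # for anything the cubic pass could not reach.
    still_missing = np.isnan(filled)
    if still_missing.any():
        filled[still_missing] = griddata(
            points[known], values[known], points[still_missing], method="nearest"
        )
    return filled


# Made-up sample data: a 5x5 grid with one corner and one interior value missing.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
x, y = gx.ravel(), gy.ravel()
values = x + 2.0 * y
values[[0, 12]] = np.nan  # (0, 0) is a corner, (2, 2) is interior

filled = fill_missing(x, y, values)
```

In this sketch the interior gap is handled by the cubic pass, while the corner lies outside the convex hull of the remaining points and falls through to the nearest-neighbor pass. Because the original point order is preserved, the filled array lines up one-to-one with the x,y coordinates.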
Once we had finished filling the data using the SciPy interpolate library, we were able to proceed as usual with our analysis. Not only was the library useful, but it was also easy to set up, and it can be implemented in a way that makes it easy to abstract out exactly the data we need.