Banking, Finance and Investment Industry
Software Developer: Francis Lim
Project Categories: Data-Mining, Data Cleansing, Financial Modelling, Numerical Analysis
Data cleansing is the process of ensuring that data is up to date and free of duplication or error. It should be put in place to ensure that item descriptions remain clean and standardized.
Increasingly sophisticated financial and statistical methods are available for analyzing numerical financial data and supporting decision making. In practice, however, the data these methods rely on can be full of errors. Bad values in the data can lead to bad observations and, in turn, bad decisions. It makes sense to deal with bad data before modeling takes place: improve the quality of the data and you are very likely to improve the quality of the results. So before the data is used it should be cleaned, that is, as many of its errors as possible should be corrected.
Are my data wrong? There is a hierarchy of problems that are encountered:
1. No values have been input
Missing values can be structural or observational. Structural missing values are data fields with no values where you would not expect any to be; for instance, share price changes will not be available when stock markets are closed at weekends or on holidays. The models used need to be able to cope with such values; inventing values to fill such gaps is not a good way to proceed. Observational missing values are values that should have been recorded but were not because something went wrong, or that were recorded as zero.
2. Impossible or unlikely values have been input
Impossible or unlikely values include extreme spikes and non-positive values. These errors are generally straightforward to detect, such as negative prices where positive ones are expected. If the correct value cannot be entered, the observation needs to be moved up the hierarchy and treated as a missing value.
3. Inconsistent values have been input
Values are inconsistent when several values together break a rule. One example is stock data in which today's Low price is higher than today's High price.
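The three levels of the hierarchy above can be sketched as a small set of checks on a single day's price record. This is only an illustration, not the product's actual implementation: the `Bar` record, the `check_bar` function, the problem labels, and the 50% `max_jump` spike threshold are all hypothetical names and values chosen for the example. It also assumes a missing value is represented as `None` and that any non-positive price counts as impossible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    """One day's prices for a stock (hypothetical record layout)."""
    date: str
    low: Optional[float]
    high: Optional[float]
    close: Optional[float]


def check_bar(bar: Bar,
              prev_close: Optional[float] = None,
              max_jump: float = 0.5) -> List[str]:
    """Return a list of problem labels found in one bar."""
    problems: List[str] = []

    # Levels 1 and 2: missing values, then impossible (non-positive) values.
    for name in ("low", "high", "close"):
        value = getattr(bar, name)
        if value is None:
            problems.append(f"missing:{name}")
        elif value <= 0:
            problems.append(f"impossible:{name}")

    # Level 2: unlikely spike relative to the previous close.
    # max_jump is an assumed threshold, not a value from the text.
    if (prev_close is not None and prev_close > 0
            and bar.close is not None and bar.close > 0):
        if abs(bar.close / prev_close - 1.0) > max_jump:
            problems.append("unlikely:close")

    # Level 3: values that together break a rule (Low above High).
    if bar.low is not None and bar.high is not None and bar.low > bar.high:
        problems.append("inconsistent:low>high")

    return problems


if __name__ == "__main__":
    print(check_bar(Bar("day-1", 12.0, 11.0, 11.5)))
```

The function only flags problems and does not fill in or alter any value, in keeping with the point that inventing values to fill gaps is not a good way to proceed. Deciding what to do with a flagged record is left to the caller.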
System Requirements
Windows XP, Windows Vista, Windows 7