Outlier mining in large data sets
* Block Nested Loop algorithm and Local Outlier Factor algorithm, flowchart and implementation in C#
// in progress
SYSTEM OUTLIER MINING

Outlier detection has recently become an important problem in many industrial and financial applications. Data objects that differ significantly from the remaining data objects are referred to as outliers, and outlier detection is concerned with discovering such exceptional behaviour. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, instrument error or simply human error. Detecting them can reveal system faults and fraud, and it lets errors be identified and removed so that they no longer contaminate the data set, which purifies the data for processing.

My program implements two algorithms. The first finds distance-based outliers with a nested-loop scan: it ranks each point by the distance to its "nearest neighbor" and declares the top points in this ranking to be outliers. The second detects density-based local outliers with the Local Outlier Factor (LOF) algorithm. It is local in that the degree of outlierness assigned to an object depends on how isolated the object is with respect to its surrounding neighborhood.

"System Outliers Mining" contains both algorithms. Testing demonstrates that LOF can find outliers that appear to be meaningful but cannot be identified with existing approaches. The density-based algorithm is more powerful than the distance-based scheme when a data set contains diverse characteristics, for example regions of differing density.

Download articles:
* LOF Identifying Density-Based Local Outliers.pdf
* Parallel Alg For Distance And Density-Based Outliers.pdf
* LSC(and LOF)Mine: Algorithm for Mining Local Outliers.pdf
* Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule.pdf
* All-Nearest-Neighbors Queries in Spatial Databases.pdf
* Novelty Detection for Robot Neotaxis.pdf
* A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.pdf
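
The two algorithms described in this section can be illustrated with a minimal Python sketch. It is not the C# program itself: the function names, the parameters `k` and `n`, and the sample points are assumptions made for illustration. The distance-based part is a plain in-memory nested loop, not the block-wise variant that reads the data set in blocks, and the LOF part assumes all points are distinct so that no density becomes infinite.

```python
import math


def nearest_neighbor_scores(points, k=1):
    """Nested-loop scan: distance from each point to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Inner loop over every other point in the data set.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_distance_outliers(points, n, k=1):
    """Indices of the n points with the largest k-th nearest neighbor distance."""
    scores = nearest_neighbor_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def local_outlier_factors(points, k):
    """LOF value for every point (assumes distinct points and k < len(points))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance to the k-th nearest neighbor
    neighbors = []   # k-distance neighborhood (ties included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical data: a tight square cluster plus one isolated point.
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_distance_outliers(points, n=1))  # [4]
    print(local_outlier_factors(points, k=2))
```

On this sample the isolated point receives a LOF value well above 1, while the four clustered points stay at about 1, which is the usual reading of the score: values near 1 indicate a density similar to that of the neighbors. Both functions are quadratic in the number of points, so for large data sets a pruning rule or a spatial index would be needed.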

http://www.goldenline.pl/rafalbarczynski  me rafalba