Anomaly detection

Article Id: WHEBN0008190902

Title: Anomaly detection  
Author: World Heritage Encyclopedia
Language: English
Subject: Data mining, OPTICS algorithm, Supervised learning, ELKI, BIRCH
Publisher: World Heritage Encyclopedia

In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset.[1] Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.[2]

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data unless the data have been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.[3]
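
The aggregation step mentioned above can be illustrated with a minimal sketch. It assumes events arrive as numeric timestamps in seconds, counts them per fixed-size window, and flags windows whose count far exceeds the median count. The function name burst_windows, the window size, the factor and the sample data are all hypothetical choices made for illustration, not part of any cited method.

```python
from collections import Counter
from statistics import median


def burst_windows(timestamps, window=60, factor=5.0):
    """Return start times of windows whose event count exceeds factor * median."""
    counts = Counter(int(t // window) for t in timestamps)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Include empty windows so that quiet periods count towards the baseline.
    series = [counts.get(w, 0) for w in range(first, last + 1)]
    # Floor the baseline at 1 so a mostly idle series does not flag single events.
    baseline = max(median(series), 1)
    return [(first + i) * window
            for i, count in enumerate(series) if count > factor * baseline]


# Hypothetical data: one event every 30 s, plus 50 events within 25 s.
steady = [30 * i for i in range(20)]
burst = [300 + 0.5 * j for j in range(50)]
flagged = burst_windows(steady + burst)
```

No single event in the burst is unusual on its own; only the aggregated count for the window starting at 300 s stands out, which is why flagged contains just that window.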

Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Supervised anomaly detection techniques require a data set whose instances have been labeled as "normal" or "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the learnt model.
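
As an illustration of the unsupervised category only, the sketch below scores each point by the distance to its k-th nearest neighbour, so that points fitting the rest of the data least well receive the highest scores. This is one simple distance-based scoring scheme chosen for illustration; the function name knn_scores, the value of k and the sample points are assumptions.

```python
import math


def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

No labels are used: the score depends only on how each instance relates to the remainder of the data set. The quadratic loop is acceptable for a small example but would need an index structure for large data sets.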


Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. It is often used in preprocessing to remove anomalous data from the dataset. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.[4][5]

Popular techniques

Several anomaly detection techniques have been proposed in the literature. Popular techniques include the following.

Fuzzy Logic based method

In recent years, fuzzy logic has been adopted in several outlier detection approaches to improve the results of popular outlier detection techniques. Yousri et al.[8] proposed a fuzzy logic based approach that merges the results of an outlier detection method, which establishes whether a pattern is an outlier, with those of a clustering algorithm, which allocates patterns to clusters. The two results are then combined to give a measure of outlierness. Another fuzzy based approach was proposed by Xue et al.[9] This approach, called Fuzzy Rough Semi-Supervised Outlier Detection (FRSSOD), combines two approaches: the Semi-Supervised Outlier Detection method (SSOD)[10] and Fuzzy Rough C-means clustering (FRCM).[11] The objective of the FRSSOD method is to establish whether a pattern under consideration can be considered an outlier. Finally, another fuzzy-based method, called Fuzzy Combination of Outlier Detection Techniques (FUCOT),[12] combines different popular outlier detection methods, exploiting the advantages of each while overcoming their drawbacks.
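
The general idea of combining a detector's output with a clustering result can be sketched with standard fuzzy operators. The sketch below is not the combination rule of Yousri et al. or of any other cited method, whose details are not given here; it assumes both inputs are already expressed as degrees in [0, 1], and the function name fuzzy_outlierness and its mode parameter are hypothetical.

```python
def fuzzy_outlierness(detector_degree, cluster_memberships, mode="and"):
    """Combine an outlier degree with cluster memberships, all in [0, 1]."""
    # A pattern that belongs strongly to some cluster is unlikely to be an outlier.
    not_clustered = 1.0 - max(cluster_memberships)
    if mode == "and":
        # Minimum t-norm: both sources must indicate an outlier.
        return min(detector_degree, not_clustered)
    if mode == "or":
        # Maximum s-norm: either source indicating an outlier is enough.
        return max(detector_degree, not_clustered)
    raise ValueError("mode must be 'and' or 'or'")


# Hypothetical pattern: the detector is fairly sure, the clustering disagrees.
conservative = fuzzy_outlierness(0.9, [0.95, 0.05], mode="and")
permissive = fuzzy_outlierness(0.9, [0.95, 0.05], mode="or")
```

With the "and" operator the strong cluster membership pulls the combined degree down, while the "or" operator keeps the detector's high degree; which behaviour is preferable depends on the cost of false alarms in the application.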

Application to data security

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[13] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.[14] Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs, based on frequencies, means, variances, covariances, and standard deviations.[15] The counterpart of anomaly detection in intrusion detection is misuse detection.
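
A threshold-and-statistics check of the kind described above might look like the following minimal sketch. It assumes a profile consisting of the mean and sample standard deviation of a past activity count, and flags a new value lying more than k standard deviations from the mean. The names build_profile and exceeds_threshold, the value k = 3 and the login counts are hypothetical and are not taken from any particular IDS.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarise past activity counts by mean and sample standard deviation."""
    return {"mean": mean(history), "std": stdev(history)}


def exceeds_threshold(value, profile, k=3.0):
    """Flag a value more than k standard deviations from the profile mean."""
    if profile["std"] == 0:
        # With no past variation, any departure from the mean is flagged.
        return value != profile["mean"]
    return abs(value - profile["mean"]) > k * profile["std"]


# Hypothetical profile: logins per hour observed for one user.
logins_per_hour = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(logins_per_hour)
normal_hour = exceeds_threshold(6, profile)
suspicious_hour = exceeds_threshold(40, profile)
```

Here 6 logins stay within the threshold and 40 logins exceed it. A separate profile would typically be kept for each user, workstation or program being monitored.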

References


  1. ^ Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey".  
  2. ^ Hodge, V. J.; Austin, J. (2004). "A Survey of Outlier Detection Methodologies". Artificial Intelligence Review 22 (2): 85.  
  3. ^ Dokas, P.; Ertoz, L.; Kumar, V.; Lazarevic, A.; Srivastava, J.; Tan, P.-N. (2002). "Data mining for network intrusion detection". Proceedings NSF Workshop on Next Generation Data Mining.
  4. ^ Tomek, Ivan (1976). "An Experiment with the Edited Nearest-Neighbor Rule".  
  5. ^ Smith, M. R.; Martinez, T. (2011). "The 2011 International Joint Conference on Neural Networks". p. 2690.  
  6. ^ Breunig, M. M.;  
  7. ^ Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Ensembles for unsupervised outlier detection". ACM SIGKDD Explorations Newsletter 15: 11.  
  8. ^ Yousri, N. A.; Ismal, M. A.; Kamel, M. S. (2007). "Fuzzy Outlier Analysis a combined Clustering-outlier Detection Approach". IEEE SMC 2007.
  9. ^ Xue, Z.; Shang, Y.; Feng, A. (2010). "Semi-supervised outlier detection based on fuzzy rough C-means clustering". Mathematics and Computers in Simulation 80: 1911–1921.
  10. ^ Gao, J.; Cheng, H. B.; Tan, P. N. (2006). "Semi-supervised outlier detection". Proceedings of the ACM Symposium on Applied Computing, vol. 1, ACM Press, Dijon, France: 635–636.
  11. ^ Hu, Q.; Yu, D. (2006). "An improved clustering algorithm for information granulation". Proceedings of the 2nd International Conference on Fuzzy Systems and Knowledge Discovery, LNCS vol. 3613, Springer Verlag, Berlin/Heidelberg/Changsha, China: 494–504.
  12. ^ Cateni, S.; Colla, V.; Nastasi, G. (2013). "A multivariate fuzzy system applied for outliers detection". Journal of Intelligent and Fuzzy Systems 24 (4): 889–903.
  13. ^  
  14. ^ Teng, H. S.; Chen, K.; Lu, S. C. (1990). "Adaptive real-time anomaly detection using inductively generated sequential patterns". Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy: 278–284.  
  15. ^ Jones, Anita K.; Sielken, Robert S. (1999). "Computer System Intrusion Detection: A Survey". Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA.  
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.