With its rapid development, the Internet has become significant in both people's personal and professional lives. At the same time, spam has become increasingly prevalent. It is raising public concern because of the serious disruption it causes to people's lives and livelihoods and the significant losses it inflicts on the economy.


Numerous spam detection and filtering techniques, such as text clustering [1], neural networks [2], and support vector machines [3], have emerged and developed quickly. However, spam attack and camouflage techniques are increasingly prevalent and frequently updated, which significantly reduces the effectiveness and practicality of these detection approaches. Through substitution, insertion, or encoding, spam can hide its characteristic features without impairing the reader's comprehension of the message. This interferes with the feature mining and extraction that filters rely on and allows spam to evade detection, leading some to conclude that spam detection methods are of low effectiveness in practical applications. In addition, spam detection requires processing a large volume of data in real time.
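To illustrate how substitution and insertion defeat a simple filter, the following minimal Python sketch shows a naive keyword filter missing an obfuscated message and a basic normalisation step recovering the hidden terms. It is an illustration only: the substitution map, the blocked-term list and the normalisation rule are hypothetical assumptions, not part of this project's method.

```python
import re

# Hypothetical character-substitution map (leetspeak-style), assumed for illustration.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})

# Hypothetical list of terms a naive filter blocks.
BLOCKED_TERMS = {"free", "money"}


def naive_hits(text):
    """Exact keyword match on lower-cased, whitespace-separated tokens."""
    return {tok for tok in text.lower().split() if tok in BLOCKED_TERMS}


def normalize(text):
    """Undo simple substitution and insertion obfuscation."""
    text = text.lower().translate(SUBSTITUTIONS)
    # Drop punctuation inserted between letters, e.g. "f.r.e.e" -> "free".
    return re.sub(r"(?<=[a-z])[.\-_*]+(?=[a-z])", "", text)


def normalized_hits(text):
    """Keyword match applied after normalisation."""
    return naive_hits(normalize(text))


message = "Get fr.e.e m0ney now"
print(naive_hits(message))               # set()
print(sorted(normalized_hits(message)))  # ['free', 'money']
```

Real obfuscation is far more varied (Unicode look-alikes, HTML tricks, images), which is why fixed rules like these keep falling behind and learned features are needed.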


Moreover, some machine learning-based detection techniques cannot be used effectively in real-world applications because of performance constraints on model updating and fast detection. In other words, the constantly evolving forms of spam and the current state of detection technology show that the only effective way to address the flood of spam is to make full use of various detection technologies, improve them, and keep innovating in step with new developments. To meet the needs of large-scale practical applications, it is also vital to strike a balance between detection accuracy and efficiency.


Based on the above, this project will examine advanced spam detection technology, summarise the current state of existing spam detection techniques, compare the strengths and weaknesses of different approaches, and identify the problems that remain to be solved. Building on existing technical achievements, the project will carry out in-depth research and innovation on the major spam detection technologies at both the theoretical and application levels.


The primary goals of this project are as follows: (1) To counter spam camouflage techniques, collect statistics on and analyse a large volume of real email data, summarise the hidden behavioural traits characteristic of spam message formats, and propose a novel feature selection approach based on email header enhancement. (2) Use the Support Vector Machine (SVM) as the machine learning technique for spam identification.
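The two goals can be illustrated together with a minimal Python sketch, written under stated assumptions rather than as the project's method. The five header features (relay hop count, missing Message-ID, From/Return-Path domain mismatch, upper-case ratio of the Subject, and an RFC 2047 encoded Subject) are hypothetical examples of header-based traits; the toy messages are synthetic; and the linear SVM is trained with Pegasos-style stochastic subgradient descent on the regularised hinge loss, a swapped-in training method rather than the solvers used by libraries such as LIBSVM. Only the Python standard library is used.

```python
import email
import random
from email.utils import parseaddr


def domain_of(header_value):
    """Lower-cased domain of an address header, or '' if it is absent."""
    _, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def header_features(raw_message):
    """Hypothetical header features, each scaled to roughly [0, 1]."""
    msg = email.message_from_string(raw_message)
    subject = msg.get("Subject", "")
    letters = [c for c in subject if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    from_domain = domain_of(msg.get("From"))
    return_domain = domain_of(msg.get("Return-Path"))
    return [
        min(len(msg.get_all("Received", [])), 10) / 10,  # relay hop count
        0.0 if msg.get("Message-ID") else 1.0,  # missing Message-ID
        1.0 if from_domain and return_domain and from_domain != return_domain else 0.0,
        upper_ratio,  # share of upper-case letters in the Subject
        1.0 if "=?" in subject else 0.0,  # RFC 2047 encoded-word in the Subject
        1.0,  # constant feature acting as the bias term
    ]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Linear SVM via Pegasos-style SGD on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, X[i]) < 1  # hinge loss active at current w?
            w = [(1 - eta * lam) * wj for wj in w]  # step on (lam/2)*||w||^2
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1  # +1 = spam, -1 = ham


def make_raw(subject, hops, from_addr, return_path, message_id=True):
    """Build a toy message string; synthetic data for illustration only."""
    lines = [f"Received: from relay{i}.example" for i in range(hops)]
    lines += [f"Return-Path: <{return_path}>", f"From: {from_addr}", f"Subject: {subject}"]
    if message_id:
        lines.append("Message-ID: <1@example.org>")
    return "\n".join(lines) + "\n\nbody\n"


ham = [
    make_raw("Meeting notes", 2, "alice@example.org", "alice@example.org"),
    make_raw("Lunch on Friday", 2, "bob@example.org", "bob@example.org"),
]
spam = [
    make_raw("WIN A FREE PRIZE", 6, "promo@example.com", "bounce@example.net", message_id=False),
    make_raw("=?utf-8?q?Cheap_pills?=", 6, "deals@example.com", "bounce@example.net", message_id=False),
]
X = [header_features(m) for m in ham + spam]
y = [-1] * len(ham) + [1] * len(spam)
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [-1, -1, 1, 1]
```

In practice the hand-written trainer would normally be replaced by a library SVM (for example scikit-learn's `LinearSVC` or `SVC`), and the header features would come out of the statistical analysis in goal (1) rather than being fixed in advance.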


For the first two weeks, I plan to work on model design, data analysis, and machine learning.

References

  1. Sasaki M, Shinnou H. Spam detection using text clustering[C]//2005 International Conference on Cyberworlds (CW'05). IEEE, 2005: 4 pp.-319.

  2. Wu C H. Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks[J]. Expert Systems with Applications, 2009, 36(3): 4321-4330.

  3. Vishagini V, Rajan A K. An improved spam detection method with weighted support vector machine[C]//2018 International Conference on Data Science and Engineering (ICDSE). IEEE, 2018: 1-5.