In the second part of the laboratory we will work on an anti-spam filter. The goal is to develop a set of methods that decide whether an email is spam or not.
The project will be written in Python 3.12.
You will receive two lots of emails for testing your algorithm; a third lot will be used for the final evaluation.
When you send the project via email, it is essential that you follow these rules:
the subject of the email should contain [SSOSM]
the project should have a ZIP archive as the attachment (not RAR, 7z)
the ZIP archive should contain a folder with the format "LASTNAME_FIRSTNAME" and inside this folder there should be a "main.py" file
project.zip/Gavrilut_Dragos/main.py
The "main.py" script should have support for arguments like:
main.py -info <output_file>
this argument should write information about the project to the output file, as JSON in the following form:
{
"student_name": "John Doe",
"project_name": "Asteroids",
"student_alias": "johndoe",
"project_version": "1.0.0"
}
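A minimal sketch of how the -info argument could be handled is shown here. The function name write_info and the PROJECT_INFO constant are illustrative choices, not part of the requirements, and the values are the sample ones from above, so replace them with your own.

```python
import json

# Sample values copied from the example above; replace them with your own data.
PROJECT_INFO = {
    "student_name": "John Doe",
    "project_name": "Asteroids",
    "student_alias": "johndoe",
    "project_version": "1.0.0",
}


def write_info(output_file):
    """Write the project metadata to output_file as a JSON object."""
    with open(output_file, "w", encoding="utf-8") as handle:
        json.dump(PROJECT_INFO, handle, indent=4)
```

Using json.dump instead of building the string by hand keeps the quoting and escaping valid even if a name contains special characters.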
main.py -scan <folder> <output_file>
this argument will scan the <folder> (not recursively) and write the result for each file on a separate line of the <output_file>
Correctly formatted lines are nume_fisier|inf or nume_fisier|cln, where nume_fisier is the file name, cln = clean and inf = infected
The following are wrong:
nume fisier|cln (missing "_")
nume_fisier |cln (contains space)
nume_fisier | inf (contains space)
C:\cale\fisier\nume_fisier|cln (contains full path to file)
nume_fisier|abc (verdict can only be cln or inf)
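The scan step and the output format described above could be sketched as follows. The classify function is a hypothetical placeholder for your own detection logic, and falling back to "cln" when a file cannot be processed is an assumption made for this sketch, not a rule of the assignment.

```python
import os


def classify(path):
    """Hypothetical placeholder: return True if the file at path looks like spam."""
    return False


def scan(folder, output_file):
    """Write one 'file_name|verdict' line per file found directly in folder."""
    # List the folder before creating the output file, so the output file
    # is not picked up if it happens to live inside the scanned folder.
    names = sorted(os.listdir(folder))
    with open(output_file, "w", encoding="utf-8") as out:
        for name in names:
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # not recursive: sub-folders are skipped
            try:
                verdict = "inf" if classify(path) else "cln"
            except Exception:
                verdict = "cln"  # assumed fallback so one bad file does not stop the scan
            # Only the bare file name and the verdict, with no spaces and no path.
            out.write(f"{name}|{verdict}\n")
```

Writing the bare name returned by os.listdir, rather than the joined path, avoids the "contains full path to file" mistake listed above.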
Each lot contains 2 folders: Clean and Spam.
The files are either HTML or plain text, with or without encodings; the first line of each file is the email's subject, followed by its content.
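Since the encoding of a file is not known in advance, one possible way to read it is to try a few encodings in turn. The sketch below is only an assumption about what works reasonably well: the function name read_email and the list of encodings are illustrative, and the real lots may need a different list.

```python
def read_email(path):
    """Return (subject, body) for the email stored at path."""
    with open(path, "rb") as handle:
        raw = handle.read()
    # Assumed candidate encodings; "utf-8-sig" also accepts a leading BOM.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # latin-1 maps every byte to a character, so this decode cannot fail.
        text = raw.decode("latin-1")
    # The first line is the subject; everything after it is the content.
    subject, _, body = text.partition("\n")
    return subject.strip(), body
```

Reading the file in binary mode first means a badly encoded email produces some text instead of an exception.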
Ideally your algorithm should say that all the files in the Clean folder are "cln" and all the files from the Spam folder are "inf".
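To check how close you are to that ideal, you could run the scan on one folder of a lot and measure the fraction of correct verdicts. The sketch below is a plain accuracy check under that assumption; it is not the official scoring formula, and the names parse_verdicts and accuracy are illustrative.

```python
def parse_verdicts(lines):
    """Parse 'file_name|verdict' lines into a dict, rejecting malformed lines."""
    verdicts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, separator, verdict = line.rpartition("|")
        if separator != "|" or verdict not in ("cln", "inf"):
            raise ValueError(f"malformed line: {line!r}")
        verdicts[name] = verdict
    return verdicts


def accuracy(verdicts, expected):
    """Fraction of files whose verdict equals expected ('cln' or 'inf')."""
    if not verdicts:
        return 0.0
    correct = sum(1 for verdict in verdicts.values() if verdict == expected)
    return correct / len(verdicts)
```

For example, after scanning the Clean folder you would call accuracy with expected set to "cln", and after scanning the Spam folder with expected set to "inf".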
Please send your projects to asimion at bitdefender dot com.
Bonuses will be awarded to the first 3 students in the overall standings, but only if they have a total score over 85 points.
You will get the First Lot on 25 Nov (week 9). You will have 4 weeks to improve your algorithm using this lot, and the project you will send me will be evaluated on the Second Lot.
You will get the Second Lot on 6 Jan (week 13). You will have 2 weeks to "train" your algorithm on this lot, and the project you will send me will be evaluated on the Third Lot. You will never receive the Third Lot, as it is used only for the final evaluation.
I will post the standings of your projects here every week.
In the final week I will run daily evaluations, and on the final day there will be hourly evaluations.
I cannot stress this enough: please bug-proof your applications.
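One way to act on this advice is to validate the command line and catch unexpected errors at the top level, so that the script never ends with an unhandled traceback. This is a sketch only: parse_command, main and the handlers dictionary are hypothetical names, and the handlers stand for your own -info and -scan implementations.

```python
import sys

USAGE = "usage: main.py -info <output_file> | main.py -scan <folder> <output_file>"


def parse_command(argv):
    """Return (command, arguments) for a valid command line, else raise ValueError."""
    if len(argv) == 2 and argv[0] == "-info":
        return "info", argv[1:]
    if len(argv) == 3 and argv[0] == "-scan":
        return "scan", argv[1:]
    raise ValueError(USAGE)


def main(argv, handlers):
    """Run the requested command; return 0 on success and 1 on any failure."""
    try:
        command, arguments = parse_command(argv)
        handlers[command](*arguments)
    except Exception as error:
        # Report the problem on stderr instead of crashing with a traceback.
        print(f"error: {error}", file=sys.stderr)
        return 1
    return 0
```

In main.py this could be called as sys.exit(main(sys.argv[1:], handlers)), where handlers maps "info" and "scan" to your own functions.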