Because of the very short development time, this script is designed for users who have prior experience with command-line programs. If you have no experience with the command prompt or command-line programs, you may seek help from the ICT teacher or TSS of your school.
Best Wishes.
Fixed a bug in name matching when the name includes a comma
Improved logging and error reporting
Fixed some more bugs in using the mapping file
Fixed some encoding issues when reading CSV files generated from Excel encoded in UTF-8
Some minor bug fixes
Added the capability to include at most 4 attachments for each student
Some minor bug fixes
Added the capability to send emails to students through a GMail account as sender, including the attachment and an individualized email message.
Added the capability to name the split files by extracted student names
Added the capability to name the split files by extracted student candidate number
Added the capability to name the split files by CUSTOMIZED filenames specified in a mapping file. Matching is done by the extracted HKID from the PDF and the HKID specified in the mapping file.
Added the capability to generate encrypted zip files
Added the capability to name the split files by extracted student HKID
Due to the pandemic, HKDSE 2020 results will probably be released to students through email or Intranet.
However, the PDF file obtained from HKEAA contains the result slips for all students and it is not tagged.
The script available on this site aims to help teachers split the PDF obtained from the HKEAA.
In particular, the script can:
split the PDF file from HKEAA and save each individual page using the HKID / Candidate No of the student
put each split PDF file into an AES encrypted ZIP file
send the files through GMail to the students as attachments; mail merge and individualized messages are supported
THE SCRIPT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
The script is written in Python 3 and tested on both Windows 7/10 and Mac OS. To use the script, Python 3 should be installed on the machine.
Installation of Python 3
Installation of Python 3 on Mac (Simply install the Python 3 executable, no need to install the virtual environment as we will be using venv instead)
Make sure Python 3 is installed and can be executed.
Type the following command in the Windows Command Prompt / Mac Terminal and check that Python 3 is correctly installed.
python --version
The output should show Python 3.X.X
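The same check can also be made from inside Python, which is handy when several Python versions are installed on one machine. The following is only a minimal sketch; the function name is_python3 is made up for illustration and is not part of the script.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # Print the version of the interpreter that is actually running this file.
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("OK" if is_python3() else "Python 3 is required")
```

Save it to any file and run it with the same python command you intend to use for the script.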
To simplify setup, we will be using a virtual Python environment for installation of the libraries and execution of the script. Each virtual environment has its own Python binary (which matches the version of the binary that was used to create this environment) and can have its own independent set of installed Python packages in its site directories. This allows installation of additional Python modules even without administrative rights.
To create a virtual environment, type the following command
python -m venv /path/to/new/virtual/environment
where /path/to/new/virtual/environment is a folder to be used to store all custom modules to be installed in this virtual environment. It can be any folder.
Example:
python -m venv d:\hkdse2020\venv
You may refer to https://docs.python.org/3/library/venv.html for more information.
Once a virtual environment has been created, it can be “activated” using a script in the virtual environment’s binary directory. The invocation of the script is platform-specific: the activate.bat form below is for the Windows Command Prompt and the source form is for the Mac Terminal (/path/to/new/virtual/environment must be replaced by the path of the directory containing the virtual environment):
C:\>\path\to\new\virtual\environment\Scripts\activate.bat
Example:
C:\>d:\HKDSE2020\venv\Scripts\activate.bat
source /path/to/new/virtual/environment/bin/activate
Example:
source /Users/mainuser/hkdse2020/venv/bin/activate
Once activated, the name of the virtual environment will be shown in the command prompt
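If the prompt is hard to read, activation can also be confirmed from Python itself. This is a small sketch under the assumption that the environment was created with venv as described above; the function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python installation that created it.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("environment folder:", sys.prefix)
```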
The following libraries are used in the script:
PDFMiner3 https://pypi.org/project/pdfminer3/
PyZipper https://pypi.org/project/pyzipper/
They can be installed by the following commands:
PyPDF2
pip3 install --upgrade pypdf2
PDFMiner3
pip3 install --upgrade pdfminer3
YaGmail
pip3 install --upgrade yagmail
PyZipper
pip3 install --upgrade pyzipper
Docopt
pip3 install --upgrade docopt
Update Root Certificates (For OAuth2 Credential file generation)
pip3 install --upgrade certifi
jinja2 (For Mail Merge)
pip3 install --upgrade jinja2
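After running the commands above, a quick check like the one below can tell you which libraries are still missing from the active environment. It is a sketch only: the import names in REQUIRED_MODULES are assumed from the package names and may differ from what the script actually imports, and missing_modules is a made-up helper.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ["PyPDF2", "pdfminer3", "yagmail", "pyzipper",
                    "docopt", "certifi", "jinja2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All libraries were found.")
```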
The Script can be downloaded from here.
Save the script in a folder (e.g. D:\HKDSE\)
Usage:
hkdse-split.py split --sourcepdf=<sourcepdf> --outputfolder=<outputfolder> --hkid [(--generatezip --secretfile=<secretfile>)]
hkdse-split.py split --sourcepdf=<sourcepdf> --outputfolder=<outputfolder> --candidateno [(--generatezip --secretfile=<secretfile>)]
hkdse-split.py split --sourcepdf=<sourcepdf> --outputfolder=<outputfolder> --studentname [(--generatezip --secretfile=<secretfile>)]
hkdse-split.py split --sourcepdf=<sourcepdf> --outputfolder=<outputfolder> (--frommapping --mapfile=<mapfile>) [(--generatezip --secretfile=<secretfile>)]
hkdse-split.py sendfile --inputfolder=<inputfolder> --emailfile=<emailfile> --oauth2_file=<path_to_oauth2_creds.json> --sender=<senderemail> --templatefile=<template> --subject=<mailsubject>
hkdse-split.py setupemail --oauth2_file=<path_to_oauth2_creds.json> --sender=<senderemail>
hkdse-split.py (-h | --help)
hkdse-split.py version
Options:
-h --help Show this screen.
--sourcepdf=<PATH of Source PDF> Path to the HKDSE Results PDF file from HKEAA
--outputfolder=<Output Folder> Folder used to store the split PDF files
--hkid Use the extracted HKID as filenames for split PDF files
--candidateno Use the extracted Candidate No as filenames for split PDF files
--studentname Use the extracted Student Name (English) as filenames for split PDF files
--frommapping Use the FILENAME column specified in the map file as filenames for split PDF files
--mapfile=<mapfile> CSV file containing the HKID to FILENAME mapping
--generatezip Generate encrypted zip files, using the passwords in the secret file
--secretfile=<secretfile> CSV file containing the HKID to SECRET mapping
--emailfile=<emailfile> CSV file containing the HKID to EMAIL and other merging variables used in the template mapping
--oauth2_file=<file> JSON file containing the OAuth2 Credentials
--sender=<senderemail> EMAIL address of the sender
--templatefile=<file> The email message template file
--subject=<emailsubject> The email subject
Commands:
split split the HKDSE Results PDF file from HKEAA
sendfile Send the files stored in inputfolder through email to the students,
according to the filename stored in the mapping file
setupemail Setup OAuth2 for GMail (To be implemented)
Files will be split and named by the extracted HKID
python hkdse-split.py split --sourcepdf="D:\HKDSE2020\HKDSE.pdf" --outputfolder="D:\HKDSE2020\Output" --hkid
Files will be split and named by the extracted Student Name
python hkdse-split.py split --sourcepdf="D:\HKDSE2020\HKDSE.pdf" --outputfolder="D:\HKDSE2020\Output" --studentname
Files will be split and named by the extracted Candidate Number
python hkdse-split.py split --sourcepdf="D:\HKDSE2020\HKDSE.pdf" --outputfolder="D:\HKDSE2020\Output" --candidateno
Files will be split and named by the FILENAME column in the mapping file, matched by HKID
python hkdse-split.py split --sourcepdf="D:\HKDSE2020\HKDSE.pdf" --outputfolder="D:\HKDSE2020\Output" --frommapping --mapfile="mapping.csv"
The mapfile (e.g. mapping.csv) should contain AT LEAST 2 columns, HKID and FILENAME
Please note that the mapping file MUST BE a CSV file. Excel files are NOT currently supported.
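Before running the split, it can help to confirm that the mapping file reads cleanly. The sketch below is not the script's own reader; load_mapping is a made-up name, and the only assumptions are the HKID and FILENAME columns described above. The "utf-8-sig" encoding is used because CSV files saved from Excel as UTF-8 usually begin with a byte-order mark.

```python
import csv


def load_mapping(path):
    """Return a dict mapping HKID -> FILENAME read from the mapping CSV."""
    # "utf-8-sig" drops the byte-order mark that Excel writes at the start of
    # UTF-8 CSV files; files without the mark are read unchanged.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        missing = {"HKID", "FILENAME"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError("mapping file lacks column(s): "
                             + ", ".join(sorted(missing)))
        # Extra columns are allowed and simply ignored here.
        return {row["HKID"].strip(): row["FILENAME"].strip() for row in reader}
```

Calling load_mapping("mapping.csv") and printing the result is a quick way to spot a misspelled column header before any PDF is processed.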
Optionally, Encrypted Zip files can be generated. The password to the encrypted zip files will be taken from the secretfile specified.
python hkdse-split.py split --sourcepdf="D:\HKDSE2020\HKDSE.pdf" --outputfolder="D:\HKDSE2020\Output" --frommapping --mapfile="mapping.csv" --generatezip --secretfile="secret.csv"
The secret file (e.g. secret.csv) should contain AT LEAST 2 columns, HKID and SECRET
Please note that the secret file MUST BE a CSV file. Excel files are NOT currently supported.
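For readers curious about what the encryption step involves, the sketch below shows one way to put a single PDF into an AES encrypted zip file with PyZipper. It is an illustration, not the script's actual code: zip_path_for and encrypt_pdf are made-up names, and the choice of ZIP_DEFLATED compression is an assumption.

```python
from pathlib import Path


def zip_path_for(pdf_path):
    """Return the zip path next to the PDF, e.g. A1234567.pdf -> A1234567.zip."""
    return Path(pdf_path).with_suffix(".zip")


def encrypt_pdf(pdf_path, secret):
    """Write pdf_path into an AES encrypted zip protected by secret."""
    # Third-party library; imported here so zip_path_for works without it.
    import pyzipper

    target = zip_path_for(pdf_path)
    with pyzipper.AESZipFile(str(target), "w",
                             compression=pyzipper.ZIP_DEFLATED,
                             encryption=pyzipper.WZ_AES) as archive:
        archive.setpassword(secret.encode("utf-8"))
        # arcname keeps only the file name, so no folder names end up in the zip.
        archive.write(str(pdf_path), arcname=Path(pdf_path).name)
    return target
```

Note that AES encrypted zip files cannot be opened by the built-in zip support of some operating systems; students may need a tool such as 7-Zip.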
The script supports sending email through GMail. To allow the script to send email through GMail, OAuth2 authentication is needed. After the setup, a file containing the OAuth2 token will be created.
Step 1: Go to the Google API Console, login with the email account to be used for sending email.
Step 2: From the project drop-down, create a new one by selecting Create a new project.
Step 3: Input the Project Name. The Project Name can be anything e.g. HKDSE SPLIT
Step 4: Click CREATE to create the project
Step 5: Once the project is created, we need to configure the Consent Screen. Select the OAuth Consent Screen menu from the left.
Step 6: Choose Internal for the User Type and Click CREATE
Step 7: Input the Application name. It can be anything e.g. HKDSE SPLIT
Step 8: Once the Consent Screen is created, we can create the OAuth Credentials. Click the Credentials menu from the left.
Step 9: Click +CREATE CREDENTIALS from the top menu and choose OAuth client ID
Step 10: Choose Desktop app as the Application type and leave the Name as the default value.
Step 11: Click CREATE to create the client ID
Step 12: Copy the Client ID and the Client Secret to a file for later use.
Step 13: Execute the following command (all in one line) to generate the Credential file
python hkdse-split.py setupemail --sender=sender@yourdomain.com --oauth2_file="oauth2.json"
Step 14: Follow the instructions to complete the authentication and generate the Credential file. Note that the input will NOT be displayed on the screen. Do not worry if you cannot see what you type.
The script supports simple mail merge function.
Template variables are enclosed using the curly braces {{ }}.
Example template file (template.txt):
Dear {{LASTNAME}}, {{FIRSTNAME}},
Attached is a zip file containing the HKDSE Result file.
The password to open the zip file is the same as your HKID (e.g. A1234567).
Attachment: {{FILENAME}}.zip
Best Wishes,
Your Teacher
When performing mail merge, you must also specify the data source file containing at least the following fields:
EMAIL - The email address of the recipient
FILENAME - the file name of the attachment (e.g. A1234567.zip)
Other fields to be merged. Please note that all field names are case-sensitive and SHOULD ONLY contain letters and digits (no spaces or special characters)
Example data source file (email.csv)
HKID,LASTNAME,FIRSTNAME,EMAIL,SECRET,CLS,CLSNO,FILENAME
Y0583437,CHAN1,JACKY1,huichunkit+01@gmail.com,1NAHC1YKCAJ,6A,1,6A01_CHAN1_JACKY1.pdf
Y4848275,CHAN2,JACKY2,huichunkit+02@gmail.com,2NAHC2YKCAJ,6A,2,6A02_CHAN2_JACKY2.pdf
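To see how a template and one row of the data source combine, the sketch below fills the {{ }} variables using only the Python standard library. It is a stand-in for illustration and not the jinja2 rendering the script relies on; render and PLACEHOLDER are made-up names, and raising an error for an unknown field is a design choice of this sketch.

```python
import re

# Matches {{NAME}} or {{ NAME }} where NAME contains only letters and digits.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9]+)\s*\}\}")


def render(template, row):
    """Replace every {{FIELD}} in template with the value of row[FIELD]."""
    def substitute(match):
        name = match.group(1)
        # Field names are case-sensitive, so {{Lastname}} does not match LASTNAME.
        if name not in row:
            raise KeyError("template uses unknown field: " + name)
        return str(row[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    row = {"LASTNAME": "CHAN1", "FIRSTNAME": "JACKY1"}
    print(render("Dear {{LASTNAME}}, {{FIRSTNAME}},", row))
```

Running a template through a check like this before sending makes it easier to catch a variable whose name does not exactly match a column header.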
To execute mail merge, use the following command (all in one line)
python hkdse-split.py sendfile --inputfolder=students/pdfs --emailfile=<email.csv> --oauth2_file=oauth2.json --sender=<youremail@yourdomain.com> --templatefile=<template.txt> --subject="HKDSE Result"
where
--sender specifies the sender email address, must be the same as the email address used to create the Google OAuth2 project
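Because emails cannot be recalled once sent, you may want to check the data source file against the input folder first. The following is a minimal sketch of such a check, assuming only the EMAIL and FILENAME columns described above; find_problems is a made-up helper and is not part of the script.

```python
import csv
from pathlib import Path


def find_problems(email_csv, input_folder):
    """Return a list of readable problems found in the data source file."""
    problems = []
    folder = Path(input_folder)
    with open(email_csv, newline="", encoding="utf-8-sig") as handle:
        # start=2 because line 1 of the CSV is the header row
        # (assumes no field spans more than one line).
        for line_no, row in enumerate(csv.DictReader(handle), start=2):
            email = (row.get("EMAIL") or "").strip()
            filename = (row.get("FILENAME") or "").strip()
            if "@" not in email:
                problems.append(f"line {line_no}: missing or invalid EMAIL")
            if not filename:
                problems.append(f"line {line_no}: missing FILENAME")
            elif not (folder / filename).is_file():
                problems.append(f"line {line_no}: {filename} not found in {folder}")
    return problems


if __name__ == "__main__":
    for problem in find_problems("email.csv", "students/pdfs"):
        print(problem)
```

An empty result means every row has an email address and an attachment that exists in the input folder.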
Please NOTE that Google imposes a limit on sending emails through SMTPS. Please refer to here for more details.
For GSuite EDU users, you may want to take a look at the following limits
Multiple attachments (up to 4) can be sent to each student.
To send multiple attachments:
Put the files in the folder specified by the inputfolder parameter (same as the original PDF / ZIP)
Add the columns FILENAME02, FILENAME03 and FILENAME04 to the CSV file specified by the emailfile parameter
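The sketch below illustrates how the attachment columns of one row could be turned into a list of files. It is a guess at the idea rather than the script's actual code: attachments_for is a made-up name, and skipping empty columns is an assumption.

```python
from pathlib import Path

# The original attachment column followed by the three optional extra columns.
ATTACHMENT_COLUMNS = ["FILENAME", "FILENAME02", "FILENAME03", "FILENAME04"]


def attachments_for(row, input_folder):
    """Return the attachment paths for one row, skipping empty columns."""
    folder = Path(input_folder)
    names = [(row.get(column) or "").strip() for column in ATTACHMENT_COLUMNS]
    return [folder / name for name in names if name]


if __name__ == "__main__":
    row = {"FILENAME": "A1234567.zip", "FILENAME02": "notice.pdf", "FILENAME03": ""}
    for path in attachments_for(row, "students/pdfs"):
        print(path)
```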
You may try to reach me at huichunkit@gmail.com but honestly, no guarantee for anything. I am sorry for that.