We found slight differences among the IgG subtypes in the amino acid sequence of the constant heavy regions. To distinguish the four IgG subtypes (IgG1, IgG2, IgG3, IgG4), our code evaluates amino acids 226 to 238, which span the hinge and constant heavy 2 (CH2) regions of the IgG antibody.5 The amino acids and their positions in the constant heavy region are well documented in the scientific literature.6
To verify the code, we identified 17 therapeutic antibodies approved or under review by the FDA and ran each of them through it.7,8 Because these antibody therapeutics are well studied, we compiled their IgG subtype, therapeutic goal, heavy chain sequence, and effector function into a single document using DrugBank Online, KEGG (Kyoto Encyclopedia of Genes and Genomes), and the NIH.9,10 We then compared the output of the code with these curated data.
From the literature, we compiled the Fcγ receptors that each IgG subtype binds and the affinity of each subtype for each receptor.11 Based on the expression of each Fcγ receptor on various immune cells, we then identified the immune cells involved and their respective effector functions.12,14
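The comparison between code output and curated data could be sketched roughly as follows. This is only an illustration: the function name compare_with_curated and the antibody names are hypothetical placeholders, not the 17 therapeutics or the authors' actual routine.

```python
def compare_with_curated(predicted, curated):
    """Split antibody names into those whose predicted subtype matches
    the curated subtype and those that do not."""
    matches, mismatches = [], []
    for name, expected in curated.items():
        got = predicted.get(name)  # None if the code produced no call
        if got == expected:
            matches.append(name)
        else:
            mismatches.append((name, expected, got))
    return matches, mismatches


# Hypothetical placeholder records, not real antibodies or real results
curated = {"antibody_A": "IgG1", "antibody_B": "IgG4", "antibody_C": "IgG2"}
predicted = {"antibody_A": "IgG1", "antibody_B": "IgG2", "antibody_C": "IgG2"}

matches, mismatches = compare_with_curated(predicted, curated)
print(f"{len(matches)}/{len(curated)} subtypes agree")
for name, expected, got in mismatches:
    print(f"{name}: curated {expected}, code returned {got}")
```

Keeping the mismatches as (name, expected, returned) tuples makes it easy to revisit individual sequences by hand.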
(*Please refer to Appendix for our complete code*)
Biopython
We used Biopython's Bio.SeqIO module to read FASTA files. Our code calls Bio.SeqIO.parse(), which yields one SeqRecord object per antibody sequence in the file, and iterates over the results. Each SeqRecord exposes the antibody name through its "description" attribute ("ObjectName.description") and the sequence through its "seq" attribute ("ObjectName.seq").
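A minimal sketch of this reading step is shown below, assuming Biopython is installed. The FASTA text, the record names, and the short sequences are made-up placeholders, and an in-memory StringIO handle stands in for the input file.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA text standing in for the input file
fasta_text = """>antibody_A heavy chain
EVQLVESGG
>antibody_B heavy chain
QVQLQQSGA
"""

# SeqIO.parse() yields one SeqRecord per FASTA entry
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    header_line = record.description  # header text after the ">" character
    sequence = str(record.seq)        # amino acid sequence as a plain string
    print(header_line, len(sequence))
```

With a real file, the StringIO handle would be replaced by the path to the FASTA file.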
Pandas
We relied on pandas data frames to organize our data into rows and columns. We created two data frames (called "info_df" and "df") with the pd.DataFrame() constructor; each is written as a table to its own sheet within a single Excel file.
We created the "df" data frame from one of two dictionaries: "fcgr_function1" for full-sized antibodies and "fcgr_function2" for non-full-sized antibodies. Both dictionaries are structured to hold Fcγ receptor data; however, the values for all keys in fcgr_function2 are empty because small antibodies do not bind Fcγ receptors.
To update a data frame, we used the pd.concat() function.
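As a rough sketch of how a data frame can be built from one of the two dictionaries and then extended with pd.concat(): the dictionary contents below are hypothetical placeholders, the helper build_df is an invented name, and representing the "empty" values as empty lists is an assumption about the authors' code.

```python
import pandas as pd

# Hypothetical placeholder contents; the real dictionary holds curated receptor data
fcgr_function1 = {
    "Name": ["receptor_1", "receptor_2"],
    "Function": ["function_1", "function_2"],
}
# ASSUMPTION: "empty values" are modelled here as empty lists under the same keys
fcgr_function2 = {key: [] for key in fcgr_function1}


def build_df(is_full_length):
    """Pick the dictionary that matches the antibody size and wrap it in a data frame."""
    source = fcgr_function1 if is_full_length else fcgr_function2
    return pd.DataFrame(source)


df = build_df(is_full_length=True)

# pd.concat() returns a new data frame; ignore_index renumbers the rows 0..n-1
extra = pd.DataFrame({"Name": ["receptor_3"], "Function": ["function_3"]})
df = pd.concat([df, extra], ignore_index=True)
print(df)
```

Because pd.concat() does not modify its inputs, the result has to be assigned back to the variable.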
To write our data frame into Excel sheets, we used the pd.ExcelWriter() class.
The following line allows us to create a new Excel file with the antibody name/description as the file name, as well as create a writer object:
"with pd.ExcelWriter(str(header_line) + '.xlsx') as writer:"
We used "info_df.loc[ , ] =" to populate a specific location in the "info_df" data frame, which contains the "Header", "Category", "HC Constant Region Length", and "HC Constant Region Sequence" columns, and then called ".to_excel(writer, sheet_name='info', index=False)" to write it to an Excel sheet called "info".
Similarly, we used "df.to_excel(writer, sheet_name='data', index=False)" to write the "df" data frame (containing the "Name", "Function", "Affinity (1e6 M-1)", "Cell Distribution", and "Effect" columns) to an Excel sheet called "data".
Note that we passed "index=False" to both to_excel() calls so that an index column would not appear in our Excel sheets.
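Put together, the Excel-writing steps might look like the sketch below. It assumes an Excel engine such as openpyxl is installed; the antibody name, cell values, and temporary output directory are hypothetical placeholders, and the one-row initialisation of info_df is an assumption rather than the authors' exact code.

```python
import os
import tempfile

import pandas as pd

header_line = "antibody_A"  # hypothetical antibody name/description
constant_region = "ASTKGPSV"  # hypothetical placeholder, not a full sequence

info_columns = ["Header", "Category", "HC Constant Region Length",
                "HC Constant Region Sequence"]
# Start with one empty row, then fill individual cells with .loc[row, column]
info_df = pd.DataFrame([{column: None for column in info_columns}])
info_df.loc[0, "Header"] = header_line
info_df.loc[0, "Category"] = "IgG1"
info_df.loc[0, "HC Constant Region Length"] = len(constant_region)
info_df.loc[0, "HC Constant Region Sequence"] = constant_region

# Placeholder receptor table with the column names described in the text
df = pd.DataFrame({
    "Name": ["receptor_1"],
    "Function": ["function_1"],
    "Affinity (1e6 M-1)": [None],
    "Cell Distribution": ["cell_type_1"],
    "Effect": ["effect_1"],
})

# Write both tables to separate sheets of one workbook named after the antibody
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, str(header_line) + ".xlsx")
with pd.ExcelWriter(path) as writer:
    info_df.to_excel(writer, sheet_name="info", index=False)
    df.to_excel(writer, sheet_name="data", index=False)
```

The writer saves and closes the workbook automatically when the "with" block ends.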
Among the five naturally occurring antibody isotypes, the IgG isotype is the most abundant in human serum and the most widely used in developing antibody therapeutics.15,16 Because the IgG subtypes vary primarily at the hinge region, which differs in length and in the number of disulfide bonds, we used this region to distinguish the subtypes.4 The peptide sequence from the 226th to the 238th amino acid spans the end of the hinge region and the beginning of the constant heavy 2 (CH2) region, and we used it to classify antibodies into their respective subtypes.5
Some antibody therapeutics are not full-length antibodies and contain only the antigen-binding fragment(s) (Fab or F(ab')2). These smaller antibodies still bind and neutralize their target antigens but do not elicit an effector response.5,17 Therefore, we measured the length of the constant region in the heavy chain and excluded sequences that did not constitute full-length antibodies.
Using the information gathered in the 'Data Curation' section, we created a system that links all of it to the IgG subtypes and outputs, in a table format, the IgG subtype, the Fcγ receptors it binds, the affinity values for the IgG-FcγR pairings, the immune cell types expressing each Fcγ receptor, and their effector functions.
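One way such a hinge-based classification could be sketched is shown below. Note the assumptions: the motif strings are illustrative values for the 226 to 238 window based on commonly cited wild-type human IgG sequences and should be checked against the cited literature, and the sketch searches for the motif anywhere in the heavy chain instead of slicing at fixed positions, which may differ from the authors' code. The names SUBTYPE_MOTIFS and classify_subtype are hypothetical.

```python
# ASSUMPTION: illustrative wild-type motifs for residues 226 to 238;
# verify against the reference sequences before relying on them.
SUBTYPE_MOTIFS = {
    "IgG1": "CPPCPAPELLGGP",
    "IgG2": "CPPCPAPPVAGP",
    "IgG3": "CPRCPAPELLGGP",
    "IgG4": "CPSCPAPEFLGGP",
}


def classify_subtype(heavy_chain, motifs=SUBTYPE_MOTIFS):
    """Return the single subtype whose motif occurs in the sequence.

    Returns None when no motif, or more than one motif, is found.
    """
    hits = [name for name, motif in motifs.items() if motif in heavy_chain]
    return hits[0] if len(hits) == 1 else None


# Toy sequences: arbitrary flanking residues around a motif
print(classify_subtype("AAAA" + "CPPCPAPELLGGP" + "SVF"))
print(classify_subtype("AAAA" + "CPPCPAPPVAGP" + "SVF"))
print(classify_subtype("AAAAAAAA"))
```

Engineered hinge variants would not match these wild-type motifs and would need their own entries in the table.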
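The length-based filter could be sketched as below. The cutoff MIN_FULL_LENGTH is a hypothetical value chosen only for illustration, not the threshold used in our code, and the sequences are dummy strings of the stated lengths.

```python
MIN_FULL_LENGTH = 300  # hypothetical cutoff in residues, for illustration only


def is_full_length(constant_region, min_length=MIN_FULL_LENGTH):
    """Treat a heavy chain constant region as full-length if it is long enough."""
    return len(constant_region) >= min_length


# Dummy constant regions: only their lengths matter for this sketch
constant_regions = {
    "antibody_A": "A" * 330,
    "fragment_B": "A" * 105,
}

full_length = {
    name: seq for name, seq in constant_regions.items() if is_full_length(seq)
}
print(sorted(full_length))
```

Passing the cutoff as a parameter keeps the threshold easy to adjust once the appropriate value is settled.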