Evaluation of genetic diversity

Protein sequences of the prioritized proteins were extracted from 12 annotated T. cruzi proteomes and aligned to predict conserved regions (Supplementary Table 13). Five proteins, namely DNAJ chaperone protein, subtilisin-like serine peptidase, DGF-1, MASP, and trans-sialidase, showed strong homology (above 80%) across the 12 T. cruzi strains. The evolutionary distance (p-distance) for the DNAJ protein was estimated to be 0.005 among the 12 T. cruzi strains and 0.746 across species; for trans-sialidase it was 0.234 across strains and 0.795 across species. The evolutionary divergence of all 150 trans-sialidase sequences available from T. cruzi CL Brener was estimated to be 0.616 (Supplementary Table 14). Estimates of evolutionary divergence between sequences and the number of amino acid differences per site are shown, together with the estimated standard errors, in Supplementary Table 14. The low divergence among strains suggests that the predicted epitopes could serve well as broad-spectrum peptide vaccines against this pathogen.
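For readers unfamiliar with the p-distance, it is the proportion of aligned sites at which two sequences differ. The following is a minimal sketch of that calculation, assuming pre-aligned sequences of equal length with `-` as the gap symbol and pairwise deletion of gapped sites; the function names and the toy sequences are illustrative only and are not the software or data used to obtain the values reported above.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(aligned_seqs):
    """Average p-distance over all pairs in a set of aligned sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment (hypothetical, not taken from the study).
    toy = ["AAAA", "AAAT", "AATT"]
    print(mean_p_distance(toy))
```

A mean value close to zero, as reported for DNAJ among strains, corresponds to nearly identical sequences, whereas values approaching one indicate that most aligned sites differ.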

To validate their selection as antigenic epitopes and their potential to act against the most common strains of T. cruzi, the regions underlying the epitopes were marked in the aligned sequences and a consensus sequence was derived. The epitopes showed almost fully conserved patterns in all of the prioritized proteins. A 15-mer antigenic DNAJ HTL epitope predicted within T. cruzi CL Brener has the sequence "TGVSKNGRQLRVSGK". When checked across sequences with homologous patterns among the 12 T. cruzi strains, 100% conservation was observed across species (see Fig. 14). Being conserved, it also maintained its antigenicity and virulence (Supplementary Table 15) and thus has the potential to evoke an efficient immune response within the host. The consensus sequence of the predicted CTL epitope from the trans-sialidase protein sequences (Fig. 14) is "SSDADPTVV", which is identical to the previously prioritized strain-specific epitope. Sequences of all 8 prioritized proteins from T. cruzi CL Brener were aligned against their homologous sequences from the other annotated strains of T. cruzi, and the regions underlying the epitopes were found to be conserved. The DNAJ CTL epitopic region, being virulent and antigenic among different strains, produces the consensus epitope sequence "KTGRNGDMY" (Fig. 14). Similarly, the consensus sequences derived from the aligned regions of the DNAJ BCL, trans-sialidase BCL, and trans-sialidase HTL epitopes were "VHINLKQ", "SLWSVRL", and "MLVGKYSRNAAAGXQ", respectively. In the trans-sialidase HTL epitope, one residue is substituted (Fig. 14): glutamine replaces arginine in the consensus sequence. Because both amino acids are amphipathic [Betts et al. 2003], the substitution is not expected to cause a drastic change.
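The derivation of a consensus sequence from aligned epitope regions can be illustrated with a simple column-wise majority rule. This is a sketch under stated assumptions: the regions are already aligned and of equal length, a residue is reported only when it occurs in more than half of the sequences, and "X" marks positions without a majority. The function name, the threshold, and the toy inputs are hypothetical; the alignment software used in the study may apply a different rule.

```python
from collections import Counter


def consensus(aligned_regions, threshold=0.5, ambiguous="X"):
    """Column-wise majority consensus of equal-length aligned regions.

    A residue is emitted when its frequency in the column exceeds
    `threshold`; otherwise the `ambiguous` symbol is emitted.
    """
    if not aligned_regions:
        raise ValueError("no sequences given")
    length = len(aligned_regions[0])
    if any(len(region) != length for region in aligned_regions):
        raise ValueError("regions must be aligned to the same length")
    result = []
    for column in zip(*aligned_regions):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            result.append(residue)
        else:
            result.append(ambiguous)
    return "".join(result)


if __name__ == "__main__":
    # A fully conserved region returns the region itself.
    print(consensus(["KTGRNGDMY"] * 3))
    # Toy region with one variable position (hypothetical sequences).
    print(consensus(["AAQ", "AAR", "AAK"]))
```

With this rule, a fully conserved region yields the epitope unchanged, while a position with no majority residue is reported as "X".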

Epitope conservancy analysis using IEDB also showed that almost all of the predicted epitopes were conserved across different strains of T. cruzi (Table T4; Supplementary Table 16).
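Conceptually, epitope conservancy can be expressed as the fraction of protein sequences that contain the epitope at or above a chosen identity level. The sketch below illustrates that idea with an ungapped sliding-window comparison. It is not the IEDB implementation; the function names, the `min_identity` parameter, and the toy protein sequences are assumptions made for illustration only.

```python
def best_identity(epitope, protein):
    """Highest fraction of matching residues between the epitope and any
    same-length ungapped window of the protein sequence."""
    k = len(epitope)
    if k == 0:
        raise ValueError("epitope must not be empty")
    if len(protein) < k:
        return 0.0
    best = 0.0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope, proteins, min_identity=1.0):
    """Fraction of proteins containing the epitope at >= min_identity."""
    if not proteins:
        raise ValueError("no protein sequences given")
    hits = sum(best_identity(epitope, p) >= min_identity for p in proteins)
    return hits / len(proteins)


if __name__ == "__main__":
    # Toy sequences (hypothetical), built around the epitope named in the text.
    toy_proteins = ["MKSSDADPTVVLL", "AASSDADPTVIGG", "MMMMMMMMMMMM"]
    print(conservancy("SSDADPTVV", toy_proteins, min_identity=1.0))
    print(conservancy("SSDADPTVV", toy_proteins, min_identity=0.88))
```

Lowering `min_identity` counts sequences with a small number of substitutions as conserved, which is one way to treat conservative replacements such as the single substitution noted for the trans-sialidase HTL epitope.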

To cross-check the conservation of epitopes among various strains of T. cruzi, the epitopes were also checked for homology against the prioritized proteins from different strains and species (Supplementary File 4). Conserved epitopes not only validate the selection procedure of the designed pipeline but also support the idea of using these epitopes as a broad-spectrum multi-peptide vaccine.

Fig. 14: Aligned regions showing conserved epitopes among various strains of T. cruzi. The individual target proteins (DNAJ chaperone and trans-sialidase) from the 12 strains were aligned using CLC Main Workbench, and the regions containing conserved epitope sequences are marked with red boxes. (a) DNAJ BCL epitope; (b) DNAJ CTL epitope; (c) DNAJ HTL epitope; (d) trans-sialidase BCL epitope; (e) trans-sialidase CTL epitope; and (f) trans-sialidase HTL epitope.