GPCR with G protein heterotrimer, through which GPCRs signal.
The following GPCRs, which have been shown experimentally to function in yeast, were analyzed: NTR1, SSR2, MC4R, FPR1, ADRB2, V2R, CXCR2, GLP1R, and C5A. Solved inactive-state structures exist for V2R, CXCR2, GLP1R, and C5A. Because negative results are rarely published, only two receptors known not to function in yeast, OPRM and CNR1, could be included. The S. cerevisiae GPCRs Ste2 and Ste3 were also analyzed to provide an additional point of comparison.
MUSCLE (https://www.ebi.ac.uk/Tools/msa/muscle/) was used to generate a multiple sequence alignment of all GPCRs in the data set (9). Percent identity matrices (the percent identity of each pair of proteins in the alignment) were used to identify the proteins with the highest sequence similarity.
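As an illustration of what a percent identity matrix contains, the following is a minimal sketch that computes pairwise percent identity from already-aligned sequences. It is not the calculation performed by the MUSCLE server; the function names, the gap-handling convention (columns with a gap in either sequence are skipped), and the toy sequences are assumptions made for this example.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where either sequence has a gap are
    skipped, and the denominator is the number of remaining columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a symmetric name -> name -> percent identity mapping."""
    names = list(aligned)
    matrix = {n: {n: 100.0} for n in names}
    for n1, n2 in combinations(names, 2):
        pid = percent_identity(aligned[n1], aligned[n2])
        matrix[n1][n2] = pid
        matrix[n2][n1] = pid
    return matrix


def most_similar(matrix, name):
    """Return the other sequence with the highest identity to `name`."""
    others = {k: v for k, v in matrix[name].items() if k != name}
    return max(others, key=others.get)


# Toy aligned sequences (made up for illustration, not real GPCR data).
toy = {
    "recA": "MKT-LLVA",
    "recB": "MKTALLVG",
    "recC": "MRT-IL-A",
}
pim = identity_matrix(toy)
print(most_similar(pim, "recA"))
```

Other tools may use a different denominator (for example, the full alignment length), so values from this sketch need not match a server-generated matrix exactly.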
The server at http://evfold.org/evfold-web/evfold.do was used to generate evolutionary coupling (EC) values for all proteins in the GPCR data set. EC scores are derived from multiple sequence alignments by identifying evolutionarily constrained pairs of residues. These values were then used to generate 3D structures and mutation scores, as explained in the EV Fold and EV Mutation sections.
Shown at left are the high-ranking EC scores for NTR1. Red contacts are predictions from the global model, blue contacts are local predictions, and black contacts are experimentally known.
Predicted 3D structures were derived from EV Fold analyses run on the http://evfold.org/evfold-web/evfold.do server. EV Fold uses the EC scores generated by the EV Couplings analyses. The strategy treats high-scoring residue pairs as distance constraints and translates them into atomic coordinates using standard distance geometry algorithms (2).
Shown at left is the top predicted structure for SSR5.
GPCRs in the data set were again run on the http://evfold.org/evfold-web/evfold.do server to generate EV Mutation analyses. The strategy uses EC scores to predict the effects of all possible substitutions at each amino acid position (Figure 2). The output is a list of scores for each possible substitution, with more negative scores predicting more deleterious substitutions. For each GPCR, the scores of the individual substitutions at a position were summed to give a single score for that position. These position scores were ranked, and the five positions predicted to be most deleterious were located on the three-dimensional structure generated by EV Fold.
Hopf et al., “Mutation Effects Predicted from Sequence Co-Variation.”
Figure 2. Overview of the EV Mutation strategy. Independent models treat each amino acid individually, while the EV Mutation strategy uses a global approach that treats residues as dependent on each other to predict which residues are likely responsible for the scores (10).
The following code was used to read the EV Mutation CSV files, sum the scores at each position, and rank the sums.
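import glob
from collections import Counter

import pandas as pd

# Maps each receptor name (taken from the file name) to its ranked position sums.
allEpis = {}
for infile_loc in glob.glob("/Users/colleenmulvihill/Desktop/mutation_matrices/*.csv"):
    df = pd.read_csv(infile_loc)
    epi = Counter()
    # Sum the predicted effects of every substitution at each position.
    for i in range(min(df['pos']), max(df['pos']) + 1):
        epi[i] = sum(df[df['pos'] == i]['prediction_epistatic'])
    # most_common() sorts from highest to lowest sum, so the most negative
    # (most deleterious) positions are at the end of the list.
    epi = epi.most_common()
    # Strip the directory and the ".csv" extension to get the receptor name.
    temp = infile_loc.split("/")[-1]
    temp = temp.split(".")[-2]
    allEpis[temp] = epi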
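The step of picking the five most deleterious positions from the summed scores can be sketched as follows. This is a standard-library illustration, not the code used in the analysis: the helper names, the `mutant` column, and the toy CSV values are assumptions, while the `pos` and `prediction_epistatic` column names are taken from the code above.

```python
import csv
import io
from collections import defaultdict


def sum_by_position(csv_text):
    """Sum prediction_epistatic over all substitutions at each position."""
    sums = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sums[int(row["pos"])] += float(row["prediction_epistatic"])
    return dict(sums)


def most_deleterious(sums, n=5):
    """Return the n positions with the most negative summed scores."""
    # Ascending sort puts the most negative (most deleterious) sums first.
    return sorted(sums.items(), key=lambda item: item[1])[:n]


# Toy data (made up for illustration, not real EV Mutation output).
toy_csv = """mutant,pos,prediction_epistatic
A10C,10,-1.5
A10D,10,-2.5
L11A,11,-0.5
L11V,11,0.25
G12A,12,-6.0
G12P,12,-3.0
"""
toy_sums = sum_by_position(toy_csv)
worst = most_deleterious(toy_sums, n=2)
print(worst)
```

Sorting in ascending order makes the ranking explicit: the first entries are the positions whose substitutions are predicted to be most damaging overall.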
PDB files generated by EV Fold were analyzed using Chimera.
The Match -> Align tool in Chimera was used to create alignments based on the distances between the alpha carbons of the superimposed GPCR structures. The residue-residue distance cutoff was set to 5.0 Angstroms.
Shown at left is an image of the predicted GPCR structures overlaid in Chimera.
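To illustrate what a 5.0 Angstrom alpha-carbon cutoff means, the sketch below pairs residues from two already-superimposed structures when their alpha carbons lie within the cutoff. It is a simple greedy nearest-neighbor illustration and is not Chimera's algorithm; the function name, the dictionary layout, and the toy coordinates are assumptions.

```python
import math

CUTOFF = 5.0  # Angstroms, the cutoff quoted in the text


def pairs_within_cutoff(ca_a, ca_b, cutoff=CUTOFF):
    """Greedily pair alpha carbons of two superimposed structures.

    ca_a and ca_b map residue numbers to (x, y, z) alpha-carbon coordinates.
    Each residue in ca_a is paired with its nearest unused residue in ca_b,
    provided the distance does not exceed the cutoff.
    """
    used = set()
    pairs = []
    for res_a, xyz_a in ca_a.items():
        best = None
        for res_b, xyz_b in ca_b.items():
            if res_b in used:
                continue
            d = math.dist(xyz_a, xyz_b)
            if d <= cutoff and (best is None or d < best[1]):
                best = (res_b, d)
        if best is not None:
            used.add(best[0])
            pairs.append((res_a, best[0], best[1]))
    return pairs


# Toy coordinates (made up for illustration, not from any PDB file).
toy_a = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0)}
toy_b = {101: (0.0, 3.0, 0.0), 102: (3.8, 4.0, 0.0), 103: (7.6, 6.0, 0.0)}
matched = pairs_within_cutoff(toy_a, toy_b)
for res_a, res_b, dist in matched:
    print(res_a, res_b, round(dist, 2))
```

In this toy case the third residue is left unpaired because its nearest free partner is 6.0 Angstroms away, which is beyond the cutoff.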