There are two structures of CFTR: 5uak (closed state) and 6msm (semi-open state). We calculated the distances between residue pairs in 5uak and in 6msm; the difference between the two distances for a given pair indicates how that pair moves between the two structures. We can use this information to find promising residue pairs that could be joined by a linker to lock the protein in one state. Linkers are often composed of flexible residues so that the adjacent protein domains are free to move relative to one another.
To be precise, the distance difference Δd equals the distance of a residue pair in 6msm minus the distance of the same pair in 5uak. A positive Δd therefore means the pair is farther apart in 6msm (semi-open) than in 5uak (closed). To lock the protein in the semi-open state, we can connect the two residues with a molecular linker that prevents them from moving apart (when Δd is negative) or from moving toward each other (when Δd is positive).
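The definition of Δd can be illustrated with a minimal Python sketch. It assumes each residue is represented by a single atom position (the text does not say which atom was used for this first calculation), and the coordinates are made up for illustration; the function names are hypothetical.

```python
import math

def pair_distance(coords, res_a, res_b):
    """Euclidean distance (in Å) between the positions of two residues."""
    return math.dist(coords[res_a], coords[res_b])

def delta_d(coords_6msm, coords_5uak, res_a, res_b):
    """Δd = distance in 6msm minus distance in 5uak for one residue pair."""
    return (pair_distance(coords_6msm, res_a, res_b)
            - pair_distance(coords_5uak, res_a, res_b))

# Made-up coordinates, NOT taken from the PDB files.
toy_5uak = {10: (0.0, 0.0, 0.0), 20: (3.0, 4.0, 0.0)}  # pair is 5 Å apart
toy_6msm = {10: (0.0, 0.0, 0.0), 20: (6.0, 8.0, 0.0)}  # pair is 10 Å apart

dd = delta_d(toy_6msm, toy_5uak, 10, 20)
print(dd)  # positive: the pair is farther apart in the toy "6msm" structure
```

With this sign convention, a linker on a pair with positive Δd would have to hold the residues apart, and a linker on a pair with negative Δd would have to hold them together.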
Calculating distances for all residue pairs in the whole protein takes a long time, and the channel is in the extracellular region, so we made a list of the extracellular residues and calculated distances only for residue pairs within that list. Some residues are missing from the PDB files and have no coordinates, so we removed them from the extracellular residue list. A range written as [num1, num2] means residue numbers num1 through num2, inclusive.
Extracellular residues include: [81, 138] [195, 241] [308, 351] [860, 932] [991, 1034] [1103, 1150]
Missing residues in 6msm.pdb file: [410, 434] [638, 844] [890, 899] [1174, 1201]
Missing residues in 5uak.pdb file: [1, 4] [403, 438] [646, 843] [884, 908] [1173, 1206]
We removed the missing residues that fall inside the extracellular ranges, [890, 899] (6msm) and [884, 908] (5uak), from the extracellular list.
This gave the final list: [81, 138] [195, 241] [308, 351] [860, 883] [909, 932] [991, 1034] [1103, 1150]
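The range bookkeeping above can be reproduced with a short Python sketch. The ranges are copied from the text; the helper names expand and collapse are hypothetical. The sketch treats every range as inclusive and counts unordered pairs of distinct residues.

```python
def expand(ranges):
    """Turn inclusive [start, end] ranges into a set of residue numbers."""
    return {r for start, end in ranges for r in range(start, end + 1)}

def collapse(residues):
    """Turn a set of residue numbers back into sorted inclusive ranges."""
    out = []
    for r in sorted(residues):
        if out and r == out[-1][1] + 1:
            out[-1][1] = r          # extend the current run
        else:
            out.append([r, r])      # start a new run
    return out

extracellular = [[81, 138], [195, 241], [308, 351],
                 [860, 932], [991, 1034], [1103, 1150]]
missing_6msm = [[410, 434], [638, 844], [890, 899], [1174, 1201]]
missing_5uak = [[1, 4], [403, 438], [646, 843], [884, 908], [1173, 1206]]

# Keep only residues that have coordinates in both structures.
kept = expand(extracellular) - expand(missing_6msm) - expand(missing_5uak)
final_ranges = collapse(kept)
n_pairs = len(kept) * (len(kept) - 1) // 2  # unordered pairs

print(final_ranges)
print(len(kept), n_pairs)  # 289 41616
```

The 289 remaining residues give 289 × 288 / 2 = 41616 unordered pairs, which agrees with the pair count reported for the extracellular region.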
We know that the cross-linking span of a linker is limited. If the distance between the two residues of a pair is greater than this limit, the pair cannot be joined by a linker, and we will not try to force the pair open. We set the distance limit for a residue pair to 20 Å.
There are 41616 residue pairs in the extracellular region. Of these, 11360 have a distance below the limit. Among those 11360 pairs, 4664 have a positive Δd and 6696 have a negative Δd.
Here are the statistics of Δd:
Mean of Δd: -0.1874881
Standard deviation of Δd: 1.899162
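The counts and statistics above could be computed along the following lines. This is only a sketch on made-up distances: it assumes a pair is kept when its distance is below the limit in both structures (the text does not say which structure the limit was applied to), and it uses the sample standard deviation, which is also an assumption.

```python
import statistics

def summarize(pairs, limit=20.0):
    """Summarize Δd for pairs given as (d_6msm, d_5uak) distances in Å.

    Assumption: a pair is kept only if both distances are below the limit.
    """
    deltas = [d6 - d5 for d6, d5 in pairs if d6 < limit and d5 < limit]
    return {
        "n_kept": len(deltas),
        "n_positive": sum(1 for d in deltas if d > 0),
        "n_negative": sum(1 for d in deltas if d < 0),
        "mean": statistics.mean(deltas),
        "sd": statistics.stdev(deltas),  # sample standard deviation
    }

# Made-up distances for illustration only.
toy_pairs = [(10.0, 8.0), (5.0, 9.0), (25.0, 10.0), (12.0, 12.5)]
summary = summarize(toy_pairs)
print(summary)
```

In the toy data the third pair is dropped because one of its distances exceeds 20 Å, leaving one positive and two negative Δd values.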
We want to narrow down the set of residue pairs that could bind and lock the protein in the open state. Two residues that each carry a sulfhydryl group can form a disulfide bond, so I mutated every possible residue pair in the extracellular region to cysteine and calculated the distance between the two sulfhydryl groups (the SG atoms). If the two residues of a pair are too far apart, they will probably form two linkers, which is not what we want. I also calculated the difference between the SG distances in 6msm and 5uak to see how each residue pair moves.
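Once a pair has been mutated to cysteine, the SG-SG distance can be read directly from the coordinates. The following is a minimal sketch, assuming the mutated model has been saved in standard fixed-column PDB format and that residue numbers are unique (single chain); the function names are hypothetical, and this is not necessarily how the distances in this report were computed.

```python
import math

def sg_coords(pdb_lines):
    """Map residue number -> (x, y, z) of its SG atom from PDB ATOM records."""
    coords = {}
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 31-54 the x, y, z coordinates (PDB fixed-column format).
        if line.startswith("ATOM") and line[12:16].strip() == "SG":
            resseq = int(line[22:26])
            coords[resseq] = (float(line[30:38]),
                              float(line[38:46]),
                              float(line[46:54]))
    return coords

def sg_distance(coords, res_a, res_b):
    """Distance in Å between the SG atoms of two cysteine residues."""
    return math.dist(coords[res_a], coords[res_b])
```

Running sg_coords on the mutated 6msm and 5uak models and subtracting the two sg_distance values for the same pair would give the SG-based Δd described above.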
Considering the cross-linking span, we set the distance cutoff to 13 Å: if the distance of a residue pair is larger than 13 Å, we do not consider the pair promising. If a residue pair is joined by a linker and moves, the distance of the pair will change. Since the distance may get larger or smaller, we set the cutoff on the absolute distance difference to 5 Å. In other words, if a residue pair moves more than 5 Å, we consider it promising.
We found 89 promising residue pairs. However, 89 pairs is still too many, and other factors may help narrow down the list. Furthermore, we found that the residue pair 104 and 116 is not on the list, which means we needed to change our cutoff. The distance of this pair is 4.68 Å in 6msm and 15.6 Å in 5uak, giving a distance difference of -10.94 Å. We therefore set the distance cutoff to 16 Å and kept the cutoff on the absolute difference at 5 Å. This gave a list containing 264 residue pairs.
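The effect of the two cutoffs on the pair 104 and 116 can be checked with a small sketch. It assumes the distance cutoff is applied to the larger of the two distances, which is one reading consistent with this pair being excluded at 13 Å; the function name is hypothetical.

```python
def is_promising(d_6msm, d_5uak, max_dist, min_shift=5.0):
    """True if both distances are within max_dist and |Δd| exceeds min_shift.

    Assumption: the distance cutoff applies to the larger of the two distances.
    """
    delta = d_6msm - d_5uak
    return max(d_6msm, d_5uak) <= max_dist and abs(delta) > min_shift

# Pair 104 and 116, using the distances quoted in the text (in Å).
d_6msm, d_5uak = 4.68, 15.6

print(is_promising(d_6msm, d_5uak, max_dist=13.0))  # False: 15.6 Å exceeds 13 Å
print(is_promising(d_6msm, d_5uak, max_dist=16.0))  # True
```

Under this reading, the pair fails the 13 Å cutoff only because of its distance in 5uak, and raising the cutoff to 16 Å is enough to include it.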
To form a disulfide bond, both residues of the pair have to be exposed to solvent. If we obtain the list of solvent-accessible residues, we can narrow down the list of promising residue pairs.
Tina used PyMOL to check whether each residue is solvent accessible, which gave me a list of residues that are both extracellular and solvent accessible. The same was done for the semi-open state and the true open state.
Here is the residue list:
[100, 121] [216, 221] [328, 337] [883] [909, 912] [1012, 1014] [1119, 1128]
I mutated the residue pairs in this list and calculated the SG distances. I then used R to select the residue pairs that meet our requirements (distance ≤ 16 Å, and Δd < -5 Å or Δd > 5 Å).
For each pair, I loaded the model, mutated the two residues to cysteine, and calculated the SG distance of the pair. I then reloaded the model, mutated the next residue pair, and calculated its SG distance. Here is the histogram of the distance difference between the open model and 6msm.