We invite you to consider a more accurate source of training data for AI that calculates protein structures
We are pleased to present our method for accurately calculating 2D protein structures from nucleotide sequences (input data is entered in .fasta and .fa formats).
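As an illustration of the input side only, here is a minimal sketch of reading nucleotide records from a .fasta or .fa file. It is not the Protein Picotechnology program itself; the function name parse_fasta and the record demo_cds are hypothetical and exist only for this example.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from the lines of a FASTA (.fasta / .fa) file."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the id is the first whitespace-separated token after ">".
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            # Sequence line: may be wrapped over several lines, so collect the parts.
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, used only to show the call.
example = """>demo_cds hypothetical example record
ATGCAGCAG
CAGTAA
""".splitlines()

sequences = parse_fasta(example)
print(sequences)
```

For a real file, the same function can be given an open file object, for example parse_fasta(open("input.fa")), where input.fa is a placeholder name.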
Source data obtained from X-ray diffraction and NMR have the following limitations:
1) They do not distinguish fragments of the fundamental helices (alpha, beta, pi, 310) within the protein structure, nor supersecondary helices from fundamental ones, etc.
2) The positional error along the 1D structure is plus or minus 5 amino acid residues.
3) The length of the "invisible tails" in experiments is 30-50 amino acid residues.
4) For the 97% of proteins that do not crystallize, certainty is virtually zero, so there is no data for training AI.
We have found a way to eliminate these problems of interpreting X-ray diffraction and NMR data.
Within our Protein Picotechnology method:
- 2D structure calculation is accurate and automated.
- The 2D Protein Picotechnology diagram displays all types of helical regions of the protein chain, individual turns, and single amino acid residues.
- The third letter of the triplet controls the rotation angle of the next amino acid residue relative to the previous one.
- There are four types of angles, and they correspond to the assembly angles of the fundamental helices.
- A 3D Genetic Code table for 2D structure calculation has been published.
- Programs for 3D structure calculation are in development.
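The idea in the list above, that the third letter of each triplet selects one of four angle types, can be sketched as follows. This is only an illustration under stated assumptions: the published 3D Genetic Code table is not reproduced here, so the mapping PLACEHOLDER_LABELS is a hypothetical stand-in with neutral names, and the function names are invented for this example.

```python
from typing import Dict, List


def split_codons(cds: str) -> List[str]:
    """Split a coding sequence into complete triplets, dropping a trailing partial one."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def label_by_third_letter(cds: str, labels: Dict[str, str]) -> List[str]:
    """Look up one label per codon, keyed by the codon's third nucleotide."""
    return [labels[codon[2]] for codon in split_codons(cds)]


# ASSUMPTION: placeholder labels only. The real assignment of angle types
# to nucleotides must be taken from the published table, not from here.
PLACEHOLDER_LABELS = {
    "A": "angle_type_1",
    "C": "angle_type_2",
    "G": "angle_type_3",
    "T": "angle_type_4",
}

codons = split_codons("ATGCAGCAATT")
labels = label_by_third_letter("ATGCAGCAATT", PLACEHOLDER_LABELS)
for codon, label in zip(codons, labels):
    print(codon, label)
```

Passing the mapping as a parameter keeps the sketch independent of any particular table: replacing PLACEHOLDER_LABELS with the published values is the only change needed.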
We hope this will work.
More than 100,000 structures are deposited in the PDB database, yet only about a dozen radiographs can be found on the Internet. Why?
An X-ray diffraction pattern is a two-dimensional image of a limited fragment of the three-dimensional space in which a crystallized protein molecule is inscribed. On the radiograph, this two-dimensional image appears as a combination of visible fragments shown in graphic perspective.
Caption under the radiograph: This is the radiograph of a protein crystal.
Radiograph of the protein phosphorylase
Radiograph of lysozyme
Areas of increased density (dark points on the radiograph of X-ray structural analysis) are located where atoms are packed most closely, for example, in amino acid residues.
The average density is displayed in gray, so it is not visible on the gray background of the radiographs - there is no contrast.
Low density areas are located where the density of atoms is lower than the average of the crystal.
The beta helix has approximately average density because of the relative position of its peptide groups. In the alpha helix the peptide groups are packed more compactly, so the alpha helix is denser and is "visible" to X-ray structural analysis.
The alpha helix is the densest protein structure. The 310-helix is more compact but less dense. The pi-helix is more "smeared", but its density is practically the same as that of the alpha helix.
Protein helices may lie at different angles to the axes of the two-dimensional image, which is a perspective photograph of the volume enclosing the crystal. How does their visibility depend on these angles? If all the protein molecules in a crystal are oriented the same way and have a strictly identical form, the diffraction from the helical sections adds up and a clear overall picture is obtained. If the helical sections are oriented differently, the picture blurs into a gray spot "across the whole screen", as is the case with collagen.
Can an alpha helix appear with high, medium, or low density in different images, depending on the angle of rotation of the crystal relative to the beam, and so on? From different angles, the helical areas are visible in different ways. But the essential parameter is their mutual orientation. If all molecules are oriented in the same way and have the same structure, then each helical section is oriented in the same way in all molecules and is therefore clearly recorded on the X-ray image. If, as in the collagen molecule, a large number of helical sites are oriented differently, then even an identical orientation of the collagen molecules does not make the helical sites noticeable, because the diffraction from all of them sums to a uniform gray "noise".
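The summation argument above can be illustrated with a toy calculation. This is not a diffraction simulation and is not part of the Picotechnology method. It assumes only that each helical section contributes a wave of unit amplitude, and that "same orientation" can be modeled as equal phases and "different orientation" as random phases. All names are hypothetical.

```python
import cmath
import math
import random


def summed_intensity(phases):
    """Intensity |sum of unit phasors|^2 for the given phases (in radians)."""
    total = sum(cmath.exp(1j * p) for p in phases)
    return abs(total) ** 2


n = 1000                 # number of contributing helical sections (arbitrary)
rng = random.Random(0)   # fixed seed so the run is repeatable

# Identical orientation: all contributions share one phase and add up fully.
aligned = summed_intensity([0.0] * n)

# Different orientations: phases are spread at random and largely cancel.
scattered = summed_intensity([rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)])

print(f"aligned:   {aligned:.0f}")
print(f"scattered: {scattered:.0f}")
```

In this toy model the aligned case grows as the square of the number of sections, while the scattered case stays far below it, which is the sense in which a sharp signal gives way to a uniform background.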
Are alpha and pi helices often confused because their density levels are the same? Almost always. Therefore, the PDB today does not specify the kind of helix at all. It is a helix, and that's it.
Is there even the slightest possibility of visually distinguishing alpha and pi helices on a radiograph? No. X-ray structural analysis "sees" only the glare reflected from the high-density portions of the crystal, so the helix turns are not visible.
On the Picotechnology models, however, you can see these differences.
Radiogram of the bacterial protein crystal - the reaction center of photosynthesis
Radiograms of collagen fibrillar protein:
a - diffraction pattern of normal fiber; the position of the lines reflects the geometry of the triple-helix collagen molecule;
b - diffraction pattern of fiber isolated from pathologically altered tissue; arrows indicate additional reflections arising from biochemical changes in the tissue.
If you search for "protein X-ray", you will find about a dozen real X-ray images. Yet the number of structures in the Protein Data Bank is over 100,000. Where are all the X-ray photographs of these proteins? Why, over many decades, have only a dozen radiographs leaked onto the Internet, i.e. about 0.01% of the declared amount?
How are the "bold black points" on the radiograph identified with particular biochemical structures? In X-ray structural analysis, only fragments of the helices are visible. But there is a nuance. The whole protein can be a single twisted alpha/310 hybrid helix. Or a sequence of alpha and pi helices. Or a single program helix, or any possible combination of these. Therefore, X-ray structural analysis specialists are puzzled every time they see another spot on the radiograph.
Why are beta helices not visible in X-rays?
Why can the alpha helix be seen clearly, partially, or not at all?
Why can the alpha helix be confused with the 310-helix?
X-rays register differences in density within the crystal. The density of the alpha-helix and the pi-helix is increased, so these helices leave traces on the radiograph. In the beta helix, the density is almost the same as the average for the crystal.
However, if the helical sections in the crystal are oriented differently, the radiograph is exposed uniformly, as in the case of collagen.
So far, apart from collagen, no program helices have been detected in the PDB, and you have already seen a radiograph of collagen.
Naturally, neither alpha nor pi helices can be seen in the collagen crystal.
This is because the helical portions of neighboring molecules have different orientations in space, so the X-ray pattern is "smeared" uniformly.
As for the 310-helix, its density differs only slightly from that of the alpha-helix, so X-ray structural analysis specialists often confuse them.
How, in general, can the types of helices be distinguished on the radiograph of X-ray structural analysis? The 310-helix has 3 amino acid residues per turn, and the alpha-helix has 4. Sometimes specialists in X-ray structural analysis manage to notice the difference in the density of these helices, which is in the ratio 3/4. The pi-helix usually cannot be distinguished from the alpha-helix, since they have almost the same density. The beta helix is not visible at all. Program helices can coincide in density with the fundamental ones, and then X-ray structural analysis specialists will take the program helix for a fundamental one. If the parameters differ greatly, as in the program 775-helix of collagen, researchers have suggested that it is a triple helix. X-ray analysis cannot verify this, because what it "sees" is ... look at the image above.
Note the sequence QQQQQQQQQQQQQQQQQQQQQ.
X-ray structural analysis specialists do not see that this is a straight alpha helix. The 2D PicoTech diagram shows it.
How is the image of a 3D structure constructed in X-ray analysis from data about a 2D structure?
And what is done with the "undefined" sites? How are they drawn?
The answer: predictively, by conjecture, and far from always accurately.
Previously, the amino acid sequence was determined by biochemists. Now, using the Picotechnology method, it can be determined from the table of the standard genetic code, provided the nucleotide coding sequence of the protein is known.
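Reading an amino acid sequence off the standard genetic code table can be sketched as follows. The table is the standard genetic code in its usual compact form (bases ordered T, C, A, G); the function name translate and the example sequence are hypothetical, and the sketch covers only this translation step, not the 2D structure calculation.

```python
# Standard genetic code, one letter per codon, with bases ordered T, C, A, G
# at each of the three positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon.

    Raises KeyError for codons containing letters other than A, C, G, T/U.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGCAGCAGCAGTAA"))  # prints MQQQ
```

Building the table from the compact 64-letter string avoids typing 64 separate entries by hand.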
Attempts are still made to determine the secondary structure by the homology method, i.e. by comparison with a library of structures already determined by other methods.
This method does not give reliable results, and the reason lies precisely in the composite genetic code used in Protein Picotechnology, which experts have overlooked for the 26 years since its discovery.
Today, practitioners of X-ray structural analysis look at the radiograph and try to guess what the X-rays could have been reflected from. They construct a model that could explain the radiograph. But this model is speculative and predictive; it does not guarantee the accuracy of the stereometric construction.
In the case of extra-long helices, the radiograph is predictable and can be modeled on a computer. Which of the extra-long helices you have studied can give an X-ray image predicted in this way? See Antihomologues.
Considering that there are more than 100,000 structures in the PDB, there must be at least 100 proteins with program helices. But I still could not find them. The search system is also far from perfect: either it refuses to search at all, or it finds something that does not resemble the query.
X-ray structural analysis does not see helical sections clearly or with absolute certainty.
The fact that X-ray diffraction analysis does not distinguish the alpha-helix from the pi-helix can be seen from the PDB standard.
In the PDB database today, entries no longer specify alpha, pi, or 310. They simply say "helix" or "not helix".
Drawing of a 3D model from the PDB (freehand):
The secondary structure, determined from the table of the composite genetic code by the Protein Picotechnology method, displays all types of helical regions, turns, and single amino acid residues that make up the protein chain.
Examples of program helices
The example of this program 811141111-helix (built to music in a 9-beat meter) clearly shows that, along with the alpha-helix code (red), the 310-helix code (orange) is very important. This code performs an additional rotation of the helix around its own axis of symmetry by the angle that distinguishes the 310-helix from the alpha-helix. In the elements of the program helix, the amino acid residue L occurs with both the alpha-helix code and the 310-helix code.