A new, disgusting bug in x86 CPUs

x86 SUPER BUG. x86 is ALWAYS fooling you.

The NEW-OLD-ETERNAL, stupid BUG of EVERY x86. (c) Oleg E. Tereshkov 2014

This page was translated with the help of translate.google.com.

None are more hopelessly enslaved than those who falsely believe they are free. Goethe.

Iczelion: "Welcome to a world where the programmer is the owner of the computer, and not the other way around." "Well, well," I said.

I am glad to welcome you, colleagues. Many thanks to everyone who has found the opportunity to look at my modest, unexpected, and sad discovery.

In February 2015, this work was offered to the senior management of INTEL and AMD and to the editorial board of the Russian magazine Hacker. But these luminaries were not interested in such a purchase. It has been a full 9 months. I think that is long enough for them to give birth to something, anything at all, at last. No means no. The article is yours. Enjoy.

Instead of a preface. It is easy to blame everything on fog in the Bryansk forests, on snowplows suddenly appearing on the runway, on sleepy air traffic controllers, drunken captains of ocean cruise liners, Russian repeat offenders, Ukrainian nationalists, or the Donetsk separatists who allegedly shot down the Malaysian Boeing. It is even easier to blame everything on the French pilots, already dead and allegedly demented, and on their girlfriends, still alive on the ground and allegedly just as demented. But it is very hard to admit the real defects of the real x86 processor, around which everything is built and which is uncontrollably replicated in millions of copies.

So let us, brothers, finish off the x86 processor from INTEL and AMD, and let the Russian magazine Hacker, and everyone who only grabs for himself, bite their elbows in envy.

In short, the new-old-eternal, stupid x86 bug that I discovered sharply slows a program down, by up to 94 times, from nothing more than a change in the order or placement of variables as they are declared in the source code and/or a change in the values of those variables themselves. Normally this should not happen.

For a normal CPU, memory cells No. 1 and No. 2 do not differ from each other. For x86 they do. For a normal CPU it does not matter whether memory cells No. 1 and No. 2 hold the number 1, 2, 3 or 255. For x86 it does.

A formal example:

temp1=1

temp2=2

temp3=3

and

temp1=1

temp3=3

temp2=2

OR:

temp2=2

temp1=1

temp3=3

and

temp2=2

temp3=3

temp1=1

OR:

temp3=3

temp1=1

temp2=2

and

temp3=3

temp2=2

temp1=1

For you, all of this is the same thing. And for me. But not for x86. In all six cases, the speed of the program that follows, identical in every case, can be different.

Or if:

temp1=1

temp2=2

then temp1 + temp2 will take one amount of time on x86, and temp2 + temp1 another. The difference will be even larger if you change the values to temp1 = 2 and temp2 = 1.

Don't believe it?

Here is the prototype in C:

// Speed_max.c

//(c) Oleg E. Tereshkov 2014

#include <stdio.h>

#include <time.h> /* clock(); the non-standard <conio.h> is not needed */

//-------------------

int temp0,temp1,temp2,temp3,\

temp4,temp5,temp6,temp7,\

temp8,temp9,temp10,temp11,\

temp12,temp13,temp14,temp15,\

temp16,temp17,temp18,temp19,\

temp20,temp21;

//----------------------------------

//-------------MAIN-----------------

//----------------------------------

int main (void){

L1: temp11=1;temp12=5;temp13=1;

temp14=4;temp15=2;temp16=3;

temp17=4;temp18=2;temp19=5;

temp20=3;temp21=2;

temp1=temp2=temp3=temp4=\

temp5=temp6=temp7=temp8=\

temp9=temp10 = 0x7fffffff;

temp0=clock();

while (temp10 > 0) {

temp1=temp1 - temp11;

temp2=temp2 - temp12;

temp3=temp3 - temp13;

temp4=temp4 - temp14;

temp5=temp5 - temp15;

temp6=temp6 - temp16;

temp7=temp7 - temp17;

temp8=temp8 - temp18;

temp9=temp9 - temp19;

temp10=temp10 - temp20;

//----(0)merry-go-round(0)----

temp21=temp20;temp20=temp19;

temp19=temp18;temp18=temp17;

temp17=temp16;temp16=temp15;

temp15=temp14;temp14=temp13;

temp13=temp12;temp12=temp11;

temp11=temp21; }

temp1=clock();

temp1=temp1 - temp0;

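printf("%d\n", temp1); /* elapsed clock() ticks: milliseconds where CLOCKS_PER_SEC is 1000, as in Microsoft's C runtime */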

goto L1;

}

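The simplest, intuitive algorithm, which, I am sure, needs no comment and speaks for itself: ten counters start at 0x7fffffff, on every pass each of them is decreased by its partner among temp11-temp20, the values of temp11-temp20 are rotated one step (the merry-go-round, with temp21 as scratch), and the loop ends when temp10 is no longer positive.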

----------------cut+line_1---------------

But to rule out any further disputes about non-obvious features of a particular C compiler, I rewrote this simple program in FASM assembler, especially for all of you:

;Programm Speed_me_1.asm for Fasm

;(c) Oleg E. Tereshkov 2014

;//-------------macro-------------//

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

;//------------endmac-------------//

format PE console

entry start

;//-------------code-------------//

section '.code' code readable executable

start: nop

lbl0: mov eax, 7fffffffh

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

call [clock]

mov [temp0], eax

lbl1: cmp [temp10], 0

jle lbl2

mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp12]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp13]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp14]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp15]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp16]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp17]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp18]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp19]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp20]

mov [temp10], eax

;---(1)merry-go-round(1)---

mov eax,[temp20]

mov [temp21], eax

mov ebx,[temp19]

mov [temp20], ebx

mov edx,[temp18]

mov [temp19], edx

mov ecx,[temp17]

mov [temp18], ecx

mov eax,[temp16]

mov [temp17], eax

mov ebx,[temp15]

mov [temp16], ebx

mov edx,[temp14]

mov [temp15], edx

mov ecx,[temp13]

mov [temp14], ecx

mov eax,[temp12]

mov [temp13], eax

mov ebx,[temp11]

mov [temp12], ebx

mov edx,[temp21]

mov [temp11], edx

;---(1)merry-go-round(1)---

jmp lbl1

lbl2: call [clock]

sub eax, [temp0]

push eax

push temp_s

call [printf]

add esp, 8

jmp lbl0

;//-------------idata-------------//

section '.idata' import data readable writeable

library crtdll, 'crtdll.dll'

crtdll:

import printf,'printf',\

clock, 'clock'

;//-------------data-------------//

section '.data' data readable writeable

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

temp_s db 25h,64h,0ah,0

Does it work for everyone? I hope so. The merry-go-round can also be rewritten this way:

;---(2)merry-go-round(2)---

mov eax,[temp11]

mov [temp21], eax

mov ebx,[temp12]

mov [temp11], ebx

mov edx,[temp13]

mov [temp12], edx

mov ecx,[temp14]

mov [temp13], ecx

mov eax,[temp15]

mov [temp14], eax

mov ebx,[temp16]

mov [temp15], ebx

mov edx,[temp17]

mov [temp16], edx

mov ecx,[temp18]

mov [temp17], ecx

mov eax,[temp19]

mov [temp18], eax

mov ebx,[temp20]

mov [temp19], ebx

mov edx,[temp21]

mov [temp20], edx

;-------(2)merry-go-round(2)-------

The run time will not change and will be approximately 13350 ms. From now on, I will allow myself to quote specific numbers. All times are in milliseconds; the processor is an AMD Athlon (tm) XP 2500+ at 1.83 GHz, the memory runs at 400 MHz, and the system is Windows XP SP3.

(For gifted bashers who are not particularly burdened with superfluous brains, just in case: the timing measurements given here are valid only for my system. On yours they will be different, and that is a meaningless detail. But the proportions and trends remain fully intact for (single-core and virtually all dual-core) x86 processors. And that is what really matters.)

If the merry-go-round is rewritten as:

;-------(3)merry-go-round(3)-------

mov eax,[temp21]

mov [temp20], eax

mov ebx,[temp20]

mov [temp19], ebx

mov edx,[temp19]

mov [temp18], edx

mov ecx,[temp18]

mov [temp17], ecx

mov eax,[temp17]

mov [temp16], eax

mov ebx,[temp16]

mov [temp15], ebx

mov edx,[temp15]

mov [temp14], edx

mov ecx,[temp14]

mov [temp13], ecx

mov eax,[temp13]

mov [temp12], eax

mov ebx,[temp12]

mov [temp11], ebx

mov edx,[temp11]

mov [temp21], edx

;-------(3)merry-go-round(3)-------

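the miracles begin. With the C compiler, too. The run time immediately increases to 33500, that is, by 2.53 times. In case this is not obvious: in this variant every move reads the variable that was just written, so on the first pass the value of temp21 (2) is copied from temp20 all the way down to temp11, and from then on all the variables temp11-temp21 hold the number 2. The arithmetic mean of temp11-temp20 is initially 3. The loop ends when temp10 runs out, so subtracting 2 instead of 3 on average means 3/2 = 1.5 times as many passes, and the run time should be 13250 x 1.5 = 19875. Instead we get 33500. I think you have no reasonable explanation for why.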

Let's assign temp21 = 3 and do another run with (3)merry-go-round(3). The time is 22250, not 13350. Why? My answer: I have uncovered a new-old, congenital defect, a bug of the x86 processor. It is so gross that it is guaranteed to show up in all single-core x86 models without exception, up to the Pentium Mobile (checked), and in most dual-core ones. Its name is FAILURE O.E. TERESHKOV for x86 processors.

Some will say that these calculations are not legitimate, and that in such long and tangled loops cyclic truncations and coincidences can be hard to detect.

Fine. Assembler is not C; in FASM everything is out in the open. Let's go back to the first option, (1)merry-go-round(1), but assign the value 3 to all the variables temp11-temp21. Result: 13250 ms. Not 22250.

So I am right: on x86 processors the run time depends, in an unpredictable way, on the values and the placement of particular variables in the particular final code. And in this respect the final code may differ unpredictably from what is written in the source, especially with a C compiler and the like.

If the speed of program execution depends several-fold on the order in which variables are placed in memory, then it is stupid and pointless to compare x86 compilers that are equivalent and free of gross internal defects by the speed of their final code. And those who pay for some super-duper synthetic benchmarks are simply throwing money in the trash, or someone is brazenly making fools of us.

The only thing we can say with confidence is that this specific source code, compiled with this specific compiler of this particular version, in the IDE or specifically from the command line, on this particular machine, runs in this specific time. With other compilers, in some other time. That says next to nothing.

Because for a proper CPU, neither the location of specific variables in memory nor their values matter. For x86 they do. x86 is murky and defective.

And so the x86 compiler that proved to be the worst in one test may turn out to be the best in the next one, depending purely on where some variable happened to land.

For different x86 models, the results of comparative speed tests of programs built with different compilers do not agree. On one x86 model one compiler wins, on another a different one, with one and the same unchanged, ready-made code of the 2 + 2 kind. This thoroughly confuses everyone, especially when the compiler has all optimization disabled. I can demonstrate this if necessary. So, contrary to all expectations, LCC32 can beat Fasm by 10-15% in speed on completely identical final code. Who among you can explain this sanely?

The best x86 compiler does not exist! Which one should you use? That is a matter of personal preference, and everyone decides for himself.

But are there rules without exceptions?

;Programm Speed.pb for PureBasic

;(c) Oleg E. Tereshkov 2014

;========================

Import "kernel32.lib"

GetTickCount()

EndImport

;========================

OpenConsole()

L1:

temp11.l = 1

temp12.l = 5

temp13.l = 1

temp14.l = 4

temp15.l = 2

temp16.l = 3

temp17.l = 4

temp18.l = 2

temp19.l = 5

temp20.l = 3

temp21.l = 2

temp1.l = 2147483647

temp2.l = 2147483647

temp3.l = 2147483647

temp4.l = 2147483647

temp5.l = 2147483647

temp6.l = 2147483647

temp7.l = 2147483647

temp8.l = 2147483647

temp9.l = 2147483647

temp10.l= 2147483647

temp0.l =GetTickCount()

While temp10>0

temp1=temp1 - temp11

temp2=temp2 - temp12

temp3=temp3 - temp13

temp4=temp4 - temp14

temp5=temp5 - temp15

temp6=temp6 - temp16

temp7=temp7 - temp17

temp8=temp8 - temp18

temp9=temp9 - temp19

temp10=temp10 - temp20

temp21=temp11

temp11=temp12

temp12=temp13

temp13=temp14

temp14=temp15

temp15=temp16

temp16=temp17

temp17=temp18

temp18=temp19

temp19=temp20

temp20=temp21

Wend

temp1=GetTickCount()

temp1=temp1 - temp0

PrintN(Str(temp1))

Goto L1

The time for PureBasic 4.10 is 22000 ms, and for the PureBasic 4.41 demo it is 13500 ms. There is a difference, although both compilers cost the same. And apparently it is not only a matter of the order of variables. That is why you will never see the previous versions of the program on the PureBasic site. You know yourself: dirty laundry, dung in the house and all the rest. Everything is learned by comparison, and everything comes to light sooner or later, even the human genome and the bugs of x86.

Exceptions prove the rule. And if a compiler does not slow things down as blatantly as PureBasic 3.20 to 4.10, GCC, Ionic, Easm and other toys do, then the performance of the final x86 program, absent gross miscalculations in it, is very largely a matter of the programmer's personal luck and pure chance in how the variables were declared in the source.

Or, for SPHINX C-- 0.239:

//speed.c--

#pragma option W32C

#pragma option J0

#pragma option 3

//-------------------

extern WINAPI "crtdll.dll"

{ printf();}

extern WINAPI "kernel32.dll"

{ GetTickCount();}

//-------------------

int temp0,temp1,temp2,temp3,\

temp4,temp5,temp6,temp7,\

temp8,temp9,temp10,temp11,\

temp12,temp13,temp14,temp15,\

temp16,temp17,temp18,temp19,\

temp20,temp21;

//----------------------------------

//-------------MAIN-----------------

//----------------------------------

int main (void){

L1: temp11=1;temp12=5;temp13=1;

temp14=4;temp15=2;temp16=3;

temp17=4;temp18=2;temp19=5;

temp20=3;temp21=2;

temp1=temp2=temp3=temp4=\

temp5=temp6=temp7=temp8=\

temp9=temp10 = 0x7fffffff;

temp0=GetTickCount();

while (temp10 > 0) {

temp1=temp1 - temp11;

temp2=temp2 - temp12;

temp3=temp3 - temp13;

temp4=temp4 - temp14;

temp5=temp5 - temp15;

temp6=temp6 - temp16;

temp7=temp7 - temp17;

temp8=temp8 - temp18;

temp9=temp9 - temp19;

temp10=temp10 - temp20;

temp21=temp20;temp20=temp19;

temp19=temp18;temp18=temp17;

temp17=temp16;temp16=temp15;

temp15=temp14;temp14=temp13;

temp13=temp12;temp12=temp11;

temp11=temp21; }

temp1=GetTickCount();

temp1=temp1 - temp0;

printf("%d\n", temp1);

goto L1;

}

The time is 12,500 milliseconds, the fastest. But if the line extern WINAPI "crtdll.dll" {printf();} is replaced by extern WINAPI "crtdll.dll" {int cdecl printf(char *,...);}, as it should be (printf is a cdecl function, so the caller must remove its arguments from the stack), the execution time grows to 13000 ms, which is +4%, although nothing in the timed loop itself has changed. And if, in addition to {int cdecl printf(char *,...);}, all the variables in this example are made local, the run time grows even more, to 13400 ms, +7.2% from the initial 12,500. A scam by INTEL and AMD!

And you can optimize and compare forever. Alas, all normal compilers are the same, whatever anyone tells you on this subject and whatever you personally may think. And in all sorts of Do-While and For-Next loops on x86, it is better to use global variables. That is faster. Or so it seems. But do not rush to conclusions.

Oh, how many wonderful discoveries the spirit of enlightenment prepares for us!!! Yes, truly, and how many heinous discoveries have INTEL and AMD prepared for us?

At the dawn of civilization, CPL32 and its younger brother ASM32 rattled their armor. Their authors still want to sell them today for $30 apiece. They still do not work under Windows, only under MS-DOS (to keep the brand, and probably because that way they sell more and better). But here is how the authors themselves advertise these creations: "CPL32, the best assembler for true, pure assembly programmers, there is absolutely NO RED TAPE, you do NOT declare variable types, segments or procedures."

In short, you can now program on instinct, the way a bull pisses on the road. But is that good? Can we thereby become closer to God than the Pope of Rome, and closer to x86 than INTEL and AMD?

FASM, too, lets you write without any rules. So let's rewrite the second listing like this:

;Programm Speed_me_2a.asm for Fasm

;(c) Oleg E. Tereshkov 2015

;//-------------macro-------------//

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

;//------------endmac-------------//

format PE console

entry start

start: nop

lbl0: mov eax, 7fffffffh

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

call [clock]

mov [temp0], eax

lbl1: cmp [temp10], 0

jle lbl2

mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp12]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp13]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp14]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp15]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp16]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp17]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp18]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp19]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp20]

mov [temp10], eax

;---(1)merry-go-round(1)---

mov eax,[temp20]

mov [temp21], eax

mov ebx,[temp19]

mov [temp20], ebx

mov edx,[temp18]

mov [temp19], edx

mov ecx,[temp17]

mov [temp18], ecx

mov eax,[temp16]

mov [temp17], eax

mov ebx,[temp15]

mov [temp16], ebx

mov edx,[temp14]

mov [temp15], edx

mov ecx,[temp13]

mov [temp14], ecx

mov eax,[temp12]

mov [temp13], eax

mov ebx,[temp11]

mov [temp12], ebx

mov edx,[temp21]

mov [temp11], edx

;---(1)merry-go-round(1)---

jmp lbl1

lbl2: call [clock]

sub eax, [temp0]

push eax

push temp_s

call [printf]

add esp, 8

jmp lbl0

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 3

temp12 dd 3

temp13 dd 3

temp14 dd 3

temp15 dd 3

temp16 dd 3

temp17 dd 3

temp18 dd 3

temp19 dd 3

temp20 dd 3

temp21 dd 3

temp_s db 25h,64h,0ah,0

;//-------------idata-------------//

section '.idata' import data readable writeable

library crtdll, 'crtdll.dll'

crtdll:

import printf,'printf',\

clock, 'clock'

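Compile and run. And, oh snap, the run time increases roughly five times, from 13350 ms to 60200 ms. Not bad! Note that this listing has no section '.code' and no section '.data': the variables, all set to 3, now sit right after the code, in the same section as the code. But one would think... what difference could that make? Flat memory! x86 memory is flat, of course, but as you can see, not for everyone, not for all, not always, and not in all places. Writing self-modifying code for x86 is fun, but idiotic. Moronic!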

So which compiler produces the fastest code: Fasm, or the very same Fasm??? Which is better??? Fasm, or Fasm itself??? After all, as you can see for yourself, the execution speed of the same source code, processed by the same Fasm compiler, under the same conditions, on the same x86 processor, can differ by a factor of about five!!!

----------------cut+line_2---------------

Those were the flowers and the background. Although five times, that is +400%, is hardly just flowers. And now for the berries and new evidence. A silver nail into the heart of x86. Let us give the source code the ability to display the number of processor cycles spent over the same time, and at the same time get rid of the now-unnecessary counting of milliseconds, for the purity of the experiment, so to speak:

;Programm Speed_me_3.asm for Fasm

;(c) Tereshkov Oleg Evg. 2014

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

format PE console

entry start

;//-------------code-------------//

section '.code' code readable executable

start: nop

lbl0: mov eax, 7fffffffh

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

rdtsc

mov [vr_temp_edx_1],edx

mov [vr_temp_eax_1],eax

lbl1: cmp [temp10], 0

jle lbl2

mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp12]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp13]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp14]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp15]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp16]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp17]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp18]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp19]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp20]

mov [temp10], eax

mov eax,[temp11]

mov [temp21], eax

mov ebx,[temp12]

mov [temp11], ebx

mov edx,[temp13]

mov [temp12], edx

mov ecx,[temp14]

mov [temp13], ecx

mov eax,[temp15]

mov [temp14], eax

mov ebx,[temp16]

mov [temp15], ebx

mov edx,[temp17]

mov [temp16], edx

mov ecx,[temp18]

mov [temp17], ecx

mov eax,[temp19]

mov [temp18], eax

mov ebx,[temp20]

mov [temp19], ebx

mov edx,[temp21]

mov [temp20], edx

jmp lbl1

lbl2: rdtsc

sub eax,[vr_temp_eax_1] ; 64-bit end - start: low dwords first,

sbb edx,[vr_temp_edx_1] ; then high dwords minus the borrow

; Convert the unsigned 64-bit value in edx:eax to decimal. The digits are

; written backwards into the 20 bytes vr_temp_edx_2 through vr_temp_dec_3,

; which lie directly in front of the 0Ah,0 at vr_temp_dec_4.

mov edi,vr_temp_dec_4

mov ebx,10

cvt: mov ecx,eax ; ecx = low dword

mov eax,edx

xor edx,edx

div ebx ; eax = high / 10, edx = high mod 10

xchg eax,ecx ; ecx = high quotient, eax = low dword

div ebx ; eax = (edx:eax) / 10, edx = next digit

add dl,30h

dec edi

mov [edi],dl

mov edx,ecx ; edx:eax = quotient

or ecx,eax

jnz cvt ; repeat until the quotient is 0

push edi

call [printf]

add esp, 4

jmp lbl0

;//-------------idata-------------//

section '.idata' import data readable writeable

library crtdll, 'crtdll.dll'

crtdll:

import printf,'printf'

;//-------------data-------------//

section '.data' data readable writeable

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,000h

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

Made it through? How this program works is not important; I hardly know myself. What matters is that it is a complete analog of the two previous ones, and it reports 50600000000 processor cycles. And now let's simply rewrite section '.data' like this:

;//-------------data-------------//

section '.data' data readable writeable

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,000h

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

Oh snap! Here it is! Again! The program's run time increased 5 times, and the reported number of clock cycles on this processor reached a whole 260500000000. Not bad!

----------------cut+line_3---------------

I hear you, I hear the screams of belated epiphany: "It's Windows, it's all that stupid Windows!" No, not Windows. And not Fasm. This is x86. Here is an analog for MS-DOS, and if necessary I can post one for DiceRTE (http://www.diefer.de) or DEXOS. Same result.

; Simple Speed5 demo, for MSDOS. Fasm

;(c) Oleg E. Tereshkov 2014

format binary as 'com'

use16

ORG 0x0000100

lbl0: mov eax, 7fffffffh

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

rdtsc

mov [vr_temp_edx_1],edx

mov [vr_temp_eax_1],eax

;=========--------------------->

lbl1: cmp [temp10], 0

jle lbl2

mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp12]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp13]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp14]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp15]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp16]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp17]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp18]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp19]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp20]

mov [temp10], eax

mov eax,[temp11]

mov [temp21], eax

mov ebx,[temp12]

mov [temp11], ebx

mov edx,[temp13]

mov [temp12], edx

mov ecx,[temp14]

mov [temp13], ecx

mov eax,[temp15]

mov [temp14], eax

mov ebx,[temp16]

mov [temp15], ebx

mov edx,[temp17]

mov [temp16], edx

mov ecx,[temp18]

mov [temp17], ecx

mov eax,[temp19]

mov [temp18], eax

mov ebx,[temp20]

mov [temp19], ebx

mov edx,[temp21]

mov [temp20], edx

jmp lbl1

;=========---------------------->

lbl2: rdtsc

sub eax,[vr_temp_eax_1] ; 64-bit end - start: low dwords first,

sbb edx,[vr_temp_edx_1] ; then high dwords minus the borrow

; Convert edx:eax to decimal, writing the digits backwards into the

; 20 bytes in front of the 0Ah,'$' at vr_temp_dec_4.

mov edi,vr_temp_dec_4

mov ebx,10

cvt: mov ecx,eax ; ecx = low dword

mov eax,edx

xor edx,edx

div ebx ; eax = high / 10, edx = high mod 10

xchg eax,ecx ; ecx = high quotient, eax = low dword

div ebx ; eax = (edx:eax) / 10, edx = next digit

add dl,30h

dec edi

mov [edi],dl

mov edx,ecx ; edx:eax = quotient

or ecx,eax

jnz cvt ; repeat until the quotient is 0

mov dx,di ; DOS function 09h prints the '$'-terminated string at DS:DX

mov ah,09h

int 21h

jmp lbl0

;=========---data1-------->

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,'$'

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

Compile and run: under Windows it is slightly slower than the original, and under pure DOS even slower. Now, as is our tradition, rewrite the data section:

;=========---data2-------->

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,'$'

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

And again, the run time increased 5 times. Now go back to the original version, but at the beginning of the data add a meaningless string that is not used anywhere in the program:

;=========---data3-------->

hllspd db "Hello! I'm Speedy speedometr !!!",00ah,00ah,000h

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 1

temp12 dd 5

temp13 dd 1

temp14 dd 4

temp15 dd 2

temp16 dd 3

temp17 dd 4

temp18 dd 2

temp19 dd 5

temp20 dd 3

temp21 dd 2

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,'$'

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

And again, the run time increased 5 times. Personally, I would not presume to say which time is the right one. I only know that this should not happen. But anyway, I congratulate everyone, especially those who bought QNX. None of this looks much like real time. The money has been spent, and in the end you get a fig. Garbage is garbage, and garbage it remains: x86.

Of course, once the stench spreads around the world, INTEL and AMD will explain everything to us. They may even declare that this is how it should be, that it is meant to work this way, that it is a trick against hackers so that passwords cannot be broken by brute force.

But personally, I would prefer to get my money back, even though I have grown used to x86. I do not know whether INTEL and AMD shares will now go up or down, but in I&A's place I would have bought this article in February, and you would not be reading it now. But what's done is done. What would you do in INTEL and AMD's place? And what do their main shareholders think about this? Nothing personal, just business?

----------------cut+line_4---------------

Here is the program:

; MHZ.asm for Fasm

;(c) Tereshkov Oleg Evg. 2014

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

format PE console

entry start

;=========----------------------------------------->

section '.code' code readable executable

start: nop

lbl0: rdtsc ; start reading of the time-stamp counter, in edx:eax

mov [vr_temp_edx_1],edx

mov [vr_temp_eax_1],eax

;=========----------------------------------------->

lbl1:

call [clock]

mov [temp0], eax

push 1000d

call [Sleep]

;=========----------------------------------------->

lbl2: call [clock]

sub eax, [temp0]

push eax

push temp_s

call [printf]

add esp, 8

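rdtsc ; end reading of the time-stamp counter

sub eax,[vr_temp_eax_1] ; 64-bit end - start: low dword first,

sbb edx,[vr_temp_edx_1] ; then high dword minus the borrow from the low dword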

mov [vr_temp_edx_2],edx

mov [vr_temp_eax_2],eax

mov eax,edx

call htd

mov eax,[vr_temp_eax_2]

call htd

call prp_d

mov [vr_temp_dec_3],eax

call prp_d

mov [vr_temp_dec_2],eax

call prp_d

mov [vr_temp_dec_1],eax

call prp_d

mov [vr_temp_eax_2],eax

call prp_d

mov [vr_temp_edx_2],eax

mov ebx,vr_temp_edx_2 ; 20-digit string (edx digits, then eax digits); skip leading '0's

fnd0: mov eax,[ebx]

cmp al,30h

jnz fnd0_2

inc ebx

jmp fnd0

fnd0_2: push ebx

call [printf]

add esp, 4

jmp lbl0

prp_d: pop ebx ; save return address; pop the next four digit characters and pack them into eax in printing order

pop edx

mov ah,dl

pop edx

mov al,dl

rol eax,16

pop edx

mov ah,dl

pop edx

mov al,dl

push ebx

ret

;=========----------------------------------------->

htd: pop edx ; save return address; push the ten ASCII digits of eax, most significant first

xor ebx,ebx

htd1: sub eax,3B9ACA00h

jb htd1_2

inc ebx

je htd1_3

jmp htd1

htd1_2: add eax,3B9ACA00h

htd1_3: add ebx,030h

push ebx

xor ebx,ebx

htd2: sub eax,5F5E100h

jb htd2_2

inc ebx

je htd2_3

jmp htd2

htd2_2: add eax,5F5E100h

htd2_3: add ebx,030h

push ebx

xor ebx,ebx

htd3: sub eax,989680h

jb htd3_2

inc ebx

je htd3_3

jmp htd3

htd3_2: add eax,989680h

htd3_3: add ebx,030h

push ebx

xor ebx,ebx

htd4: sub eax,0F4240h

jb htd4_2

inc ebx

je htd4_3

jmp htd4

htd4_2: add eax,0F4240h

htd4_3: add ebx,030h

push ebx

xor ebx,ebx

htd5: sub eax,186A0h

jb htd5_2

inc ebx

je htd5_3

jmp htd5

htd5_2: add eax,186A0h

htd5_3: add ebx,030h

push ebx

xor ebx,ebx

htd6: sub eax,2710h

jb htd6_2

inc ebx

je htd6_3

jmp htd6

htd6_2: add eax,2710h

htd6_3: add ebx,030h

push ebx

xor ebx,ebx

htd7: sub eax,3E8h

jb htd7_2

inc ebx

je htd7_3

jmp htd7

htd7_2: add eax,3E8h

htd7_3: add ebx,030h

push ebx

xor ebx,ebx

htd8: sub eax,64h

jb htd8_2

inc ebx

je htd8_3

jmp htd8

htd8_2: add eax,64h

htd8_3: add ebx,030h

push ebx

xor ebx,ebx

htd9: sub eax,0Ah

jb htd9_2

inc ebx

je htd9_3

jmp htd9

htd9_2: add eax,0Ah

htd9_3: add ebx,030h

push ebx

mov ebx,eax

add ebx,030h

push ebx

push edx

ret

;=========----------------------------------------->

;//-------------idata-------------//

section '.idata' import data readable writeable

library Msvcrt,'Msvcrt.dll',\

kernel32,'kernel32.dll'

Msvcrt:

import printf,'printf',\

clock, 'clock'

kernel32:

import Sleep,'Sleep'

;//-------------data-------------//

section '.data' data readable writeable

temp0 dd 0h

temp_s db 25h,64h,0ah,0

vr_temp_edx_0 dd 0h

vr_temp_eax_0 dd 0h

vr_temp_edx_1 dd 0h

vr_temp_eax_1 dd 0h

vr_temp_edx_2 dd 0h

vr_temp_eax_2 dd 0h

vr_temp_dec_1 dd 0h

vr_temp_dec_2 dd 0h

vr_temp_dec_3 dd 0h

vr_temp_dec_4 db 00ah,000h

vr_temp_ebx_1 dd 0h

vr_temp_esp_1 dd 0h

It measures the frequency of the processor. I keep getting two different values, approximately 1833653534 and 2461100546. And you? Which of them is right? What does that prove? Take the arithmetic mean over a year? Well, well, that is reasonable.
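Two details of MHZ.asm are easier to see in a high-level model. First, the 64-bit difference of two RDTSC readings held in edx:eax needs only a sub on the low dword followed by an sbb on the high dword; negating the low dword whenever it wraps yields 2^32 − delta instead of delta. The readings above fit that pattern: 2^32 − 1833653534 = 2461313762, within about 0.01% of the second value, and the low 32 bits of the counter wrap every 2^32 ticks, roughly every 2.3 s at 1.83 GHz, so a window of a little over one second crosses a wrap in a good share of runs. Second, htd and prp_d print the ten decimal digits of edx followed by the ten of eax. The Python sketch below models both; the helper names and the example readings are hypothetical, and it models the arithmetic rather than translating the program.

```python
# Python model of the arithmetic in MHZ.asm (helper names are hypothetical).
# RDTSC readings are kept as two 32-bit halves: edx (high) and eax (low).

MASK32 = 0xFFFFFFFF
POWERS = [10**k for k in range(9, 0, -1)]  # 1000000000 ... 10, as used by htd


def tsc_delta(start_hi, start_lo, end_hi, end_lo):
    """64-bit end - start: subtract the low dwords, then the high dwords
    minus the borrow (what sub eax / sbb edx compute)."""
    borrow = 1 if end_lo < start_lo else 0
    lo = (end_lo - start_lo) & MASK32
    hi = (end_hi - start_hi - borrow) & MASK32
    return (hi << 32) | lo


def tsc_delta_negate_on_borrow(start_hi, start_lo, end_hi, end_lo):
    """Same, but the low dword is negated whenever it wrapped
    (sub / jae / neg eax / sbb); neg leaves CF = 1 for a nonzero eax."""
    borrow = 1 if end_lo < start_lo else 0
    lo = (end_lo - start_lo) & MASK32
    if borrow:
        lo = (-lo) & MASK32
    hi = (end_hi - start_hi - borrow) & MASK32
    return (hi << 32) | lo


def mhz(ticks, elapsed_ms):
    """Clock estimate in MHz: ticks per microsecond."""
    return ticks / (elapsed_ms * 1000)


def ten_digits(value):
    """Ten ASCII digits of a 32-bit value, by repeated subtraction as in htd."""
    if not 0 <= value <= MASK32:
        raise ValueError("value must fit in 32 bits")
    out = []
    for p in POWERS:
        d = 0
        while value >= p:
            value -= p
            d += 1
        out.append(chr(0x30 + d))
    out.append(chr(0x30 + value))  # the remainder is the units digit
    return "".join(out)


def printed(edx, eax):
    """What the program prints: digits of edx, then of eax, leading '0's skipped."""
    return (ten_digits(edx) + ten_digits(eax)).lstrip("0")


if __name__ == "__main__":
    start = (5 << 32) | 0xF0000000          # hypothetical reading near a wrap
    end = start + 1833653534                # about one second at 1.83 GHz
    args = (start >> 32, start & MASK32, end >> 32, end & MASK32)
    good = tsc_delta(*args)
    bad = tsc_delta_negate_on_borrow(*args)
    print(printed(good >> 32, good & MASK32), round(mhz(good, 1000), 1))
    print(printed(bad >> 32, bad & MASK32))
```

The model also shows a limit of the printing routine: for edx = 1 and eax = 0 it produces 10000000000 rather than 4294967296, so the output equals the true 64-bit value only while edx is 0, which holds for a delta of about one second.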

----------------cut+line_5---------------

Conclusions:

1. I, O.E. Tereshkov, have identified and proved a new bug in the x86 processor. It manifests itself as an unpredictable slowdown and unpredictable program output, depending on the order and the values of the variables in the final code, and also just like that, for no apparent reason at all.

2. No entity in the universe, divine ones included, and no whore in the world knows what actually happens inside x86.

3. One two-stage RISC with three dozen internal registers is better than all of Intel/AMD combined. RISC is deterministic and predictable. Intel/AMD is a dark, dense forest.

4. Giving robots the ability to influence human life in any way is the height of stupidity, idiocy, cruelty and inhumanity. Whoever does so is subhuman. Remember the automatic speed cameras that shower drivers with endless fines out of nowhere.

A small lump of silicon 5x5 mm in size is everything; human well-being is nothing. Just think: a lump of brainless 5x5 mm silicon writes you fines! The policemen are only its appendage. And you obey it, go and pay. Who are you then? A human? The image and likeness of God?

- Cattle! Chickens and cows, there to be milked and to lay eggs. Perhaps you deserve to disappear.

Mankind signed its own death warrant long ago, and it is called hopelessness. Robots have long controlled us; they rob us and beat us down. Remember Terminator? Time to sue Intel and AMD.

5. In light of the above, any attempt to compare x86 compilers by the speed of the code they produce is absolute nonsense. Such a comparison is absurd.

6. Makers of synthetic benchmarks for x86 should go bankrupt: they know nothing. AMD and Intel will have to undertake an unprecedented replacement of all the hardware they have managed to sell and foist on naive fools over the past 35 years, since 1980.

7. This discovery has no value for individual users who simply browse the Internet. Who cares whether a browser opens in 1/2000000000 or 5/2000000000 of a second? Notepad and Paint will keep working fine.

8. This discovery is of great importance for institutes and universities that do research on x86, if there are any. There is a big difference between getting results in one year or in five, in ten years or in fifty. And will they even be correct? Who will agree to wait that long and pay for bullshit in the end?

9. This discovery is of great importance for employers who pay wages to employees who sit polishing their pipes while the x86 grinds away uselessly. Remember: 5 times! Time to sue Intel and AMD.

10. This discovery is of great value wherever audio and video are rendered: recording, film, television and radio studios. Who needs overtime?

11. This finding is of great importance for the Russian Space Agency, which punctually drowns its launch vehicles in the Pacific Ocean. By the way, how much x86 does it use? And what about NASA, whose satellites overshoot planets? Time to sue Intel and AMD.

12. I am not even talking about brute-force password cracking on a processor that is five times slower: another fairy tale for naive fools.

13. Recently Apple, too, crawled over to x86. Congratulations. No wonder they charged you so much for your new Mac! There you see the vision and genius of Steve Jobs. How he got you! Now and for ever and ever.

14. Forget the endless miles of books and scientific articles on x86 code optimization, and wipe your asses with them. x86 does not read these books. I, O.E. Tereshkov, have lit your path and widened your horizons. The simplest program I offered you has only 22 numeric variables. By the way, how many of you know the number of arrangements of 22 elements taken 22 at a time? You cannot try them all; there are enough for your children and grandchildren. It is time to expand the testing departments and to write special programs that automate this work. But given the newly discovered specifics of x86, however you do it, it will take even longer.

15. It is time to stop spitting and shitting in the direction of Windows. Building an industry standard on such nonsense as x86 is high art. What can you yourself do, spitting or not?

16. Writing self-modifying code for x86 is suicide. Microsoft, and later Linux, apparently knew everything. Indirect evidence of this is the PE file format, with a separate section for variables: data. From the very beginning Intel and AMD apparently revealed to Microsoft and Linux the secret of how to lay out code and variables in an executable file, so that WINDOWS never slows down as godlessly as REACTOS usually and shamelessly does. Whatever tricks you use, your program will never run as fast as ntoskrnl.exe. Not because you are stupid or not in ring 0, but because it was conceived that way from the start. A tempting prospect! You are always second-rate, and all the orders go to them. Unfair competition, monopoly.

17. Given that the bug is common to all x86 models, and that an x86 is itself a microcomputer, with its own internal processor and microcode, it is logical to assume a built-in password key. If a program presents it, it runs fast. If not, it is godlessly slow: 1.83 GHz / 5 = 366 MHz instead of 1833 MHz for an AMD Athlon (tm) XP 2500+ 1.83GHz. For this reason, almost all sound-recording programs are worthless.

18. Apparently it is because of this x86 passkey that Microsoft, from the very beginning of Windows, has imposed VC++ on everyone. Just look at the prologue and epilogue clutter that VC++ generates. Hiding a passkey in such a mass of unnecessary and senseless waste is as simple as it gets. Only with Microsoft's key will software fly; with yours it will crawl like a dead turtle. That is why even the best-known hardware manufacturers bought their device drivers from Microsoft. They have the skills, but not the key, so their own drivers are no good anywhere. It took the efforts of many people to bring assembler back to Windows. It is thanks to assembler that this article was written.

19. It is possible that there is a variant of x86 for the Pentagon and for ours, free of the errors described, and another variant for all the other fools. Congratulations. Which camp are you in personally? And me?

20. Shit (x86) is shit, in America too.

21. This article was offered for review to lisa.su@amd.com, andy.bryant@intel.com, brian.krzanich@intel.com, harry.wolin@amd.com, mark.papermaster@amd.com. Not one of them got off their stool or mumbled anything in response. That is probably what they are paid millions for. If they had a little more intelligence, you would not be reading this article now, and I would be living in Miami. But where have you ever seen a clever boss? Where? At Intel? At AMD? No wonder x86 is the way it is.

22. This work is quite enough for a Ph.D., a doctorate, and the Nobel Prize too. But which Academy of Sciences will give them to me? Science, like industry, has nothing to do with real life. Only the loot!

23. Who needs all this if it does not work the way it should? INTEL and AMD each went their own long way, yet in the end each supposedly produced a CPU with exactly the same bugs as its "competitor". Well, who are you trying to screw?

24. Recently, for $27, I bought a DELL INSPIRON 510m with an INTEL MOBILE 1.70GHz inside. But it runs at 1.70GHz for only 15-20 seconds after starting and then drops to 597MHz. Who can tell me why? DELL? INTEL?

25. Go on, tell us it was the aliens who framed you again. Like on the Moon. Stupid fuckers. Adieu.
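Item 14 asks for the number of arrangements of 22 variables taken all 22 at a time, that is 22!. A short Python check of the scale, assuming (hypothetically) one build-and-time run per second without pause and counting only the order of the declarations, not their values; the helper names are illustrative:

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def orderings(n):
    """Number of distinct orders of n declarations: n! (placements of n by n)."""
    return math.factorial(n)


def years_to_try(n, runs_per_second=1.0):
    """Years needed to build and time every order at the given rate."""
    return orderings(n) / runs_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{orderings(22):,} orders of 22 variables")
    print(f"about {years_to_try(22):.2e} years at one run per second")
```

22! = 1124000727777607680000, which at one run per second is about 3.6 × 10^13 years, more than two thousand times the roughly 13.8 billion years the universe has existed; even a million runs per second would still take about 36 million years.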

PS. That the Russian magazine Hacker missed out is all understandable. But for INTEL and AMD the offer still stands. The founding fathers can still acquire the copyright, at the February price, despite inflation.

Copyright: (c) Oleg E. Tereshkov December 2014. This article belongs to Oleg E. Tereshkov in every sense. Any use, printing, copying or reproduction of it is allowed only after prepayment and with my written consent. This is in the spirit of Microsoft, Intel and AMD.

Love you all! And now go, with faith and enthusiasm, and program new human DNA on x86, change the human genome. Forward!

Yes, and of course, you are always welcome at INTEL and AMD. Go ahead. Believe me, I know this better than anyone.

Open letter to Intel&Amd of November 28, 2015:

Hi, Intel&Amd. Over the past week the page https://sites.google.com/site/excelmidi/ was viewed by more than seven hundred programmers from all around the world, among them such well-known ones as Fred (PureBasic), hutch (Masm32), Tomasz Grysztar (Fasm), Jacob Navia (LCC32), Andrew Stuart Tanenbaum (Minix), Linus Benedict Torvalds (Linux) and William Henry Gates (Windows). No objections were raised, and everything was confirmed.

If this is a joke, it is not funny.

We have spent years on education, and more years gaining experience. We work hard, bringing our programs to perfection to give users maximum comfort and performance.

Therefore this new surprise with the placement of variables is discouraging and makes us want to give up. Back to the classified segment registers? We are ready. Well, show us.

We are waiting for an explanation. Programmers around the world.

To illustrate our words, here are two simple programs, quick.asm and slow.asm, identical except for where the data is declared (before the code in quick.asm, after it in slow.asm). Their speed differs by a factor of 94 on an AMD Athlon (tm) XP 2500+ and by a factor of 30 on an Intel (R) Pentium (R) M.

And it's not funny.

So if you, as a programmer, cannot wait for your program to finish, stock up on patience. Go to the cinema, take a week's vacation or switch to other projects, and the results will come. Or simply swap the variables.

If you do not have time to wait, replace the line lbl0: mov eax, 4f000000h with lbl0: mov eax, 004f0000h.

;Program quick.asm for Fasm

;(c) Oleg E. Tereshkov 2015

format PE console

entry start

;//-------------macro-------------//

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

;//------------endmac-------------//

;//-------------idata-------------//

section '.idata' import data readable writeable executable

library crtdll, 'crtdll.dll'

crtdll:

import printf,'printf',\

clock, 'clock'

;//-------------data-------------//

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 3

temp_s db 25h,64h,0ah,0

;//-------------code-------------//

start: nop

lbl0: mov eax, 4f000000h

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

call [clock]

mov [temp0], eax

lbl1: mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp11]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp11]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp11]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp11]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp11]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp11]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp11]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp11]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp11]

mov [temp10], eax

cmp eax, 00h

jg lbl1

call [clock]

sub eax, [temp0]

push eax

push temp_s

call [printf]

add esp, 8

jmp lbl0

;Program slow.asm for Fasm

;(c) Oleg E. Tereshkov 2015

format PE console

entry start

;//-------------macro-------------//

macro library [label,string]

{ forward

local _label

dd 0,0,0,rva _label,rva label

common

dd 0,0,0,0,0

forward

_label db string,0 }

macro import [label,string]

{ forward

local _label

label dd rva _label

common

dd 0

forward

_label dw 0

db string,0 }

;//------------endmac-------------//

;//-------------idata-------------//

section '.idata' import data readable writeable executable

library crtdll, 'crtdll.dll'

crtdll:

import printf,'printf',\

clock, 'clock'

;//-------------code-------------//

start: nop

lbl0: mov eax, 4f000000h

mov [temp1], eax

mov [temp2], eax

mov [temp3], eax

mov [temp4], eax

mov [temp5], eax

mov [temp6], eax

mov [temp7], eax

mov [temp8], eax

mov [temp9], eax

mov [temp10], eax

call [clock]

mov [temp0], eax

lbl1: mov eax,[temp1]

sub eax, [temp11]

mov [temp1], eax

mov ecx,[temp2]

sub ecx, [temp11]

mov [temp2], ecx

mov edx,[temp3]

sub edx, [temp11]

mov [temp3], edx

mov eax,[temp4]

sub eax, [temp11]

mov [temp4], eax

mov ecx,[temp5]

sub ecx, [temp11]

mov [temp5], ecx

mov edx,[temp6]

sub edx, [temp11]

mov [temp6], edx

mov eax,[temp7]

sub eax, [temp11]

mov [temp7], eax

mov ecx,[temp8]

sub ecx, [temp11]

mov [temp8], ecx

mov edx,[temp9]

sub edx, [temp11]

mov [temp9], edx

mov eax,[temp10]

sub eax, [temp11]

mov [temp10], eax

cmp eax, 00h

jg lbl1

call [clock]

sub eax, [temp0]

push eax

push temp_s

call [printf]

add esp, 8

jmp lbl0

;//-------------data-------------//

temp0 dd 0h

temp1 dd 0h

temp2 dd 0h

temp3 dd 0h

temp4 dd 0h

temp5 dd 0h

temp6 dd 0h

temp7 dd 0h

temp8 dd 0h

temp9 dd 0h

temp10 dd 0h

temp11 dd 3

temp_s db 25h,64h,0ah,0
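Both listings execute the same instructions; they differ only in whether the data block is declared before the code (quick.asm) or after it (slow.asm). A small Python model of the loop, with hypothetical helper names, gives the amount of work per printed line, so the printed clock() difference can be turned into a time per pass, assuming clock() counts milliseconds as it does in the Microsoft C runtime (CLOCKS_PER_SEC = 1000). It also shows why lbl0: mov eax, 004f0000h makes a run about 256 times shorter: 4f000000h is exactly 100h times 004f0000h.

```python
# Model of the lbl1 loop shared by quick.asm and slow.asm (names hypothetical).
# lbl0 loads a start value into temp1..temp10; every pass subtracts
# temp11 (= 3) from each counter, and "cmp eax, 00h / jg lbl1" repeats
# the body while temp10 is still greater than zero.

def passes(start, step=3):
    """How many times the lbl1 body runs (it always runs at least once)."""
    if start <= 0:
        return 1
    return -(-start // step)  # ceiling of start / step


def memory_ops(start, counters=10):
    """Loads and stores per run: each counter line loads the counter,
    loads temp11 inside sub, and stores the result."""
    n = passes(start)
    return {"loads": 2 * counters * n, "stores": counters * n}


def ns_per_pass(elapsed_ms, start):
    """Turn a printed clock() difference (assumed ms) into nanoseconds per pass."""
    return elapsed_ms * 1_000_000 / passes(start)


if __name__ == "__main__":
    for start in (0x4F000000, 0x004F0000):
        print(hex(start), passes(start), memory_ops(start))
```

With the original start value the body runs 441800022 times per printed line, about 4.4 billion stores; dividing each printed time by that count makes results from the two programs, or from different CPUs, directly comparable.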

Addendum of 13 June 2016. You get what you fought for. The ability to wait is the other side of swiftness. I deliberately took a six-month pause to dot all the i's.

Intel and AMD, as expected, keep an icy silence, and they always will. They have nothing to say. All the more so since they do not want to replace all the junk they have sold over the last 26 years, junk that is at times unpredictable and, to put it mildly, not very productive.

This convincingly demonstrates that neither Intel nor AMD controls the situation; they merely reproduce and distribute. And the joke that x86 is copied from the aliens' tracing paper is not even a joke but the prose of life.

I, of course, did not sit with folded hands, and in between other business, as planned, I managed to take part in the Bell Labs Prize. The Bell Labs Prize team, as one would expect in such sensitive cases, chose to play dumb and at first did not notice my proposal to investigate, together with Bell Labs, the problem described in this article.

After my fair and very polite remark that, in the end, they could not be so obtuse, the Bell Labs Prize team held a second consultation and rejected me with wording to this effect: the material described here could cause irreparable damage to the future of the entire computer industry.

Well, now we can rest easy about the future of the computer industry. With such a cool x86 at its base, it will not disappear. But you must admit: if not a Win, then at least it is Recognition.

The very Bell Labs, not loudly but softly and shyly, not officially but privately, acknowledged the existence of a new x86 bug, and even stood up for the future of the entire computer industry. Commendable.

Of course, if I were not me but some Knuth, Ritchie or Kernighan, then maybe everything would be different, and Bell Labs would once again be trumpeting its new achievements at every crossroads. As it is...

Tell me without thinking: 400 million people live in Europe and the United States, and 3 billion in India and China. How many Nobel laureates does each have?

Of course, the days of Ritchie and Kernighan at Bell Labs are long gone. Today the glory of Bell Labs is people like Jan Hendrik Schön. So, as you know, pinning any hopes on Bell Labs today is just silly. Incidentally, their Plan 9 OS has not even booted here for probably the last 10 years. Funny, isn't it?

Well, here is a quite childish assumption: Bell Labs and the Bell Labs Prize are funded by Intel and AMD. In that case, as they say, no comment is needed.

And what is to be done? If it is summer, pick berries and make jam from them. If it is winter, drink tea with that jam. x86 does not qualify as an object of worship. If you are a pro, try to accept and overcome the despondency and natural disappointment; you have no way out. If you are an enthusiast, think of programming as wholesome entertainment that improves you.

It took me 3 days to stumble on this bug. For Intel and AMD to acknowledge its existence and explain it in any way will take 30,000 years. Treat x86 the way you treat rolls of toilet paper, a pack of cigarettes or sanitary napkins: as it deserves, as a consumable. And then, finally, everything will fall into place.

Admittedly, without computers life would be much grayer, duller, more brutish and less interesting, but on the other hand more natural, lively, calm and human, and you would live to a hundred.

Viva Google, viva Intel and AMD, hello to the future of the entire computer industry, and success to all of us. Special thanks to Marcus from the Bell Labs Prize team. Love you. The future of the entire computer industry will remember you. Always!

Update of November 16, 2017. A gift for all those who have read this far but, for various reasons, could not understand the examples above and draw the right conclusions for themselves. Intel and Amd are scoundrels, of course. But the article should be of some use, right? Therefore:

If you want your program to run fast, then in the general case the order of its sections should be exactly this:

.code

(.import data (*.dll))

.data

Even many authors of illustrious compilers and linkers do not know this. Now you do. :) And of course, no self-modification in critical blocks. :) Oh, how x86 dislikes it. Bye!
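The section order above keeps what is written at run time away from what is executed. In quick.asm and slow.asm both live in the single section '.idata', marked writeable and executable; quick.asm declares the variables before the code, slow.asm after it. Intel's optimization reference manual advises keeping code and writable data on separate pages and describes stores close to executing code being handled as possible self-modifying code, and PE sections are normally page-aligned in memory, which is one plausible reason the order of sections matters. The Python sketch below only checks that geometry: given the address range of a hot loop and the addresses of the variables it writes (taken, for example, from a disassembler; all values here are hypothetical), it lists the variables that share a 64-byte line, 1 KB sub-page or 4 KB page with the code. Whether a given CPU penalises each case is an assumption to verify, not something the sketch can tell.

```python
# Sketch: which run-time-written variables share an aligned block with the
# hot code? All addresses are inputs (e.g. read off a disassembler listing);
# the block sizes are parameters, not facts about any particular CPU.

def blocks(start, end, size):
    """Indices of the size-aligned blocks touched by bytes [start, end)."""
    return set(range(start // size, (end - 1) // size + 1))


def shared_with_code(code_start, code_end, written, size, width=4):
    """Names in `written` (name -> address of a `width`-byte variable)
    that touch any block also touched by the code range."""
    code = blocks(code_start, code_end, size)
    return [name for name, addr in written.items()
            if blocks(addr, addr + width, size) & code]


if __name__ == "__main__":
    # Hypothetical layout: loop at 0x1040..0x1100 with the counters declared
    # right after it, versus counters placed on a separate page.
    after = {f"temp{i}": 0x1100 + 4 * i for i in range(1, 11)}
    apart = {f"temp{i}": 0x3000 + 4 * i for i in range(1, 11)}
    for size in (64, 1024, 4096):
        print(size, len(shared_with_code(0x1040, 0x1100, after, size)),
              len(shared_with_code(0x1040, 0x1100, apart, size)))
```

In this made-up layout the counters declared right after the loop miss its cache lines but share its 1 KB sub-page and 4 KB page, while the counters on a separate page share nothing, which is the separation a dedicated data section provides.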