37.5 Prime MIPS on Linode VM to 42.4 MIPS. gvp-> was faster on the
PowerPC architecture when gvp was kept in a dedicated register, but
that does not apply to Intel.
Old:
Timing CPU, 20.0 ticks per second...
35.3 Prime MIPS for 16-bit ADD loop
40.0 Prime MIPS for 16-bit MPY loop
42.1 Prime MIPS for 16-bit DIV loop
21.4 Prime MIPS for 32-bit ADD loop
30.8 Prime MIPS for 32-bit MPY loop
28.6 Prime MIPS for 32-bit DIV loop
57.1 Prime MIPS for 16-bit X=0 loop
44.4 Prime MIPS for 32-bit X=0 loop
37.5 average Prime MIPS
New:
Timing CPU, 20.0 ticks per second...
42.9 Prime MIPS for 16-bit ADD loop
53.3 Prime MIPS for 16-bit MPY loop
47.1 Prime MIPS for 16-bit DIV loop
24.0 Prime MIPS for 32-bit ADD loop
38.1 Prime MIPS for 32-bit MPY loop
32.0 Prime MIPS for 32-bit DIV loop
57.1 Prime MIPS for 16-bit X=0 loop
44.4 Prime MIPS for 32-bit X=0 loop
42.4 average Prime MIPS
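
A minimal sketch of the two access styles compared above, assuming the
change replaced gvp-> pointer accesses with direct references to a
global structure (the text implies this but does not show it). The
names gv, gv_state and run_steps are hypothetical and only illustrate
the idea; the emulator's real global state is much larger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for the emulator's global state. */
struct gv_state {
    uint32_t instcount;
};

static struct gv_state gv;          /* direct global: address fixed at link time */
static struct gv_state *gvp = &gv;  /* pointer form: pointer must be loaded first */

/* Pointer form: cheap only if gvp lives in a dedicated register. */
static void step_via_pointer(void) { gvp->instcount++; }

/* Direct form: the compiler can address the field without loading gvp. */
static void step_direct(void) { gv.instcount++; }

/* Run n steps through each form; both update the same counter. */
static uint32_t run_steps(uint32_t n) {
    gv.instcount = 0;
    for (uint32_t i = 0; i < n; i++) {
        step_via_pointer();
        step_direct();
    }
    assert(gvp == &gv);  /* sanity check: both forms address the same state */
    return gv.instcount;
}
```

Both forms are functionally identical; which is faster depends on how
many registers the target has to spare, so it has to be measured per
architecture, as the numbers above were.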
added get32m & put32m: these always map
changed get32 and put32 to be inlined
changed mem[] references to MEM[] to allow experiments
tried using register for MEM pointer - not so great
tried using register for instcount - screwed up (very sluggish)
removed "char unmodified" from STLB; uses access[2] instead, to
avoid a multiply instruction in mapva (can use shift now)
use ea instead of pa when checking for page crossing in get32,
in preparation for read VA caching, like iget16 uses
changed gvp->prevppa from Prime memory offset to mem[] pointer
added inline to tch, tcr, adlr
added -DNOIDLE to make BDX use CPU cycles instead of sleeping
changed ea64v.h so ixy avoids branching (hot spot in Shark)
inlined and simplified iget16 instruction fetch
moved pio test to R-mode path
moved and simplified effective address calculation switch stmt
removed mode switch stmt for EA calcs, changed to cascaded if
moved iget16 static vars to gvp, for inlining
changed mapva and iget16 so that the normal path is predicted
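
The STLB change above (dropping "char unmodified" so mapva can shift
instead of multiply) can be sketched as follows. The field layouts are
hypothetical, since the real STLB structure is not shown here; they are
chosen only so that the extra char pushes the entry size past a power
of two. The helper names is_pow2 and offset_by_shift are likewise
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout WITH the separate flag: 17 bytes of fields,
   padded by the compiler to a multiple of the 4-byte alignment. */
struct stlb_old {
    uint32_t ppa;
    uint32_t seg;
    uint32_t procid;
    char     access[4];
    char     unmodified;
};

/* Hypothetical layout WITHOUT it: the flag is folded into access[2],
   so the fields add up to a power-of-two size on typical ABIs. */
struct stlb_new {
    uint32_t ppa;
    uint32_t seg;
    uint32_t procid;
    char     access[4];
};

/* True if n is a nonzero power of two. */
static int is_pow2(size_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* Byte offset of entry `index`, computed with a shift instead of a
   multiply. Only valid when entry_size is a power of two. */
static size_t offset_by_shift(size_t index, size_t entry_size) {
    unsigned shift = 0;
    assert(is_pow2(entry_size));
    while (((size_t)1 << shift) < entry_size)
        shift++;
    return index << shift;
}
```

With a power-of-two entry size the compiler can turn stlb[index] into a
shift and add on its own; the explicit helper only makes the arithmetic
visible.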