37.5 Prime MIPS on Linode VM to 42.4 MIPS. gvp-> was faster on the
PowerPC architecture when gvp was kept in a dedicated register, but
that does not apply to Intel.
Old:
Timing CPU, 20.0 ticks per second...
35.3 Prime MIPS for 16-bit ADD loop
40.0 Prime MIPS for 16-bit MPY loop
42.1 Prime MIPS for 16-bit DIV loop
21.4 Prime MIPS for 32-bit ADD loop
30.8 Prime MIPS for 32-bit MPY loop
28.6 Prime MIPS for 32-bit DIV loop
57.1 Prime MIPS for 16-bit X=0 loop
44.4 Prime MIPS for 32-bit X=0 loop
37.5 average Prime MIPS
New:
Timing CPU, 20.0 ticks per second...
42.9 Prime MIPS for 16-bit ADD loop
53.3 Prime MIPS for 16-bit MPY loop
47.1 Prime MIPS for 16-bit DIV loop
24.0 Prime MIPS for 32-bit ADD loop
38.1 Prime MIPS for 32-bit MPY loop
32.0 Prime MIPS for 32-bit DIV loop
57.1 Prime MIPS for 16-bit X=0 loop
44.4 Prime MIPS for 32-bit X=0 loop
42.4 average Prime MIPS
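
The two "average Prime MIPS" lines above can be cross-checked against the per-loop figures. This is a minimal sketch, assuming the average is a plain arithmetic mean of the eight loop results (the listing does not say how it is computed). The names old_mips, new_mips, mean and gain_percent are made up for this check and are not part of the emulator.

```c
#include <stddef.h>

/* Per-loop Prime MIPS figures copied from the "Old:" and "New:" listings. */
static const double old_mips[] = {35.3, 40.0, 42.1, 21.4, 30.8, 28.6, 57.1, 44.4};
static const double new_mips[] = {42.9, 53.3, 47.1, 24.0, 38.1, 32.0, 57.1, 44.4};

#define NLOOPS (sizeof old_mips / sizeof old_mips[0])

/* Arithmetic mean of n values (assumed averaging method). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage gain of the new average over the old one. */
static double gain_percent(void)
{
    double o = mean(old_mips, NLOOPS);
    double n = mean(new_mips, NLOOPS);
    return 100.0 * (n - o) / o;
}
```

Under that assumption the means come out at 37.4625 and 42.3625, which round to the 37.5 and 42.4 shown in the listings, a gain of roughly 13%.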
reworked ring/register fix so that Primos never sees RP faulted
but we don't have to do extra tests in the fetch loop
changed EAxxx routines to use RP segno when EA = register
added FP exception fault to ieeepr8 and all FP routines
added round flag to ieeepr8 (though not sure it's rounding correctly)
used gcov info to reorder some stuff in ea16s, ea32s, ea32r64r
changed warn() and fatal() to use get16t; prevpc might be a register
IMPORTANT NOTE: to compile with -O0, also use -DNOREG (gcc bug)
changed get/put(16,32)r to check for ring change and use regular
get/put call if possible, so brp supercache can be used
FUTURE: could add separate brp cache entry for R0 accesses
changed ea32r64r live register test so normal path is first
changed ea64v live register test so normal path is first
change ea32i to use INCRP macro instead of RPL++
added 2 new brp entries for FAR0/1; used in ldc/stc
changed IMA and IRS back to use get/put16t to make use of
supercache instead of always doing 1 mapva
changed apea() to use brp[RPBR] for AP fetch, UNBR for indirect
added -DNOMEM to remove -mem command option and testing
fixed bug in STTM causing sporadic CPU times (negative deltas)
Change tracing to show when supercache is used
Symbolic names for supercache entries
Added supercache entries for sector 0 and PB (different from RP)
Add invalidate_brp to invalidate supercache
Separate get16trap (WILL trap) from get16t (MIGHT trap)
Use supercache in get32 and various get/put routines
removed "char unmodified" from STLB; uses access[2] instead, to
avoid a multiply instruction in mapva (can use shift now)
use ea instead of pa when checking for page crossing in get32,
in preparation for read VA caching, like iget16 uses
changed gvp->prevppa from Prime memory offset to mem[] pointer
added inline to tch, tcr, adlr
added -DNOIDLE to make BDX use CPU cycles instead of sleeping
changed ea64v.h so ixy avoids branching (hot spot in Shark)
inlined and simplified iget16 instruction fetch
moved pio test to R-mode path
moved and simplified effective address calculation switch stmt
removed mode switch stmt for EA calcs, changed to cascaded if
moved iget16 static vars to gvp, for inlining
changed mapva and iget16 so that the normal path is predicted
added INCRP macro - now does 32-bit increments of RP for speed
added ADDRP macro to return RP incremented by n (CGT)
changed globals to static (didn't help speed much - thought it might)
moved around some functions
changed shift instructions to create bitmask at runtime (faster)
manually inlined mathexception (but used inline keyword in later revs)
changed get16/put16 to get16t/put16t where address trap might occur
this eliminates ea<0 test for all other non-trappable get16/put16 calls
changed crs & crsl to macros to reference a union vs 2 distinct variables
changed crs and RP to be register variables (regs.h)
fixed tape drive problems
for R-mode MPY, do the math THEN generate the exception
DIV exception handling was wrong, code was wrong too
a few devasr changes
changed devamlc to always turn off parity instead of flipping it
writes tracing to a buffered log file instead of stderr
added clear of first 32K of physical memory to master clear
changed all exit() calls to fatal() calls
more changes to the devmt tape driver
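
The crs/crsl entry earlier in this list (macros referencing a union instead of two distinct variables) can be illustrated roughly as follows. This is only a sketch of the general technique, not the emulator's actual code: the union name regfile, its members u16 and u32, the array sizes, and the helpers store_long and load_half are all hypothetical, and the real layout in regs.h is not shown in these notes.

```c
#include <stdint.h>

/* Hypothetical register file: the same storage viewed as
 * 16-bit halfwords or as 32-bit words. */
union regfile {
    uint16_t u16[32];
    uint32_t u32[16];
};

static union regfile rf;

/* Two names, one storage area: in the spirit of the crs/crsl macros. */
#define crs  (rf.u16)
#define crsl (rf.u32)

/* Store a 32-bit value through the long view. */
static void store_long(int i, uint32_t v)
{
    crsl[i] = v;
}

/* Read one halfword back through the 16-bit view. */
static uint16_t load_half(int i)
{
    return crs[i];
}
```

A store through crsl[1] is visible through crs[2] and crs[3] without any copying. Which halfword holds the high 16 bits depends on the host byte order, so real code has to account for that when the emulated machine and the host disagree.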