ED@MIT-MC 10/24/79 00:42:03
To: (BUG ITS) at MIT-MC, (FILE [UCODE;UCODE BUGS]) at MIT-MC
On MC, single-instruction proceed still loses if the instruction page faults on fetch. Doing
  100/ jrst 2000
  2000/ jrst 4000
  ...
  30000/ jrst 0
  1000G 100>> JRST 2000
results in:
  0>> 0
note that 0, being swapped IN, doesn't execute.

MOON@MIT-MC 05/23/78 22:05:31
In case you were wondering, the very sporadic lossage where XCT @(P) executes a random illegal instruction still happens. Probably XCT FDTB(CH) still loses also. Sigh.

MOON@MIT-MC 05/08/78 14:56:33
REVISION 10 FCO SUMMARY
M8530-3 MAKE 'PHYS REF' ENABLE LOADING OF VMA<13-17> (HOORAY!)
M8524-4 MAKE 'COND/INSTR ABORT' IOR 'TRAP CYC N' INTO 'TRAP REQ N' (HOORAY!)
M8513-7 FIX CACHE PARITY ERRORS IN TOPS20 RELEASE 2 (HO HUM)
M8560-2 SBUS TERMINATION PROBLEMS (ZZZ)
M8558-3 DITTO (ZZZ)

MOON5@MIT-MC 03/10/77 20:26:58
Re: REVISION 9A CONTINUED
ALSO, THE M8563 DMC BOARD IS MODIFIED. DMC5 LOC D5 CHANGES LOGIC FOR 1 BUS MODE READ-PAUSE-WRITE (I THINK).

MOON5@MIT-MC 03/10/77 13:51:34
Re: REVISION 9A
(1) ON M8517 MB BOARD, BUFFER CCL CCW BUF WR (BIG DEAL)
(2) ON M8552 DTE20 DPS5 LOC B5 PUT DPS5 TO10 I BIT DIRECTLY INTO 7432 INSTEAD OF THROUGH FLIP FLOP.
(3) ON M8553 DTE20 CNT CHANGE AROUND TO11 BYTE COUNT LOGIC IN OBSCURE WAYS, ALSO DELETING SOME UNIBUS PARITY CRAP.
(4) ON M8554 DTE20 INT2 LOC B4, CLEAR CLK RUN ON TIMEOUT. ALSO REPLACES THIS 74H74 WITH ANOTHER ONE. INT1 TOP CENTER AND RIGHT GENERATE MST CLR FROM PWR FAIL FROM UBUS AC LO. ALSO CHANGE SIGN OF INPUT TO 9601 AT B3 ON INT2.
NO MICROCODE IMPLICATIONS. DOES THIS MAKE BYTE TRANSFER MODE WORK OR SOMETHING?

MOON5@MIT-MC 02/24/77 01:51:11
REVISION 9 (PUT IN TODAY) FIXES WRITING OF BAD PARITY INTO FAST MEMORY BY GLITCH AT PAGE FAIL (CLK SBR CALL).

MOON5@MIT-MC (Sent by ___002@MIT-MC) 02/24/77 01:15:22
INSTR AT PIBYTE+2 SHOULD HAVE MQ*4 NOT MQ*2
DOESN'T APPLY TO ONE-DTE20 MACHINE PROBABLY

MOON 1/28/77
tried the jpc code again.
jumpn tt,(f) caused a page fault with code of 0 when it shouldn't have (page jumped to was set up rd only in map in core). theory is that this is caused by more than 6 ticks between jump fetch and nicond. can't see right now how to bum jpc code to be faster.

MOON 9/28/76
[1] One/proceed page-fault problem will be fixed in Revision 10, by changing the SCD board to IOR the trap cyc flags back into the trap req flags instead of jamming them. This is a kludge, but it looks like it will work.
[2] Tonight teco has been bombing out in various ways. Once it looked like P was clobbered to 0, another time to a PC that had probably been POPJed back to. -- SEE NEXT, MAY BE FIXED

RMS@MIT-MC 09/17/76 22:16:12
Re: MC hardware lossage.
To: MOON at MIT-MC
An XCT FDTB(CH) with CH/ 123 executed something equivalent to 64240,,507665 instead of what was located at FDTB+123/ JRST FSET. There were about 60 words in the job that would address-compute to that value. FDTB 123= 27751. None of those words containing the suitable garbage were at addresses = 751 mod 1000, but 104751 was off by only 10. -- THIS WAS AN I.T.S. BUG, FIXED. (PFA6++)

MOON@MIT-MC 08/19/76 04:51:44
Re: One proceed lossage on KL
To: KLH at MIT-MC
CC: [UCODE;UCODE BUGS] at MIT-MC, GLS at MIT-MC
I've tracked this down. It happens whenever you one-proceed an instruction that jumps into (or drops into) a page that's not swapped in. There is this horrible design error in the machine where in this case the PC gets advanced to point to the instruction that lost, but the flags in LH(PC) [including one proceed] don't get advanced. So then the flags are one instruction behind, and you get "two proceed". You can probably get N-proceed by making a chain of jumps into successive swapped-out pages.
I don't see any easy (or even moderately hard) way to fix this in the microcode without breaking something else. (A few months ago I put in a change which it turns out was only fixing one symptom of this problem.)
We have some hardware changes due to be put in in a week or two that supposedly fix all outstanding problems; I'm going to try and find out if they fix this one or if not what can be done about it. -- SEE ABOVE, TO BE FIXED REV. 10.

MOON@MIT-MC 07/07/76 16:06:48
Re: HANGING AT PG4++ GETTING EBUS
SET PI CYCLE SETS CON INT DISABLE. CON CLR PI CYCLE, WHICH IS HOW PI CYCLES THAT DO BLKO AND PNTR DOESN'T OVERFLOW GET CLEARED, FAILS TO CLEAR CON INT DISABLE. NORMALLY THE NICOND ON THE FETCH OF THE INTERRUPTED INSTRUCTION CLEARS IT, BUT IF THERE IS A PAGE FAIL ON THAT INSTRUCTION, IT DOESN'T GET CLOCKED. HOWEVER, PAGE FAILS MUSTN'T CLEAR CON INT DISABLE, ONLY PAGE FAULTS. IDEALLY, IT SHOULD BE CLEARED AT PFT, BUT # ISN'T AVAILABLE. I'LL ADD AN ADDITIONAL INSTRUCTION AT PFT. -- FIXED UCODE

7/1/76
UPDATED TO DEC VERSION 126 (RELEASE 3, REV 8)
TRIED HACKING JPC TO AVOID FM PARITY ERRORS, BUT I CAN'T WIN. LOSES ON AN AOBJP THAT JUMPS AND PAGE FAILS. SIGH. JUST GOTTA GET THE HARDWARE FIXED.

6/9/76
JRST 12, CHOOSES WHETHER TO SET THE USER OR THE EXEC JPC BY THE RESTORED USER-MODE FLAG RATHER THAN BY THE ONE AT THE BEGINNING OF THE INSTRUCTION. SEE BRJMP. [FIXED 6/9/76]

MOON@MIT-MC 06/09/76 03:48:51
TRIED FIXING THE ABORT INSTR LOSSAGE BY MAKING JRST 12, DO ARX_MEM BEFORE THE NICOND SO AS TO HAVE CONTROL OVER WHERE THE INSTR HAS TO GET ABORTED FROM. BUT SINCE THE FLAGS HAVE BEEN SET ALREADY WE BETTER SET THE PC TOO, BUT THAT BRINGS US RIGHT BACK TO THE SAME OLD LOSSAGE WITH ABORT INSTR. GRUMBLE! NEED SOME HACK HERE TO TOUCH THE FROB BEFORE SETTING THE FLAGS OR SOMETHING...
[WILL TRY DOING IT WITH THE SR INSTEAD. 6/9/76]
[COMPLETE WINNER, 6/10/76]

Date: 8 JUN 1976 0310-PDT
From: Jeff Rubin (JBR @ SU-AI)
Subject: FM Parity Mod
To: moon @ MIT-MC
In our machine there is an OR gate on the CON5 page producing CON FM WRITE PAR L. The input to this gate is on pin 9 and comes from E62(14) (CON FM WRITE 18-35 H).
This is changed so that pin 9 comes from -CLK SBR CALL L on pin FN1, pin 10 comes from E62(15) (CON FM WRITE 18-35 L) and pin 11 comes from -CON CLK B L. The output is changed to E57(7). The gate is now drawn as an active low AND. In order to free up connector pin FN1, CON MBOX WAIT H is deleted from FN1 (was not used). The terminator that was on E57(9) remains there and terminates -CLK SBR CALL L. The terminator on the CTL1 page on -CLK SBR CALL L is deleted. Finally, a twisted pair is added from CTL1 4A36F1 (-CLK SBR CALL L) to CON5 4F35N1.
I don't exactly see why they bothered to gate the signal with the clock since in our machine at least, it is already gated with the clock on the APR page at the input to the RAM. The real fix, I think, is that when the CLK page causes a CLK INSTR 1777, it causes the CRAM to thrash while the clock is low (i.e. CON FM WRITE PAR is enabled) and a glitch on the write parity signal gets through to the RAM.
-------

Date: 5 JUN 1976 2236-PDT
From: Jeff Rubin (JBR @ SU-AI)
Subject: FM parity errors
To: moon @ MIT-MC
I installed a mod that Eggers told me over the phone and it seems to have cleared up our FM parity error problem. We have only run for about 1 1/2 hours since installing the mod but no parity error in that time. If you want I can send you a description of the mod, although I'm not sure what differences there are between our machines in the same area.
-------

Date: 1 JUN 1976 2330-PDT
From: Jeff Rubin (JBR @ SU-AI)
Subject: KL10 ARX parity errors
To: moon @ MIT-MC
I noticed that your microcode ignores ARX parity errors. I recently discovered one cause of them and was wondering if you had discovered the same thing. Ours was caused by executing an instruction in the last location of a page when the mapping information for the next page is not yet in the page table ram and the A field of the D ram for the instruction is I-PF. It starts the prefetch and gets a page fault due to no PT DIR match.
But the EBOX continues to execute microcode and doesn't do an MB WAIT right away. When the MBOX gets the page fault it sets CLK MBOX RESP which loads the ARX with a bad parity zero and also sets CON ARX LOADED. There is a term called CLK INSTR 1777 which attempts to hold off CON ARX LOADED, but it isn't set until the EBOX does an MB WAIT which causes CLK PAGE FAIL which eventually sets CLK INSTR 1777. I called Eggers and he said he hadn't heard of this bug before.
Yes, indeed, we have made quite a bit of progress running the system on the KL. We are currently being hacked by some (probably software) bug having to do with turning on cacheing for a job and by FM parity errors. Eggers told me of one way that he heard FM parity errors can be caused, but I haven't checked it out yet.
-------

KLH@MIT-AI (___022) 06/04/76 03:00:16
PUSHJ P,FOO AND ^N PROCEEDED FOR TWO INSTRUCTIONS!!

MOON5@MIT-MC 06/04/76 00:01:29
ONE PROCEED DOES INFINITE PROCEED IF PAGE FAULT ON THE INSTRUCTION. THIS IS A HARDWARE BUG - PGF4+20 OR SO DOES "ABORT INSTR", WHICH RESETS THE TRAP FLAGS IN THE PC FROM THE TRAP CYCLE FLAGS. HOWEVER, IF THE PAGE FAULT WAS ON THE INSTRUCTION BEING FETCHED, THE DISP/NICOND FAILED TO CLOCK THE TRAP CYCLE FLAGS FROM THE PC FLAGS. THE TRAP REQUEST BITS IN THE PC FLAGS WERE STILL SET, BUT THE ABORT INSTR CLEARED THEM. ABORT INSTR SHOULD NOT BE DONE IF THE PAGE FAULT CAME FROM NICOND. I'M NOT SURE HOW ONE DETECTS THAT.
ONE DETECTS THAT BY LOOKING AT THE "FETCH" BIT IN VMA. SKIPPING THE ABORT INSTR IS HARD TO DO BECAUSE PFT MIGHT DECIDE TO TAKE AN INTERRUPT INSTEAD. I TRIED TO UNABORT THE INSTR BY DOING A DISP/NICOND WITH 1111 IN LOW J, BUT THAT CLOBBERED THE PC TO THE ADDRESS OF THE PTW. THIS MAY REQUIRE A HARDWARE MOD TO FIX.
[FIXED 6/9/76]

MOON@MIT-MC 05/30/76 16:10:30
Re: HANGING WAITING FOR EBUS
THIS APPEARS TO BE DUE TO ATTEMPTING TO GRAB THE EBUS TO TURN OFF THE PI TO TAKE A PAGE FAULT WHILE CON INT DISABLE IS SET, BUT CON PI CYCLE IS NOT SET.
THE PI BOARD THINKS IT'S OK TO INTERRUPT, BUT THE CON BOARD THINKS IT ISN'T, SO THE PI BOARD GRABS THE EBUS TO GIVE AN INTERRUPT, BUT THE CON BOARD WON'T ADMIT TO THE MICROCODE THAT THERE IS AN INTERRUPT, SO UCODE HANGS WAITING FOR THE EBUS TO BECOME AVAILABLE. AN EXTRA INSTRUCTION "CLR INTRPT INH" AT "PIFET" IN "IO" HAS BEEN PUT IN AND WILL PROBABLY FIX THIS. (I THINK IF YOU TAKE A DTE OR METER INTERRUPT, GO TO PIFET, PAGE FAULT ON THE FETCH OF THE INSTRUCTION INTERRUPTED OUT OF, AND TAKE ANOTHER INTERRUPT JUST THEN, WAS WHEN IT LOST.)
[FIXED 5/30/76]

MOON@MIT-MC 05/30/76 16:06:56
Re: POP AC,AC LOSES
FIXED 5/30/76.

MOON@MIT-MC 05/16/76 00:03:50
Re: MOVN CAUSES BAD PAGE FAILS
FIXED BY KLUDGING MOVN, MOVNI TO BE DIFFERENT FROM MOVNM AND MOVNS AND SAVE A CYCLE (SIMILARLY FOR MOVM, MOVMI)

moon@MIT-MC 02/17/76 17:04:24
To: [UCODE;UCODE BUGS] at MIT-MC
CC: MOON at MIT-MC
When get no valid match on page fail, try clearing 8P instead of 4P so as to get both valid bits set. Jud thinks this may neutralize the paging bug.
[Done, 5/9/76]

MOON5@MIT-MC 02/13/76 23:19:33
OCCASIONAL LOSS OF TRAPS
[One proceed not setting %PSINH, lose if page fault, fixed 5/9/76]

MOON@MC 02/12/76 02:22:43
THE JPC FEATURE CAUSES FM PAR TO BE WRITTEN WRONG AT "AOJJPC:". PUTTING IN A TIME/3T MAKES IT HAPPEN GROSSLY OFTEN. I SUSPECT THE LOSS INVOLVES A PAGE FAIL ON THE ALREADY-STARTED FETCH OF THE NEXT INSTRUCTION. SEE (CON4) CON AR 36, WHICH SHOULD BE ZERO DURING WRITING INTO FAST MEMORY. WE'LL SEE WHAT HAPPENS IN REV 7.
[Still loses, I think, 5/7/76]

MOON@MC 02/11/76 18:30:35
To: [UCODE;UCODE BUGS] at MC
CC: BUG-ITS at MC
CURRENTLY INSTALLED UCODE IS MISSING JPC FEATURE FOR SOME REASON. ALSO, CAN'T ONE-PROCEED THROUGH A UUO THAT SYSTEM RETURNS TO USER. HOWEVER, ONE PROCEEDING THROUGH AN LUUO WORKS.
[Fixed]

MOON@MC 02/03/76 17:02:25
Re: PAGE LOSSAGE
1000-3000 LOSES
1000-3000-5000-7000 WINS
1000-3000-15000-17000 LOSES
THE PROBLEM IS THAT PAGE FAIL HOLD COMES ON AND GENERATES PT DIR CLR, WHICH ENABLES BOTH VALID BIT RAMS (PAG3). ABOUT 10-20 NANOSEC LATER, - PT MATCH COMES UP AND PF CODE 01 IN H COMES ON. THIS GETS STROBED INTO THE EBUS REGISTER BY LOAD EBUS REG (CSH3), WHICH LASTS 40 NANOSEC STARTING ABOUT 10 NANOSECONDS AFTER PF CODE 01 IN H COMES UP. THIS DERIVES FROM PAGE FAIL T2, WHICH IS TOO LATE.
AS FAR AS I CAN SEE, THE ONLY WAY TO FIX THIS IS THE ORIGINALLY-PROPOSED MOD TO THE PAG BOARD TO "AND" PAGE FAIL HOLD WITH CON KI10 PAGING MODE IN THE GENERATION OF PT DIR CLR. REV.7 DOESN'T CHANGE THE TIMING SUFFICIENTLY. DEC HAS BEEN INFORMED.

KNOWN BUGS/MISSING FEATURES IN ITS KL10 MICROCODE

FAST MEMORY PARITY LOSSAGE IN JPC MICROCODE. I PUT A TIME/3T AT AOJJPC WHICH APPEARS TO HAVE FIXED IT (WILL HAVE TO WAIT AND SEE IF ANY MORE FM PAR LOSSAGE HAPPENS.) IF SO, A TIME/3T MAY HELP WITH THE NXT INSTR LOSSAGE, TOO. SEE MOON;CACHE LOSSAG. -- FIXED

PAGE LOSSAGE - 1000/ JRST 2000 2000/JRST 1000 DOES PAGE REFILL EVERY 5 MICROSEC. CAN'T HAVE EVEN/ODD PAIR IN PT AT SAME TIME. - PT MATCH IS COMING UP. AS FAR AS I CAN TELL ALL THE RIGHT THINGS ARE GOING INTO THE PT DIR; UNFORTUNATELY NOT EVERYTHING COMES OUT TO THE BACKPLANE. FAILS ON BOTH OF OUR COPIES OF M8520.

M.A.R. DOESN'T WORK
REV.7 SHOULD FIX THIS BY MAKING 'PF EBOX HANDLE' WORK.

ONE PROCEED WORKS
BUT STILL UNSOLVED IS THE PROBLEM OF LOSING PDL OR ARITH OVERFLOW TRAP WHEN ONE-PROCEEDING.

J.P.C. -- WHEN TURNED ON, SYS WON'T RUN
APPARENTLY IT SCREWS UP WHEN IN USER MODE?
6 FEB 76 - SINGLE-JPC MODE WORKS! RING MODE HAS NOT BEEN TESTED.

VECTOR INTERRUPTS OF THE BLKO VARIETY PICK UP THE JSR FROM 40+2*N INSTEAD OF LOC+1. THIS HAS TO BE FIXED TO MAKE THE IMPTERFACE WORK.
--> THIS HAS BEEN FIXED BUT NOT TESTED.

TURN ON UCODE STATE 01 AND UCODE STATE 03 DURING PAGE FAIL

TROUBLE WITH IMULI? HAS THIS BEEN FIXED?