Chasing a subtle timing bug while on the road

On my trip I hauled the Digilent Nexys2 board that is the core of my 1130 machine, as well as a Digilent Spartan-3 board I use as a logic analyzer to capture signals while debugging. I had prepared sections of the IBM 1130 processor diagnostics programs to more exhaustively test the machine for correct operation.

The section that validated the multiply instruction was highlighting minor discrepancies in the second word of the result: the high-order 16 bits were correct but the low-order 16 bits were not. I began to step through the multiply examples using the single step (SS) capability of the 1130. The multiply has an I1 cycle (T0 to T7), an E1 cycle (T0 to T7) and then up to 16 E2 cycles, depending on the data value. I laboriously worked each step by hand alongside the machine in SS mode, and both of us came to the correct answer, not the failing value I saw when running the machine normally or in other modes. The other modes are SMC (one instruction cycle at a time, such as E1) and SI (an entire instruction from I1 to its end), depending on how much I want to execute with each push of the Program Start button.

In SS mode, each time I pressed the Program Start button, it would advance the T clock one step (actually, it provides a Phase A state while depressed and the ensuing Phase B when released). If the machine is doing extended T7 cycles then it just executes one more T7, whereas in other cases it advances the T clock too. SI, SMC and regular run mode were coming to the wrong answer, but SS mode stepping got it right. This told me I had a timing issue.

Using SMC mode, I quickly discovered that the E1 cycle, which was supposed to set up conditions but not actually change values, was altering the value in the extension register (Q) from 5555 to 1555. That error propagated through the calculation to yield the wrong answer I was seeing. It failed when doing the entire E1 cycle in SMC mode, but if I did each T clock step separately in SS mode, it worked correctly.

It took many conjectures, each followed by routing signals to the logic analyzer and capturing them, before I found the problem. In the E1 cycle, the value in the Q register is shifted to the right until the low order bit (Q15) is 1. For values such as our test case, which already had a 1 in Q15, it was supposed to simply end E1 quietly and let the E2 cycles do the rest of the multiply.

However, even though my low bit was 1, it wasn't turning off the shift state in time, causing the E1 cycle to shift the value until the next bit containing 1 was in the low position. 5555 shifted once was 2AAA with a low 0, shifted once more became 1555 and once again Q15 had a 1 value. If it had properly done two shifts, it should have also recorded that by decrementing the CCC register twice, but it did not since this was an error.
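To make the arithmetic concrete, here is a minimal Python sketch of the E1 pre-shift as described above. It is only an illustration, not the actual 1130 logic: the function name e1_shift and the stale_shift_ctrl flag are my own inventions, it models the Q register alone, it returns a shift count purely for illustration (the real machine failed to decrement CCC for the spurious shifts), and the guard against an all-zero Q is an assumption added so the loop always terminates.

```python
def e1_shift(q, stale_shift_ctrl=False):
    """Toy model of the E1 pre-shift on the 16-bit Q register.

    Q15 is the low-order bit. Returns (new_q, number_of_shifts).
    stale_shift_ctrl=True imitates the fault: one shift slips through
    before the "Q15 is already 1" condition can switch shifting off.
    """
    q &= 0xFFFF
    shifts = 0

    if stale_shift_ctrl:
        # Faulty behavior: shift once regardless of Q15.
        q >>= 1
        shifts += 1

    # Intended behavior: shift right until the low-order bit is 1.
    # The q != 0 guard is an assumption so the toy loop always ends.
    while q != 0 and (q & 1) == 0:
        q >>= 1
        shifts += 1

    return q, shifts


# Correct hardware: Q15 is already 1, so nothing moves.
print("%04X" % e1_shift(0x5555)[0])
# Faulty hardware: 5555 -> 2AAA -> 1555.
print("%04X" % e1_shift(0x5555, stale_shift_ctrl=True)[0])
```

With the flag off, 5555 comes back unchanged with zero shifts; with it on, the value passes through 2AAA and lands on 1555 after two shifts, which matches what I saw in SMC mode.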

SS mode allowed the shifting control signals time to reset, the machine did not shift the value 5555 at all, and everything was fine. In SMC, SI or full run mode, the machine began shifting before the shift control could switch off.

The flaw was in ALD pages KT101 and KT121, where the machine determines that this is a multiply, an E1 cycle and the value of Q15 is 1. I had attempted to reduce delays in the logic turning the Shift Ctrl FF on, but the effect was that it would now flip back on after being reset. I corrected the error in my implementation, but had to go back and validate that shifting operations still worked properly. With that proven out, I was able to drive on through further groups of tests, such as correct execution of divide instructions.
