At least twice I have been bitten by studying documentation to understand the logic or the timing of a machine, when I didn't notice that the instance I was studying had been artificially slowed to produce a low end, low price point model.
The first time was when the T clock logic of my machine was first ready for debugging after implementing the machine according to the full set of ALDs online at bitsavers.org, from the machine named 1130C. The T clock steps through 280 nanosecond cycles from T0 to T7, which encompasses one storage access cycle of the 2.2 microsecond core memory (these timings are for the fastest of the 1130 models). Various conditions block or cause the T clock to advance, among them "extra" T7 cycles which are required in the 1130 when certain operations can't be completed by the end of the first T7 cycle.
Addition in an 1130 occurs iteratively, with a bitwise binary addition in a T cycle producing the carry bits from each bit as a new addend in the D register to be applied for another round of addition. Only when the D register is all zeroes, meaning that all carry values have rippled through the result, is the arithmetic operation complete. This is marked by dropping the signal "arithctl" when D is all zero. If we are at T7 and arithctl is still on, then instead of advancing to T0 as normally would happen, the clock stays at T7 for at least another 280ns cycle.
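The rounds of addition described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a transcription of the ALD logic: the function name, the 16 bit width parameter and the way the carries are shifted are my assumptions for the sketch.

```python
def iterative_add(a: int, d: int, width: int = 16):
    """Add d into a by repeated carry-free rounds; return (sum, rounds).

    Hypothetical model: 'a' stands in for the accumulating value and
    'd' for the D register holding the addend, then the carries.
    """
    mask = (1 << width) - 1
    a &= mask
    d &= mask
    rounds = 0
    while d != 0:                          # "arithctl" stays on while D is nonzero
        carries = ((a & d) << 1) & mask    # carry out of each bit, moved one place up
        a = a ^ d                          # bitwise sum ignoring carries
        d = carries                        # carries become the addend for the next round
        rounds += 1
    return a, rounds


if __name__ == "__main__":
    print(iterative_add(5, 3))
    print(iterative_add(0xFFFF, 1))
```

The number of rounds depends on how far the carries have to ripple, which is why a long carry chain can leave arithctl on at T7 and force extra T7 cycles.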
The logic analyzer showed the T clock advancing cleanly each 280 ns from T0 up to T7, but it was always taking several extended T7 cycles. I assumed that the timing of some signals was off, not arriving in time to allow the T clock to advance to T0, so I began carefully reviewing the timing of all relevant signals. I couldn't see any issue, yet the T clock was delayed in advancing to T0 for every storage access cycle.
Finally it dawned on me, while walking through the ALD pages related to the T clock, that this behavior I was seeing on the recreated 1130 was the intended behavior of the circuits. The ALD had some FFs forming a counter that would count away some 280 ns cycles before it triggered the advance of T clock to T0.
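A toy model of what the analyzer was showing me, assuming a fixed number of extra T7 steps per storage cycle. The function name and the `extra_t7` parameter are made up for illustration; the real count comes from the counter flip flops in the ALD and is not reproduced here.

```python
def storage_cycle(extra_t7: int = 0, step_ns: int = 280):
    """Return (list of T states, total duration in ns) for one storage cycle.

    extra_t7 is a hypothetical count of additional T7 steps inserted
    by a delay counter before the clock is allowed to return to T0.
    """
    states = [f"T{i}" for i in range(8)]   # normal T0..T7 sequence
    states += ["T7"] * extra_t7            # clock held at T7 while the counter runs
    return states, len(states) * step_ns


if __name__ == "__main__":
    print(storage_cycle())             # full speed: T0..T7 only
    print(storage_cycle(extra_t7=3))   # slowed: three extra T7 steps (arbitrary example)
```

With no extra steps the eight 280 ns states add up to 2240 ns, matching the nominal 2.2 microsecond cycle of the fastest models.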
With the philosophy of this recreation stressing as near exact reproduction of the logic circuits gate by gate as possible, I had coded page after page of ALD diagrams into VHDL but hadn't analyzed the purpose and intent of every gate or signal. I found that I had implemented this delay counter because it was there on page KM212, labeled "T7 Extend". To my surprise, it turns out that machine 1130C is a model 4, a special model that runs slower than the others. While its brothers with the 3.6 microsecond core storage run with T0 following directly after T7, the model 4 runs at a cycle time of 4.5 microseconds because of the extra T7 cycles that are tossed in to waste 900 nanoseconds at the end of every storage cycle. This was the reason for the delay counter and the reason that my recreation was experiencing extra T7 cycles. Had I realized this up front, I would not have coded in the delay counter, nor had any wasted T7 cycles to debug. I modified the VHDL to turn this into a full speed machine, with no intentionally wasted T7 cycles.
The second incident where an artificially slowed model caused me to spend hours of unnecessary time was with the emulation of the 1132 printer. I had to understand the mechanism inside if I were going to provide accurate timing simulation of the printer and reproduce its behavior faithfully. I had to emit various pulses and signals from the emulated printer to the device adapter logic of the 1130, and these had to be at the right time if the adapter logic were to work as intended and any printing would run at the same lines per minute as a real 1132.
This printer was built for the 1130 by taking the printing engine of the pre-computer punched card accounting machine, model 407, and wrapping a minimum of electronics around it for use by the 1130. This kept costs down by leveraging and perhaps recycling mechanisms from the base of 407 machines that were being replaced by electronic computers.
At its heart is a cylinder of type rotating in front of the paper, with hammers pushing the paper onto the wheel when the intended letter has rotated into position. To decide when to strike a hammer, the machine has a 'print disc' on the end of the cylinder that is read by photocells to emit timing pulses and the seven bit value of the letter that is rotating into print position next. I had several timing diagrams of the printer which were guiding my emulation - I would put in delay counters to wait for timings based on the documentation, or emit pulses of durations given by the documentation at appropriate points. The emulator hardware sets a print disc rotating at the speed of the 1132, bringing up each of the 48 characters that can be printed in the actual order they are arranged around the cylinder of a physical 1132 printer. Based on the rotation speed, the wheel moves from one character position to the next every 11+ milliseconds.
The way this was driven by programs was pretty byzantine. The printer would interrupt the 1130 once every 11+ milliseconds, which the program would respond to with an XIO Read to get the seven bit value of the upcoming character. The program then looked at the line it was printing, setting a 1 bit for every hammer position of the 120 columns that matched this one character. The bits were set in a fixed location, the scan buffer, which the 1132 printer would fetch with cycle stealing when it was time to actually print the character, firing the hammers to strike those columns where this letter was wanted. The program would then wait for the next interrupt, print the positions that contained the next character, and do this up to 48 times until all the character values in the print line had been printed.
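Here is a rough sketch of that program loop, assuming the scan buffer is simply one bit per column. The function names and the wheel order used in the example are hypothetical; the real buffer is packed into words at a fixed core location, and the real cylinder has its own character order, neither of which is reproduced here.

```python
def scan_bits(line: str, ch: str, columns: int = 120) -> list:
    """Return one bit per print column: 1 where the line wants character ch."""
    padded = line.ljust(columns)[:columns]
    return [1 if c == ch else 0 for c in padded]


def print_line(line: str, wheel_order: str, columns: int = 120) -> str:
    """Simulate one pass of the wheel: one interrupt per character on it."""
    paper = [" "] * columns
    for ch in wheel_order:                    # value the XIO Read would return
        bits = scan_bits(line, ch, columns)   # program fills the scan buffer
        for col, bit in enumerate(bits):      # printer fires the selected hammers
            if bit:
                paper[col] = ch
    return "".join(paper).rstrip()


if __name__ == "__main__":
    wheel = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # made-up order, for illustration only
    print(print_line("HELLO 1130", wheel))
```

Blank columns never match any character on the wheel, so no hammer fires for them and the paper stays empty there.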
One of the diagrams showed the interval between the pulses that caused the interrupt and the actual printing to be 22+ milliseconds. That implied that it would take two full rotations of the cylinder to print a line if it had all 48 character values in it. At that pace, the rated speed of the printer, both for numeric only and general print lines, could never be attained. Even all-numeric lines would involve more than one turn of the cylinder, because the type for the digits was adjacent on the cylinder. If the cylinder rotates to the next digit in 11 ms but it takes 22 ms to read and react, extra rotation was inevitable.
I was quite concerned about this because of the discrepancy between the rated speed and the timing diagrams. I spent hours trying to imagine schemes that would still allow a printed line to complete with only one rotation of the cylinder.
In a chance conversation with a docent at the Computer History Museum, in front of one of their 1130 machines on display, I was relaying the problem I hit with the model 4 and its slow-down by wasted cycles. The docent, who had quite a bit of 1130 experience in his earlier days, mentioned several other places where IBM created slowed down, entry level priced models through delays like this. He happened to mention that slowed down models of the 1132 were offered - the light bulb went on! I had timing diagrams from a slowed down 1132; once again, these were from the 1130C machine ALDs. I looked at the 1130B machine on bitsavers, whose ALDs were incomplete but did contain the timing diagrams, and that gave me a correct timing diagram. Only 11+ ms from the pulse causing the interrupt until printing, not 22. The longer time was a delay built into the slowed printer model. That model would print at half the lines per minute of its normal brethren. Mystery solved, and the emulation design was easy from that point forward.
However, throughout the construction of this replica, I had to carefully check for missing logic or changes based on such machine specific details. Not solely slowdowns for entry level models, but also address lines and register bits eliminated if a machine had less core than the largest configurations - a 32KW machine needed 15 address lines but an 8KW model would have two of those lines and all the related flip flops eliminated to reduce costs. I had to think through every ALD page that touched on memory addresses to be sure that I had all the logic needed for full 32KW implementation.
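The address line arithmetic is easy to check. This is a small sketch with a helper name of my own choosing, taking 1KW to mean 1024 words:

```python
def address_lines(words: int) -> int:
    """Number of address lines needed to select any one of 'words' locations."""
    return (words - 1).bit_length()


if __name__ == "__main__":
    for kw in (4, 8, 16, 32):
        print(f"{kw}KW needs {address_lines(kw * 1024)} address lines")
```

For 32KW this gives 15 lines and for 8KW it gives 13, which is the two-line difference mentioned above.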
As well, if a machine did not have a card reader installed, for example, then cost would be saved by deleting all the signals and circuits related to that device. Interrupt and cycle stealing logic in particular varied quite a bit based on such configuration issues. By comparing the 1130 B and C ALDs from bitsavers and portions of the ALD from the 1130 being restored at the National Museum of Computing, I identified and included logic related to several such options. Sometimes the timing of other signals needs to be delayed or generated earlier to suit a device, as with the Storage Access Channel (SAC) and the attached multiplexor channel (the 360 mux channel was leveraged by the 1130 to attach the 1403 printer, 2260 graphics stations, 2310 and 2311 disk drives and other such peripherals). I will need a full set of ALD pages for the SAC in order to properly support it - the bitsavers machines do not contain SAC support.