False starts and research

I began building the 1130 design several times over the past months, building up from the heart, the basic clocking, registers and arithmetic unit. Each time I reached a point where it became clear that the approach I was taking would not work, I tossed out all the code and hunted for a new idea that might work. I believe there were four fairly substantial starts that I discarded before I came on the idea for the current approach.

The four failed tries all attempted to create a clock synchronous design appropriate for FPGA instantiation, largely because I believed it was the only way the machine would work. I had copies of the IBM ALDs, the logic diagrams of the 1130, thanks to bitsavers.org, which up until then I was using as a reference source to understand what behavior I had to create, but not as a model for logic I would design. 

Typical ALD page
That meant quite a bit of research and analysis, digging through all the manuals and thinking about what was needed to make the machine function properly. What conditions would require data to move between a register pair, when to suppress normal activities, when to recognize error conditions . . . the deep detail that is needed to complete the design of a working computer. Once I had access to ALDs, I had another way to conduct research and was getting a fairly complete picture of the conditions, signals and what had to occur in which cycle for any instruction or condition. The list of what I didn't understand or what didn't yet make sense kept shrinking. For example, there is a ballet of tricks and special logic needed to create the Program Load behavior that the machine uses to bootstrap from a single boot card to end with the disk based monitor system running on the machine. How to force the store of each column from the card reader or paper tape reader, when the machine was not executing instructions? How to start the machine executing the first instruction once the load of that first boot card was complete? You have to develop a very complete understanding of the interplay of signals and the normal operation of large parts of the machine before you can really understand what has to occur to do a Program Load.

This all changed a few months ago, triggered by a conversation I had with a collector who owns an 1130 and an 1800 - neither operational - and who is an engineer and creative hobbyist. I bought a paper tape reader on eBay and decided to pick it up locally because the seller was nearby. Bob Rosenbloom was that seller, and offered to show me his extensive computer and technology collection when I picked up the reader. Somewhere in the conversation, when he was showing me the start of his all-relay computer project and I was talking about the 1130 project, I mentioned the core challenge of building an async design in FPGA. Bob didn't see this as an impossible thing at all and chatted about a similar issue he overcame several years back.

When I was back staring at the 1130 project notes, Bob's confidence that an async computer could be built successfully in an FPGA took hold. That was the first trigger. The second was my decision to implement the machine according to the original IBM design, attempting to build it as exactly to the ALD as possible. The third was the idea that I should build FPGA synchronous code to model any behaviors of the SLT logic that were not consistent with current logic, incompatible with FPGA or necessary. By encapsulating those behaviors in a virtual logic gate that I could combine with basic gates, I could build up logic that looked just like the ALDs and hopefully would behave the same way as well. 

That method is working beautifully, now that I have all of the 1130 and most of its peripheral adapters implemented, have built emulators for most of the peripherals and am well along building up the other machinery to complete the working system. I have been debugging the machine steadily, making use of a logic analyzer as well as extensive use of the Xilinx logic simulator running on my laptop. It seems to be executing all instructions as well as IO interrupts perfectly, but I need to finish my peripheral emulation to get to where I can boot DMS and run card decks before I can finish debugging. There is a limit to how much I can hand assemble and load into the machine - I put in the core of the extended diagnostics, using the listings in the Maintenance Diagnostic Manuals (MDMs) that are also on Bitsavers. 

I do need to relate a funny story about how well this is working and the dangers of using a single set of ALDs for a single 1130 machine.

The only complete set of ALDs on bitsavers is labeled 1130C. There is all but one volume of 1130B, but the missing volume is the crucial volume that contains most of the processor core. I was testing my machine, carefully implemented in my ALD-faithful approach, when I began chasing a defect. The 1130 steps through eight steps or cycles as part of one 'storage cycle' - the T clocks - stepping through T0 up to T7 in one storage cycle. The machine is designed to spend extra T7 cycles if doing variable length activities, such as addition or subtraction, which due to the 1130 approach will take more cycles for certain values of data than for others. My design was not moving directly from T7 of one cycle to T0 of the next, and I was struggling to figure out what I had done wrong to cause the behavior. Gradually it dawned on me that my machine was working exactly like the 1130C in real life would have operated, because that machine in real life was a special model, the slowed down model 4 that IBM sold as an entry price point. To justify the lower price point, the machine had some extra logic to spin its wheels a few cycles between T7 and the next T0, thus slowing down the performance. I had perfectly implemented that cycle wasting logic, because my ALD faithful, timing faithful machine was recreating an unusual slow model. Once I figured out what a normal machine would look like, such as the 1130B which was missing the pages that illustrated a full speed machine, I was able to get normal behavior for the machine I am recreating, a 2.2 microsecond storage cycle time 1130 with 32K of core.
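The T clock behavior described above can be sketched as a small behavioral model. This is written in Python purely for illustration and is not taken from the VHDL design or the ALDs; the function name, the parameter names `extra_t7` and `slow_model_wait`, and the "wait" label are all hypothetical, and the number of wasted steps on the slow model is left as a parameter because the exact count is not given here.

```python
def t_clock_sequence(extra_t7=0, slow_model_wait=0, cycles=1):
    """Yield the step names for one or more storage cycles.

    extra_t7        -- additional T7 steps taken for variable length work
                       such as addition (hypothetical parameter)
    slow_model_wait -- idle steps inserted between T7 and the next T0,
                       modeling the slowed-down machine (hypothetical)
    """
    for _ in range(cycles):
        # The normal eight steps of a storage cycle: T0 up to T7.
        for t in range(8):
            yield f"T{t}"
        # Variable length activities hold the machine in T7 longer.
        for _ in range(extra_t7):
            yield "T7"
        # The slow model spins its wheels before the next T0.
        for _ in range(slow_model_wait):
            yield "wait"


if __name__ == "__main__":
    print(list(t_clock_sequence()))                   # full speed cycle
    print(list(t_clock_sequence(extra_t7=2)))         # e.g. a longer add
    print(list(t_clock_sequence(slow_model_wait=3)))  # slowed-down model
```

With `slow_model_wait` set to zero, T7 of one cycle is followed directly by T0 of the next, which is the full speed behavior; any nonzero value reproduces the symptom of not moving straight from T7 to T0.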

The machine I have built is based on FPGA hardware, malleable hardware that is configured to produce a given hardware design with gates and wiring, the configuration stored on a read only memory that is loaded when power is applied. Make a change to the design, load a changed file to the read only memory, and the new hardware comes into being with a reset or power-on. The configuration is described in a language that circuit designers use to define and create products, with the two major choices being Verilog and VHDL. I chose VHDL and learned to build hardware using the language and the Xilinx ISE WebPACK tools, which compile the VHDL, load it onto the board and provide simulation facilities to test out designs.

FPGA chip - not the version used in my 1130
The board on which the 1130 is implemented uses a Xilinx Spartan 3E FPGA (datasheet) that provides 1.2 million gate equivalents, far more than the total gates used in an 1130 and enough to support all the peripheral emulation and other additional logic I am adding to make the recreation a usable toy.

I will spend the next few posts talking about the unique behaviors and my little modules that stand in for logic gates in every place in the ALDs where the oddball behavior is needed. The only other changes I needed were to adjust a few spots where the machine is doing what IBM refers to as a "DC Reset" of a register. It is apparent that the response time of the SLT flip flop is s-l-o-w to such DC reset signals, thus the designers would be gating the movement of the current contents of a register to another spot while simultaneously doing a DC reset. Once I introduced a very minor delay (in terms of the FPGA clock which is 20ns compared to the IBM T cycles which are 280ns long), the current contents were copied successfully before the reset took effect. I expected many such timing issues that would need to be tracked down laboriously, but only a handful of adjustments seemed necessary; everything else is operating very reliably.

In the spots where I introduced the delays, I see that IBM placed pairs of NOT gates, as they saw the need to introduce signal delays. Unfortunately, the FPGA design tools intelligently remove them, recognizing that in digital logic, two wrongs make a right - the final outcome is identical except for timing. Thus, the intended delay was removed. Even though the FPGA uses lookup tables instead of gates, if it did look up a NOT function twice it would have added some delay and might have done the trick, but my change does the delay explicitly in an encapsulated logic element I call a "delay".
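To make the race concrete, here is a minimal behavioral sketch of a register transfer that coincides with a DC reset, with and without an explicit delay on the reset path. It is a Python model for illustration only, not the VHDL "delay" element itself; the class and function names are hypothetical, and it assumes that within one FPGA clock tick an undelayed reset clears the source before the transfer samples it. Each tick stands for one 20ns FPGA clock, so a 280ns T cycle spans 14 ticks and a one tick delay is small by comparison.

```python
from collections import deque


class Delay:
    """Delay a signal by a fixed number of FPGA clock ticks (hypothetical model)."""

    def __init__(self, ticks):
        # A simple shift register, initially holding inactive values.
        self.pipe = deque([False] * ticks)

    def step(self, value):
        if not self.pipe:          # zero ticks: the signal passes straight through
            return value
        self.pipe.append(value)
        return self.pipe.popleft()


def transfer_with_reset(source_value, delay_ticks, total_ticks=4):
    """Copy source to dest while resetting source; return (source, dest)."""
    source, dest = source_value, 0
    reset_delay = Delay(delay_ticks)
    for tick in range(total_ticks):
        # The transfer gate and the DC reset are raised together for one tick.
        strobe = (tick == 0)
        reset_seen = reset_delay.step(strobe)
        if reset_seen:
            source = 0             # assumption: the reset acts first within a tick
        if strobe:
            dest = source          # the gated transfer samples the source
    return source, dest


if __name__ == "__main__":
    print(transfer_with_reset(0x1234, delay_ticks=0))  # contents lost
    print(transfer_with_reset(0x1234, delay_ticks=1))  # contents copied, then cleared
```

With no delay the destination receives the already-cleared value; with a single tick of delay the old contents are copied first and the source is cleared on the following tick, which mirrors the effect the slow SLT flip flop provided for free.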
