Major rework of the 1130 logic to make fpga operation fully clock sync

After experiencing an increasing rate of misbehaviors caused by operating the fpga asynchronously, a practice that experienced engineers widely flag as problematic, I decided to alter my strategy a bit and develop a method that makes the async IBM elements operate synchronously with the underlying fpga clock. As I added more logic to handle peripheral devices, the utilization of the fpga increased. This led to more signals being routed over longer distances inside the fpga, which increased the chance that the signals driving a given piece of logic would arrive at different times.

Good synchronous design avoids these race hazards. All the timing variations have completed and the signal levels have stabilized well before the end of a clock cycle, so any action taken is shielded from the momentary incorrect results the logic gates produce while only some of their input signals have arrived. The glitches still happen, but an action such as a state transition of an FSM is determined only at the clock edge, long after the glitches have gone away. The glitch occurs just after the clock edge that begins a cycle. Shortly afterward all the signals have arrived and the glitch is gone, while we are still in the midst of the clock cycle. By the next edge, the glitch is long gone.
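The following minimal Python sketch illustrates the point. It assumes a made-up combinational output that is logically always 0 but glitches to 1 from 2 ns to 4 ns after every edge of a 20 ns clock. A 1 ns "scope" trace shows the glitches. The value captured at each edge is the settled value left over from the previous cycle, so it never shows them. The names `comb_output` and `sample_at_edges` are hypothetical illustrations, not part of the 1130 design.

```python
from typing import Callable

CLOCK_PERIOD_NS = 20  # one tick of the board clock

def comb_output(t_ns: int) -> int:
    """Hypothetical combinational output: logically 0, but it glitches to 1
    from 2 ns to 4 ns after each clock edge while its inputs settle."""
    phase = t_ns % CLOCK_PERIOD_NS
    return 1 if 2 <= phase < 4 else 0

def sample_at_edges(signal: Callable[[int], int], n_edges: int) -> list[int]:
    """Value present at each clock edge, i.e. what a register would capture."""
    return [signal(k * CLOCK_PERIOD_NS) for k in range(n_edges)]

trace = [comb_output(t) for t in range(60)]   # 1 ns resolution, three cycles
samples = sample_at_edges(comb_output, 3)     # edges at 0, 20 and 40 ns
print(sum(trace), samples)                    # prints 6 [0, 0, 0]
```

The trace contains six nanoseconds of glitch, but no register ever latches any of it.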

The combinatorial logic - the ordinary nonclocked functions such as AND, OR and NOT - is where the glitches occur. As long as actions are taken only on clock edges, and as long as the signals feeding the combinatorial logic are stable long enough before the clock edge, we get good, reliable behavior. Most combinatorial logic receives new signals from actions taken at the last clock edge, so its settling begins at the start of a clock cycle. As long as the cumulative delays of gate operation and signal routing don't approach the length of the fpga clock cycle, everything is stable by the end of the cycle.
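The "stable before the edge" rule amounts to a slack calculation. Add up the LUT and routing delays along a path, then compare the total with the clock period minus the register setup time. This Python sketch uses invented delay numbers and an assumed 0.5 ns setup time purely for illustration. In practice the vendor's static timing analysis performs this check for every path in the design.

```python
CLOCK_PERIOD_NS = 20.0  # one tick of the board clock
SETUP_NS = 0.5          # assumed register setup time (illustrative only)

def slack_ns(path_delays_ns: list[float]) -> float:
    """Time left over before the next edge; negative means the path may
    still be settling (glitching) when the clock ticks."""
    arrival = sum(path_delays_ns)
    return CLOCK_PERIOD_NS - SETUP_NS - arrival

# Hypothetical paths: LUT delays interleaved with routing delays.
short_path = [0.6, 1.2, 0.6, 0.9]           # a few LUTs, local routing
long_path = [0.6, 4.5, 0.6, 6.0, 0.6, 7.5]  # long routes across a crowded chip

for name, path in [("short", short_path), ("long", long_path)]:
    s = slack_ns(path)
    print(f"{name}: slack {s:.1f} ns -> {'OK' if s >= 0 else 'VIOLATION'}")
```

The long path mirrors what happens as utilization grows and routes get longer. The same logic can slip from a comfortable margin into a violation.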

Some circuits are driven by external signals whose changes are initially unrelated to the fpga clock. These signals can therefore change very close to, but before, the clock edge, and the resulting glitches may not have vanished when the clock tick happens. This is a second kind of asynchrony that the designer has to handle, often by passing the signal through a series of clocked registers to align its changes with the fpga clock. Two registers in a row are usually considered sufficient to deal with short-term metastability when the external signal changes extremely close to the clock edge. The chance that the unstable output of the first register will produce metastable behavior in the second register is low enough that a third or further stage is not usually warranted.
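The two-register arrangement can be sketched at the cycle level in Python. Each call to the hypothetical `clock()` method models one rising edge, and the input list gives the external signal's value at each edge. Metastability itself is not modeled, so the sketch only shows how the external change is re-timed to the fpga clock and delayed by the second stage.

```python
class TwoFlopSync:
    """Cycle-level model of a two-register synchronizer (a sketch only)."""

    def __init__(self) -> None:
        self.stage1 = 0  # first register: the one that may go metastable in hardware
        self.stage2 = 0  # second register: its output is what the design uses

    def clock(self, async_in: int) -> int:
        # Both registers update on the same edge: stage2 takes stage1's old value.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2

sync = TwoFlopSync()
external = [0, 0, 1, 1, 1, 1, 0, 0, 0]  # external signal's value at each edge
synced = [sync.clock(v) for v in external]
print(synced)  # prints [0, 0, 0, 1, 1, 1, 1, 0, 0]
```

The synchronized copy follows the external signal one edge later. That extra clock of latency is the price paid for giving the first stage a full cycle to settle.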

An FPGA does not actually implement AND, OR, NOT or other basic logic gates to instantiate the combinatorial logic I write. Instead, fpgas are mainly composed of look up tables (LUTs) which the 'compiler' will set up to produce the results intended by the combinatorial logic it is replacing. The input signals for the targeted logic are used as the address into the LUT, and the output of that table for each combination of input values is the output of the targeted logic. As a simple example, if I coded a four input AND gate - output <= Ain and Bin and Cin and Din - the compiler would set up the four inputs as the bits of an address for the LUT, then put zeroes in all the locations in the table except for the one location that corresponds to the address 1111 which is the case where all four inputs are true. In that one location, a logical '1' is placed. Thus, when the signals arrive, they are addressing a cell in the LUT and the value of that cell is the output we see.
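Here is a small Python model of the LUT described above. It fills a 16-entry table for the four-input AND and then reads it by forming an address from the input bits. The bit order, with Ain as the most significant address bit, is an assumption made for readability. Real tools choose their own pin assignments, and the helper names are hypothetical.

```python
def build_lut(func, n_inputs: int) -> list[int]:
    """Fill a 2**n_inputs table with func evaluated at every input combination."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(int(func(*bits)))
    return table

def lut_read(table: list[int], *inputs: int) -> int:
    """Use the input bits as the address and return the stored cell."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

# output <= Ain and Bin and Cin and Din
and4 = build_lut(lambda a, b, c, d: a and b and c and d, 4)
print(and4)  # fifteen 0s, then a single 1 at address 1111
print(lut_read(and4, 1, 1, 1, 1), lut_read(and4, 1, 1, 0, 1))  # prints 1 0
```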

Now, if the signals don't arrive at exactly the same time, the LUT can temporarily see incorrect addresses. Imagine that the signals initially have the values Ain = 1, Bin = 1, Cin = 0 and Din = 1, but are going to change to Ain = 1, Bin = 0, Cin = 1 and Din = 1. The output of our logic circuit should be 0 initially and stay 0 after the new inputs arrive, since neither combination satisfies the AND equation. However, imagine that the signal for Cin arrives earlier than the signal for Bin. When the new value of Cin (1) arrives while the old value of Bin (1) is still present, the LUT temporarily sees the address 1111, so the output is briefly 1 instead of 0. At a gross level we are changing the inputs from 1101 to 1011 and expect a steady 0 from the circuit. Because of the delays, though, the output glitches from 0 up to 1 for a short period and then falls back to 0.
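The following Python sketch replays that 1101 to 1011 transition. It applies each changed input in order of arrival and records the LUT output after every arrival. The arrival times are hypothetical, and only their ordering matters. Swapping the order shows the same transition with no glitch at all.

```python
def and4_lut(a: int, b: int, c: int, d: int) -> int:
    """Four-input AND as a 16-entry LUT; a is the most significant address bit."""
    table = [0] * 15 + [1]  # 1 only at address 1111
    return table[(a << 3) | (b << 2) | (c << 1) | d]

def outputs_during_transition(old: dict[str, int], new: dict[str, int],
                              arrival_ns: dict[str, float]) -> list[int]:
    """Start from the old steady state, then apply each changed input at its
    arrival time and record the LUT output after every arrival."""
    current = dict(old)
    seen = [and4_lut(**current)]
    for name in sorted(arrival_ns, key=arrival_ns.get):
        current[name] = new[name]
        seen.append(and4_lut(**current))
    return seen

old = {"a": 1, "b": 1, "c": 0, "d": 1}  # Ain Bin Cin Din = 1101
new = {"a": 1, "b": 0, "c": 1, "d": 1}  # Ain Bin Cin Din = 1011

# Cin's route is shorter than Bin's: the LUT briefly sees 1111.
print(outputs_during_transition(old, new, {"c": 1.1, "b": 1.8}))  # prints [0, 1, 0]
# Bin arrives first: the intermediate address is 1001, so no glitch.
print(outputs_during_transition(old, new, {"b": 1.1, "c": 1.8}))  # prints [0, 0, 0]
```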

Each time I make a change to the logic and rerun the 'compiler', it assigns my logic to LUTs and routes the signals. From run to run, the placement and routing are often very different, even in areas totally unrelated to the logic I changed. Thus, one run of the 'compiler' may produce an fpga that behaves badly compared to what you expect, and another run may yield an fpga that works as intended.

Some parts of the 1130 design are triggered asynchronously - whenever the trigger signal is 1, the gate will operate. Since these are edge triggered flip flops in most cases, they see an edge and will flip. The almost immediate revocation of the trigger signal doesn't matter, the 'mousetrap' is already 'sprung' and needs a signal on a different input in order to be reset. Dropping the trigger condition just leaves the flipflop set.

My edge triggered flip flops had been designed purely asynchronously. Each was built as an SR flipflop with some extra logic to enforce the IBM behavior, and it changed state whenever the combinatorial signals looped around and the set of gates stabilized in its new state, either on or off. This had no relationship to the fpga clock, so the signals produced by these gates could in turn cause glitches in the circuits they fed. If those glitches reached an async gate such as the edge triggered ones, or arrived very near the fpga clock edge, erratic results could ensue.

Another class of latch is used widely in the 1130 - an async latch built from interconnected AND, OR and NOT gates. It has a set line and a reset line, but no connection to the fpga clock nor to any particular timing in the 1130. Whenever a reset signal arrives, the latch turns off. Whenever a set signal arrives, the latch turns on. If those set or reset signals are brief glitches, they flip the latch just as surely as a sustained input would.
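The following Python sketch shows the 'mousetrap' behavior by evaluating an unclocked set/reset latch at 1 ns resolution. The latch equation q = set or (q and not reset) is the common gate-level form and makes set dominant. That form is an assumption, not necessarily the exact IBM circuit, and the waveforms are made up.

```python
def async_latch(set_wave: list[int], reset_wave: list[int], q: int = 0) -> list[int]:
    """Latch output at every 1 ns step. There is no clock: any step in which
    set (or reset) is 1 changes the latch immediately."""
    out = []
    for s, r in zip(set_wave, reset_wave):
        q = int(bool(s) or (bool(q) and not r))
        out.append(q)
    return out

# A single 1 ns glitch on the set line at t = 3 ns; no reset until t = 8 ns.
set_wave   = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
reset_wave = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(async_latch(set_wave, reset_wave))  # prints [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

The set input returned to 0 after one nanosecond, but the latch stayed on until the explicit reset arrived.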

Like the edge triggered flip flops, these are 'mousetrap' gates: a wrong signal springs them, and they require an explicit reset even if the input signal goes to its intended state almost instantly. Remember that the logic gates in the fpga are LUTs, and a lookup is extremely fast compared to IBM SLT gates. It therefore takes only a few picoseconds of a false address combination to drive the output from an unintended cell and possibly produce an incorrect output.

The designer using fpgas writes code as if it were sequences of standard logic gates being combined. The 'compiler' converts this into data for LUTs and links the input signals to the LUT address bits. The designer does not see this, and it takes conscious extra effort to unravel it and discover the LUTs, their address connections and the data stored in them. This is not something that is reported or directly available. One would need to inspect the bitstream that loads the fpga, and the vendor does not expose that well.

I also had experiences where seemingly correct finite state machines (FSMs) would misbehave or get stuck in a mystery state. It turns out these were also caused by glitches into the LUTs that encode the FSM and by the layout of the state variables that hold its current or next state. I will document this further in a shorter post after this one.

The solution to all of these was to make the FSM transition decisions, and the activation of gates, latches and flipflops, occur only at fpga clock edges. On my Digilent board the clock ticks once every 20 nanoseconds (50 MHz). I created updated versions of my edge triggered gates and of the pulse triggered flip flops. I can avoid the glitch behavior of the fpga as long as I am careful to ensure that all combinatorial inputs will be stable before the clock tick. In practice, this means most of them must depend in part on signals that change only immediately after a clock tick.
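Here is a cycle-level Python sketch of a clock-synchronous edge triggered flip flop. It illustrates a generic technique and is not the actual VHDL in this project. The trigger is sampled only at clock edges 20 ns apart, and a rising edge is recognized by comparing the current sample with the previous one. Reset is assumed to take priority over a trigger. The class and function names are hypothetical. The sketch also assumes that the trigger is already synchronous to the fpga clock and stable at each edge.

```python
CLOCK_PERIOD_NS = 20

class SyncEdgeFlipFlop:
    """Edge triggered flip flop whose state changes only at clock edges."""

    def __init__(self) -> None:
        self.prev_trigger = 0  # trigger value captured at the previous edge
        self.q = 0

    def clock(self, trigger: int, reset: int) -> int:
        if reset:                                 # assumed: reset wins
            self.q = 0
        elif trigger and not self.prev_trigger:   # 0 at last edge, 1 now
            self.q = 1
        self.prev_trigger = trigger
        return self.q

def run(trigger_wave: list[int], reset_wave: list[int], n_edges: int) -> list[int]:
    """Sample 1 ns waveforms only at the edges (t = 0, 20, 40, ...)."""
    ff = SyncEdgeFlipFlop()
    out = []
    for k in range(n_edges):
        t = k * CLOCK_PERIOD_NS
        out.append(ff.clock(trigger_wave[t], reset_wave[t]))
    return out

trigger_wave = [0] * 100
trigger_wave[5:7] = [1, 1]   # 2 ns glitch mid-cycle: never sampled
for t in range(45, 76):
    trigger_wave[t] = 1      # a genuine trigger spanning the t = 60 edge
reset_wave = [0] * 100
print(run(trigger_wave, reset_wave, 5))  # prints [0, 0, 0, 1, 1]
```

Unlike the unclocked latch, the mid-cycle glitch at 5 ns has no effect. The genuine trigger sets the flip flop at the 60 ns edge.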

There are some complications. Signals that in the previous version of the machine had to flow through several successive edge triggered gates or flip flops now incur multiple 20 ns delays. Previously, a signal that arrived a bit after an fpga clock tick would still change the state of a pulse triggered flipflop or latch. Now, if a signal changes after a clock tick, it won't affect the latch or flipflop until the next clock tick.

Timing that worked, sloppily but worked, will now fail. In many places the 1130 designers expected a signal to remain active a bit after an 1130 clock change, due to latency in the circuitry and its gates, and so would trigger a flipflop based on conditions X, Y and Z during cycle B. If condition Y is created only in cycle A and drops at the start of cycle B, this works on the 1130 thanks to that latency but won't work in a clock sync fpga implementation.

Signal Y drops right at the clock edge where the system moves out of cycle A and into cycle B, and the signal indicating cycle B appears only after that same tick. The logic that matches signal Y against the cycle B signal finds them like ships passing in the night. Y drops before B shows up, so in cycle B the condition Y is false. On an 1130, condition Y is still true for a few nanoseconds into cycle B, long enough to change a pulse triggered or edge triggered async gate.
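This cycle-level Python sketch reproduces the "ships passing in the night" problem. Index k is the value a register sees at clock edge k, and the waveforms are illustrative. The second half shows one generic remedy: register Y so that the logic uses "Y was true in the previous cycle", which mimics the 1130's lingering Y. That remedy is only an example of the kind of adjustment needed, not necessarily the modification made in this implementation.

```python
cycle_a = [1, 1, 0, 0]  # machine is in cycle A at edges 0-1
cycle_b = [0, 0, 1, 1]  # cycle B begins at exactly the edge where A ends
y = cycle_a[:]          # condition Y exists only during cycle A

# Clock-synchronous "Y and cycle B": the two signals never overlap.
sync_trigger = [yy and bb for yy, bb in zip(y, cycle_b)]
print(sync_trigger)     # prints [0, 0, 0, 0]

# Generic remedy (illustration only): a one-cycle delayed copy of Y.
y_prev = [0] + y[:-1]   # value of Y captured at the previous edge
delayed_trigger = [yy and bb for yy, bb in zip(y_prev, cycle_b)]
print(delayed_trigger)  # prints [0, 0, 1, 0]: fires at the first edge of cycle B
```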

I anticipate spending several dozen hours adjusting timing to ensure that the clock sync machine works as well as the async version did, with the added benefit of eliminating the sporadic glitchy annoyances. I will report on a few of the modifications I needed to make to accommodate the synchronous nature of my implementation.
