muchen 牧辰

Circuit Timing II & Pipelining

Updated 2018-03-12

Timing Closure

Timing Closure is just a fancy word for ensuring that the design meets the timing constraints. So far this was not complicated for the simple labs that we have done.

To find out how fast our circuit can run, in Quartus, we can go to the report produced by TimeQuest Timing Analysis. The variable we’re looking for is Fmax, it tells us the maximum frequency our design can run at. This is based on register-to-register delays. It looks at all the FFs and figures out the delay in the critical path.

If multiple clocks exists in the design, then we have two entries for Fmax. This is because the analysis tool notices that there are two critical paths.

Timing Constraint

How do we tell the tool which frequency to use? We use a timing constraint, which specifies a desried clock frequency. The TimeQuest timing analysis wizard enables us to specify (in assignment menu -> timing analysis wizard).

If the constraint is not set, the tool set will try to set an unachieveable clock frequency like 1Ghz.

Effects on Optimization

How does the timing constraint relate to optimization?

Slow clock: leads to small circuit with more weighting on optimizing for logic utilization

Fast clock: leads to larger circuit with more things happening in parallel

Fundamentally, the architecture is all about trade offs. So choose a desired clock frequency and corresponding size of circuit based on requirements.

Example: optimization of lab 1 design

Scenario Fmax Area of circuit
No constraints 133 MHz 148
130MHz constraint 130 MHz 143
Timing constraint of 10 MHz 112 MHz 135

Case Study: Lab 4 KSA Module

page 10

Fmax is really slow and no clock works for this design.

Solution: Find longest path. The tool can display Worst-Case Timing Paths. We see that there are paths that have negative slack (meaning that its violating time constraints).

Click on “Report worse case path (In TimeQuest UI)” for the path.

We see that the delay could potentially be caused by the combinatorial divider. page 13

We go into the code and find that it could potentially be caused by the expression % 256 but we fix that and recompile and find out that nothing has changed. We see that the division is actually not occurring here. The division is happening elsewhere so we keep looking.

page 16

We find another i%3 but 3 is not base-2 and i is 32 bits, so the tool made a 32 bit divider. Fixing this, we solved the problem and Fmax increased.

One Fast Boiiii

It is possible to go faster than recommended Fmax since it’s only a conservative estimate. But no guarantee and is very dangerous. Operating conditions such as temperature will affect the timing.

Pipelining

If a critical path is too long to satisfy timing closure, then we an try pipelining, by cutting the critical path and connecting them with FFs.

For instance, if the critical path used to take 5ns, and we split into two paths: 2ns and 3ns, we have improved the Fmax from 200MHz to 333Mhz. However, now it takes 2 cycles to complete the cycle.

Because of this, pipelining would be helpful if:

This trades off latency with faster clock speed and throughput.

Note: we need to change the rest of the design to account for that fact there are more cycles needed to expect the output.

Limits

Cannot have too much pipelines because the of the setup time ad clock-to-q time is constrained physically. Having faster clock frequency will join the two signals together and there is no time for logic.

This is especially true for FPGAs since the chains of flip flop needs to route many logic blocks together and there will be more propagation delay.

Quartus will not pipeline the design automatically, but it will retime.

Retiming

Key of pipelining is to make every state balanced (in terms of delay).

page 31

Notice that the first path takes 2ns and the critical path takes 4ns. What we can do is put the inverter and relocate it in front of the FF, so each path takes 3ns. The reposition of the inverter does not affect the circuit.

But be careful if the output of the inverter is immediately used elsewhere, in which case, the logic in the second path needs to be adjusted.

Ultimately, retiming can speed up Fmax.

Limit of Retiming

page 34

Consider that we have some initial value and we want to perform retiming.

We move the combinational logic to the left and that outputs to a single output register. It is easy for the compiler to evaluate the output register.

However, this is much more difficult in reverse. So the rule of thumb is initial values for registers may limit opportunities for retiming.

Clock Skew

Clock will arrive later in some parts of the circuit that is limited by the physical properties. There is no guarantee that the clock edges that arrive at all flip flops happen simultaneously.

The implication of clock skew include

page 41

The clock period expression needs to be adjusted to account for clock delay:

\[t_{clock}\geq t_{clk-to-q}+t_{logic}+t_{setup}-D\]

In this case, since it takes longer for the critical path, having a clock skew delay actually improves the design because the logic in the critical path have more time.

page 43

If the clock arrives in a different direction, then we do not have the advantage.

\[t_{clock}\geq t_{clk-to-q}+t_{logic}+t_{setup}+D\]

This decreases our clock frequency.

Violations Due to Clock Skew

Clock skew can cause hold time violations if the clock skew delay is greater than the hold time. The hold time is not long enough, the clock is late (and expects a longer hold time), thus a wrong value is sampled.

page 46

The moral is to avoid clock skew at all times.

Clock Tree

If the components are hooked up such that the clock follows a clock tree such as a H-tree (fractal), given that the H-tree has a constant fractal level, then all components should have the same clock delay.

H tree
David Eppstein [Public domain], via Wikimedia Commons

In an FPGA, an Phase Locked Loop (PLL) is used.

PLL

The phased locked loop exist in modern FPGAs. It generates a mixed signal (analog and digital) circuit that generates output clock aligned to an input clock.

Motivation

Consider generating a 25 MHz clock from a 50 MHz clock. We use a counter that counts to 1 (NOT gate). Note that the flip flops has its own delays which makes two problems:

Operation

page 62

Glitches

A glitch is when a circuit changes its output quickly as the combinational logic is being computed quickly.

page 65

page 66

Glitches usually don’t matter if we’re not constantly observing the output. However, this matters if we care about energy use since it takes energy to perform transitions.