/blog/2022-04-25/ on ams1.josuah.net

	━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
	Wishbone B4: Standard or Pipelined?
	━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
	While writing {HDL} to teach a chip new tricks, it is best to avoid drowning in
	the complexity.

	The famous {divide and rule} helps: splitting the design into modules that,
	like functions in a programming language, reduce the scope of what is being
	worked on, and hide the complexity from the parent module that uses them.

	But this quickly ends up in a sea of modules communicating in many different
	ways.

	Organising communication with a bus
	───────────────────────────────────
	Another layer of organisation becomes necessary: a {bus}, acting as a central
	spine for communication across the whole design.

	Several bus protocols are in use, {Wishbone} being the simplest and most widely
	used one for open source cores.

	What flavor?
	────────────
	The Wishbone bus comes in multiple variants:

	• With or without the extra `CTI` signal: Classic or Registered Feedback;

	• Different timing constraints for `ACK`: Synchronous or Asynchronous;

	• Different meanings for `STB` and `CYC`: Standard or Pipelined;

	• Some extra optional signals.

	I suppose the aim was to cover as many use-cases as possible, so that Wishbone
	could be used in a standard way in most situations.

	This large range of options also makes it harder to support every combination,
	some of them being mutually incompatible, and it seems common to use the most
	basic Wishbone variant in every case.

	What is left is to decide which combination is the simplest.

	Standard and Pipelined
	──────────────────────
	At first, I wanted to avoid the Pipelined mode, to keep things as simple as
	possible. But my opinion changed after having a look at how both work:

	In Standard mode, when a master issues a request with `STB_O`, it keeps `STB_O`
	high for as long as the slave has not answered, that is, until it sees `ACK_I`
	held high by the slave. `CYC_O` and `STB_O` are both still set on the clock
	where `ACK_I` is received, and only on the next clock is it possible to issue
	a new request.

	┊            ___     ___     ___     ___     ___
	┊ CLK_I   __/   \___/   \___/   \___/   \___/   \__
	┊            _______________________
	┊ CYC_O   __/                       \______________
	┊            _______________________
	┊ STB_O   __/                       \______________
	┊                            _______
	┊ ACK_I   __________________/       \______________
	┊
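
	The Standard handshake can be sketched as a small cycle-level model. This is
	Python rather than HDL, and only an illustration under assumptions: one
	sample of `CYC`, `STB` and `ACK` per rising edge, and a hypothetical slave
	that raises `ACK` on the `slave_cycles`-th edge where it sees the request.
	The function and parameter names are made up for the example.

```python
def standard_trace(slave_cycles: int, edges: int = 5) -> list[tuple[int, int, int]]:
    """Sample (CYC, STB, ACK) at each rising edge for one Standard request.

    Assumes a hypothetical slave that raises ACK on the `slave_cycles`-th
    edge where it sees the request (ACK may be combinational in Standard mode).
    """
    if slave_cycles < 1:
        raise ValueError("slave_cycles must be at least 1")
    cyc = stb = 1  # master raises CYC and STB before the first edge
    seen = 0       # slave side: edges during which the request was visible
    trace = []
    for _ in range(edges):
        # Slave: acknowledge once the request has been seen long enough.
        ack = int(bool(cyc and stb and seen == slave_cycles - 1))
        trace.append((cyc, stb, ack))
        if cyc and stb:
            seen += 1
        if ack:
            # Master: ACK is sampled on this edge, so CYC and STB drop now;
            # a new request could only be presented from the next clock on.
            cyc = stb = 0
    return trace


for edge, (cyc, stb, ack) in enumerate(standard_trace(slave_cycles=3)):
    print(f"edge {edge}: CYC={cyc} STB={stb} ACK={ack}")
```

	With `slave_cycles=3`, `ACK` is sampled on the third edge together with
	`CYC` and `STB`, and both drop on the next one, like in the diagram above.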

	In Pipelined mode, a master issues a request by taking `STB_O` high, and
	instead of waiting for `ACK_I` before taking it back low, it checks `STALL_I`:
	if high, it waits; if low, it considers the request queued by the slave, and
	may submit another one right away. In that case, `ACK_I` only tells the master
	that a queued request has finished.

	┊            ___     ___     ___     ___     ___
	┊ CLK_I   __/   \___/   \___/   \___/   \___/   \__
	┊            _______________________________
	┊ CYC_O   __/                               \______
	┊            _______________
	┊ STB_O   __/               \______________________
	┊                                    _______
	┊ ACK_I   __________________________/       \______
	┊            _______
	┊ STALL_I __/       \______________________________
	┊
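
	The Pipelined handshake can be modelled the same way, again as a Python
	sketch with made-up names. The hypothetical slave keeps `STALL` high for
	`stall_cycles` edges, then accepts the request and raises `ACK` `ack_delay`
	edges later; the master drops `STB` once the request is accepted, and keeps
	`CYC` high while some request is still waiting for its `ACK`.

```python
def pipelined_trace(stall_cycles: int, ack_delay: int,
                    edges: int = 5) -> list[tuple[int, int, int, int]]:
    """Sample (CYC, STB, STALL, ACK) at each rising edge for one Pipelined request.

    Hypothetical slave: keeps STALL high for the first `stall_cycles` edges,
    then accepts the request and raises ACK `ack_delay` edges later.
    """
    if ack_delay < 1:
        raise ValueError("ack_delay must be at least 1 in this model")
    cyc = stb = 1  # master raises CYC and STB before the first edge
    pending = 0    # master side: accepted requests still waiting for ACK
    ack_at = None  # slave side: edge on which ACK will be high
    trace = []
    for edge in range(edges):
        stall = int(edge < stall_cycles)
        ack = int(edge == ack_at)
        trace.append((cyc, stb, stall, ack))
        if stb and not stall:
            # STB high and STALL low: the slave queues the request here.
            ack_at = edge + ack_delay
            # Master: the request is accepted, so STB may drop (or carry
            # the next request); it now only waits for the matching ACK.
            pending += 1
            stb = 0
        if ack:
            pending -= 1
            if pending == 0:
                cyc = 0  # all answers received: end of the bus cycle
    return trace


for edge, sample in enumerate(pipelined_trace(stall_cycles=1, ack_delay=2)):
    print("edge {}: CYC={} STB={} STALL={} ACK={}".format(edge, *sample))
```

	With `stall_cycles=1` and `ack_delay=2`, the samples follow the diagram
	above: a stalled edge, an accepting edge, a waiting edge, then `ACK`, after
	which `CYC` drops.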

	In both cases, `CYC_O` stays high through the whole transaction, and `ACK_I`
	announces that the request is done.

	Other signals, such as data, read/write or address, have been omitted for
	clarity.

	Standard uses one less signal
	─────────────────────────────
	Implementing a Pipelined slave does not turn out to be more complex in practice:

	• If the slave is simple and gives single-clock answers, the extra `STALL_I`
	(driven by the slave's `STALL_O`) can be tied low (`STALL_I = 0`) and ignored.

	• If the slave needs multiple cycles before it can take a request, the
	information carried by `STALL_I` would have been needed in Standard mode
	anyway, in the form of an internal `busy` register.

	A Standard master is, however, a bit simpler to implement: instead of first
	waiting for the request to be queued, then waiting again for the slave to
	provide an answer, it only has to wait for `ACK_I`.

	Pipelined for better throughput
	───────────────────────────────
	In the timing examples above, the slave takes 3 cycles to work on the request,
	and then sets the `ACK_I` signal.

	It seems to take one more clock cycle to operate, but the Pipelined mode still
	has a higher throughput: it is not necessary to wait for the result to be
	available before submitting a new request.

	This only works if the slave has a buffer, such as a FIFO, to queue the
	incoming requests and work on them later.
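
	Counting clock edges for a burst of requests gives a rough idea of the
	difference. The Python sketch below assumes a slave needing `slave_cycles`
	edges per request in Standard mode; in Pipelined mode it assumes the slave
	never stalls (it has such a FIFO or an internal pipeline) and sends its
	registered `ACK` `slave_cycles` edges after accepting each request. These
	are simplifications for illustration, not figures from the specification.

```python
from collections import deque


def standard_cycles(n_requests: int, slave_cycles: int) -> int:
    """Clock edges for n back-to-back requests in Standard mode.

    Assumption: ACK comes on the `slave_cycles`-th edge of each request, and
    the master presents the next request right after that edge.
    """
    if slave_cycles < 1:
        raise ValueError("slave_cycles must be at least 1")
    edge = done = 0
    seen = 0  # edges the current request has been visible to the slave
    while done < n_requests:
        edge += 1
        seen += 1
        if seen == slave_cycles:  # ACK: this request is over
            done += 1
            seen = 0
    return edge


def pipelined_cycles(n_requests: int, slave_cycles: int) -> int:
    """Clock edges for n requests in Pipelined mode.

    Assumption: the slave never stalls (it has a FIFO or pipeline) and its
    registered ACK comes `slave_cycles` edges after accepting a request.
    """
    if slave_cycles < 1:
        raise ValueError("slave_cycles must be at least 1")
    in_flight = deque()  # edge on which each queued request gets its ACK
    edge = issued = done = 0
    while done < n_requests:
        edge += 1
        if in_flight and in_flight[0] == edge:  # ACK for the oldest request
            in_flight.popleft()
            done += 1
        if issued < n_requests:  # STB high, STALL low: request accepted
            in_flight.append(edge + slave_cycles)
            issued += 1
    return edge


for n in (1, 4, 16):
    print(n, standard_cycles(n, 3), pipelined_cycles(n, 3))
```

	A single request costs one more edge in Pipelined mode (4 against 3 here),
	but `n` requests take `n * slave_cycles` edges in Standard mode against
	`n + slave_cycles` in Pipelined mode, under these assumptions.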

	Pipelined as easy to implement as Standard
	──────────────────────────────────────────
	Pipelined mode may seem more difficult to implement, since it suggests that a
	complex queuing mechanism has to be written for it, but a pipeline is entirely
	optional, even in Pipelined mode.

	Only `ACK_I` needs to be shifted by one clock, which is done by driving it from
	a register instead of a wire: since a register only applies changes on the next
	clock, this adds the required delay.

	That way, it is still possible to write very simple modules that do everything
	in a single clock.
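
	As an illustration, here is a Python model of such a single-clock slave, a
	hypothetical four-entry register file (names made up for the example).
	`STALL_O` is tied low, and `ACK_O` and `DAT_O` only change in `clock()`,
	which stands for the rising edge, so the answer shows up one clock after the
	request, as a register would deliver it.

```python
class RegisterFileSlave:
    """Hypothetical single-clock Wishbone Pipelined slave: a small register file.

    STALL_O is tied low; ACK_O and DAT_O behave as registers, updated only
    on the rising edge modelled by clock().
    """

    stall_o = 0  # tied low: every request is accepted immediately

    def __init__(self, size: int = 4):
        self.regs = [0] * size
        self.ack_o = 0  # registered ACK: visible during the next clock
        self.dat_o = 0  # registered read data

    def clock(self, cyc: int, stb: int, we: int, adr: int, dat_i: int = 0) -> None:
        """Rising edge: sample the inputs and update the output registers."""
        request = cyc and stb and not self.stall_o
        if request:
            if we:
                self.regs[adr] = dat_i
            # DAT_O only matters for reads; on writes it mirrors the register.
            self.dat_o = self.regs[adr]
        # ACK is a register: high for exactly one clock after each request.
        self.ack_o = int(bool(request))


slave = RegisterFileSlave()
slave.clock(cyc=1, stb=1, we=1, adr=2, dat_i=42)  # write 42 to register 2
print(slave.ack_o)                                  # prints 1
slave.clock(cyc=1, stb=1, we=0, adr=2)             # read register 2 back
print(slave.ack_o, slave.dat_o)                    # prints 1 42
```

	All the work is still done in one clock; the only Pipelined-specific parts
	are the constant `STALL_O` and the fact that `ACK_O` is a register.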

	Standard saves one clock of latency
	───────────────────────────────────
	The Pipelined mode does indeed consume one extra clock cycle, due to that
	registered `ACK`. This could lead to an overall higher latency, in particular
	when multiple Wishbone buses are chained together.

	Pipelined may help with timing
	──────────────────────────────
	If too complex an operation is done in a single clock cycle, it may take too
	long for all the signals to settle down and stabilise before the next clock
	tick.

	With too long a chain of logic, the timing constraint (the clock rate) might
	be missed.

	A long chain of logic can be broken down into two steps with a register in
	between: half of the work is done before the register and half after it, so
	that only roughly half of the work has to be done within a single clock tick.

	If Wishbone is used in Standard mode, the signals would have to propagate
	inside the master, then to the slave, then inside the slave, then back to the
	master, all of that probably within a single clock tick.

	Placing a register in the bus, by making the slave's `ACK_O` a register, breaks
	this long chain from master to slave and back to master: the register is an
	intermediate step where the signal pauses before going back to the master,
	making sure it had time to settle down in the slave.

	That way, if the timings of the slave are fine with one master, they have
	better chances to be fine with any other master, since the timings of the
	slave and master do not sum up anymore.
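
	Some back-of-the-envelope arithmetic shows the effect. The delays below are
	made up, and setup time and clock-to-output delay are ignored; only the
	longest register-to-register path is compared. With a combinational `ACK`,
	master and slave delays add up on one path; with a registered `ACK_O`, the
	slave's register splits that path in two.

```python
def max_clock_mhz(path_delays_ns: list[float]) -> float:
    """Fastest clock (MHz) allowed by the slowest register-to-register path."""
    return 1000.0 / max(path_delays_ns)


# Hypothetical propagation delays in nanoseconds, for illustration only.
MASTER_OUT_NS = 3.0  # master logic from its registers to STB/ADR/DAT
SLAVE_NS = 4.0       # slave logic computing DAT and ACK from the request
MASTER_IN_NS = 3.0   # master logic from ACK/DAT back to its registers

# Combinational ACK: master -> slave -> master is a single path.
standard = max_clock_mhz([MASTER_OUT_NS + SLAVE_NS + MASTER_IN_NS])

# Registered ACK_O: the slave's register splits that path in two.
registered = max_clock_mhz([MASTER_OUT_NS + SLAVE_NS, MASTER_IN_NS])

print(f"combinational ACK: {standard:.1f} MHz")
print(f"registered ACK_O:  {registered:.1f} MHz")
```

	With these hypothetical numbers, the clock could rise from 100 MHz to about
	143 MHz. The slave's internal paths, left out here, must fit in a clock
	period too.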

	Conclusion
	──────────
	While Standard Wishbone seems more frequently used, the Pipelined mode seems a
	bit friendlier to timing, and most of its drawbacks, like the extra clock for
	`ACK` or the extra signal, would likely also appear in Standard mode.

	I am still new to Wishbone, and very curious about what you think of it: which
	variant do you use? Is there anything I missed about the Standard mode?
	`[email protected]`

	Among notable Pipelined mode users is {ZipCPU}.

	Update
	──────
	While looking at {this} ZipCPU article, it seems that its motivation for using
	Pipelined mode is expressed in these sentences.

	First, a reminder of the way logic gates may "solve maths":

	┊ One solution to sequencing operations is to create a giant state machine.
	┊ The reality, though, is that an FPGA tends to create all the logic for
	┊ every state at once, and then only select the correct answer at the end of
	┊ each clock tick. In this fashion, a state machine can be very much like the
	┊ simple ALU we've discussed.

	And then the conclusion about what makes more sense:

	┊ On the other hand, if the FPGA is going to implement all of the logic for
	┊ the operation anyway, why not arrange each of those operations into a
	┊ sequence, where each stage does something useful? This approach rearranges
	┊ the algorithm into a pipeline.

	And its use of Wishbone is extensively explained in {https://raw.githubusercon…

	Links
	─────
	• {http://cdn.opencores.org/downloads/wbspec_b4.pdf#page=91}

	• {http://zipcpu.com/zipcpu/2017/05/29/simple-wishbone.html}

	• {https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html}

	• {https://raw.githubusercontent.com/ZipCPU/zipcpu/master/doc/orconf.pdf}