Programmers coming to RTL from conventional high level languages (such as Java, C, or C#) need to keep in mind that although the code is written as a sequence of statements, once synthesized into silicon (or an FPGA) it executes in parallel. So, to save power and speed up your designs, get as much done in each clock cycle as possible. For example, in clocked always blocks use the non-blocking assignment '<=' rather than the blocking assignment '=', so that all registers update together on the clock edge; use '=' only where you actually intend the assignments to take effect one after another, as in combinational logic.
So, to move from high level application programming to RTL design, we need to retrain our minds from sequential thinking to parallel thinking.
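The difference between the two assignment operators is easiest to see side by side. The following is a minimal sketch, not code from OpenSPARC; the module name assign_demo and all of its ports are made up for illustration.

```verilog
// Hypothetical example module; names are illustrative only.
module assign_demo (
    input  wire clk,
    input  wire d,
    output reg  nb1, nb2,   // registers written with non-blocking '<='
    output reg  b1,  b2     // registers written with blocking '='
);
    // Non-blocking: both right-hand sides are sampled first, then both
    // registers update together. Result: a two-stage shift register,
    // so nb2 lags nb1 by one clock.
    always @(posedge clk) begin
        nb1 <= d;
        nb2 <= nb1;
    end

    // Blocking: statements take effect in order, so b2 sees the new b1.
    // Result: b1 and b2 both capture d on the same clock edge.
    always @(posedge clk) begin
        b1 = d;
        b2 = b1;
    end
endmodule
```

In the first block the order of the two statements does not matter; in the second block it does, which is exactly the sequential habit that has to be unlearned.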
The delay operator is used in many places in the OpenSPARC source. Its syntax is:
#(rise)
#(rise, fall)
#(rise, fall, off) // off is the turn-off delay: the transition to the high impedance state Z of tri-state logic.
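For example, see line 339 of ccx_arb.v:
dff #(5) d0_0 (
...
);
Note that #(...) is read as a delay only on gate primitives and continuous assignments. If dff is an ordinary module, as it appears to be here, the #(5) overrides the module's first parameter (presumably the flop width) rather than specifying a delay.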
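To make the three delay forms concrete, here is a minimal sketch using built-in gate primitives. The module name delay_demo and its ports are hypothetical and are not taken from the OpenSPARC source. The delay values are in simulation time units (nanoseconds under the timescale chosen here) and are generally ignored by synthesis tools.

```verilog
`timescale 1ns/1ps
// Hypothetical example module; not from OpenSPARC.
module delay_demo (
    input  wire a,
    input  wire b,
    input  wire en,
    output wire y_and,
    output wire y_or,
    output wire y_tri
);
    // One value: the same 2 ns delay for every output transition.
    and #(2) g_and (y_and, a, b);

    // Two values: 2 ns rise delay, 3 ns fall delay.
    or #(2, 3) g_or (y_or, a, b);

    // Three values: rise, fall, and turn-off (transition to Z) delays.
    // bufif1 drives y_tri from a while en is 1, and releases it otherwise.
    bufif1 #(2, 3, 4) g_tri (y_tri, a, en);
endmodule
```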
Here are some Verilog programming gems.
Dual Port RAM
A very simple and elegant two-line implementation of a dual port RAM; see:
$home/design/fpga/opencores/raminfr.v
A simple 32-bit-wide RAM with 1K entries:
reg [31:0] ram[0:1023];
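The one-line array declaration becomes a dual port RAM once it is given a write port and a separate read port. The following is a minimal sketch of that idea and is not the contents of raminfr.v; the module name, the port names, and the choice of a registered (synchronous) read are assumptions made for illustration. Whether a tool maps this to block RAM or distributed RAM depends on the tool and the target device.

```verilog
// Hypothetical simple dual port RAM: one write port, one read port.
module dp_ram_sketch (
    input  wire        clk,
    input  wire        we,        // write enable
    input  wire [9:0]  waddr,     // write address (1K entries)
    input  wire [31:0] wdata,     // write data
    input  wire [9:0]  raddr,     // read address
    output reg  [31:0] rdata      // read data
);
    reg [31:0] ram [0:1023];      // 1K words, 32 bits each

    always @(posedge clk) begin
        if (we)
            ram[waddr] <= wdata;  // write port
        rdata <= ram[raddr];      // read port, registered: one clock of latency
    end
endmodule
```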
Delay a signal by 4 clocks
$home/design/sys/iop/niu/rtl/lib.v
Line 154...
always @ (posedge clk)
begin
    dly1 <= din;
    dly2 <= dly1;
    dly3 <= dly2;
    dly4 <= dly3;
end
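The fragment above omits its declarations. A self-contained version might look like the following sketch; the module name delay4_sketch, the dout port, and the one-bit signal widths are assumptions and not the actual definition in lib.v.

```verilog
// Hypothetical wrapper around the four-stage delay chain.
module delay4_sketch (
    input  wire clk,
    input  wire din,
    output wire dout
);
    reg dly1, dly2, dly3, dly4;

    // Each register samples the previous stage's old value on the same
    // clock edge, so din takes four clocks to reach dly4.
    always @(posedge clk) begin
        dly1 <= din;
        dly2 <= dly1;
        dly3 <= dly2;
        dly4 <= dly3;
    end

    assign dout = dly4;
endmodule
```

Because the assignments are non-blocking, their order inside the block does not matter. Written with blocking assignments in this order, all four registers would capture din on the same edge and the delay would collapse to a single clock.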
Divide a clock signal by 4
$home/design/sys/iop/niu/rtl/lib.v
Line 168...
input clk;
output clk4;
//...
reg [1:0] count;
//...
always @ (posedge clk)
    if (hw_reset_clk_lead)
        count <= 0;
    else
        count <= count + 1;
// count[1] toggles every 2 clocks, so clk4 has a period of 4 clk cycles
assign clk4 = count[1];
The lib.v file has many more tips and tricks that one can learn from.