RTL DESIGN GUIDELINES

1.1 STYLE AND NAMING GUIDELINES

§ File name extension for each RTL source code file is “.v”

§ RTL Code should be well commented

WHY: Readability is required to maintain the RTL code. Comments force clear coding as well as transferability between designers.

§ Port functionality should be defined in comments in the port declaration section in each module

WHY: Readability is required to maintain the RTL code

§ Module names should be specific

WHY: To avoid conflicts, each module must have a unique name. Names like “registers” and “control” are too generic. “c_registers” or “c_control” are valid, as they refer to specific control blocks within the column driver subsystem.

§ Each module must be named “x_yyyyy”.

Here, “x” is the identifying letter for the top level module and “yyyyy” is the module name itself. Example: the module “widget” of the dram controller might be named d_widget.

WHY: This is to distinguish in a standard manner between IP-provider design modules and Nuelight’s common, reusable design modules.

§ No complex expressions in port connections during module instantiation.

Only signal names should be connected to module ports. Do not insert logic expressions on inputs to modules, e.g. .signal (value == 32'd0)
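
A minimal sketch of this rule, assuming three hypothetical modules (d_zero_detect, d_flag_reg, d_widget) that are not part of the original guidelines. The comparison is computed inside the driving leaf module and only the resulting named net is connected to the ports. The three modules are shown together for brevity; under the one-module-per-file rule each would live in its own file.

```verilog
// d_zero_detect.v (hypothetical leaf module that drives the flag)
module d_zero_detect (d_value, d_value_is_zero);
  input  [31:0] d_value;          // value under test
  output        d_value_is_zero;  // high when d_value is all zeros

  assign d_value_is_zero = (d_value == 32'd0);
endmodule

// d_flag_reg.v (hypothetical leaf module that consumes the flag)
module d_flag_reg (clk, reset_n, d_value_is_zero, d_zero_flag_q);
  input  clk;              // module clock
  input  reset_n;          // synchronous reset, active low
  input  d_value_is_zero;  // flag from d_zero_detect
  output d_zero_flag_q;    // registered copy of the flag
  reg    d_zero_flag_q;

  always @(posedge clk)
    if (!reset_n) d_zero_flag_q <= 1'b0;
    else          d_zero_flag_q <= d_value_is_zero;
endmodule

// d_widget.v (hypothetical structural parent: instantiations only)
module d_widget (clk, reset_n, d_value, d_zero_flag_q);
  input         clk;            // module clock
  input         reset_n;        // synchronous reset, active low
  input  [31:0] d_value;        // value under test
  output        d_zero_flag_q;  // registered "value is zero" flag

  wire d_value_is_zero;

  // NOT allowed: .d_value_is_zero (d_value == 32'd0)
  d_zero_detect u_zero_detect (
    .d_value         (d_value),
    .d_value_is_zero (d_value_is_zero)
  );

  d_flag_reg u_flag_reg (
    .clk             (clk),
    .reset_n         (reset_n),
    .d_value_is_zero (d_value_is_zero),
    .d_zero_flag_q   (d_zero_flag_q)
  );
endmodule
```

Because the net keeps the name given by its driving module, the sketch also respects the rule that signal names do not change across the hierarchy.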

§ The allowable characters for any variable or module name must be selected from the set [a-zA-Z0-9_]

All RTL code should be case-insensitive: no two names may differ only by case.

WHY: Some tools do not distinguish case, and some do. This will present complications in our flow.

§ All names must begin with a letter only from the set [a-zA-Z]

WHY: Some tools only allow letters at the beginning of names.

§ Signal names are active-high by default. Active-low signal names must have an “_n” suffix.

WHY: The active sense of the signal is clear in the name and therefore its use is guaranteed to be correct. In general, all signals should be active high, except in special cases, such as reset, where it makes more sense to be active low.

§ All signal names that exit a module must begin with their main module identifying letters and _

The letters identifying the main module, followed by an underscore, will be a prefix to every signal name. This ensures that each signal name is unique. e.g. a_test_sig_p

WHY: Hierarchical tools can get confused when different signals have the same signal name at different levels of the hierarchy.

§ Signal names shall not change when traversing the design hierarchy.

A signal name must remain constant when moving from one level of the design to another. The name must correspond to that given to the signal by the driving module.

WHY: This makes debug and design traversal easy.

ALLOWED VIOLATIONS:

A signal connected to a module that is used more than once ("generically") can receive a new name once it traverses beyond the generic boundary.

Also, a signal connected to a “legacy” or third party piece of IP can change name at the module boundary to bring it into compliance.

§ There will be one and only one module per source code file

If there is a module "encoder" in the arbiter, then the module "a_encoder" and only this module shall be defined in the file a_encoder.v. No other module shall be defined in the same file.

WHY: This is for readability, and because Verilog-XL cannot properly handle a violation of this rule.

§ The module name must match the filename

If there is a module "arbiter" in the arbiter, then the RTL file containing module "a_arbiter" must be named a_arbiter.v and the structural Verilog file containing the module shall be called a_arbiter.vg.

WHY: This is for readability, as well as simplicity in the design flow.

§ Make port connections explicit

Never take advantage of the implicit port connection features of Verilog. All port connections must be explicit.

WHY: Implicit connections are error prone.

§ `define names must be unique and must identify their module

Use of `define in Verilog RTL is encouraged to allow easy code maintenance and readability. However, each `define should be limited to a single module. All `define names should be in uppercase and must begin with the module’s prefix and identifying letter.

WHY: Verilog-XL requires that `defines in an entire design are unique and never redefined, even if to the same value. Synopsys requires that `defines used during compilation of code be referenced in the module being compiled. It is also possible that a future user might inadvertently create a `define with the same name.

§ `define names must be contained in the common defines file

All defines must be contained in a single file, ${ROOT}/rtl/include/<design_name>.h

§ parameter names should be in uppercase and contained locally in a module

Parameters for a given module should be defined only in that module and should be uppercase for code readability. If a parameter is required to be used in multiple modules, it should be a `define.

WHY: Verilog-XL requires that `defines in an entire design are unique and never redefined, even if to the same value.
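
The `define and parameter naming rules can be sketched as follows. The module c_rowcount, the macro C_ROW_ADDR_WIDTH and the value of LAST_ROW are all hypothetical, chosen only to show a shared uppercase `define carrying the module prefix next to a local uppercase parameter.

```verilog
// Shared constant. Under the rules above it would live in
// ${ROOT}/rtl/include/<design_name>.h; it is repeated here only so
// that the sketch is self-contained. The name is hypothetical.
`define C_ROW_ADDR_WIDTH 8

// c_rowcount.v (hypothetical module in the column driver subsystem)
module c_rowcount (clk, reset_n, c_rowcount_q);
  input                          clk;           // module clock
  input                          reset_n;       // synchronous reset, active low
  output [`C_ROW_ADDR_WIDTH-1:0] c_rowcount_q;  // current row index
  reg    [`C_ROW_ADDR_WIDTH-1:0] c_rowcount_q;

  // Local parameter: uppercase, used only inside this module.
  parameter LAST_ROW = 239;  // assumed highest row index

  wire [`C_ROW_ADDR_WIDTH-1:0] c_rowcount_d;

  // Wrap to zero after LAST_ROW, otherwise count up by one.
  assign c_rowcount_d = (c_rowcount_q == LAST_ROW)
                        ? {`C_ROW_ADDR_WIDTH{1'b0}}
                        : c_rowcount_q + 1'b1;

  always @(posedge clk)
    if (!reset_n) c_rowcount_q <= {`C_ROW_ADDR_WIDTH{1'b0}};
    else          c_rowcount_q <= c_rowcount_d;
endmodule
```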

§ Use of include files is prohibited

Use of the statement "`include file.v" in a Verilog file is prohibited.

WHY: All values relevant to a module must be found within that module for synthesis purposes. It is a Synopsys requirement – at least it used to be.

Exception: ${ROOT}/rtl/include/<design_name>.h and ${ROOT}/rtl/include/timescale.h

§ Use of "initial" statements is prohibited

When coding in RTL, it is prohibited to use "initial" statements to initialize logic.

WHY: The verilog RTL or structural code must represent what is possible in real silicon. There is no hardware equivalent of "initial".

§ Code readability is paramount

Code must be readable. It must be well commented and should not be cluttered. It should not have wraparound lines when viewed at a normal terminal width, nor shall it wrap around when printed.

WHY: This is critical to the ability to review and debug code. Code that is difficult to read is prone to bugs.

1.2 REGISTER RULES AND SEQUENTIAL LOGIC

§ The basic register element is the positive-edge-triggered flip-flop

§ Negative-edge-triggered flip flops are not allowed

Exceptions are allowed when needed to meet specific design challenges, but must be highlighted in the microarchitecture specification, and approved at a design review.

§ Avoid using asynchronous resettable flip flops

Asynchronous reset lines are susceptible to glitches caused by crosstalk when using 0.18um and smaller CMOS technology. Exceptions are allowed in special cases (initial central reset logic and clock generation logic). These exceptions should be documented in the module microarchitecture specification and approved at the module design review.

Such nets should undergo crosstalk analysis in the physical design phase.

§ All flip-flops in the design must be initialized by a synchronous reset using the flip flop’s synchronous reset

There can be no un-initialized control logic flip-flops in the design. A module reset must be generated synchronously to the clock and fed to the synchronous reset input of the flip flop.

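WHY: A synchronous reset is sampled only on the clock edge, so glitches on the reset net between edges cannot corrupt state. Note that the clock must be running for the reset to take effect.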

EXCEPTION: It is acceptable to leave datapath flops uninitialized in the design. Care must be taken to ensure that this cannot cause any problems in the control logic.

1.3 REGISTER ELEMENT CODING

§ All register elements that are to be synchronously cleared must be inferred via the following structure:

always @(posedge clk)
  if (!rstb) q <= 1'b0;
  else q <= d;

An assignment of d to q is the only one allowed. No logic equations except for the synchronous reset can appear on the right side of this assignment statement.

WHY: This results in correct sync-clear-D-flip-flop inferral in synthesis.

§ All datapath register elements that are not to be synchronously cleared (i.e. un-initialized or synchronously preset) must be inferred via the following structure:

always @(posedge clk)

q <= d;

An assignment of d to q is the only one allowed. No logic equations can appear on the right side of this assignment statement.

WHY: This results in correct D-flip-flop inferral in synthesis.

§ All outputs of register elements must be suffixed with the letter "q"

This must follow the module identifying letter in the signal name. e.g. x_widgetnet_q or x_widgetnet_qn (active low)

WHY: This will allow for simple parsing by scripts of the synthesis output files to detect unintentional flip-flop or latch inferrals.

§ All inputs of register elements must be suffixed with the letter "d"

This must follow the module identifying letter in the signal name.

WHY: This creates a consistent naming convention between the input and output of a register element that will aid in debug, as well as identifying critical paths. The d or q version of a signal will be more appropriately used in different circumstances, and will be reviewed as such.

ALLOWED VIOLATION: This rule only applies if the "d" signal is newly created. If this "d" signal is a signal from another module, only registered within this module, the rename is not necessary, as it will needlessly increase the size of the database from the simulation and slow down the simulation. In addition, if signals correspond to pipeline stages, the d=q convention must be dropped, as it is imperative that a signal name not incorrectly bear the name of a pipeline stage in which it is not valid.

Summary: any register element must be coded as:

assign x_yyyyy_d = some logic equation;

always @(posedge clk)
  x_yyyyy_q <= x_yyyyy_d;

where x is the module identifying letters, q or d identifies the signal attach pin, and yyyyy is the signal name. All elements of this register structure must be placed together in RTL such that the entire structure is easily read and understood.

§ Sequential Logic must always be partitioned into Combinatorial Logic and Flip-flops

This rule requires that storage elements or nodes be explicitly called out in the RTL and that all sequential logic be considered combinatorial logic combined with storage elements.

e.g.

// combinational part
always @ (some_condition)
begin
  if (some_condition) x_widget_d <= 1'b1;
  else x_widget_d <= 1'b0;
end

// register
always @ (posedge clock)
begin
  if (!reset_n) x_widget_q <= 1'b0;
  else x_widget_q <= x_widget_d;
end

WHY: This rule forces designers to think of the hardware aspect of their design currently coded only in RTL. In addition, this partitioning will assist synthesis. As stated above, the combinatorial and sequential elements comprising the sequential logic shall be placed close together in code so as to ease readability.

EXCEPTION: In cases of extremely simple blocks, such as counters, it is both more readable and more supportable to have the combinatorial and register logic in the same procedural block.
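
As a sketch of this exception, a small enabled counter can keep its increment logic inside the clocked block. The module name t_tickcount and its ports are hypothetical.

```verilog
// t_tickcount.v (hypothetical 4-bit counter)
module t_tickcount (clk, reset_n, t_enable, t_tickcount_q);
  input        clk;            // module clock
  input        reset_n;        // synchronous reset, active low
  input        t_enable;       // count enable
  output [3:0] t_tickcount_q;  // current count, wraps from 15 to 0
  reg    [3:0] t_tickcount_q;

  // Increment and register in one block: readable for a block this
  // simple, and every assignment is still non-blocking.
  always @(posedge clk)
    begin
      if (!reset_n)      t_tickcount_q <= 4'd0;
      else if (t_enable) t_tickcount_q <= t_tickcount_q + 4'd1;
    end
endmodule
```

Anything more involved than this (multiple load sources, saturation, terminal-count decoding) is probably better split into a separate _d equation and a plain register.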

§ All assignments in a sequential procedural block must be non-blocking (<=).

WHY: Blocking assignments imply order, which may or may not be correctly duplicated in synthesized code.

§ No latches shall be used

WHY: Latches severely complicate the STA and are more difficult to test. They lead to pseudo-asynchronous designs. Implied or explicit instantiation of latches is illegal.

ALLOWED VIOLATION: Latches are required (transparent low) in the gated clock structures.

Latches are also necessary as data lockup latches between clock domains in scan chains.
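
For orientation, the gated-clock structure that this allowed violation refers to is commonly built from a transparent-low latch and an AND gate. The sketch below is a generic illustration under that assumption; it is not the actual GCK or lib_clock_freeze implementation, and the module and signal names are hypothetical.

```verilog
// g_clock_gate.v (hypothetical latch-based clock gate)
module g_clock_gate (g_main_clock, g_enable, g_gated_clock);
  input  g_main_clock;   // free-running source clock
  input  g_enable;       // enable, generated in the g_main_clock domain
  output g_gated_clock;  // clock that only toggles while enabled

  reg g_enable_latched;

  // Transparent-low latch: follows g_enable while the clock is low and
  // holds it while the clock is high, so a late change on g_enable
  // cannot create a glitch on the gated clock.
  always @(g_main_clock or g_enable)
    if (!g_main_clock) g_enable_latched <= g_enable;

  assign g_gated_clock = g_main_clock & g_enable_latched;
endmodule
```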

1.4 COMBINATORIAL LOGIC RULES

§ No asynchronous loops or pulse generators are allowed.

Asynchronous design practices like one-shots and pulse generators are not allowed.

WHY: This is not acceptable in ASIC design as structures like this are susceptible to post-layout discrepancies and duty cycle variations. In addition, the margins built into cell models to guarantee high yields and robustness can result in non-working silicon even if post-layout simulations pass.

ALLOWED VIOLATION: The only expected exception to this rule is when a write enable signal is required to an asynchronous RAM (see below).

§ No combinatorial feedback loops are allowed.

No cross-coupled latches may be created. The outputs of muxes may not be used as an input to the same mux. The use of any combinatorial feedback for storage purposes is not allowed. Combinatorial loops through memories are included in this error class and must not exist.

§ Instantiation of I/O buffers in the core logic is not allowed

WHY: An internal module must consist of core-logic elements only.

§ Do not use partial decode logic

Partial decode logic decodes one group of like values in one section of logic and another group of like values in another section of logic. Determination of a complete decode is done by combining the two decodes. In a sense, one qualifies the other. An example would be a coprocessor branch instruction. One partial decode could be a jump or branch. Another decode could be that it is a coprocessor instruction. In determining if the instruction was a coprocessor branch instruction, the branch instruction decode and the coprocessor instruction decode would be combined.

WHY: It is very easy for designers years later who are maintaining the RIP to be unaware of a signal’s partial decode nature. As such, the qualifying signal is not added and incorrect operation results.

These types of situations generally require illegal values, odd values, or odd sequences to detect and are not normally targeted by directed verification efforts. With random tests, whether these failing cases are hit is a matter of statistics. The design must be clean BY DESIGN.

§ Combinatorial procedural blocks should be fully specified

If combinatorial procedural blocks do not fully specify all conditional branches, then latches will be inserted during synthesis to “hold” unspecified values of the logic. This can be avoided by fully specifying the procedure.

e.g. BAD

always @ (ajksjd or kjsd)
begin
  if (ajksjd) x <= 1'b1;
  else if (kjsd) x <= 1'b0;
  // no final else: x must hold its value, so a latch is inferred
end

e.g. GOOD

always @ (ajksjd or kjsd)
begin
  if (ajksjd) x <= 1'b1;
  else if (kjsd) x <= 1'b0;
  else x <= 1'b0; // insert catch-all to prevent latches
end

§ No 3-state elements shall be used

Any 3-state bus must be removed from the design.

WHY: 3-state buses are more susceptible to testing problems, delay inaccuracies and exceptionally high loads.

1.5 CASE STATEMENTS

§ The RTL will be completely specified

No “implied” structure is allowed. For example, in a case statement, the default case must be included. However, no logical value can be assigned in the default case, as the default case is used for proper X and Z handling. For state machines, initialization and state transitions from unused states must be specified to prevent incorrect operation. All elements within a combinatorial always block will be specified in the sensitivity list.

WHY: Incompletely specified RTL will result in incorrect X handling, Z handling, RTL vs. structural simulation violations, inference of latches, etc.

§ CASEZ should be used for case statements with wildcard don’t cares, otherwise use of CASE is required; CASEX should never be used.

It is important to simplify the number of entries in a case statement for readability. Wildcard don’t cares are used for this purpose. However, once wildcards are introduced into case terms, it is important to make the "case" be a "casez" construct. The default case should then pass X to the output of the logic.

WHY: Don’t cares are not allowed in the "case" statement. Therefore casex or casez are required.

Casex will automatically match any x or z with anything in the case statement. Casez will only match z’s -- x’s require an absolute match.
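
A small casez example, assuming a hypothetical 4-input priority encoder named p_prienc. The "?" wildcards collapse what would otherwise be sixteen case entries, and the default passes X to the output as the rule above requires.

```verilog
// p_prienc.v (hypothetical priority encoder)
module p_prienc (p_request, p_grant);
  input  [3:0] p_request;  // request lines, bit 3 has highest priority
  output [1:0] p_grant;    // index of the highest-priority active request
  reg    [1:0] p_grant;

  always @(p_request)
    begin
      casez (p_request)
        4'b1???: p_grant <= 2'd3;
        4'b01??: p_grant <= 2'd2;
        4'b001?: p_grant <= 2'd1;
        4'b0001: p_grant <= 2'd0;
        4'b0000: p_grant <= 2'd0;   // no request: park on index 0
        default: p_grant <= 2'bxx;  // reached only when an X in
                                    // p_request prevents every match
      endcase
    end
endmodule
```

Here the entries are ordered from highest to lowest priority and are also mutually exclusive, so the result does not depend on evaluation order.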

§ Variables are only allowed in CASE/CASEZ match terms for one-hot encoding

Under generic conditions, only binary values are allowed for matching in case statements. In other words, the following is not allowed, unless Y and Z are mutually exclusive, as in the outputs of a one-hot encoder:

always @(Y or Z)
begin
  case (1'b1)
    Y: BUBBA <= 1'b1;
    Z: BUBBA <= 1'b0;
    default: BUBBA <= 1'bx;
  endcase
end

The following is the only allowed case style in generic situations:

always @(Y or Z)
begin
  case ({Y,Z})
    2'b00: BUBBA <= 1'b1;
    2'b01: BUBBA <= 1'b0;
    2'b10: BUBBA <= 1'b1;
    2'b11: BUBBA <= 1'b1;
    default: BUBBA <= 1'bx;
  endcase
end

WHY: Code style should be reviewable by inspection. It is not clear from the variables whether or not Y and Z are mutually exclusive. It is likely that not all cases are covered, unless Y and Z are known to be mutually exclusive, in which case this should be commented. In addition, use of //synopsys parallel_case is required here, because Synopsys may not understand that these signals are mutually exclusive.

§ Don’t use //synopsys full_case directive

WHY: If all other case rules are followed, this is not necessary, as all cases are defined. If this is used and all cases are not defined, it will hide the fact that all cases are not defined. Therefore, this must not be used, as it masks errors.

ALLOWED VIOLATIONS: There may be cases when Synopsys can’t determine that a case statement covers all cases. In this case, it is necessary to tell Synopsys that this really IS a full case. If this is the case, a comment stating such is required in the RTL code.

§ Use //synopsys parallel_case directive if no priority is implied in the case statement

WHY: Synthesis optimizations can improve area and speed of the logic if priority is not required.

1.6 FINITE STATE MACHINES

§ All state machines must either be initialized to a known state OR must self-clear from every state.

WHY: State machines cannot be trusted to power up in a known state, let alone the default or idle state. They should either be initialized at reset or every state should ultimately resolve into the state cycle.

§ Future state determination will depend only on registered state variables

Only registered state variables shall be used to determine future states.

WHY: Use of pre-registered state variables can cause long and/or false timing paths.

§ State machines should be coded with Case statements and parameters for state variable names

State machines should be easy to read and should be in the following form:

parameter IDLE = 3'b000, ACTIVE = 3'b001;

// Combinatorial part
always @ (state or xyz or abc)
begin
  case (state)
    IDLE:
      if (xyz) next_state <= ACTIVE;
      else next_state <= IDLE;
    ACTIVE:
      if (abc) next_state <= IDLE;
      else next_state <= ACTIVE;
    default: next_state <= 3'bxxx; // X handling only, no logical value
  endcase
end

// flops
always @ (posedge clk)
begin
  if (!reset_n) state <= IDLE;
  else state <= next_state;
end

1.7 TIMING CLOSURE RULES

§ Code RTL with timing in mind.

It is important to visualize the levels of logic implied in the combinatorial logic code and consider whether or not this will meet timing requirements.

WHY: Without this, timing convergence will be difficult if not impossible.

§ Minimize Ping-Pong Signals

Do not design, with other modules, signals that combinatorially bounce from one module to another and then back again.

WHY: Ping-Pong signals create long layout-dependent timing paths.

1.8 CLOCKING RULES

§ Clock generation logic should be grouped into a single module

§ Clock names will be defined by their source module identifier letter plus the word “<name>_clock”.

To make clock identification simple and consistent for all tools, documentation etc, a common naming convention should be applied to each clock. The clock name should be prefixed by the source module letter identifier plus a unique name for the particular clock, followed by the word “_clock”.

e.g. x_main_clock, x_system_clock

§ The design must be fully synchronous and must use only the rising edge of the clock.

There must be only one clock in each clock domain, and only the rising edge of this clock is to be used for state changes.

WHY: This rule results in insensitivity to the clock duty cycle and simplifies Static Timing Analysis (STA).

ALLOWED VIOLATION: The falling edge is only allowed in the gated clock logic as specified by the GCK modules. Compliance to industry-standard interfaces may also require use of the falling edge of the clock.

§ Gated clocks must use the common lib_clock_freeze module

Gated clock signals must use the Re-usable IP Lib block, lib_clock_freeze.v, which can be imported from: “WHEREVERWEAREGOINGTOSTOREIT”. This will ensure that functionally correct clock gating can be implemented and that automatic updates of the synthesis and primetime scripts occur.

§ If using gated clocks, give the GCK modules unique and informational names

WHY: This will help later when tuning the drive strengths for these clock buffers.

1.9 RESET GENERATION RULES

These rules are in addition to the earlier initialization rules regarding state machines and flip-flops.

§ Reset generation logic should be grouped into a single module

§ Reset names will be defined by their source module identifier letter plus the word “<name>_reset”.

To make reset identification simple and consistent for all tools, documentation etc, a common naming convention should be applied to each reset. The reset name should be prefixed by the source module letter identifier plus a unique name for the particular reset, followed by the word “_reset”.

e.g. x_main_reset, x_system_reset

§ Resets entering a major module must be resynchronized using the lib_reset_sync module

When a reset signal enters a major module, it is best to synchronize it with a dual flop synchronizer. However, when the reset is asserted to the module, the synchronizer should be bypassed, but when de-asserted, it should be derived from the synchronizer. This can be implemented using the lib_reset_sync.v module located in “WHEREVERWEAREGOINGTOSTOREIT”. This guarantees timing and automatically gets incorporated into the synthesis and primetime scripts.
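
The bypass-on-assert, synchronize-on-release behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the lib_reset_sync.v module itself, whose interface is not given here; the module name r_reset_sync and all signal names are hypothetical.

```verilog
// r_reset_sync.v (hypothetical reset resynchronizer)
module r_reset_sync (clk, r_main_reset_n, r_sync_reset_n);
  input  clk;             // clock of the receiving module
  input  r_main_reset_n;  // incoming reset, active low, asynchronous to clk
  output r_sync_reset_n;  // reset for the local logic, active low

  reg r_stage1_q;  // first synchronizer flop (may go metastable)
  reg r_stage2_q;  // second synchronizer flop

  always @(posedge clk)
    begin
      r_stage1_q <= r_main_reset_n;
      r_stage2_q <= r_stage1_q;
    end

  // Assertion (low) bypasses the flops and takes effect immediately.
  // Release is seen only after the high level has passed through both
  // flops, so it is aligned to clk.
  assign r_sync_reset_n = r_main_reset_n & r_stage2_q;
endmodule
```

The sketch assumes the incoming reset stays asserted for at least two clk cycles, so that both flops hold 0 before it is released.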

§ Module stand-alone test bench must drive X’s when input is not needed

WHY: In order to stress the proper tolerance of unknown values on input signals (and therefore the correct qualification of various signals or busses), stand-alone module test benches should drive X’s on unused inputs where appropriate.

1.10 SYNCHRONIZER RULES

These rules specify specific guidelines for the implementation and use of clock domain crossing synchronizers.

§ All clock domain boundaries should be handled using 2-stage synchronizers.

All single signal domain interfaces should instantiate the single bit synchronizer, lib_sync.v module located in “WHEREVERWEAREGOINGTOSTOREIT”. This guarantees timing and automatically gets incorporated into the synthesis and primetime scripts.
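
For reference, a generic two-stage single-bit synchronizer looks like the sketch below. It is not the lib_sync.v module, whose ports are not specified here; the name s_sync2 and its signals are hypothetical.

```verilog
// s_sync2.v (hypothetical two-stage synchronizer)
module s_sync2 (clk, reset_n, s_async_in, s_sync_q);
  input  clk;         // destination-domain clock
  input  reset_n;     // synchronous reset, active low
  input  s_async_in;  // single-bit signal from another clock domain
  output s_sync_q;    // copy that is safe to use in the clk domain
  reg    s_meta_q;    // first stage, may go metastable
  reg    s_sync_q;    // second stage

  always @(posedge clk)
    if (!reset_n)
      begin
        s_meta_q <= 1'b0;
        s_sync_q <= 1'b0;
      end
    else
      begin
        s_meta_q <= s_async_in;
        s_sync_q <= s_meta_q;
      end
endmodule
```

Nothing other than the second flop should read s_meta_q, since that node is the one allowed to be metastable.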

§ Asynchronous signals being sampled should be hazard-free

In general, any asynchronous signal should be flopped by the source clock domain prior to being sampled by the other clock domain. Any glitches caused by transient transitions can cause a false sampling.

§ Never synchronize a bus through 2 stage synchronizers

The sampling could capture transitional values that are not valid. Sending a bus from one clock domain to another requires special attention. Buses should be captured by implementing the multi-bit synchronizer, lib_bus_sync.v located in “WHEREVERWEAREGOINGTOSTOREIT”. This guarantees timing and automatically gets incorporated into the synthesis and primetime scripts.
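
One common way to move a bus safely is to synchronize a single control bit and use it to capture the bus while the source holds it stable. The sketch below shows the receiving side of such a scheme. It is an assumption about how a multi-bit synchronizer might work, not the lib_bus_sync.v module; the module b_bus_capture and all of its ports are hypothetical.

```verilog
// b_bus_capture.v (hypothetical receiver for a toggle-qualified bus)
module b_bus_capture (dst_clock, reset_n, b_src_data, b_src_toggle,
                      b_dst_data_q, b_dst_valid_q);
  parameter WIDTH = 8;  // bus width in bits

  input              dst_clock;      // destination-domain clock
  input              reset_n;        // synchronous reset, active low
  input  [WIDTH-1:0] b_src_data;     // bus, held stable by the source
                                     // until the word has been captured
  input              b_src_toggle;   // flips once per new word; a flop
                                     // output in the source domain
  output [WIDTH-1:0] b_dst_data_q;   // captured word
  output             b_dst_valid_q;  // one-cycle pulse per captured word

  reg  [WIDTH-1:0] b_dst_data_q;
  reg              b_dst_valid_q;
  reg              b_tgl_meta_q;  // synchronizer stage 1
  reg              b_tgl_sync_q;  // synchronizer stage 2
  reg              b_tgl_prev_q;  // previous synchronized value
  wire             b_new_word;
  wire [WIDTH-1:0] b_dst_data_d;

  // Only the single toggle bit is synchronized; a change marks a new word.
  assign b_new_word   = b_tgl_sync_q ^ b_tgl_prev_q;
  assign b_dst_data_d = b_new_word ? b_src_data : b_dst_data_q;

  always @(posedge dst_clock)
    if (!reset_n)
      begin
        b_tgl_meta_q  <= 1'b0;
        b_tgl_sync_q  <= 1'b0;
        b_tgl_prev_q  <= 1'b0;
        b_dst_valid_q <= 1'b0;
      end
    else
      begin
        b_tgl_meta_q  <= b_src_toggle;
        b_tgl_sync_q  <= b_tgl_meta_q;
        b_tgl_prev_q  <= b_tgl_sync_q;
        b_dst_valid_q <= b_new_word;
      end

  // Datapath register, left uninitialized.
  always @(posedge dst_clock)
    b_dst_data_q <= b_dst_data_d;
endmodule
```

The sketch assumes the source toggle also starts at 0 after reset and that the source does not change b_src_data again until the receiver has had time to capture it (for example, by waiting for an acknowledge that is not shown here).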

1.11 FIFO RULES

These rules specify specific guidelines for the implementation and use of both synchronous and asynchronous data FIFOs.

§ All synchronous FIFOs should be implemented using the lib_fifo.v module.

All synchronous FIFOs should be implemented using the parameterizable lib_fifo.v module located in “WHEREVERWEAREGOINGTOSTOREIT”. This module allows the user to specify the width, depth and watermarks and automatically gets incorporated into the synthesis and primetime scripts.

WE NEED TO WRITE THESE FOR ALL TO USE
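
As a starting point for discussion only, a parameterizable synchronous FIFO could look like the sketch below. It is not lib_fifo.v: the module name f_fifo, its ports and its parameters are hypothetical, the depth is assumed to be a power of two, and watermarks are left out to keep the sketch short.

```verilog
// f_fifo.v (hypothetical synchronous FIFO, depth = 2**AW)
module f_fifo (clk, reset_n, f_push, f_wdata, f_pop,
               f_rdata, f_full, f_empty);
  parameter WIDTH = 8;  // word width in bits
  parameter AW    = 2;  // address width (at least 1)

  input              clk;      // FIFO clock
  input              reset_n;  // synchronous reset, active low
  input              f_push;   // write request, ignored while full
  input  [WIDTH-1:0] f_wdata;  // word to write
  input              f_pop;    // read request, ignored while empty
  output [WIDTH-1:0] f_rdata;  // word at the head, valid while !f_empty
  output             f_full;   // no free entries
  output             f_empty;  // no stored entries

  reg  [WIDTH-1:0] f_mem [0:(1<<AW)-1];  // storage, left uninitialized
  reg  [AW:0]      f_wptr_q;  // extra MSB tells full from empty
  reg  [AW:0]      f_rptr_q;
  wire [AW:0]      f_wptr_d;
  wire [AW:0]      f_rptr_d;
  wire             f_do_push;
  wire             f_do_pop;

  assign f_empty   = (f_wptr_q == f_rptr_q);
  assign f_full    = (f_wptr_q[AW] != f_rptr_q[AW]) &&
                     (f_wptr_q[AW-1:0] == f_rptr_q[AW-1:0]);
  assign f_do_push = f_push & ~f_full;
  assign f_do_pop  = f_pop  & ~f_empty;
  assign f_wptr_d  = f_wptr_q + f_do_push;
  assign f_rptr_d  = f_rptr_q + f_do_pop;
  assign f_rdata   = f_mem[f_rptr_q[AW-1:0]];

  // Pointer registers (control logic, so they are reset).
  always @(posedge clk)
    if (!reset_n)
      begin
        f_wptr_q <= {(AW+1){1'b0}};
        f_rptr_q <= {(AW+1){1'b0}};
      end
    else
      begin
        f_wptr_q <= f_wptr_d;
        f_rptr_q <= f_rptr_d;
      end

  // Storage write (datapath, not reset).
  always @(posedge clk)
    if (f_do_push) f_mem[f_wptr_q[AW-1:0]] <= f_wdata;
endmodule
```

Keeping one extra pointer bit avoids a separate occupancy counter: the FIFO is empty when the pointers are equal and full when they differ only in the top bit.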

§ All asynchronous FIFOs should be implemented using the lib_afifo.v module.

All asynchronous FIFOs should be implemented using the parameterizable lib_afifo.v module located in “WHEREVERWEAREGOINGTOSTOREIT”. This module allows the user to specify the width, depth and watermarks, and handles all synchronization between the two clock domains. All timing characteristics automatically get incorporated into the synthesis and primetime scripts.

1.12 MEMORY BIST RULES

These rules specify specific guidelines for the implementation of memory BIST controllers.

§ All RAMs should be instantiated along with a RAM MBIST collar module, lib_mcollar.v.

All RAMs should be instantiated along with a RAM MBIST collar module, lib_mcollar.v, located in “WHEREVERWEAREGOINGTOSTOREIT”. This module is parameterizable and allows the user to specify width, depth, byte-write width etc. These collars are controlled by a master BIST controller, lib_mbist.v.

1.13 SPARE GATE INSTANTIATION RULES

These rules govern the insertion of spare gates in the design.

§ Major block modules should insert a number of spare gate modules on each clock domain within the design.

A Nuelight standard spare gate module should be inserted as needed in each major module in the design. This module should be physically connected to the clock net for each clock domain.

Only 1 clock domain should be connected to any 1 spare gate module.

§ Spare gate modules should have a “spare_” prefix in the instance name.

The Nuelight spare gate module instance name should have a prefix of “spare_”. This is to easily identify all spare cells within the design for tools such as Design Compiler’s dont_touch command.

1.14 SYNTHESIS-DRIVEN RULES

§ There should be no gate or behavioral code instantiated at the chip top level or core top level.

No synthesizable code or hand-instantiated gates should be placed in the very top level modules of the design.

§ The RTL code should be completely synthesizable

Code between “//synopsys translate_off” and “//synopsys translate_on” pragmas is not allowed. Such structures are sometimes used for functionality monitoring and error detection during verification. These structures will be synthesized out of the netlist and will not be present during gate level simulations. Formal verification/rule-checking might violate this rule. In this case, some stringent rules about assertions should be followed.

§ Signals can only be driven by one procedural statement

If more than one assignment or always statement can change the value of a net, that net has multiple drivers. The synthesis will result in unwanted contention.
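
A short illustration of the single-driver rule, using a hypothetical set/clear flag module named m_flag. The rejected form is shown only as a comment; the accepted form merges both conditions into one equation so that each signal has exactly one driver.

```verilog
// m_flag.v (hypothetical set/clear flag)
module m_flag (clk, reset_n, m_set, m_clear, m_flag_q);
  input  clk;       // module clock
  input  reset_n;   // synchronous reset, active low
  input  m_set;     // request to set the flag
  input  m_clear;   // request to clear the flag, wins over m_set
  output m_flag_q;  // registered flag
  reg    m_flag_q;

  wire m_flag_d;

  // NOT allowed: two procedural blocks driving the same signal.
  //   always @(posedge clk) if (m_set)   m_flag_q <= 1'b1;
  //   always @(posedge clk) if (m_clear) m_flag_q <= 1'b0;

  // Allowed: one equation decides the next value, one block registers it.
  assign m_flag_d = m_clear ? 1'b0 :
                    m_set   ? 1'b1 :
                              m_flag_q;

  always @(posedge clk)
    if (!reset_n) m_flag_q <= 1'b0;
    else          m_flag_q <= m_flag_d;
endmodule
```

Writing the merged equation also forces an explicit decision about which request wins when both arrive in the same cycle.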

§ Buffer Trees should be built by the Place and Route tools

Clock, reset and scan enable buffer trees should all be built by the backend tools

WHY: The Physical Design Centers can process the design faster if the tree is left out of the netlist shipped to them. They can typically get better results in an Apollo/Saturn (Jupiter?) flow than netlist-inserted trees can provide.

§ Each net must be driven by one and only one cell.

If a net has more than one driver, the RTL code must be modified to correct the error.

WHY: Delay modeling does not correctly handle this case.

§ Leave extra margin on global/inter-module signals

WHY: Typically, these are the slowest and most unpredictable timing signals. If possible, make them timing insensitive.

§ Structural (gate) instantiation in the RTL code should be avoided or inserted using the “gate_” prefix in the instance name.

All explicit instantiations of foundry-specific gate or generic primitives should be avoided in the RTL code.

WHY: These gates will directly impact simulation durations and therefore are fundamentally opposed to the RTL philosophy. In addition, they can artificially restrict synthesis from solving timing issues.

ALLOWED VIOLATIONS: For clock logic and possibly reset or test logic. If it is necessary to hand-instantiate gates in the design, these gates should be named with a “gate_” prefix in the instance name to allow easy identification for Synopsys dont_touch commands.

§ Do not mix structural (gate) instantiation and RTL code in the same module.

WHY: This makes module synthesis easier. Often dont_touch on gate instantiations can result in the intended gates being bypassed even though they are not removed. Exact behavior is guaranteed if the two are kept separate.

§ Tied inputs in modules using the “gate_” prefix should use TIEHI and TIELO (not 1’b1, 1’b0).

To protect internal logic from ESD, any gate-level modules that are instantiated in the RTL code should use TIEHI and TIELO for inputs tied to constants.

§ Signals must be defined only in non-dependent processes

A signal cannot be defined and assigned in a process in which it is also in the sensitivity list.

WHY: Such a violation would result in synthesis and simulation problems.

§ All leaf module inputs must be synchronous to the clock input for that module.

A leaf-level module is one of the potentially many end modules on a physical hierarchical branch.

In other words, it is the "physically-implemented" module, not the RTL module. This distinction is important when one module and all of its sub-modules are synthesized and floor-planned as one large module.

WHY: Synthesis and STA are more effective this way. Next-generation tools deal with this more effectively but still converge faster with this recommended partitioning.

§ Each leaf-level module will contain only one clock domain.

A leaf-level module is one of the potentially many end modules on a hierarchical branch. The logic in this module should only be clocked in one clock domain. Asynchronous interfaces should be confined to a module whose only function is to synchronize between the two domains.

WHY: Synthesis and STA are easier this way. Next-generation STA tools deal with this more effectively but still converge faster with this recommended partitioning.

§ No mixing of structural and behavioral code/glue logic is allowed.

A structural module, for instance containing instantiations of sub-modules, cannot contain any logic assignments. A behavioral module that is to be synthesized should contain no structural elements.

WHY: This is a clean partition for synthesis.

§ Design Boundary Elements should be standardized.

Inputs should be registered and/or outputs should be registered in any module that is being synthesized, where possible. Combinatorial logic both before and after sequential elements in a module is not recommended. Of course, if such a partition interferes with readable code, then code readability shall take precedence.

WHY: Synthesis works better on an overall system when there are few combinatorial paths that cross pre-synthesized boundaries. Synthesis cannot optimize the entire path from flop to flop.

§ Design must be implementable in standard synthesis library with standard synthesis tools

WHY: The ability to do a basic implementation of any RIP is required. The design must be able to be synthesized without the use of special cells and without the use of special point tools. These tools and cells may be used to improve performance of a hard-macro version of the design, but they must not be REQUIRED for implementation of the design.

§ Design must be synthesizable by WHATEVERSYNTHESISTOOL (basic)

WHY: This is the golden standard of synthesis tools. Any structures not synthesizable by a basic Synopsys Design Compiler (need to review tools flow) run are not allowed as they are not easily re-used by other parties. No home-grown tools can be required in a basic implementation.

§ Datapath elements should be partitioned into separate hierarchy

WHY: For a basic design, simple synthesis of datapath elements may suffice. But for higher-performance designs, special optimizations may be required. It is a necessity that these elements that may be generated by a different tool flow for a high-performance design be easily swapped into the design. This necessitates this hierarchy requirement.

§ Fixed placement of cells as a design constraint is prohibited

RIP meeting synthesizable criteria cannot require fixed placement of cells in actual layout.

WHY: Requiring fixed placement of cells indicates that the design contains absolutely no margin.

Synthesis tools cannot be constrained properly to understand the precise delays and will not produce good results for the next RIP user. The final user implementation should not have to be constrained by such rules either: it increases development time and is a detail not easily transferred from designer to user. The design should be more robust than this practice would indicate.

§ Constrain fanout in synthesis

Fanout should be constrained in synthesis to fewer than 16 pins and, in most cases, fewer than 8 pins.

WHY: Constraining fanout will tend to limit the length of the wire and therefore the capacitive load.

It will result in a faster circuit. More importantly, it will keep the wire length in a region better modeled by wire load models and will assist in timing closure.

§ Do not use unwanted/special cells during synthesis.

Some of the library cells are used for special purposes (clock buffers…). Some are not compliant with the design guidelines (latches…). Other cells (drive too low…) should not be used for other reasons. In any case, when setting up the general synthesis scripts, any cells that should not be used have to be tagged as such (through the “dont_use” Synopsys attribute). Only a subset of the library should be used when synthesizing behavioral code.

§ Clocks are defined in single central file.

In order for all designers to use an identical clock definition (frequency + margins), the clocks should be defined in a single location. That file should be read in during every synthesis run.

§ Clocks should be defined as ideal.

The synthesis tool should not be used to generate a clock tree. This requires physical information (such as floor-planning) that is not available at this stage.

§ Add margin to the cycle time.

In order to increase the probability of meeting the design timing constraints during P&R, synthesis should be run with significant margin on the cycle time (20% faster than the target).

§ Do not fix hold time during synthesis.

In general, libraries are built such that flop-to-flop logic does not have any hold time issues assuming ideal clocks. Hold time violations are therefore caused by clock skew after P&R and clock tree generation. The synthesis tool does not have the information needed to fix hold time problems that could occur in the physical design stage.

§ Always verify log and report files.

Any errors and warnings occurring during synthesis should be checked for validity.

§ Constraints must be met (or almost).

Ridiculous constraints unnecessarily increase the synthesis run time. They also degrade the results on paths that could otherwise have met timing.

1.15 SIMULATION-DRIVEN RULES

§ Verification Suite must demonstrate 100% code coverage

WHY: Not doing so essentially guarantees that the design will contain bugs

§ Always include the timescale.h file in every RTL module

The timescale.h file is under WHEREVER

WHY: Time units should not be dependent on the order in which the files are compiled.
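
A minimal sketch of the include, assuming timescale.h holds a single `timescale directive. The 1 ns / 1 ps values and the module name c_buf are assumptions for illustration; the actual values are whatever the project file defines.

```verilog
// File: timescale.h  (contents assumed for illustration)
//   `timescale 1ns / 1ps

// File: c_buf.v  (hypothetical module)
`include "timescale.h"   // first line of every RTL file, so the time
                         // unit never depends on compile order

module c_buf (
    input  wire c_in,
    output wire c_out
);
    assign c_out = c_in;
endmodule
```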

§ Avoid PLI.

PLI structures significantly slow down simulation.

§ Hierarchical references to ports only.

When building test benches or monitors, hierarchical references should be made to ports only. If a reference is made to an internal net, there is no guarantee that the net will still be present after synthesis or P&R, which complicates gate level simulations.

§ Hierarchical signals should not be used directly.

They should be defined in a central file. It makes it easier to modify for gate level simulation if necessary. Only one file needs to be modified in this case.
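
One possible way to centralize the paths is with `define macros, sketched below. The macro name TB_D_REQ, the modules d_widget and tb_top, and the instance name u_dut are all hypothetical; in practice the `define line would sit in its own central file that is included by the test bench.

```verilog
// Central definition (would normally live in its own included file).
// Only this line changes if the path differs in the gate level netlist.
`define TB_D_REQ tb_top.u_dut.d_req

// Hypothetical DUT stand-in with one output port.
module d_widget (input wire clk, output reg d_req);
    initial d_req = 1'b0;
    always @(posedge clk) d_req <= ~d_req;
endmodule

module tb_top;
    reg  clk = 1'b0;
    wire d_req;

    always #5 clk = ~clk;

    d_widget u_dut (.clk(clk), .d_req(d_req));

    // Monitor-style code uses the macro, never the raw hierarchical path.
    always @(`TB_D_REQ)
        $display("%0t d_req=%b", $time, `TB_D_REQ);

    initial #50 $finish;
endmodule
```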

§ All monitors should be at the same level as the test bench.

No port connections are required in this case. It is also easy to find any monitors during debugging.

§ Monitor Verilog file names should end with “_mon.v”.

§ Monitors should be disabled (producing no files or messages) when not in use.

This should speed up simulations.
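
A sketch of one way to keep a monitor silent unless it is requested, using the standard $test$plusargs system function. The monitor name d_req_mon, the plusarg string d_req_mon_on, and the test bench contents are assumptions; the register tb_top.d_req stands in for a DUT output port.

```verilog
// Hypothetical test bench with a signal to observe.
module tb_top;
    reg clk   = 1'b0;
    reg d_req = 1'b0;    // stand-in for a DUT output port

    always #5 clk = ~clk;
    always @(posedge clk) d_req <= ~d_req;

    initial #50 $finish;
endmodule

// Hypothetical monitor (file d_req_mon.v), at the same level as the
// test bench and with no port connections.
module d_req_mon;
    reg mon_en;

    // Enabled only when the simulator is started with +d_req_mon_on.
    initial mon_en = ($test$plusargs("d_req_mon_on") != 0);

    always @(tb_top.d_req)
        if (mon_en)
            $display("%0t d_req_mon: d_req=%b", $time, tb_top.d_req);
endmodule
```

With the plusarg absent, the monitor prints nothing and opens no files.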

§ All the “force” and “assign (supply1, supply0)” commands should be identified.

§ All tests should be fully commented

1.16 STA-DRIVEN RULES

§ Create a chip-level, inter-module timing budget spreadsheet.

Each module will have IO setup and hold timing constraints to adhere to. To ensure that no timing budget errors occur, a master spreadsheet should be created that manages all inter-module delays.

§ Synchronous memories are preferred.

All memory interfaces should be synchronous if possible.

WHY: This design practice is cleaner and less glitch-sensitive. It allows for easier STA.

ALLOWED VIOLATION: It may be that both a read and a write access are required during the same cycle. If this is true, an asynchronous memory may be required. In addition, it may be that only an asynchronous memory is available for the function required. If so, write enables are required to be synchronous with the clock. It is recommended, if possible, that the write enable be gated with the low phase of the clock. It is therefore required that address and data on writes be set up to the falling edge of the clock.
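
A minimal sketch of the recommended write enable gating, assuming the enable comes from a flop clocked on the rising edge. The module and signal names are hypothetical.

```verilog
// Hypothetical wrapper for an asynchronous RAM write enable.
// c_we_sync changes shortly after the rising clock edge, while clk is
// high, so the gated enable is a clean pulse covering the low phase.
module c_async_we (
    input  wire clk,
    input  wire c_we_sync,   // registered on the rising edge of clk
    output wire c_ram_we     // active only while clk is low
);
    assign c_ram_we = c_we_sync & ~clk;
endmodule
```

Because the pulse starts at the falling edge, address and write data must already be stable at that edge, which is the setup requirement stated above.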

§ No timing loops shall exist.

These will show up in STA. The RTL source code must be modified to remove these.

WHY: Timing loops make STA complex and potentially inaccurate. Generally designs that conform with all other rules will comply with this rule, although more complex timing loops may exist.

§ Avoid/minimize multicycle paths.

A multicycle path is one whose delay is longer than one cycle. This is not allowed. These must be eliminated in the RTL source code.

WHY: Multicycle paths can only pass STA with manual intervention, which is error-prone and requires Nuelight to document the multicycle and provide an exception file for STA when one exists within a RIP.

§ Avoid/minimize false timing paths.

A false timing path is a timing path that violates the timing requirements of the design, but will never be functionally sensitized. The RTL code must be modified to remove all false timing paths.

WHY: False paths can only pass STA with manual intervention, which is error-prone and requires Nuelight to document the false path and provide an exception file for STA when one exists within a RIP or bolt-on.

ALLOWED VIOLATION: In some instances, particularly with busses, one module may never talk to another, yet they both sit on a bus and appear to STA to communicate. It may be complicated to design such a condition away. However, if the bus is synchronous, this should be possible. In addition, it may be that the effort required to eliminate a false path in the design is more expensive than the declaration of the path as an exception. This is a design tradeoff. This must be identified and formally signed off in a design review.

§ No zero cycle paths are permitted

A zero cycle path is a timing path that propagates from one flop to another racing the propagation of the same clock edge. This is a race condition. The RTL code must be modified to remove all zero cycle timing paths.

WHY: Zero cycle paths can only pass STA with manual intervention, which is error-prone and requires Nuelight to document the zero cycle path and provide an exception file for STA when one exists within a RIP or bolt-on.

§ The number of direct combinatorial feed through paths must be minimized.

WHY: This is to aid static timing analysis and allow for better characterization and “shellability”.

§ STA must be performed with identical results at a 50/50 +/- 10% duty cycle

STA should be performed on a sample or final netlist at 40/60, 50/50 and 60/40.

WHY: The weaker drive strength relative to wire load, and the increasing number of flops on a clock net, may distort the clock duty cycle by as much as +/- 10%.

§ 150 picoseconds hold time margin requirement at best case conditions

STA should indicate that the netlist and (trial) layout will have 150 ps of hold time margin (above that built into the library) at best case process, +5% voltage and 0C. This rule will require use of robust flip-flops.

WHY: Extra hold time margin makes the design more reusable and covers variations in the signoff tool suite and libraries.

ALTERNATE VARIATION: In the event that -40C is the signoff condition instead, 100 ps margin is required.

§ 5% Setup Margin Required

STA should indicate that the netlist and (trial) layout will have a 5% setup time margin (above that built into the library) at the worst case process, -5% voltage and 125C. For example, a design that needs to run at 100 MHz would need to pass STA at 105 MHz.

WHY: Extra setup time is required to maintain the promised frequency over the lifetime of the design, covering variations in the signoff tool suite and libraries.

§ Scan mode timing analysis should be run separately

For scan mode, the timing analysis should be run without the constraints (false paths, multi-cycle paths etc) that are used in the normal mode.

Extreme care should be taken to ensure that no SI hold time paths are set as false paths in the scan mode.

1.17 TEST-DRIVEN RULES

§ No reconvergent/redundant logic shall be used.

These will show up in ATPG as “redundant” faults. The RTL source code must be modified to remove these.

WHY: Redundant logic reduces fault coverage.

§ Interfaces to memories must have bypass logic and BIST Memory collars

The RAM test methodology for current designs is memory BIST implemented by a single central memory BIST controller and one parameterizable Memory BIST Collar for each RAM instantiation. Each Memory BIST Collar will implement checking of the RAM under the control of the central memory BIST controller, and will also provide black-box bypassing functionality for scan mode.

The MBIST collar module is named m_collar (file m_collar.v) and is instantiated in the design with an instance name made up of the module’s identifier letter, a module-specific identifier, and the suffix “_collar”.

e.g. m_collar x_ram1_collar (….);

§ Synthesized logic should be tested through full scan methodology

§ Any complex inputs and outputs should be controllable/observable from primary input and output pins in test mode.

§ All clocks should be controllable during ATPG

§ All asynchronous resets (if any) should be disabled in scan mode

§ All scan basic control should be available from primary input and output pins.

§ Read/write access should be provided to all internal RAMs

§ Internal logic/special functions should be tolerant of unknown data on the design flip flops during scan in and scan out mode.

WHY: During the scan in and scan out phases of scan patterns, data on control pins to special functions such as RAMs, processors, sensors, etc. will change randomly. In most cases this may not be an issue; however, care should be taken to ensure that the device does not enter an illegal or self-destruct state during the process.

Simply qualifying control signals with scan mode is one way to resolve this.
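
A minimal sketch of such qualification for a RAM, assuming a scan_mode signal that is high for the whole scan test. The module name and all signal names are hypothetical.

```verilog
// Hypothetical gating block: forces RAM controls inactive in scan mode
// so that random flop contents during shift cannot trigger accesses.
module c_ram_ctl_gate (
    input  wire scan_mode,      // assumed: high during scan test
    input  wire c_ram_we_raw,   // write enable from functional flops
    input  wire c_ram_cs_raw,   // chip select from functional flops
    output wire c_ram_we,
    output wire c_ram_cs
);
    assign c_ram_we = c_ram_we_raw & ~scan_mode;
    assign c_ram_cs = c_ram_cs_raw & ~scan_mode;
endmodule
```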

§ During reset, all pins should be tristated

§ Primary outputs should be connected to redundant flops for observability

§ Primary inputs from test modes should be driven from flops for controllability

§ ATPG must hit >99% fault coverage in RIP on a (trial) netlist

WHY: This is critical to lowering DPM for high-running customer components. It affects the Nuelight bottom line.
