Verification Martial Arts: A Verification Methodology Blog

Archive for the 'Optimization/Performance' Category

Avoiding Redundant Simulation Cycles in your UVM VIP-based Simulation with a Simple Save-Restore Strategy

Posted by paragg on 6th March 2014

In many verification environments, you reuse the same configuration cycles across different testcases. These cycles might involve writing to and reading from different configuration and status registers, loading program memories, and other similar tasks needed to set up the DUT for the targeted stimulus. In many of these environments, these configuration cycles take a long time. There is also a lot of redundancy, because verification engineers have to rerun the same, already-verified configuration cycles for different testcases, leading to a loss in productivity. This is especially true for complex verification environments with multiple interfaces, each requiring different components to be configured.

The Verilog language provides an option to save the state of the design and the testbench at a particular point in time, restore the simulation to that state later, and continue from there. This can be done by adding the appropriate built-in system calls in the Verilog code. VCS provides the same capability from the Unified Command-line Interface (UCLI).

However, it is not enough simply to restore the simulation from the saved state. For different simulations, you may want to apply different random stimulus to the DUT. In the context of UVM, you would want to run different sequences from a saved state, as shown below.

In the above flow, apart from the last step, which varies to a large extent, the rest of the steps, once established, need no iteration.

Here we explain how to achieve this strategy with the simple UBUS example available in the standard UVM installation. Simple changes are made in the environment to show what needs to be done to bring in this additional capability. Within the existing set of tests, two of them, “test_read_modify_write” and “test_r8_w8_r4_w4”, differ only with respect to the master sequence being executed – “read_modify_write_seq” and “r8_w8_r4_w4_seq” respectively.

Let’s say we have a scenario where we want to save the simulation once the reset_phase is done, and then start executing different sequences after the reset_phase in the restored simulations. To demonstrate a similar scenario with the UBUS tests, we introduced a delay in the reset_phase of the base test (in a real test, this might correspond to PLL lock, DDR initialization, or basic DUT configuration).

The following snippet shows how the existing tests are modified to bring in the capability of running different tests in different ‘restored’ simulations.

As evident in the code, we made two major modifications:

  • Shifted the setting of the phase default_sequence from the build phase to the start of the main phase.
  • Got the name of the sequence as an argument from the command line and processed the string appropriately in the code to execute the corresponding sequence on the relevant sequencer (a sketch of both changes is shown just below).
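
A minimal sketch of what the modified base test can look like (the testbench hierarchy path and the +SEQ_NAME plusarg are illustrative assumptions, not the exact code from the UBUS change):

class ubus_restore_base_test extends ubus_example_base_test;
  `uvm_component_utils(ubus_restore_base_test)

  function new(string name = "ubus_restore_base_test", uvm_component parent = null);
    super.new(name, parent);
  endfunction

  virtual task main_phase(uvm_phase phase);
    string seq_name;
    // Pick the master sequence at run time, e.g. +SEQ_NAME=read_modify_write_seq
    if (!$value$plusargs("SEQ_NAME=%s", seq_name))
      `uvm_fatal("NOSEQ", "No +SEQ_NAME=<sequence_name> specified on the command line")
    // The default_sequence is now set at the start of the main phase, not in build
    if (seq_name == "read_modify_write_seq")
      uvm_config_db#(uvm_object_wrapper)::set(this,
        "ubus_example_tb0.ubus0.masters[0].sequencer.main_phase",
        "default_sequence", read_modify_write_seq::type_id::get());
    else if (seq_name == "r8_w8_r4_w4_seq")
      uvm_config_db#(uvm_object_wrapper)::set(this,
        "ubus_example_tb0.ubus0.masters[0].sequencer.main_phase",
        "default_sequence", r8_w8_r4_w4_seq::type_id::get());
    super.main_phase(phase);
  endtask
endclass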

As you can see, the changes are kept to a minimum. With this, the above generic framework is ready to be simulated. In VCS, one of the ways the save/restore flow can be enabled is as follows.
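
One possible shape of that flow (a sketch only: the checkpoint name and the +SEQ_NAME plusarg are placeholders, and the exact UCLI arguments should be checked against the VCS documentation for your version):

# Compile once
vcs -sverilog -ntb_opts uvm <ubus example and testbench files>

# First run: advance past the configuration/reset phase, then checkpoint from UCLI
./simv -ucli
ucli% run <up to the end of the configuration/reset phase>
ucli% save cfg_done
ucli% quit

# Restored runs: same binary and checkpoint, a different sequence each time
./simv -ucli +SEQ_NAME=read_modify_write_seq
ucli% restore cfg_done
ucli% run

./simv -ucli +SEQ_NAME=r8_w8_r4_w4_seq
ucli% restore cfg_done
ucli% run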

Thus the above strategy helps in optimal utilization of compute resources with simple changes in your verification flow. Hope this was useful, and that you can easily make the changes in your verification environment to adopt this flow and avoid redundant simulation cycles.

Posted in Automation, Coding Style, Configuration, Creating tests, Customization, Optimization/Performance, Organization, Reuse, Stimulus Generation, SystemVerilog, Tutorial, UVM, Uncategorized, Verification Planning & Management | 1 Comment »

SNUG-2012 Verification Round Up – Language & Methodologies – I

Posted by paragg on 25th February 2013

As in the previous couple of years, last year’s SNUG – Synopsys User Group – showcased an amazing number of useful user papers leveraging the capabilities of the SystemVerilog language and verification methodologies centered on it.

I am always excited when I see this plethora of useful papers, and I try to ensure that I set aside some time to go through all these user experiences. Now, as we wait for SNUG Silicon Valley to kick-start the SNUG events for this year, I want to look back at some of the very interesting and useful papers from the different SNUGs of 2012. Let me start by talking about a few papers in the area of the SystemVerilog language and SV methodologies.

Papers leveraging the SystemVerilog language and constructs

Hillel Miller of Freescale, in the paper “Using covergroups and covergroup filters for effective functional coverage”, uncovers the mechanisms available for carving out coverage goals. The P1800-2012 version of the SystemVerilog LRM provides new constructs just for doing this; the construct focused on here is the “with” construct, which provides the ability to carve a sub-range of goals out of a multidimensional range of possibilities. This is very relevant in a “working” or under-development setup that requires frequent reprioritization to meet tape-out goals.

The paper “Taming Testbench Timing: Time’s Up for Clocking Block Confusions” by Jonathan Bromley and Kevin Johnston of Verilab reviews the key features and purpose of clocking blocks and then examines why they continue to be a source of confusion and unexpected behavior for many verification engineers. Drawing from the authors’ project and mentoring experience, it highlights typical usage errors and how to avoid them. The authors clarify the internal behavior of clocking blocks to help engineers understand the reasons behind common problems, and show techniques that allow clocking blocks to be used productively and with confidence. Finally, they consider some areas that may cause portability problems across simulators and indicate how to avoid them.

Inference of latches and flops based on coding styles has always been a topic that creates multiple viewpoints, and there are other such scenarios of synthesis/simulation mismatch that one typically comes across. To address this ambiguity, language developers have provided constructs that allow an explicit resolution based on the intent. To help us gain a deeper understanding of the topic, Don Mills of Microchip Technology Inc. presented the related concepts in the paper “Yet Another Latch and Gotchas Paper” at SNUG Silicon Valley. This paper discusses and provides solutions to issues that designers using SystemVerilog for design come across, such as: case expression issues for casez and casex, latches generated when using unique case or priority case, SRFF coding-style problems with synthesis, and the SystemVerilog 2009 definition of logic.

Gabi Glasser from Intel presented the paper “Utilizing SystemVerilog for Mixed-Signal Validation” at SNUG Israel, where he proposed a mechanism for simplifying analysis and increasing coverage for mixed-signal simulations. The proposed method takes advantage of SystemVerilog’s ability to define associative (hash) arrays of unlimited size. During the simulation, vectors are created for the required analog signals, allowing them to be analyzed within the testbench during or at the end of the simulation, without needing to save these signals to a file. The flow change makes it possible to launch a large-scale mixed-signal regression while allowing easier analysis of coverage data.

A design pattern is a general, reusable solution to a commonly recurring problem within a given context. The benefit of using design patterns is clear: they give designers a common language when approaching a problem, and a set of widely used tools to solve issues as they come up. The paper “Design Patterns In Verification” by Guy Levenbroun of Qualcomm explores several common problems which might arise during the development of a testbench, and how design patterns can be used to solve them. The patterns covered in the paper are categorized into three main areas: creational (e.g. factory), structural (e.g. composite) and behavioral (e.g. template).

Arik Shmayovitsh, Avishay Tvila and Guy Lidor of Sigma Designs, in their paper “Truly reusable Testbench-to-RTL connection for SystemVerilog”, present a novel approach to connecting the DUT and testbench using consistent semantics while reusing the testbench. This is achieved by abstracting the connection layer of each testbench using the SystemVerilog ‘bind’ construct. This ensures that the only thing required to reuse the testbench for a new DUT is to identify the instance of the corresponding DUT.

In the paper “A Mechanism for Hierarchical Reuse of Interface Bindings”, Thomas Zboril of Qualcomm (Canada) explores another method to instantiate SV interfaces, connect them to the DUT and wrap the virtual interfaces for use in the test environment. This method allows the reuse of all the code when the original block-level DUT becomes a lower-level instance in a larger subsystem or chip. The method involves three key mechanisms: hierarchical virtual interface wrappers, hierarchical instantiation of SV interfaces, and automatic management of hierarchical references via SV macros.

Thinh Ngo & Sakar Jain of Freescale Semiconductor, in their paper “100% Functional Coverage-Driven Verification Flow”, propose a coverage-driven verification flow that can efficiently achieve 100% functional coverage during simulation. The flow targets varied functionality, focuses at the transaction level, measures coverage during simulation, and fails a test if 100% of the expected coverage is not achieved. This flow maps stimulus coverage to functional coverage, with every stimulus transaction being associated with an event in the coverage model and vice versa. This association is derived from the DUT specification and/or the DUT model. Expected events generated along with stimulus transactions are compared against actual events triggered in the DUT, and the comparison results are used to pass or fail the test. 100% functional coverage is achieved via 100% stimulus coverage; the flow enables every test with its targeted functionality to meet 100% functional coverage provided that it passes.

Papers on Verification Methodology

In the paper “Top-down vs. bottom-up verification methodology for complex ASICs”, Paul Lungu & Zygmunt Pasturczyk of Ciena (Canada) cover the simulation methodology used for two large ASICs requiring block-level simulations. A top-down verification methodology was used for one of the ASICs, while a larger version needed an expanded bottom-up approach using extended simulation capabilities. Some techniques and verification methods, such as chaining of sub-environments from block to top level, are highlighted along with challenges and solutions found by the verification team. The paper presents a useful technique of passing a RAL (Register Abstraction Layer) mirror to the C models which are used as scoreboards in the environment. It also presents a method of generating stable clocks inside the “program” block.

The paper “Integration of Legacy Verilog BFMs and VMM VIP in UVM using Abstract Classes” by Santosh Sarma of Wipro Technologies (India) presents an alternative approach where legacy BFMs written in Verilog, and not implemented using classes, are hooked up to higher-level class-based components to create a standard UVM VIP structure. The paper also discusses an approach where existing VMM transactors that are tied to such legacy BFMs can be reused inside the UVM VIP with the help of the VCS-provided UVM-VMM Interoperability Library. The implementation makes use of abstract classes to define functions that invoke the BFM APIs. The abstract class is then concretized using derived classes which give the actual implementation of the functions in the abstract class. The concrete class is bound to the Verilog instance of the BFM using the SystemVerilog “bind” concept, and its handle is then used by the UVM VIP and the VMM transactor to interact with the underlying Verilog BFM. Using this approach, the UVM VIP can be made truly reusable by using run-time binding of the Verilog BFM instance to the VIP instead of hardcoded macro names or procedural calls.

“A Unified Self-Check Infrastructure - A Standardized Approach for Creating the Self-Check Block of Any Verification Environment” by John Sotiropoulos, Matt Muresa and Massi Corba of Draper Laboratories (Cambridge, MA, USA) presents a structured approach for developing a centralized “Self-Check” block for a verification environment. The approach is flexible enough to work with various testbench architectures and is portable across different verification methodologies. Here, all of the design’s responses are encapsulated under a common base class, providing a single “Self-Check” interface for any checking that needs to be performed. This abstraction, combined with a single centralized scoreboard and a standardized set of components, provides the consistency needed for faster development and easier code maintenance. It expands the concept of ‘self-check’ to incorporate white-box monitors (tracking internal DUT state changes, etc.) and temporal models (reacting to wire changes), along with traditional methodologies for enabling self-checking.

For VMM users looking at migrating to UVM, there is another paper, “Transitioning to UVM from VMM” by Courtney Schmitt of Analog Devices, Inc., which discusses the process of transitioning to a UVM-based environment from VMM. Differences and parallels between the two verification methodologies are presented to show that updating to UVM is mostly a matter of getting acquainted with a new set of base classes. Topics include UVM phases, agents, TLM ports, configuration, sequences, and register models. Best practices and reference resources are highlighted to make the transition from VMM to UVM as painless as possible.

Posted in Announcements, Coverage, Metrics, Creating tests, Customization, Modeling, Optimization/Performance, Reuse, SystemVerilog, UVM, Uncategorized, VMM, VMM infrastructure | 3 Comments »

Using the VMM Performance Analyzer in a UVM Environment

Posted by Amit Sharma on 23rd August 2011

As a generic VMM package, the Performance Analyzer (PAN) is not based on, nor does it require, specific shared resources, transactions or hardware structures. It can be used to collect statistical coverage metrics relating to the utilization of a specific shared resource, and it helps to measure and analyze many different performance aspects of a design. UVM does not have a performance analyzer as part of its base class library as of now. Given that the collection, tracking and analysis of performance metrics of a design has become a key checkpoint in today’s verification, there is a lot of value in integrating the VMM Performance Analyzer into a UVM testbench. To demonstrate this, we will use both VMM and UVM base classes in the same simulation.

Performance is analyzed based on user-defined atomic units of resource utilization called ‘tenures’. A tenure refers to any activity on a shared resource with a well-defined starting and ending point, and is uniquely identified by an automatically assigned identifier. We take the XBUS example in $VCS_HOME/doc/examples/uvm_1.0/simple/xbus as the demo vehicle for the UVM environment.

Step 1: Defining data collection

Data is collected for each resource in a separate instance of the “vmm_perf_analyzer” class. These instances should be allocated in the build phase of the top level environment.

For example, in xbus_demo_tb.sv:

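A minimal sketch of that allocation (the handle names and database file name are illustrative; the constructor usage follows the vmm_perf_analyzer examples elsewhere on this blog):

class xbus_demo_tb extends uvm_env;
  // ... existing xbus demo testbench members ...
  vmm_sql_db_sqlite perf_db;    // SQL database that will hold the performance tables
  vmm_perf_analyzer xbus_perf;  // one analyzer (one table) per shared resource

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    perf_db   = new("xbus_perf_data");
    xbus_perf = new("xbus_transfers", perf_db);
  endfunction
endclass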

Step 2: Defining the tenure and enabling data collection

There must be one instance of the “vmm_perf_tenure” class for each operation that is performed on the shared resource. Tenures are associated with the instance of the “vmm_perf_analyzer” class that corresponds to the resource being operated on. In the case of the XBUS example, let’s say we want to measure transaction throughput (i.e. for the XBUS transfers). This is how we associate a tenure with the XBUS transaction. To denote the start and end of a tenure, we define two additional events in the XBUS master driver (started, ended). ‘started’ is triggered when the driver obtains a transaction from the sequencer, and ‘ended’ once the transaction has been driven on the bus and the driver is about to call seq_item_port.item_done(rsp). At the same time that ‘started’ is triggered, a callback is invoked to get the PAN to start collecting statistics. Here is the relevant code.

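A sketch of the driver-side changes (the surrounding get_and_drive() loop is paraphrased from the xbus example, and the xbus_perf_cb callback type is defined in Step 2.b below):

typedef class xbus_perf_cb;   // defined in Step 2.b

class xbus_master_driver extends uvm_driver #(xbus_transfer);
  event started, ended;                                // added for the performance tenure
  `uvm_register_cb(xbus_master_driver, xbus_perf_cb)   // enable xbus_perf_cb callbacks on this driver
  // ... existing driver code ...

  task get_and_drive();
    forever begin
      seq_item_port.get_next_item(req);
      -> started;                                      // tenure starts: notify the analyzer via callbacks
      `uvm_do_callbacks(xbus_master_driver, xbus_perf_cb, transfer_started(req))
      drive_transfer(req);                             // existing pin-level activity (rsp prepared here)
      -> ended;                                        // tenure ends
      `uvm_do_callbacks(xbus_master_driver, xbus_perf_cb, transfer_ended(req))
      seq_item_port.item_done(rsp);
    end
  endtask
endclass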

Now, the Performance Analyzer works on classes extended from vmm_data and uses the base-class functionality for starting/stopping these tenures. Hence, the callback that gets triggered at the appropriate points has to convert the UVM transaction into a corresponding VMM one. This is how it is done.

Step 2.a: Creating the VMM counterpart of the XBUS Transfer Class

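A hedged sketch of such a vmm_data counterpart (the field names mirror the xbus_transfer fields of interest; the actual class may carry more):

class xbus_vmm_transfer extends vmm_data;
  bit [15:0]   addr;
  bit [7:0]    data[];
  int unsigned size;

  `vmm_data_member_begin(xbus_vmm_transfer)
    `vmm_data_member_scalar(addr, DO_ALL)
    `vmm_data_member_scalar_array(data, DO_ALL)
    `vmm_data_member_scalar(size, DO_ALL)
  `vmm_data_member_end(xbus_vmm_transfer)

  // Copy the fields of interest from the UVM transaction
  function void from_uvm(xbus_transfer tr);
    addr = tr.addr;
    size = tr.size;
    data = tr.data;
  endfunction
endclass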

Step 2.b: Using the UVM Callback for starting/stopping data collection and calling the UVM -> VMM conversion routines appropriately.

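A sketch of the callback class (it uses the standard UVM callback facility; the method names match the hooks used in the driver sketch of Step 2):

class xbus_perf_cb extends uvm_callback;
  vmm_perf_analyzer perf_an;    // the analyzer allocated in Step 1
  vmm_perf_tenure   tenure;
  xbus_vmm_transfer vmm_tr;

  function new(string name, vmm_perf_analyzer perf_an);
    super.new(name);
    this.perf_an = perf_an;
  endfunction

  // Invoked by the driver when 'started' fires
  virtual function void transfer_started(xbus_transfer tr);
    vmm_tr = new();
    vmm_tr.from_uvm(tr);             // UVM -> VMM conversion
    tenure = new(0, 0, vmm_tr);      // initiator/target ids left at 0 in this sketch
    perf_an.start_tenure(tenure);
  endfunction

  // Invoked by the driver just before item_done()
  virtual function void transfer_ended(xbus_transfer tr);
    perf_an.end_tenure(tenure);
  endfunction
endclass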

The callback class then needs to be associated with the driver in the top testbench (xbus_demo_tb), as follows.

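A sketch of that association (the xbus0.masters[0].driver path is an assumption about the demo hierarchy; uvm_callbacks#()::add() is the standard UVM registration call):

// Inside xbus_demo_tb, once the environment has been built:
virtual function void end_of_elaboration_phase(uvm_phase phase);
  xbus_perf_cb perf_cb = new("xbus_perf_cb", xbus_perf);
  uvm_callbacks#(xbus_master_driver, xbus_perf_cb)::add(xbus0.masters[0].driver, perf_cb);
endfunction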

Step 3: Generating the reports

In the report phase of xbus_demo_tb, save and write out the appropriate databases.

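A sketch of that step (save_db() and report() follow the vmm_perf_analyzer usage shown elsewhere on this blog):

// Inside xbus_demo_tb:
virtual function void report_phase(uvm_phase phase);
  xbus_perf.save_db();   // flush any buffered performance data to the SQL database
  xbus_perf.report();    // print the built-in text report
endfunction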

Step 4: Run the simulation and analyze the reports for possible inefficiencies.

Use -ntb_opts uvm-1.0+rvm +define+UVM_ON_TOP with VCS

Include vmm_perf.sv along with the new files in the list of files to be compiled. At the end of the simulation, the Performance Analyzer produces a text report summarizing the collected statistics.


You can generate the SQL databases as well, and typically you would be doing this across multiple simulations. Once you have done that, you can create custom queries to get the desired information out of the SQL database across your regression runs. You can also analyze the results and generate the required graphs in Excel; please see the following post: Analyzing results of the Performance Analyzer with Excel.

So there you go: the VMM Performance Analyzer can fit into any verification environment you have. Make sure you leverage this package to make the RTL-level performance measurements that are needed to validate micro-architectural and architectural assumptions, as well as to tune the RTL for optimal performance.

Posted in Coverage, Metrics, Interoperability, Optimization/Performance, Performance Analyzer, VMM infrastructure, Verification Planning & Management | 6 Comments »

Performance appraisal time – Getting the analyzer to give more feedback

Posted by Amit Sharma on 28th January 2011

S. Prashanth, Verification & Design Engineer, LSI Logic


We wanted to use the VMM Performance Analyzer to analyze the performance of the bus matrix we are verifying. To begin with, we wanted the following information for accesses to a shared resource (slave memory):

· Throughput/Effective Bandwidth for each master in terms of Mbytes/sec

· Worst case latency for each master

· Initiator and Target information associated with every transaction

By default, the performance analyzer records the initiator id, target id, start time and end time of each tenure (associated with a corresponding transaction) in the SQL database. In addition to this useful information, we needed the number of bytes transferred for each transaction to be dumped into the SQL database. This was required for calculating throughput, which in our case was the number of bytes transferred between the start time of the first tenure and the end time of the last tenure of a master. Given that we had a complex interconnect with 17 initiators, it was difficult to correlate an initiator id with its name, so we wanted to add initiator names to the SQL database as well. Let’s see how this information can be added from the environment.

An earlier blog post on the Performance Analyzer, “Performance and statistical analysis from HDL simulations using the VMM Performance Analyzer”, provides useful information on how to use the VMM Performance Analyzer in a verification environment. Starting from that, let me outline the additional steps we took to get the statistical analysis we desired.

Step 1: Define the fields and their data types to be added to the database in a string (user_fields), e.g. “MasterName VARCHAR(255)” for the initiator name and “NumBytes SMALLINT” for the number of bytes. Provide this string to the performance analyzer instance during initialization.

class tb_env extends vmm_env;
   vmm_sql_db_sqlite db;        // SQLite database
   vmm_perf_analyzer bus_perf;
   string user_fields;

   virtual function void build();
      super.build();
      db = new("perf_data");    // Initializing the database
      user_fields = "MasterName VARCHAR(255), NumBytes SMALLINT";
      bus_perf = new("BusPerfAnalyzer", db, , , , user_fields);
   endfunction
endclass

Step 2: When each transaction ends, get the initiator name and the number of bytes transferred into a string variable (user_values), and provide that variable to the performance analyzer through the end_tenure() method.

fork begin
   vmm_perf_tenure perf_tenure = new(initiator_id, target_id, txn);
   string user_values;

   bus_perf.start_tenure(perf_tenure);
   txn.notify.wait_for(vmm_data::ENDED);
   user_values = $psprintf("%s, %0d", initiator.get_object_name(), txn.get_num_bytes());
   bus_perf.end_tenure(perf_tenure, user_values);
end
join_none




With this, the performance analyzer dumps the additional user information into the SQL database. The blog post “Analyzing results of the Performance Analyzer with Excel” explains how to extract information from the generated SQL database. Using the spreadsheet, we could create our own plots and ensure that management has all the analysis it needs to provide the perfect appraisal.

Posted in Optimization/Performance, Performance Analyzer, Verification Planning & Management | 1 Comment »

Why do the SystemC guys use TLM-2.0?

Posted by John Aynsley on 29th April 2010

John Aynsley, CTO, Doulos

Since this is the Verification Martial Arts blog, I have focused so far on features of VMM 1.2 itself. But some of you may be wondering why all the fuss about TLM-2.0 anyway? Why is TLM-2.0 used in the SystemC domain?

I guess I should first give a quick summary of how and why SystemC is used. That’s easy. SystemC is a C++ class library with an open-source implementation, and it is used as “glue” to stick together component models when building system-level simulations or software virtual platforms (explained below). SystemC has Verilog-like features such as modules, ports, processes, events, time, and concurrency, so it is conceivable that SystemC could be used in place of an HDL. Indeed, hardware synthesis from SystemC is a fast-growing area. However, the primary use case for SystemC today is to create wrappers for existing behavioral models, which could be plain C/C++, in order to bring them into a concurrent simulation environment.

A software virtual platform is a software model of a hardware platform used for application software development. Today, such platforms typically include multiple processor cores, on-chip busses, memories, and a range of digital and analog hardware peripherals. The virtual platform would typically include an instruction set simulator for each processor core, and transaction-level models for the various busses, memories and peripherals, many of which will be intellectual property (IP) reused from previous projects or bought in from an external supplier.

The SystemC TLM-2.0 standard is targeted at the integration of transaction-level component models around an on-chip communication mechanism, specifically a memory-mapped bus. When you gather component models from multiple sources you need them to play together, but at the transaction level, using SystemC alone is insufficient to ensure interoperability. There are just too many degrees of freedom when writing a SystemC communication wrapper to ensure that two models will talk to each other off-the-shelf. TLM-2.0 provides a standardized modeling interface between transaction-level models of components that communicate over a memory-mapped bus, such that any two TLM-2.0-compliant models can be made to talk to each other.

In order to fulfil its purpose, the primary focus of the SystemC TLM-2.0 standard is on speed and interoperability. Speed means being able to execute application software at as close to full speed as possible and TLM-2.0 sets very aggressive simulation performance goals in this respect. Interoperability means being able to integrate models from different sources with a minimum of engineering effort, and in the case of integrating models that use different bus protocols, to do so without sacrificing any simulation speed.

So finally back to VMM. It turns out that the specific features of TLM-2.0 used to achieve speed and interoperability do not exactly translate into the SystemVerilog verification environment, where the speed goals are less aggressive and there is not such a singular focus on memory-mapped bus modeling. But, as I described in a previous post on this blog, there are still significant benefits to be gained from using a standard transaction-level interface within VMM, both for its intrinsic benefits and in particular when it comes to interacting with virtual platforms that exploit the TLM-2.0 standard.

Posted in Interoperability, Optimization/Performance, SystemC/C/C++, Transaction Level Modeling (TLM) | 2 Comments »

Transactor generator with VMM technology for efficient usage of CPU resources.

Posted by Oded Edelstein on 5th April 2010


Oded Edelstein – Founder and CEO of SolidVer

Background:

Many network designs require an efficient transactor generator to cover DUT functionality.

In a random test we would like to cover all scenarios, but also to spend the CPU mostly on cases which push the design to its edge.

In this VMM example, I will demonstrate three cases, and solutions for better usage of CPU resources.

Cases:

Case A – In network designs, packet size can vary between 40 bytes for small packets and 10 Kbytes for large MTU packets. A test which is based on the number of packets (transactions) might be very short or very long, depending on the total size of the packets. The long test scenarios can be covered in a separate random test.

Case B – Some network designs forward packets to different channels (queues) with different levels of bandwidth support. Random generation of the channel number does not cover many cases (e.g. filling a certain queue with packets), since the probability that the same channel will be chosen repeatedly in a system with many channels is very low.

Case C – In some projects, the transactor generates many packets for all queues while some queues are randomly configured to a low bandwidth. This makes the test very long, because it runs until all packets have been forwarded. At the beginning of the test the DUT is very busy – it gets data almost every cycle. But after the high-bandwidth queues have received all their packets, the low-bandwidth queues slowly continue to receive packets until everything has been forwarded. By then, most of the DUT queues are empty and the DUT is using only a small portion of its performance capability, which adds no value for coverage.

Solutions:

The following code example shows a simple solution for the above cases. The solution is based on the following techniques:

1. Test length is defined based on the sum of packet sizes, instead of the number of packets (transactions).

2. Add to the random test a basic case where a number of packets are sent to the same channel one after the other. Low-probability cases need to be identified and added as cases inside the random test (cases inside directed tests are not good enough for good coverage).

3. No packets are generated in advance for all queues. Packets are randomly generated and driven, on the fly, only to available queues. From my experience, random tests can generate all parameters and good coverage can be achieved. At the same time, much better coverage can be achieved if idle periods, which consume CPU during the test, are identified and handled correctly.

Code Example:

//——————————————————————————————————
//
//  packet.sv
//
//——————————————————————————————————
class packet extends vmm_data;

rand byte payload[];
rand int packet_size;
rand int channel_num;

static int cnt;

constraint c_payload_size { payload.size == packet_size; }

constraint c_pkt_size_dist { packet_size dist { 40           := 20,
                                                [41:200]     := 50,
                                                [201:2000]   := 5,
                                                [2001:9999]  := 1,
                                                10000        := 1 }; }

`vmm_data_member_begin(packet)
`vmm_data_member_scalar_array(payload, DO_ALL)
`vmm_data_member_scalar(packet_size, DO_ALL)
`vmm_data_member_scalar(channel_num, DO_ALL)
`vmm_data_member_end(packet)

endclass : packet

//—————————————————————————–
// VMM Macros – Channel and Atomic Generator
//—————————————————————————–
`vmm_channel(packet)
`vmm_atomic_gen(packet, "Packet atomic generator")

//——————————————————————————————————
//  End file packet.sv
//——————————————————————————————————

//——————————————————————————————————
//
//  bfm_master.sv
//
//————————————————————————————————————

//
// SUM_PACKETS_SIZE_IN_TEST : The total packet payload, in bytes, that the BFM will drive into the DUT.
// We used the sum of packet sizes to define test length, instead of the number of packets, since the
// packet size distribution can randomly vary between 40 bytes (small packets) and 10 Kbytes (large MTU packets).
// This could cause some seeds to run very long, with no significant added value for coverage.
// Those long scenarios were tested in a separate random test.
//
`define SUM_PACKETS_SIZE_IN_TEST 10000000 // 10 MB

//
// In this example the DUT gets a packet with a channel number.
// The DUT holds a separate FIFO for every channel.
//
`define MAX_NUMBER_OF_CHANNELS   16
class bfm_master extends vmm_xactor;

vmm_log log;

// Packet Transaction channels
//
packet_channel    packet_chan;

//
// The DUT sends the BFM a back-pressure signal, separately for every channel,
// when that channel's FIFO inside the DUT is full.
// avail_channel_list – Holds a bit for every channel. The BFM can drive packets only on channels
//                      which are not back-pressured.
//
bit [(`MAX_NUMBER_OF_CHANNELS-1):0] avail_channel_list;
int done = 0;

extern function new (string instance,
integer stream_id,
packet_channel packet_chan);
extern function int generate_stream_size();
extern function int generate_avail_channel();

extern virtual task main();
extern virtual task drive_packet(packet packet_trans);

endclass: bfm_master

function bfm_master::new(string instance,
integer stream_id,
packet_channel packet_chan);

super.new("BFM MASTER", instance, stream_id);
log = new("BFM MASTER", "BFM MASTER");
if (packet_chan == null) packet_chan = new("BFM MASTER INPUT CHANNEL", instance);
this.packet_chan = packet_chan;

endfunction: new

//—————————————————————————–
// drive_packet() – Drives one packet into the DUT.
//—————————————————————————–

task bfm_master::drive_packet(packet packet_trans);
// drive the packet …
endtask: drive_packet

function int bfm_master::generate_avail_channel();
….
endfunction

function int bfm_master::generate_stream_size();
….
endfunction
task bfm_master::main();

//
// The total number of bytes that the BFM has driven into the DUT. When this value goes
// above SUM_PACKETS_SIZE_IN_TEST, the BFM stops driving packets.
//
int sum_packet_data_sent = 0;

//
// The channel on which the BFM will drive packets. This channel must not be back-pressured
// while packets are being driven.
//
int channel_num;

//
// packet_stream_size
// The number of packets that will be sent one after the other to the same channel.
// The idea behind this variable is to get good coverage of cases where a number of
// packets are sent to the same channel one after the other, to quickly fill its FIFO.
// Otherwise the BFM would statistically generate a different channel every time.
// With 16 channels, the probability that the same channel is generated 5, 10 or 20
// times in a row is 1 in 16^5, 1 in 16^10 or 1 in 16^20, which is vanishingly small.
//

int packet_stream_size;

// Counter for the number of packets that were driven in the test.
//
int packet_cnt = 0;
int i;
packet packet_trans;
super.main();
while(sum_packet_data_sent < `SUM_PACKETS_SIZE_IN_TEST) begin
// gen random channel from avail_channel_list;
channel_num = generate_avail_channel();
packet_stream_size =  generate_stream_size();

for(i = 0; i < packet_stream_size; i++ ) begin
this.wait_if_stopped_or_empty(this.packet_chan);
if(avail_channel_list[channel_num] == 1) begin
packet_chan.get(packet_trans);
packet_trans.channel_num = channel_num;
drive_packet(packet_trans);
sum_packet_data_sent = sum_packet_data_sent + packet_trans.packet_size;
packet_cnt++;
`vmm_note(log, $psprintf("drive packet = %0d  size = %0d  channel = %0d  stream index = %0d  sum = %0d",
packet_cnt, packet_trans.packet_size, channel_num, i, sum_packet_data_sent));
end
else begin
break;
end
end // for loop
end// end while loop
done = 1;

endtask: main

//——————————————————————————————————
//
//  End file bfm_master.sv
//
//——————————————————————————————————

Posted in Automation, Modeling, Optimization/Performance | 1 Comment »

Great article on managing complex constraints

Posted by Janick Bergeron on 12th March 2010

A two-part article by Cisco and Synopsys engineers in IC Design and Verification Journal explains how complex constraints can be better managed to simplify the solving process while still obtaining high-quality results. Part 1 deals with solution spaces and constraint partitions. Part 2 introduces the concept of soft constraints in e and default constraints in OpenVera.

You can read part 1 and part 2 here.

Posted in Debug, Modeling Transactions, Optimization/Performance, Stimulus Generation | Comments Off

Hitting the “Playback” button on VMM transactions

Posted by Avinash Agrawal on 11th December 2009


Avinash Agrawal, Corporate Applications, Synopsys




Verification engineers often face the challenge of recording transactions in one simulation and replaying the same set of transactions, in the same order, in a different simulation, and they try different ways to make this happen.

Well, there is some good news!

VMM provides a facility to record the transactions going through a VMM channel and save them to a file. This is done through the record() method of the VMM channel. Later, for replay, disconnect the producer of the channel (the generator/transactor, etc. that sends transactions to the channel) and use the playback() method to load the channel with the transactions from the file, in the same order.

Here the saved transaction file acts as a virtual producer, so random stability is guaranteed. Note that the byte_pack() or save() method of the transaction (vmm_data) class must be implemented to use the record mechanism, and the byte_unpack() or load() method must be implemented to use the playback mechanism. Since playback avoids randomization of the transactions and the corresponding scenarios, performance can improve considerably when the transaction/scenario constraints are complex. Generation is also not scheduling-dependent, and will work across different versions of a simulator as well as with different simulators.
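
One hedged way to satisfy that requirement is to let the VMM data shorthand macros generate the pack/unpack methods for the transaction class (the class and fields below are illustrative, not the actual 'trans' class used in the snippet that follows):

class trans extends vmm_data;
  rand bit [31:0] addr;
  rand bit [31:0] data;

  `vmm_data_member_begin(trans)
    `vmm_data_member_scalar(addr, DO_ALL)   // DO_ALL includes the packing/unpacking operations
    `vmm_data_member_scalar(data, DO_ALL)
  `vmm_data_member_end(trans)
endclass

`vmm_channel(trans)   // generates the trans_channel class used as 'chan' below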

This record/replay mechanism can also be used to go through known states at one interface while stressing another interface with random scenarios within the same simulation itself.

The code snippet below shows how to use VMM record/playback.

task start();

`ifdef RECORD_MODE
    fork
      chan.record("Chan.dat"); //call record method of the channel
                               //with a filename
    join_none
    gen.start_xactor();       //start the generator (producer) of the  channel
                              // if it is record mode.
 `endif
 `ifdef PLAYBACK_MODE    // In playback mode, make sure that the
                         // generator (producer) is not started.
     fork begin
         bit success;
         trans tr = new;
         chan.playback(success, "Chan.dat", tr);  // call playback method of
                                               // the channel with the same file
         if (!success) begin
            `vmm_error(log, "Playback mode failed for channel");
         end
     end join_none
`endif

endtask

Posted in Optimization/Performance, Reuse | Comments Off

Performance and statistical analysis from HDL simulations using the VMM Performance Analyzer

Posted by Badri Gopalan on 30th April 2009

Badri Gopalan, Principal Engineer, Synopsys

There are several situations where RTL-level performance measurements are needed to validate micro-architectural and architectural assumptions, as well as to tune the RTL for optimal performance.

A few examples of these low-level measurement requirements are:

· Throughput, latency and effects of configuration parameters of a memory or DMA controller under different traffic scenarios

· The statistical distribution from a prediction scheme for various workloads

· Latency of a complex multi-level arbiter under different conditions

· End-to-end latency, throughput, QoS of a network switch for various types of data, control traffic

The VMM Performance Analyzer application provides a flexible and powerful framework to collect, analyze and visualize such performance statistics from your design and verification environments. It consists of a few base classes which allow you to define the performance metrics to be tracked, and to collect run-time information for these metrics into tables in an SQL database. The data can be collected over multiple simulation runs. You then interact with the database using your favorite database analysis tool or spreadsheet. The SQL language itself offers simple yet powerful data query capabilities, which can be used either interactively or scripted for batch mode. Alternatively, you can load the data into a spreadsheet and perform your analysis and visualization there.

At a conceptual level, you first identify the different atomic performance samples to be collected for analysis. These are referred to as “tenures”. For example, a memory transaction on a bus (from a specific master to a slave) is a tenure. The VMM-PA does the work of assigning an ID, collecting and tracking attributes such as the start time, end time, initiator and target IDs, and other associated information (suspended states, aborts, completions, etc.) as rows in a table. Each table corresponds to an instance of a vmm_perf_analyzer object. You can (and probably will) have multiple tables (and thus multiple instances of the vmm_perf_analyzer object) in your simulation, dumping performance data into the database.

Here is a code snippet which illustrates the process of creating a vmm_perf_tenure tenure (a row in a table), a vmm_perf_analyzer table, and a vmm_sql_db_ascii (or vmm_sql_db_sqlite) database (a collection of tables), with some explanations following the code:

1.  class my_env extends vmm_env;
2.     vmm_sql_db_sqlite db;                 // the database itself
3.     vmm_perf_analyzer mem_perf_an;        // one table in the database
4.     virtual function void build;
5.        super.build();
6.        this.db = new("perf_data.db");     // "perf_data.db" created on disk
7.        this.mem_perf_an = new("Mem_performance", this.db);
8.     endfunction: build
9.  endclass: my_env
10.
11. // Now, start a thread which will dump performance data to the
12. // Mem_performance table in the database. Any event can be used to
13. // start or terminate tenures: it is left to the user's control.
14. initial begin
15.    vmm_perf_tenure mem_perf_tenure = new();
16.    forever begin: mem_perf_thread
17.       this.mem_mon.notify.wait_for(mem_monitor::STARTED);
18.       this.mem_perf_an.start_tenure(mem_perf_tenure);
19.       this.mem_mon.notify.wait_for(mem_monitor::ENDED);
20.       this.mem_perf_an.end_tenure(mem_perf_tenure);
21.    end: mem_perf_thread
22. end
23. virtual task my_env::report;             // report is part of the environment class; only the PA-relevant code is shown
24.    this.mem_perf_an.save_db();           // write any buffered data to disk
25.    this.mem_perf_an.report();            // simple pre-defined performance report
26. endtask: report

· In lines 2 and 6, the SQLite database is created using the vmm_sql_db_sqlite base class. You could create a different flavor of database, for instance a plain-text database, in which case a list of SQL commands is created which can then be replayed on your SQL engine of choice (see the Reference Guide for more details). Typically you have one SQL database per test; however, you can certainly open multiple databases in the same test.

· In lines 3 and 7, a table in the database is created using the vmm_perf_analyzer base class, which will help track statistics related to a resource, in this case a memory interface. Typically you will have multiple tables in a test, corresponding to the tracking of statistics for multiple resources in the DUT or environment; this corresponds to multiple instances of the vmm_perf_analyzer base class.

· In lines 15, 18 and 20, one transaction item (“tenure”) is created and its activity recorded in the table. The transaction item is created by the vmm_perf_tenure base class. The tenure management methods such as start_tenure(), end_tenure(), suspend_tenure(), resume_tenure() and abort_tenure() allow you to express the state of the monitored tenure and reflect it in the performance tables. You can of course control when to execute these methods from your test, using timing control statements, events, callbacks, or whatever mechanism you prefer. Callbacks registered with the various vmm_xactor classes in your environment are the most scalable way to hook these into your environment, but the choice is yours.

· In lines 24 and 25, data is flushed into the database at the end of simulation (in the vmm_env::report phase, to be more precise), and a basic / sample report is generated. It is important to note that you will in all likelihood be generating custom reports from the SQL database itself. That is explored further below.

Now that you have a code snippet showing you the process of monitoring statistics on shared resources in the design, you want to be able to write your custom queries, reports, and charts off the database. One could do this in a few ways:

1. Connect a spreadsheet to the database, and use the spreadsheet capabilities to generate statistics, charts etc. There was an earlier blog post on how to accomplish this: refer to “Analyzing results of the Performance Analyzer with Excel” (http://www.vmmcentral.org/vmartialarts/?p=23)

2. Use a SQL engine such as SQLite, MySQL, PostgreSQL, or any other to read in the SQL commands and generate custom query scripts which can then be used in batch mode. SQLite (http://www.sqlite.org), for example, has various language bindings, such as Perl, Tcl and C/C++, so you can write scripts or queries in your favorite language. There are several publicly available and commercial front ends you can use to read in the SQL data and perform your analyses (I’ve used SqliteSPY http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index in the past). There are several quick-start tutorials for SQL syntax available on the internet which should get you up and running with SQL in short order. To generate plots, one could use applications such as gnuplot, R or Octave, though it is probably more convenient to use spreadsheets to create graphs of various kinds.

In the next blog item related to the VMM Performance Analyzer, I will discuss some other aspects of the Performance Analyzer application (all of which is available by reading the User Guide: http://vmmcentral.com/resources.html#docs). I will also provide some examples of SQL code which demonstrate the analyses you can perform.

Posted in Debug, Optimization/Performance, Performance Analyzer | 10 Comments »

Using vmm_test Base Class

Posted by Fabian Delguste on 24th April 2009

Fabian Delguste, Snr CAE Manager, Synopsys

VMM 1.1 comes with a new base class called vmm_test that can be used for all tests. The main motivation for developing this base class was to enable a single compile-elaborate-simulate step for all tests rather than one per test. It is recommended to implement tests in a program thread, as this provides a good way of encapsulating a testbench and reduces races between design and testbench code. Previously, all test examples showed the tests implemented directly in program blocks. The drawback of this technique is that the user needs to recompile, elaborate and simulate each test individually. When dealing with large regressions consisting of thousands of tests, multiple elaborations can waste a significant amount of time, whereas tests using vmm_test require only one elaboration. A given test can be selected at run time using a command-line option, and switches like +ntb_random_seed can be used in conjunction with these tests.

To understand better how this base class works, let’s look at the example that ships with the VMM 1.1 release in the sv/examples/std_lib/vmm_test directory. This example shows how to constrain some ALU transactions. These transactions, modelled by the alu_data class, are randomly generated by an instance of a vmm_atomic_gen and passed to an ALU driver using a vmm_channel.

Before digging into the gory details of vmm_test, let’s see how tests are traditionally written:

1. class add_test_data extends alu_data;
2.    constraint cst_test {
3.       kind == ADD;
4.    }
5. endclass
6.
7. program alu_test();
8.    alu_env env;
9.    initial begin
10.      add_test_data tdata = new;
11.      env.build();
12.      env.gen.randomized_obj = tdata;
13.      env.run();
14.   end
15. endprogram

· In lines 1-4, the alu_data class is extended into the new class add_test_data, which contains test-specific constraints. In this test, only ADD operations, as modelled by this class, are carried out.

· In line 7, a program block is used to instantiate the environment alu_env based on vmm_env.

· In lines 10-13, the environment is built and the new transaction add_test_data is used as the factory instance for our vmm_atomic_gen transactor.

Of course, this test is very specific, and a user clearly needs to create a similar program block with other constraints to fulfill the corresponding test plan. For instance, this test could be duplicated many times to send {MUL, SUB, DIV} ALU operations to the ALU driver. In that case, multiple program blocks are required, and so are multiple elaborations and binaries to simulate these tests.

VMM 1.1 provides a way to include all test files in a single compilation. The previous test can now be written like this:

1. class add_test_data extends alu_data;
2.    constraint cst_test {
3.       kind == ADD;
4.    }
5. endclass
6.
7. `vmm_test_begin(test_add, alu_env, "Addition")
8.    env.build();
9.    begin
10.      add_test_data tdata = new;
11.      env.gen.randomized_obj = tdata;
12.   end
13.   env.run();
14. `vmm_test_end(test_add)

· In line 7, the vmm_test shorthand macro `vmm_test_begin is used to declare the test name (test_add in our example), the name of the vmm_env where all transactors reside (alu_env in our example) and a label that is used to tag this particular test.

· In lines 8-13, users can build the environment, insert the factory instance and kick off the test.

· In line 14, the vmm_test shorthand macro `vmm_test_end is used to terminate this test declaration.

Of course, other tests with variations in constraints can be written in the same way. Since the environment is exposed after the `vmm_test_begin shorthand macro, it is possible to register callbacks, replace generators or do any other operation which is traditionally done in the VMM program block.

An important aspect of these tests is that whenever they are included, they become statically declared and visible to the environment.

Now let’s see how to include these tests in the VMM program block:

1. `include "test_add.sv"
2. `include "test_sub.sv"
3. `include "test_mul.sv"
4. `include "test_ls.sv"
5. `include "test_rs.sv"
6.
7. program alu_test();
8.    alu_env env;
9.    initial begin
10.      vmm_test_registry registry = new;
11.      env = new(alu_drv_port, alu_mon_port);
12.      registry.run(env);
13.   end
14. endprogram

· In lines 1-5, all the tests are simply included.

· In line 10, registry, an instance of vmm_test_registry, is constructed. This object contains all tests implemented using `vmm_test_begin that have been previously included.

· In line 12, the registry is run, and a handle to the environment is passed as an argument. This is how all vmm_test instances get access to the environment.

Running a specific test is achieved by providing the test name on the command line, for example:

simv +vmm_test=test_add

simv +vmm_test=test_sub

Note that calling simv without the +vmm_test switch returns a FATAL error and lists all registered tests. This is a good way to document the available tests.

Posted in Creating tests, Optimization/Performance | 11 Comments »

Analyzing results of the Performance Analyzer with Excel

Posted by Janick Bergeron on 13th April 2009


Janick Bergeron, Synopsys Fellow

The VMM Performance Analyzer stores its performance-related data in a SQL database. SQL was chosen because it is an ANSI/ISO standard with a multiplicity of implementations, from powerful enterprise systems like Oracle, to open-source versions like MySQL, to simple file-based engines like SQLite. SQL offers the power of a rich query and analysis language to generate the reports that are relevant to your application.

But not everyone knows SQL. You need an SQL-aware application to do fancier stuff. And guess what! Excel can import SQL data! And everyone knows Excel!

In this post, I will show how you can get VMM Performance Analyzer data from a SQLite database into an Excel spreadsheet. A similar mechanism can be used if you are using MySQL or any other SQL implementation that offers an ODBC (Open Database Connectivity) driver.

First, download the SQLite ODBC driver from http://www.ch-werner.de/sqliteodbc/sqliteodbc.exe and install it on your PC.

Next, create a new ODBC data source by opening the ODBC Data Source Administrator: navigate to Start -> Settings -> Control Panel -> Administrative Tools -> Data Sources (ODBC).


Click the “Add…” button to add a new data source. Scroll down to select the “SQLite3 ODBC Driver” entry.


Click the “Finish” button. This will then open the ODBC data source configuration window. Specify a name for the data source and browse to the SQLite database file. A SQLite ODBC data source connects to a single database file, so it is a good idea to name this new data source according to the performance data it contains. Leave all other options at their default values.


Click “OK” to complete the creation of the new data source on your computer.


Repeat these steps for each SQLite database that will need to be analyzed using Microsoft Excel on your computer. Click “OK” when all data connections have been defined.

Start Excel with a blank workbook, then select Data -> From Other Sources -> From Microsoft Query.

This will bring up a dialog box to select a data source. Select the data source corresponding to the desired SQLite database.


Click “OK”. An error pops up at this point. Don’t worry! The next step is to correct that error.


Click “OK” to close the error pop-up. This will bring up the Microsoft Query Wizard.


Click on “Options”. Modify any of the check boxes but make sure the “Tables” option is (or remains) checked. You must make a modification to at least one check box.


Click OK.  The Microsoft Query Wizard should now be populated with the list of tables found in the SQLite database file. From this point on, which table and which columns you select depend on the analysis you wish to perform.

For example, to perform an arbiter fairness analysis, you would select the “InitiatorID” and “active” columns of the “Arb” table.


Click “Next >” three times then return to Microsoft Excel by clicking “Finish”. The Import Data dialog box will appear.

Click “OK” to insert the data as a table and voilà! You now have a table populated with the data dynamically extracted from the SQLite database.


From there, a PivotChart can be used to display the average and maximum arbiter response time for each initiator. As you can see, the arbiter appears to be fair, given the small number of samples collected.


I hope you’ll find this step-by-step guide useful.

What cool thing have you done with the Performance Analyzer?

Posted in Debug, Optimization/Performance, Performance Analyzer | 5 Comments »

Garbage collection

Posted by Janick Bergeron on 18th July 2008

In SystemVerilog, unlike C, you don’t have to explicitly free dynamically allocated class instances. Like most modern programming languages, SystemVerilog includes a garbage collector that frees memory that is no longer needed.

Although there are a few garbage collection processes out there, the lack of a root object (like the sys object in e) almost guarantees that Reference Counting is used1.

To illustrate how reference counting works, consider the following code segment:

fork
   while (...) begin
      eth_frame pkt = new;         // Line 1
      scoreboard.push_back(pkt);   // Line 2
      ...
   end                             // Line 3
   forever begin
      eth_frame pkt;
      pkt = rx();
      if (!scoreboard[0].compare(pkt)) ...
      scoreboard.pop_front();      // Line 4
   end
join

When the eth_frame instance gets first created on line 1, its reference count is set to "1". When it is later added to the scoreboard on line 2, its reference count is incremented to "2" (one for the "pkt" variable, one for the "scoreboard" queue). When the while loop iterates on line 3, the dynamic "pkt" variable disappears, reducing the reference count to "1". When the class instance is finally popped out of the scoreboard on line 4, its reference count is decremented to "0" and its memory is garbage collected.

This usually works well until you create circular references, i.e. objects that refer to each other.

Circular references are always created whenever you introduce the concept of a “parent” object. Take a RAL model, for example: an instance of the vmm_ral_reg class has a reference to its parent vmm_ral_block class instance, which gets returned by the vmm_ral_reg::get_block() method. But the vmm_ral_block class instance has a reference to all of the vmm_ral_reg instances it contains, which get returned by the vmm_ral_block::get_registers() method.

That creates a circular reference for every register in a block.

That means that a RAL model cannot be garbage collected. Ever.

In the case of a RAL model, that’s not a problem because it corresponds to the DUT and that DUT, being modelled using modules and interfaces, is static in nature and can never be dynamically modified.

But in the case of objects that get created in large numbers and should live only as long as needed (such as packets or transactions, which only need to live while they are processed by the DUT), you have to be very careful to break any such circular reference to enable garbage collection. Otherwise you will end up consuming an ever-increasing amount of memory. A sketch of the idea is shown below.
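
A minimal sketch of the problem, and of breaking the cycle explicitly (class and method names are hypothetical):

typedef class packet_stream;

class packet;
  packet_stream parent;   // back-pointer: the packet keeps its parent alive
endclass

class packet_stream;
  packet pkts[$];         // ...and the parent keeps its packets alive: a reference cycle

  // Breaking the cycle when a packet is retired lets reference counting
  // reclaim both objects once no other handles remain.
  function void retire(packet p);
    foreach (pkts[i]) begin
      if (pkts[i] == p) begin
        pkts.delete(i);
        break;
      end
    end
    p.parent = null;
  endfunction
endclass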

1Turns out not to be accurate. See follow-up comment.

Posted in Optimization/Performance, Register Abstraction Model with RAL | 5 Comments »

Message, message on the wall!

Posted by Janick Bergeron on 19th June 2008

Why does the VMM message interface (the vmm_log class) have a start_msg() and a text() method that must be used in this convoluted way:

if (log.start_msg(vmm_log::DEBUG_TYP, vmm_log::TRACE_SEV)) begin
   log.text("This is a trace message");
   log.end_msg();
end

Why not the much simpler one-liner:

log.message(vmm_log::DEBUG_TYP, vmm_log::TRACE_SEV,
            "This is a trace message");

which would then eliminate the need for the macro:

`vmm_trace(log, "This is a trace message");

Given the examples above, there is absolutely no reason. However, this example illustrates why:

`vmm_trace(log, $psprintf("Read 'h%h from 'h%h with status %s",
                           data, addr, status.name()));

which expands to:

if (log.start_msg(vmm_log::DEBUG_TYP, vmm_log::TRACE_SEV)) begin
   log.text($psprintf("Read 'h%h from 'h%h with status %s",
                      data, addr, status.name()));
   log.end_msg();
end

The $psprintf() call (and all other formatting system functions such as $sformat(), $sformatf(), $write(), $display(), etc.) may be simple to use, but it is very expensive at run time. And if you are not going to display a message, why incur the cost of composing its image?

When using a single procedure call, the value of all of its arguments must be determined before it is called. Thus, using this approach:

log.trace($psprintf("Read 'h%h from 'h%h with status %s",
                    data, addr, status.name()));

incurs the cost of creating the message image every single time. And most of the time, this debug message will simply be filtered out (think about the thousands and thousands of regression runs where debug is not enabled!).

On the other hand, checking first if messages of a certain type or severity are going to be filtered out or not and only then composing the image of the message improves your run-time performance.

By how much? Of course, it depends on the number of messages that will eventually get filtered out. But just to give you an idea, I ran this experiment using VCS:

program p;

initial
begin
   int i;
   string msg;
   i = $urandom() % 2; // See footnote1
   if (i == 1) i--;
   repeat (100000) begin
`ifdef GUARD
      if (i)
`endif
      msg = $psprintf("%s, line %0d: Message #%0d at %t",
                      `__FILE__, `__LINE__, 0, $time());
   end
end

endprogram

With `GUARD defined, which causes the $psprintf() call to be skipped, I get run-times of approximately 0.025 seconds. With `GUARD undefined, which causes the $psprintf() call to be executed, I get run-times of approximately 0.230 seconds, or roughly 10x slower simulation performance.

Personally, I think the performance gain is worth the little extra bit of code to write. Remember to always optimize the right thing: you’ll write that code once but you’ll run it thousands and thousands of times. So saving a few lines of code is not always the right decision.

1 I use a convoluted way to set i to 0 to prevent an optimizing compiler from optimizing the entire if statement away.

Posted in Debug, Messaging, Optimization/Performance | 2 Comments »