Posted by Shankar Hemmady on 31st August 2009
Andrew Piziali, Independent Consultant
You are using coverage, along with other metrics, to measure verification progress as part of your verification methodology [1][2]. Yet, lurking in the flow are the seeds of a bug escape that will blindside you. How so?
Imagine you are responsible for verifying an in-order, three-way superscalar x86 processor in the last millennium, before the advent of constrained random generation. Since your management wouldn’t spring for an instruction set architecture (ISA) test generator, you hired a team of new college grads to write thousands of assembly language tests. Within the allocated development time, the tests were written, they were functionally graded to 100% coverage, and they all finally passed. Yeah! But, not so fast …
When first silicon was returned and Windows was booted on the processor, it crashed. The diagnosis revealed that a variant of one of the branch instructions was misbehaving. (This sounds better than “It had a bug escape.”) How could this be? We reviewed our branch instruction coverage models and confirmed they were complete. Since all of the branch tests passed, how could this bug slip through?
Further analysis revealed this branch instruction was absent from the set of branch tests yet used in one of the floating point tests. Since the floating point test was aimed at verifying the floating point operation of the processor, we were not surprised to find it was insensitive to a failure of this branch instruction. In other words, as long as the floating point operations verified by the test behaved properly, the test passed, independent of the behavior of the branch instruction. From a coverage perspective, the complete ISA test suite was functionally graded as a whole rather than each sub-suite being graded against its own functional requirements. Hence, we recorded full coverage.
The problem was now clear: the checking and coverage aspects of each test were not coupled; coverage recording was not conditioned on the checked behavior passing. If we had either (1) functionally graded each test suite only for the functionality it was verifying or (2) conditionally recorded each coverage point based upon a corresponding check passing, this bug would not have slipped through. Using either approach, we would have discovered this particular branch variant was absent from the branch test suite. In the first case, that coverage point would have remained empty for the branch test suite. Likewise, in the second case we would not have recorded the coverage point because no branch instruction check would have been activated and passed.
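To make approach (1) concrete, here is a minimal Python sketch of per-suite functional grading. The instruction mnemonics and suite contents are hypothetical, chosen purely to mirror the story: the branch suite never exercises one branch variant, while a floating point test happens to execute it.

```python
# Hypothetical coverage models, one per feature (mnemonics are illustrative).
branch_model = {"jmp", "jz", "jnz", "jcxz"}   # branch coverage points
fp_model = {"fadd", "fmul"}                   # floating point coverage points

def grade(suite_hits, model):
    """Grade a suite only against its own feature's coverage model."""
    covered = suite_hits & model
    return covered, len(covered) / len(model)

# The branch suite never hits "jcxz"; an FP test happens to execute it.
branch_suite_hits = {"jmp", "jz", "jnz"}
fp_suite_hits = {"fadd", "fmul", "jcxz"}

covered, score = grade(branch_suite_hits, branch_model)
print(branch_model - covered)  # the hole that merged grading would have hidden
```

Grading the merged hit set of both suites against `branch_model` would report 100% coverage; grading the branch suite alone exposes the uncovered branch variant.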
Returning to the 21st century, the lesson we can take away from this experience is that coverage—functional, code and assertion—is suspect unless, during analysis, you confirm that for each coverage point a corresponding checker was active and passed. From the perspective of implementing your constrained random verification environment, each checker should emit an event (or some other notification), synchronous with the coverage recording operation, indicating it was active and the functional behavior was correct. The coverage code should condition recording each coverage point on that event. If you are using a tool like VMM Planner to analyze coverage, you may use its “-feature” switch to restrict the annotation of feature-specific parts of your verification plan to the coverage database(s) of that feature’s test suite.
You might ask if functional qualification [3] would address this problem. Functional qualification answers the question “Will my verification environment detect, propagate and report each functional bug?” As such, it provides insight into how well your environment detects bugs but says nothing about the quality of the coverage aspect of the environment. I will address this topic in a future post if there is sufficient interest.
Remember, make your coverage count by coupling checking with coverage!
[2] Andrew Piziali, Functional Verification Coverage Measurement and Analysis, Springer, 2008