j-marjanovic.io



Notes from Chisel Community Conference China 2021

Here are my notes from the Chisel Community Conference China 2021. The conference took place on June 26th 2021, and was organized as a hybrid conference, open to both on-site and remote participants (over Zoom). The organizers have promised that the recorded talks will be made available online in the next couple of weeks.

Since the conference was located in China, it started in the early morning hours for participants from Europe. I decided that 4:00 was a good compromise between my love for Chisel and my need for sleep; I only missed the two invited talks at the beginning. In general the conference was quite interesting, and it was clear that Chisel is particularly well suited for highly configurable designs, e.g. processors and data-processing pipelines.

Written in italics are my comments.


[Invited Talk] Chisel breakdown 2

...

cloneType

  • override def cloneType --> autoclonetype2
  • why have I never needed this?
  • autoclonetype1 - deprecated, was based on reflection
  • generated by the compiler plugin

aspect phase

  • insert additional hardware, layout, verification, ...

Top 10 Common Misconceptions about Chisel

  • Institute of Computing Technology at CAS

  • in Chinese, I understood "Chisel", "DSL", and "okay"

RocketChip

  • RocketChip is complicated, several additional ... (config, the bus framework, register gen)
  • "if you are new to Chisel, DO NOT read the source code of RocketChip"
  • note to self: go read the source code of RocketChip

Verilog is more expressive than Chisel

  • the presenter argues that logic is fundamentally:

    • modules
    • combinatorial logic
    • registers
  • technically this is all supported in Chisel

  • I am not sure if I would agree - I think there is still a place for Verilog for low-level stuff, just like some parts of the code (e.g. in the Linux kernel) are still written in assembly

Chisel compile errors

types of errors:

  • Scala compile errors
  • Scala run-time error
  • Chisel build error
  • FIRRTL transform error
  • Circuit simulation error

  • distinction between fault (uninit variable), error (returning a garbage value) and failure (segmentation fault)

  • how are these different?
  • the presenter argues that Chisel has a stricter type system than Verilog
  • shouldn't Chisel be compared to SystemVerilog for a fair comparison? (also default_nettype none)
  • "hidden fault" -> "observable failure"

Blackbox for (System)Verilog

Testing

  • Chisel Tester, Chisel Tester 2, UVM from Verilog
  • "Agent Faker": TL-C UVM above Chisel Tester 2, open source

equality between Chisel and generated Verilog code

  • aka "the Chisel compiler is not formally verified"
  • very complex task and unnecessary, one can run tests also on the generated Verilog
  • known-good --> successful Chisel projects: RocketChip, BOOM, lowRISC, NutShell, Labeled RISC-V, XiangShan

Quality of Results for Chisel

  • misconception: Java slower than C --> Chisel hardware slower than Verilog hardware
  • OK, I experienced this first hand - my colleague was asking me what the fmax is for a typical Chisel-generated logic
  • will he also mention that Chisel is not an HLS?

Chisel is not HLS

  • I predict the future
  • advanced features in Chisel/Scala (I managed to understand "map", "mapper")
  • PPA: Power-Performance-Area

generated core readability

  • comments in verilog
  • EmbeddedTLB
  • wasn't there a patch to improve readability of the generated code? - check previous CCC

high-performance circuits in Chisel


Introducing Decoder Generation API to Chisel

example

  • 7 segment LED
  • input: b0 - b4, output: a - g, k for cathode
  • non-valid states = (default, don't care)

theory

  • AND plane, OR plane (this looks like PAL/GAL, right?, is this really relevant for modern LUT-based FPGAs and ASICs?)

  • now a full example of the 7-segment decoder implemented in PLA

  • logic optimization (definition from Wikipedia)

    • Quine-McCluskey
    • Espresso
  • sparser PLA
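
To make the minimization idea concrete for myself: the core step of Quine-McCluskey merges two product terms that differ in exactly one bit into a single term with a don't-care in that position. The sketch below is plain Scala and is not the Chisel `QMCMinimizer`; the object name `QmcStep` and the string encoding ('0', '1', '-') are my own assumptions for illustration.

```scala
object QmcStep {
  // Implicants are strings over '0', '1' and '-' (don't care), MSB first.
  // Two implicants can be merged when they differ in exactly one position
  // and that position holds a defined bit ('0' or '1') in both of them.
  def merge(a: String, b: String): Option[String] = {
    require(a.length == b.length, "implicants must have the same width")
    val diffs = a.indices.filter(i => a(i) != b(i))
    diffs match {
      case Seq(d) if a(d) != '-' && b(d) != '-' =>
        // Replace the single differing bit with a don't-care.
        Some(a.zipWithIndex.map { case (c, i) => if (i == d) '-' else c }.mkString)
      case _ => None
    }
  }
}
```

Repeating this merge until nothing can be combined yields the prime implicants; selecting a minimal cover of them is the second half of the algorithm and is left out here.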

Chisel utils

  • experimental.decode.TruthTable, DecodeTableAnnotation, decoder, QMCMinimizer, EspressoMinimizer

  • how does this affect later stages (e.g. optimization during the synthesis)

  • nice Scala feature - "list unpacking" - val a :: b :: [...] = x.toBools()
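
The unpacking is ordinary Scala pattern matching on `::`, so it can be tried without any hardware. A minimal sketch, assuming a plain `List[Boolean]` (LSB first) as a stand-in for the result of `toBools`; the names `ListUnpacking`, `bitsLsbFirst` and `firstTwo` are made up for this example.

```scala
object ListUnpacking {
  // Plain-Scala stand-in for a bit vector: element 0 is the LSB.
  def bitsLsbFirst(value: Int, width: Int): List[Boolean] =
    List.tabulate(width)(i => ((value >> i) & 1) == 1)

  // Bind the first two elements by pattern, ignore the rest.
  // A list with fewer than two elements would throw a MatchError.
  def firstTwo(value: Int, width: Int): (Boolean, Boolean) = {
    val b0 :: b1 :: _ = bitsLsbFirst(value, width)
    (b0, b1)
  }
}
```

Newer Scala 3 compilers may warn that this pattern is refutable; a `match` expression with an explicit fallback case avoids that.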


Practice of High-performance Chip Agile Development with Chisel

  • XiangShan CPU

  • agile development (iterative)

  • SystemVerilog interfaces, Chisel Bundles
  • processor parameters
  • Ultra - apparently the highest-end impl of the processor

  • Chisel solution - Vec in a Bundle as an I/O

  • FIRRTL transform for printf

  • configurability

  • "Chisel = syntactic sugar for Verilog"
  • link: wallace tree multiplier, CCCC 2021

  • recursion is allowed in Chisel, the generated Verilog code does not include recursion

  • https://github.com/OpenXiangShan/XiangShan/pull/812
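
The point about recursion is easier to see with a small example: the recursion runs in Scala while the circuit is being built, and what comes out is a flat expression. The sketch below is not taken from the linked pull request; it uses strings instead of hardware signals so it runs as plain Scala, and `RecursiveTree` and `reduceTree` are hypothetical names.

```scala
object RecursiveTree {
  // Recursively combine a non-empty sequence into a balanced binary tree.
  // With hardware types, `op` would create an adder (or any other node);
  // here it can be any function, so the result can be inspected directly.
  def reduceTree[T](xs: Seq[T])(op: (T, T) => T): T = {
    require(xs.nonEmpty, "cannot reduce an empty sequence")
    if (xs.length == 1) xs.head
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      op(reduceTree(left)(op), reduceTree(right)(op))
    }
  }
}
```

Calling `RecursiveTree.reduceTree(Seq("a", "b", "c", "d"))((x, y) => s"($x + $y)")` returns `((a + b) + (c + d))`: the recursion has been fully unrolled into a fixed structure.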

  • distinction between Chisel, Scala

  • Wire, Reg - straightforward to understand
  • advanced Scala features: object, abstract class, trait

  • advice: start with Chisel, learn Scala later; I would not agree, learn Scala first

  • co-simulation

  • Scala-based modules in Chisel Test2 cannot be used in other tools

  • behavioral models to replace actual Chisel modules

  • assertion generation in Chisel

  • generated names can change between Chisel versions, and can cause problems for physical design

summary

  • Chisel is an advanced HDL, not HLS
  • parametrization
  • does not affect PPA (vs Verilog)

Revisiting Diplomacy

Verilog defines = "that is a piece of garbage" :D

  • defines are handled by a preprocessor
  • no easy way to provide a validation

  • diplomacy: parameter negotiation framework

  • CDE (Context-Dependent Environments)
  • Scala implicits
  • Scala type inference and type checking

API

  • user API (extend Config)
  • design API (trait)

parameters

  • hierarchy
  • topology

  • global definition = bad

  • Parameter, LazyModule, passed implicitly
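
A minimal sketch of what "passed implicitly" means in plain Scala. This is not the RocketChip `Parameters` or CDE API; `Params`, `Core`, `Soc` and `coreWidths` are hypothetical names, and the classes are ordinary Scala classes rather than Chisel modules.

```scala
object ImplicitParams {
  // Hypothetical parameter bundle; real designs look values up by key.
  final case class Params(xLen: Int, nCores: Int)

  // Each level of the hierarchy takes the parameters implicitly ...
  class Core(implicit p: Params) { val width: Int = p.xLen }

  // ... so child instances pick them up without explicit plumbing.
  class Soc(implicit p: Params) {
    val cores: Seq[Core] = Seq.fill(p.nCores)(new Core)
  }

  // One implicit value at the top configures the whole hierarchy.
  def coreWidths(xLen: Int, nCores: Int): Seq[Int] = {
    implicit val p: Params = Params(xLen, nCores)
    (new Soc).cores.map(_.width)
  }
}
```

Compared to a global definition, the value in scope can differ per subtree, and the compiler rejects an instantiation where no parameters are available.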

  • topology parameters

  • interfaces as DAG (Directed Acyclic Graph)
  • acyclic: what does a DMA look like?
  • interfaces: AXI, TileLink, ...
  • 2-phase elaboration

Summary

  • Diplomacy refactor -> stand-alone library
  • (plans for) TileLink, AXI, ACE, CHI, WishBone

  • RocketChip newbies

  • TileLink implementation

Using particle swarm optimization to reduce verification time

  • prerecorded intro, issues with audio

  • Particle Swarm Optimization

    • v_i - particle velocity
    • C - learning factor
    • global best

CDMA - some kind of a DMA, apparently
MCIF - presumably memory controller interface

1 channel with weights
3 data channels: IMG, WG, DC

data word: 78*3 bits

process:

  1. generate random solution
  2. calculate the fitness
  3. update the particle speed and position
  4. end if a high-quality stimulus was found, else go to 1
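
To remind myself how the velocity update works, here is a minimal one-dimensional particle swarm sketch in plain Scala. It is not the presenter's implementation: the fitness function is an arbitrary `Double => Double` to be minimized instead of coverage from a simulator, and the inertia weight `w` and learning factors `c1`, `c2` are common textbook values that I picked, not values from the talk.

```scala
import scala.util.Random

object Pso {
  // Minimize f on [lo, hi] with a basic global-best particle swarm.
  def minimize(f: Double => Double, lo: Double, hi: Double,
               particles: Int = 20, iterations: Int = 200, seed: Long = 1L): Double = {
    val rng = new Random(seed)
    val w = 0.7            // inertia weight (assumed)
    val c1 = 1.5           // learning factor towards the personal best (assumed)
    val c2 = 1.5           // learning factor towards the global best (assumed)
    // 1. generate random solutions
    val x = Array.fill(particles)(lo + (hi - lo) * rng.nextDouble())
    val v = Array.fill(particles)(0.0)
    val pBest = x.clone()
    // 2. calculate the fitness and pick the initial global best
    var gBest = pBest.reduce((a, b) => if (f(a) <= f(b)) a else b)
    for (_ <- 0 until iterations; i <- 0 until particles) {
      // 3. update the particle velocity and position
      v(i) = w * v(i) +
        c1 * rng.nextDouble() * (pBest(i) - x(i)) +
        c2 * rng.nextDouble() * (gBest - x(i))
      x(i) = math.max(lo, math.min(hi, x(i) + v(i)))
      if (f(x(i)) < f(pBest(i))) pBest(i) = x(i)
      if (f(pBest(i)) < f(gBest)) gBest = pBest(i)
    }
    gBest
  }
}
```

This sketch always runs a fixed number of iterations; the process from the talk instead stops as soon as a good enough stimulus is found.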

A general method of generating stimulus based on SVM

  • verification consumes a lot of resources
  • using SVM to predict coverage

SVM - support vector machine (binary classification)

  • using Matlab -> different types, different functions

  • Convolution Pipeline:

  • CDMA, CBUF, CSC, CMAC, ...

CSC - convolution sequence controller = gets the data from DMA, loads/schedules it into MAC

nvdla-csc

  • SG - sequence generator
  • DL - data loader
  • WL - weight loader

training

  • data, labels = coverage data
  • process:

  • training

    1. random stimuli
    2. get the coverage from VCS
    3. use this to create training set for SVM
    4. train SVM

an arbitrary value is chosen as a threshold between the labels

couldn't this be done better with Reinforcement Learning? and is the SVM the right tool to use?

Q: relationship between the project and Chisel?
A: answer in Chinese


Summary of Problems and Experiences during the Processor Development based on Chisel

  • short intro (Chisel cheatsheet, Programming in Scala, software thinking, hardware thinking)
  • val vs def (this could be nasty to debug, def creates a new instance on each call)
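
The val/def pitfall can be reproduced in plain Scala. In the sketch below `Node` is a stand-in for a hardware node such as a `Reg` or a `Wire` (an assumption for illustration; no Chisel is involved), and the counter shows how many instances were created.

```scala
object ValVsDef {
  final class Node // stand-in for a hardware node

  class Design {
    var created = 0
    private def make(): Node = { created += 1; new Node }

    val shared: Node = make() // evaluated once, when Design is constructed
    def fresh: Node = make()  // evaluated again on every access
  }
}
```

Every access to `fresh` builds another `Node`; in a real design that would mean a second register or wire where only one was intended.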

  • Cat vs Map, hardware thinking vs software thinking --> Cat starts at MSB, Map starts at LSB
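
A small integer model of the two orderings, as I understood them from the talk: a concatenation puts its first argument at the MSB, while a sequence walked by index puts element 0 at the LSB. This is plain Scala on `Int`, not Chisel; `BitOrder`, `catMsbFirst` and `packLsbFirst` are names I made up.

```scala
object BitOrder {
  // First element ends up as the most significant bit (Cat-like order).
  def catMsbFirst(bits: Seq[Int]): Int =
    bits.foldLeft(0)((acc, b) => (acc << 1) | (b & 1))

  // Element i ends up at bit position i (index order, LSB first).
  def packLsbFirst(bits: Seq[Int]): Int =
    bits.zipWithIndex.map { case (b, i) => (b & 1) << i }.sum
}
```

The same sequence `Seq(1, 0, 0)` gives 4 with the first function and 1 with the second, which is exactly the kind of mismatch that is hard to spot in a waveform.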

  • width - in Cat it should be explicitly defined

  • careful with DontCare

Syntactic Salt - I like this very much

  • competitive assignment (loop index)

developer perspective

  • "Chisel = excellent HDL, free devs from dirty work"
  • "be familiar with Scala before using Chisel"
  • check the generated verilog

I liked this talk, the examples were relevant

Q: a question about the competitive assignment (in Chinese)

Q: I understood TPU, SiFive and xie xie


Stimuli Generation by Constrained Markov Chain Monte Carlo Simulation for Chisel-based Deep Learning Accelerator Verification Platform

was this title also generated with a Markov Chain?

nvdla.org

Direct Convolution

  • input: CSC, output CACC
  • each MAC cell: 64 multipliers
  • pipelined status

Constrained Random Testing

  • Markov-Chain Monte-Carlo (MCMC)
  • Monte-Carlo: draw independent samples from the distribution
  • Markov-Chain: the current value is probabilistically dependent on the previous value
  • Metropolis-Hastings Algorithm (proposal, an acceptance of the proposal)
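
A minimal sketch of the proposal/acceptance loop, written in plain Scala to check my understanding. It is not the presenter's code: the state is a single integer, the proposal is a symmetric step of +1 or -1 (so the acceptance ratio reduces to the ratio of target weights), and the constraint is expressed by giving forbidden values a weight of zero. All names are hypothetical.

```scala
import scala.util.Random

object MetropolisHastings {
  // Draw n correlated samples whose distribution follows `weight`
  // (an unnormalized target); values with weight 0 are never visited.
  def sample(weight: Int => Double, start: Int, n: Int, seed: Long = 1L): Vector[Int] = {
    require(weight(start) > 0.0, "start must satisfy the constraint")
    val rng = new Random(seed)
    var x = start
    Vector.fill(n) {
      // Proposal: move one step left or right with equal probability.
      val proposal = x + (if (rng.nextBoolean()) 1 else -1)
      // Acceptance probability for a symmetric proposal.
      val accept = math.min(1.0, weight(proposal) / weight(x))
      if (rng.nextDouble() < accept) x = proposal
      x // the current value depends on the previous one (Markov chain)
    }
  }
}
```

Consecutive samples are correlated by construction, which is presumably why the talk used a pool of states to obtain low-correlation stimuli.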

implementation

  • use a pool of states to generate a low-correlation stimulus
  • MCMC-based fault location

very precise and academic presentation, unclear how it relates to Chisel and how this method can be used in practice


Implementation of a Highly Configurable Wallace Tree Multiplier with Chisel

  • recursion for Wallace tree compression
  • only 120 lines of Chisel code
  • configurable pipelining

the algorithm

  • booth-4 encoding
  • n*n mul <-> n/2 partial products
  • sign-extend (i match [...])
  • tree-compression - columns represented as Array[Seq[Bool]]
  • Seq vs Array?
  • what is the benefit of using Array (from Java) vs Vec (from Chisel)
  • compress the whole tree (+ register insertion)

  • summary: highly configurable, better scalability, easier to read
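
As a rough illustration of why recursion fits the tree compression, the number of compression stages can be computed with a few lines of plain Scala. This is a simplified model and not the presented implementation: it assumes every column has the same height and that each stage turns every group of three rows into two rows with 3:2 compressors, passing the leftover rows through unchanged.

```scala
object WallaceStages {
  // Number of 3:2 compression stages needed to get from `rows`
  // partial-product rows down to the two rows fed to the final adder.
  @scala.annotation.tailrec
  def stages(rows: Int, acc: Int = 0): Int =
    if (rows <= 2) acc
    else stages(2 * (rows / 3) + rows % 3, acc + 1)
}
```

Under this model a 64x64 multiplier with Booth-4 encoding (32 partial products) needs `WallaceStages.stages(32)` = 8 stages, which gives a feeling for where pipeline registers could be inserted.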

Q: something about latency on the slide about pipelining


Agile IC Design Team Working in Chisel, Empowered by Diplomacy and Config

https://www.streamcomputing.com/en/

  • RocketChip/BOOM

  • Qs when reading the code: implicit, :*= and :=*, ...

Diplomacy

  • Scala framework
  • negotiation/casting/modifying parameters
  • LazyModuleImp

Config

  • Scala framework
  • definition of parameters globally

  • Chisel =/= Diplomacy and Config

  • Diplomacy and Config = pure Scala

  • pros:

    • suitable for SoCs
    • availability of open-source IP
  • cons:

    • hard to fully understand

building a Chisel Team

  • firm commitment to Chisel
  • no modifications of the generated Verilog
  • divide and conquer: dedicated experts for Diplomacy and Config

  • stages:

    1. Chisel basics (Bundle, Reg, Wire, ...)
    2. Diplomacy and Config (case class, higher-order functions, other advanced features)
  • minor tricks: wrap important signals in modules, use CamelCase in Chisel


Experience Sharing: Develop NutShell using Chisel

https://github.com/OSCPU/NutShell

  • NutShell (5 undergraduates in 4 months)
  • SDRAM, SPI and UART
  • boots Linux (Debian/Fedora)
  • single-issue, in-order core
  • RV64IMAC, Zifencei, Zicsr
  • runs at 60 MHz on Zynq 7000, 200 MHz on Zynq US+, 350 MHz in 110 nm SMIC

example:

  • MaskedRegMap abstract class - address, read and write side effects
  • apply method, generate method

peripheral devices

  • MMIO - AXI4 or AXI4Lite
  • AXI4SlaveModule abstract class

suggestion

  • avoid Verilog-like Chisel
  • software-like Chisel - careful with var
  • find the balance

Chisel Implementation Tutorial - in a lightweight ...

the title in the program was "Light Weight Chisel3 KnitKit"

  • Data -> Bits -> Clock/Wire/... (outdated diagram)

  • RegInit with different types of resets

running through slides

  • io connection with withClockAndReset

https://github.com/colin4124/Chisel-Implementation-Tutorial

created 6 hours ago

Q: something about AutoBundle


Use Firrtl Transform to Control the Effective Range of 'printf' in Large Scale Circuits

  • printf in Chisel can be translated to fwrite in Verilog
  • extensive printing will slow down the simulation
  • typically only a small part of the code is inspected
  • using FIRRTL transform

Implementation

4 types of annotation:

EnablePrintfAnnotation, Disable.., DisableAll.., Remove Assert...

execute(c: CircuitState)

  • ancestor (hierarchy)

  • firrtl.analyses.CircuitGraph


Reinforcement Learning based Stimulus Generation for Chisel Module Verification

Constrained Random Test

  • test generator

  • CACC: convolution accumulator

  • assembly SRAM group, delivery SRAM group (banks of 64Bx32 SRAMs)
  • assembly = accumulation with saturation

Reinforcement Learning

  • Agent
  • Action
  • Environment
  • Reward
  • State

Q-learning (Q = quality), Bellman equation

states = coverage, actions = stimulus, gamma = reward discount

  • agent is exploring the environment, Q-table is updated, Q is maximized
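
The Q-table update from the Bellman equation fits in a few lines. The sketch below is plain Scala and not the presenter's Python controller; states and actions are plain integers, the learning rate `alpha` is my addition (the notes only mention the discount `gamma`), and unseen table entries default to 0.

```scala
object QLearning {
  type QTable = Map[(Int, Int), Double] // (state, action) -> quality

  // One Q-learning step:
  //   Q(s, a) += alpha * (reward + gamma * max_a' Q(next, a') - Q(s, a))
  def update(q: QTable, s: Int, a: Int, reward: Double, next: Int,
             actions: Seq[Int], alpha: Double, gamma: Double): QTable = {
    require(actions.nonEmpty, "at least one action is needed")
    val bestNext = actions.map(a2 => q.getOrElse((next, a2), 0.0)).max
    val old = q.getOrElse((s, a), 0.0)
    q.updated((s, a), old + alpha * (reward + gamma * bestNext - old))
  }
}
```

In the setting from the talk the state would be derived from coverage and the action would select a stimulus; how exactly the reward was defined was not clear to me.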

approach

  • extremely large input vector: 2 ** 131
  • multiplexer: stimulus = 128 bits, select = 3 bits

controller written in Python! (at a Chisel workshop) :)


Genetic Algorithm based Stimulus Generation for Chisel Module Verification

  • Single Point Data Processor = post-processor of NVDLA
  • inputs: i32 x 16 from CACC

Genetic Algorithm

  • population
  • fitness calculation
  • mating pool
  • parents selection
  • mating (crossover and mutation)
  • offspring
  • back to population

approach

  • 20 binary stimulus vectors
  • fitness calculation - coverage from VCS
  • mating: exchange bits between stimuli vectors, mutation: bit flips
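
The two mating operators are simple enough to write down. A minimal sketch in plain Scala, assuming a stimulus is a vector of bits and using single-point crossover; the talk did not say which crossover variant was used, and the crossover point and flip positions are passed in explicitly here (instead of being drawn at random) to keep the functions deterministic.

```scala
object GeneticOps {
  type Stimulus = Vector[Boolean]

  // Single-point crossover: the two parents swap their tails at `point`.
  def crossover(a: Stimulus, b: Stimulus, point: Int): (Stimulus, Stimulus) = {
    require(a.length == b.length, "parents must have the same length")
    require(point >= 0 && point <= a.length, "point out of range")
    (a.take(point) ++ b.drop(point), b.take(point) ++ a.drop(point))
  }

  // Mutation: flip the bits at the given positions.
  def mutate(v: Stimulus, flips: Set[Int]): Stimulus =
    v.zipWithIndex.map { case (bit, i) => if (flips(i)) !bit else bit }
}
```

A full loop would score each of the 20 vectors with the coverage reported by the simulator, select parents, apply these two operators and feed the offspring back into the population.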

again no mention of Chisel, no result presented


Recurrent Neural Networks based Stimuli Generation for Chisel Module Verification

  • constrained random verification (PRG for randomization, constraints for stimuli)

Recurrent Neural Network

  • neuron: x * w - offset + threshold
  • Hopfield Network (each neuron's output is connected to all other neurons but not to itself)

PDP (Planar Data Processing)

  • results of the pooling process
  • outputs: max, min, average
  • traverses width, height, channel
  • maximum pooling (pipelined)

approach

  • inputs: control, status from the previous module, data payload
  • output: status, and data
  • Python based controller, simulation with VCS

Quasar: SweRV-EL2 implemented in CHISEL

  • Convert SweRV-EL2 from SystemVerilog to Chisel
  • comparison of SystemVerilog to Chisel

Quasar

  • 4-stage, mostly in-order, RV32IMC, runs at 600 MHz at 16 nm
  • development procedure: first unit tests, then comparison against SweRV-EL2
  • LEC (is this a formal method?) on the generated Verilog

Pros/Cons

  • parameterizable & scalable code
  • no linting problems
  • people are reluctant about adopting Chisel
  • "Chisel makes the verification more tedious"

Analysis of the results

  • fMAX, area and power almost the same (2-3% difference)
  • Chisel: 12 kLOC, SystemVerilog: 19 kLOC

Roadmap

  • F-extension (will be open-sourced in the near future)
  • vector extension

https://github.com/Lampro-Mellon/Chisel-Training


ChiselVerify: A Verification Framework for Chisel

  • verification = testing before tape-out
  • validation = testing after tape-out

Current solutions

  • ChiselTest (not many functions), ScalaTest
  • SystemVerilog, UVM (verbose, multiple languages)

ChiselVerify

  • an extension to ChiselTest
  • 4 parts
  • functional coverage
  • constrained random verification
  • bus functional model
  • timed assertions

functional coverage

  • statement coverage / functional coverage
  • verification plan <- cover groups <- cover points (Range, Conditions, Cross, Timed)
  • coverage database
  • coverage reporter
val cr = new CoverageReporter(dut)
cr.register(
  CoverPoints(...),
  ...
)
...
cr.printReport()

constrained random verification

  • constraint programming language
  • JaCoP as an SMT solver
  • custom distribution

bus functional models

  • AXI4 interface
  • Transactions
  • software abstraction

timed assertions

  • types of delays:
  • Exactly
  • Eventually
  • Always
  • Never

summary

  • "test Chisel designs in Scala"

https://github.com/chiselverify/chiselverify


Teaching Digital Design with Chisel

Q at CCC 2020: "Is Chisel ready for class?"
A: yes

  • two courses: Digital Electronics 1 & 2
  • VHDL until 2019, DE1 still uses VHDL, DE2 uses Chisel
  • "VHDL is dying a little bit"
  • IntelliJ, sbt (sbt run, sbt test)
  • Digital Design with Chisel (https://www.imm.dtu.dk/~masca/chisel-book.html)
  • LOC(VHDL)/LOC(Chisel) ~ 2
  • Simulation + GUI in Swing

Towards Agile Networking Hardware - Chisel at OVHcloud

  • OVHcloud = 1st European cloud provider
  • 22 Tbps bandwidth, DDoS
  • anti-DDoS (scrubbing with FPGAs)
  • attackers are agile, so the response has to be agile too
  • HLS - reduces performance and agility
  • does HCL improve agility without affecting performance?

de-risking

flow

  • limitation: parameters are embedded in names
  • solution: a wrapper in Verilog for the Chisel-generated Verilog
  • successful SV/Chisel cohabitation
  • sv2chisel ("low level Chisel"), challenges: clock and reset retrieval, choosing types
  • https://github.com/ovh/sv2chisel

Summary

  • CocoTB instead of Chisel-testers
  • Pipeline abstraction (PhD thesis, a DSL on top of Chisel)