Here are my notes from the Chisel Community Conference China 2021. The conference took place on June 26th 2021, and was organized as a hybrid conference, open to both on-site and remote participants (over Zoom). The organizers have promised that the recorded talks will be made available online in the next couple of weeks.
Since the conference was located in China, it started in the early morning hours for participants from Europe. I decided that 4:00 was a good compromise between my love for Chisel and my need for sleep; I only missed the two invited talks at the beginning. In general the conference was quite interesting, and it was clear that Chisel is particularly suited for highly configurable designs, e.g. processors and data-processing pipelines.
Written in italics are my comments.
[Invited Talk] Chisel breakdown 2
...
cloneType
- override def cloneType --> autoclonetype2
- why have I never needed this?
- autoclonetype1 - deprecated, was based on reflection
- autoclonetype2 - generated by the compiler plugin
aspect phase
- insert additional hardware, layout, verification, ...
Top 10 Common Misconception about Chisel
- Institute of Computing Technology at CAS
- in Chinese, I understood "Chisel", "DSL", and "okay"
RocketChip
- RocketChip is complicated, several additional ... (config, the bus framework, register gen)
- "if you are new to Chisel, DO NOT read the source code of RocketChip"
- note to self: go read the source code of RocketChip
Verilog is more expressive than Chisel
- the presenter argues that logic is fundamentally:
- modules
- combinatorial logic
- registers
- technically this is all supported in Chisel
- I am not sure if I would agree - I think there is still place for Verilog for low-level stuff, just like some parts of the code (e.g. in Linux kernel) are still written in assembly
Chisel compile errors
types of errors:
- Scala compile errors
- Scala run-time error
- Chisel build error
- FIRRTL transform error
- Circuit simulation error
- distinction between fault (uninit variable), error (returning a garbage value) and failure (segmentation fault)
- how are these different?
- the presenter argues that Chisel has a stricter type system than Verilog
- shouldn't Chisel be compared to SystemVerilog for a fair comparison? (also default_nettype none)
- "hidden fault" -> "observable failure"
Blackbox for (System)Verilog
Testing
- Chisel Tester, Chisel Tester 2, UVM from Verilog
- "Agent Faker": TL-C UVM above Chisel Tester 2, open source
equality between Chisel and generated Verilog code
- aka "the Chisel compiler is not formally verified"
- very complex task and unnecessary, one can run tests also on the generated Verilog
- known-good --> successful Chisel projects: RocketChip, BOOM, lowRISC, NutShell, Labeled RISC-V, XiangShan
Quality of Results for Chisel
- misconception: Java slower than C --> Chisel hardware slower than Verilog hardware
- OK, I experienced this first hand - my colleague was asking me what the fmax is for a typical Chisel-generated logic
- will he also mention that Chisel is not an HLS?
Chisel is not HLS
- I predict the future
- advanced features in Chisel/Scala (I managed to understand "map", "mapper")
- PPA: Power-Performance-Area
generated code readability
- comments in Verilog, e.g. EmbeddedTLB
- wasn't there a patch to improve readability of the generated code - check previous CCC
high-performance circuits in Chisel
- https://github.com/OpenXiangShan/XiangShan, https://openxiangshan.github.io/
- looks impressive
Introducing Decoder Generation API to Chisel
example
- 7 segment LED
- input: b0 - b4, output: a - g, k for cathode
- non-valid states = (default, don't care)
theory
- AND plane, OR plane (this looks like PAL/GAL, right? is this really relevant for modern LUT-based FPGAs and ASICs?)
- now a full example of the 7-segment decoder implemented in PLA
- logic optimization (definition from Wikipedia)
- Quine-McCluskey
- Espresso
- sparser PLA
Chisel utils
- experimental.decode.TruthTable, DecodeTableAnnotation, decoder, QMCMinimizer, EspressoMinimizer
- how does this affect later stages (e.g. optimization during the synthesis)?
- nice Scala feature - "list unpacking" - val a :: b :: [...] = x.toBools()
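The "list unpacking" noted above is ordinary Scala pattern matching on a List. Here is a minimal sketch in plain Scala, with no Chisel dependency: Booleans stand in for the Bool signals from x.toBools(), and the names ListUnpacking and firstTwo are made up for this illustration.

```scala
object ListUnpacking {
  // Splits a list into its first two elements and the remainder.
  def firstTwo[A](xs: List[A]): Option[(A, A, List[A])] = xs match {
    case a :: b :: rest => Some((a, b, rest))
    case _              => None // fewer than two elements
  }

  def main(args: Array[String]): Unit = {
    // Plain Booleans stand in for the Bool signals returned by x.toBools().
    val bits = List(true, false, true, true)
    // Pattern on the left-hand side of a val; throws scala.MatchError
    // if the list has fewer than two elements.
    val a :: b :: rest = bits
    println(s"a=$a b=$b rest=$rest")
  }
}
```

The val form is compact, but it fails at run time (for Chisel: at elaboration time) when the list is too short; the match form makes that case explicit.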
Practice of High-performance Chip Agile Development with Chisel
- XiangShan CPU
- agile development (iterative)
- SystemVerilog interfaces, Chisel Bundles
- processor parameters
- Ultra - apparently the highest-end implementation of the processor
- Chisel solution - Vec in a Bundle as an I/O
- FIRRTL transform for printf
- configurability
- "Chisel = syntactic sugar for Verilog"
- link: Wallace tree multiplier, CCCC 2021
- recursion is allowed in Chisel, the generated Verilog code does not include recursion
- distinction between Chisel, Scala
- Wire, Reg - straightforward to understand
- advanced Scala features: object, abstract class, trait
- advice: start with Chisel, learn Scala later; I would not agree, learn Scala first
- co-simulation
- Scala-based modules in Chisel Test2 cannot be used in other tools
- behavioral models to replace actual Chisel modules
- assertion generation in Chisel
- generated names can change between Chisel versions, and can cause problems for physical design
summary
- Chisel is an advanced HDL, not HLS
- parametrization
- does not affect PPA (vs Verilog)
Revisiting Diplomacy
- not an HLS, HCL = HW Construction Language
- configuration
- RocketChip
- an example of a configuration with Verilog: https://github.com/riscv-mcu/e203_hbirdv2/blob/master/rtl/e203/core/config.v
Verilog defines = "that is a piece of garbage" :D
- defines are handled by a preprocessor
- no easy way to provide a validation
- diplomacy: parameter negotiation framework
- CDE (Context-Dependent Environments)
- Scala implicits
- Scala type inference and type checking
API
- user API (extend Config)
- design API (trait)
parameters
- hierarchy
- topology
- global definition = bad
- Parameter, LazyModule, passed implicitly
- topology parameters
- interfaces as DAG (Directed Acyclic Graph)
- acyclic: what does DMA look like?
- interfaces: AXI, TileLink, ...
- 2-phase elaboration
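To make "passed implicitly" more concrete, here is a minimal plain-Scala sketch of handing parameters down a hierarchy with implicits instead of a global definition. Params, Core and Tile are hypothetical names for this illustration; this is not the rocket-chip Parameters/Config API, and it leaves out the negotiation part of Diplomacy entirely.

```scala
object ImplicitParams {
  // Hypothetical parameter bundle; not the rocket-chip Parameters class.
  final case class Params(xlen: Int, nCores: Int)

  // Each level receives the parameters implicitly instead of reading a global.
  class Core(implicit p: Params) {
    val width: Int = p.xlen
  }

  class Tile(implicit p: Params) {
    // `new Core` picks up the implicit Params that is in scope here.
    val cores: Seq[Core] = Seq.fill(p.nCores)(new Core)
    // A single child can be given altered parameters without touching
    // any global state.
    val debugCore: Core = new Core()(p.copy(xlen = 32))
  }
}
```

Unlike a Verilog define, the parameter object is type-checked by the Scala compiler, and a subtree can override a value locally.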
Summary
- Diplomacy refactor -> stand-alone library
- (plans for) TileLink, AXI, ACE, CHI, WishBone
- RocketChip newbies
- TileLink implementation
Using partial swarm optimization to reduce verification time
- prerecorded intro, issues with audio
- Particle Swarm Optimization (PSO)
- v_i - particle velocity
- C - learning factor
- global best
CDMA - some kind of a DMA, apparently
MCIF - presumably memory controller interface
1 channel with weights
3 data channels: IMG, WG, DC
data word: 78*3 bits
process:
- generate random solution
- calculate the fitness
- update the particle speed and position
- end if a high-quality stimulus was found, else go back to the first step
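The process above can be sketched in plain Scala. This is a textbook particle swarm loop, not the presenters' implementation: the inertia weight w, the learning factors c1 and c2, the fixed iteration count and the toy fitness function in the usage note are all assumptions made for this illustration. In the talk the fitness would come from simulation coverage.

```scala
import scala.util.Random

object Pso {
  final case class Result(position: Vector[Double], fitness: Double, initialFitness: Double)

  // Minimises `fitness` inside the box [lo, hi]^dim.
  def minimise(fitness: Vector[Double] => Double, dim: Int, lo: Double, hi: Double,
               particles: Int = 20, iterations: Int = 100,
               w: Double = 0.7, c1: Double = 1.5, c2: Double = 1.5,
               seed: Long = 1L): Result = {
    require(dim > 0 && particles > 0 && lo < hi)
    val rng = new Random(seed)
    def clamp(x: Double): Double = math.max(lo, math.min(hi, x))

    // Step 1: generate random solutions (velocities start at zero).
    var xs = Vector.fill(particles)(Vector.fill(dim)(lo + rng.nextDouble() * (hi - lo)))
    var vs = Vector.fill(particles)(Vector.fill(dim)(0.0))

    // Step 2: calculate the fitness, remember personal and global bests.
    var pBest = xs
    var pBestFit = xs.map(fitness)
    var gBestFit = pBestFit.min
    var gBest = pBest(pBestFit.indexOf(gBestFit))
    val initialFitness = gBestFit

    for (_ <- 0 until iterations) {
      // Step 3: update the particle speed and position.
      vs = Vector.tabulate(particles, dim) { (i, d) =>
        w * vs(i)(d) +
          c1 * rng.nextDouble() * (pBest(i)(d) - xs(i)(d)) +
          c2 * rng.nextDouble() * (gBest(d) - xs(i)(d))
      }
      xs = Vector.tabulate(particles, dim)((i, d) => clamp(xs(i)(d) + vs(i)(d)))

      // Step 4: re-evaluate; a real flow would stop once the stimulus is good enough.
      for (i <- 0 until particles) {
        val f = fitness(xs(i))
        if (f < pBestFit(i)) { pBest = pBest.updated(i, xs(i)); pBestFit = pBestFit.updated(i, f) }
        if (f < gBestFit) { gBest = xs(i); gBestFit = f }
      }
    }
    Result(gBest, gBestFit, initialFitness)
  }
}
```

For example, Pso.minimise(v => v.map(x => x * x).sum, dim = 2, lo = -5.0, hi = 5.0) searches for the minimum of a simple quadratic; the returned fitness is never worse than the best of the initial random population.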
A general method of generating stimulus based on SVM
- verification consumes a lot of resources
- using SVM to predict coverage
SVM - support vector machine (binary classification)
- using Matlab -> different types, different functions
- Convolution Pipeline:
- CDMA, CBUF, CSC, CMAC, ...
CSC - convolution sequence controller = gets the data from DMA, loads/schedules it into MAC
nvdla-csc
- SG - sequence generator
- DL - data loader
- WL - weight loader
training
- data, labels = coverage data
- process:
- training
- random stimuli
- get the coverage from VCS
- use this to create training set for SVM
- train SVM
an arbitrary value is chosen as a threshold between the labels
couldn't this be done better with Reinforcement Learning? and is the SVM the right tool to use?
Q: relationship between the project and Chisel?
A: answer in Chinese
Summary of Problems and Experiences during the Processor Development based on Chisel
- short intro (Chisel cheatsheet, Programming in Scala, software thinking, hardware thinking)
- val vs def (this could be nasty to debug, def creates a new instance on each call)
- Cat vs Map, hardware thinking vs software thinking --> Cat starts at MSB, Map starts at LSB
- width - in Cat it should be explicitly defined
- careful with DontCare
Syntactic Salt - I like this very much
- competitive assignment (loop index)
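The val vs def pitfall can be reproduced in plain Scala without any hardware. In this minimal sketch Node is a hypothetical stand-in for a hardware node such as a register; the object and its members are invented for the illustration.

```scala
object ValVsDef {
  // Hypothetical stand-in for a hardware node such as a register.
  final class Node(val id: Int)

  private var count = 0
  private def fresh(): Node = { count += 1; new Node(count) }

  // Number of Node instances created so far.
  def instances: Int = count

  val viaVal: Node = fresh() // evaluated once, when the object is initialised
  def viaDef: Node = fresh() // evaluated again on every reference
}
```

Every reference to viaDef builds another Node, so code that reads "the same" signal twice silently ends up with two different instances, which is what makes this nasty to debug.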
developer perspective
- "Chisel = excellent HDL, free devs from dirty work"
- "be familiar with Scala before using Chisel"
- check the generated Verilog
I liked this talk, the examples were relevant
Q: a question about the competitive assignment (in Chinese)
Q: I understood TPU, SiFive and xie xie
Stimuli Generation by Constrained Markov Chain Monte Carlo Simulation for Chisel-based Deep Learning Accelerator Verification Platform
was this title also generated with a Markov Chain?
nvdla.org
Direct Convolution
- input: CSC, output CACC
- each MAC cell: 64 multipliers
- pipelined status
Constrained Random Testing
- Markov-Chain Monte-Carlo (MCMC)
- Monte-Carlo: draw independent samples from the distribution
- Markov-Chain: the current value is probabilistically dependent on the previous value
- Metropolis-Hastings Algorithm (proposal, an acceptance of the proposal)
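The proposal/acceptance step of Metropolis-Hastings can be sketched in a few lines of plain Scala. This is a generic textbook sampler over a small discrete state space with a symmetric random-walk proposal, not the constrained stimulus generator from the talk; the weights, the state encoding and the names are assumptions made for this illustration.

```scala
import scala.util.Random

object Metropolis {
  // Draws `steps` correlated samples from the distribution proportional to
  // `weights`, using a symmetric +/-1 random walk (with wrap-around) as proposal.
  def sample(weights: Vector[Double], steps: Int, seed: Long): Vector[Int] = {
    require(weights.nonEmpty && weights.forall(_ > 0.0))
    val rng = new Random(seed)
    val k = weights.length
    var state = rng.nextInt(k)
    Vector.fill(steps) {
      // Proposal: move to a neighbouring state.
      val step = if (rng.nextBoolean()) 1 else -1
      val proposal = (state + step + k) % k
      // Acceptance: always accept a more likely state, otherwise accept
      // with probability weights(proposal) / weights(state).
      val accept = math.min(1.0, weights(proposal) / weights(state))
      if (rng.nextDouble() < accept) state = proposal
      state
    }
  }
}
```

Consecutive samples depend on each other (the Markov-chain part), which is presumably why the implementation described in the talk draws from a pool of states to get a low-correlation stimulus.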
implementation
- use a pool of states to generate a low-correlation stimulus
- MCMC-based fault location
very precise and academic presentation, unclear how it relates to Chisel and how this method can be used in practice
Implementation of a Highly Configurable Wallace Tree Multiplier with Chisel
- recursion for Wallace tree compression
- only 120 lines of Chisel code
- configurable pipelining
the algorithm
- Booth-4 encoding
- n*n mul <-> n/2 partial products
- sign-extend (i match [...])
- tree-compression - columns represented as Array[Seq[Bool]]
- Seq vs Array?
- what is the benefit of using Array (from Java) vs Vec (from Chisel)
- compress the whole tree (+ register insertion)
- summary: highly configurable, better scalability, easier to read
Q: something about latency on the slide about pipelining
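Why Booth-4 (radix-4) encoding gives n/2 partial products for an n-bit multiplier can be checked with a small software model. The sketch below is plain Scala on integers, not the Chisel code from the talk; the names are made up for the illustration. Each digit is in {-2, -1, 0, 1, 2} and selects one partial product.

```scala
object Booth4 {
  // Radix-4 Booth recoding of an n-bit two's-complement value (n even).
  // Digit i is -2*y(2i+1) + y(2i) + y(2i-1), with y(-1) = 0.
  def digits(y: Int, n: Int): Seq[Int] = {
    require(n > 0 && n % 2 == 0 && n <= 30)
    def bit(i: Int): Int = if (i < 0) 0 else (y >> i) & 1
    (0 until n / 2).map(i => -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1))
  }

  // Reconstructs the value: sum of digit(i) * 4^i.
  def value(ds: Seq[Int]): Int =
    ds.zipWithIndex.map { case (d, i) => d * (1 << (2 * i)) }.sum

  // Multiplication as a sum of n/2 partial products, one per Booth digit.
  def multiply(x: Int, y: Int, n: Int): Int =
    digits(y, n).zipWithIndex.map { case (d, i) => (d * x) * (1 << (2 * i)) }.sum
}
```

In hardware the final sum over the partial products is what the Wallace tree compresses; here it is simply an integer sum.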
Agile IC Design Team Working in Chisel, Empowered by Diplomacy and Config
https://www.streamcomputing.com/en/
- RocketChip/BOOM
- Qs when reading the code: implicit, :*= and :=*, ...
Diplomacy
- Scala framework
- negotiation/casting/modifying parameters
- LazyModuleImp
Config
- Scala framework
- definition of parameters globally
- Chisel =/= Diplomacy and Config
- Diplomacy and Config = pure Scala
- pros:
- suitable for SoCs
- availability of open-source IP
- cons:
- hard to fully understand
building a Chisel Team
- firm commitment to Chisel
- no modifications of the generated Verilog
- divide and conquer: dedicated experts for Diplomacy and Config
- stages:
- Chisel basis (Bundle, Reg, Wire, ...)
- Diplomacy and Config (case class, high-order function, other advanced features)
- minor tricks: wrap important signals in modules, use CamelCase in Chisel
Experience Sharing: Develop NutShell using Chisel
https://github.com/OSCPU/NutShell
- NutShell (5 undergraduates in 4 months)
- SDRAM, SPI and UART
- boots Linux (Debian/Fedora)
- single-issue, in-order core
- RV64IMAC, Zifencei, Zicsr
- runs at 60 MHz on Zynq 7000, 200 MHz on Zynq US+, 350 MHz in 110 nm SMIC
example:
- MaskedRegMap abstract class - address, read and write side effects
- apply method, generate method
peripheral devices
- MMIO - AXI4 or AXI4Lite
- AXI4SlaveModule abstract class
suggestion
- avoid Verilog-like Chisel
- software-like Chisel - careful with var
- find the balance
Chisel Implementation Tutorial - in a lightweight ...
the title in the program was "Light Weight Chisel3 KnitKit"
- Data -> Bits -> Clock/Wire/... (outdated diagram)
- RegInit with different types of resets
running through slides
- io connection with withClockAndReset
https://github.com/colin4124/Chisel-Implementation-Tutorial
created 6 hours ago
Q: something about AutoBundle
Use Firrtl Transform to Control the Effective Range of 'printf' in Large Scale Circuits
- printf in Chisel can be translated into fwrite in Verilog
- extensive printing will slow down the simulation
- typically only a small part of the code is inspected
- using FIRRTL transform
Implementation
4 types of annotation: EnablePrintfAnnotation, Disable.., DisableAll.., Remove Assert...
execute(c: CircuitState)
- ancestor (hierarchy)
- firrtl.analyses.CircuitGraph
Reinforcement Learning based Stimulus Generation for Chisel Module Verification
Constrained Random Test
- test generator
- CACC: convolution accumulator
- assembly SRAM group, delivery SRAM group (banks of 64Bx32 SRAMs)
- assembly = accumulation with saturation
Reinforcement Learning
- Agent
- Action
- Environment
- Reward
- State
Q-learning (Q = quality), Bellman equation
states = coverage, actions = stimulus, gamma = reward discount
- agent is exploring the environment, Q-table is updated, Q is maximized
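The Q-table update behind these bullets is the standard Q-learning rule derived from the Bellman equation: Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max Q(s', a') - Q(s, a)). Here is a minimal plain-Scala sketch; the integer encoding of states and actions and the learning rate alpha are assumptions made for this illustration (the talk's controller was written in Python).

```scala
object QLearning {
  // Q-table keyed by (state, action); missing entries count as 0.0.
  type QTable = Map[(Int, Int), Double]

  // One Q-learning step after taking `action` in `state`, observing
  // `reward` and ending up in `next`.
  def update(q: QTable, state: Int, action: Int, reward: Double, next: Int,
             actions: Seq[Int], alpha: Double, gamma: Double): QTable = {
    require(actions.nonEmpty)
    // Best value reachable from the next state (the "max" in the Bellman equation).
    val bestNext = actions.map(a => q.getOrElse((next, a), 0.0)).max
    val old = q.getOrElse((state, action), 0.0)
    q.updated((state, action), old + alpha * (reward + gamma * bestNext - old))
  }
}
```

In the setting of the talk the state would encode the coverage reached so far, the action would be the choice of stimulus, and the reward would reflect newly covered points.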
approach
- extremely large input vector: 2 ** 131
- multiplexer: stimulus = 128 digits, select = 3 bits
controller written in Python! (at a Chisel workshop) :)
Genetic Algorithm based Stimulus Generation for Chisel Module Verification
- Single Point Data Processor = post-processor of NVDLA
- inputs: i32 x 16 from CACC
Genetic Algorithm
- population
- fitness calculation
- mating pool
- parents selection
- mating (crossover and mutation)
- offsprings
- back to population
approach
- 20 binary stimulus vectors
- fitness calculation - coverage from VCS
- mating: exchange bits between stimuli vectors, mutation: bit flips
again no mention of Chisel, no result presented
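The mating step (crossover plus mutation) on binary stimulus vectors can be sketched in plain Scala. This is the generic single-point crossover and bit-flip mutation from the textbook, not the presenters' code; the crossover point and the mutation rate are parameters invented for this illustration.

```scala
import scala.util.Random

object GeneticOps {
  type Stimulus = Vector[Boolean]

  // Single-point crossover: the two children swap their tails after `point`.
  def crossover(p1: Stimulus, p2: Stimulus, point: Int): (Stimulus, Stimulus) = {
    require(p1.length == p2.length && point >= 0 && point <= p1.length)
    (p1.take(point) ++ p2.drop(point), p2.take(point) ++ p1.drop(point))
  }

  // Mutation: flip each bit independently with probability `rate`.
  def mutate(s: Stimulus, rate: Double, rng: Random): Stimulus =
    s.map(bit => if (rng.nextDouble() < rate) !bit else bit)
}
```

The fitness of each offspring would then come from the coverage reported by the simulator, and the fittest vectors form the next mating pool.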
Recurring Neural Networks based Stimuli Generation for Chisel Module Verification
- constrained random verification (PRG for randomization, constraints for stimuli)
Recurrent Neural Network
- neuron: x * w - offset + threshold
- Hopfield Network (each neuron's output is connected to all other neurons but not to itself)
PDP (Planar Data Processing)
- results of the pooling process
- outputs: max, min, average
- traverses width, height, channel
- maximum pooling (pipelined)
approach
- inputs: control, status from the previous module, data payload
- output: status, and data
- Python based controller, simulation with VCS
Quasar: SweRV-EL2 implemented in CHISEL
- Convert SweRV-EL2 from SystemVerilog to Chisel
- comparison of SystemVerilog to Chisel
Quasar
- 4-stage, mostly in-order, RV32IMC, runs at 600 MHz at 16 nm
- development procedure: first unit tests, then comparison against SweRV-EL2
- LEC (is this a formal method?) on the generated Verilog
Pros/Cons
- parameterizable & scalable code
- no linting problems
- people are reluctant about adopting Chisel
- "Chisel makes the verification more tedious"
Analysis of the results
- fMAX, area and power almost the same (2-3% difference)
- Chisel: 12 kLOC, SystemVerilog: 19 kLOC
Roadmap
- F-extension (will be open-sourced in the near future)
- vector extension
https://github.com/Lampro-Mellon/Chisel-Training
ChiselVerify: A Verification Framework for Chisel
- verification = testing before tape-out
- validation = testing after tape-out
Current solutions
- ChiselTest (not many functions), ScalaTest
- SystemVerilog, UVM (verbose, multiple languages)
ChiselVerify
- an extension to ChiselTest
- 4 parts
- functional coverage
- constrained random verification
- bus functional model
- timed assertions
functional coverage
- statement coverage / functional coverage
- verification plan <- cover groups <- cover points (Range, Conditions, Cross, Timed)
- coverage database
- coverage reporter
val cr = new CoverageReporter(dut)
cr.register(
CoverPoints(...),
...
)
...
cr.printReport()
constrained random verification
- constraint programming language
- JaCoP as an SMT solver
- custom distribution
bus functional models
- AXI4 interface
- Transactions
- software abstraction
timed assertions
- types of delays: Exactly, Eventually, Always, Never
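One way to read the four delay types is as predicates over the values a condition takes during a window of clock cycles. The plain-Scala sketch below encodes that reading; it is my interpretation of the slide, not the ChiselVerify API, and the object and method names are made up for this illustration.

```scala
object TimedChecks {
  // `window` holds the sampled value of a condition on consecutive clock cycles.

  // Always: the condition holds on every cycle of the window.
  def always(window: Seq[Boolean]): Boolean = window.forall(identity)

  // Never: the condition holds on no cycle of the window.
  def never(window: Seq[Boolean]): Boolean = !window.exists(identity)

  // Eventually: the condition holds on at least one cycle of the window.
  def eventually(window: Seq[Boolean]): Boolean = window.exists(identity)

  // Exactly: read here as "the condition holds on the last cycle of the window".
  def exactly(window: Seq[Boolean]): Boolean = window.lastOption.contains(true)
}
```

For example, a handshake that must be acknowledged within four cycles would be checked with eventually over a four-sample window.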
summary
- "test Chisel designs in Scala"
https://github.com/chiselverify/chiselverify
Teaching Digital Design with Chisel
Q at CCC 2020: "Is Chisel ready for class?"
A: yes
- two courses: Digital Electronics 1 & 2
- VHDL until 2019, DE1 still uses VHDL, DE2 uses Chisel
- "VHDL is dying a little bit"
- IntelliJ, sbt (sbt run, sbt test)
- Digital Design with Chisel (https://www.imm.dtu.dk/~masca/chisel-book.html)
- LOC(VHDL)/LOC(Chisel) ~ 2
- Simulation + GUI in Swing
Towards Agile Networking Hardware - Chisel at OVHcloud
- OVHcloud = 1st European cloud provider
- 22 Tbps bandwidth, DDoS
- anti-DDoS (scrubbing with FPGAs)
- attackers are agile, to respond
- HLS - reduces performance and agility
- does HCL improve agility without affecting performance?
de-risking
- counter store (hash table with a cuckoo filter)
- https://hal.archives-ouvertes.fr/hal-03157426
- reset, async reset added recently
- with no reset the P&R results are better
- Pull Request for Preset
flow
- limitation: parameters are embedded in names
- solution: a wrapper in Verilog for the Chisel-generated Verilog
- successful SV/Chisel cohabitation
- sv2chisel ("low level Chisel"), challenges: clock and reset retrieval, choosing types
- https://github.com/ovh/sv2chisel
Summary
- CocoTB instead of Chisel-testers
- Pipeline abstraction (PhD thesis, a DSL on top of Chisel)