<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>j-marjanovic.io - Jan Marjanovic</title><link href="www.j-marjanovic.io/" rel="alternate"></link><link href="www.j-marjanovic.io/feeds/jan-marjanovic.atom.xml" rel="self"></link><id>www.j-marjanovic.io/</id><updated>2021-12-29T18:00:00+01:00</updated><entry><title>Exploring the PS-PL AXI interfaces on Zynq UltraScale+ MPSoC</title><link href="www.j-marjanovic.io/exploring-the-ps-pl-axi-interfaces-on-zynq-ultrascale-mpsoc.html" rel="alternate"></link><published>2021-12-29T18:00:00+01:00</published><updated>2021-12-29T18:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-12-29:www.j-marjanovic.io/exploring-the-ps-pl-axi-interfaces-on-zynq-ultrascale-mpsoc.html</id><summary type="html">&lt;p&gt;I recently held &lt;a href="https://indico.desy.de/event/31387/contributions/112660/attachments/70411/89645/mtcaws2021_marjanovic_soc_amcs.pdf"&gt;a presentation at the 10th MicroTCA Workshop for Industry and
Research&lt;/a&gt;, where I
presented some hardware we have developed, discussed the advantages of using
SoCs, and showed a couple of examples where we successfully leveraged the features
of these devices.&lt;/p&gt;
&lt;p&gt;At my day job the dataflow through …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently held &lt;a href="https://indico.desy.de/event/31387/contributions/112660/attachments/70411/89645/mtcaws2021_marjanovic_soc_amcs.pdf"&gt;a presentation at the 10th MicroTCA Workshop for Industry and
Research&lt;/a&gt;, where I
presented some hardware we have developed, discussed the advantages of using
SoCs, and showed a couple of examples where we successfully leveraged the features
of these devices.&lt;/p&gt;
&lt;p&gt;At my day job the dataflow through the system is relatively simple: the data
is captured in large chunks and the direction is always well defined, e.g.
for data acquisition systems it goes from the FPGA to the CPU and then into the network or storage.&lt;/p&gt;
&lt;p&gt;On the other hand, there are a lot of applications that can benefit from a
tighter coupling between the CPU and the FPGA, for example HW-accelerated
compression and decompression, HW-accelerated regex, HW-accelerated graph
traversal, and machine learning accelerators.&lt;/p&gt;
&lt;p&gt;This is why I have decided to explore the PS-PL ports available in the Zynq US+ MPSoC in
my free time and to describe this adventure in this blog post.&lt;/p&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;The Xilinx Zynq UltraScale+ MPSoC provides four different types of interfaces
between the so-called &lt;em&gt;Processing System (PS)&lt;/em&gt; and &lt;em&gt;Programmable Logic (PL)&lt;/em&gt;,
leveraging the wide variety of protocols standardized in the &lt;a href="https://developer.arm.com/architectures/system-architectures/amba"&gt;Advanced
Microcontroller Bus
Architecture&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this blog post I will explore the performance characteristics of three
of these interfaces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accelerator Coherency Port interface&lt;/li&gt;
&lt;li&gt;High-Performance Coherent interface&lt;/li&gt;
&lt;li&gt;High-Performance interface&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fourth one, the &lt;em&gt;AXI Coherency Extension&lt;/em&gt; port, is the most interesting of all and
deserves a dedicated blog post.&lt;/p&gt;
&lt;p&gt;The purpose of this blog post is to gather information on how to make the
interfaces work correctly and to measure the performance in different scenarios.
The information on various interfaces is scattered across different documents
(&lt;a href="https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf"&gt;Zynq UltraScale+ Device Technical Reference
Manual&lt;/a&gt;,
&lt;a href="https://developer.arm.com/documentation/ddi0470/"&gt;CoreLink CCI-400 Cache Coherent Interconnect Technical Reference
Manual&lt;/a&gt;, and &lt;a href="https://developer.arm.com/documentation/ddi0500/"&gt;Arm Cortex-A53
MPCore Processor Technical Reference
Manual&lt;/a&gt;), &lt;a href="https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency"&gt;Xilinx wiki
pages&lt;/a&gt;,
&lt;a href="https://www.xilinx.com/support/answers/69446.html"&gt;Xilinx Answer Records&lt;/a&gt; and
various forum threads and GitHub issues.&lt;/p&gt;
&lt;h2&gt;PL-PS Interfaces&lt;/h2&gt;
&lt;p&gt;Shown in the figure below are the interfaces between PS and PL.&lt;/p&gt;
&lt;p&gt;&lt;img alt="PS-PL AXI Interface Datapaths - from UG1085 (courtesy Xilinx)" src="www.j-marjanovic.io/images/2021_zynqmp_ports/intro/ug1085_ps_pl_axi_interface.png" style="width:40%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;In this blog post I will focus on the following three interfaces:&lt;/p&gt;
&lt;h3&gt;High-Performance interface&lt;/h3&gt;
&lt;p&gt;This interface connects directly to the &lt;em&gt;DDR Memory Subsystem&lt;/em&gt; and completely
bypasses the Cache Coherent Interconnect and the APU. According to UG1085, this
interface is ideal for large datasets. The software needs to bypass the
cache when accessing the data.&lt;/p&gt;
&lt;h3&gt;High-Performance Coherent interface&lt;/h3&gt;
&lt;p&gt;This interface is connected to a port on the CCI-400 interconnect. When configured
accordingly, the memory transactions are communicated to the APU, providing
tighter integration with software.&lt;/p&gt;
&lt;p&gt;AR 69446 mentions that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The HPC ports are preferable to the ACP port in most applications as they
provide higher bandwidth and do not disturb the contents of the processor L2
cache.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Accelerator Coherency Port interface&lt;/h3&gt;
&lt;p&gt;The Cortex-A53 TRM describes the ACP interface in the following way:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The ACP is provided to reduce software cache maintenance operations when
sharing memory regions with other masters, and to allow other masters to
allocate data into the L2 cache.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;According to UG1085:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] ACP is optimal for medium-grain acceleration, such as a
block-level crypto accelerator and video macro-block level processing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Motivation&lt;/h1&gt;
&lt;p&gt;Cache-aware interfaces allow tighter integration between SW and HW, which is of
particular interest for accelerating workloads with an FPGA. Instead of explicitly
copying the data to the FPGA, doing the computation in the FPGA, and copying
the data back from the FPGA, with cache-aware interfaces we can modify the data
directly in the program memory, thus skipping the unnecessary copying.&lt;/p&gt;
&lt;p&gt;I have prepared &lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/blob/rel-v2021.1/recipes-app/numpy-acc/files/numpy_example.ipynb"&gt;a Jupyter
notebook&lt;/a&gt;
that demonstrates this approach. An AXI Proxy (explained in detail in the
next chapter) is used to read from and write to a NumPy array.&lt;/p&gt;
&lt;p&gt;The screenshot below shows the most important part of the notebook, where the
AXI Proxy is used to modify the content of the NumPy array from the FPGA side.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Writing to a Numpy array" src="www.j-marjanovic.io/images/2021_zynqmp_ports/motivation/numpy_acc.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Test setup&lt;/h1&gt;
&lt;h2&gt;Hardware&lt;/h2&gt;
&lt;p&gt;All measurements were performed on an Ultra96-V2 board. The board contains an
XCZU3EG, 2GB of DDR4 memory connected to the Processing System, and not much
more.&lt;/p&gt;
&lt;p&gt;The blue LED near the USB connector is used to indicate activity on AXI PL-PS ports:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Ultra96-V2" src="www.j-marjanovic.io/images/2021_zynqmp_ports/intro/u96.jpg" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;AXI Proxy&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-12-axi-proxy"&gt;AXI Proxy
IP&lt;/a&gt;
acts (as the name suggests) as a proxy between an AXI4-Lite subordinate port and
an AXI4 manager port. With this IP we can measure read and write latency on
all aforementioned ports.&lt;/p&gt;
&lt;p&gt;The subordinate port provides registers where the software can prepare the data
to be written, retrieve the data which was read, start read and write
transactions and measure the time an individual transaction took. The
transaction time is measured in clock cycles between the address being provided
(&lt;code&gt;AxVALID&lt;/code&gt; going high) and the full response being received. To match the
requirements of the ACP interface, all transactions are 64 bytes long (4 beats,
128 bits wide). The software can also set some of the &lt;a href="https://github.com/j-marjanovic/chisel-stuff/blob/master/example-12-axi-proxy/src/main/scala/axi_proxy/AxiProxy.scala#L136"&gt;AXI
side-band
signals&lt;/a&gt;
which affect the caching properties: &lt;code&gt;AxCACHE&lt;/code&gt;, &lt;code&gt;AxPROT&lt;/code&gt;, and &lt;code&gt;AxUSER&lt;/code&gt;.&lt;/p&gt;
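&lt;p&gt;As a rough illustration of what these side-band values encode, the sketch below splits an &lt;code&gt;AxCACHE&lt;/code&gt; and an &lt;code&gt;AxPROT&lt;/code&gt; value into the attribute bits defined in the AMBA AXI specification. The helper names are made up for this example and are not part of the AXI Proxy software; the sample values 15 and 2 are the ones passed on the command line in the appendix. &lt;code&gt;AxUSER&lt;/code&gt; is left out because its meaning is specific to the port.&lt;/p&gt;

```python
# Bit positions (LSB first) as defined in the AMBA AXI specification.
# AXI4 calls bits 2 and 3 "Allocate"/"Other Allocate" depending on the
# channel; the AXI3-style names are used here for brevity.
AXCACHE_BITS = ["bufferable", "modifiable", "read_allocate", "write_allocate"]
AXPROT_BITS = ["privileged", "non_secure", "instruction"]


def decode_bits(value: int, names: list) -> dict:
    """Map each named bit position (LSB first) to True or False."""
    return {name: (value >> bit) % 2 == 1 for bit, name in enumerate(names)}


def decode_axcache(value: int) -> dict:
    """Decode a 4-bit AxCACHE value (hypothetical helper)."""
    return decode_bits(value, AXCACHE_BITS)


def decode_axprot(value: int) -> dict:
    """Decode a 3-bit AxPROT value (hypothetical helper)."""
    return decode_bits(value, AXPROT_BITS)


if __name__ == "__main__":
    print(decode_axcache(15))  # all four attribute bits are set
    print(decode_axprot(2))    # only the non-secure bit is set
```

&lt;p&gt;In other words, &lt;code&gt;AxCACHE = 15&lt;/code&gt; requests a bufferable, modifiable, read- and write-allocate access, and &lt;code&gt;AxPROT = 2&lt;/code&gt; marks it as an unprivileged, non-secure data access.&lt;/p&gt;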
&lt;p&gt;Shown in the figure below is the Vivado block diagram used to perform the tests
with &lt;strong&gt;AXI Proxy&lt;/strong&gt;. There are three instances of the IP, each connected to one
of the ports on the Zynq MPSoC block. System ILA is used to provide additional
visibility of the connections between AXI Proxy and PL-PS ports on the Zynq
UltraScale+ MPSoC block.&lt;/p&gt;
&lt;p&gt;&lt;img alt="AXI Proxy" src="www.j-marjanovic.io/images/2021_zynqmp_ports/intro/vivado_axi_proxy.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;AXI Traffic Generator&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-13-axi-traffic-gen"&gt;AXI Traffic
Generator&lt;/a&gt;
can generate a sequence of AXI bursts with an incrementing address and a known
pattern, which simulates behavior of a DMA. With this IP we can measure the read
and write throughput of transactions of different sizes.&lt;/p&gt;
&lt;p&gt;Here, too, each AXI burst contains 64 bytes (4 beats, 128 bits wide). The IP measures
the number of clock cycles the entire transfer took.&lt;/p&gt;
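&lt;p&gt;To make the burst geometry concrete, here is a small sketch of the address sequence such a DMA-like transfer walks through. It only models the arithmetic, not the IP itself; the function names are hypothetical, and the base address is the physical address of the buffer shown in the appendix.&lt;/p&gt;

```python
BYTES_PER_BEAT = 16      # 128-bit data bus
BEATS_PER_BURST = 4
BURST_BYTES = BYTES_PER_BEAT * BEATS_PER_BURST  # 64 bytes, one cache line


def burst_addresses(base_addr: int, count: int) -> list:
    """Start address of each burst in a contiguous, incrementing transfer."""
    return [base_addr + i * BURST_BYTES for i in range(count)]


def transfer_size(count: int) -> int:
    """Total number of bytes moved by `count` bursts."""
    return count * BURST_BYTES


if __name__ == "__main__":
    for addr in burst_addresses(0x5E100000, 4):
        print(hex(addr))
    print(transfer_size(32), "bytes in 32 bursts")
```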
&lt;p&gt;Shown in the figure below is the Vivado block diagram used to perform the tests
with &lt;strong&gt;Jan's AXI Traffic Generator&lt;/strong&gt;. As with the AXI Proxy, there are three
instances and a System ILA to observe the transactions.&lt;/p&gt;
&lt;p&gt;&lt;img alt="AXI Traffic Generator" src="www.j-marjanovic.io/images/2021_zynqmp_ports/intro/vivado_axi_tg.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Yocto layer&lt;/h2&gt;
&lt;p&gt;To generate the SD card image containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the FPGA bitstreams, &lt;/li&gt;
&lt;li&gt;FPGA manager (to download the bitstreams), &lt;/li&gt;
&lt;li&gt;device tree overlays (the description of FPGA),&lt;/li&gt;
&lt;li&gt;programs to control the IPs, &lt;/li&gt;
&lt;li&gt;the required Python libraries and &lt;/li&gt;
&lt;li&gt;Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have created a Yocto layer, available on my GitHub:
&lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces"&gt;meta-zynqmp-pl-ps-interfaces&lt;/a&gt;.
The usage description is available in the &lt;code&gt;README.md&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Configuration&lt;/h2&gt;
&lt;h3&gt;u-dma-buf&lt;/h3&gt;
&lt;p&gt;For use with the ACP and HPC ports, the u-dma-buf buffer needs to have the
&lt;a href="https://github.com/ikwzm/udmabuf#dma-coherent"&gt;&lt;code&gt;dma-coherent&lt;/code&gt;&lt;/a&gt; flag set.
This is achieved with &lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/blob/rel-v2021.1/recipes-bsp/device-tree/files/app-pl-custom.dtsi#L6-L12"&gt;an entry&lt;/a&gt; in the device tree.&lt;/p&gt;
&lt;h3&gt;HP port&lt;/h3&gt;
&lt;p&gt;The HP port requires no special configuration on the FPGA side. On the software side, the &lt;code&gt;u-dma-buf&lt;/code&gt;
device needs to be opened with the &lt;a href="https://github.com/ikwzm/udmabuf#when-hardware-does-not-maintain-the-coherency"&gt;&lt;code&gt;O_SYNC&lt;/code&gt; flag&lt;/a&gt;.&lt;/p&gt;
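&lt;p&gt;A minimal sketch of what this looks like from user space, using only the Python standard library. The function names and the device path in the usage note are assumptions for illustration; the applications in the Yocto layer may do this differently. According to the u-dma-buf README, opening the device with &lt;code&gt;O_SYNC&lt;/code&gt; asks the driver for a mapping that bypasses the CPU cache (the exact behaviour depends on the driver's sync mode).&lt;/p&gt;

```python
import mmap
import os


def udmabuf_open_flags(use_osync: bool) -> int:
    """Flags for os.open(); O_SYNC is added for the non-coherent HP port."""
    flags = os.O_RDWR
    if use_osync:
        flags |= os.O_SYNC
    return flags


def map_udmabuf(path: str, size: int, use_osync: bool) -> mmap.mmap:
    """Open the buffer device and map `size` bytes of it into the process."""
    fd = os.open(path, udmabuf_open_flags(use_osync))
    try:
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # Python's mmap keeps its own duplicate of the descriptor,
        # so the mapping stays valid after this close.
        os.close(fd)
```

&lt;p&gt;On the target one would then call something like &lt;code&gt;map_udmabuf("/dev/udmabuf0", 33554432, use_osync=True)&lt;/code&gt;, where the path is a placeholder and the size matches the 32 MiB buffer from the appendix.&lt;/p&gt;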
&lt;h3&gt;HPC port&lt;/h3&gt;
&lt;h4&gt;Addressing&lt;/h4&gt;
&lt;p&gt;The HPC port uses physical addressing by default, although the SMMU can be configured
so that the port uses virtual addressing, as mentioned in UG1085:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"Comparably, S_AXI_HPCx_FPD uses a virtual address [...]"&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;FSBL and regs.init&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/tree/rel-v2021.1/recipes-bsp/fsbl/files"&gt;A couple of
patches&lt;/a&gt;
for the FSBL set up the CCI so that the
transactions on the HPC port are treated as shareable.&lt;/p&gt;
&lt;h3&gt;ACP port&lt;/h3&gt;
&lt;p&gt;The ACP interface is described in detail in the &lt;a href="https://developer.arm.com/documentation/ddi0500/e/level-2-memory-system/acp"&gt;ARM Cortex-A53 MPCore Processor
Technical Reference
Manual&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Since this interface provides direct access to the L2 cache, the
transfers need to follow certain restrictions: to achieve the best performance
the transfers should be 64 bytes long (one cache line), and the &lt;code&gt;AxCACHE&lt;/code&gt; and
&lt;code&gt;AxPROT&lt;/code&gt; signals need to be set to certain values. The relevant documentation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Transaction-Attributes/Memory-types"&gt;Memory types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Transaction-Attributes/Access-permissions"&gt;Access permissions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.arm.com/documentation/ddi0500/e/level-2-memory-system/acp/acp-user-signals"&gt;ACP user signals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;regs.init&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/blob/rel-v2021.1/recipes-bsp/bootbin/files/regs.init"&gt;regs.init&lt;/a&gt; is used to write to the APU Configuration Register (&lt;code&gt;LPD_SLCR&lt;/code&gt;), which
enables the broadcasting of the transactions towards the CCI, as described on
&lt;a href="https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency#ZynqUltraScaleMPSoCCacheCoherency-5.2.2RegisterWriteAtEarlyBoot"&gt;Xilinx Wiki on Cache
Coherence&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Measurements&lt;/h1&gt;
&lt;h2&gt;Throughput/interface utilization&lt;/h2&gt;
&lt;p&gt;The throughput was measured with the AXI Traffic Generator, described in one of the
previous chapters. What we measure here is not strictly throughput but
interface utilization, i.e. the percentage of clock cycles in which a
beat was transferred on the port. One can easily derive the throughput (in B/s) from
the interface parameters (16 bytes wide, running at 250 MHz). During the tests,
only one interface was active at a time.&lt;/p&gt;
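&lt;p&gt;The conversion between cycle counts, utilization, and throughput can be sketched as follows. The function names are made up for this example. The sample numbers (32 bursts, 670 read cycles, 265 write cycles) are taken from the HP port run in the appendix, under the assumption that this run also used the 250 MHz clock.&lt;/p&gt;

```python
BYTES_PER_BEAT = 16        # 128-bit interface
CLOCK_HZ = 250_000_000     # PL clock used for the measurements
BEATS_PER_BURST = 4        # 64-byte bursts


def utilization(bursts: int, cycles: int) -> float:
    """Fraction of clock cycles in which a data beat was transferred."""
    return bursts * BEATS_PER_BURST / cycles


def throughput_bytes_per_s(util: float) -> float:
    """Throughput implied by a given interface utilization."""
    return util * BYTES_PER_BEAT * CLOCK_HZ


if __name__ == "__main__":
    # 32 bursts on the HP port, cycle counts from the appendix run
    for name, cycles in (("read", 670), ("write", 265)):
        u = utilization(32, cycles)
        mb_per_s = throughput_bytes_per_s(u) / 1e6
        print(f"{name}: {u:.1%} utilization, {mb_per_s:.0f} MB/s")
```

&lt;p&gt;At 100% utilization the interface would move 16 bytes every cycle, i.e. 4 GB/s at 250 MHz.&lt;/p&gt;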
&lt;ul&gt;
&lt;li&gt;The entire measurement procedure is documented in a Jupyter notebook: &lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/blob/rel-v2021.1/recipes-app/apps-pl-ps-interfaces/files/notebooks/00-traffic-gen.ipynb"&gt;00-traffic-gen.ipynb&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The FPGA project for Ultra96-V2 board is available here: &lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-13-axi-traffic-gen/ultra96v2_prj"&gt;example-13-axi-traffic-gen/ultra96v2_prj&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The final results are presented in the graph below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Interface utilization measurement" src="www.j-marjanovic.io/images/2021_zynqmp_ports/meas/throughput_250MHz.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The HP port exhibits the typical behavior of a process where the start-up time
plays a significant role: the interface utilization is lower for smaller
transfers, but approaches an asymptotic limit at a certain size. All ports
use the same settings (4 transactions in flight, 64 bytes per transaction) to
make the comparison between different ports fair; these settings are clearly too
low for the long latencies of DDR4. To achieve higher throughput, one should
increase the burst length.&lt;/p&gt;
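&lt;p&gt;A back-of-the-envelope estimate shows why these settings limit the utilization: with a fixed number of bursts in flight, at most that many bursts can complete per round-trip latency. The sketch below is my own simplification, not a model of the DDR controller. It assumes that the latency stays constant under load and does not grow with the burst length, and it uses the 70-cycle read latency of the single HP port transaction shown in the appendix as a sample value.&lt;/p&gt;

```python
def estimated_utilization(in_flight: int, beats_per_burst: int,
                          latency_cycles: int) -> float:
    """Rough utilization estimate for a latency-bound interface.

    At most `in_flight` bursts complete per round trip, so the number of
    beats per round trip divided by the latency bounds the utilization.
    """
    beats_per_round_trip = in_flight * beats_per_burst
    return min(1.0, beats_per_round_trip / latency_cycles)


if __name__ == "__main__":
    # 4 bursts in flight, 4 beats each, 70-cycle latency (HP port sample)
    print(f"{estimated_utilization(4, 4, 70):.1%}")
    # Longer bursts raise the estimate until the interface saturates
    print(f"{estimated_utilization(4, 8, 70):.1%}")
    print(f"{estimated_utilization(4, 32, 70):.1%}")
```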
&lt;p&gt;The CCI adds latency to the HPC port, and the throughput on this port is
even lower. Interestingly, once the transfer size is larger than the L2 cache,
the interface utilization starts to increase. This would indicate that the
cache switches to a write-through "mode" once it has seen a certain access
pattern; the CCI-400 TRM is quite vague in this regard.&lt;/p&gt;
&lt;p&gt;The ACP benefits from the L1 and L2 caches and the interface utilization is the
highest of all ports for small transfers (below 1 MB). Once the transfer size
gets larger than the L2 cache size the interface utilization is severely
affected.&lt;/p&gt;
&lt;h2&gt;Latency&lt;/h2&gt;
&lt;p&gt;The latency was measured with AXI Proxies. A single measurement sample was
obtained with the following procedure:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;the SW generates a random value and instructs AXI Proxy to write this random
   value into a shared buffer through one of the ports&lt;/li&gt;
&lt;li&gt;the SW then reads the data from the buffer and compares it to what AXI
   Proxy has written&lt;/li&gt;
&lt;li&gt;the SW then generates another random value and writes it directly to the
   buffer&lt;/li&gt;
&lt;li&gt;the SW then instructs the AXI Proxy to read the data and then compares the
   value to the previously generated value&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this procedure we can check that the writes and reads are visible in both
directions; the procedure also mimics a typical usage of an FPGA
accelerator. During the tests, only one interface was active at a time.&lt;/p&gt;
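&lt;p&gt;The four steps can be sketched in a few lines of Python. This is not the code from the notebook: the real AXI Proxy is replaced here by a hypothetical &lt;code&gt;FakeAxiProxy&lt;/code&gt; class that simply accesses the same &lt;code&gt;bytearray&lt;/code&gt; as the software, so the example runs anywhere and only shows the order of operations. The cycle counts reported by the real IP are not modelled.&lt;/p&gt;

```python
import random


class FakeAxiProxy:
    """Software stand-in for the IP: accesses the shared buffer directly."""

    def __init__(self, shared: bytearray):
        self.shared = shared

    def write(self, offset: int, data: bytes) -> None:
        self.shared[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.shared[offset:offset + length])


def random_block(length: int) -> bytes:
    """Random payload of `length` bytes."""
    return bytes(random.getrandbits(8) for _ in range(length))


def measure_once(proxy, buf: bytearray, offset: int = 0, length: int = 64):
    """Run steps 1-4 once; return (HW-to-SW visible, SW-to-HW visible)."""
    hw_data = random_block(length)
    proxy.write(offset, hw_data)                               # step 1
    hw_to_sw = bytes(buf[offset:offset + length]) == hw_data   # step 2
    sw_data = random_block(length)
    buf[offset:offset + length] = sw_data                      # step 3
    sw_to_hw = proxy.read(offset, length) == sw_data           # step 4
    return hw_to_sw, sw_to_hw


if __name__ == "__main__":
    shared = bytearray(4096)
    print(measure_once(FakeAxiProxy(shared), shared))
```

&lt;p&gt;With the fake proxy both checks trivially pass; on real hardware a failing check would point to a coherency problem on the port under test.&lt;/p&gt;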
&lt;ul&gt;
&lt;li&gt;The entire measurement procedure is documented in a Jupyter notebook: &lt;a href="https://github.com/j-marjanovic/meta-zynqmp-pl-ps-interfaces/blob/rel-v2021.1/recipes-app/apps-pl-ps-interfaces/files/notebooks/02-axi-proxy-on-repeat.ipynb"&gt;02-axi-proxy-on-repeat.ipynb&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The FPGA project for Ultra96-V2 board is available here: &lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-12-axi-proxy/ultra96v2_prj"&gt;example-12-axi-proxy/ultra96v2_prj&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The results of the latency measurements are presented in the graph below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Latency measurement" src="www.j-marjanovic.io/images/2021_zynqmp_ports/meas/latency_250MHz.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;On the cached interfaces (ACP and HPC) we see, as expected, very uniform latency
values. On the HP interface, on the other hand, we see that the latency to the
DDR4 is quite high and also quite variable, presumably because the transaction
needs to be scheduled after some other transactions or because a memory refresh is
in progress.&lt;/p&gt;
&lt;p&gt;There is an interesting start-up behavior on the ACP and HPC interfaces. I am
assuming that the L2 cache is an &lt;em&gt;inclusive cache&lt;/em&gt; and contains all lines from
the L1 cache. When a write request arrives at the L2 cache and the line is in
&lt;em&gt;Invalid&lt;/em&gt; state, the L2 can immediately decide that it can acknowledge the write
transaction. When the write request arrives at the L1 cache, however, the L1
cache needs to first inform the L2 cache to invalidate the line, and only then
proceed by acknowledging the write transaction.&lt;/p&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In this blog post I have explored three different types of interfaces between PL
and PS in Zynq UltraScale+ MPSoC. Different types of interfaces provide
different trade-offs in terms of coupling between SW/HW, ease of use,
throughput, and latency. Cache-aware interfaces can be used to access the data
structures in running software and can therefore be used to seamlessly
accelerate certain parts of the program.&lt;/p&gt;
&lt;p&gt;Some edge cases were not explored (e.g. increasing the burst length for the
interface utilization measurement), but I decided to skip those to make the blog
post short(er). With Zynq US+ MPSoC boards being ubiquitous, this task is "left
as an exercise to the reader".&lt;/p&gt;
&lt;p&gt;I hope that this blog post, alongside the Vivado projects and the Yocto layer,
can be used as a reference on how to use these ports.&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Appendix&lt;/h1&gt;
&lt;h2&gt;System ILA waveforms&lt;/h2&gt;
&lt;h3&gt;AXI Proxy&lt;/h3&gt;
&lt;h4&gt;HP port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-proxy --interface hp --use-osync
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffff9c0c7000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x101002&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =             AxiProxy, addr = 0xa0020000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =             AxiProxy, addr = 0xa0030000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  4, name =             AxiProxy, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiProxy info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8122081&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.3.1&lt;/span&gt;
&lt;span class="go"&gt;SW read, HW written:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;readback, expected:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;stats: dur_wr = 36, dur_rd = 70&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Proxy read and write - HP port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/proxy_hp.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h4&gt;HPC port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-proxy --interface hpc --axi-cache &lt;span class="m"&gt;15&lt;/span&gt; --axi-prot &lt;span class="m"&gt;2&lt;/span&gt; --axi-user &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffff931d8000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x2&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =             AxiProxy, addr = 0xa0020000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =             AxiProxy, addr = 0xa0030000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  4, name =             AxiProxy, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiProxy info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8122081&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.3.1&lt;/span&gt;
&lt;span class="go"&gt;SW read, HW written:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;readback, expected:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;stats: dur_wr = 46, dur_rd = 40&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Proxy read and write - HPC port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/proxy_hpc.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h4&gt;ACP port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-proxy --interface acp --axi-cache &lt;span class="m"&gt;15&lt;/span&gt; --axi-prot &lt;span class="m"&gt;2&lt;/span&gt; --axi-user &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffffa1356000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x2&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =             AxiProxy, addr = 0xa0020000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =             AxiProxy, addr = 0xa0030000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  4, name =             AxiProxy, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiProxy info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8122081&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.3.1&lt;/span&gt;
&lt;span class="go"&gt;SW read, HW written:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;readback, expected:&lt;/span&gt;
&lt;span class="go"&gt;  10, 10&lt;/span&gt;
&lt;span class="go"&gt;  20, 20&lt;/span&gt;
&lt;span class="go"&gt;  30, 30&lt;/span&gt;
&lt;span class="go"&gt;  40, 40&lt;/span&gt;
&lt;span class="go"&gt;stats: dur_wr = 25, dur_rd = 14&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Proxy read and write - ACP port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/proxy_acp.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h3&gt;AXI Traffic Generator&lt;/h3&gt;
&lt;h4&gt;HP port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-traffic-gen --count &lt;span class="m"&gt;32&lt;/span&gt; --interface hp --use-osync
&lt;span class="go"&gt;UioDevice{number =  4, name =        AxiTrafficGen, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =        AxiTrafficGen, addr = 0xa0050000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =        AxiTrafficGen, addr = 0xa0060000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiTrafficGen info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8172a9e&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.9.7&lt;/span&gt;
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffff954fe000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x101002&lt;/span&gt;
&lt;span class="go"&gt;Transfering 32 bursts&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;stats: rd cyc = 670, wr cyc = 265, rd_ok = 128&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Traffic Generator write and read - HP port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/tg_hp.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h4&gt;HPC port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-traffic-gen --count &lt;span class="m"&gt;32&lt;/span&gt; --interface hpc --axi-cache &lt;span class="m"&gt;15&lt;/span&gt; --axi-prot &lt;span class="m"&gt;2&lt;/span&gt; --axi-user &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  4, name =        AxiTrafficGen, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =        AxiTrafficGen, addr = 0xa0050000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =        AxiTrafficGen, addr = 0xa0060000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiTrafficGen info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8172a9e&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.9.7&lt;/span&gt;
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffffa781e000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x2&lt;/span&gt;
&lt;span class="go"&gt;Transfering 32 bursts&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;stats: rd cyc = 391, wr cyc = 756, rd_ok = 128&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Traffic Generator write - HPC port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/tg_hpc_write.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="AXI Traffic Generator reads - HPC port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/tg_hpc_read.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h4&gt;ACP port&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; app-axi-traffic-gen --count &lt;span class="m"&gt;32&lt;/span&gt; --interface acp --axi-cache &lt;span class="m"&gt;15&lt;/span&gt; --axi-prot &lt;span class="m"&gt;2&lt;/span&gt; --axi-user &lt;span class="m"&gt;1&lt;/span&gt;     
&lt;span class="go"&gt;UioDevice{number =  4, name =        AxiTrafficGen, addr = 0xa0040000, size = 65536, note = acp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  5, name =        AxiTrafficGen, addr = 0xa0050000, size = 65536, note = hp}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  6, name =        AxiTrafficGen, addr = 0xa0060000, size = 65536, note = hpc}&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  1, name =             axi-pmon, addr = 0xfd0b0000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  2, name =             axi-pmon, addr = 0xfd490000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  0, name =             axi-pmon, addr = 0xffa00000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;UioDevice{number =  3, name =             axi-pmon, addr = 0xffa10000, size = 65536, note = }&lt;/span&gt;
&lt;span class="go"&gt;AxiTrafficGen info:&lt;/span&gt;
&lt;span class="go"&gt;  id reg = 0xa8172a9e&lt;/span&gt;
&lt;span class="go"&gt;  version = 0.9.7&lt;/span&gt;
&lt;span class="go"&gt;Udmabuf&lt;/span&gt;
&lt;span class="go"&gt;  name = axi:udmabuf@0x0&lt;/span&gt;
&lt;span class="go"&gt;  virt addr = 0xffff9469a000&lt;/span&gt;
&lt;span class="go"&gt;  phys addr = 0x5e100000&lt;/span&gt;
&lt;span class="go"&gt;  size = 33554432&lt;/span&gt;
&lt;span class="go"&gt;  flags = 0x2&lt;/span&gt;
&lt;span class="go"&gt;Transfering 32 bursts&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;Memory check (size = 32 bursts) successfully completed&lt;/span&gt;
&lt;span class="go"&gt;stats: rd cyc = 140, wr cyc = 248, rd_ok = 128&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="AXI Traffic Generator reads - ACP port" src="www.j-marjanovic.io/images/2021_zynqmp_ports/system_ila/tg_acp.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
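&lt;p&gt;The cycle counts from the three runs above are easier to compare side by side. The following is a minimal sketch that only reuses the numbers printed in the logs (32 bursts per run). The dictionary layout and the function name are illustrative. No clock frequency or burst length is assumed, so the result is in cycles per burst and not in MB/s.&lt;/p&gt;

```python
# Cycle counts copied from the "stats:" lines of the three runs above.
STATS = {
    "hp": {"rd": 670, "wr": 265},
    "hpc": {"rd": 391, "wr": 756},
    "acp": {"rd": 140, "wr": 248},
}
BURSTS = 32  # value passed with --count


def cycles_per_burst(stats, bursts):
    """Return {port: {direction: cycles per burst}}."""
    return {
        port: {direction: cyc / bursts for direction, cyc in counts.items()}
        for port, counts in stats.items()
    }


if __name__ == "__main__":
    for port, result in cycles_per_burst(STATS, BURSTS).items():
        print(f"{port:4s} rd = {result['rd']:6.2f}  wr = {result['wr']:6.2f}")
```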
&lt;h2&gt;FSBL output&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="go"&gt;Xilinx Zynq MP First Stage Boot Loader &lt;/span&gt;
&lt;span class="go"&gt;Release 2021.1   Jun  6 2021  -  07:07:32&lt;/span&gt;
&lt;span class="go"&gt;MultiBootOffset: 0x0&lt;/span&gt;
&lt;span class="go"&gt;Reset Mode      :       System Reset&lt;/span&gt;
&lt;span class="go"&gt;Platform: Silicon (4.0), Running on A53-0 (64-bit) Processor, Device Name: XCZU3EG&lt;/span&gt;
&lt;span class="go"&gt;SD0 Boot Mode &lt;/span&gt;
&lt;span class="go"&gt;PMU Firmware 2021.1     Jun  6 2021   07:07:32&lt;/span&gt;
&lt;span class="go"&gt;PMU_ROM Version: xpbr-v8.1.0-0&lt;/span&gt;
&lt;span class="go"&gt;Protection configuration applied&lt;/span&gt;
&lt;span class="go"&gt;EL = 3&lt;/span&gt;
&lt;span class="go"&gt;CCI_REG: register dump&lt;/span&gt;
&lt;span class="go"&gt;  offset 0 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 10 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 14 = 8000003F&lt;/span&gt;
&lt;span class="go"&gt;  offset 18 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 1C = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 40 = 0&lt;/span&gt;
&lt;span class="go"&gt;CCI_REG: debug enable&lt;/span&gt;
&lt;span class="go"&gt;CCI_REG: register dump&lt;/span&gt;
&lt;span class="go"&gt;  offset 0 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 10 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 14 = 8000003F&lt;/span&gt;
&lt;span class="go"&gt;  offset 18 = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 1C = 0&lt;/span&gt;
&lt;span class="go"&gt;  offset 40 = 3&lt;/span&gt;
&lt;span class="go"&gt;CCI: enable snoop, ctrl before = C0000000&lt;/span&gt;
&lt;span class="go"&gt;CCI: enable snoop, ctrl after = C0000001&lt;/span&gt;
&lt;span class="go"&gt;CCI: shareable override reg - before = 0&lt;/span&gt;
&lt;span class="go"&gt;CCI: shareable override reg - after = 3&lt;/span&gt;
&lt;span class="go"&gt;Exit from FSBL &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
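&lt;p&gt;To see which bits the modified FSBL actually touched, the before and after values from the log can be compared bit by bit. This is a small sketch. The helper name is illustrative and nothing is assumed about the meaning of the individual bits beyond what the log messages say.&lt;/p&gt;

```python
def changed_bits(before, after, width=32):
    """Return the positions of the bits that differ between two register values."""
    diff = before ^ after
    return [bit for bit in range(width) if (diff // 2**bit) % 2 == 1]


if __name__ == "__main__":
    # Values copied from the FSBL log above.
    print("snoop control     :", changed_bits(0xC0000000, 0xC0000001))
    print("shareable override:", changed_bits(0x0, 0x3))
```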


&lt;hr&gt;
&lt;div style="font-size: 90%;" &gt;
Xilinx, Inc. Xilinx, the Xilinx logo, Vivado, Zynq are trademarks of Xilinx in the United States and
other countries.
&lt;/div&gt;

&lt;div style="font-size: 90%;" &gt;
AMBA, ARM, Cortex and TrustZone are registered trademarks of ARM Limited (or its
subsidiaries) in the EU and/or elsewhere. CoreLink is  a trademark of ARM
Limited (or its subsidiaries) in the EU and/or elsewhere.
&lt;/div&gt;

&lt;div style="font-size: 90%;" &gt;
All trademarks and registered trademarks are the property of their respective owners.
&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA, Zynq, Cache-Coherence"></category></entry><entry><title>Notes on "A Primer on Memory Consistency and Cache Coherence"</title><link href="www.j-marjanovic.io/notes-on-a-primer-on-memory-consistency-and-cache-coherence.html" rel="alternate"></link><published>2021-12-23T16:00:00+01:00</published><updated>2021-12-23T16:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-12-23:www.j-marjanovic.io/notes-on-a-primer-on-memory-consistency-and-cache-coherence.html</id><summary type="html">&lt;p&gt;&lt;a href="https://www.morganclaypool.com/doi/abs/10.2200/S00962ED2V01Y201910CAC049"&gt;A Primer on Memory Consistency and Cache Coherence, Second Edition&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Authors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vijay Nagarajan, University of Edinburgh&lt;/li&gt;
&lt;li&gt;Daniel J. Sorin, Duke University&lt;/li&gt;
&lt;li&gt;Mark D. Hill, University of Wisconsin, Madison&lt;/li&gt;
&lt;li&gt;David A. Wood, University of Wisconsin, Madison&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;"This primer is intended for readers who have encountered memory consistency
and cache coherence informally …&lt;/p&gt;&lt;/blockquote&gt;</summary><content type="html">&lt;p&gt;&lt;a href="https://www.morganclaypool.com/doi/abs/10.2200/S00962ED2V01Y201910CAC049"&gt;A Primer on Memory Consistency and Cache Coherence, Second Edition&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Authors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vijay Nagarajan, University of Edinburgh&lt;/li&gt;
&lt;li&gt;Daniel J. Sorin, Duke University&lt;/li&gt;
&lt;li&gt;Mark D. Hill, University of Wisconsin, Madison&lt;/li&gt;
&lt;li&gt;David A. Wood, University of Wisconsin, Madison&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;"This primer is intended for readers who have encountered memory consistency
and cache coherence informally, but now want to understand what they entail
in more detail."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;[this book talks to me]&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 1&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;memory consistency&lt;/em&gt; - effects of stores and loads on the memory, correct behavior&lt;/li&gt;
&lt;li&gt;&lt;em&gt;cache coherence&lt;/em&gt; - HW implementation to ensure correct behavior when caches are involved&lt;/li&gt;
&lt;li&gt;&lt;em&gt;memory (consistency) model&lt;/em&gt; - the specification about allowed behavior of MT programs&lt;/li&gt;
&lt;li&gt;&lt;em&gt;snooping&lt;/em&gt; - cache controller performs a broadcast&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 2&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;memory-side cache&lt;/em&gt; (last-level cache) - not a concern with regard to coherence, it just reduces the latency&lt;/li&gt;
&lt;li&gt;"Informally, a coherence protocol must ensure that writes are made visible to all processors."&lt;/li&gt;
&lt;li&gt;&lt;em&gt;consistency-agnostic&lt;/em&gt; coherence (atomic writes) vs &lt;em&gt;consistency-directed&lt;/em&gt; coherence ([posted] writes, ordering rules)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;single-writer-multiple-reader (SWMR) invariant&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;data-value invariant&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;other definitions of invariants&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 3&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;reordering, write buffer&lt;/li&gt;
&lt;li&gt;&lt;em&gt;MC (memory consistency) model&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;core pipeline can have an impact on the consistency&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Sequential consistency&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lamport.azurewebsites.net/pubs/lamport-how-to-make.pdf"&gt;L. Lamport: How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;program order&lt;/em&gt;, &lt;em&gt;memory order&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;op1 &amp;lt;m op2&lt;/code&gt;, &lt;code&gt;op1 &amp;lt;p op2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;for SC&lt;/strong&gt;: &lt;code&gt;op1 &amp;lt;p op2 ==&amp;gt; op1 &amp;lt;m op2&lt;/code&gt; (&lt;code&gt;==&amp;gt;&lt;/code&gt; = implies)&lt;/li&gt;
&lt;li&gt;requirements: LL, LS, SS, SL orderings are preserved, loads get the value of the last store, atomic RMW&lt;/li&gt;
&lt;li&gt;naive implementation: &lt;strong&gt;the switch&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;speculative loads are just discarded, speculative stores only present the address and not the data --&amp;gt;
  cache can check if (or inform) other caches have the datum&lt;/li&gt;
&lt;li&gt;multi-threading: each thread must have an independent write buffer (so that it behaves like a separate processor);
  normal write buffers (with a queue) are not possible in SC anyway&lt;/li&gt;
&lt;li&gt;[ &lt;a href="https://www.youtube.com/watch?v=EYmEaF4qJ9I"&gt;Mod-01 Lec-32 Case study: MIPS R10000&lt;/a&gt; ]&lt;/li&gt;
&lt;/ul&gt;
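&lt;p&gt;The SC rule above (program order implies memory order) can be tried out on the classic two-thread example where each thread stores to one variable and then loads the other. The sketch below enumerates every interleaving that respects program order. The thread programs and helper names are illustrative and not taken from the book.&lt;/p&gt;

```python
from itertools import combinations

# Two threads, each a list of (operation, variable) in program order.
T0 = [("store", "x"), ("load", "y")]
T1 = [("store", "y"), ("load", "x")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [
            (0, next(it_a)) if i in slots else (1, next(it_b))
            for i in range(total)
        ]


def run_sc(order):
    """Execute one total memory order; a load sees the latest store."""
    memory = {"x": 0, "y": 0}
    loaded = {}
    for thread, (op, var) in order:
        if op == "store":
            memory[var] = 1
        else:
            loaded[thread] = memory[var]
    return loaded[0], loaded[1]


def sc_outcomes():
    """All (value loaded by T0, value loaded by T1) pairs allowed by SC."""
    return {run_sc(order) for order in interleavings(T0, T1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

&lt;p&gt;The outcome where both loads return 0 never appears, because it would require each load to precede the other thread's store in memory order while following its own store in program order.&lt;/p&gt;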
&lt;h1&gt;Chapter 4&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;total store order (TSO)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;"[TSO] model astonishes some people..."&lt;/li&gt;
&lt;li&gt;TSO behaves the same as SC for programs that: 1. store the data first, 2. then store the &lt;code&gt;done&lt;/code&gt; flag&lt;/li&gt;
&lt;li&gt;write buffer, FIFO&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FENCE&lt;/code&gt; instr&lt;/li&gt;
&lt;/ul&gt;
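&lt;p&gt;The effect of the FIFO write buffer and of &lt;code&gt;FENCE&lt;/code&gt; can be illustrated with a toy model: a store first enters the thread's write buffer and reaches memory only when the buffer drains, and a load checks its own buffer before reading memory. This is a simplified sketch with one store and one load per thread. The event encoding and function names are assumptions made for the illustration.&lt;/p&gt;

```python
from itertools import combinations


def merges(a, b):
    """Yield every merge of two event lists that keeps their internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(total)]


def thread_variants(tid, store_var, load_var, fence):
    """Event orders of one thread: the buffered store may drain before or
    after the following load, unless a FENCE sits between the two."""
    store = (tid, "store", store_var)
    load = (tid, "load", load_var)
    drain = (tid, "drain", store_var)
    variants = [[store, drain, load]]
    if not fence:
        variants.append([store, load, drain])
    return variants


def run(events):
    memory = {"x": 0, "y": 0}
    buffers = {0: [], 1: []}
    loaded = {}
    for tid, op, var in events:
        if op == "store":
            buffers[tid].append(var)         # enters the FIFO write buffer
        elif op == "drain":
            memory[buffers[tid].pop(0)] = 1  # oldest store reaches memory
        else:
            # bypass from the own write buffer, otherwise read memory
            loaded[tid] = 1 if var in buffers[tid] else memory[var]
    return loaded[0], loaded[1]


def tso_outcomes(fence=False):
    outcomes = set()
    for ev0 in thread_variants(0, "x", "y", fence):
        for ev1 in thread_variants(1, "y", "x", fence):
            for order in merges(ev0, ev1):
                outcomes.add(run(order))
    return outcomes


if __name__ == "__main__":
    print("without FENCE:", sorted(tso_outcomes()))
    print("with FENCE   :", sorted(tso_outcomes(fence=True)))
```

&lt;p&gt;Without the fence both loads can return 0, which SC forbids. With a fence between the store and the load in both threads the model is back to the SC outcomes.&lt;/p&gt;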
&lt;h1&gt;Chapter 5&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;relaxed (weak) memory consistency&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;most applications do not require strong consistency&lt;/li&gt;
&lt;li&gt;accesses can be re-ordered, &lt;em&gt;Coalescing Write Buffer&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Example Relaxed Consistency Model (XC)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;FENCE&lt;/code&gt; instruction&lt;/li&gt;
&lt;li&gt;TSO for accessing the same address (LL, LS, SS)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;data race&lt;/em&gt; - two or more threads accessing the same memory location, at least one access is a write&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Release Consistency&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ACQUIRE&lt;/code&gt;, &lt;code&gt;RELEASE&lt;/code&gt; instead of &lt;code&gt;FENCE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;RVWMO&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;dependency-induced ordering&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;same address ordering&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Atomic Memory Operation (AMO)&lt;/em&gt; and &lt;em&gt;Load Reserve/Store Conditional&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;IBM POWER&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;"On a first pass of this primer, readers may wish to skim or skip this
section; this memory model is significantly more complicated than the models
presented thus far in this primer."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;ARMv7, ARMv8&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ARMv7 similar to POWER&lt;/li&gt;
&lt;li&gt;ARMv8 provides a total memory order, has &lt;code&gt;ACQUIRE&lt;/code&gt; and &lt;code&gt;RELEASE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;High-level Language Models&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;SC for DRF&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 6&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;two invariants: SWMR, data-value&lt;/li&gt;
&lt;li&gt;&lt;em&gt;coherence controller&lt;/em&gt;: an FSM, one per cache (&lt;em&gt;cache controller&lt;/em&gt;) or memory (&lt;em&gt;memory controller&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;coherence protocol&lt;/em&gt;: communication between the FSMs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;"Other agents, such as I/O devices, may behave like cache controllers, memory
controllers, or both depending upon their specific requirements."&lt;/p&gt;
&lt;h2&gt;States&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;four characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;validity&lt;/li&gt;
&lt;li&gt;dirtiness&lt;/li&gt;
&lt;li&gt;exclusivity&lt;/li&gt;
&lt;li&gt;ownership&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;stable states:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;M&lt;/strong&gt;odified: valid and potentially dirty&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;O&lt;/strong&gt;wned: valid and owned, possibly stale, read-only&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;E&lt;/strong&gt;xclusive: no other cache has a copy, read-only&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;S&lt;/strong&gt;hared: valid but not exclusive and not dirty, read-only&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I&lt;/strong&gt;nvalid: not present in the cache or stale&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;transient states:
  &lt;div class="math"&gt;$$ XY^Z $$&lt;/div&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;snooping&lt;/em&gt; vs &lt;em&gt;directory&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;invalidate&lt;/em&gt; vs &lt;em&gt;update&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 7&lt;/h1&gt;
&lt;p&gt;"all coherence controllers observe (snoop) coherence requests in the same order
and collectively "do the right thing" to maintain coherence."&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;transactions need to be ordered&lt;/li&gt;
&lt;li&gt;&lt;em&gt;serialization (ordering) point&lt;/em&gt;, e.g. bus arbitration&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;baseline: MSI&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;atomic requests and atomic transactions&lt;/li&gt;
&lt;li&gt;processor loads go from Invalid to Shared&lt;/li&gt;
&lt;li&gt;processor stores go from Invalid to Modified&lt;/li&gt;
&lt;li&gt;both loads and stores wait for the data before transitioning to the steady state&lt;/li&gt;
&lt;li&gt;if Modified, GetS and GetM from other cores transition to Shared and Invalid, respectively&lt;/li&gt;
&lt;/ul&gt;
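&lt;p&gt;The stable-state transitions listed above fit into a small lookup table. The sketch below models only one cache controller with atomic requests and atomic transactions, so no transient states appear. The event names (&lt;code&gt;other_GetS&lt;/code&gt;, &lt;code&gt;other_GetM&lt;/code&gt;) and the table layout are illustrative, not the notation used in the book.&lt;/p&gt;

```python
# (current state, event): (next state, request issued on the bus, if any)
MSI = {
    ("I", "load"): ("S", "GetS"),
    ("I", "store"): ("M", "GetM"),
    ("I", "other_GetS"): ("I", None),
    ("I", "other_GetM"): ("I", None),
    ("S", "load"): ("S", None),
    ("S", "store"): ("M", "GetM"),
    ("S", "other_GetS"): ("S", None),
    ("S", "other_GetM"): ("I", None),
    ("M", "load"): ("M", None),
    ("M", "store"): ("M", None),
    ("M", "other_GetS"): ("S", None),  # the owner also supplies the data
    ("M", "other_GetM"): ("I", None),  # the owner also supplies the data
}


def step(state, event):
    """Return the next stable state of the block in this cache."""
    return MSI[(state, event)][0]


def replay(events, state="I"):
    """Apply a sequence of events, starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(replay(["load", "store", "other_GetS", "other_GetM"]))
```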
&lt;h2&gt;Non-atomic requests, atomic transactions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;queue or buffer between the cache controller and the bus (or interconnect)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Exclusive state&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;handles case when the block is first read and then written -&amp;gt; no need to inform other cores&lt;/li&gt;
&lt;li&gt;GetS brings the block state to S or E (depends on states of other controllers)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Owned state&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;when a block is in state M or E and receives GetS&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Non-atomic bus&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;in-order&lt;/em&gt; vs &lt;em&gt;out-of-order&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;FIFO queues between the controller and request and response bus&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Interconnect network&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;tree topology&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Timestamp Snooping&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 8&lt;/h1&gt;
&lt;p&gt;"Directory protocols were originally developed to address the lack of
scalability of snooping protocols."&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;directory&lt;/em&gt; with a global view&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;transactions typically involve 2 or 3 steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a request by the controller and a reply by the directory&lt;/li&gt;
&lt;li&gt;a request by the controller, requests to other controllers and replies from other controllers&lt;/li&gt;
&lt;li&gt;4th step is also possible&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;directory&lt;/em&gt; as the serialization point&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;for some requests (e.g. GetM) all caches with a block in state S must ACK&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;MSI&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;an entry (per each block) consists of the state bits, owner id (if in state M)
  and sharer list (bit vector, if in state S)&lt;/li&gt;
&lt;li&gt;I or S to M: directory returns to the requestor the number of controllers which must ACK&lt;/li&gt;
&lt;li&gt;PutM: contains data (from the controller to the directory)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;virtual networks&lt;/em&gt; (message class) to avoid deadlocks (requests blocking responses)&lt;/li&gt;
&lt;/ul&gt;
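&lt;p&gt;A directory entry as described above (state bits, owner id, sharer bit vector) and the ACK count returned on a GetM can be sketched as follows. Only stable states are modelled, and the class and function names are made up for this illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    state: str = "I"   # stable directory state: "I", "S" or "M"
    owner: int = -1    # meaningful only in state M
    sharers: int = 0   # bit vector, bit n set means cache n holds the block in S


def get_s(entry, requester):
    """Handle a GetS: the requester becomes a sharer."""
    if entry.state == "M":
        # the previous owner is downgraded and stays on as a sharer
        entry.sharers |= 2**entry.owner
        entry.owner = -1
    entry.sharers |= 2**requester
    entry.state = "S"


def get_m(entry, requester):
    """Handle a GetM and return the number of ACKs the requester must collect."""
    bit = 2**requester
    others = (entry.sharers | bit) ^ bit  # sharer vector without the requester
    acks = bin(others).count("1")
    entry.state, entry.owner, entry.sharers = "M", requester, 0
    return acks


if __name__ == "__main__":
    entry = DirEntry()
    for cache in (0, 1, 2):
        get_s(entry, cache)
    print("ACKs expected by cache 1:", get_m(entry, 1))
    print(entry)
```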
&lt;h2&gt;Exclusive state&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;GetS and not shared by other controllers --&amp;gt; E&lt;/li&gt;
&lt;li&gt;a core can silently (no coherence request required) upgrade E to M&lt;/li&gt;
&lt;li&gt;E block: owned vs not-owned&lt;/li&gt;
&lt;li&gt;PutE (directory from E to I)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Owned state&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[how did it get dirty if it is read-only?]&lt;/li&gt;
&lt;li&gt;Owned is basically a frozen Modified state&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Directory state&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;full representation only for smaller number of cores&lt;/li&gt;
&lt;li&gt;&lt;em&gt;coarse directory&lt;/em&gt;: multiple controllers are grouped under the same bit&lt;/li&gt;
&lt;li&gt;&lt;em&gt;limited pointer directory&lt;/em&gt;: with more than &lt;em&gt;i&lt;/em&gt; sharers, either broadcast, invalidate one of the sharers, or trap to a SW handler&lt;/li&gt;
&lt;/ul&gt;
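&lt;p&gt;The cost of a coarse directory is that an invalidation has to go to every cache in a marked group, including those that never held the block. A minimal sketch, assuming consecutive cache ids are grouped together (the grouping scheme and the function names are assumptions):&lt;/p&gt;

```python
def coarse_bit(cache_id, group_size):
    """Index of the directory bit that covers the given cache."""
    return cache_id // group_size


def add_sharer(vector, cache_id, group_size):
    """Mark the group of the given cache in the coarse sharer vector."""
    return vector | 2**coarse_bit(cache_id, group_size)


def invalidation_targets(vector, num_caches, group_size):
    """Caches that must receive an invalidation: all members of marked groups."""
    return [
        cache
        for cache in range(num_caches)
        if (vector // 2**coarse_bit(cache, group_size)) % 2 == 1
    ]


if __name__ == "__main__":
    vector = add_sharer(0, cache_id=5, group_size=4)
    print(invalidation_targets(vector, num_caches=8, group_size=4))
```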
&lt;h2&gt;Directory organization&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;one entry per each block of the memory&lt;/li&gt;
&lt;li&gt;&lt;em&gt;directory cache&lt;/em&gt;: accesses have (usually) good locality, smaller datums ([ROMANES EUNT DOMUS])--&amp;gt; even small caches have a high hit rate&lt;/li&gt;
&lt;li&gt;backed by DRAM&lt;/li&gt;
&lt;li&gt;&lt;em&gt;inclusive&lt;/em&gt;: only for blocks cached, a miss indicates the I state&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Null Directory Cache&lt;/em&gt;: [this sounds more like a snooping protocol]&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Optimizations&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;"The typical, general solution to the problem of a centralized bottleneck is
 to distribute the resource."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;distributed directories&lt;/li&gt;
&lt;li&gt;non-stalling directory protocols&lt;/li&gt;
&lt;li&gt;eviction of blocks in S state&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 9&lt;/h1&gt;
&lt;h2&gt;Instruction caches&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;"[...] truly self-modifying code is rare,"&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;core writes to the D-cache, the I-cache only observes writes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;icbi&lt;/code&gt; (instruction cache block invalidate) instruction on POWER&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Virtual caches&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;advantage: no need for address translation (on the critical path)&lt;/li&gt;
&lt;li&gt;require reverse address translation&lt;/li&gt;
&lt;li&gt;&lt;em&gt;synonyms&lt;/em&gt;: multiple virtual addresses mapped to the same physical address&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Write-through caches&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;two-state protocol (VI)&lt;/li&gt;
&lt;li&gt;simpler evictions&lt;/li&gt;
&lt;li&gt;some issues with multi-threading&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Coherent DMA&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;"DMA controllers have very different locality patterns than conventional cores"&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;GetMs are wasteful, the entire block is typically overwritten --&amp;gt; special requests&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Multi-level caches&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;inclusive cache&lt;/em&gt;: e.g. L2 contains a superset of L1&lt;/li&gt;
&lt;li&gt;multiple multi-processors: LLC as a &lt;em&gt;memory-side&lt;/em&gt; or &lt;em&gt;core-side&lt;/em&gt; cache&lt;/li&gt;
&lt;li&gt;hierarchical protocol: &lt;em&gt;intra-chip&lt;/em&gt;, &lt;em&gt;inter-chip&lt;/em&gt; (e.g. intra-chip snooping, inter-chip directory)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Performance optimization&lt;/h2&gt;
&lt;h3&gt;Migratory sharing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;one thread reads and writes, then another thread reads and writes, ...&lt;/li&gt;
&lt;li&gt;E state already helps&lt;/li&gt;
&lt;li&gt;HW to detect GetS and then GetM&lt;/li&gt;
&lt;li&gt;alternatively, add Migratory M state (I -&amp;gt; S -&amp;gt; M)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;False sharing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;multiple cores accessing the same block without actual dependencies; solutions: sub-block coherence, speculation&lt;/li&gt;
&lt;/ul&gt;
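&lt;p&gt;Whether two variables can suffer from false sharing depends only on whether their addresses fall into the same cache block. A tiny sketch, assuming a 64-byte block (the block size and the example addresses are assumptions):&lt;/p&gt;

```python
BLOCK_SIZE = 64  # bytes, assumed for this example


def block_of(addr):
    """Index of the cache block that contains the given byte address."""
    return addr // BLOCK_SIZE


def falsely_shared(addr_a, addr_b):
    """True if two distinct addresses are tracked as one coherence unit."""
    return addr_a != addr_b and block_of(addr_a) == block_of(addr_b)


if __name__ == "__main__":
    print(falsely_shared(0x1000, 0x1008))  # same 64-byte block
    print(falsely_shared(0x1000, 0x1040))  # neighbouring blocks
```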
&lt;h2&gt;Liveness&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;deadlock - cyclical dependencies&lt;/li&gt;
&lt;li&gt;protocol deadlock&lt;/li&gt;
&lt;li&gt;cache resource deadlock&lt;/li&gt;
&lt;li&gt;virtual networks&lt;/li&gt;
&lt;li&gt;livelock - a special case of starvation&lt;/li&gt;
&lt;li&gt;starvation (solved with fair arbitration)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Token Coherence&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;a third method (alongside snooping and directory-based protocols)&lt;/li&gt;
&lt;li&gt;tokens instead of status bits&lt;/li&gt;
&lt;li&gt;cores exchange tokens&lt;/li&gt;
&lt;li&gt;a core with one token can read, a core with all tokens can write to a block&lt;/li&gt;
&lt;/ul&gt;
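&lt;p&gt;The read and write rules of Token Coherence boil down to counting. The sketch below tracks the tokens of a single block. The class, the &lt;code&gt;mem&lt;/code&gt; holder and the choice of how many tokens exist are assumptions for the illustration, and the messages used to request tokens are left out.&lt;/p&gt;

```python
class TokenBlock:
    """Token bookkeeping for one block; all tokens start at the memory."""

    def __init__(self, cores, total_tokens):
        self.total = total_tokens
        self.tokens = {core: 0 for core in cores}
        self.tokens["mem"] = total_tokens

    def move(self, src, dst, count):
        """Transfer tokens; tokens are never created or destroyed."""
        if count not in range(1, self.tokens[src] + 1):
            raise ValueError("source does not hold that many tokens")
        self.tokens[src] -= count
        self.tokens[dst] += count

    def can_read(self, core):
        return self.tokens[core] != 0

    def can_write(self, core):
        return self.tokens[core] == self.total


if __name__ == "__main__":
    block = TokenBlock(["c0", "c1"], total_tokens=2)
    block.move("mem", "c0", 1)
    block.move("mem", "c1", 1)
    print(block.can_read("c0"), block.can_write("c0"))
    block.move("c1", "c0", 1)
    print(block.can_read("c1"), block.can_write("c0"))
```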
&lt;h1&gt;Chapter 10&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;"This is the age of specialization."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;SoCs share one physical memory, whereas e.g. systems with discrete GPUs have two separate physical memories&lt;/li&gt;
&lt;li&gt;&lt;em&gt;cooperative thread array (CTA)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;SIMT - Single-Instruction-Multiple-Thread&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;http://www0.cs.ucl.ac.uk/staff/j.alglave/papers/asplos15.pdf&lt;/li&gt;
&lt;li&gt;GPU: relaxed memory order&lt;/li&gt;
&lt;li&gt;one approach: consistency-agnostic coherence protocol - not suitable for GPUs (large L1 caches, many threads = many transactions)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Temporal coherence&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;lease (limited amount of time), block gets automatically invalidated&lt;/li&gt;
&lt;li&gt;global notion of time&lt;/li&gt;
&lt;li&gt;GetV(t), Write, WriteV(t)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;stalled writes (until the lease expires)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GWCT (Global Write Completion Time) - all FENCEs must stall until this time&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;performance sensitive to the selection of lease time&lt;/li&gt;
&lt;li&gt;timestamp complexity (e.g. rollover)&lt;/li&gt;
&lt;/ul&gt;
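&lt;p&gt;The interplay between leases, stalled writes and the GWCT can be shown with plain integers as the global time. This is a strongly simplified sketch and not the protocol from the book: the argument of &lt;code&gt;get_v&lt;/code&gt; is treated as a requested lease duration, and all names are assumptions.&lt;/p&gt;

```python
class LeasedBlock:
    """Shared-cache view of one block under temporal coherence."""

    def __init__(self):
        self.lease_end = 0  # global time until which some L1 copy may be valid

    def get_v(self, now, lease):
        """Grant a read lease; returns the time until which the copy is valid."""
        self.lease_end = max(self.lease_end, now + lease)
        return self.lease_end

    def write(self, now):
        """Return the time at which the write can be performed.

        The write stalls until every outstanding lease has expired.
        """
        return max(now, self.lease_end)


def gwct(write_times):
    """Global write completion time: a FENCE stalls until this point."""
    return max(write_times, default=0)


if __name__ == "__main__":
    block = LeasedBlock()
    print("lease valid until", block.get_v(now=10, lease=20))
    first = block.write(now=15)   # stalled by the lease
    second = block.write(now=40)  # lease already expired
    print("writes performed at", first, second, "GWCT =", gwct([first, second]))
```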
&lt;h2&gt;Release consistency-directed coherence&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;atomic operations which order memory accesses in one direction (in contrast with FENCE)&lt;/li&gt;
&lt;li&gt;scope (CTA vs GPU)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Heterogeneous systems&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;more devices with different memory consistency models&lt;/li&gt;
&lt;li&gt;intuition: the weaker model of the two&lt;/li&gt;
&lt;li&gt;OpenCL: SC for HRF (Heterogeneous-Race-Free)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Heterogeneous coherence protocols&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;global controller&lt;/li&gt;
&lt;li&gt;local controllers contain shims (or translators)&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;"the global coherence interface must disambiguate CPU sharers from GPU
sharers"&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;coherence tracking at a larger granularity (e.g. page instead of a cache line)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;scratchpads&lt;/em&gt; in GPUs (programmer-controlled memories)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 11&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;specification = contract between the user and the implementation&lt;/li&gt;
&lt;li&gt;&lt;em&gt;input actions&lt;/em&gt;, &lt;em&gt;internal actions&lt;/em&gt;, &lt;em&gt;output actions&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;safety&lt;/em&gt; and &lt;em&gt;liveness&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Operational specification&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;state machine&lt;/li&gt;
&lt;li&gt;liveness ensured with &lt;em&gt;temporal logic&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;linearizability&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mclab.di.uniroma1.it/site/index.php/software/18-cmurphi"&gt;CMurphi&lt;/a&gt; and TLA&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Axiomatic specification&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Alloy, Herd&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Litmus tests&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;https://www.cl.cam.ac.uk/~sf502/regressions/rmem/&lt;/li&gt;
&lt;li&gt;https://github.com/nvlabs/litmustestgen&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Validation&lt;/h2&gt;
&lt;h3&gt;Formal methods&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;manual methods (e.g. assigning timestamps as values)&lt;/li&gt;
&lt;li&gt;model checkers (state space becomes quickly very large)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Testing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Off-line testing&lt;/li&gt;
&lt;li&gt;On-line testing (checker implemented in HW)&lt;/li&gt;
&lt;/ul&gt;
&lt;script type="text/javascript"&gt;if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
    var align = "center",
        indent = "0em",
        linebreak = "false";

    if (false) {
        align = (screen.width &lt; 768) ? "left" : align;
        indent = (screen.width &lt; 768) ? "0em" : indent;
        linebreak = (screen.width &lt; 768) ? 'true' : linebreak;
    }

    var mathjaxscript = document.createElement('script');
    mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
    mathjaxscript.type = 'text/javascript';
    mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';

    var configscript = document.createElement('script');
    configscript.type = 'text/x-mathjax-config';
    configscript[(window.opera ? "innerHTML" : "text")] =
        "MathJax.Hub.Config({" +
        "    config: ['MMLorHTML.js']," +
        "    TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
        "    jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
        "    extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
        "    displayAlign: '"+ align +"'," +
        "    displayIndent: '"+ indent +"'," +
        "    showMathMenu: true," +
        "    messageStyle: 'normal'," +
        "    tex2jax: { " +
        "        inlineMath: [ ['\\\\(','\\\\)'] ], " +
        "        displayMath: [ ['$$','$$'] ]," +
        "        processEscapes: true," +
        "        preview: 'TeX'," +
        "    }, " +
        "    'HTML-CSS': { " +
        "        availableFonts: ['STIX', 'TeX']," +
        "        preferredFont: 'STIX'," +
        "        styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
        "        linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
        "    }, " +
        "}); " +
        "if ('default' !== 'default') {" +
            "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
            "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
        "}";

    (document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
    (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
&lt;/script&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="Books"></category></entry><entry><title>Stratix V accelerator card from eBay, part 8</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-8.html" rel="alternate"></link><published>2021-10-31T11:00:00+01:00</published><updated>2021-10-31T11:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-10-31:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-8.html</id><summary type="html">&lt;p&gt;My &lt;a href="stratix-v-accelerator-card-from-ebay-part-7.html"&gt;last blog&lt;/a&gt; post on this
topic explored the PCI Express connections on this board. I have determined that
the board contains a custom FPGA with two Hard IPs for PCIe, where the
commercially-available part only contains one. At the end of the post I explored
different options on how …&lt;/p&gt;</summary><content type="html">&lt;p&gt;My &lt;a href="stratix-v-accelerator-card-from-ebay-part-7.html"&gt;last blog&lt;/a&gt; post on this
topic explored the PCI Express connections on this board. I have determined that
the board contains a custom FPGA with two Hard IPs for PCIe, where the
commercially-available part only contains one. At the end of the post I explored
different options on how to access/enable the second IP, and none of my attempts
were successful. The post concluded on an optimistic note, with an expectation
that this obstacle will somehow be solved.&lt;/p&gt;
&lt;p&gt;Luckily, a couple of weeks after my post, while I was still contemplating how to
approach this obstacle, &lt;a href="https://twitter.com/gatecatte"&gt;@gatecatte&lt;/a&gt; managed to
figure out how to convince Quartus to let us generate a bitstream with two Hard
IP blocks; one just needs to override
&lt;code&gt;DEV_DIE_INFO::is_global_id_enabled&lt;/code&gt; function to return &lt;code&gt;true&lt;/code&gt;.
&lt;a href="https://twitter.com/rombik_su"&gt;@rombik_su&lt;/a&gt; then prepared an easy-to-use patch,
available
&lt;a href="https://gist.github.com/wirebond/9e75db58112bb49c6b2debad7dc13cb2"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This discovery enabled me to continue with the preparation of an example design
for the Pikes Peak and Storey Peak boards.&lt;/p&gt;
&lt;p&gt;In this blog post I describe the work done to get the PCIe interface (including
a high-performance DMA) up and running and present some results of integration
tests with two different computers.&lt;/p&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;h2&gt;Existing IPs&lt;/h2&gt;
&lt;p&gt;Quartus contains several IPs for PCI Express, where the Hard IP is wrapped with
different interfaces. The easiest one to use would be the &lt;em&gt;Avalon-MM Stratix V
Hard IP for PCI Express&lt;/em&gt;; this IP provides an Avalon-MM Host interface to handle
the Memory Reads and Memory Writes as well as an Avalon-MM Agent interface to
provide a Direct Memory Access (DMA) port. Another IP already contained in
Quartus, called &lt;em&gt;Modular Scatter-Gather DMA&lt;/em&gt;, can be directly connected to this
port to transfer the data between the on-board DDR3 memory and the system
memory.&lt;/p&gt;
&lt;p&gt;Unfortunately, for some reason, this IP does not support operation in Gen3
x8 mode.&lt;/p&gt;
&lt;p&gt;&lt;img alt="PCIe IP with Avalon-MM interface" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/avalon_pcie_sv_hip_avmm.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Development of a new IP&lt;/h2&gt;
&lt;p&gt;At this point I decided to have some fun and develop a PCIe endpoint IP
from scratch. The following were my requirements/guidelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;integration with Stratix V Hard IP&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;high throughput&lt;/strong&gt;: the IP should be able to operate with a 256-bit-wide
  interface at 250 MHz, and achieve high interface utilization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;relatively simple&lt;/strong&gt;: it is reasonable to sacrifice a small amount of
  performance to keep the IP simple. For example, I do not use the "Multiple
    packets per cycle" option - this reduces the maximum throughput by roughly
    6% (16 bytes per 256-byte packet), but makes the logic much simpler.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;use of standardized interfaces&lt;/strong&gt;: e.g. Avalon-MM and Avalon-ST&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;a base for custom developments&lt;/strong&gt;: I want to reuse this IP for other
  applications with this board, and the IP should be general enough to allow
    different use cases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;written in &lt;a href="https://www.chisel-lang.org/"&gt;Chisel HDL&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
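&lt;p&gt;The numbers in the list above can be checked with a few lines of arithmetic.
The following is only a back-of-the-envelope sketch; the variable names are
illustrative and do not come from the IP sources.&lt;/p&gt;

```python
# Back-of-the-envelope numbers for the 256-bit interface at 250 MHz.
# Variable names are illustrative and do not come from the IP sources.

BUS_WIDTH_BITS = 256
CLOCK_HZ = 250_000_000

# Raw capacity of the interface: 32 bytes per clock cycle.
bytes_per_cycle = BUS_WIDTH_BITS // 8
raw_bytes_per_s = bytes_per_cycle * CLOCK_HZ

# Cost quoted in the text for not using "Multiple packets per cycle":
# 16 bytes for every 256-byte packet.
lost_fraction = 16 / 256

print(f"raw interface capacity: {raw_bytes_per_s / 1e9:.1f} GB/s")
print(f"throughput reduction:   {lost_fraction * 100:.2f} %")
```

&lt;p&gt;The raw figure is an upper bound for the user-side interface only; it says
nothing yet about what the PCIe link itself can sustain.&lt;/p&gt;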
&lt;p&gt;After several weekends of development I am proud to present my creation: 
&lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-14-pcie-endpoint"&gt;PCIe endpoint with a high-performance DMA and an Avalon-MM interface&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The IP is currently in version 0.7 - it is usable and it works, but some
edge cases are not yet handled (one is described in this blog post) and
some smaller improvements are also expected in the future.&lt;/p&gt;
&lt;h2&gt;Architecture&lt;/h2&gt;
&lt;p&gt;The figure below shows the main building blocks of the IP; several
modules are responsible for the reception part and several for the
transmission part.&lt;/p&gt;
&lt;p&gt;&lt;img alt="PCIe endpoint" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/pcie_endpoint.drawio.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MemoryReadWriteCompl&lt;/strong&gt; parses the received packet and forwards it to the
  next corresponding module&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AvalonAgent&lt;/strong&gt; converts MWr and MRd PCIe packets into transactions on
  the Avalon-MM interface; it is able to handle both 32-bit and 64-bit requests (by
  performing two accesses on the 32-bit interface)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CompletionRecv&lt;/strong&gt; parses the received completion packets and forwards them on
  an Avalon-ST interface. It informs the &lt;strong&gt;BusManagerEngine&lt;/strong&gt; about the received
  completion packets so that the latter can keep track of the number of dwords in
  flight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BusManagerRegs&lt;/strong&gt; provides read and write access to registers (similar to
  &lt;strong&gt;AvalonAgent&lt;/strong&gt;) which are needed for the DMA engine (e.g. destination address,
    number of bytes to transfer, ...)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BusManagerEngine&lt;/strong&gt; is the main module which performs the DMA function - it
  generates either MWr or MRd (depending on the instructions from software). When
    the transfer is complete it informs the &lt;strong&gt;InterruptCtrl&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CompletionGen&lt;/strong&gt; generates a completion packet after an MRd packet with the
  information provided by a corresponding module&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TxArbiter&lt;/strong&gt; manages the access to the &lt;code&gt;tx_st&lt;/code&gt; port&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;InterruptCtrl&lt;/strong&gt; generates interrupt requests on behalf of other
  modules and from the external interface&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Test suite&lt;/h2&gt;
&lt;p&gt;The project includes a test suite that is not very extensive, but still
very useful:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The result of Chisel test" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/scala_test.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt; &lt;/p&gt;
&lt;p&gt;Here it is clear that this is a hobby project; a serious IP would have more
unit tests and also some more high-level tests.&lt;/p&gt;
&lt;h2&gt;Data generator and data checker&lt;/h2&gt;
&lt;p&gt;To validate the IP in real hardware, I have developed two auxiliary IPs:
&lt;a href="https://github.com/j-marjanovic/pp-sp-reference-design/tree/92452f3ea6e52a95f841efe127d4e728a6295013/ip_cores/avalon_st_generator"&gt;Avalon-ST
Generator&lt;/a&gt;
and &lt;a href="https://github.com/j-marjanovic/pp-sp-reference-design/tree/92452f3ea6e52a95f841efe127d4e728a6295013/ip_cores/avalon_st_checker"&gt;Avalon-ST
Checker&lt;/a&gt;.
The first one generates a 256-bit-wide data stream composed of 16-bit counter
values and the second one checks the received data stream against a reference
counter and stores the results in internal registers. Both IPs also contain
an Avalon-MM interface which can be used to control the IPs. Shown in the figure
below are the connections between all relevant IPs for the PCIe test.&lt;/p&gt;
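&lt;p&gt;The behaviour of the two auxiliary IPs can be modelled in software. The
sketch below is not derived from the RTL: the function names are hypothetical,
and both the lane order inside a 256-bit beat and the free-running reference
counter in the checker are assumptions made for illustration.&lt;/p&gt;

```python
# Software model of the counter pattern: each 256-bit beat carries sixteen
# 16-bit counter values. The lane order (lowest lane holds the oldest
# sample) is an assumption made for this sketch.

SAMPLES_PER_BEAT = 256 // 16
COUNTER_MODULO = 2 ** 16


def generate_beats(nr_beats, start=0):
    """Return a list of beats, each a list of 16 consecutive counter values."""
    beats = []
    counter = start
    for _ in range(nr_beats):
        beat = []
        for _ in range(SAMPLES_PER_BEAT):
            beat.append(counter)
            counter = (counter + 1) % COUNTER_MODULO
        beats.append(beat)
    return beats


def check_beats(beats, start=0):
    """Compare beats against a reference counter; return (ok, total) samples."""
    ref = start
    ok = 0
    total = 0
    for beat in beats:
        for sample in beat:
            if sample == ref:
                ok += 1
            total += 1
            ref = (ref + 1) % COUNTER_MODULO
    return ok, total


beats = generate_beats(nr_beats=8)
print(check_beats(beats))          # all 128 samples match

# Swapping two beats mimics data arriving in the wrong order.
beats[2], beats[3] = beats[3], beats[2]
print(check_beats(beats))          # 32 of 128 samples no longer match
```

&lt;p&gt;Because the reference counter in this model keeps running, a local
reordering of the data corrupts only the affected beats and the comparison
recovers afterwards.&lt;/p&gt;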
&lt;p&gt;&lt;img alt="Block diagram of the example design" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/example_design_block_diagram.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt; &lt;/p&gt;
&lt;h2&gt;Integration with Quartus Platform Designer&lt;/h2&gt;
&lt;p&gt;Below is a screenshot of the PCIe endpoint IP connected to the &lt;em&gt;Stratix V
Hard IP for PCI Express&lt;/em&gt; in Quartus Platform Designer:&lt;/p&gt;
&lt;p&gt;&lt;img alt="PCIe endpoint in Quartus Platform Designer (aka QSys)" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/example_usage.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt; &lt;/p&gt;
&lt;h1&gt;Linux driver, test program and TUI&lt;/h1&gt;
&lt;p&gt;The FPGA design is only one half of this story; the second half is a Linux
driver. The driver is also available on my GitHub page: 
&lt;a href="https://github.com/j-marjanovic/pp-sp-linux-driver"&gt;Linux driver for Unofficial Pikes Peak/Storey Peak reference design&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The driver is a relatively standard Linux PCIe driver; it registers its &lt;code&gt;probe&lt;/code&gt;
function with the PCIe subsystem based on &lt;a href="https://devicehunt.com/view/type/pci/vendor/1172/device/00A7"&gt;vendor ID/device ID for Altera
Stratix V&lt;/a&gt; and
subsystem vendor ID of &lt;code&gt;0x01a2&lt;/code&gt; (for JAN) and subsystem device IDs of &lt;code&gt;0x0001&lt;/code&gt; for the
first interface, &lt;code&gt;0x0002&lt;/code&gt; for the second interface and &lt;code&gt;0x0a00&lt;/code&gt; for custom
applications.&lt;/p&gt;
&lt;p&gt;Once a match is found, it performs the house-keeping tasks (allocating memory,
claiming BARs, allocating a DMA buffer, enabling the device) and it finally
creates a character device (in &lt;code&gt;/dev&lt;/code&gt;) with a name derived from the PCIe
address.&lt;/p&gt;
&lt;p&gt;The user-space programs interact with the driver in two ways. To access the
registers on BAR0 (i.e. reads and writes) the programs can &lt;code&gt;mmap()&lt;/code&gt; the char
device and dereference a pointer to &lt;code&gt;uint32_t&lt;/code&gt; or &lt;code&gt;uint64_t&lt;/code&gt;. Two &lt;code&gt;ioctl&lt;/code&gt;s can
be used to exchange the DMA buffer with the user-space program, one for
retrieving the buffer content and another for setting it. Finally, another
&lt;code&gt;ioctl&lt;/code&gt; can be used to start the DMA transfer; the user-space program should
provide the transfer size and the direction, and the driver will perform the
desired request. To reduce the CPU utilization, the thread is put to sleep while
the DMA transfer is running and is woken up by an interrupt request from the DMA
engine in the FPGA.&lt;/p&gt;
&lt;h2&gt;Test program&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;pp_sp_test&lt;/code&gt; is a small test program that can be used to exercise the driver
and perform DMA transfers with the DMA engine in the FPGA. The program uses
the Avalon-ST Checker and Avalon-ST Generator IPs to verify the content of
the transfer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./pp_sp_test --help                                                               
&lt;span class="go"&gt;Usage: ./pp_sp_test --dev DEV [--write] [--read]&lt;/span&gt;

&lt;span class="go"&gt;Perform DMA transfers using the DMA engine in PP/SP FPGA&lt;/span&gt;

&lt;span class="go"&gt;options:&lt;/span&gt;
&lt;span class="go"&gt;  --help      print these help and exit&lt;/span&gt;
&lt;span class="go"&gt;  --dev       char device (e.g. /dev/pp_sp_pcie...)&lt;/span&gt;
&lt;span class="go"&gt;  --write     card to host (DMA write) transfer&lt;/span&gt;
&lt;span class="go"&gt;  --read      host to card (DMA read) transfer&lt;/span&gt;
&lt;span class="go"&gt;  --nr_bytes  number of bytes to transfer (default 256)&lt;/span&gt;
&lt;span class="go"&gt;  --count     count of loops to perform the reads and/or writes (default 1)&lt;/span&gt;
&lt;span class="go"&gt;  --msleep    sleep (in milliseconds) during each loop (default 0)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Here is an example where 8 KiB of data was transferred in both directions. First
the data was transferred from the card to the host (&lt;code&gt;c2h&lt;/code&gt;), and then from the
host to the card (&lt;code&gt;h2c&lt;/code&gt;). In the end, the result of the check (in the FPGA) is
printed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; ./pp_sp_test --dev /dev/pp_sp_pcie_0000:42:00.0 --write --read --nr_bytes &lt;span class="m"&gt;8192&lt;/span&gt;
&lt;span class="go"&gt;Arguments:&lt;/span&gt;
&lt;span class="go"&gt;  dev = /dev/pp_sp_pcie_0000:42:00.0&lt;/span&gt;
&lt;span class="go"&gt;  c2h = 1, h2c = 1&lt;/span&gt;
&lt;span class="go"&gt;  nr_bytes = 8192, count = 1&lt;/span&gt;
&lt;span class="go"&gt;===================================&lt;/span&gt;
&lt;span class="go"&gt;[loop] i = 0&lt;/span&gt;
&lt;span class="go"&gt;[gen] id reg = a51579e2&lt;/span&gt;
&lt;span class="go"&gt;[gen] state = 1&lt;/span&gt;
&lt;span class="go"&gt;[gen] nr samp = 64&lt;/span&gt;
&lt;span class="go"&gt;[c2h] DMA duration 0.024146 ms&lt;/span&gt;
&lt;span class="go"&gt;[c2h] ioctl took 0.021000 ms&lt;/span&gt;
&lt;span class="go"&gt;[gen] state = 0&lt;/span&gt;
&lt;span class="go"&gt;[gen] nr samp = 8192&lt;/span&gt;
&lt;span class="go"&gt;[check] id reg = a5157c8c&lt;/span&gt;
&lt;span class="go"&gt;[h2c] DMA duration 0.021397 ms&lt;/span&gt;
&lt;span class="go"&gt;[h2c] ioctl took 0.019000 ms&lt;/span&gt;
&lt;span class="go"&gt;[check] samp = 8192 / 8192&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
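&lt;p&gt;The durations printed above can be converted into an effective throughput.
The snippet below is a small sketch that uses only the numbers from this log;
the helper name is made up for the example. For a transfer this short the
result is far below the multi-GB/s figures quoted in the Summary, presumably
because fixed per-transfer costs dominate (this is an interpretation, not
something the log itself states).&lt;/p&gt;

```python
# Effective throughput implied by the "[c2h]/[h2c] DMA duration" lines above.

NR_BYTES = 8192
durations_ms = {"c2h": 0.024146, "h2c": 0.021397}


def throughput_mb_per_s(nr_bytes, duration_ms):
    """Bytes per second, expressed in MB/s (1 MB = 10**6 bytes)."""
    return nr_bytes / (duration_ms * 1e-3) / 1e6


for direction, duration in durations_ms.items():
    rate = throughput_mb_per_s(NR_BYTES, duration)
    print(f"{direction}: {rate:.0f} MB/s")
```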


&lt;h2&gt;TUI&lt;/h2&gt;
&lt;p&gt;A more elaborate program to control the DMA engine and the corresponding
example application is provided in the form of a Terminal User Interface in the
&lt;a href="https://github.com/j-marjanovic/pp-sp-linux-driver/tree/main/tui"&gt;tui
directory&lt;/a&gt;.
With this program a user can select the size and the direction of the transfer
for both interfaces. The interface displays the result of the &lt;a href="https://github.com/j-marjanovic/pp-sp-linux-driver/blob/aaf043ed50befd2f23a8faf77294604050e597c6/pp_sp_pcie.c#L136-L146"&gt;throughput
measurement in kernel
space&lt;/a&gt;
and the statistics (minimum, average, and maximum value) for the last 100
measurements.&lt;/p&gt;
&lt;p&gt;Some screenshots of the TUI are shown in the following sections.&lt;/p&gt;
&lt;h1&gt;Test setups&lt;/h1&gt;
&lt;p&gt;I used the Storey Peak board (in PCIe form factor) with two systems to test the
PCI Express interface:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FUJITSU D3642-B1 motherboard with Intel i5-9600K&lt;/li&gt;
&lt;li&gt;Dell PowerEdge R720 with two Intel E5-2680&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The R720 has an x16 PCIe slot but does not support bifurcation, while the x16
PCIe slot on the Fujitsu motherboard can be split into the x8x8 configuration with
a BIOS setting.&lt;/p&gt;
&lt;p&gt;The system with the Fujitsu motherboard ran Ubuntu 18.04 LTS with Linux kernel
4.15.0-161 while the R720 ran Ubuntu 20.04 LTS with Linux kernel 5.11.22 (custom
build with CMA enabled but not used in this experiment).&lt;/p&gt;
&lt;p&gt;I have prepared three different FPGA designs: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one with only the lower 8 lanes connected and the Hard IP on the right side of the device,&lt;/li&gt;
&lt;li&gt;one with only the upper 8 lanes connected and the Hard IP on the left side of the device, and&lt;/li&gt;
&lt;li&gt;one with all 16 lanes connected (in x8x8 configuration) and with both Hard IPs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Quartus project for the last one is available in the commit 
&lt;a href="https://github.com/j-marjanovic/pp-sp-reference-design/commit/3cb9c506e68fc4c4a60922f7d382fb888ff03713"&gt;3cb9c50 Change subsystem device id for the second PCIe&lt;/a&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="center"&gt;only PCIe lanes 0 to 7&lt;/th&gt;
&lt;th align="center"&gt;only PCIe lanes 8 to 15&lt;/th&gt;
&lt;th align="center"&gt;PCIe lanes 0 to 7 and 8 to 15&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;img alt="PCIe lanes 0 - 7" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/device_right.png" style="width:90%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;img alt="PCIe lanes 8 - 15" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/device_left.png" style="width:90%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;img alt="PCIe lanes 0 - 7, 8 - 15" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/device_both.png" style="width:90%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As an aside, it can be noted that the JTAG pins are located in the lower-left
corner, and it is interesting to see how the fitter decided to place some of
the logic halfway between the PCIe core on the right side and the JTAG pins on
the left side.&lt;/p&gt;
&lt;h1&gt;Measurements&lt;/h1&gt;
&lt;h2&gt;Fujitsu motherboard with right side PCIe link (lanes 0 to 7)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv
&lt;span class="go"&gt;01:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;    Region 0: Memory at a1000000 (32-bit, non-prefetchable) [size=4M]&lt;/span&gt;
&lt;span class="go"&gt;    Region 2: Memory at a1400000 (32-bit, non-prefetchable) [size=256K]&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+&lt;/span&gt;
&lt;span class="go"&gt;        Address: 00000000fee003f8  Data: 0000&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;        LnkCap: Port #1, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+&lt;/span&gt;
&lt;span class="go"&gt;            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;    Kernel driver in use: pp_sp_pcie&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="DMA write on Fujitsu MB, right interface" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_right_x8_write.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="DMA read on Fujitsu MB, right interface" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_right_x8_read.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Here we see something interesting - the Avalon-ST Checker reports that some
of the samples do not match the reference values. I investigate this issue
in the Issues section below.&lt;/p&gt;
&lt;h2&gt;Fujitsu motherboard with left side PCIe link (lanes 8 to 15)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv
&lt;span class="go"&gt;02:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;    Region 0: Memory at a1000000 (32-bit, non-prefetchable) [size=4M]&lt;/span&gt;
&lt;span class="go"&gt;    Region 2: Memory at a1400000 (32-bit, non-prefetchable) [size=256K]&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+&lt;/span&gt;
&lt;span class="go"&gt;        Address: 00000000fee00438  Data: 0000&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;        LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+&lt;/span&gt;
&lt;span class="go"&gt;            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;    Kernel driver in use: pp_sp_pcie&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="DMA write on Fujitsu MB, left interface" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_left_x8_write.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="DMA read on Fujitsu MB, left interface" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_left_x8_read.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Similarly, the Avalon-ST Checker reports corrupted samples here as well.&lt;/p&gt;
&lt;h2&gt;Fujitsu motherboard with both PCIe links (lanes 0 to 7 and 8 to 15)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;pp_sp_example &amp;gt; pcie status&lt;/span&gt;
&lt;span class="err"&gt;pcie 0:&lt;/span&gt;
&lt;span class="err"&gt;  id = 2c1e57a7&lt;/span&gt;
&lt;span class="err"&gt;  version = 10000&lt;/span&gt;
&lt;span class="err"&gt;  status.cur speed   = 3&lt;/span&gt;
&lt;span class="err"&gt;  status.LTTSM state = f&lt;/span&gt;
&lt;span class="err"&gt;  status.lane act    = 8&lt;/span&gt;
&lt;span class="err"&gt;  status.DL up       = 1&lt;/span&gt;
&lt;span class="err"&gt;pcie 1:&lt;/span&gt;
&lt;span class="err"&gt;  id = 2c1e57a7&lt;/span&gt;
&lt;span class="err"&gt;  version = 10000&lt;/span&gt;
&lt;span class="err"&gt;  status.cur speed   = 3&lt;/span&gt;
&lt;span class="err"&gt;  status.LTTSM state = f&lt;/span&gt;
&lt;span class="err"&gt;  status.lane act    = 8&lt;/span&gt;
&lt;span class="err"&gt;  status.DL up       = 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv &lt;span class="p"&gt;|&lt;/span&gt; egrep &lt;span class="s1"&gt;&amp;#39;Altera|LnkSta|Subsystem&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;01:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;span class="go"&gt;02:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="DMA write on Fujitsu MB, both interfaces" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_both_x8_write.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="DMA read on Fujitsu MB, both interfaces" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/fujitsu_both_x8_read.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Not surprisingly, when using both interfaces to read, the Avalon-ST Checker
reports some corrupted samples as well.&lt;/p&gt;
&lt;h2&gt;Dell PowerEdge R720&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv &lt;span class="p"&gt;|&lt;/span&gt; egrep &lt;span class="s1"&gt;&amp;#39;Altera|LnkSta|Subsys&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;42:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s (ok), Width x8 (ok)&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="DMA write on R720" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/r720_x8_write.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="DMA read on R720" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/r720_x8_read.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Issues&lt;/h1&gt;
&lt;h2&gt;Handling of out-of-order CplD packets&lt;/h2&gt;
&lt;p&gt;As we have seen above, when running with the Intel i5-9600K, the Avalon-ST
Checker reports a small percentage of the samples as corrupted.&lt;/p&gt;
&lt;p&gt;To observe what is going on, I have placed a SignalTap on the important signals
of the Avalon-ST Checker, and let it trigger on invalid packets.&lt;/p&gt;
&lt;p&gt;&lt;img alt="SignalTap capture" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/issue_signaltap.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;We can note that for a couple of cycles the &lt;code&gt;data_valid_p&lt;/code&gt; signal is high, but
&lt;code&gt;data_ok_p&lt;/code&gt; is low, indicating that the received data does not match the
reference data. After a couple of clock cycles the data stream seems to recover.&lt;/p&gt;
&lt;p&gt;If we store the data from this SignalTap capture into a &lt;code&gt;.csv&lt;/code&gt; file and
highlight the important fields in the header (blue is &lt;em&gt;Length&lt;/em&gt;, magenta is
&lt;em&gt;Tag&lt;/em&gt;, black is the payload data), we can observe that two of the packets are
returned out of order. First, a packet with the tag &lt;code&gt;0xb&lt;/code&gt; is returned
(in the sample at time -11), but it carries only the first 0x30 dwords - there
are still 0x10 dwords outstanding to fulfill the 0x40-dword (256-byte)
request. Instead of continuing with the remaining part of the packet with the
tag &lt;code&gt;0xb&lt;/code&gt;, the next packet contains the tag &lt;code&gt;0xc&lt;/code&gt; (in the sample at time -4).
Only after this packet, the transaction with the tag &lt;code&gt;0xb&lt;/code&gt; is completed (in the
sample at time -1).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Out-of-order CplD packets" src="www.j-marjanovic.io/images/2021_fpga_card_part_8/out_of_order_cpld.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;PCI Express devices are allowed to return completion packets out-of-order:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Completions with different Transaction IDs are permitted to pass each other.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My PCIe endpoint IP currently does not support out-of-order CplD packets, but
adding support for this case should be relatively straightforward: the
receiving side needs to keep track of the number of dwords remaining for each
tag, and queue the packets when the current tag is not yet complete.&lt;/p&gt;
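&lt;p&gt;The idea described above can be sketched in software. The following is a
behavioural model only, not the implementation planned for the IP: the class
and method names are hypothetical, the requests are assumed to be issued in tag
order, and the completion with tag &lt;code&gt;0xc&lt;/code&gt; is assumed to carry the
full 0x40 dwords. The model relies on the PCIe rule that completions belonging
to the same request arrive in increasing address order.&lt;/p&gt;

```python
from collections import deque

# Sketch of completion reordering: requests are issued in tag order, and the
# output must deliver the data of each tag completely before the next one.
# Class and method names are hypothetical, not taken from the PCIe endpoint IP.


class CompletionReorder:
    def __init__(self):
        self.order = deque()      # tags in the order the requests were issued
        self.remaining = {}       # tag -> dwords still expected
        self.parked = {}          # tag -> completions waiting for their turn

    def request(self, tag, nr_dwords):
        self.order.append(tag)
        self.remaining[tag] = nr_dwords
        self.parked[tag] = []

    def completion(self, tag, nr_dwords):
        """Accept one CplD; return the (tag, nr_dwords) chunks now in order."""
        self.parked[tag].append(nr_dwords)
        released = []
        while self.order:
            head = self.order[0]
            for chunk in self.parked[head]:
                released.append((head, chunk))
                self.remaining[head] -= chunk
            self.parked[head] = []
            if self.remaining[head] != 0:
                break             # head tag incomplete: later tags must wait
            self.order.popleft()
            del self.remaining[head], self.parked[head]
        return released


# Replay of the sequence from the capture: 0xb (0x30), 0xc, then 0xb (0x10).
rob = CompletionReorder()
rob.request(0xB, 0x40)
rob.request(0xC, 0x40)
first = rob.completion(0xB, 0x30)
second = rob.completion(0xC, 0x40)
third = rob.completion(0xB, 0x10)
print(first, second, third)
```

&lt;p&gt;In this model the early completion for tag &lt;code&gt;0xc&lt;/code&gt; is held back
and released only after the last part of tag &lt;code&gt;0xb&lt;/code&gt; has arrived, so
the downstream Avalon-ST consumer sees the data in request order.&lt;/p&gt;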
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this blog post I have summarized my current experience with PCIe on the Storey
Peak board. With a small hack one is able to use both PCIe Hard IPs available
in this device and use the PCIe interface in the full Gen3 x8x8 mode. &lt;/p&gt;
&lt;p&gt;I have developed a PCIe endpoint IP that connects to the PCIe Hard IP and
provides register access and DMA capabilities. The DMA is able to achieve 6.5
GB/s in both the read and write directions on one interface. With both interfaces
active it achieves 11 GB/s in the write direction and 12 GB/s in the read direction.
I would consider this to be a good result.&lt;/p&gt;
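&lt;p&gt;To put the single-interface figure into perspective, it can be compared
with the raw capacity of a Gen3 x8 link. This is a rough estimate, not a
measurement: it accounts only for the 128b/130b line encoding and ignores
packet headers and other protocol overhead, so the achievable maximum is lower
than the raw number printed here.&lt;/p&gt;

```python
# How close 6.5 GB/s comes to the raw capacity of a Gen3 x8 link.
# The 8 GT/s rate and 128b/130b line encoding are general PCIe Gen3 facts;
# the 6.5 GB/s figure is the single-interface result quoted above.

GT_PER_S = 8e9
LANES = 8
ENCODING = 128 / 130

raw_gb_per_s = GT_PER_S * LANES * ENCODING / 8 / 1e9
measured_gb_per_s = 6.5

print(f"raw link capacity: {raw_gb_per_s:.2f} GB/s")
print(f"link utilization:  {measured_gb_per_s / raw_gb_per_s * 100:.0f} %")
```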
&lt;p&gt;We can now declare the PCIe part "solved", and this concludes the last major
effort in the reverse engineering of this board. The last remaining part is
providing an example for the use of QSFP connectors, and some general clean-up
of the &lt;a href="https://github.com/j-marjanovic/pp-sp-reference-design/"&gt;reference
design&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Appendix&lt;/h1&gt;
&lt;h2&gt;lspci on Fujitsu motherboard&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv
&lt;span class="go"&gt;01:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+&lt;/span&gt;
&lt;span class="go"&gt;    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &amp;gt;TAbort- &amp;lt;TAbort- &amp;lt;MAbort- &amp;gt;SERR- &amp;lt;PERR- INTx-&lt;/span&gt;
&lt;span class="go"&gt;    Latency: 0, Cache Line Size: 64 bytes&lt;/span&gt;
&lt;span class="go"&gt;    Interrupt: pin A routed to IRQ 136&lt;/span&gt;
&lt;span class="go"&gt;    Region 0: Memory at a1000000 (32-bit, non-prefetchable) [size=4M]&lt;/span&gt;
&lt;span class="go"&gt;    Region 2: Memory at a1400000 (32-bit, non-prefetchable) [size=256K]&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+&lt;/span&gt;
&lt;span class="go"&gt;        Address: 00000000fee003f8  Data: 0000&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [78] Power Management version 3&lt;/span&gt;
&lt;span class="go"&gt;        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)&lt;/span&gt;
&lt;span class="go"&gt;        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [80] Express (v2) Endpoint, MSI 00&lt;/span&gt;
&lt;span class="go"&gt;        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s &amp;lt;64ns, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-&lt;/span&gt;
&lt;span class="go"&gt;            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+&lt;/span&gt;
&lt;span class="go"&gt;            MaxPayload 256 bytes, MaxReadReq 512 bytes&lt;/span&gt;
&lt;span class="go"&gt;        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-&lt;/span&gt;
&lt;span class="go"&gt;        LnkCap: Port #1, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+&lt;/span&gt;
&lt;span class="go"&gt;            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-&lt;/span&gt;
&lt;span class="go"&gt;             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-&lt;/span&gt;
&lt;span class="go"&gt;             Compliance De-emphasis: -6dB&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;span class="go"&gt;             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [100 v1] Virtual Channel&lt;/span&gt;
&lt;span class="go"&gt;        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1&lt;/span&gt;
&lt;span class="go"&gt;        Arb:    Fixed- WRR32- WRR64- WRR128-&lt;/span&gt;
&lt;span class="go"&gt;        Ctrl:   ArbSelect=Fixed&lt;/span&gt;
&lt;span class="go"&gt;        Status: InProgress-&lt;/span&gt;
&lt;span class="go"&gt;        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-&lt;/span&gt;
&lt;span class="go"&gt;            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-&lt;/span&gt;
&lt;span class="go"&gt;            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff&lt;/span&gt;
&lt;span class="go"&gt;            Status: NegoPending- InProgress-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 &amp;lt;?&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [300 v1] #19&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [800 v1] Advanced Error Reporting&lt;/span&gt;
&lt;span class="go"&gt;        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-&lt;/span&gt;
&lt;span class="go"&gt;        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+&lt;/span&gt;
&lt;span class="go"&gt;        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-&lt;/span&gt;
&lt;span class="go"&gt;    Kernel driver in use: pp_sp_pcie&lt;/span&gt;
&lt;span class="go"&gt;    Kernel modules: altera_cvp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv
&lt;span class="go"&gt;02:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+&lt;/span&gt;
&lt;span class="go"&gt;    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &amp;gt;TAbort- &amp;lt;TAbort- &amp;lt;MAbort- &amp;gt;SERR- &amp;lt;PERR- INTx-&lt;/span&gt;
&lt;span class="go"&gt;    Latency: 0, Cache Line Size: 64 bytes&lt;/span&gt;
&lt;span class="go"&gt;    Interrupt: pin A routed to IRQ 137&lt;/span&gt;
&lt;span class="go"&gt;    Region 0: Memory at a1000000 (32-bit, non-prefetchable) [size=4M]&lt;/span&gt;
&lt;span class="go"&gt;    Region 2: Memory at a1400000 (32-bit, non-prefetchable) [size=256K]&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+&lt;/span&gt;
&lt;span class="go"&gt;        Address: 00000000fee00438  Data: 0000&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [78] Power Management version 3&lt;/span&gt;
&lt;span class="go"&gt;        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)&lt;/span&gt;
&lt;span class="go"&gt;        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [80] Express (v2) Endpoint, MSI 00&lt;/span&gt;
&lt;span class="go"&gt;        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s &amp;lt;64ns, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-&lt;/span&gt;
&lt;span class="go"&gt;            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+&lt;/span&gt;
&lt;span class="go"&gt;            MaxPayload 256 bytes, MaxReadReq 512 bytes&lt;/span&gt;
&lt;span class="go"&gt;        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-&lt;/span&gt;
&lt;span class="go"&gt;        LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+&lt;/span&gt;
&lt;span class="go"&gt;            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-&lt;/span&gt;
&lt;span class="go"&gt;             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-&lt;/span&gt;
&lt;span class="go"&gt;             Compliance De-emphasis: -6dB&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;span class="go"&gt;             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [100 v1] Virtual Channel&lt;/span&gt;
&lt;span class="go"&gt;        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1&lt;/span&gt;
&lt;span class="go"&gt;        Arb:    Fixed- WRR32- WRR64- WRR128-&lt;/span&gt;
&lt;span class="go"&gt;        Ctrl:   ArbSelect=Fixed&lt;/span&gt;
&lt;span class="go"&gt;        Status: InProgress-&lt;/span&gt;
&lt;span class="go"&gt;        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-&lt;/span&gt;
&lt;span class="go"&gt;            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-&lt;/span&gt;
&lt;span class="go"&gt;            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff&lt;/span&gt;
&lt;span class="go"&gt;            Status: NegoPending- InProgress-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 &amp;lt;?&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [300 v1] #19&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [800 v1] Advanced Error Reporting&lt;/span&gt;
&lt;span class="go"&gt;        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-&lt;/span&gt;
&lt;span class="go"&gt;        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+&lt;/span&gt;
&lt;span class="go"&gt;        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-&lt;/span&gt;
&lt;span class="go"&gt;    Kernel driver in use: pp_sp_pcie&lt;/span&gt;
&lt;span class="go"&gt;    Kernel modules: altera_cvp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h2&gt;lspci on Dell R720&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo lspci -d &lt;span class="m"&gt;1172&lt;/span&gt;: -vv
&lt;span class="go"&gt;42:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)&lt;/span&gt;
&lt;span class="go"&gt;    Subsystem: Device 01a2:0001&lt;/span&gt;
&lt;span class="go"&gt;    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+&lt;/span&gt;
&lt;span class="go"&gt;    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &amp;gt;TAbort- &amp;lt;TAbort- &amp;lt;MAbort- &amp;gt;SERR- &amp;lt;PERR- INTx-&lt;/span&gt;
&lt;span class="go"&gt;    Latency: 0, Cache Line Size: 64 bytes&lt;/span&gt;
&lt;span class="go"&gt;    Interrupt: pin A routed to IRQ 167&lt;/span&gt;
&lt;span class="go"&gt;    NUMA node: 1&lt;/span&gt;
&lt;span class="go"&gt;    Region 0: Memory at d5000000 (32-bit, non-prefetchable) [size=4M]&lt;/span&gt;
&lt;span class="go"&gt;    Region 2: Memory at d57c0000 (32-bit, non-prefetchable) [size=256K]&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+&lt;/span&gt;
&lt;span class="go"&gt;        Address: 00000000fee00918  Data: 0000&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [78] Power Management version 3&lt;/span&gt;
&lt;span class="go"&gt;        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)&lt;/span&gt;
&lt;span class="go"&gt;        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [80] Express (v2) Endpoint, MSI 00&lt;/span&gt;
&lt;span class="go"&gt;        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s &amp;lt;64ns, L1 &amp;lt;1us&lt;/span&gt;
&lt;span class="go"&gt;            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+&lt;/span&gt;
&lt;span class="go"&gt;            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+&lt;/span&gt;
&lt;span class="go"&gt;            MaxPayload 256 bytes, MaxReadReq 512 bytes&lt;/span&gt;
&lt;span class="go"&gt;        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-&lt;/span&gt;
&lt;span class="go"&gt;        LnkCap: Port #1, Speed 8GT/s, Width x8, ASPM not supported&lt;/span&gt;
&lt;span class="go"&gt;            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+&lt;/span&gt;
&lt;span class="go"&gt;            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta: Speed 8GT/s (ok), Width x8 (ok)&lt;/span&gt;
&lt;span class="go"&gt;            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-&lt;/span&gt;
&lt;span class="go"&gt;        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-&lt;/span&gt;
&lt;span class="go"&gt;             10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-&lt;/span&gt;
&lt;span class="go"&gt;             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-&lt;/span&gt;
&lt;span class="go"&gt;             FRS-, TPHComp-, ExtTPHComp-&lt;/span&gt;
&lt;span class="go"&gt;             AtomicOpsCap: 32bit- 64bit- 128bitCAS-&lt;/span&gt;
&lt;span class="go"&gt;        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled&lt;/span&gt;
&lt;span class="go"&gt;             AtomicOpsCtl: ReqEn-&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-&lt;/span&gt;
&lt;span class="go"&gt;             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-&lt;/span&gt;
&lt;span class="go"&gt;             Compliance De-emphasis: -6dB&lt;/span&gt;
&lt;span class="go"&gt;        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+&lt;/span&gt;
&lt;span class="go"&gt;             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [100 v1] Virtual Channel&lt;/span&gt;
&lt;span class="go"&gt;        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1&lt;/span&gt;
&lt;span class="go"&gt;        Arb:    Fixed- WRR32- WRR64- WRR128-&lt;/span&gt;
&lt;span class="go"&gt;        Ctrl:   ArbSelect=Fixed&lt;/span&gt;
&lt;span class="go"&gt;        Status: InProgress-&lt;/span&gt;
&lt;span class="go"&gt;        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-&lt;/span&gt;
&lt;span class="go"&gt;            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-&lt;/span&gt;
&lt;span class="go"&gt;            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff&lt;/span&gt;
&lt;span class="go"&gt;            Status: NegoPending- InProgress-&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 &amp;lt;?&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [300 v1] Secondary PCI Express&lt;/span&gt;
&lt;span class="go"&gt;        LnkCtl3: LnkEquIntrruptEn-, PerformEqu-&lt;/span&gt;
&lt;span class="go"&gt;        LaneErrStat: 0&lt;/span&gt;
&lt;span class="go"&gt;    Capabilities: [800 v1] Advanced Error Reporting&lt;/span&gt;
&lt;span class="go"&gt;        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-&lt;/span&gt;
&lt;span class="go"&gt;        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-&lt;/span&gt;
&lt;span class="go"&gt;        CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+&lt;/span&gt;
&lt;span class="go"&gt;        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-&lt;/span&gt;
&lt;span class="go"&gt;            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-&lt;/span&gt;
&lt;span class="go"&gt;        HeaderLog: 00000000 00000000 00000000 00000000&lt;/span&gt;
&lt;span class="go"&gt;    Kernel driver in use: pp_sp_pcie&lt;/span&gt;
&lt;span class="go"&gt;    Kernel modules: altera_cvp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;hr&gt;
&lt;div style="font-size: 80%;" &gt;
Intel, the Intel logo, Altera, Nios, Quartus and Stratix words and logos are
trademarks of Intel Corporation or its subsidiaries in the U.S. and/or
other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
PCI Express® and PCIe® are registered trademarks of PCI-SIG.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
All trademarks and registered trademarks are the property of their respective owners.
&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Notes from Xilinx® Adapt 2021</title><link href="www.j-marjanovic.io/notes-from-xilinxr-adapt-2021.html" rel="alternate"></link><published>2021-09-07T21:00:00+02:00</published><updated>2021-09-07T21:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-09-07:www.j-marjanovic.io/notes-from-xilinxr-adapt-2021.html</id><summary type="html">&lt;p&gt;&lt;a href="https://xilinx.cventevents.com/event/f7c4412f-572a-4b8b-b8d0-6b92aae2cf0d"&gt;Xilinx Adapt 2021&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Day 1 (2021-09-07)&lt;/h1&gt;
&lt;h2&gt;Adaptive Computing: Innovation Accelerated [Ivo Bolsens (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;the hardware is adapted, rather than the other way around&lt;/li&gt;
&lt;li&gt;DSA&lt;/li&gt;
&lt;li&gt;growing gap between Moore's law and AI requirements&lt;/li&gt;
&lt;li&gt;requirements: 6G (100 Gbps, 0.1 ms latency)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Xilinx wants to be a platform company&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Vivado/FPGA -&amp;gt; Vitis …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;&lt;a href="https://xilinx.cventevents.com/event/f7c4412f-572a-4b8b-b8d0-6b92aae2cf0d"&gt;Xilinx Adapt 2021&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Day 1 (2021-09-07)&lt;/h1&gt;
&lt;h2&gt;Adaptive Computing: Innovation Accelerated [Ivo Bolsens (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;the hardware is adapted, rather than the other way around&lt;/li&gt;
&lt;li&gt;DSA&lt;/li&gt;
&lt;li&gt;growing gap between Moore's law and AI requirements&lt;/li&gt;
&lt;li&gt;requirements: 6G (100 Gbps, 0.1 ms latency)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Xilinx wants to be a platform company&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Vivado/FPGA -&amp;gt; Vitis/MPSoC -&amp;gt; Vitis-AI/ACAP (Peer Processor)&lt;/li&gt;
&lt;li&gt;"hardware developers are still the key audience for Xilinx"&lt;/li&gt;
&lt;li&gt;Cloud and Edge&lt;/li&gt;
&lt;li&gt;some examples: SmartSSD (partnership with Samsung), SmartNIC&lt;/li&gt;
&lt;li&gt;guest speaker: Alveo U250 in Azure&lt;ul&gt;
&lt;li&gt;Quantum optimization (10x gains over a CPU)&lt;/li&gt;
&lt;li&gt;Synapse - SQL acceleration (e.g. CSV parsing)&lt;/li&gt;
&lt;li&gt;external use cases: Financial Services, Bio-Informatics&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;AI engine (Scalar ALU, Vector ALU)&lt;/li&gt;
&lt;li&gt;AIE Array (non-blocking interconnect, local memory, ISA-based Vector Engine)&lt;/li&gt;
&lt;li&gt;guest speaker: Samsung&lt;/li&gt;
&lt;li&gt;Bfloat, INT4&lt;/li&gt;
&lt;li&gt;guest speaker: HPC at Pacific Northwest National Lab&lt;ul&gt;
&lt;li&gt;computational chemistry application&lt;/li&gt;
&lt;li&gt;outlook: integration between physical science and data science&lt;/li&gt;
&lt;li&gt;heterogeneous testbed: AMD Epyc, AMD Instinct GPU, Alveo&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Xilinx devices can handle an entire application (e.g. ADAS)&lt;/li&gt;
&lt;li&gt;Kria SOM (most page views ever), comes with predefined bitstreams&lt;/li&gt;
&lt;li&gt;"accessible to software developers, domain experts"&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Future of Adaptive Computing [Ivo Bolsens, Vamsi Boppana (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;higher level -&amp;gt; IP Integrator, software&lt;/li&gt;
&lt;li&gt;2021.1 -&amp;gt; Machine Learning in Vivado&lt;ul&gt;
&lt;li&gt;"Intelligent Design Runs"&lt;/li&gt;
&lt;li&gt;delay estimation, resource usage predictions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;AI Engines also for linear algebra applications (not only Machine Learning)&lt;/li&gt;
&lt;li&gt;512-bit-wide vector machine&lt;/li&gt;
&lt;li&gt;chiplets (e.g. larger devices)&lt;/li&gt;
&lt;li&gt;rapidly evolving standards -&amp;gt; adaptable hardware&lt;/li&gt;
&lt;li&gt;DFX (e.g. in automotive, two different algorithms)&lt;/li&gt;
&lt;li&gt;Vitis AI: CNN, RNN, LSTM&lt;/li&gt;
&lt;li&gt;Hennessy &amp;amp; Patterson - Domain Specific Architecture&lt;/li&gt;
&lt;li&gt;open-source (community, de-facto standards)&lt;/li&gt;
&lt;li&gt;Xilinx App Store, Ubuntu software store&lt;/li&gt;
&lt;li&gt;Kria (ready-built application, downloadable from an on-line app store)&lt;/li&gt;
&lt;li&gt;roadmap (up to 1 nm), only for selected products (cost-sensitive apps)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Design Rationale of Two Generations of AI Engines [Kees Vissers (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;again Hennessy and Patterson&lt;/li&gt;
&lt;li&gt;processor designed from the ground up&lt;/li&gt;
&lt;li&gt;comparison between &lt;strong&gt;traditional multi-core&lt;/strong&gt; and &lt;strong&gt;AI Engine Array&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;interconnect + DMA (no caches)&lt;/li&gt;
&lt;li&gt;better latency and efficiency than GPU and CPU&lt;/li&gt;
&lt;li&gt;1GHz+, 400 AI Engines per device&lt;/li&gt;
&lt;li&gt;AI Engine = conventional AI processor&lt;ul&gt;
&lt;li&gt;VLIW&lt;/li&gt;
&lt;li&gt;32-bit RISC&lt;/li&gt;
&lt;li&gt;512-bit SIMD (fixed point, floating point)&lt;/li&gt;
&lt;li&gt;Fixed-Point Vector Unit (similar to DSP48)&lt;/li&gt;
&lt;li&gt;FPMPY&lt;/li&gt;
&lt;li&gt;Multi-precision support (also Complex - for RF)&lt;/li&gt;
&lt;li&gt;memory:&lt;ul&gt;
&lt;li&gt;double buffering&lt;/li&gt;
&lt;li&gt;dataflow&lt;/li&gt;
&lt;li&gt;streaming communication (DMA between memories)&lt;/li&gt;
&lt;li&gt;multicast support&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;integration into PL: AXI-Stream&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Vitis libraries (vision, finance, linear algebra, ...)&lt;/li&gt;
&lt;li&gt;XAPP1351 (multi-rate filter), XAPP1352 (beamforming), XAPP1356 (FFT)&lt;/li&gt;
&lt;li&gt;Vitis AI (PyTorch/TensorFlow/Caffe to FPGA/AI engines)&lt;/li&gt;
&lt;li&gt;second gen AI Engine&lt;ul&gt;
&lt;li&gt;use case: ADAS, robotics, media&lt;/li&gt;
&lt;li&gt;in &lt;strong&gt;Versal AI Edge&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;bfloat16&lt;/li&gt;
&lt;li&gt;matrix * matrix multiply [(A0 x B0) + (A1 x B1) + ... ]&lt;/li&gt;
&lt;li&gt;data multicast (e.g. weights for NN)&lt;/li&gt;
&lt;li&gt;PCIe gen 5&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;high level AIE API (independent of underlying AIE)&lt;/li&gt;
&lt;/ul&gt;
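&lt;p&gt;The "matrix * matrix multiply [(A0 x B0) + (A1 x B1) + ... ]" note above can be read as a blocked product: A is split into column slices, B into matching row slices, and the partial products are accumulated. The following is only a minimal pure-Python sketch of that arithmetic identity. The function names &lt;code&gt;matmul&lt;/code&gt; and &lt;code&gt;blocked_matmul&lt;/code&gt; and the &lt;code&gt;block&lt;/code&gt; parameter are made up for illustration; this is not the AI Engine API and says nothing about how the hardware schedules the operation.&lt;/p&gt;

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def blocked_matmul(a, b, block):
    """Compute a @ b as (A0 x B0) + (A1 x B1) + ... over the inner dimension.

    `block` is a hypothetical slice width chosen for illustration only.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    for start in range(0, inner, block):
        stop = min(start + block, inner)
        a_k = [row[start:stop] for row in a]  # column slice A_k of A
        b_k = b[start:stop]                   # matching row slice B_k of B
        partial = matmul(a_k, b_k)            # one (A_k x B_k) term
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += partial[i][j]    # accumulate the partial product
    return acc


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8]]
    b = [[1, 0], [0, 1], [2, 3], [4, 5]]
    print(blocked_matmul(a, b, block=2))
    print(matmul(a, b))
```

&lt;p&gt;Both calls print the same matrix, since summing the partial products over all slices of the inner dimension is equivalent to the full product.&lt;/p&gt;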
&lt;h1&gt;Day 2&lt;/h1&gt;
&lt;h2&gt;What's New in Vitis AI 1.4 and Vitis 2021.1 [George Wang (Xilinx)]&lt;/h2&gt;
&lt;h3&gt;Vitis&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;DPU&lt;/li&gt;
&lt;li&gt;libraries for AI engines: FIR, FFT, GEMM, vision&lt;/li&gt;
&lt;li&gt;GZIP, ZSTD library&lt;/li&gt;
&lt;li&gt;FIFO allocation with AI Engine&lt;/li&gt;
&lt;li&gt;x86 simulator for AIE&lt;/li&gt;
&lt;li&gt;Device Tree generator -&amp;gt; ZOCL node&lt;/li&gt;
&lt;li&gt;Vitis HLS: Flow Navigator&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Vitis AI 1.4&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;support for 2 Versals and Kria&lt;/li&gt;
&lt;li&gt;108 AI models in total in AI Model Zoo&lt;/li&gt;
&lt;li&gt;lidar, radar applications&lt;/li&gt;
&lt;li&gt;quantization-aware training, automatic network pruning&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Introduction to Kria System on Module [Karan Kantharia (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Xilinx idea: use SoM in final products&lt;/li&gt;
&lt;li&gt;vision market (becoming more fragmented): security camera, object classification, medical, AR/VR, emotion&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;KRIA K26 SOM&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Zynq (ARM + FPGA)&lt;/li&gt;
&lt;li&gt;several interfaces: LVDS, USB, MIPI, Ethernet, HDMI, DisplayPort, ...&lt;/li&gt;
&lt;li&gt;Xilinx idea: no more RTL/HW design -&amp;gt; up to 9 months faster Time to Market&lt;/li&gt;
&lt;li&gt;"no FPGA experience required"&lt;/li&gt;
&lt;li&gt;three options:&lt;ul&gt;
&lt;li&gt;for AI Developer: use AI model&lt;/li&gt;
&lt;li&gt;for SW developer: use Vitis&lt;/li&gt;
&lt;li&gt;for HW developer: use Vivado ML&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Yocto and Ubuntu supported&lt;/li&gt;
&lt;li&gt;FCC, ... certified&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Workshop: Vitis AI 101: End-to-End Model Deployment with Vitis AI [Fan Zhang (Xilinx)]&lt;/h2&gt;
&lt;p&gt;https://github.com/fanz-xlnx/Adapt_Workshop_VAI101&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;ubuntu&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="o"&gt;-***:~&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt; &lt;span class="n"&gt;lspci&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;00.0&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; &lt;span class="nl"&gt;bridge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Intel&lt;/span&gt; &lt;span class="n"&gt;Corporation&lt;/span&gt; &lt;span class="mf"&gt;440F&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;82441F&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="n"&gt;PMC&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Natoma&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="mo"&gt;02&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;01.0&lt;/span&gt; &lt;span class="n"&gt;ISA&lt;/span&gt; &lt;span class="nl"&gt;bridge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Intel&lt;/span&gt; &lt;span class="n"&gt;Corporation&lt;/span&gt; &lt;span class="mi"&gt;82371&lt;/span&gt;&lt;span class="n"&gt;SB&lt;/span&gt; &lt;span class="n"&gt;PIIX3&lt;/span&gt; &lt;span class="n"&gt;ISA&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Natoma&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Triton&lt;/span&gt; &lt;span class="n"&gt;II&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;01.1&lt;/span&gt; &lt;span class="n"&gt;IDE&lt;/span&gt; &lt;span class="nl"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Intel&lt;/span&gt; &lt;span class="n"&gt;Corporation&lt;/span&gt; &lt;span class="mi"&gt;82371&lt;/span&gt;&lt;span class="n"&gt;SB&lt;/span&gt; &lt;span class="n"&gt;PIIX3&lt;/span&gt; &lt;span class="n"&gt;IDE&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Natoma&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Triton&lt;/span&gt; &lt;span class="n"&gt;II&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;01.3&lt;/span&gt; &lt;span class="nl"&gt;Bridge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Intel&lt;/span&gt; &lt;span class="n"&gt;Corporation&lt;/span&gt; &lt;span class="mi"&gt;82371&lt;/span&gt;&lt;span class="n"&gt;AB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;EB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;MB&lt;/span&gt; &lt;span class="n"&gt;PIIX4&lt;/span&gt; &lt;span class="n"&gt;ACPI&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="mo"&gt;01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;02.0&lt;/span&gt; &lt;span class="n"&gt;VGA&lt;/span&gt; &lt;span class="n"&gt;compatible&lt;/span&gt; &lt;span class="nl"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Cirrus&lt;/span&gt; &lt;span class="n"&gt;Logic&lt;/span&gt; &lt;span class="n"&gt;GD&lt;/span&gt; &lt;span class="mi"&gt;5446&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;03.0&lt;/span&gt; &lt;span class="n"&gt;Ethernet&lt;/span&gt; &lt;span class="nl"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Amazon&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Elastic&lt;/span&gt; &lt;span class="n"&gt;Network&lt;/span&gt; &lt;span class="n"&gt;Adapter&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ENA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="mf"&gt;.0&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt; &lt;span class="nl"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NVIDIA&lt;/span&gt; &lt;span class="n"&gt;Corporation&lt;/span&gt; &lt;span class="n"&gt;GK210GL&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tesla&lt;/span&gt; &lt;span class="n"&gt;K80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mo"&gt;00&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1f.0&lt;/span&gt; &lt;span class="n"&gt;Unassigned&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ff80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;XenSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Xen&lt;/span&gt; &lt;span class="n"&gt;Platform&lt;/span&gt; &lt;span class="n"&gt;Device&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="mo"&gt;01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;ubuntu&lt;/span&gt;&lt;span class="nv"&gt;@ip&lt;/span&gt;&lt;span class="o"&gt;-***&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;nvidia&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;smi&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;Wed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Sep&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------------------------------------------------------------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NVIDIA&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;SMI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;470.57.02&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Driver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;470.57.02&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;CUDA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;11.4&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="c1"&gt;-------------------------------+----------------------+----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;Persistence&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Bus&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;Disp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Volatile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Uncorr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ECC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Fan&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Temp&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Perf&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;Pwr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="k"&gt;Usage&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Cap&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;Usage&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Util&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;Compute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                      &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="n"&gt;MIG&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|===============================+======================+======================|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Tesla&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;K80&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="k"&gt;On&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="mf"&gt;.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;Off&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;P0&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;149&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;4188&lt;/span&gt;&lt;span class="n"&gt;MiB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11441&lt;/span&gt;&lt;span class="n"&gt;MiB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;94&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;Default&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                      &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-------------------------------+----------------------+----------------------+&lt;/span&gt;

&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------------------------------------------------------------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Processes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                                                                  &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;GI&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;CI&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;PID&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="w"&gt;                                                   &lt;/span&gt;&lt;span class="k"&gt;Usage&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|=============================================================================|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;19274&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="mi"&gt;4185&lt;/span&gt;&lt;span class="n"&gt;MiB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------------------------------------------------------------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;&lt;em&gt;IoU&lt;/em&gt; = Intersection over Union = Area of Overlap / Area of Union&lt;/li&gt;
&lt;li&gt;XIR format&lt;/li&gt;
&lt;li&gt;KV260 (KRIA)&lt;/li&gt;
&lt;/ul&gt;
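&lt;p&gt;The IoU definition above can be sketched in a few lines of Python. This is only an illustration for two axis-aligned boxes given as &lt;code&gt;(x1, y1, x2, y2)&lt;/code&gt;; the function name and box layout are my assumptions and not part of the Vitis AI tooling. For a segmentation model the same ratio would presumably be computed over pixel sets instead of boxes.&lt;/p&gt;

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clamped to zero when the boxes are disjoint.
    overlap_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    overlap_h = max(0, min(ay2, by2) - max(ay1, by1))
    overlap = overlap_w * overlap_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Area of union = sum of both areas minus the doubly counted overlap.
    union = area_a + area_b - overlap
    return overlap / union if union else 0.0


# Two 2x2 boxes sharing a 1x1 corner: overlap 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```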
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;xcompiler -t DPUCZDX8G_ISA0_B4096_MAX_BG2 -i quantize_result/ENet_int.xmodel -o compilation_results/KV260/ENet_cityscapes_pt/ENet_cityscapes_pt.xmodel&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;The compiled xmodel&amp;#39;s md5sum is 4bf46f368e9ff2d51fea136a19270c75&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Day 3&lt;/h1&gt;
&lt;h2&gt;Expert Panel: Tips and Tricks&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;improvements in silicon&lt;ul&gt;
&lt;li&gt;"PL is what Xilinx is famous for"&lt;/li&gt;
&lt;li&gt;"end of Moore's law, end of Amdahl's law, end of Dennard scaling" --&amp;gt; more hardened IP (more like ASIC), e.g. DSP58&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;getting started guide for DFX (xilinx.com/vivado/dfx)&lt;/li&gt;
&lt;li&gt;HDL (VHDL support)&lt;ul&gt;
&lt;li&gt;simulation: working on the support for the VHDL-2008 (&lt;em&gt;it is 2021&lt;/em&gt;), based on feature requests&lt;/li&gt;
&lt;li&gt;simulation: "current focus on SystemVerilog"&lt;/li&gt;
&lt;li&gt;synthesis: "at the advanced level"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;open-source tools (providing the bitstream information)&lt;ul&gt;
&lt;li&gt;"secret sauce"&lt;/li&gt;
&lt;li&gt;"hacking protection"&lt;/li&gt;
&lt;li&gt;open-source front-end for Vivado and Vitis&lt;/li&gt;
&lt;li&gt;"leave bitstream generation to experts"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;ML in Vivado&lt;/li&gt;
&lt;li&gt;congestion when 70% CLBs utilized&lt;/li&gt;
&lt;li&gt;revision control&lt;/li&gt;
&lt;li&gt;RTL workflow&lt;ul&gt;
&lt;li&gt;a couple of features require IPI (e.g. CIPS, NoC)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;porting software to AI Engines&lt;ul&gt;
&lt;li&gt;start from the C models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;RapidWright&lt;/li&gt;
&lt;li&gt;QEMU&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Team-Based Collaborative Features in IP Integrator&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;UG994&lt;/li&gt;
&lt;li&gt;Block Design Container&lt;ul&gt;
&lt;li&gt;top-down workflow:&lt;ol&gt;
&lt;li&gt;create hierarchy&lt;/li&gt;
&lt;li&gt;validate&lt;/li&gt;
&lt;li&gt;&lt;em&gt;was distracted&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;e.g. debug vs no-debug version&lt;/li&gt;
&lt;li&gt;DFX flow&lt;/li&gt;
&lt;li&gt;Inter-NOC input&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Vitis HLS for High-Performance Kernels&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;v++&lt;/li&gt;
&lt;li&gt;optimizations&lt;ul&gt;
&lt;li&gt;pipeline (&lt;code&gt;II&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;SIMD&lt;/li&gt;
&lt;li&gt;dataflow (task parallelism, handshaking)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;data types:&lt;ul&gt;
&lt;li&gt;arrays: AXI4 Memory Mapped&lt;/li&gt;
&lt;li&gt;scalar: AXI4-Lite&lt;/li&gt;
&lt;li&gt;stream: AXI4-Stream&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="cp"&gt;#pragma HLS UNROLL factor=N&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;CppCon 2019: Faster Code Through Parallelism on CPU and GPU&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RAM 1WnR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;#pragma HLS BIND STORAGE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;function call viewer&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Versal Architecture Solutions for PCIe and Cache Coherent Interconnect [Eric Crabill (Xilinx)]&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;in Versal&lt;ul&gt;
&lt;li&gt;CPM4 and CPM5 (gen 4 and gen 5)&lt;/li&gt;
&lt;li&gt;PL PCIE4 and PL PCIE5&lt;/li&gt;
&lt;li&gt;SRIOV&lt;/li&gt;
&lt;li&gt;integrated DMAs (QDMA and XDMA in hard IP)&lt;/li&gt;
&lt;li&gt;CCIX support&lt;/li&gt;
&lt;li&gt;connection to NoC&lt;/li&gt;
&lt;li&gt;CCIX to CHI bridge&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;CPM vs PL PCIE&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CPM - feature rich&lt;/li&gt;
&lt;li&gt;PL PCIE - migration from previous architectures&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;QDMA vs XDMA&lt;/h3&gt;
&lt;h3&gt;CCIX and CXL&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;heterogeneous computing&lt;ul&gt;
&lt;li&gt;CPU + GPU&lt;/li&gt;
&lt;li&gt;CPU + ACAP&lt;/li&gt;
&lt;li&gt;CPU + Smart NIC&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;classic PCIe = moving the data with DMA (SW-controlled)&lt;/li&gt;
&lt;li&gt;Cache coherence = "move the data without using a driver"&lt;/li&gt;
&lt;li&gt;CCIX = symmetrical (CPU and accelerators are peers)&lt;/li&gt;
&lt;li&gt;CXL = CPU is the owner, multiple protocols: &lt;code&gt;cxl.io&lt;/code&gt;, &lt;code&gt;cxl.mem&lt;/code&gt;, &lt;code&gt;cxl.cache&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Documentation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;PG347 (for CPM)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Day 6&lt;/h1&gt;
&lt;h2&gt;The Xilinx SN1000: Accelerate Your Cloud Data Centers for Scalability and Performance&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;evolution (according to Xilinx)&lt;ol&gt;
&lt;li&gt;"traditional NIC"&lt;/li&gt;
&lt;li&gt;"offload NIC"&lt;/li&gt;
&lt;li&gt;"Programmable SmartNIC"&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;specific requirements, needs to adapt to changing workloads&lt;/li&gt;
&lt;li&gt;Alveo SN1000 (PCIe gen3 x16, up to 16 A72)&lt;/li&gt;
&lt;li&gt;Vitis Networking Stack (P4), HLS (C, C++), RTL&lt;/li&gt;
&lt;li&gt;Architecture: plugins&lt;/li&gt;
&lt;li&gt;Virtio, &lt;code&gt;vhost-vdpa&lt;/code&gt;, VirtIO NET PF&lt;/li&gt;
&lt;li&gt;vDPA - virtual Data-Path Acceleration (control plane managed by host)&lt;/li&gt;
&lt;li&gt;example: NVME-over-fabric, Ceph, &lt;code&gt;virtio-blk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Azure Quantum Optimizing on FPGAs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;"Scaled Quantum Computing"&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Q#&lt;/code&gt;, Python SDK&lt;/li&gt;
&lt;li&gt;problem formulation&lt;/li&gt;
&lt;li&gt;PUBO (binary variables, 0 or 1), Ising (spin variables, -1 or +1)&lt;/li&gt;
&lt;li&gt;providers: Honeywell, IonQ, 1QBit&lt;/li&gt;
&lt;li&gt;example&lt;ul&gt;
&lt;li&gt;scheduling problem&lt;/li&gt;
&lt;li&gt;CPU runtime - 3 min 12 sec&lt;/li&gt;
&lt;li&gt;&lt;code&gt;from azure.quantum.optimization import SimulatedAnnealing&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;..., platform=HardwarePlatform.FPGA)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;FPGA runtime - 23 sec&lt;/li&gt;
&lt;li&gt;"10x speedup"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aka.ms/qsharp-blog&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
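&lt;p&gt;The two problem formulations noted above differ only in the variable domain, and the substitution s = 2x - 1 maps a binary variable x onto a spin s. The sketch below is plain Python and does not use the Azure Quantum SDK; the term layout (a dictionary from index tuples to coefficients) and the three-variable problem are made up for illustration. It expands each spin term into binary terms and checks by brute force that both forms give the same cost.&lt;/p&gt;

```python
from itertools import combinations, product
from math import prod


def evaluate(terms, values):
    """Cost of a polynomial given as {tuple_of_indices: coefficient}."""
    return sum(c * prod(values[i] for i in idx) for idx, c in terms.items())


def ising_to_pubo(ising_terms):
    """Rewrite spin terms (s = -1 or +1) as binary terms (x = 0 or 1) using s = 2x - 1."""
    pubo = {}
    for idx, c in ising_terms.items():
        # Expand the product of (2*x_i - 1) over every subset of the term's indices.
        for k in range(len(idx) + 1):
            for subset in combinations(idx, k):
                coeff = c * 2 ** k * (-1) ** (len(idx) - k)
                pubo[subset] = pubo.get(subset, 0) + coeff
    return pubo


# Hypothetical three-variable problem, made up for illustration.
ising = {(0, 1): 1.0, (1, 2): -2.0, (0,): 0.5}
pubo = ising_to_pubo(ising)

# Both forms give the same cost for every assignment.
for x in product((0, 1), repeat=3):
    s = [2 * xi - 1 for xi in x]
    assert evaluate(pubo, x) == evaluate(ising, s)
```

&lt;p&gt;The empty tuple in the result holds the constant offset that appears when the spin terms are expanded.&lt;/p&gt;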
&lt;h2&gt;FPGA-Accelerated Structured Query Language (FAStQL) for Azure Synapse Analytics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Apache Spark, Data Source v2 (DSv2) interface&lt;/li&gt;
&lt;li&gt;FPGA handles: parsing, filter, projection&lt;/li&gt;
&lt;li&gt;support for Decimal&lt;/li&gt;
&lt;li&gt;profiling: 80% on parsing, 20% on query&lt;/li&gt;
&lt;li&gt;plans for the future: compression/decompression, hash join, ...&lt;/li&gt;
&lt;li&gt;architecture: row scheduler, N row parsers, row combiners&lt;/li&gt;
&lt;li&gt;parsing: 6 - 7 GB/s&lt;/li&gt;
&lt;li&gt;filtering: stack-based processor (same arch: scheduler, N proc, output)&lt;/li&gt;
&lt;/ul&gt;
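&lt;p&gt;To make the "stack-based processor" idea from the filtering bullet more concrete, here is a minimal software sketch of how a filter predicate could be evaluated in postfix order against a parsed row. The opcode names, the program encoding, and the example rows are all hypothetical; the talk did not describe the actual instruction set of the FPGA implementation.&lt;/p&gt;

```python
import operator

# Hypothetical opcode set; the talk did not describe the real instruction set.
BINARY_OPS = {
    "gt": operator.gt,
    "eq": operator.eq,
    "and": operator.and_,
    "or": operator.or_,
}


def run_filter(program, row):
    """Evaluate a postfix filter program against one parsed row (a dict)."""
    stack = []
    for op, arg in program:
        if op == "col":        # push a column value from the row
            stack.append(row[arg])
        elif op == "const":    # push a literal
            stack.append(arg)
        else:                  # pop two operands, push the result
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(BINARY_OPS[op](lhs, rhs))
    return stack.pop()


# WHERE price > 100 AND country = 'DE', written in postfix order
program = [
    ("col", "price"), ("const", 100), ("gt", None),
    ("col", "country"), ("const", "DE"), ("eq", None),
    ("and", None),
]

rows = [
    {"price": 250, "country": "DE"},
    {"price": 50, "country": "DE"},
    {"price": 300, "country": "SI"},
]
selected = [row for row in rows if run_filter(program, row)]
print(selected)
```

&lt;p&gt;Each row is evaluated independently, which fits the "scheduler, N processors, output" arrangement mentioned in the notes: rows can be handed to any free evaluator.&lt;/p&gt;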
&lt;h2&gt;Improving Spark Storage Efficiency with NoLoad Transparent Compression&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;"Data Tsunami"&lt;/li&gt;
&lt;li&gt;computational storage&lt;/li&gt;
&lt;li&gt;CSP (Computational Storage Processor): computation, no persistent storage&lt;/li&gt;
&lt;li&gt;CSD (Computational Storage Drive): computation + persistent data storage (FPGA + SSD)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;NVMe computation&lt;/em&gt; - in the process of standardization, expected in 2022&lt;/li&gt;
&lt;li&gt;NoLoad®&lt;ul&gt;
&lt;li&gt;NVMe-compliant front-end (looks like an NVMe device to the OS)&lt;/li&gt;
&lt;li&gt;certified by UNH-IOL&lt;/li&gt;
&lt;li&gt;accelerators: compression, decompression&lt;/li&gt;
&lt;li&gt;compute: analytics, ML, AI&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;NVMe-oF, Peer-to-Peer&lt;/li&gt;
&lt;li&gt;Apache Spark (data size up to PB), NoLoadFS, ZLIB compression offload&lt;ul&gt;
&lt;li&gt;Cisco UCS&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Breaking the Bonds of CPU-Centric AI Inferencing with NeuReality&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;"Server-on-a-Chip"&lt;/li&gt;
&lt;li&gt;cost, complexity&lt;/li&gt;
&lt;li&gt;current state&lt;ul&gt;
&lt;li&gt;training pods (NVIDIA, GRAPHCORE, SambaNova, Cerebras)&lt;/li&gt;
&lt;li&gt;inference servers: Tenstorrent, untether.ai, NVIDIA, Groq, Qualcomm&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;issue with current systems (according to NeuReality):
    moving the data between NIC --&amp;gt; CPU --&amp;gt; DLA (Deep Learning Accelerator)&lt;/li&gt;
&lt;li&gt;Versal ACAP + unique IP&lt;/li&gt;
&lt;li&gt;Kubernetes-managed&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;div style="font-size: 80%;" &gt;
Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Vivado, Vitis, Versal, Zynq are trademarks of Xilinx in the United States and
other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
All trademarks and registered trademarks are the property of their respective owners.
&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category></entry><entry><title>USB-to-UART cable from an old ISDN modem</title><link href="www.j-marjanovic.io/usb-to-uart-cable-from-an-old-isdn-modem.html" rel="alternate"></link><published>2021-07-31T09:00:00+02:00</published><updated>2021-07-31T09:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-07-31:www.j-marjanovic.io/usb-to-uart-cable-from-an-old-isdn-modem.html</id><summary type="html">&lt;p&gt;A short intermezzo from all FPGA-related stuff, this time we deal with an
8051-based USB device. This blog post describes how I converted an ISDN modem
(with a USB connection) to a USB-to-UART cable.&lt;/p&gt;
&lt;p&gt;During my vacation at my parents' house, I wanted to access the UART on the
Ultra96 …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A short intermezzo from all FPGA-related stuff, this time we deal with an
8051-based USB device. This blog post describes how I converted an ISDN modem
(with a USB connection) to a USB-to-UART cable.&lt;/p&gt;
&lt;p&gt;During my vacation at my parents' house, I wanted to access the UART on the
Ultra96 board to investigate the Linux boot procedure. Surprisingly, I did not
manage to find a Raspberry Pi or anything else which can talk UART, but I found
a box with old electronics. Among old phones, computer motherboards, GPUs, and
other relics of the past I found a PCB with a USB and an RJ45 connector. My
sixth sense for electronics made me think that this is a good starting point for
a USB-to-UART cable; the device already has a USB, and there will likely be a
UART port somewhere. &lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: no half-sane person would go this way to implement a simple
USB-to-UART bridge. If talking UART from your computer is your principal
objective, just buy a cable, or use a Raspberry Pi.&lt;/em&gt;&lt;/p&gt;
&lt;h1&gt;Initial inspection&lt;/h1&gt;
&lt;h2&gt;USB&lt;/h2&gt;
&lt;p&gt;The first thing I did was plug the device into a computer, mainly to verify
that it was still somewhat alive. The following appeared in the &lt;code&gt;dmesg&lt;/code&gt;
output:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="m"&gt;[11816.274833] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; new full-speed USB device number 7 using xhci_hcd
&lt;span class="m"&gt;[11818.795110] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; New USB device found, idVendor=071d, idProduct=1000, bcdDevice= 0.3c
&lt;span class="m"&gt;[11818.795119] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; New USB device strings: Mfr=1, Product=2, SerialNumber=3
&lt;span class="m"&gt;[11818.795122] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; Product: Eicon DIVA USB
&lt;span class="m"&gt;[11818.795125] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; Manufacturer: Eicon Technology
&lt;span class="m"&gt;[11818.795128] &lt;/span&gt;&lt;span class="k"&gt;usb 1-4.1:&lt;/span&gt; SerialNumber: 0000001000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see that the USB works, and we also got the product name (&lt;code&gt;Eicon DIVA USB&lt;/code&gt;).
A quick search on the internet revealed that this device is an ISDN modem, which
matches what I observed on the board: an RJ45 socket and a date code of June 1999.&lt;/p&gt;
&lt;p&gt;There is even a web page dedicated to various ISDN cards, where the main
characteristics of this device are listed: &lt;a href="https://www.isdncards.com/eicon-diva-usb"&gt;ISDN Cards Central: Eicon Diva
USB&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;PCB overview&lt;/h2&gt;
&lt;p&gt;&lt;img alt="Front and rear of the PCB" src="www.j-marjanovic.io/images/2021_ez_usb/pcb_front_rear.jpg" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;There are several easily-identifiable components on the PCB:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://media.digikey.com/pdf/Data%20Sheets/Cypress%20PDFs/AN2131SC,QC,AN2135SC,36SC.pdf"&gt;Cypress Semiconductor
  AN2135SC&lt;/a&gt;
  "EZ-USB™" = an 8051-based microcontroller with a dedicated USB engine&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.renesas.com/eu/en/document/dst/qs32xl384-datasheet"&gt;Renesas IDTQS32XL384&lt;/a&gt; - 20-bit bus switch and level translator&lt;/li&gt;
&lt;li&gt;Siemens PSB2115 - ISDN PC Adapter Circuit (this is where the ISDN magic happens)&lt;/li&gt;
&lt;li&gt;VAC 5054x005 ISDN transformer&lt;/li&gt;
&lt;li&gt;RJ45 socket&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ww1.microchip.com/downloads/en/DeviceDoc/doc0336.pdf"&gt;Atmel AT24C32N&lt;/a&gt;
  32kbit I2C EEPROM (presumably to store manufacturer ID and maybe the program
  for the EZ-USB)&lt;/li&gt;
&lt;li&gt;some power-section related ICs&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;EZ-USB™&lt;/h1&gt;
&lt;p&gt;The main microcontroller on this board (AN2135SC) was designed specifically to
simplify the development of USB-based devices. The EZ-USB Technical Reference
Manual (TRM) is from May 2000, while the USB 1.1 standard was released in August
1998. The TRM goes to great lengths to explain the advantages of USB ("Plug and
Play") and also serves as an introduction to the protocol itself.&lt;/p&gt;
&lt;h2&gt;Resources&lt;/h2&gt;
&lt;h3&gt;EZ-USB on Linux&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.linux-usb.org/ezusb/"&gt;http://www.linux-usb.org/ezusb/&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At this writing, all that firmware is statically linked into the appropriate mini-driver.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Linux drivers&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/torvalds/linux/blob/v5.11/drivers/usb/misc/ezusb.c"&gt;https://github.com/torvalds/linux/blob/v5.11/drivers/usb/misc/ezusb.c&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/torvalds/linux/commit/8d733e26c076f47e7774c0e5baa74c9b1c01199a"&gt;https://github.com/torvalds/linux/commit/8d733e26c076f47e7774c0e5baa74c9b1c01199a&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;fxload&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; apt info fxload
&lt;span class="go"&gt;Package: fxload&lt;/span&gt;
&lt;span class="go"&gt;Version: 0.0.20081013-1ubuntu2&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;...&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;Description: Firmware download to EZ-USB devices&lt;/span&gt;
&lt;span class="go"&gt; This program is conveniently able to download firmware into FX and FX2&lt;/span&gt;
&lt;span class="go"&gt; ez-usb devices. It is intended to be invoked by hotplug scripts when&lt;/span&gt;
&lt;span class="go"&gt; the unprogrammed device appears on the bus.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Reverse-engineering the board&lt;/h1&gt;
&lt;p&gt;Since there are no BGA components on this board, and the main microcontroller
has only 44 pins, one can easily use a multimeter to reverse engineer the
connections between the most important components.&lt;/p&gt;
&lt;p&gt;I gathered the obtained knowledge in the schematics below:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Reverse-engineered schematics for EICON DIVA USB" src="www.j-marjanovic.io/images/2021_ez_usb/eicon_diva_schematics_partial.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Modifications&lt;/h1&gt;
&lt;p&gt;From the schematics it is clear that one can easily reach the UART port with
some small modifications to the board:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;remove U103 MOSFET&lt;/li&gt;
&lt;li&gt;connect LED to PC6 (rotate R210 90deg to disconnect one pad, connect the
  flying pad on the R210 to the uC)&lt;/li&gt;
&lt;li&gt;connect UART RX cable to PC0 (R208 pad)&lt;/li&gt;
&lt;li&gt;connect UART TX cable to PC1 (R210 pad)&lt;/li&gt;
&lt;li&gt;connect ground cable to GND&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Firmware&lt;/h1&gt;
&lt;p&gt;With hardware in place, it was time to write some firmware for the
microcontroller. I managed to find a project on GitHub, titled
&lt;a href="https://github.com/hansiglaser/ezusb-firmware"&gt;ezusb-firmware&lt;/a&gt;, which can serve
as a starting point for custom developments. It is licensed under GPLv2 or later
and uses &lt;a href="http://sdcc.sourceforge.net/"&gt;sdcc&lt;/a&gt; compiler.&lt;/p&gt;
&lt;h2&gt;Programming the microcontroller&lt;/h2&gt;
&lt;p&gt;After modifying the code to match the pinout of the (modified) board and
compiling it with &lt;code&gt;sdcc&lt;/code&gt;, a HEX file is produced. This file is then
downloaded to the board with the aforementioned &lt;code&gt;fxload&lt;/code&gt; utility:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt; sudo fxload -D /dev/bus/usb/001/008 -s /usr/share/usb/a3load.hex -I firmware.hex -t an21 -v
&lt;span class="go"&gt;microcontroller type: an21&lt;/span&gt;
&lt;span class="go"&gt;1st stage:  load 2nd stage loader&lt;/span&gt;
&lt;span class="go"&gt;open RAM hexfile image /usr/share/usb/a3load.hex&lt;/span&gt;
&lt;span class="go"&gt;stop CPU&lt;/span&gt;
&lt;span class="go"&gt;write on-chip, addr 0x0357 len   23 (0x0017)&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;...&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;write on-chip, addr 0x036e len   12 (0x000c)&lt;/span&gt;
&lt;span class="go"&gt;... WROTE: 775 bytes, 10 segments, avg 77&lt;/span&gt;
&lt;span class="go"&gt;reset CPU&lt;/span&gt;
&lt;span class="go"&gt;open RAM hexfile image firmware.hex&lt;/span&gt;
&lt;span class="go"&gt;2nd stage:  write external memory&lt;/span&gt;
&lt;span class="go"&gt;write external, addr 0x1b00 len   88 (0x0058)&lt;/span&gt;
&lt;span class="go"&gt;stop CPU&lt;/span&gt;
&lt;span class="go"&gt;2nd stage:  write on-chip memory&lt;/span&gt;
&lt;span class="go"&gt;write on-chip, addr 0x0000 len    4 (0x0004)&lt;/span&gt;
&lt;span class="go"&gt;write on-chip, addr 0x000b len    1 (0x0001)&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;...&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;write on-chip, addr 0x0862 len   32 (0x0020)&lt;/span&gt;
&lt;span class="go"&gt;... WROTE: 2378 bytes, 33 segments, avg 72&lt;/span&gt;
&lt;span class="go"&gt;reset CPU&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;With the proverbial LED blinking, I was able to confirm that this method of
programming the microcontroller works and that the firmware project works
correctly.&lt;/p&gt;
&lt;p&gt;I have continued the development by adding UART-related code and later
implemented a method to retrieve the UART buffer over the USB and to transmit
the data over the UART from the USB. The code can be found in
&lt;a href="https://github.com/j-marjanovic/ezusb-uart/tree/master/firmware"&gt;ezusb-uart/firmware&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Software&lt;/h1&gt;
&lt;p&gt;As the last step, I hacked together an example from the
&lt;a href="https://libusb.info/"&gt;libusb&lt;/a&gt; library and some code to manage the receive and
transmit buffers. Ideally, the terminal management (using the &lt;code&gt;termios&lt;/code&gt; API)
would be more elaborate, and one would use a dedicated library for
such a task. But since the goal was just to capture some data from the Ultra96,
the current implementation is sufficient.&lt;/p&gt;
&lt;p&gt;The code is available in &lt;a href="https://github.com/j-marjanovic/ezusb-uart/tree/master/software"&gt;ezusb-uart/software&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Shown in the screenshot below is the output of &lt;code&gt;ezuart&lt;/code&gt; utility when connected
to the Ultra96 board:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Software in action" src="www.j-marjanovic.io/images/2021_ez_usb/software_example.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;Without too much effort, I was able to convert this ISDN modem to a USB-to-UART
cable. Several factors contributed to this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the microcontroller on this board was specifically designed to facilitate
  the development of USB devices&lt;/li&gt;
&lt;li&gt;the firmware for this device is downloaded over the USB, allowing easy 
  modifications and development of custom firmware&lt;/li&gt;
&lt;li&gt;the PCB was relatively simple (no BGAs)&lt;/li&gt;
&lt;li&gt;a fantastic firmware project by Martin Schmoelzer and Johann Glaser as a
  skeleton for my development&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are a couple of limitations; the most jarring one is that this
microcontroller does not support 115200 baud operation, which makes it not
really suitable as a general-purpose UART cable. &lt;/p&gt;
&lt;p&gt;Nevertheless, I would classify this project as a success: I have managed to
convert an old piece of junk into a device that can provide some insight into
the boot procedure of the Zynq on the Ultra96 board.&lt;/p&gt;
&lt;p&gt;&lt;img alt="EZ-USB board connected to Ultra96" src="www.j-marjanovic.io/images/2021_ez_usb/ezusb_uart_connected_to_u96.jpg" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;EZ-USB® is a registered trademark of Cypress Semiconductor Corp. All other trademarks and registered
trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category></entry><entry><title>Notes from Chisel Community Conference China 2021</title><link href="www.j-marjanovic.io/notes-from-chisel-community-conference-china-2021.html" rel="alternate"></link><published>2021-06-26T04:00:00+02:00</published><updated>2021-06-26T04:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-06-26:www.j-marjanovic.io/notes-from-chisel-community-conference-china-2021.html</id><summary type="html">&lt;p&gt;Here are my notes from the Chisel Community Conference China 2021. The
conference took place on June 26th 2021, and was organized as a hybrid
conference, open to both on-site and remote participants (over Zoom). The
organizers have promised that the recorded talks will be made available online
in the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Here are my notes from the Chisel Community Conference China 2021. The
conference took place on June 26th 2021, and was organized as a hybrid
conference, open to both on-site and remote participants (over Zoom). The
organizers have promised that the recorded talks will be made available online
in the next couple of weeks.&lt;/p&gt;
&lt;p&gt;Since the conference was located in China it started in the early morning hours
for the participants from Europe. I decided that 4:00 is a good compromise
between my love for Chisel and my need for sleep; I have only missed the two
invited talks at the beginning. In general the conference was quite interesting,
and it was clear that Chisel is particularly suited for highly configurable
designs, e.g. processors and data-processing pipelines.&lt;/p&gt;
&lt;p&gt;Written in &lt;em&gt;italics&lt;/em&gt; are my comments.&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;[Invited Talk] Chisel breakdown 2&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;...&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;cloneType&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;override def cloneType&lt;/code&gt; --&amp;gt; autoclonetype2&lt;/li&gt;
&lt;li&gt;&lt;em&gt;why have I never needed this?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autoclonetype1&lt;/code&gt; - deprecated, was based on reflection&lt;/li&gt;
&lt;li&gt;generated by the compiler plugin&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;aspect phase&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;insert additional hardware, layout, verification, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Top 10 Common Misconception about Chisel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Institute of Computing Technology at CAS&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;in Chinese, I understood "Chisel", "DSL", and "okay"&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;RocketChip&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;RocketChip is complicated, several additional ... (config, the bus framework, register gen)&lt;/li&gt;
&lt;li&gt;"if you are new to Chisel, DO NOT read the source code of RocketChip"&lt;/li&gt;
&lt;li&gt;&lt;em&gt;note to self: go read the source code of RocketChip&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Verilog is more expressive than Chisel&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;the presenter argues that logic is fundamentally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;modules&lt;/li&gt;
&lt;li&gt;combinatorial logic&lt;/li&gt;
&lt;li&gt;registers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;technically this is all supported in Chisel&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;I am not sure if I would agree - I think there is still a place for Verilog for low-level stuff, just like some parts of the code (e.g. in the Linux kernel) are still written in assembly&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Chisel compile errors&lt;/h2&gt;
&lt;p&gt;types of errors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scala compile errors&lt;/li&gt;
&lt;li&gt;Scala run-time error&lt;/li&gt;
&lt;li&gt;Chisel build error&lt;/li&gt;
&lt;li&gt;FIRRTL transform error&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Circuit simulation error&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;distinction between fault (uninitialized variable), error (returning a garbage value) and failure (segmentation fault)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;how are these different?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;the presenter argues that Chisel has a stricter type system than Verilog&lt;/li&gt;
&lt;li&gt;shouldn't Chisel be compared to SystemVerilog for a fair comparison? (also &lt;code&gt;default_nettype none&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;"hidden fault" -&amp;gt; "observable failure"&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;code&gt;Blackbox&lt;/code&gt; for (System)Verilog&lt;/h2&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Chisel Tester, Chisel Tester 2, UVM from Verilog&lt;/li&gt;
&lt;li&gt;"Agent Faker": TL-C UVM above Chisel Tester 2, open source&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;equality between Chisel and generated Verilog code&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;aka "the Chisel compiler is not formally verified"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;very complex task and unnecessary, one can run tests also on the generated Verilog&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;known-good&lt;/em&gt; --&amp;gt; successful Chisel projects: RocketChip, BOOM, lowRISC, NutShell, Labeled RISC-V, XiangShan&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Quality of Results for Chisel&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;misconception: Java slower than C --&amp;gt; Chisel hardware slower than Verilog hardware&lt;/li&gt;
&lt;li&gt;&lt;em&gt;OK, I experienced this first hand - my colleague was asking me what the fmax is for a typical Chisel-generated logic&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;will he also mention that Chisel is not an HLS?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Chisel is not HLS&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;I predict the future&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;advanced features in Chisel/Scala (&lt;em&gt;I managed to understand "map", "mapper"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PPA: Power-Performance-Area&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;generated core readability&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;comments in Verilog&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EmbeddedTLB&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;wasn't there a patch to improve readability of the generated code - check previous CCC&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;high-performance circuits in Chisel&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/OpenXiangShan/XiangShan"&gt;https://github.com/OpenXiangShan/XiangShan&lt;/a&gt;, &lt;a href="https://openxiangshan.github.io/"&gt;https://openxiangshan.github.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;looks impressive&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Introducing Decoder Generation API to Chisel&lt;/h1&gt;
&lt;h2&gt;example&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;7 segment LED&lt;/li&gt;
&lt;li&gt;input: b0 - b4, output: a - g, k for cathode&lt;/li&gt;
&lt;li&gt;non-valid states = (default, don't care)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;theory&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;AND plane, OR plane  (&lt;em&gt;this looks like PAL/GAL, right?&lt;/em&gt;, &lt;em&gt;is this really relevant for modern LUT-based FPGAs and ASICs?&lt;/em&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;now a full example of the 7-segment decoder implemented in PLA&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;logic optimization (definition from Wikipedia)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Quine-McCluskey&lt;/li&gt;
&lt;li&gt;Espresso&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;sparser PLA&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Chisel utils&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;experimental.decode.TruthTable&lt;/code&gt;, &lt;code&gt;DecodeTableAnnotation&lt;/code&gt;, &lt;code&gt;decoder&lt;/code&gt;,
  &lt;code&gt;QMCMinimizer&lt;/code&gt;, &lt;code&gt;EspressoMinimizer&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;how does this affect later stages (e.g. optimization during the synthesis)&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;nice Scala feature - "list unpacking" - &lt;code&gt;val a :: b :: [...] = x.toBools()&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
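&lt;p&gt;A minimal sketch of the "list unpacking" pattern in plain Scala 2. The list of Booleans is an assumed stand-in for the bits of a signal (in Chisel one would first need a &lt;code&gt;List&lt;/code&gt; of &lt;code&gt;Bool&lt;/code&gt;s), and the object and value names are made up for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;object ListUnpacking extends App {
  // Stand-in for the individual bits of a 4-bit value (assumption,
  // not taken from the talk).
  val bits: List[Boolean] = List(true, false, true, true)

  // The :: pattern binds the first two elements and the remaining tail.
  val a :: b :: rest = bits

  // prints: a=true b=false rest=List(true, true)
  println(s"a=$a b=$b rest=$rest")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The pattern throws a &lt;code&gt;MatchError&lt;/code&gt; at run time if the list has fewer elements than the pattern names, so it is best used where the width is known.&lt;/p&gt;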
&lt;hr&gt;
&lt;h1&gt;Practice of High-performance Chip Agile Development with Chisel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;XiangShan CPU&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;agile development (iterative)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;SystemVerilog &lt;code&gt;interface&lt;/code&gt;s, Chisel &lt;code&gt;Bundle&lt;/code&gt;s&lt;/li&gt;
&lt;li&gt;processor parameters&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ultra - apparently the highest-end impl of the processor&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Chisel solution - &lt;code&gt;Vec&lt;/code&gt; in a &lt;code&gt;Bundle&lt;/code&gt; as an I/O&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;FIRRTL transform for &lt;code&gt;printf&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;configurability&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;"Chisel = syntactic sugar for Verilog"&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;link: wallace tree multiplier, CCCC 2021&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;recursion is allowed in Chisel, the generated Verilog code does not include recursion&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/OpenXiangShan/XiangShan/pull/812"&gt;https://github.com/OpenXiangShan/XiangShan/pull/812&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;distinction between Chisel, Scala&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Wire, Reg - straightforward to understand&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;advanced Scala features: object, abstract class, trait&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;advice: start with Chisel, learn Scala later, &lt;em&gt;I would not agree, learn Scala first&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;co-simulation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scala-based modules in Chisel Test2 cannot be used in other tools&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;behavioral models to replace actual Chisel modules&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;assertion generation in Chisel&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;generated names can change between Chisel versions, and can cause problems for physical design&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Chisel is an advanced HDL, not HLS&lt;/li&gt;
&lt;li&gt;parametrization&lt;/li&gt;
&lt;li&gt;does not affect PPA (vs Verilog)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Revisiting Diplomacy&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;not an HLS, HCL = HW Construction Language&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;configuration&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RocketChip&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;an example of a configuration with Verilog: &lt;a href="https://github.com/riscv-mcu/e203_hbirdv2/blob/master/rtl/e203/core/config.v"&gt;https://github.com/riscv-mcu/e203_hbirdv2/blob/master/rtl/e203/core/config.v&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Verilog defines = "that is a piece of garbage" :D&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;defines&lt;/code&gt; are handled by a preprocessor&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;no easy way to provide a validation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;diplomacy: parameter negotiation framework&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;CDE (Context-Dependent Environments)&lt;/li&gt;
&lt;li&gt;Scala &lt;code&gt;implicit&lt;/code&gt;s&lt;/li&gt;
&lt;li&gt;Scala type inference and type checking&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;API&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;user API (&lt;code&gt;extends Config&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;design API (&lt;code&gt;trait&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;parameters&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;hierarchy&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;topology&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;global definition = bad&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Parameter&lt;/code&gt;, &lt;code&gt;LazyModule&lt;/code&gt;, passed implicitly&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;topology parameters&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;interfaces as DAG (Directed Acyclic Graph)&lt;/li&gt;
&lt;li&gt;acyclic: what does a DMA look like?&lt;/li&gt;
&lt;li&gt;interfaces: AXI, TileLink, ...&lt;/li&gt;
&lt;li&gt;2-phase elaboration&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Diplomacy refactor -&amp;gt; stand-alone library&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;(plans for) TileLink, AXI, ACE, CHI, WishBone&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RocketChip newbies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;TileLink implementation&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Using particle swarm optimization to reduce verification time&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;prerecorded intro, &lt;em&gt;issues with audio&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Particle Swarm Optimization&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;v_i - particle velocity&lt;/li&gt;
&lt;li&gt;C - learning factor&lt;/li&gt;
&lt;li&gt;global best&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CDMA - some kind of a DMA, apparently &lt;br/&gt;
MCIF - presumably memory controller interface&lt;/p&gt;
&lt;p&gt;1 channel with weights &lt;br/&gt;
3 data channels: IMG, WG, DC&lt;/p&gt;
&lt;p&gt;data word: 78*3 bits&lt;/p&gt;
&lt;p&gt;process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;generate random solution&lt;/li&gt;
&lt;li&gt;calculate the fitness&lt;/li&gt;
&lt;li&gt;update the particle speed and position&lt;/li&gt;
&lt;li&gt;end if a high-quality stimulus was found, else goto 1&lt;/li&gt;
&lt;/ol&gt;
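&lt;p&gt;The process above can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: the fitness is a placeholder function (the talk used coverage), and the inertia weight, learning factors, swarm size and iteration count are made-up values:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import scala.util.Random

object PsoSketch extends App {
  val rng = new Random(42)
  val dims = 2
  val numParticles = 20
  val w = 0.7   // inertia weight (assumed value)
  val c1 = 1.5  // learning factor towards the particle's own best (assumed)
  val c2 = 1.5  // learning factor towards the global best (assumed)

  // Placeholder fitness, lower is better; the talk used coverage instead.
  def fitness(x: Array[Double]): Double = x.map(v =&amp;gt; v * v).sum

  // generate random solutions
  val pos = Array.fill(numParticles, dims)(rng.nextDouble() * 10 - 5)
  val vel = Array.fill(numParticles, dims)(0.0)
  val pBest = pos.map(_.clone())
  var gBest = pBest.reduce((a, b) =&amp;gt; if (fitness(a) &amp;gt; fitness(b)) b else a).clone()

  for (_ &amp;lt;- 0 until 100; i &amp;lt;- 0 until numParticles) {
    // update the particle speed and position
    for (d &amp;lt;- 0 until dims) {
      vel(i)(d) = w * vel(i)(d) +
        c1 * rng.nextDouble() * (pBest(i)(d) - pos(i)(d)) +
        c2 * rng.nextDouble() * (gBest(d) - pos(i)(d))
      pos(i)(d) += vel(i)(d)
    }
    // calculate the fitness and remember the best positions seen so far
    if (fitness(pBest(i)) &amp;gt; fitness(pos(i))) pBest(i) = pos(i).clone()
    if (fitness(gBest) &amp;gt; fitness(pBest(i))) gBest = pBest(i).clone()
  }

  println(f"best fitness: ${fitness(gBest)}%.6f")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In a verification setting the position vector would presumably encode the stimulus, and the loop would stop once the fitness is good enough instead of running for a fixed number of iterations.&lt;/p&gt;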
&lt;hr&gt;
&lt;h1&gt;A general method of generating stimulus based on SVM&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;verification consumes a lot of resources&lt;/li&gt;
&lt;li&gt;using SVM to &lt;strong&gt;predict&lt;/strong&gt; coverage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;SVM - support vector machine (binary classification)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;using Matlab -&amp;gt; different types, different functions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Convolution Pipeline:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;CDMA, CBUF, CSC, CMAC, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CSC - convolution sequence controller = gets the data from DMA, loads/schedules it into MAC&lt;/p&gt;
&lt;p&gt;nvdla-csc&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SG - sequence generator&lt;/li&gt;
&lt;li&gt;DL - data loader&lt;/li&gt;
&lt;li&gt;WL - weight loader&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;training&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;data, labels = coverage data&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;process:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;training&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;random stimuli&lt;/li&gt;
&lt;li&gt;get the coverage from VCS&lt;/li&gt;
&lt;li&gt;use this to create training set for SVM&lt;/li&gt;
&lt;li&gt;train SVM&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;an arbitrary value is chosen as a threshold between the labels&lt;/p&gt;
&lt;p&gt;&lt;em&gt;couldn't this be done better with Reinforcement Learning? and is the SVM the right tool to use?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Q: relationship between the project and Chisel? &lt;br/&gt;
A: &lt;em&gt;answer in Chinese&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Summary of Problems and Experiences during the Processor Development based on Chisel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;short intro (Chisel cheatsheet, Programming in Scala, software thinking, hardware thinking)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;val&lt;/code&gt; vs &lt;code&gt;def&lt;/code&gt; (&lt;em&gt;this could be nasty to debug, &lt;code&gt;def&lt;/code&gt; creates a new instance on each call&lt;/em&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Cat&lt;/code&gt; vs &lt;code&gt;Map&lt;/code&gt;, hardware thinking vs software thinking --&amp;gt; Cat starts at MSB, Map starts at LSB&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;width - in &lt;code&gt;Cat&lt;/code&gt; it should be explicitly defined&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;careful with &lt;code&gt;DontCare&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
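&lt;p&gt;A small plain-Scala sketch of the &lt;code&gt;val&lt;/code&gt; vs &lt;code&gt;def&lt;/code&gt; pitfall. The function &lt;code&gt;makeNode&lt;/code&gt; is a hypothetical stand-in for anything that constructs hardware (e.g. a register); it just counts how often it is called:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;object ValVsDef extends App {
  var created = 0

  // Hypothetical stand-in for a hardware constructor; returns the
  // running number of the node it has just "created".
  def makeNode(): Int = { created += 1; created }

  val asVal = makeNode()   // evaluated once, right here
  def asDef = makeNode()   // evaluated again on every reference

  val x = asVal + asVal    // reuses the same node: 1 + 1
  val y = asDef + asDef    // creates two more nodes: 2 + 3

  // prints: x=2 y=5 created=3
  println(s"x=$x y=$y created=$created")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With real hardware constructors the second form would silently instantiate additional registers or wires, which is why it can be nasty to debug.&lt;/p&gt;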
&lt;p&gt;&lt;em&gt;Syntactic Salt - I like this very much&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;competitive assignment (loop index)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;developer perspective&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;"Chisel = excelent HDL, free devs from dirty work"&lt;/li&gt;
&lt;li&gt;"be familiar with Scala before using Chisel"&lt;/li&gt;
&lt;li&gt;check the generated verilog&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;I liked this talk, the examples were relevant&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Q: a question about the competitive assignment (in Chinese)&lt;/p&gt;
&lt;p&gt;Q: &lt;em&gt;I understood TPU, SiFive and xie xie&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Stimuli Generation by Constrained Markov Chain Monte Carlo Simulation for Chisel-based Deep Learning Accelerator Verification Platform&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;was this title also generated with a Markov Chain?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;nvdla.org&lt;/p&gt;
&lt;h2&gt;Direct Convolution&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;input: CSC, output CACC&lt;/li&gt;
&lt;li&gt;each MAC cell: 64 multipliers&lt;/li&gt;
&lt;li&gt;pipelined status&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Constrained Random Testing&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Markov-Chain Monte-Carlo (MCMC)&lt;/li&gt;
&lt;li&gt;Monte-Carlo: draw independent samples from the distribution&lt;/li&gt;
&lt;li&gt;Markov-Chain: the current value is probabilistically dependent on the previous value&lt;/li&gt;
&lt;li&gt;Metropolis-Hastings Algorithm (proposal, an acceptance of the proposal)&lt;/li&gt;
&lt;/ul&gt;
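&lt;p&gt;A minimal sketch of the Metropolis-Hastings proposal/acceptance loop in plain Scala. The target density (a standard normal), the random-walk proposal and the sample count are assumptions chosen for illustration and have nothing to do with the actual stimulus constraints from the talk:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import scala.util.Random

object MetropolisHastingsSketch extends App {
  val rng = new Random(1)

  // Unnormalised target density; a standard normal is assumed here.
  def target(x: Double): Double = math.exp(-0.5 * x * x)

  var x = 0.0
  val samples = Array.fill(50000) {
    // proposal: a symmetric random-walk step from the current value
    val candidate = x + rng.nextGaussian()
    // acceptance: keep the candidate with probability min(1, p(new) / p(old))
    if (target(candidate) &amp;gt; rng.nextDouble() * target(x)) x = candidate
    x
  }

  val mean = samples.sum / samples.length
  val variance = samples.map(s =&amp;gt; (s - mean) * (s - mean)).sum / samples.length
  println(f"mean = $mean%.2f, variance = $variance%.2f")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For this target the printed mean and variance should come out close to 0 and 1. Each sample depends on the previous one, which is the Markov-chain part of the name.&lt;/p&gt;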
&lt;h2&gt;implementation&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;use a pool of states to generate a low-correlation stimulus&lt;/li&gt;
&lt;li&gt;MCMC-based fault location&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;very precise and academic presentation, unclear how it relates to Chisel and
how this method can be used in practice&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Implementation of a Highly Configurable Wallace Tree Multiplier with Chisel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;recursion for Wallace tree compression&lt;/li&gt;
&lt;li&gt;only 120 lines of Chisel code&lt;/li&gt;
&lt;li&gt;configurable pipelining&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;the algorithm&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;booth-4 encoding&lt;/li&gt;
&lt;li&gt;n*n mul &amp;lt;-&amp;gt; n/2 partial products&lt;/li&gt;
&lt;li&gt;sign-extend (&lt;code&gt;i match [...]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;tree-compression - columns represented as &lt;code&gt;Array[Seq[Bool]]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;code&gt;Seq&lt;/code&gt; vs &lt;code&gt;Array&lt;/code&gt;?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;what is the benefit of using &lt;code&gt;Array&lt;/code&gt; (from Java) vs &lt;code&gt;Vec&lt;/code&gt; (from Chisel)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;compress the whole tree (+ register insertion)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;summary: highly configurable, better scalability, easier to read&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Q: something about latency on the slide about pipelining&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Agile IC Design Team Working in Chisel, Empowered by Diplomacy and Config&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://www.streamcomputing.com/en/"&gt;https://www.streamcomputing.com/en/&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;RocketChip/BOOM&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qs when reading the code: &lt;code&gt;implicit&lt;/code&gt;, &lt;code&gt;:*=&lt;/code&gt; and &lt;code&gt;:=*&lt;/code&gt;, ...&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Diplomacy&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Scala framework&lt;/li&gt;
&lt;li&gt;negotiation/casting/modifying parameters&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LazyModuleImp&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Config&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Scala framework&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;definition of parameters globally&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Chisel =/= Diplomacy and Config&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Diplomacy and Config = pure Scala&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pros:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;suitable for SoCs&lt;/li&gt;
&lt;li&gt;availability of open-source IP&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;cons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;hard to fully understand&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;building a Chisel Team&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;firm commitment to Chisel&lt;/li&gt;
&lt;li&gt;no modifications of the generated Verilog&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;divide and conquer: dedicated experts for Diplomacy and Config&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Chisel basics (Bundle, Reg, Wire, ...)&lt;/li&gt;
&lt;li&gt;Diplomacy and Config (case class, high-order function, other advanced features)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;minor tricks: wrap important signals in modules, use CamelCase in Chisel&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Experience Sharing: Develop NutShell using Chisel&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://github.com/OSCPU/NutShell"&gt;https://github.com/OSCPU/NutShell&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NutShell (5 undergraduates in 4 months)&lt;/li&gt;
&lt;li&gt;SDRAM, SPI and UART&lt;/li&gt;
&lt;li&gt;boots Linux (Debian/Fedora)&lt;/li&gt;
&lt;li&gt;single-issue, in-order core&lt;/li&gt;
&lt;li&gt;RV64IMAC, Zifencei, Zicsr&lt;/li&gt;
&lt;li&gt;runs at 60 MHz on Zynq 7000, 200 MHz on Zynq US+, 350 MHz in 110 nm SMIC&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;example:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MaskedRegMap&lt;/code&gt; abstract class - address, read and write side effects&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply&lt;/code&gt; method, &lt;code&gt;generate&lt;/code&gt; method&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;peripheral devices&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;MMIO - AXI4 or AXI4Lite&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AXI4SlaveModule&lt;/code&gt; abstract class&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;suggestion&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;avoid Verilog-like Chisel&lt;/li&gt;
&lt;li&gt;software-like Chisel - careful with &lt;code&gt;var&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;find the balance&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Chisel Implementation Tutorial - in a lightweight ...&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;the title in the program was "Light Weight Chisel3 KnitKit"&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Data -&amp;gt; Bits -&amp;gt; Clock/Wire/... (&lt;em&gt;outdated diagram&lt;/em&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;RegInit&lt;/code&gt; with different types of resets&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;running through slides&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;io connection with &lt;code&gt;withClockAndReset&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/colin4124/Chisel-Implementation-Tutorial"&gt;https://github.com/colin4124/Chisel-Implementation-Tutorial&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;created 6 hours ago&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Q: something about &lt;code&gt;AutoBundle&lt;/code&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Use Firrtl Transform to Control the Effective Range of 'printf' in Large Scale Circuits&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;printf&lt;/code&gt; in Chisel can be translated into &lt;code&gt;fwrite&lt;/code&gt; in Verilog&lt;/li&gt;
&lt;li&gt;extensive printing will slow down the simulation&lt;/li&gt;
&lt;li&gt;typically only a small part of the code is inspected&lt;/li&gt;
&lt;li&gt;using FIRRTL transform&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;4 types of annotation:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;EnablePrintfAnnotation&lt;/code&gt;, &lt;code&gt;Disable..&lt;/code&gt;, &lt;code&gt;DisableAll..&lt;/code&gt;, &lt;code&gt;Remove Assert...&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;execute(c: CircuitState)&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;ancestor (hierarchy)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;firrtl.analyses.CircuitGraph&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Reinforcement Learning based Stimulus Generation for Chisel Module Verification&lt;/h1&gt;
&lt;h2&gt;Constrained Random Test&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;test generator&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CACC: convolution accumulator&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;assembly SRAM group, delivery SRAM group (banks of 64Bx32 SRAMs)&lt;/li&gt;
&lt;li&gt;assembly = accumulation with saturation&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Reinforcement Learning&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Agent&lt;/li&gt;
&lt;li&gt;Action&lt;/li&gt;
&lt;li&gt;Environment&lt;/li&gt;
&lt;li&gt;Reward&lt;/li&gt;
&lt;li&gt;State&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Q-learning (Q = quality), Bellman equation&lt;/p&gt;
&lt;p&gt;states = coverage, actions = stimulus &lt;br/&gt;
gamma = reward discount&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;agent is exploring the environment, Q-table is updated, Q is maximized&lt;/li&gt;
&lt;/ul&gt;
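&lt;p&gt;A minimal sketch of tabular Q-learning in plain Scala. The environment is an assumed toy example (a chain of five states with a reward at the end) and not the coverage-based setup from the talk; the learning rate and episode count are also made up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import scala.util.Random

object QLearningSketch extends App {
  val rng = new Random(0)
  val numStates = 5    // toy chain; the last state is the goal
  val numActions = 2   // 0 = step left, 1 = step right
  val alpha = 0.5      // learning rate (assumed value)
  val gamma = 0.9      // reward discount
  val q = Array.fill(numStates, numActions)(0.0)

  // Toy environment: returns (next state, reward).
  def step(s: Int, a: Int): (Int, Double) = {
    val next = if (a == 1) math.min(s + 1, numStates - 1) else math.max(s - 1, 0)
    (next, if (next == numStates - 1) 1.0 else 0.0)
  }

  for (_ &amp;lt;- 0 until 500) {
    var s = 0
    while (s != numStates - 1) {
      // Purely random exploration; Q-learning is off-policy, so the
      // table still converges towards the values of the greedy policy.
      val a = rng.nextInt(numActions)
      val (next, reward) = step(s, a)
      // Bellman update of the Q-table
      val bestNext = math.max(q(next)(0), q(next)(1))
      q(s)(a) += alpha * (reward + gamma * bestNext - q(s)(a))
      s = next
    }
  }

  q.zipWithIndex.foreach { case (row, s) =&amp;gt;
    println(f"state $s: left=${row(0)}%.3f right=${row(1)}%.3f")
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy chain the "right" values should approach 0.729, 0.81, 0.9 and 1.0 for states 0 to 3, i.e. the reward discounted by gamma for every step that is still needed.&lt;/p&gt;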
&lt;h2&gt;approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;extremely large input vector: 2 ** 131&lt;/li&gt;
&lt;li&gt;multiplexer: stimulus = 128 digits, select = 3 bits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;controller written in Python! (at a Chisel workshop)&lt;/em&gt; :)&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Genetic Algorithm based Stimulus Generation for Chisel Module Verification&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Single Point Data Processor = post-processor of NVDLA&lt;/li&gt;
&lt;li&gt;inputs: i32 x 16 from CACC&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Genetic Algorithm&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;population&lt;/li&gt;
&lt;li&gt;fitness calculation&lt;/li&gt;
&lt;li&gt;mating pool&lt;/li&gt;
&lt;li&gt;parents selection&lt;/li&gt;
&lt;li&gt;mating (crossover and mutation)&lt;/li&gt;
&lt;li&gt;offspring&lt;/li&gt;
&lt;li&gt;back to population&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;20 binary stimulus vectors&lt;/li&gt;
&lt;li&gt;fitness calculation - coverage from VCS&lt;/li&gt;
&lt;li&gt;mating: exchange bits between stimulus vectors, mutation: bit flips&lt;/li&gt;
&lt;/ul&gt;
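&lt;p&gt;The loop described above can be sketched in plain Scala. This is an illustration under stated assumptions: the fitness is a placeholder (number of set bits) where the talk used coverage from VCS, and the vector width, mutation rate, selection scheme and generation count are made up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import scala.util.Random

object GeneticSketch extends App {
  val rng = new Random(7)
  val popSize = 20   // 20 binary stimulus vectors
  val width = 32     // vector width (assumed)

  // Placeholder fitness: number of set bits (coverage in the talk).
  def fitness(v: Vector[Boolean]): Int = v.count(identity)

  var population = Vector.fill(popSize)(Vector.fill(width)(rng.nextBoolean()))

  for (_ &amp;lt;- 0 until 50) {
    // mating pool: the fitter half of the population
    val pool = population.sortBy(v =&amp;gt; -fitness(v)).take(popSize / 2)
    population = Vector.fill(popSize) {
      // parents selection: two random members of the mating pool
      val mother = pool(rng.nextInt(pool.length))
      val father = pool(rng.nextInt(pool.length))
      // crossover: take the bits before a random cut point from one
      // parent and the remaining bits from the other
      val cut = rng.nextInt(width)
      val child = mother.take(cut) ++ father.drop(cut)
      // mutation: flip each bit with a probability of about 1 %
      child.map(bit =&amp;gt; if (rng.nextDouble() &amp;gt; 0.99) !bit else bit)
    }
  }

  println(s"best fitness: ${population.map(fitness).max} of $width")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;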
&lt;p&gt;&lt;em&gt;again no mention of Chisel, no result presented&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Recurrent Neural Networks based Stimuli Generation for Chisel Module Verification&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;constrained random verification (PRG for randomization, constraints for stimuli)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Recurrent Neural Network&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;neuron: x * w - offset + threshold&lt;/li&gt;
&lt;li&gt;Hopfield Network (each neuron's output is connected to all other neurons but not to itself)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;PDP (Planar Data Processing)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;results of the pooling process&lt;/li&gt;
&lt;li&gt;outputs: max, min, average&lt;/li&gt;
&lt;li&gt;traverses width, height, channel&lt;/li&gt;
&lt;li&gt;maximum pooling (pipelined)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;inputs: control, status from the previous module, data payload&lt;/li&gt;
&lt;li&gt;output: status, and data&lt;/li&gt;
&lt;li&gt;Python based controller, simulation with VCS&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Quasar: SweRV-EL2 implemented in CHISEL&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Convert SweRV-EL2 from SystemVerilog to Chisel&lt;/li&gt;
&lt;li&gt;comparison of SystemVerilog to Chisel&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Quasar&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;4-stage, mostly in-order, RV32IMC, runs at 600 MHz at 16 nm&lt;/li&gt;
&lt;li&gt;development procedure: first unit tests, then comparison against SweRV-EL2&lt;/li&gt;
&lt;li&gt;LEC (&lt;em&gt;is this a formal check?&lt;/em&gt;) on the generated Verilog&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Pros/Cons&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;parameterizable &amp;amp; scalable code&lt;/li&gt;
&lt;li&gt;no linting problems&lt;/li&gt;
&lt;li&gt;people are reluctant about adopting Chisel&lt;/li&gt;
&lt;li&gt;"Chisel makes the verification more tedious"&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Analysis of the results&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;fMAX, area and power almost the same (2-3% difference)&lt;/li&gt;
&lt;li&gt;Chisel: 12 kLOC, SystemVerilog: 19 kLOC&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Roadmap&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;F-extension (will be open-sourced in the near future)&lt;/li&gt;
&lt;li&gt;vector extension&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/Lampro-Mellon/Chisel-Training"&gt;https://github.com/Lampro-Mellon/Chisel-Training&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;ChiselVerify: A Verification Framework for Chisel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;verification = testing before tape-out&lt;/li&gt;
&lt;li&gt;validation = testing after tape-out&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Current solutions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ChiselTest (not many functions), ScalaTest&lt;/li&gt;
&lt;li&gt;SystemVerilog, UVM (verbose, multiple languages)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;ChiselVerify&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;an extension to ChiselTest&lt;/li&gt;
&lt;li&gt;4 parts&lt;/li&gt;
&lt;li&gt;functional coverage&lt;/li&gt;
&lt;li&gt;constrained random verification&lt;/li&gt;
&lt;li&gt;bus functional model&lt;/li&gt;
&lt;li&gt;timed assertions&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;functional coverage&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;statement coverage / functional coverage&lt;/li&gt;
&lt;li&gt;verification plan &amp;lt;- cover groups &amp;lt;- cover points (Range, Conditions, Cross, Timed)&lt;/li&gt;
&lt;li&gt;coverage database&lt;/li&gt;
&lt;li&gt;coverage reporter&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="n"&gt;cr&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CoverageReporter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dut&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;cr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
      &lt;span class="nc"&gt;CoverPoints&lt;/span&gt;&lt;span class="o"&gt;(...),&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;cr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;printReport&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h3&gt;constrained random verification&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;constraint programmable language&lt;/li&gt;
&lt;li&gt;JaCoP as the constraint solver&lt;/li&gt;
&lt;li&gt;custom distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;bus functional models&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;AXI4 interface&lt;/li&gt;
&lt;li&gt;Transactions&lt;/li&gt;
&lt;li&gt;software abstraction&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;timed assertions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;types of delays:&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Exactly&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Eventually&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Always&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Never&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;"test Chisel designs in Scala"&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/chiselverify/chiselverify"&gt;https://github.com/chiselverify/chiselverify&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;Teaching Digital Design with Chisel&lt;/h1&gt;
&lt;p&gt;Q at CCC 2020: "Is Chisel ready for class?" &lt;br/&gt;
A: yes&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;two courses: Digital Electronics 1 &amp;amp; 2&lt;/li&gt;
&lt;li&gt;VHDL until 2019, DE1 still uses VHDL, DE2 uses Chisel&lt;/li&gt;
&lt;li&gt;"VHDL is dying a little bit"&lt;/li&gt;
&lt;li&gt;IntelliJ, sbt (&lt;code&gt;sbt run&lt;/code&gt;, &lt;code&gt;sbt test&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Digital Design with Chisel (&lt;a href="https://www.imm.dtu.dk/~masca/chisel-book.html"&gt;https://www.imm.dtu.dk/~masca/chisel-book.html&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;LOC(VHDL)/LOC(Chisel) ~ 2&lt;/li&gt;
&lt;li&gt;Simulation + GUI in Swing&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;Towards Agile Networking Hardware - Chisel at OVHcloud&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;OVHcloud = 1st European cloud provider&lt;/li&gt;
&lt;li&gt;22 Tbps bandwidth, DDoS&lt;/li&gt;
&lt;li&gt;anti-DDoS (scrubbing with FPGAs)&lt;/li&gt;
&lt;li&gt;attackers are agile, to respond&lt;/li&gt;
&lt;li&gt;HLS - reduces performance and agility&lt;/li&gt;
&lt;li&gt;does HCL improve agility without affecting performance?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;de-risking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;counter store (hash table with a cuckoo filter)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hal.archives-ouvertes.fr/hal-03157426"&gt;https://hal.archives-ouvertes.fr/hal-03157426&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;reset, async reset added recently&lt;/li&gt;
&lt;li&gt;with no reset the P&amp;amp;R results are better&lt;/li&gt;
&lt;li&gt;Pull Request for Preset&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;flow&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;limitation: parameters are embedded in names&lt;/li&gt;
&lt;li&gt;solution: a wrapper in Verilog for the Chisel-generated Verilog&lt;/li&gt;
&lt;li&gt;successful SV/Chisel cohabitation&lt;/li&gt;
&lt;li&gt;sv2chisel ("low level Chisel"), challenges: clock and reset retrieval, choosing types&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ovh/sv2chisel"&gt;https://github.com/ovh/sv2chisel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CocoTB instead of Chisel-testers&lt;/li&gt;
&lt;li&gt;Pipeline abstraction (PhD thesis, a DSL on top of Chisel)&lt;/li&gt;
&lt;/ul&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="FPGA"></category><category term="Scala"></category><category term="Chisel"></category></entry><entry><title>Performance counters in Cache Coherent Interconnect in Zynq MPSoC</title><link href="www.j-marjanovic.io/performance-counters-in-cache-coherent-interconnect-in-zynq-mpsoc.html" rel="alternate"></link><published>2021-06-12T20:00:00+02:00</published><updated>2021-06-12T20:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-06-12:www.j-marjanovic.io/performance-counters-in-cache-coherent-interconnect-in-zynq-mpsoc.html</id><summary type="html">&lt;p&gt;In this blog post I describe my tinkering with the interconnect architecture of
the Xilinx Zynq MPSoC. I specifically focus on the Performance Monitoring Unit
(PMU) integrated into the Cache Coherent Interconnect (CCI).&lt;/p&gt;
&lt;p&gt;I believe that most of the readers of my blog are already familiar with Xilinx
Zynq® UltraScale …&lt;/p&gt;</summary><content type="html">&lt;p&gt;In this blog post I describe my tinkering with the interconnect architecture of
the Xilinx Zynq MPSoC. I specifically focus on the Performance Monitoring Unit
(PMU) integrated into the Cache Coherent Interconnect (CCI).&lt;/p&gt;
&lt;p&gt;I believe that most of the readers of my blog are already familiar with Xilinx
Zynq® UltraScale+™ MPSoC, but for the sake of completeness let's do a quick
introduction. Zynq MPSoC is a programmable device, combining a quad-core ARM
Cortex-A53 (called Application Processing Unit (APU) in Xilinx-speak) and a
relatively large FPGA (called Programmable Logic (PL) in Xilinx-speak) in one
package. Sitting in between the two parts is an interconnect, more precisely
ARM® CoreLink™ CCI-400 Cache Coherent Interconnect. The majority of the
connections between the APU and PL go through this interconnect, which makes it
one of the more important parts of the Processing System (PS).&lt;/p&gt;
&lt;p&gt;In simple use cases the interconnect is mostly transparent for the users; from
the APU side, the memory transactions (i.e. reads and writes) come to the
interconnect and get routed to the appropriate output port (e.g. DDR controller
or PL manager ports - &lt;code&gt;M_AXI_HPMx_FPD&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The overview of the CCI is shown in the excerpt from the UG1085 below - the
CCI is shown prominently in its central location.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Excerpt from the UG1085 showing the interconnect" src="www.j-marjanovic.io/images/2021_cci_part_1/ug1085_ps_interconnect.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Linux driver for CCI PMU&lt;/h1&gt;
&lt;p&gt;Already provided in Linux is a driver for the Performance Monitoring Unit
(PMU) in the CCI. This driver is enabled with the &lt;code&gt;CONFIG_ARM_CCI_PMU&lt;/code&gt; variable,
and for Zynq MPSoC this option is by default already turned on.&lt;/p&gt;
&lt;p&gt;The driver prints a short message in the &lt;code&gt;dmesg&lt;/code&gt; to indicate that it was
successfully loaded:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; dmesg &lt;span class="p"&gt;|&lt;/span&gt; grep CCI
&lt;span class="go"&gt;[    3.218405] ARM CCI_400_r1 PMU driver probed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We can then use the &lt;a href="https://perf.wiki.kernel.org/index.php/Main_Page"&gt;perf&lt;/a&gt; command
to list all performance counters available in the system; the counters from the
CCI-400 are listed among them:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; perf list

&lt;span class="go"&gt;List of pre-defined events (to be used in -e):&lt;/span&gt;

&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;  CCI_400_r1/cycles/                                 [Kernel PMU event]&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;  CCI_400_r1/si_rrq_hs_any,source=?/                 [Kernel PMU event]&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;  CCI_400_r1/si_wrq_hs_any,source=?/                 [Kernel PMU event]&lt;/span&gt;
&lt;span class="go"&gt;[...]&lt;/span&gt;
&lt;span class="go"&gt;  CCI_400_r1/si_wrq_hs_write_unique,source=?/        [Kernel PMU event]&lt;/span&gt;
&lt;span class="go"&gt;  CCI_400_r1/si_wrq_stall_tt_full,source=?/          [Kernel PMU event]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Although at this stage the performance counters are already accessible from the
system, except for the &lt;code&gt;cycles&lt;/code&gt; counter no other counter is actually counting.&lt;/p&gt;
&lt;p&gt;It turns out that the CCI-400 IP has some external signals which can disable
the performance counters, even if the registers are properly configured.&lt;/p&gt;
&lt;p&gt;Presented below is a table from the &lt;a href="https://developer.arm.com/documentation/ddi0470/i/programmers-model/register-descriptions/performance-monitor-control-register--pmcr-?lang=en"&gt;CCI-400 Technical Reference
Manual&lt;/a&gt;
(&lt;a href="http://archive.today/2021.06.12-193819/https://developer.arm.com/documentation/ddi0470/i/programmers-model/register-descriptions/performance-monitor-control-register--pmcr-?lang=en"&gt;archive.today
link&lt;/a&gt;)
which shows that &lt;code&gt;NIDEN&lt;/code&gt; must be high for &lt;em&gt;Event counters&lt;/em&gt; to be
enabled.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Table describing the function of the NIDEN input" src="www.j-marjanovic.io/images/2021_cci_part_1/cci_manual.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Configuration&lt;/h1&gt;
&lt;p&gt;Looking at the register map for Zynq MPSoC in the
&lt;a href="https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html"&gt;UG1087&lt;/a&gt;
one can note that there are two modules associated with Cache Coherent
Interconnect.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Modules in the Zynq MPSoC register map" src="www.j-marjanovic.io/images/2021_cci_part_1/ug1087.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;In my opinion, the documentation from Xilinx is a little bit vague, but I presume
that the CCI_REG acts as a GPIO which then drives the debug inputs on the CCI
module. Shown in the figure below is the CCI module together with this auxiliary
module. This block diagram is based on my current understanding.&lt;/p&gt;
&lt;p&gt;&lt;img alt="CCI and debug signals" src="www.j-marjanovic.io/images/2021_cci_part_1/cci_reg_block_diag.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Accessing the configuration registers&lt;/h2&gt;
&lt;p&gt;With this in mind, one would be tempted to quickly change the value of the NIDEN
input directly from user space with the &lt;code&gt;devmem&lt;/code&gt; utility.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; devmem 0xFD5E0000
&lt;span class="go"&gt;Bus error&lt;/span&gt;
&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; devmem 0xFD5E0040
&lt;span class="go"&gt;Bus error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Here we encounter another issue, namely the CCI_REG is protected by the Xilinx
Memory Protection Unit (XMPU) and is only accessible from the secure
environment.&lt;/p&gt;
&lt;h2&gt;Exception Levels&lt;/h2&gt;
&lt;p&gt;There are &lt;a href="https://developer.arm.com/documentation/102412/0100/Privilege-and-Exception-levels"&gt;4 exception
levels&lt;/a&gt;
defined in the ARMv8 architecture.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The user space runs in EL0&lt;/li&gt;
&lt;li&gt;The kernel space runs in EL1&lt;/li&gt;
&lt;li&gt;Hypervisors run in EL2&lt;/li&gt;
&lt;li&gt;Firmware runs in EL3&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can get the current Exception Level by reading the &lt;a href="https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/CurrentEL--Current-Exception-Level"&gt;CurrentEL&lt;/a&gt; register.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In user space this instruction throws &lt;code&gt;Illegal instruction&lt;/code&gt; - this is expected&lt;/li&gt;
&lt;li&gt;In kernel space the reported level is 1: &lt;code&gt;[ 1091.821735] jan-level: EL = 1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;In FSBL the reported level is 3:  &lt;code&gt;EL = 3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
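&lt;p&gt;The raw value of &lt;code&gt;CurrentEL&lt;/code&gt; carries the level in bits [3:2], which is why the FSBL patch in the next section shifts the value right by two before printing it. A minimal sketch of that decoding in Python; the function name is hypothetical and the snippet only illustrates the bit layout, it is not code that runs on the target:&lt;/p&gt;

```python
def exception_level(current_el_raw):
    """Return the Exception Level encoded in a raw CurrentEL value."""
    # The level sits in bits [3:2]; after the shift, "% 4" keeps the two
    # lowest bits and drops anything above them.
    return (current_el_raw >> 2) % 4


# Raw values as they would be read at EL1, EL2 and EL3.
for raw in (0x4, 0x8, 0xC):
    print(hex(raw), "EL =", exception_level(raw))
```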
&lt;h2&gt;Patch for the First Stage BootLoader (FSBL)&lt;/h2&gt;
&lt;p&gt;Since we now know that FSBL runs in EL3, and we need EL3 to access the CCI_REG
module, we can patch the FSBL to configure the appropriate registers before
continuing with the boot. In this way, the &lt;code&gt;NIDEN&lt;/code&gt; and also &lt;code&gt;SPIDEN&lt;/code&gt; signals
will be already set high before the Linux boots.&lt;/p&gt;
&lt;p&gt;I wrote the following patch and included it in the Bitbake recipe for the FSBL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;From 08450fd4c18d11fedf196c65c22f8abf83a6cc2a Mon Sep 17 00:00:00 2001
From: Jan Marjanovic &amp;lt;jan.marjanovic@outlook.com&amp;gt;
Date: Thu, 10 Jun 2021 19:20:25 +0200
Subject: [PATCH] Enable CCI debug (NIDEN and SPINDEN on CCI-400)

&lt;span class="gd"&gt;---&lt;/span&gt;
 lib/sw_apps/zynqmp_fsbl/src/xfsbl_hooks.c | 39 +++++++++++++++++++++--
 1 file changed, 36 insertions(+), 3 deletions(-)

&lt;span class="gh"&gt;diff --git a/lib/sw_apps/zynqmp_fsbl/src/xfsbl_hooks.c b/lib/sw_apps/zynqmp_fsbl/src/xfsbl_hooks.c&lt;/span&gt;
&lt;span class="gh"&gt;index 80a1314203..b0030a1d67 100644&lt;/span&gt;
&lt;span class="gd"&gt;--- a/lib/sw_apps/zynqmp_fsbl/src/xfsbl_hooks.c&lt;/span&gt;
&lt;span class="gi"&gt;+++ b/lib/sw_apps/zynqmp_fsbl/src/xfsbl_hooks.c&lt;/span&gt;
&lt;span class="gu"&gt;@@ -64,13 +64,46 @@ u32 XFsbl_HookAfterBSDownload(void )&lt;/span&gt;
 }
 #endif

&lt;span class="gi"&gt;+static void print_el(void) {&lt;/span&gt;
&lt;span class="gi"&gt;+   register uint64_t x0 __asm__ (&amp;quot;x0&amp;quot;);&lt;/span&gt;
&lt;span class="gi"&gt;+   __asm__ (&amp;quot;mrs x0, CurrentEL;&amp;quot; : : : &amp;quot;%x0&amp;quot;);&lt;/span&gt;
&lt;span class="gi"&gt;+   XFsbl_Printf(DEBUG_PRINT_ALWAYS, &amp;quot;EL = %x\r\n&amp;quot;, x0 &amp;gt;&amp;gt; 2);&lt;/span&gt;
&lt;span class="gi"&gt;+}&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+static void cci_reg_dump(void) {&lt;/span&gt;
&lt;span class="gi"&gt;+   // offsets from UG1087&lt;/span&gt;
&lt;span class="gi"&gt;+   uint64_t offsets[] = {0, 0x10, 0x14, 0x18, 0x1c, 0x40};&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+   XFsbl_Printf(DEBUG_PRINT_ALWAYS, &amp;quot;CCI_REG: register dump\r\n&amp;quot;);&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+   for (int i = 0; i &amp;lt; sizeof(offsets)/sizeof(*offsets); i++) {&lt;/span&gt;
&lt;span class="gi"&gt;+       uint64_t offs = offsets[i];&lt;/span&gt;
&lt;span class="gi"&gt;+       u32 val = XFsbl_In32(XPAR_PSU_CCI_REG_S_AXI_BASEADDR + offs);&lt;/span&gt;
&lt;span class="gi"&gt;+       XFsbl_Printf(DEBUG_PRINT_ALWAYS, &amp;quot;  offset %x = %x\r\n&amp;quot;,&lt;/span&gt;
&lt;span class="gi"&gt;+               offs, val);&lt;/span&gt;
&lt;span class="gi"&gt;+   }&lt;/span&gt;
&lt;span class="gi"&gt;+}&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+static void cci_reg_debug_enable(void) {&lt;/span&gt;
&lt;span class="gi"&gt;+   const uint64_t OFFS_CCI_MISC_CTRL = 0x40;&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+   const uint32_t CCI_MISC_CTRL_NIDEN_MASK = 0x2;&lt;/span&gt;
&lt;span class="gi"&gt;+   const uint32_t CCI_MISC_CTRL_SPIDEN_MASK = 0x1;&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+   XFsbl_Printf(DEBUG_PRINT_ALWAYS, &amp;quot;CCI_REG: debug enable\r\n&amp;quot;);&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
&lt;span class="gi"&gt;+   XFsbl_Out32(XPAR_PSU_CCI_REG_S_AXI_BASEADDR + OFFS_CCI_MISC_CTRL,&lt;/span&gt;
&lt;span class="gi"&gt;+           CCI_MISC_CTRL_NIDEN_MASK | CCI_MISC_CTRL_SPIDEN_MASK);&lt;/span&gt;
&lt;span class="gi"&gt;+}&lt;/span&gt;
&lt;span class="gi"&gt;+&lt;/span&gt;
 u32 XFsbl_HookBeforeHandoff(u32 EarlyHandoff)
 {
    u32 Status = XFSBL_SUCCESS;

&lt;span class="gd"&gt;-   /**&lt;/span&gt;
&lt;span class="gd"&gt;-    * Add the code here&lt;/span&gt;
&lt;span class="gd"&gt;-    */&lt;/span&gt;
&lt;span class="gi"&gt;+   print_el();&lt;/span&gt;
&lt;span class="gi"&gt;+   cci_reg_dump();&lt;/span&gt;
&lt;span class="gi"&gt;+   cci_reg_debug_enable();&lt;/span&gt;
&lt;span class="gi"&gt;+   cci_reg_dump();&lt;/span&gt;

    return Status;
 }
&lt;span class="gd"&gt;--&lt;/span&gt;
2.25.1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The following is then the output of the FSBL with the patch:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Xilinx Zynq MP First Stage Boot Loader
Release 2020.2   Jun 10 2021  -  19:49:38
Reset Mode      :       System Reset
Platform: Silicon (4.0), Running on A53-0 (64-bit) Processor, Device Name: XCZU3EG
SD0 Boot Mode
PMU Firmware 2020.2     Jun  3 2021   19:28:36
PMU_ROM Version: xpbr-v8.1.0-0
Protection configuration applied
EL = 3
CCI_REG: register dump
  offset 0 = 0
  offset 10 = 0
  offset 14 = 8000003F
  offset 18 = 0
  offset 1C = 0
  offset 40 = 0
CCI_REG: debug enable
CCI_REG: register dump
  offset 0 = 0
  offset 10 = 0
  offset 14 = 8000003F
  offset 18 = 0
  offset 1C = 0
  offset 40 = 3
Exit from FSBL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We can note that the register at the offset &lt;code&gt;0x40&lt;/code&gt; has changed from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;3&lt;/code&gt;,
as we have requested.&lt;/p&gt;
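&lt;p&gt;To make the register dump easier to read, here is a small Python sketch that decodes the value at offset &lt;code&gt;0x40&lt;/code&gt;. The bit positions are taken from the masks in the patch above (&lt;code&gt;0x1&lt;/code&gt; for SPIDEN, &lt;code&gt;0x2&lt;/code&gt; for NIDEN) and the base address from the &lt;code&gt;devmem&lt;/code&gt; attempts earlier; the function and constant names are made up for this illustration:&lt;/p&gt;

```python
CCI_REG_BASE = 0xFD5E0000      # address probed with devmem earlier in the post
OFFS_CCI_MISC_CTRL = 0x40      # offset used in the FSBL patch
SPIDEN_BIT = 0                 # mask 0x1 in the patch
NIDEN_BIT = 1                  # mask 0x2 in the patch


def decode_misc_ctrl(value):
    """Split a register value into the two debug-enable flags."""
    return {
        "SPIDEN": (value >> SPIDEN_BIT) % 2 == 1,
        "NIDEN": (value >> NIDEN_BIT) % 2 == 1,
    }


print(hex(CCI_REG_BASE + OFFS_CCI_MISC_CTRL))
print("before:", decode_misc_ctrl(0x0))
print("after: ", decode_misc_ctrl(0x3))
```

&lt;p&gt;With the values from the log, both flags are reported as cleared before the write and as set after it.&lt;/p&gt;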
&lt;h2&gt;Lamport's bakery algorithm&lt;/h2&gt;
&lt;p&gt;To provide a good test case for the cache coherent interconnect I have
implemented distributed counting (i.e. multiple workers share one counter),
and the synchronization is provided with &lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-10-lamports-bakery-algorithm"&gt;Lamport's bakery algorithm&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The AXI protocol itself provides a mechanism (&lt;code&gt;AxLOCK&lt;/code&gt; signals for exclusive
access) to perform atomic operations on memory and devices, but Lamport's
algorithm does not require any special locking primitives, &lt;a href="http://lamport.azurewebsites.net/pubs/pubs.html#bakery"&gt;only atomic reads
and writes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="IPs connected to HPC ports on Zynq MPSoC" src="www.j-marjanovic.io/images/2021_cci_part_1/vivado.png" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;In the Vivado design I have connected two &lt;code&gt;LamportsBakeryAlgo&lt;/code&gt; IPs to both
HPC ports. Each IP can be configured to run for a defined number of loops,
and in each loop the counter will be incremented by 1. For each loop we
expect to see 5 writes in total:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 write for setting the entering flag&lt;/li&gt;
&lt;li&gt;1 write for setting the number&lt;/li&gt;
&lt;li&gt;1 write for clearing the entering flag&lt;/li&gt;
&lt;li&gt;1 write to do the actual work (increment the counter in this example)&lt;/li&gt;
&lt;li&gt;1 write to clear the number (i.e. release the lock)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Similarly, the following is a lower limit on the number of reads: if a flag
is set, the algorithm keeps polling it until it is cleared. In this
highly contended example we expect the number of reads to be much higher.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;N reads for the &lt;code&gt;maximum()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;N reads in the inner loop to check the entering flag&lt;/li&gt;
&lt;li&gt;N reads in the inner loop to check the number variable&lt;/li&gt;
&lt;/ul&gt;
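&lt;p&gt;The write and read counts listed above can be reproduced with a software model. The following is a rough Python sketch of the bakery lock using threads, not the Chisel implementation linked above; it assumes CPython, where accesses to single list elements behave atomically, and all names are hypothetical. Each worker tallies its own memory accesses, so the totals can be compared against 5 writes and at least 3 * N reads per loop:&lt;/p&gt;

```python
import threading
import time


def run_bakery(n_workers, n_loops):
    """Run the bakery lock with threads; return (counter, writes, reads)."""
    entering = [False] * n_workers
    number = [0] * n_workers
    counter = [0]
    writes = [0] * n_workers   # per-worker tallies, so they need no locking
    reads = [0] * n_workers

    def worker(i):
        for _ in range(n_loops):
            entering[i] = True                  # write 1: set entering flag
            highest = 0
            for j in range(n_workers):          # maximum(): N reads
                highest = max(highest, number[j])
                reads[i] += 1
            my_number = highest + 1
            number[i] = my_number               # write 2: set the number
            entering[i] = False                 # write 3: clear entering flag
            writes[i] += 3

            for j in range(n_workers):
                while True:                     # poll worker j's entering flag
                    reads[i] += 1
                    if not entering[j]:
                        break
                    time.sleep(0)               # let the other threads run
                while True:                     # poll worker j's number
                    other = number[j]
                    reads[i] += 1
                    # go on once j is idle or does not precede us in the queue
                    if other == 0 or (other, j) >= (my_number, i):
                        break
                    time.sleep(0)

            reads[i] += 1                       # critical section reads counter
            counter[0] = counter[0] + 1         # write 4: the actual work
            number[i] = 0                       # write 5: release the lock
            writes[i] += 2

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], sum(writes), sum(reads)


count, n_writes, n_reads = run_bakery(3, 20)
print("counter:", count, "writes:", n_writes, "reads:", n_reads)
```

&lt;p&gt;The number of writes is deterministic, while the number of reads depends on how often the workers have to poll, which is the same effect described above for the hardware.&lt;/p&gt;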
&lt;h1&gt;Performance counters usage example&lt;/h1&gt;
&lt;p&gt;With the Performance Monitoring Unit enabled we can now start monitoring the values
of the counters with the &lt;code&gt;perf&lt;/code&gt; tool.&lt;/p&gt;
&lt;p&gt;I ran the two &lt;code&gt;LamportsBakeryAlgo&lt;/code&gt; IPs, each programmed to perform 100000 loops.
In addition, there is a third instance of Lamport's bakery algorithm, this
one running in the software and accessing the same memory locations as the two
FPGA implementations.&lt;/p&gt;
&lt;p&gt;In parallel, I ran &lt;code&gt;perf stat&lt;/code&gt; and selected the following events: read request
(&lt;code&gt;rrq&lt;/code&gt;) handshakes (&lt;code&gt;hs&lt;/code&gt;) and write request (&lt;code&gt;wrq&lt;/code&gt;) handshakes on subordinate
ports 0 and 3. From the
&lt;a href="https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf"&gt;UG1085&lt;/a&gt;
we can see that the APU is connected to port 3 on the CCI-400 and the HPC ports
are connected to port 0 on the interconnect.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;root@u96v2-sbc:~#&lt;/span&gt; perf stat -e &lt;span class="se"&gt;\&lt;/span&gt;
&amp;gt; CCI_400_r1/cycles/,&lt;span class="se"&gt;\&lt;/span&gt;
&amp;gt; CCI_400_r1/si_rrq_hs_any,source&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;/,&lt;span class="se"&gt;\&lt;/span&gt;
&amp;gt; CCI_400_r1/si_wrq_hs_any,source&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;/,&lt;span class="se"&gt;\&lt;/span&gt;
&amp;gt; CCI_400_r1/si_rrq_hs_any,source&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;/,&lt;span class="se"&gt;\&lt;/span&gt;
&amp;gt; CCI_400_r1/si_wrq_hs_any,source&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;/

&lt;span class="go"&gt;^C&lt;/span&gt;
&lt;span class="go"&gt; Performance counter stats for &amp;#39;system wide&amp;#39;:&lt;/span&gt;

&lt;span class="go"&gt;        1103648324      CCI_400_r1/cycles/&lt;/span&gt;
&lt;span class="go"&gt;           3207971      CCI_400_r1/si_rrq_hs_any,source=0/&lt;/span&gt;
&lt;span class="go"&gt;           1000000      CCI_400_r1/si_wrq_hs_any,source=0/&lt;/span&gt;
&lt;span class="go"&gt;           3158786      CCI_400_r1/si_rrq_hs_any,source=3/&lt;/span&gt;
&lt;span class="go"&gt;            636417      CCI_400_r1/si_wrq_hs_any,source=3/&lt;/span&gt;

&lt;span class="go"&gt;       4.138963536 seconds time elapsed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see the expected 1000000 writes from the HPC ports (= 2 (instances) * 5 (per
loop) * 100000 (number of loops)), and the number of reads on the HPC
ports matches our expectation of being equal to or greater than 1800000 (= 2
(instances on the HPC ports) * 3 (kinds of reads) * 3 (total instances = N) * 100000 (number of loops)).&lt;/p&gt;
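&lt;p&gt;The same arithmetic as a short Python sketch, checked against the two &lt;code&gt;source=0&lt;/code&gt; counter values from the &lt;code&gt;perf stat&lt;/code&gt; output above. The function and constant names are only illustrative:&lt;/p&gt;

```python
WRITES_PER_LOOP = 5       # entering set/clear, number set/clear, actual work
READ_KINDS_PER_LOOP = 3   # maximum(), entering flag poll, number poll


def expected_writes(hpc_instances, loops):
    """Exact number of writes issued by the instances on the HPC ports."""
    return hpc_instances * WRITES_PER_LOOP * loops


def min_expected_reads(hpc_instances, total_instances, loops):
    """Lower bound on reads; polling under contention only adds to it."""
    return hpc_instances * READ_KINDS_PER_LOOP * total_instances * loops


measured_writes = 1000000   # si_wrq_hs_any,source=0
measured_reads = 3207971    # si_rrq_hs_any,source=0

print(expected_writes(2, 100000) == measured_writes)
print(measured_reads >= min_expected_reads(2, 3, 100000))
```

&lt;p&gt;Both checks hold for the measured values: the write count matches exactly, and the read count is well above the lower bound of 1800000.&lt;/p&gt;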
&lt;p&gt;Port 3 (APU port) is also used to load the program from
the main memory or the SD card which slightly obscures the number of
transactions performed by the algorithm itself, but the numbers match our
expectations.&lt;/p&gt;
&lt;h2&gt;Limitations&lt;/h2&gt;
&lt;p&gt;It seems that the PMU in the CCI-400 does not provide all facilities needed for
&lt;code&gt;perf record&lt;/code&gt; to work properly; it reports &lt;code&gt;PMU Hardware doesn't support
sampling/overflow-interrupts. Try 'perf stat'&lt;/code&gt;.&lt;/p&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In this post we have seen how tools commonly used in performance engineering
can also be used to observe the behavior and performance of the FPGA. With the
proliferation of heterogeneous architectures, having good tools to observe
(and debug) the interfaces between individual components provides additional
insight into the system.&lt;/p&gt;
&lt;p&gt;It is yet to be explored if shouting at the Zynq MPSoC has any impact
on the &lt;a href="https://www.youtube.com/watch?v=tDacjrSCeq4"&gt;performance in terms of
latency&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;div style="font-size: 80%;" &gt;
Xilinx, Inc. Xilinx, the Xilinx logo, Vivado, Zynq are trademarks of Xilinx in the United States and
other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
AMBA, ARM, Cortex and TrustZone are registered trademarks of ARM Limited (or its
subsidiaries) in the EU and/or elsewhere. CoreLink is  a trademark of ARM
Limited (or its subsidiaries) in the EU and/or elsewhere.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
All trademarks and registered trademarks are the property of their respective owners.
&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA, Zynq, Cache-Coherence"></category></entry><entry><title>Stratix V accelerator card from eBay, part 7</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-7.html" rel="alternate"></link><published>2021-05-15T13:30:00+02:00</published><updated>2021-05-15T13:30:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-05-15:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-7.html</id><summary type="html">&lt;p&gt;The reverse engineering of the FPGA accelerator card from eBay is progressing
well and documenting the DDR3 interface was a huge step forward. However, there
is still one important part of the board which is not fully uncovered, the PCI
Express® interface.&lt;/p&gt;
&lt;h1&gt;Hardware&lt;/h1&gt;
&lt;h2&gt;Pikes Peak&lt;/h2&gt;
&lt;p&gt;Pikes Peak is a Stratix …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The reverse engineering of the FPGA accelerator card from eBay is progressing
well and documenting the DDR3 interface was a huge step forward. However, there
is still one important part of the board which is not fully uncovered, the PCI
Express® interface.&lt;/p&gt;
&lt;h1&gt;Hardware&lt;/h1&gt;
&lt;h2&gt;Pikes Peak&lt;/h2&gt;
&lt;p&gt;Pikes Peak is a Stratix® V-based FPGA accelerator conforming to &lt;a href="https://www.opencompute.org/documents/microsoft-ocs-v2-tray-mezzanine"&gt;Open CloudServer
OCS Tray Mezzanine
Specification&lt;/a&gt;.
It connects to the motherboard with a 160-pin connector which provides power,
management, and a 16-lane-wide PCIe interface. With the resistors on the
&lt;code&gt;PCIE_CFG_ID&lt;/code&gt;, Pikes Peak indicates 2 x8 bifurcation.&lt;/p&gt;
&lt;p&gt;I have explored the various details of this board in previous posts on my blog.
Important for today's discussion is &lt;a href="stratix-v-accelerator-card-from-ebay-part-2.html"&gt;part
2&lt;/a&gt;, where, using a &lt;a href="https://github.com/j-marjanovic/ocs-tray-mezzanine-adapter"&gt;custom HW
adapter&lt;/a&gt; I was able
to establish the connection between the PC and the FPGA using the factory image
on the board (stored in on-board Flash).&lt;/p&gt;
&lt;p&gt;The part number of the device (&lt;code&gt;5SGSD5&lt;/code&gt;) indicates that it contains &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-v/stx5_51001.pdf"&gt;one
PCIe hard IP block&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At that time I assumed that one link of 8 PCIe lanes is connected to
the PCIe hard IP block and that another x8 link is connected to 8 transceivers
on the other side of the device, maybe with the PCIe IP implemented in logic.
Later we will explore why this assumption is not correct.&lt;/p&gt;
&lt;h2&gt;Storey Peak&lt;/h2&gt;
&lt;p&gt;Storey Peak contains (roughly) the same components as Pikes Peak but in a
"normal" PCIe card form factor. For me, as a hobbyist who is not very keen on
designing custom HW and prefers to stay in the FPGA domain, this form factor is
much more convenient.&lt;/p&gt;
&lt;p&gt;I have presented the main components of this board in a series of tweets a
couple of weeks ago:&lt;/p&gt;
&lt;p&gt;&lt;blockquote class="twitter-tweet" align="center"&gt;&lt;a href="https://twitter.com/janmarjanovic/status/1383876145821589505"&gt;Tweet of janmarjanovic/1383876145821589505&lt;/a&gt;&lt;/blockquote&gt;&lt;/p&gt;
&lt;p&gt;An attentive reader will note the difference between the PCIe device IDs on
Storey Peak and Pikes Peak: Storey Peak reports &lt;code&gt;0xb101&lt;/code&gt; while Pikes Peak reports
&lt;code&gt;0xb100&lt;/code&gt;.&lt;/p&gt;
&lt;h1&gt;First test&lt;/h1&gt;
&lt;p&gt;I have
&lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/commit/da95ff009cd1d9d0664b099894da73b6e59f4eae"&gt;instantiated&lt;/a&gt;
the &lt;em&gt;Avalon-MM Stratix V Hard IP for PCI Express&lt;/em&gt; and connected it to a couple
of peripherals (GPIO, System ID) to provide a minimal example for access
from the computer. I have also connected the Hard IP status interface to the
Nios II processor, which allows me to query the status of the PCIe link over the
JTAG. There is only 1 location where the PCIe lanes can be connected in a
normal &lt;code&gt;5SGSD5&lt;/code&gt; (annotated in light pink in the image below), so I connected the
PCIe lanes there.&lt;/p&gt;
&lt;p&gt;&lt;img alt="PCIe pin locations" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/pcie_pins.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;First test with Pikes Peak&lt;/h2&gt;
&lt;p&gt;Once everything was set up, I connected the Pikes Peak board through the x1
extender to one of the slots in my DELL PowerEdge R720.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pikes Peak in a server" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/pp_in_server.jpg" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The "OTMA adapter" does not properly connect the reset pin to the Pikes Peak
connector, and I had to manually issue a reset command after power-up. Below
is the output from the CLI running on the Nios II processor in FPGA:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;init done

pp_sp_example &amp;gt; pcie status
pcie:
  id = 2c1e57a7
  version = 10000
  status.cur speed   = 0
  status.LTTSM state = 0
  status.lane act    = 0
  status.DL up       = 0

pp_sp_example &amp;gt; clks
Clock counter: ident = 0xc10cc272, version = 0x10000
Clock frequency [0] =  124.999 MHz
Clock frequency [1] =  644.531 MHz
Clock frequency [2] =   99.998 MHz
Clock frequency [3] =   99.998 MHz
Clock frequency [4] =    0.000 MHz
Clock frequency [5] =    0.000 MHz
Clock frequency [6] =    0.000 MHz
Clock frequency [7] =    0.000 MHz

pp_sp_example &amp;gt; pcie reset
pcie: asserting reset...
pcie: deasserted reset...
pcie: reset deasserted

pp_sp_example &amp;gt; pcie status
pcie:
  id = 2c1e57a7
  version = 10000
  status.cur speed   = 2
  status.LTTSM state = f
  status.lane act    = 1
  status.DL up       = 0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see that clocks 2 and 3 are both 100 MHz, which indicates that the PCIe clock
distribution works correctly, and after the reset the link reports:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;current speed: 5 GT/s&lt;/strong&gt; - maximum supported by the IP in this configuration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LTSSM state: L0&lt;/strong&gt; - this is the state in which the state machine should be&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;lane active: 1&lt;/strong&gt; - as expected, the USB cable contains only one high-speed differential lane&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This all looks very promising; once the PC boots I am also able to get the
same information from the &lt;code&gt;lspci&lt;/code&gt; utility:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ sudo lspci -s 03:00 -vv
03:00.0 Non-VGA unclassified device: Altera Corporation Stratix V (rev 01)
    Subsystem: Device 01a2:0001
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &amp;gt;TAbort- &amp;lt;TAbort- &amp;lt;MAbort- &amp;gt;SERR- &amp;lt;PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 15
    NUMA node: 0
    Region 0: Memory at df800000 (32-bit, non-prefetchable) [size=2M]
&amp;lt;...&amp;gt;
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s &amp;lt;64ns, L1 &amp;lt;1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
&amp;lt;...&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And finally, I can use &lt;code&gt;pcimem&lt;/code&gt; to read and write to the registers in the FPGA,
e.g. read system ID and timestamp and write to a GPIO pin:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ sudo ./pcimem /sys/bus/pci/devices/0000:03:00.0/resource0 0x100000 w
/sys/bus/pci/devices/0000:03:00.0/resource0 opened.
Target offset is 0x100000, page size is 4096
mmap(0, 4096, 0x3, 0x1, 3, 0x100000)
PCI Memory mapped to address 0x7f3ecc8ba000.
0x100000: 0x01A20001

$ sudo ./pcimem /sys/bus/pci/devices/0000:03:00.0/resource0 0x100004 w
/sys/bus/pci/devices/0000:03:00.0/resource0 opened.
Target offset is 0x100004, page size is 4096
mmap(0, 4096, 0x3, 0x1, 3, 0x100004)
PCI Memory mapped to address 0x7f65b24b3000.
0x100004: 0x6097B8A9

$ sudo ./pcimem /sys/bus/pci/devices/0000:03:00.0/resource0 0x0 w 1
/sys/bus/pci/devices/0000:03:00.0/resource0 opened.
Target offset is 0x0, page size is 4096
mmap(0, 4096, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7fec6c0cc000.
0x0000: 0x00000000
Written 0x0001; readback 0x   1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The proverbial "blinky" is actually blinking and so far everything seems to be
working correctly.&lt;/p&gt;
&lt;h2&gt;First test with Storey Peak&lt;/h2&gt;
&lt;p&gt;Encouraged by the success with the Pikes Peak board and the adapter PCB, I
installed the Storey Peak board instead. Compared to the Pikes Peak board and
the cable salad there, this one fits much more nicely, both from a purely aesthetic
and from a thermal point of view.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Storey Peak in a server" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/sp_in_server.jpg" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;And here I encountered an unpleasant surprise. Although the clocks are there,
the PCIe link can never reach the &lt;em&gt;L0&lt;/em&gt; state, i.e. the link never finishes
training.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pp_sp_example &amp;gt; pcie status
pcie:
  id = 2c1e57a7
  version = 10000
  status.cur speed   = 1
  status.LTTSM state = 1a
  status.lane act    = 8
  status.DL up       = 0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Intel® provides good &lt;a href="https://community.intel.com/t5/FPGA-Wiki/FTA-PCI-express/ta-p/735993"&gt;troubleshooting resources for the PCIe
link&lt;/a&gt;, and
I have decided to have a look at the PIPE interface to see the data exchange
between the FPGA and the CPU.&lt;/p&gt;
&lt;p&gt;&lt;img alt="TS1 ordered sets" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/ts1.png" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;We can decode the data received by the first two transceivers and obtain
the information from the &lt;a href="https://www.oreilly.com/library/view/pci-express-system/0321156307/0321156307_ch14lev1sec5.html"&gt;TS1 ordered sets&lt;/a&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="left"&gt;field&lt;/th&gt;
&lt;th align="left"&gt;xcvr 0&lt;/th&gt;
&lt;th align="left"&gt;xcvr 1&lt;/th&gt;
&lt;th&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="left"&gt;COM&lt;/td&gt;
&lt;td align="left"&gt;0xBC (K28.5)&lt;/td&gt;
&lt;td align="left"&gt;0xBC (K28.5)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Link #&lt;/td&gt;
&lt;td align="left"&gt;0x00&lt;/td&gt;
&lt;td align="left"&gt;0x00&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Lane #&lt;/td&gt;
&lt;td align="left"&gt;0x08&lt;/td&gt;
&lt;td align="left"&gt;0x09&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N_FTS&lt;/td&gt;
&lt;td align="left"&gt;0x1E&lt;/td&gt;
&lt;td align="left"&gt;0x1E&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Rate ID&lt;/td&gt;
&lt;td align="left"&gt;0x0E&lt;/td&gt;
&lt;td align="left"&gt;0x0E&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We see that the first transceiver receives lane number 8 and the second
transceiver receives lane number 9. At this point one can look again at the
board, and realize that the upper 8 lanes are connected to the PCIe Hard IP in
the FPGA.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Card with annotated FPGA" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/annotated.jpg" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;This explains why we see lane 8 on the first transceiver, and also explains
why the link does not get established.&lt;/p&gt;
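&lt;p&gt;The decoding of the TS1 symbols from the table above can be sketched in Python. The function name &lt;code&gt;decode_ts1&lt;/code&gt; and the returned dictionary are assumptions made for this illustration. The sketch only looks at the first five symbols and assumes the 8b/10b TS1 layout, where bits 1 to 3 of the Rate ID symbol advertise 2.5, 5.0 and 8.0 GT/s.&lt;/p&gt;

```python
# Data-rate bits of the TS1 "Rate ID" symbol (bit 0 is reserved).
RATE_BITS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s"}


def decode_ts1(symbols):
    """Decode the first five symbols of a TS1 ordered set."""
    com, link, lane, n_fts, rate_id = symbols[:5]
    # Test bit number `bit` with integer division and modulo.
    rates = [name for bit, name in RATE_BITS.items()
             if rate_id // 2 ** bit % 2 == 1]
    return {
        "is_com": com == 0xBC,  # K28.5
        "link": link,
        "lane": lane,
        "n_fts": n_fts,
        "rates": rates,
    }


# Values taken from the table above.
xcvr0 = decode_ts1([0xBC, 0x00, 0x08, 0x1E, 0x0E])
xcvr1 = decode_ts1([0xBC, 0x00, 0x09, 0x1E, 0x0E])
print(xcvr0["lane"], xcvr1["lane"], xcvr0["rates"])
```

&lt;p&gt;For the captured data, the Rate ID of &lt;code&gt;0x0E&lt;/code&gt; decodes to all three data rates, and the lane numbers 8 and 9 match the observation that the transceivers are wired to the upper half of the slot.&lt;/p&gt;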
&lt;h2&gt;Storey Peak in a computer which supports bifurcation&lt;/h2&gt;
&lt;p&gt;Just for a test, I plugged the Storey Peak card into a computer that
supports 2 x8 bifurcation on a x16 slot.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bifurcation configuration" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/server_bifurcation.png" style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;We can see that in this configuration there are two PCIe endpoints presented:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ lspci
&amp;lt;...&amp;gt;
86:00.0 Unassigned class [ff00]: Microsoft Corporation Device b100 (rev 01)
87:00.0 Unassigned class [ff00]: Microsoft Corporation Device b101 (rev 01)
&amp;lt;...&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And each of them supports a x8 link. It is theoretically possible that one of
the endpoints is implemented in FPGA logic, but we will later see that this
is not the case.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ sudo lspci -s 86:00 -vv
86:00.0 Unassigned class [ff00]: Microsoft Corporation Device b100 (rev 01)
&amp;lt;...&amp;gt;
        LnkCap:    Port #1, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us
&amp;lt;...&amp;gt;
        LnkSta:    Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
&amp;lt;...&amp;gt;

$ sudo lspci -s 87:00 -vv
87:00.0 Unassigned class [ff00]: Microsoft Corporation Device b101 (rev 01)
&amp;lt;...&amp;gt;
        LnkCap:    Port #1, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s &amp;lt;4us, L1 &amp;lt;1us
&amp;lt;...&amp;gt;
        LnkSta:    Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
&amp;lt;...&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Unfortunately the DELL R720 which I use in my homelab does not support PCIe
bifurcation.&lt;/p&gt;
&lt;h1&gt;Investigation&lt;/h1&gt;
&lt;p&gt;With some confusion regarding the assignment of PCIe lanes and the number of
hard IP blocks in this device, it is time to carefully check the presentation on
&lt;a href="https://indico.cern.ch/event/822126/contributions/3500184/attachments/1906428/3148591/Catapult_FastML_Fermilab_2019.pdf"&gt;Heterogeneous Computing @
Microsoft&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Shown on slide 14 is the screenshot from Quartus® Chip Planner. Comparing this
screenshot with the Chip Planner view for a "normal" device, we can note that
in the presentation there is an additional PCIe block, which is not present
in a normal device.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Comparison between the two devices" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/msft_presentation.png" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Web search&lt;/h2&gt;
&lt;p&gt;There was some speculation on &lt;a href="https://twitter.com/rombik_su/status/1257741366144180226"&gt;Twitter by
@rombik_su&lt;/a&gt; and on
&lt;a href="https://www.reddit.com/r/FPGA/comments/7g0f80/altera_part_lookup/"&gt;reddit&lt;/a&gt; about
what this part might be, and we can see that the last part of the code (reserved
for &lt;em&gt;Special order devices&lt;/em&gt;) contains &lt;strong&gt;AC&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;blockquote class="twitter-tweet" align="center"&gt;&lt;a href="https://twitter.com/rombik_su/status/1257741366144180226"&gt;Tweet of rombik_su/1257741366144180226&lt;/a&gt;&lt;/blockquote&gt;&lt;/p&gt;
&lt;p&gt;Searching for the full code (&lt;strong&gt;5SGSKF40I3LNAC&lt;/strong&gt;) also finds a &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pcn/pdn2007.pdf"&gt;PDN from
Intel&lt;/a&gt;
with a comment that the device is available "Upon Request, Please Contact Sales".&lt;/p&gt;
&lt;p&gt;There is also &lt;a href="https://community.intel.com/t5/Programmable-Devices/FA-request-of-3-PCS-FBGA-AJ000400T01-ALTERA-Mfr-P-N/td-p/1204542"&gt;a
thread&lt;/a&gt;
on the Intel forum by somebody from Quanta who has a problem with a post-production
check on this device.&lt;/p&gt;
&lt;h2&gt;Bitstream analysis&lt;/h2&gt;
&lt;p&gt;At this point it is pretty clear that this device is a custom part with 2 PCIe
Hard IPs. I was curious whether I could find evidence of the second IP
in the bitstream.&lt;/p&gt;
&lt;p&gt;I have used the same approach as for the DDR3 controller: by changing one
variable at a time I was able to isolate the address in the JIC file which
corresponds to the individual PCIe IP parameter. By repeating this procedure I
was able to extract addresses for several parameters. Especially interesting are
the ones that can be observed from the outside, such as vendor ID, device ID,
BAR type and size, ...&lt;/p&gt;
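&lt;p&gt;The "change one variable and compare" step can be illustrated with a small Python sketch. The function name &lt;code&gt;differing_offsets&lt;/code&gt; and the byte values are invented for this example. The sketch assumes two uncompressed images of equal length in which a parameter change only touches a few bytes.&lt;/p&gt;

```python
def differing_offsets(image_a, image_b):
    """Return the byte offsets at which two equally sized images differ."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same length")
    return [i for i, (a, b) in enumerate(zip(image_a, image_b)) if a != b]


# Synthetic stand-ins for two builds which differ in a single parameter.
baseline = bytes([0x00, 0x72, 0x11, 0xFF, 0x00])
variant = bytes([0x00, 0x14, 0x14, 0xFF, 0x00])
print(differing_offsets(baseline, variant))
```

&lt;p&gt;Offsets that change consistently whenever the same parameter is modified are good candidates for the location of that parameter.&lt;/p&gt;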
&lt;p&gt;Once I had found the locations of several parameters, I
extracted the values from the original JIC. As expected, one can easily
see the Microsoft® vendor ID (&lt;code&gt;0x1414&lt;/code&gt;) and the device ID (&lt;code&gt;0xb100&lt;/code&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img alt="Parameters extracted from the bitstream" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/bitstream_extract.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;I have then taken the bits which correspond to the PCIe block on the left side
and performed a weighted autocorrelation on the bitstream. The procedure
and the result can be found in a Jupyter &lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/pcie_analysis/02_autocorrelation.ipynb"&gt;notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One can see that there is a strong peak very close to the original
configuration: the second configuration differs by just 1 bit. This is exactly
what one can expect since we know that the two endpoints have different device
IDs (&lt;code&gt;0xb100&lt;/code&gt; and &lt;code&gt;0xb101&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Auto-correlation of the original IP configuration" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/auto_corr.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;With this we can confirm that there is a configuration for the second Hard IP
stored in the factory image in the on-board Flash.&lt;/p&gt;
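&lt;p&gt;The idea behind the search can be shown with a simplified, unweighted stand-in for the analysis in the notebook: slide the known configuration over the bitstream and record the Hamming distance at each position. The data below is synthetic and the function names are chosen for this sketch only.&lt;/p&gt;

```python
def hamming(a, b):
    """Number of positions at which two equally long bit lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def match_profile(bits, pattern):
    """Hamming distance between `pattern` and every window of `bits`."""
    n = len(pattern)
    return [hamming(bits[i:i + n], pattern) for i in range(len(bits) - n + 1)]


# Synthetic data: a 16-bit "configuration" and a copy with one bit flipped,
# similar to two device IDs which differ only in the lowest bit.
config_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
config_b = list(config_a)
config_b[15] = 0
bits = [0] * 20 + config_a + [0] * 30 + config_b + [0] * 20

profile = match_profile(bits, config_a)
best = sorted(range(len(profile)), key=lambda i: profile[i])[:2]
print(best, [profile[i] for i in best])
```

&lt;p&gt;In this toy example the two best positions are the original configuration (distance 0) and the modified copy (distance 1), which mirrors the single-bit difference seen in the real bitstream.&lt;/p&gt;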
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;It is now clear that this board contains a custom part with two PCIe hard IP
blocks. Unfortunately, the block which is available in a normal
device is connected to the high lanes (lanes 8-15), and this requires at minimum
some support from the motherboard to be useful.&lt;/p&gt;
&lt;p&gt;There are several possibilities on how to proceed here:&lt;/p&gt;
&lt;h2&gt;Playing stupid method&lt;/h2&gt;
&lt;p&gt;I tried playing stupid and just instantiated two hard IPs in my design.
Quartus was, as expected, not so easily fooled:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Error message when two hard IPs are instantiated with a normal part" src="www.j-marjanovic.io/images/2021_fpga_card_part_7/2_ips.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Riser cable&lt;/h2&gt;
&lt;p&gt;One could take a x16 riser cable and resolder the wires to swap the two x8
connections on the connector. This will only provide 8 lanes, as the second
hard IP cannot be used (yet).&lt;/p&gt;
&lt;h2&gt;Upgrading my homelab server&lt;/h2&gt;
&lt;p&gt;Is somebody interested in buying a DELL PowerEdge R720? :)&lt;/p&gt;
&lt;h2&gt;Soft PCIe IP&lt;/h2&gt;
&lt;p&gt;One could think about developing a soft IP and connecting it directly to the
transceivers. The development effort here would be enormous since the Physical,
Data Link, and Transaction Layers need to be handled in the logic. On the other hand,
this could also be a great learning experience to get familiar with the lower
layers of the protocol, which are usually handled by a dedicated PCIe block.
Additionally, one would have to use a few advanced features of the transceiver
itself (rate change, 8b/10b and 128b/130b decoding, receiver detection, lane
bonding, ...).&lt;/p&gt;
&lt;h2&gt;Manipulating the bitstream&lt;/h2&gt;
&lt;p&gt;Since we have the configuration stored in the factory bitstream, one could
consider "pasting" this configuration on top of a normal bitstream. This
method does not seem user-friendly, and the main obstacle would be connecting
the user logic to the second (unknown) hard IP.&lt;/p&gt;
&lt;h2&gt;Unlocking the custom part in Quartus&lt;/h2&gt;
&lt;p&gt;The database for this custom part is either already integrated into Quartus or
it is delivered to the customer as a separate package (e.g. a DLL or a shared
object).&lt;/p&gt;
&lt;p&gt;If one &lt;code&gt;grep&lt;/code&gt;s for &lt;code&gt;5SGSMD5&lt;/code&gt; in the Quartus directory there is a library that
contains a couple of interesting strings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ strings libddb_dstr.so | egrep &amp;#39;5SGSMD5&amp;#39;
&amp;lt;...&amp;gt;
5SGSMD5H3F35I3LNAA
&amp;lt;...&amp;gt;
5SGSMD5K3F40I3YY
&amp;lt;...&amp;gt;
5SGSMD5_MS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;So far hard work and determination have paid off. It seems that continuing
to poke around Quartus libraries would be the right way ahead.&lt;/p&gt;
&lt;hr&gt;
&lt;div style="font-size: 80%;" &gt;
Intel, the Intel logo, Altera, Nios, Quartus and Stratix words and logos are
trademarks of Intel Corporation or its subsidiaries in the U.S. and/or
other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
Microsoft® is a registered trademark of Microsoft Corporation in the United
States and/or other countries.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
PCI Express® and PCIe® are registered trademarks of PCI-SIG.
&lt;/div&gt;

&lt;div style="font-size: 80%;" &gt;
All trademarks and registered trademarks are the property of their respective owners.
&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay, part 6</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-6.html" rel="alternate"></link><published>2021-02-28T13:00:00+01:00</published><updated>2021-02-28T13:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-02-28:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-6.html</id><summary type="html">&lt;p&gt;In my &lt;a href="./stratix-v-accelerator-card-from-ebay-part-5.html"&gt;last blog post&lt;/a&gt; I
have presented a method to extract the DDR3 pinout from the bitstream, obtained
from the on-board Flash. In this post I will use the information obtained from
the bitstream to instantiate a DDR3 memory controller and measure its
performance.&lt;/p&gt;
&lt;h1&gt;DDR3 SDRAM Controller with UniPHY …&lt;/h1&gt;</summary><content type="html">&lt;p&gt;In my &lt;a href="./stratix-v-accelerator-card-from-ebay-part-5.html"&gt;last blog post&lt;/a&gt; I
have presented a method to extract the DDR3 pinout from the bitstream, obtained
from the on-board Flash. In this post I will use the information obtained from
the bitstream to instantiate a DDR3 memory controller and measure its
performance.&lt;/p&gt;
&lt;h1&gt;DDR3 SDRAM Controller with UniPHY&lt;/h1&gt;
&lt;p&gt;Once the pinout was obtained from the bitstream and the DDR3 part number was
identified from the markings on the board, instantiating the DDR3 controller was
relatively straightforward. To speed up the development, I first started with
only &lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/commit/ca66f8ec1a54b9ec2a499152ac7027b3861588ad"&gt;1 DDR3 chip (8-bit
width)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The first sign of success was the LEDs indicating that the DDR3
initialization controller had finished its job and that the calibration
(of the delay chains) was successfully completed.&lt;/p&gt;
&lt;p&gt;&lt;img alt="LEDs indicating a success of the DDR3 controller initialization procedure" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/ddr3_calib_done.jpg" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;EMIF Debug Toolkit&lt;/h1&gt;
&lt;p&gt;As mentioned in &lt;a href="./stratix-v-accelerator-card-from-ebay-part-3.html"&gt;part 3&lt;/a&gt; of
this series, I have spent quite some time making the JTAG work. OpenOCD was able
to detect the device in the JTAG chain, but I wanted to have a native
integration with Intel tools.&lt;/p&gt;
&lt;p&gt;The efforts have now paid off, as I was able to use &lt;a href="https://www.intel.com/content/www/us/en/programmable/quartushelp/13.0/mergedProjects/program/syscon/syscon_about_emi_toolkit.htm"&gt;External Memory InterFace
Debug
Toolkit&lt;/a&gt;
to inspect the calibration results and obtain other internal data from the DDR3
memory controller.&lt;/p&gt;
&lt;h2&gt;Single bank&lt;/h2&gt;
&lt;p&gt;Presented in the following two figures are the so-called margin reports for the
read and write cycles. At the start-up, the controller performs a series of
reads and writes, and measures the data valid window. This information can be
then retrieved over the JTAG (with EMIF Debug Toolkit).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Margin report for single bank read" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/single_bank_read.png" style="width:30%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Margin report for single bank write" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/single_bank_write.png" style="width:30%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Full interface&lt;/h2&gt;
&lt;p&gt;Once it was clear that the interface works correctly (and that the
reverse-engineering procedure described in my previous blog post works),
I extended the interface to the &lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/commit/59328a21fda38882297f030fe3ed7cc28ba5b509"&gt;full 72 bits and enabled ECC&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Margin report for full interface read" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/full_if_read.png" style="width:30%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Margin report for full interface write" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/full_if_write.png" style="width:30%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The margin reports from the EMIF Debug Toolkit unfortunately do not convey
information on whether the results are as expected for the DDR3 interface. The
data valid window is also presented in a misleading way, with the Unit
Interval (UI) being indicated as 1400 ps, while in reality it is only 625 ps
(for the operation at 1600 MT/s).&lt;/p&gt;
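&lt;p&gt;The 625 ps figure follows directly from the transfer rate, as the short Python calculation below shows. The function names are chosen for this example. The second function merely illustrates which transfer rate a 1400 ps unit interval would correspond to.&lt;/p&gt;

```python
def unit_interval_ps(transfer_rate_mts):
    """Unit interval in picoseconds for a data rate given in MT/s."""
    # 1 / (rate * 1e6 transfers/s) seconds, expressed in picoseconds.
    return 1_000_000 / transfer_rate_mts


def transfer_rate_mts(ui_ps):
    """Data rate in MT/s which corresponds to a unit interval in ps."""
    return 1_000_000 / ui_ps


print(unit_interval_ps(1600))          # 625.0
print(round(transfer_rate_mts(1400)))  # 714
```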
&lt;p&gt;In a presentation of a Cyclone 10 board with a DDR3 memory controller, Intel
shows a similar result to the one achieved on the Stratix V board. From this I
would assume that the DDR3 interface is configured correctly.&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/nLEcunXRTrs" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;

&lt;h1&gt;Memory Checker IP&lt;/h1&gt;
&lt;p&gt;At this point we know that all data and control pins are working correctly, but
we have not yet really tested if the address decoding works properly.&lt;/p&gt;
&lt;p&gt;For this task I have developed a small IP core, called &lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-9-mem-checker"&gt;Memory
Checker&lt;/a&gt;.
It has an Avalon-MM master interface that can be connected to the DDR3 memory
controller. The IP can be instructed to either populate the memory with one of
the 8 patterns ("all 0s", "all 1s", "walking 1", "walking 0", "alternate",
"8-bit counter", "32-bit counter", "128-bit counter") or to read the content
back and check the content of the memory against the expected value. The results
of the checker are made available to software through the control interface.&lt;/p&gt;
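&lt;p&gt;To illustrate the principle, here is a software model of a few of the patterns in Python. This is a sketch under assumptions: the exact bit layout of the patterns in the IP is defined in its source code and may differ, and the names &lt;code&gt;pattern_word&lt;/code&gt; and &lt;code&gt;count_matches&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
def pattern_word(mode, index, width=32):
    """Expected value of word number `index` for a given test pattern."""
    full = 2 ** width - 1
    if mode == "all 0s":
        return 0
    if mode == "all 1s":
        return full
    if mode == "walking 1":
        return 2 ** (index % width)          # a single 1 moving through the word
    if mode == "walking 0":
        return full - 2 ** (index % width)   # a single 0 moving through the word
    if mode == "counter":
        return index % 2 ** width
    raise ValueError("unknown mode: " + mode)


def count_matches(memory, mode, width=32):
    """Count how many words of `memory` match the expected pattern."""
    return sum(1 for i, word in enumerate(memory)
               if word == pattern_word(mode, i, width))


# Fill a small model memory, corrupt one word, and check it again.
memory = [pattern_word("walking 1", i, 8) for i in range(16)]
memory[5] = 0
print(count_matches(memory, "walking 1", 8), "/", len(memory))
```

&lt;p&gt;The counter patterns are the most useful ones for the address decoding test, since every location holds a different value and an aliased address line shows up as a mismatch.&lt;/p&gt;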
&lt;p&gt;Initially I wanted to use the AXI interface on this core (to make it compatible
with both Intel and Xilinx tools), but unfortunately I could not make the bursts
work in the Intel Platform Designer (previously known as Qsys) with the
AXI-to-Avalon adapter.&lt;/p&gt;
&lt;p&gt;Memory Checker IP in its natural habitat, connected to the DDR3 memory controller:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Memory Checker" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/mem_checker_ip.png" style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Driver&lt;/h2&gt;
&lt;p&gt;The IP also provides &lt;a href="https://github.com/j-marjanovic/chisel-stuff/blob/master/example-9-mem-checker/ip_cores/mem_checker/mem_checker_sw.tcl"&gt;a driver&lt;/a&gt;,
which gets automatically included in the BSP. The function &lt;code&gt;mem_check()&lt;/code&gt; performs
the memory check and outputs the information on the &lt;code&gt;stdout&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Output&lt;/h2&gt;
&lt;p&gt;The output from the memory test procedure is presented below. The main
function first confirms that it is talking to the right IP (by comparing
the expected and the real value of the ID register), then retrieves the
configuration of the IP (Avalon-MM interface width, burst length)
and then continues with the test procedure for all eight test modes.
After each test is complete, the result (PASS or FAIL) is printed, and
the throughput is presented.&lt;/p&gt;
&lt;p&gt;Testing the entire 4 GB of RAM takes only several seconds,
which is significantly faster than the SW-based memory test provided as
part of the Nios example design.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=================================================&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;IP&lt;/span&gt; &lt;span class="kt"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x3e3c8ec8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10003&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;Avalon&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;burst&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;walking&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;walking&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alternate&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bit&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bit&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bit&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;67108860&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;67108860&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7816&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;throughput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9662&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Performance&lt;/h1&gt;
&lt;p&gt;Although the performance of the HW-accelerated memory test is several times
better than the SW-based memory test, I still wanted to see the throughput which
can be achieved and how it compares to the full bus throughput (i.e. theoretical
maximum).&lt;/p&gt;
&lt;p&gt;From my &lt;a href="https://indico.desy.de/event/23131/contributions/49392/attachments/31928/39944/ard_st3_gigevision.pdf"&gt;previous experience with the Xilinx DDR4 controller&lt;/a&gt;, I expected to achieve a bus utilization of
around 80%.&lt;/p&gt;
&lt;p&gt;Shown in the figure below is the read and write throughput for different burst
lengths.&lt;/p&gt;
&lt;p&gt;&lt;img alt="DDR3 throughput" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/plot_ddr3_perf.png" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The read throughput reaches a reasonable level for large bursts. Quite
surprisingly, the write throughput does not: it only reaches 60% of the full
bus throughput, and it is also not affected by the burst length.&lt;/p&gt;
&lt;p&gt;To investigate this further, I have taken two captures of the Avalon interface,
once for the burst length of 4 and once for the burst length of 128.&lt;/p&gt;
&lt;p&gt;Burst length 4:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Avalon interface between Memory Checker and DDR3 controller" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/burst_4_full.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Burst length 128:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Avalon interface between Memory Checker and DDR3 controller" src="www.j-marjanovic.io/images/2021_fpga_card_part_6/burst_128.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;With these captures I can calculate the interface utilization, i.e. the
percentage of cycles in which the &lt;code&gt;waitrequest&lt;/code&gt; signal was also asserted while the
&lt;code&gt;write&lt;/code&gt; signal was asserted. This number is roughly 60% for both cases,
thus confirming the write throughput measurements from the Memory Checker IP.&lt;/p&gt;
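&lt;p&gt;The calculation on the captures can be sketched in a few lines of Python. This is only an illustration: the sample lists below are made up and are not taken from the captures shown above. It relies on the Avalon-MM convention that a write beat is accepted in a cycle where &lt;code&gt;write&lt;/code&gt; is high and &lt;code&gt;waitrequest&lt;/code&gt; is low, and it reports both the accepted and the stalled fraction of the write cycles.&lt;/p&gt;

```python
def write_utilization(write, waitrequest):
    """Return (accepted, stalled) as fractions of the cycles with write asserted.

    Both arguments hold one sample per clock cycle (1 = asserted).
    """
    active = [wr for w, wr in zip(write, waitrequest) if w]
    if not active:
        return 0.0, 0.0
    stalled = sum(1 for wr in active if wr) / len(active)
    return 1.0 - stalled, stalled


# Hypothetical 10-cycle capture, NOT the data from the screenshots.
write       = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
waitrequest = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

accepted, stalled = write_utilization(write, waitrequest)
print(f"accepted: {accepted:.0%}, stalled: {stalled:.0%}")
```

&lt;p&gt;On a real capture the two lists would come from the exported logic analyzer samples of the two signals.&lt;/p&gt;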
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In this blog post I have used the knowledge obtained from reverse-engineering
the bitstream and instantiated the DDR3 controller. To verify the functionality
of the controller and the board itself (it was purchased off eBay for 40 USD
after all), I have used Intel tools (EMIF Debug Toolkit) and developed a Memory
Checker IP. The test successfully ran overnight and did not detect any errors.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay, part 5</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-5.html" rel="alternate"></link><published>2021-01-10T10:00:00+01:00</published><updated>2021-01-10T10:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2021-01-10:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-5.html</id><summary type="html">&lt;p&gt;The last piece of the puzzle on the Stratix V board is the DDR3 memory pinout. Once this is
figured out, I can finally start using the board for my developments. It is obvious that this is not
the most simple way to get a cheap FPGA development board, but …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The last piece of the puzzle on the Stratix V board is the DDR3 memory pinout. Once this is
figured out, I can finally start using the board for my developments. It is obvious that this is not
the most simple way to get a cheap FPGA development board, but I generally enjoy the challenges and
also took this opportunity to learn something new. Being able to use a board without proper
documentation is a valuable skill.&lt;/p&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Looking at the board from the bottom, we note that the board has 4 ICs on the bottom side and
presumably 5 ICs on the top side. The markings on the chips indicate that this is H5TC4G83BFR-PBA -
a 4 Gb (512M x 8) DDR3 IC. With a total of 9 ICs, the total interface width is 72-bit, which can be
used as a 64-bit interface with ECC.&lt;/p&gt;
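&lt;p&gt;The arithmetic behind these numbers, written out as a small Python sketch. The usable capacity at the end is my own derivation and assumes that one of the nine chips is used exclusively for ECC.&lt;/p&gt;

```python
CHIPS = 9               # 4 on the bottom side + 5 on the top side
BITS_PER_CHIP = 8       # x8 organization (512M x 8)
CHIP_DENSITY_GBIT = 4   # 4 Gb per IC

total_width = CHIPS * BITS_PER_CHIP           # full interface width in bits
data_width = 64
ecc_width = total_width - data_width          # bits left over for ECC

# Assumption: the chips carrying the 64 data bits hold the usable capacity.
data_chips = data_width // BITS_PER_CHIP
usable_gib = data_chips * CHIP_DENSITY_GBIT / 8   # Gbit to GiB

print(f"{total_width}-bit interface = {data_width} data + {ecc_width} ECC bits")
print(f"usable capacity: {usable_gib:.0f} GiB")
```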
&lt;p&gt;&lt;img alt="FPGA and 9 DDR3 ICs" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/fpga_ddr3_ics.jpg" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Looking at the presentation &lt;a href="https://indico.cern.ch/event/822126/contributions/3500184/attachments/1906428/3148591/Catapult_FastML_Fermilab_2019.pdf"&gt;"Heterogeneous Computing @ Microsoft" from A. Putnam and K.
Ovtcharov&lt;/a&gt;
we can see the board without the heatsink which confirms that there are 5 DDR3 ICs on the top side.
Another important detail that can be derived from this picture is the orientation of the FPGA; the
memory interface is next to the upper right corner, which would correspond to the banks 8A - 8D.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Stratix V board from the top side" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/fpga_orientation.jpg" style="width:40%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;From other sources we also know that there is a 125 MHz clock being fed into the pin &lt;strong&gt;M23&lt;/strong&gt; (in the
IO bank 8D), and a dedicated high-quality clock is a must for a fast DDR3 interface.&lt;/p&gt;
&lt;h1&gt;Strategy&lt;/h1&gt;
&lt;p&gt;The DDR3 memory uses Dynamic On-Die Termination (ODT) for the data (DQ) and data strobe (DQS) pins
but uses external resistors for termination on the address and control lines. On the Stratix V
board, we can observe these resistors under the 5th DDR3 chip on the bottom side of the board.&lt;/p&gt;
&lt;p&gt;Since tracks and vias are also somewhat visible, figuring out the pinout between the FPGA and
external resistors would already give us a starting point.&lt;/p&gt;
&lt;p&gt;The data, data strobe, and data mask (DM) pins do not have any external termination resistors, and
since all but 1 IC are placed in the clamshell topology, it would also be impossible to probe them
with an oscilloscope (without physically modifying the board).&lt;/p&gt;
&lt;p&gt;There are a couple of vias for DQ and DQS visible under the 5th IC which can be probed with an
oscilloscope, but this is only a small portion of the interface. A different approach is needed.&lt;/p&gt;
&lt;h2&gt;Configuration Flash (EPCS/EPCQ)&lt;/h2&gt;
&lt;p&gt;The board contains an on-board Flash which is used to configure the FPGA upon boot. We already know
that the board configures itself and exposes the PCIe endpoint, so we know that the Flash was not
erased as a part of the disposal process. It is very likely that the DDR3 controller is also a part
of the bitstream.&lt;/p&gt;
&lt;p&gt;Altera provides a method to &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an556.pdf"&gt;encrypt the bitstream with a 256-bit AES
algorithm&lt;/a&gt;,
with the key which is either stored in the fuses inside the device or in a volatile memory, backed
by a battery.&lt;/p&gt;
&lt;p&gt;Since the boards were never meant to be distributed to customers and were originally only used
inside the data center, and since the management of the keys and setting up the provisioning
workflow are non-trivial tasks, I would assume/hope that the bitstream stored in the Flash
memory is not encrypted.&lt;/p&gt;
&lt;p&gt;If the bitstream is not encrypted, then it should be possible to extract the pinout from the
bitstream. Since the data pins in a DDR3 memory interface can be swapped around freely (one only
needs to observe that the grouping between the data strobe and data), it is most likely enough to
classify the I/O standard of the pins in question.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;DQ&lt;/code&gt; pins we can expect bidir SSTL, for &lt;code&gt;DM&lt;/code&gt; SSTL output, for &lt;code&gt;DQS&lt;/code&gt; differential bidir SSTL, ...&lt;/p&gt;
&lt;h1&gt;HW inspection&lt;/h1&gt;
&lt;p&gt;We start our exploration by inspecting the hardware itself. One can find some &lt;a href="http://virtlab.occamlab.com/home/zapisnik/microsoft-catapult-v2"&gt;notes from VirtLab&lt;/a&gt; where a couple of pins are already mapped; this offers a good start.&lt;/p&gt;
&lt;p&gt;I prepared a small FPGA project where pins are transmitting their location as a UART message - e.g.
pin K21 will continuously transmit "K21 ". This allows quick determination of which pins are
connected to which resistors with an oscilloscope.&lt;/p&gt;
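&lt;p&gt;To know what to look for on the oscilloscope, it helps to write down the expected line levels. The following Python sketch is only a model of the waveform, not the FPGA project itself, and it assumes the common 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit). The framing and baud rate actually used are not stated here.&lt;/p&gt;

```python
def uart_frame(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits (LSB first), stop bit."""
    data_bits = [(byte // 2**i) % 2 for i in range(8)]
    return [0] + data_bits + [1]


def pin_waveform(pin_name):
    """Line levels for the message sent by a pin, e.g. "K21 " for pin K21."""
    levels = []
    for byte in (pin_name + " ").encode("ascii"):
        levels.extend(uart_frame(byte))
    return levels


# First frame of the message for pin K21, i.e. the character "K" (0x4B).
print(uart_frame(ord("K")))
```

&lt;p&gt;Each pin name maps to a distinct bit pattern, so a single decoded message on a resistor or via identifies the FPGA pin behind it.&lt;/p&gt;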
&lt;p&gt;Annotated on the picture below are the FPGA pins which can be detected on the termination resistors
and vias. I have used color-coding to indicate the voltage level; the signals annotated in cyan
have driven the output from rail to rail, which means that there is no external termination while
with the signals in red the effect of the external termination can be observed in the signal level.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Annotated termination resistors and vias" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/ddr3_pinout1.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The annotation can be then transferred to the DDR3 chip pinout - we can see that we have managed
to determine all address and control pins and a couple of data and data strobe pins. This will
be useful as a validation for our future steps.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Annotated DDR3 IC pinout" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/ddr3_pinout2.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Reverse engineering the bitstream&lt;/h1&gt;
&lt;p&gt;At this point we have obtained every piece of information from the HW we can (without more intrusive
procedures such as desoldering the DDR3 chips). Now it is time to have a look at the bitstream
for Stratix V.&lt;/p&gt;
&lt;p&gt;The collection of scripts and other resources is available on my GitHub in a repository
&lt;a href="https://github.com/j-marjanovic/otma-pin-re"&gt;otma-pin-re&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;JIC format&lt;/h2&gt;
&lt;p&gt;JIC stands for &lt;a href="https://www.intel.com/content/www/us/en/programmable/quartushelp/17.0/reference/glossary/def_jic.htm"&gt;JTAG Indirect Configuration
file&lt;/a&gt;
which contains both the data to configure the EPCS/EPCQ device itself and the actual FPGA bitstream.
Using Quartus Programmer one can dump the content from the EPCS or EPCQ into the JIC format.&lt;/p&gt;
&lt;p&gt;Wirebond on GitHub already did this and stored the result
&lt;a href="https://github.com/wirebond/catapult_v2_pikes_peak/tree/master/fpga/factory_fw"&gt;here&lt;/a&gt;. I have used
this file for my analysis.&lt;/p&gt;
&lt;h2&gt;EPCQ content&lt;/h2&gt;
&lt;p&gt;As one of the first steps I have plotted the bit density, i.e. how many bits are set in a 1024-bit
block. 1024 is an arbitrary number, chosen to make plotting simple/fast.&lt;/p&gt;
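&lt;p&gt;A minimal sketch of such a bit density calculation in Python is shown below. It is not the code from the repository; the function name and the test data are made up for illustration.&lt;/p&gt;

```python
def bit_density(data, block_bits=1024):
    """Count the set bits in each block of block_bits bits.

    The last block may be shorter if the length is not a multiple of the block size.
    """
    block_bytes = block_bits // 8
    densities = []
    for start in range(0, len(data), block_bytes):
        block = data[start:start + block_bytes]
        densities.append(sum(bin(byte).count("1") for byte in block))
    return densities


# Synthetic example: one block of all ones, one of all zeros, one partial block.
example = bytes([0xFF] * 128 + [0x00] * 128 + [0x0F] * 64)
print(bit_density(example))
```

&lt;p&gt;For a real file, the input would be the raw content of the JIC file read in binary mode, and the returned list can be passed directly to a plotting library.&lt;/p&gt;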
&lt;p&gt;Shown in the figure below are four different JIC files. On the first subplot we have a JIC file
which I have generated from a DDR3 example design. On the second subplot there is a JIC file from
the on-board EPCQ and on the third subplot are two JIC files (one with an offset for the FPGA
bitstream) where the compression was enabled.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bit density of factory.jic" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/jic_factory.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;We can see that the bitstream from the EPCQ (&lt;code&gt;factory.jic&lt;/code&gt;) contains two compressed FPGA images -
the first one immediately after the header and the second one halfway through the memory. Presumably,
the first image is a recovery image (since it is smaller in size) and the second one is an
application image.&lt;/p&gt;
&lt;h3&gt;Checksum&lt;/h3&gt;
&lt;p&gt;Zooming in at the beginning of the bitstream we can note an increase in bit density
every 1188 bits. This is presumably a checksum.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Checksum in the JIC file can be noted visually" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/jic_checksum.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Explored in the notebook
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/bitstream_analysis/02_crc_checksum.ipynb"&gt;bitstream_analysis/02_crc_checksum.ipynb&lt;/a&gt;
is the checksum calculation. It can be confirmed that the checksum is calculated using CRC16 with
the Modbus polynomial, as mentioned in &lt;a href="https://www.emsec.ruhr-uni-bochum.de/media/attachments/files/2014/11/MA_Swierczynski.pdf"&gt;P. Swierczynski: Security Analysis of the Bitstream
Encryption Scheme of Altera
FPGAs&lt;/a&gt;.&lt;/p&gt;
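&lt;p&gt;For reference, here is a generic bit-by-bit implementation of the standard CRC-16/MODBUS algorithm in Python (initial value 0xFFFF, reflected polynomial 0xA001). It only illustrates the CRC itself. How the checksum is framed in the bitstream (which bits are covered and in which order) is not reproduced here; the notebook linked above is the reference for that.&lt;/p&gt;

```python
def crc16_modbus(data):
    """Standard CRC-16/MODBUS over a bytes-like object."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            lsb = crc % 2        # bit that is about to be shifted out
            crc //= 2            # shift right by one bit
            if lsb:
                crc ^= 0xA001    # reflected form of the polynomial 0x8005
    return crc


# Standard check input for CRC algorithms.
print(hex(crc16_modbus(b"123456789")))  # prints 0x4b37
```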
&lt;p&gt;Being able to calculate a checksum on the data can serve as a confirmation
that one is correctly able to interpret the raw data. This will become
especially important once we try to decompress the bitstream.&lt;/p&gt;
&lt;h3&gt;Compression&lt;/h3&gt;
&lt;p&gt;To save space in the Flash memory, the bitstream in the EPCQ is compressed;
this can be observed from the increased bit density in the plots in the previous
section.&lt;/p&gt;
&lt;p&gt;The algorithm for compression (and decompression) is described in the &lt;a href="https://patents.google.com/patent/US6525678"&gt;US patent
6525678&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To give an example, here are two excerpts, the first one from the uncompressed bitstream and the
second one from the compressed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;227550 | 00 00 b6 f9 81 8b 00 00 2f e8 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;227560 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;*&lt;/span&gt;
&lt;span class="err"&gt;2279f0 | 00 00 00 00 00 00 b6 f9 81 8b 00 00 2f e8 00 00&lt;/span&gt;
&lt;span class="err"&gt;227a00 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;*&lt;/span&gt;
&lt;span class="err"&gt;227e90 | 00 00 00 00 00 00 00 00 00 00 b6 f9 81 8b 00 00&lt;/span&gt;
&lt;span class="err"&gt;227ea0 | 2f e8 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;225900 | 00 00 6f 9b ff 81 8b f0 2f e8 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;225910 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;*&lt;/span&gt;
&lt;span class="err"&gt;225a30 | 00 6f 9b ff 81 8b f0 2f e8 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;225a40 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;*&lt;/span&gt;
&lt;span class="err"&gt;225b60 | 6f 9b ff 81 8b f0 2f e8 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;225b70 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see that the non-compressed bitstream contains a sequence &lt;code&gt;b6&lt;/code&gt;, &lt;code&gt;f9&lt;/code&gt;, ... after a long string of
zeros. In the compressed bitstream the long string of zeros is interrupted by an &lt;code&gt;f&lt;/code&gt; (which
indicates that the next 4 elements should be copied from the compressed bitstream) and followed by
&lt;code&gt;b6&lt;/code&gt;, &lt;code&gt;f9&lt;/code&gt; (as in the original bitstream). The checksum is calculated after the decompression.&lt;/p&gt;
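&lt;p&gt;The following Python sketch shows my reading of the two excerpts above. It is not taken from the patent or from the repository. The data is processed as 4-bit nibbles, low nibble of each byte first. A header nibble acts as a mask for the next four output nibbles: a set bit means that the nibble is copied from the compressed stream, a cleared bit means that the nibble is zero. The excerpts only contain the masks &lt;code&gt;f&lt;/code&gt; and &lt;code&gt;0&lt;/code&gt;, so the order of the bits within the mask (here LSB first) is an assumption.&lt;/p&gt;

```python
def to_nibbles(data):
    """Split bytes into 4-bit nibbles, low nibble first."""
    nibbles = []
    for byte in data:
        nibbles.append(byte % 16)
        nibbles.append(byte // 16)
    return nibbles


def decompress(data):
    """Expand a mask-nibble compressed stream (see the assumptions above)."""
    nibbles = to_nibbles(data)
    out = []
    pos = 0
    while pos != len(nibbles):
        mask = nibbles[pos]
        pos += 1
        for i in range(4):
            if (mask // 2**i) % 2:
                out.append(nibbles[pos])   # literal nibble from the stream
                pos += 1
            else:
                out.append(0)              # implied zero nibble
    # Recombine pairs of nibbles (low, high) into bytes.
    return bytes(lo + 16 * hi for lo, hi in zip(out[0::2], out[1::2]))


# Non-zero part of the compressed excerpt.
compressed = bytes.fromhex("6f9bff818bf02fe8")
print(decompress(compressed).hex(" "))  # prints b6 f9 81 8b 00 00 2f e8
```

&lt;p&gt;With this scheme a compressed &lt;code&gt;00&lt;/code&gt; byte expands to four zero bytes, which is consistent with the runs of zeros in the compressed excerpt being roughly four times shorter.&lt;/p&gt;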
&lt;p&gt;With this out of the way, we have a full bitstream where we can start identifying individual
features.&lt;/p&gt;
&lt;h2&gt;Weak pull-up bit&lt;/h2&gt;
&lt;p&gt;One of the most simple changes regarding an IO pin is turning the weak pull-up resistor on and off.&lt;/p&gt;
&lt;p&gt;I have explored the effect of changing the value of the weak pull-up resistor in the notebook
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/test_01-extract_pin_addr.ipynb"&gt;extract_pin_addr/test_01-extract_pin_addr.ipynb&lt;/a&gt;.
In one of the graphs, we can determine the address of the bit responsible for the weak pull-up, and
the corresponding checksum bits.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Weak Pull-Up configuration bit" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/extract_pu.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;With the confirmation that a pull-up is controlled by a single bit, I have decided to use this bit
as a reference address for the rest of the configuration bits. For the classification I use bits
at addresses relative to the pull-up bit, i.e. the pull-up bit has an address 0.&lt;/p&gt;
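&lt;p&gt;The underlying idea, comparing two bitstreams which differ in a single setting and then expressing the bit addresses relative to the pull-up bit, can be sketched as follows. The function names and the numbering of bits within a byte (LSB first) are choices made for this example, not necessarily those used in the notebooks.&lt;/p&gt;

```python
def differing_bits(a, b):
    """Return the bit addresses at which two equally long bitstreams differ."""
    if len(a) != len(b):
        raise ValueError("bitstreams must have the same length")
    addrs = []
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        for bit in range(8):
            if (diff // 2**bit) % 2:
                addrs.append(8 * i + bit)
    return addrs


def relative_addrs(addrs, pull_up_addr):
    """Express bit addresses relative to the pull-up bit (which becomes 0)."""
    return [addr - pull_up_addr for addr in addrs]


# Synthetic example: two 3-byte "bitstreams" differing in two bits.
without_pu = bytes([0x00, 0x00, 0x00])
with_pu = bytes([0x00, 0x04, 0x80])

addrs = differing_bits(without_pu, with_pu)
print(addrs, relative_addrs(addrs, addrs[0]))
```

&lt;p&gt;On real bitstreams the list of differences also contains the checksum bits, which change together with the configuration bit.&lt;/p&gt;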
&lt;p&gt;In the notebook
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/01_get_pu_addr.ipynb"&gt;extract_pin_addr/01_get_pu_addr.ipynb&lt;/a&gt;
I have then determined the address of the weak pull-up bit for all pins in the banks 8A-8D. Those addresses
are then (manually) copied to
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/knowledge.py"&gt;knowledge.py&lt;/a&gt;,
which is then used in subsequent notebooks.&lt;/p&gt;
&lt;h2&gt;Pin configuration&lt;/h2&gt;
&lt;p&gt;The next step is to determine which bits are relevant for the I/O standard on the pin. I have
prepared a lot of bitstreams with pins in different configurations, and then in the notebook
 &lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/03_extract_io_std.ipynb"&gt;extract_pin_addr/03_extract_io_std.ipynb&lt;/a&gt;
extracted all the bits which are relevant for the IO standard.&lt;/p&gt;
&lt;p&gt;&lt;img alt="I/O standard configuration bits" src="www.j-marjanovic.io/images/2021_fpga_card_part_5/extract_io_std.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;Classifier&lt;/h2&gt;
&lt;p&gt;Once I had managed to get a vector of values that are relevant for the I/O pin standard, I
started to build a classifier. In the first attempt I let the classifier itself figure out which
bits are relevant for which features, but I had better success with a hand-crafted classifier,
presented in the notebook
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/05_classification_manual.ipynb"&gt;extract_pin_addr/05_classification_manual.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Validation&lt;/h2&gt;
&lt;p&gt;I have prepared a small project with an 8-bit DDR3 interface and let the classifier classify the
pins. I have purposely also placed some address and control pins in the same bank, and left some
of the pins disabled. The classifier correctly classifies all the pins; the results are presented in the
notebook
&lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/06_validation.ipynb"&gt;extract_pin_addr/06_validation.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Classification&lt;/h2&gt;
&lt;p&gt;At this point we can run the classifier on the decompressed image from the on-board Flash. Since
there are two images (a recovery image and an application image) available, I have decompressed both
and called them &lt;code&gt;factory_decompress0.jic&lt;/code&gt; and &lt;code&gt;factory_decompress1.jic&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I then ran the classifier on the second image and was amazed by the results. The &lt;a href="https://github.com/j-marjanovic/otma-pin-re/blob/master/scripts/extract_pin_addr/07_application.ipynb"&gt;classifier&lt;/a&gt; managed to find:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;all 72 &lt;code&gt;DQ&lt;/code&gt; pins,&lt;/li&gt;
&lt;li&gt;17 (out of 18) &lt;code&gt;DQS&lt;/code&gt; and &lt;code&gt;DQSn&lt;/code&gt; pins,&lt;/li&gt;
&lt;li&gt;10 &lt;code&gt;DM&lt;/code&gt; pins (9 plus one pin which is counted as a &lt;code&gt;DM&lt;/code&gt; but is actually something else),&lt;/li&gt;
&lt;li&gt;25 address and control pins (out of 25), and&lt;/li&gt;
&lt;li&gt;5 unknown pins.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this point one could investigate why, for example, one of the DQS pins is not detected and improve
the classifier to correctly detect all pins, but I have decided to go ahead and clean up the results
manually.&lt;/p&gt;
&lt;h2&gt;Clean-up&lt;/h2&gt;
&lt;p&gt;The clean-up was quite straightforward, since one can base the corrections on the pinout which was
determined by physically probing vias and resistors in one of the previous paragraphs.&lt;/p&gt;
&lt;p&gt;Out of 5 unknown pins, two were &lt;code&gt;CK&lt;/code&gt; and &lt;code&gt;CKn&lt;/code&gt;, one was the missing &lt;code&gt;DQSn&lt;/code&gt; pin, one was the 125 MHz
clock and one was the &lt;code&gt;RZQ&lt;/code&gt; pin. One of the pins which was incorrectly classified as &lt;code&gt;DM&lt;/code&gt; pin was
actually a reset pin (L20).&lt;/p&gt;
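&lt;p&gt;As a quick sanity check, the corrections can be applied to the raw counts from the classifier. The group names in this Python sketch are labels chosen for the example; the numbers are the ones listed above.&lt;/p&gt;

```python
from collections import Counter

# Raw classifier output.
found = Counter({"DQ": 72, "DQS": 17, "DM": 10, "addr_ctrl": 25, "unknown": 5})

# Apply the manual corrections.
cleaned = Counter(found)
del cleaned["unknown"]
cleaned.update({"CK": 2, "DQS": 1, "refclk": 1, "RZQ": 1})  # the 5 unknown pins
cleaned["DM"] -= 1       # one pin was wrongly counted as DM ...
cleaned["reset"] += 1    # ... and is actually the reset pin

for name, count in cleaned.items():
    print(f"{name:10s} {count}")
print("total pins:", sum(cleaned.values()))
```

&lt;p&gt;After the clean-up the counts match the expected 18 &lt;code&gt;DQS&lt;/code&gt;/&lt;code&gt;DQSn&lt;/code&gt; and 9 &lt;code&gt;DM&lt;/code&gt; pins, and the total number of pins is unchanged.&lt;/p&gt;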
&lt;p&gt;The full table with the final pinout (&lt;code&gt;DQ&lt;/code&gt; pins are omitted for brevity, check the last notebook)
is presented below.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="left"&gt;pin&lt;/th&gt;
&lt;th align="left"&gt;dbg&lt;/th&gt;
&lt;th align="left"&gt;class&lt;/th&gt;
&lt;th align="left"&gt;addr&lt;/th&gt;
&lt;th align="left"&gt;control&lt;/th&gt;
&lt;th align="left"&gt;data&lt;/th&gt;
&lt;th align="left"&gt;misc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A0&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A1&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J29&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A2&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L28&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A3&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;P26&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A4&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M26&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A5&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N25&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A6&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;P25&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A7&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N22&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A8&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N26&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A9&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;K27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A10&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A11&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A12&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A13&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A14&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;K28&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;A15&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J28&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;BA0&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;K21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;BA1&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L26&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;BA2&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L24&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;CASn&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J23&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='Scls1', term=0, diff=1&lt;/td&gt;
&lt;td align="left"&gt;?&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;CK&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;J24&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=1&lt;/td&gt;
&lt;td align="left"&gt;?&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;CKn&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;K24&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;CKE&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N23&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;CSn&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;ODT&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;RASn&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L20&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;reset&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;P23&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='Scls1', term=0, diff=0&lt;/td&gt;
&lt;td align="left"&gt;addr&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;WEn&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N33&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DM0&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;T31&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DQ&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQ0&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;P34&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DQ&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQ1&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N34&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DQ&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQ4&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M33&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DQ&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQ5&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;L34&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DQ&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQ7&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;N32&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=1&lt;/td&gt;
&lt;td align="left"&gt;DQS&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQS0&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M32&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=1, pu=0, io_std='S', term=1, diff=1&lt;/td&gt;
&lt;td align="left"&gt;DQS&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQS0n&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;M23&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=0, pu=0, io_std='2V5', term=None, diff=0&lt;/td&gt;
&lt;td align="left"&gt;?&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;125 MHz clk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;E23&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=1&lt;/td&gt;
&lt;td align="left"&gt;?&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;DQS2n&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;B34&lt;/td&gt;
&lt;td align="left"&gt;inp=1, out=0, pu=0, io_std='2V5', term=None, diff=0&lt;/td&gt;
&lt;td align="left"&gt;?&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;RZQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;D21&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;D22&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;D25&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;E27&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;D28&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;E30&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;C34&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;H34&lt;/td&gt;
&lt;td align="left"&gt;inp=0, out=1, pu=0, io_std='S', term=1, diff=0&lt;/td&gt;
&lt;td align="left"&gt;DM&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;&lt;/td&gt;
&lt;td align="left"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this blog post I have presented the procedure to obtain the DDR3 pinout from the Stratix V board.
I have not yet tested the pinout on the real hardware, but I am fairly confident that it will work;
I will leave this for the next post.&lt;/p&gt;
&lt;p&gt;Overall, the procedure was relatively straightforward, although there were some obstacles along the way,
for example compression, the not-completely-regular structure of the bitstream, and the checksums.
I was also quite lucky that the bitstream itself was not encrypted, although it is understandable
why this was not the case.&lt;/p&gt;
&lt;p&gt;In the next blog post I plan to verify that the pinout presented here is accurate and perform
some memory tests.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay, part 4</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-4.html" rel="alternate"></link><published>2020-10-11T16:00:00+02:00</published><updated>2020-10-11T16:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2020-10-11:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-4.html</id><summary type="html">&lt;p&gt;In part 4 of the odyssey with the Stratix V board from eBay, I explore
the QSFP+ port and establish a 40 Gigabit Ethernet connection to a computer
with an Intel X710 40GbE network adapter.&lt;/p&gt;
&lt;h1&gt;HW overview&lt;/h1&gt;
&lt;p&gt;Since this board was designed to run in a datacenter, the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;In part 4 of the odyssey with the Stratix V board from eBay, I explore
the QSFP+ port and establish a 40 Gigabit Ethernet connection to a computer
with an Intel X710 40GbE network adapter.&lt;/p&gt;
&lt;h1&gt;HW overview&lt;/h1&gt;
&lt;p&gt;Since this board was designed to run in a datacenter, connectivity is of
major importance. The board can connect to the host CPU over a PCIe x16
connection, and can communicate with other boards and the external world over two
QSFP+ connectors.&lt;/p&gt;
&lt;p&gt;The transceivers in the &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-v/stx5_51001.pdf"&gt;Stratix V
GS&lt;/a&gt;
can operate at 14.1 Gbps, which means that the board can run 40 Gigabit Ethernet
(4x10.3125 Gbps), InfiniBand FDR (56 Gbps = 4x14.0 Gbps) and other fast
protocols.&lt;/p&gt;
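&lt;p&gt;The rates quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 14.1 Gbps limit and the per-lane rates are taken from the text, while the helper name &lt;code&gt;fits&lt;/code&gt; is made up for this illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;MAX_LANE_GBPS = 14.1  # Stratix V GS transceiver limit quoted in the text

# protocol name: (number of lanes, line rate per lane in Gbps)
PROTOCOLS = {
    "40GbE": (4, 10.3125),
    "InfiniBand FDR": (4, 14.0),
}


def fits(lane_gbps, limit=MAX_LANE_GBPS):
    """Return True when a single lane stays within the transceiver limit."""
    return limit >= lane_gbps


for name, (lanes, lane_gbps) in PROTOCOLS.items():
    total = lanes * lane_gbps
    print(f"{name}: {lanes} x {lane_gbps} = {total} Gbps, fits: {fits(lane_gbps)}")

# 40GbE carries 10 Gbps of payload per lane; 64b/66b encoding adds the rest.
encoded_lane_gbps = 10 * 66 / 64
print(encoded_lane_gbps)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The 10.3125 Gbps figure is simply 10 Gbps scaled by the 66/64 overhead of the line encoding used by 40 Gigabit Ethernet.&lt;/p&gt;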
&lt;p&gt;&lt;img alt="Stratix V board with QSFP cable" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/qsfp_cable.jpg" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;QSFP+ management interface&lt;/h1&gt;
&lt;p&gt;As described in &lt;a href="https://members.snia.org/document/dl/25896"&gt;SFF-8436: Specification for QSFP+ 4X 10 Gb/s Pluggable
Transceiver&lt;/a&gt;, the QSFP module
contains a relatively simple management interface, accessible over an I2C bus.&lt;/p&gt;
&lt;p&gt;First I &lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/blob/master/software/otma_bringup/src/mini_i2cdetect.c"&gt;implemented a mini equivalent to &lt;code&gt;i2cdetect&lt;/code&gt;&lt;/a&gt; to check that I am probing the
correct I2C bus. As expected, an I2C device with address 0x50 replies to our
request:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt; 0000 |          -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0010 | -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0020 | -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0030 | -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0040 | -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0050 | 50 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0060 | -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --&lt;/span&gt;
&lt;span class="err"&gt; 0070 | -- -- -- -- -- -- --&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
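&lt;p&gt;The scan itself is a simple loop over the 7-bit address space. The following Python sketch is not the linked C implementation; it only mirrors the output layout shown above. The &lt;code&gt;probe&lt;/code&gt; callback is a hypothetical stand-in for the real bus access, and the scanned range 0x03 to 0x76 is an assumption read off the printout.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def scan_i2c(probe, first=0x03, last=0x76):
    """Probe every 7-bit address in [first, last], return an i2cdetect-style grid.

    probe is a hypothetical callback: probe(addr) returns True on ACK.
    """
    lines = []
    for row in range(0, 0x80, 16):
        cells = []
        for addr in range(row, row + 16):
            if addr not in range(first, last + 1):
                cells.append("  ")           # outside the scanned range
            elif probe(addr):
                cells.append(f"{addr:02x}")  # device acknowledged
            else:
                cells.append("--")           # no reply
        lines.append(f" {row:04x} | " + " ".join(cells).rstrip())
    return "\n".join(lines)


# Stand-in for real bus access: only the QSFP+ module at 0x50 answers.
print(scan_i2c(lambda addr: addr == 0x50))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On real hardware the lambda would be replaced by a function that issues a start condition and the address byte, and reports whether the device acknowledged.&lt;/p&gt;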


&lt;p&gt;We can then continue by dumping the content of the module EEPROM:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;rc = 0 | 0d 02 06 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 0d 00 23 08 00 00 00 00 00 00 00 05 8d 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 02 a0 4d 65 6c 6c 61 6e 6f 78 20 20 20 20&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 20 20 20 20 0f 00 02 c9 36 37 30 37 35 39 2d 42&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 32 33 20 20 20 20 20 20 41 31 04 06 09 00 46 27&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 36 43 32 32 31 32 30 37 36 58 20 20&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 20 20 20 20 31 32 30 33 32 37 20 20 00 00 00 64&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00&lt;/span&gt;
&lt;span class="err"&gt;rc = 0 | 00 00 00 00 00 00 00 00 00 00 02 00 00 30 00 00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The cable is correctly identified as a Mellanox 670759-B23 cable.&lt;/p&gt;
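&lt;p&gt;The identification can be reproduced by decoding the ASCII fields of the dump. The sketch below copies the upper half of the dump (bytes 128 to 255) and uses the field offsets of the SFF-8436 upper page 00h memory map; the names &lt;code&gt;field&lt;/code&gt; and &lt;code&gt;info&lt;/code&gt; are only chosen for this example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Bytes 128..255 of the module EEPROM, copied from the dump above.
UPPER_PAGE_HEX = """
0d 00 23 08 00 00 00 00 00 00 00 05 8d 00 00 00
00 00 02 a0 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
20 20 20 20 0f 00 02 c9 36 37 30 37 35 39 2d 42
32 33 20 20 20 20 20 20 41 31 04 06 09 00 46 27
00 00 00 00 36 43 32 32 31 32 30 37 36 58 20 20
20 20 20 20 31 32 30 33 32 37 20 20 00 00 00 64
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 02 00 00 30 00 00
"""
page = bytes.fromhex("".join(UPPER_PAGE_HEX.split()))


def field(first, last):
    """Return the ASCII field stored in EEPROM bytes first..last (inclusive)."""
    return page[first - 128:last - 127].decode("ascii").strip()


info = {
    "identifier": page[0],           # byte 128, 0x0d means QSFP+
    "vendor": field(148, 163),       # vendor name
    "part_number": field(168, 183),  # vendor part number
    "revision": field(184, 185),     # vendor revision
    "serial": field(196, 211),       # vendor serial number
    "date_code": field(212, 219),    # manufacturing date code
}

for name, value in info.items():
    print(f"{name}: {value}")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The lower 128 bytes of the dump are omitted here because, apart from the first three bytes, they read as zero for this cable.&lt;/p&gt;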
&lt;h1&gt;40G Ethernet MAC&lt;/h1&gt;
&lt;p&gt;To implement the physical layer of the Ethernet protocol I used &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_40_100gbe.pdf"&gt;40- and
100-Gbps Ethernet MAC and PHY
MegaCore&lt;/a&gt;.
Without an appropriate license this IP can be evaluated for an hour, which is
not a lot, but enough to test the hardware capabilities of the FPGA board.&lt;/p&gt;
&lt;p&gt;To clock the transceiver part I have used the 644.53125 MHz on-board oscillator
which I have explored in detail in &lt;a href="https://j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-3.html"&gt;my previous
post&lt;/a&gt;.
The application part is clocked at 312.5 MHz, generated from the same 644 MHz
oscillator.&lt;/p&gt;
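&lt;p&gt;The clock frequencies mentioned here are related to the 10.3125 Gbps lane rate by simple ratios. The sketch below only checks this arithmetic with exact fractions; how the IP core actually derives its clocks internally is not covered in this post, so the ratios should be read as an observation and not as a description of the core.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;from fractions import Fraction

lane_rate_mbps = Fraction(103125, 10)   # 10.3125 Gbps per lane, expressed in Mbps
refclk_mhz = Fraction(64453125, 100000) # 644.53125 MHz on-board oscillator
app_clk_mhz = Fraction(3125, 10)        # 312.5 MHz application clock

# The lane rate is an integer multiple of the reference clock.
print(lane_rate_mbps / refclk_mhz)

# The application clock is the reference clock scaled by 32/66.
print(refclk_mhz * Fraction(32, 66) == app_clk_mhz)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;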
&lt;p&gt;In Qsys I have added a &lt;a href="http://www.altera.com/literature/ug/xcvr_user_guide.pdf"&gt;Transceiver Reconfiguration
Controller&lt;/a&gt; and
connected it to a &lt;em&gt;JTAG to Avalon Master Bridge&lt;/em&gt;, so that I can perform Eye Scan
measurements and change other transceiver configurations.&lt;/p&gt;
&lt;p&gt;With everything in place, I programmed the bitstream into the FPGA, and the
&lt;code&gt;dmesg&lt;/code&gt; output in Linux on the other side of the link brought good news,
namely that the link is up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;[  615.469842] i40e 0000:01:00.0 enp1s0: NIC Link is Up, 40 Gbps Full Duplex, Flow Control: None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Eye scan&lt;/h1&gt;
&lt;p&gt;Once the link is established, we can use &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an605.pdf"&gt;Transceiver Toolkit
EyeQ&lt;/a&gt;
to verify the link quality on the receiver. The EyeQ circuitry uses an additional
data sampler to sample the received data at a time and voltage offset, and
compares the result with the data sampled at the center of the data eye. With
this method, it can determine the BER (Bit Error Rate) for each point in the 2D
eye diagram.&lt;/p&gt;
&lt;p&gt;Shown in the figures below are the results of the eye scan on all 4 lanes.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Eye scan on channel 0" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/eye_rx0.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Eye scan on channel 1" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/eye_rx1.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Eye scan on channel 2" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/eye_rx2.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Eye scan on channel 3" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/eye_rx3.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;We can see that the eyes are not open very wide, but on the other hand this is
an expected result with a relatively high data rate and a passive copper cable.
In any case, the eyes seem to be open wide enough to provide reliable transmission
of the data.&lt;/p&gt;
&lt;h1&gt;Logic Analyzer&lt;/h1&gt;
&lt;p&gt;At this point we can use the &lt;em&gt;SignalTap II Logic Analyzer&lt;/em&gt; to capture the received
packets from the 40G Ethernet MAC interface.&lt;/p&gt;
&lt;p&gt;Shown in the figure below is a SignalTap capture with the data from an
ARP packet, sent from the PC when we want to ping an address.&lt;/p&gt;
&lt;p&gt;&lt;img alt="ARP request captured with SignalTap" src="www.j-marjanovic.io/images/2020_fpga_card_part_4/arp.png" style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;An experienced reader will recognize the broadcast MAC address (&lt;code&gt;0xFFFFFFFFFFFF&lt;/code&gt;),
the ARP EtherType (&lt;code&gt;0x0806&lt;/code&gt;), and some other elements of the ARP packet.&lt;/p&gt;
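&lt;p&gt;For readers less familiar with the frame layout, the following Python sketch splits the first 14 bytes of an Ethernet frame into the destination address, source address and EtherType. The frame is constructed by hand for illustration (the source MAC address is made up); it is not the data from the capture.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import struct

ETHERTYPE_ARP = 0x0806


def parse_ethernet_header(frame):
    """Split the 14-byte Ethernet header into destination, source and EtherType."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype


# Hypothetical frame: broadcast destination, made-up source MAC, ARP EtherType,
# followed by 28 zero bytes standing in for the ARP payload.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0806") + bytes(28)

dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype), ethertype == ETHERTYPE_ARP)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;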
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this short blog post I have explored the 40 Gigabit Ethernet on the Stratix V
board. I have managed to establish a link to a computer, and verify the signal
integrity of the received signals.&lt;/p&gt;
&lt;p&gt;For the next post I will prepare a minimal UDP/IPv4 core, and transfer some
UDP packets between the computer and the FPGA.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective
owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay, part 3</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-3.html" rel="alternate"></link><published>2020-09-06T16:00:00+02:00</published><updated>2020-09-06T16:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2020-09-06:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-3.html</id><summary type="html">&lt;p&gt;As mentioned in my &lt;a href="https://j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-2.html"&gt;previous blog post&lt;/a&gt;, the next step would be
to get the JTAG running in Quartus. In this blog post I describe how I managed
to develop a library as an interface between the FPGA board and Quartus, and
demonstrate the developed interface to download the bitstream …&lt;/p&gt;</summary><content type="html">&lt;p&gt;As mentioned in my &lt;a href="https://j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-2.html"&gt;previous blog post&lt;/a&gt;, the next step would be
to get the JTAG running in Quartus. In this blog post I describe how I managed
to develop a library as an interface between the FPGA board and Quartus, and
demonstrate the developed interface to download the bitstream and to debug
the Nios II soft-core processor.&lt;/p&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;h2&gt;HW overview&lt;/h2&gt;
&lt;p&gt;The Stratix V board contains a 4-pin USB port (with a non-standard connector),
to which an &lt;a href="https://www.ftdichip.com/Products/ICs/FT232H.htm"&gt;FT232H&lt;/a&gt; is
attached, containing &lt;em&gt;Multi-Protocol Synchronous Serial Engine (MPSSE)&lt;/em&gt; which
can be used to implement the JTAG protocol.&lt;/p&gt;
&lt;p&gt;I am a little bit surprised that the board includes a full JTAG programmer and
not just a 10-pin JTAG header, which developers could then use on
their development setups. On the other hand, the cost of an FT232H is 2.70 EUR
on Mouser for a reel, which is negligible compared to the cost of the board. I
can also understand that having a JTAG debugger available on each board is
valuable when monitoring the operating conditions (e.g. transceiver link quality
with the
&lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an605.pdf"&gt;EyeQ&lt;/a&gt;)
and to debug and investigate the synchronization issues between the servers in
the deployment.&lt;/p&gt;
&lt;p&gt;I was not able to find more information about this particular USB/JTAG
connection on the Open Compute Project website.&lt;/p&gt;
&lt;h2&gt;Quartus software suite&lt;/h2&gt;
&lt;p&gt;Intel® Quartus® provides several programs that are extremely useful for
development for Intel/Altera FPGAs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Quartus Programmer&lt;/strong&gt; which, as the name suggests, allows
  programming/configuring the FPGA devices over the JTAG chain, and it can also
  program the non-volatile memories attached to the FPGAs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System Console&lt;/strong&gt; provides "visibility into your system" - one can use JTAG
  to Avalon MM bridge to read and write the registers in the IPs, and
  &lt;em&gt;Transceiver Toolkit&lt;/em&gt; and &lt;em&gt;External Memory Interface Toolkits&lt;/em&gt; both greatly
  simplify the bringup and debug of transceivers and external memories&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SignalTap&lt;/strong&gt; is an embedded logic analyzer, useful for debugging the logic
  in hardware, with the input from real devices&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nios II Debugger&lt;/strong&gt; provides access to soft-core Nios II processor, and
  it can be used to download the programs, to debug them through GDB, and to
  obtain the output from the program over the &lt;em&gt;JTAG UART&lt;/em&gt; IP&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To my knowledge OpenOCD does not provide all these features and Quartus also
does not seem to be able to interface to OpenOCD.&lt;/p&gt;
&lt;h2&gt;OpenOCD&lt;/h2&gt;
&lt;p&gt;Nonetheless, OpenOCD can be used to scan the JTAG chain, and confirm
that the FTDI really is connected to the JTAG port of the FPGA:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ openocd -f interface/ftdi/um232h.cfg -c &lt;span class="s2"&gt;&amp;quot;adapter_khz 100; transport select jtag; jtag newtap auto0 tap -irlen 10 -expected-id 0x029070dd&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
Open On-Chip Debugger &lt;span class="m"&gt;0&lt;/span&gt;.10.0
Licensed under GNU GPL v2
For bug reports, &lt;span class="nb"&gt;read&lt;/span&gt;
    http://openocd.org/doc/doxygen/bugs.html
adapter speed: &lt;span class="m"&gt;100&lt;/span&gt; kHz
Info : clock speed &lt;span class="m"&gt;100&lt;/span&gt; kHz
Info : JTAG tap: auto0.tap tap/device found: 0x029070dd &lt;span class="o"&gt;(&lt;/span&gt;mfg: 0x06e &lt;span class="o"&gt;(&lt;/span&gt;Altera&lt;span class="o"&gt;)&lt;/span&gt;, part: 0x2907, ver: 0x0&lt;span class="o"&gt;)&lt;/span&gt;
Warn : gdb services need one or more targets defined
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;This gives hope that at least on the HW side, integrating the board with Quartus
will be easy.&lt;/p&gt;
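&lt;p&gt;The fields that OpenOCD prints can be recovered from the raw IDCODE by hand. The short Python sketch below applies the standard IEEE 1149.1 IDCODE layout to the value from the output above; the function name &lt;code&gt;decode_idcode&lt;/code&gt; is only used for this example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "marker": idcode % 2,                    # bit 0, always 1 for a valid IDCODE
        "manufacturer": (idcode // 2) % 2**11,   # bits 11:1, JEDEC manufacturer code
        "part": (idcode // 2**12) % 2**16,       # bits 27:12, part number
        "version": idcode // 2**28,              # bits 31:28, version
    }


fields = decode_idcode(0x029070DD)
print({name: hex(value) for name, value in fields.items()})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The decoded manufacturer, part and version values match the ones in the OpenOCD log line.&lt;/p&gt;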
&lt;h1&gt;JTAG library&lt;/h1&gt;
&lt;p&gt;Some information on how to add a custom cable can be found on &lt;a href="https://forums.intel.com/s/question/0D50P00003yyL2bSAE/bemicro-and-programming-under-linux"&gt;Intel
forums&lt;/a&gt;.
At the startup, Quartus scans the &lt;code&gt;linux64&lt;/code&gt; folder (if running on 64-bit Linux)
and searches for shared object files whose names start with &lt;code&gt;libjtag_hw&lt;/code&gt;. The
shared object is then loaded, and a function called &lt;code&gt;get_supported_hardware&lt;/code&gt; is
called. This function returns a structure, containing function pointers for
various operations that the programmer and other utilities can perform.&lt;/p&gt;
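&lt;p&gt;The discovery step can be imitated in a few lines of Python. This is only a sketch of the name-matching behavior described above, not of what Quartus does internally: the &lt;code&gt;.so&lt;/code&gt; suffix, the function name and the file names in the demo are assumptions made for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import tempfile
from pathlib import Path


def find_jtag_hw_libraries(folder):
    """Return the shared objects in folder whose name starts with libjtag_hw."""
    return sorted(p.name for p in Path(folder).glob("libjtag_hw*.so"))


# Demo on a temporary folder standing in for the linux64 directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ("libjtag_hw_example.so", "libjtag_client.so", "readme.txt"):
        (Path(folder) / name).touch()
    found = find_jtag_hw_libraries(folder)
    print(found)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Only the file with the matching prefix is reported; the other two files are ignored.&lt;/p&gt;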
&lt;p&gt;To understand this (undocumented) interface a little bit better, I first
implemented &lt;a href="https://github.com/j-marjanovic/jtag-quartus-ft232h/blob/master/src/jtag_hw_dummy.cpp"&gt;a library with a dummy JTAG TAP
controller&lt;/a&gt;.
This library can be copied or linked in the &lt;code&gt;linux64&lt;/code&gt; folder, and then used in
Quartus Programmer. The implementation of the JTAG pretends that it is a Stratix
V device (by having the same &lt;code&gt;IDCODE&lt;/code&gt;) and then discards all the bits which are
downloaded to the device. If somebody has too much time, one can easily extend
this to create a very complicated &lt;code&gt;.sof&lt;/code&gt; to &lt;code&gt;.rbf&lt;/code&gt; converter. Finally, the
dummy device fakes the status register to communicate that the &lt;code&gt;CONF_DONE&lt;/code&gt; is
high at the end of the programming. For debugging purposes, the library prints
extensive debug information over a UNIX socket.&lt;/p&gt;
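&lt;p&gt;To illustrate the idea of a device that only pretends to be a Stratix V, here is a toy TAP model in Python. It is not the linked C++ library: it implements only the standard IEEE 1149.1 state machine and an &lt;code&gt;IDCODE&lt;/code&gt; data register, and the class and function names are invented for this sketch. The &lt;code&gt;IDCODE&lt;/code&gt; value is the one reported by OpenOCD earlier in this post.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;IDCODE = 0x029070DD  # value reported by OpenOCD for the Stratix V

# IEEE 1149.1 TAP transitions, state: (next if TMS=0, next if TMS=1)
NEXT = {
    "reset":      ("idle",       "reset"),
    "idle":       ("idle",       "select_dr"),
    "select_dr":  ("capture_dr", "select_ir"),
    "capture_dr": ("shift_dr",   "exit1_dr"),
    "shift_dr":   ("shift_dr",   "exit1_dr"),
    "exit1_dr":   ("pause_dr",   "update_dr"),
    "pause_dr":   ("pause_dr",   "exit2_dr"),
    "exit2_dr":   ("shift_dr",   "update_dr"),
    "update_dr":  ("idle",       "select_dr"),
    "select_ir":  ("capture_ir", "reset"),
    "capture_ir": ("shift_ir",   "exit1_ir"),
    "shift_ir":   ("shift_ir",   "exit1_ir"),
    "exit1_ir":   ("pause_ir",   "update_ir"),
    "pause_ir":   ("pause_ir",   "exit2_ir"),
    "exit2_ir":   ("shift_ir",   "update_ir"),
    "update_ir":  ("idle",       "select_dr"),
}


class DummyTap:
    """Toy TAP that only implements the IDCODE data register."""

    def __init__(self):
        self.state = "reset"
        self.dr = 0

    def clock(self, tms, tdi=0):
        """Advance one TCK cycle and return the TDO bit."""
        tdo = 0
        if self.state == "capture_dr":
            self.dr = IDCODE                      # IDCODE is selected after reset
        elif self.state == "shift_dr":
            tdo = self.dr % 2                     # the LSB goes out first
            self.dr = self.dr // 2 + tdi * 2**31  # TDI enters at the MSB
        self.state = NEXT[self.state][tms]
        return tdo


def read_idcode(tap):
    """Reset the TAP, walk to Shift-DR and shift out 32 bits."""
    for _ in range(5):
        tap.clock(1)          # five cycles with TMS=1 always reach the reset state
    for tms in (0, 1, 0, 0):  # reset, idle, select_dr, capture_dr, shift_dr
        tap.clock(tms)
    bits = [tap.clock(0) for _ in range(32)]
    return sum(bit * 2**i for i, bit in enumerate(bits))


print(hex(read_idcode(DummyTap())))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A real dummy device additionally needs an instruction register and has to answer the configuration status queries, which this sketch leaves out.&lt;/p&gt;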
&lt;p&gt;&lt;img alt="A dummy JTAG device being programmed in Quartus Programmer" src="www.j-marjanovic.io/images/2020_fpga_card_part_3/jtag_dummy.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Once this part is in place, it is easy to imagine that combining the library
of the dummy JTAG and the HW-related part of the OpenOCD is not so complicated.
Since the OpenOCD is licensed under GPL v2, I have decided to re-use the code
for MPSSE and FT232H and to also license my library under the same license.&lt;/p&gt;
&lt;h2&gt;Downloading the bitstream&lt;/h2&gt;
&lt;p&gt;Here is the documentation of the first victory in this convoluted JTAG bring-up
process: the bitstream is successfully downloaded into the FPGA, and an LED on
the board is blinking, indicating a total success.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Stratix V being programmed over the on-board FT232H" src="www.j-marjanovic.io/images/2020_fpga_card_part_3/jtag_ft232h.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h2&gt;JTAG to Avalon&lt;/h2&gt;
&lt;p&gt;To check if all functions of the JTAG cable are working, I have prepared a small
IP (discussed below) and connected it to &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_embedded_ip.pdf"&gt;JTAG to Avalon Master
Bridge&lt;/a&gt;.
This IP provides access to the Avalon interconnect over &lt;strong&gt;System Console&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Shown here is a read of 12 words from a certain address in the Avalon MM memory
space:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; jtag_master &lt;span class="k"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;lindex&lt;/span&gt; &lt;span class="k"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;get_service_paths&lt;/span&gt; master&lt;span class="k"&gt;]&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="k"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;devices&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;SGSMD5H&lt;span class="k"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;SGSMD5K1&lt;span class="o"&gt;|&lt;/span&gt;..&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;bus-instance&lt;span class="err"&gt;#&lt;/span&gt;OTMA FT232H&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;link&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;JTAG&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;110&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;132&lt;/span&gt; v1 &lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;phy_0&lt;span class="o"&gt;/&lt;/span&gt;master
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;open_service&lt;/span&gt; master &lt;span class="nv"&gt;$jtag_master&lt;/span&gt;

&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;master_read_32&lt;/span&gt; &lt;span class="nv"&gt;$jtag_master&lt;/span&gt; &lt;span class="mh"&gt;0x1000&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="nv"&gt;0xc10cc272&lt;/span&gt; &lt;span class="mh"&gt;0x00010000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x266ac4b1&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;master_read_32&lt;/span&gt; &lt;span class="nv"&gt;$jtag_master&lt;/span&gt; &lt;span class="mh"&gt;0x1000&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="nv"&gt;0xc10cc272&lt;/span&gt; &lt;span class="mh"&gt;0x00010000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x266ac4ac&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;master_read_32&lt;/span&gt; &lt;span class="nv"&gt;$jtag_master&lt;/span&gt; &lt;span class="mh"&gt;0x1000&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="nv"&gt;0xc10cc272&lt;/span&gt; &lt;span class="mh"&gt;0x00010000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x266ac4a8&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;master_read_32&lt;/span&gt; &lt;span class="nv"&gt;$jtag_master&lt;/span&gt; &lt;span class="mh"&gt;0x1000&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="nv"&gt;0xc10cc272&lt;/span&gt; &lt;span class="mh"&gt;0x00010000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x266ac4a3&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;I will explain later what exactly we are seeing here; for now let's just accept
that the &lt;em&gt;JTAG to Avalon bridge&lt;/em&gt; works, and that it reads a magic number register
(&lt;code&gt;0xc10cc272&lt;/code&gt;), a version register (&lt;code&gt;0x00010000&lt;/code&gt;), and the &lt;code&gt;meas_clk[5]&lt;/code&gt; register,
which reports a value between 644531363 Hz and 644531377 Hz.&lt;/p&gt;
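&lt;p&gt;To relate the raw words to the frequencies quoted above, the short Python sketch below decodes one line of &lt;code&gt;master_read_32&lt;/code&gt; output. It assumes that the register block starts with the magic number and the version, and that &lt;code&gt;meas_clk[5]&lt;/code&gt; is the word at index 9 (byte offset 0x24). That index is inferred from the only non-zero counter in the dump, not taken from the IP sources, and the helper name is made up for illustration.&lt;/p&gt;

```python
# Decode one line of System Console `master_read_32` output (12 words from 0x1000).
# Assumption: word 0 = magic, word 1 = version, word 9 = meas_clk[5] (inferred).
MEAS_CLK5_INDEX = 9  # hypothetical index, inferred from the dump


def decode_dump(line):
    """Split a line of hex words and return (magic, version, meas_clk5_hz)."""
    words = [int(tok, 16) for tok in line.split()]
    return words[0], words[1], words[MEAS_CLK5_INDEX]


first_read = (
    "0xc10cc272 0x00010000 0x00000000 0x00000000 0x00000000 0x00000000 "
    "0x00000000 0x00000000 0x00000000 0x266ac4b1 0x00000000 0x00000000"
)
magic, version, freq_hz = decode_dump(first_read)
print(f"magic=0x{magic:08x} version=0x{version:08x} meas_clk[5]={freq_hz} Hz")
```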
&lt;h2&gt;Nios II&lt;/h2&gt;
&lt;p&gt;As a final test, I wanted to see whether I could download a program into the Nios II
instruction memory, run it, and observe the output over the JTAG UART
interface.&lt;/p&gt;
&lt;p&gt;Here, too, the home-made JTAG driver posed no obstacles, and
I could successfully perform all tasks necessary to download and debug a program on the
Nios II core, as shown in the screenshot below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A program being debugged over Nios II debugger over the on-board FT232H" src="www.j-marjanovic.io/images/2020_fpga_card_part_3/nios_debugger.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;Clock&lt;/h1&gt;
&lt;p&gt;The board contains an &lt;a href="https://www.idt.com/us/en/document/dst/idt8n4q001-datasheet"&gt;IDT8N4Q001 programmable clock
oscillator&lt;/a&gt;, which
is most likely used to generate the clocks needed for 40 Gigabit Ethernet (e.g.
156.25 MHz) and maybe for other communication protocols on the QSFP slots. Since
none of the currently available resources
(&lt;a href="https://github.com/wirebond/catapult_v2_pikes_peak"&gt;wirebond/catapult_v2_pikes_peak&lt;/a&gt;
and &lt;a href="http://virtlab.occamlab.com/home/zapisnik/microsoft-catapult-v2"&gt;Microsoft's Catapult v2 (Pikes
Peak)&lt;/a&gt;)
mentions where the IDT is connected to the FPGA, I had to find this out myself.&lt;/p&gt;
&lt;p&gt;The oscillator on my board has a code &lt;code&gt;2059&lt;/code&gt;, which according to &lt;a href="https://www.idt.com/us/en/document/mau/femtoclock-ng-ceramic-package-xo-and-vcxo-ordering-information"&gt;the document
from
IDT&lt;/a&gt;
produces 644.53125 MHz for all values of &lt;code&gt;FSEL&lt;/code&gt; in the default configuration.
This matches the previously measured frequency at the input &lt;code&gt;CLK_R_REFCLK5&lt;/code&gt;
(pins &lt;code&gt;T7&lt;/code&gt; and &lt;code&gt;T6&lt;/code&gt;).&lt;/p&gt;
&lt;h2&gt;Clock counter&lt;/h2&gt;
&lt;p&gt;I have written a small IP to measure the frequency of several clocks against a
known reference frequency. As the reference I have used the 125 MHz on-board
oscillator.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Clock-counter IP block diagram" src="www.j-marjanovic.io/images/2020_fpga_card_part_3/clock_counter.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The clock-counter IP generates a strobe signal with a frequency of 0.5 Hz (i.e.
a pulse width of 1 s). Each measured clock drives its own independent
counter. When a transition of the strobe signal is detected, the counter value
is stored in a register (accessible on the Avalon MM interface) and the counter
is reset to 0. The counter then continues counting until the next
transition is detected, and the same procedure is repeated.&lt;/p&gt;
&lt;p&gt;Since the counter is active for exactly a second (within a certain ppm range)
before it gets stored in a register, the value stored in the register is the
frequency of the measured clock, in Hertz.&lt;/p&gt;
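&lt;p&gt;The measurement principle can be modelled in a few lines of Python. This is not the HDL of the IP; it is a sketch that only assumes the strobe toggles after 125,000,000 cycles of the nominal 125 MHz reference and that the register captures the whole number of measured-clock cycles in that window. The function and parameter names are invented for this example.&lt;/p&gt;

```python
from fractions import Fraction

REF_NOMINAL_HZ = 125_000_000  # nominal frequency of the on-board reference


def captured_count(meas_hz, ref_actual_hz=REF_NOMINAL_HZ):
    """Value latched at a strobe transition.

    The gate lasts REF_NOMINAL_HZ reference cycles, i.e. exactly one second
    only if the reference really runs at its nominal frequency.
    """
    gate_s = Fraction(REF_NOMINAL_HZ, ref_actual_hz)
    return int(meas_hz * gate_s)  # whole cycles counted during the gate


# Ideal reference: the register holds the frequency in Hz.
print(captured_count(156_250_000))
# Reference running 1 ppm fast: the gate is shorter, so the reading drops by about 1 ppm.
print(captured_count(156_250_000, ref_actual_hz=125_000_125))
```

&lt;p&gt;The second call illustrates the "within a certain ppm range" caveat: any error of the reference oscillator shows up directly as an error of the reported frequency.&lt;/p&gt;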
&lt;p&gt;There is no clock-domain crossing logic for the registers: they
are loaded in the measured clock domain and read in the Avalon
interface clock domain. Since each register is updated only once per second (so
the probability of reading during an update is quite low) and since
this is only used for diagnostics, not implementing a proper CDC can
be tolerated.&lt;/p&gt;
&lt;h2&gt;IDT driver&lt;/h2&gt;
&lt;p&gt;I prepared &lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/blob/25f084c1b9c3982f7a8b281c95cab5e36f978822/software/otma_bringup/src/IDT8NxQ001.c"&gt;a driver for the IDT
oscillator&lt;/a&gt;,
which provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a function to decode the bytes into a proper structure (can be used to inspect
  the current configuration),&lt;/li&gt;
&lt;li&gt;a function to encode the structure in bytes (can be used to generate the bytes
  to be written into the device),&lt;/li&gt;
&lt;li&gt;a function to configure all relevant fields (per channel) to obtain the
  desired frequency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and some other functions.&lt;/p&gt;
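&lt;p&gt;As a rough illustration of the last item, the sketch below computes the output frequency of one &lt;code&gt;FSEL&lt;/code&gt; slot in integer mode (&lt;code&gt;DSM_ENA&lt;/code&gt; = 0, &lt;code&gt;MFRAC&lt;/code&gt; = 0). It assumes a 100 MHz internal crystal and that a &lt;code&gt;P&lt;/code&gt; code of 0 selects a pre-divider of 1; both should be checked against the datasheet. The fractional term is deliberately left out, and the function is not part of the driver.&lt;/p&gt;

```python
from fractions import Fraction

F_XTAL_HZ = 100_000_000  # assumed crystal frequency, not stated in the post


def idt_integer_mode_hz(mint, n, p_div=1):
    """Integer-mode output frequency: f_xtal / P * MINT / N.

    p_div is the pre-divider ratio itself (assumed 1 for register code 0).
    """
    return Fraction(F_XTAL_HZ, p_div) * mint / n


# MINT = 25 and N = 16 are the values the program output reports for slot 1.
print(float(idt_integer_mode_hz(25, 16)))
```

&lt;p&gt;With these assumptions the slot with &lt;code&gt;MINT&lt;/code&gt; = 25 and &lt;code&gt;N&lt;/code&gt; = 16 comes out at 156.25 MHz, which is the frequency the program selects.&lt;/p&gt;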
&lt;p&gt;The &lt;a href="https://github.com/j-marjanovic/otma-fpga-bringup/blob/25f084c1b9c3982f7a8b281c95cab5e36f978822/software/otma_bringup/src/main.c"&gt;main
program&lt;/a&gt;
configures 4 different frequencies, selects one of the four, and then goes into a
loop where it prints the measured frequency once per second.&lt;/p&gt;
&lt;p&gt;Here is the output of the program:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="nl"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ident&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Oxcl0cc272&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x00010000&lt;/span&gt;
&lt;span class="n"&gt;IDT8NXQOO1&lt;/span&gt; &lt;span class="nl"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nl"&gt;MINT&lt;/span&gt;     &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
  &lt;span class="nl"&gt;MFRAC&lt;/span&gt;    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mo"&gt;000000&lt;/span&gt; &lt;span class="mo"&gt;000000&lt;/span&gt; &lt;span class="mo"&gt;000000&lt;/span&gt; &lt;span class="mo"&gt;00000&lt;/span&gt;
  &lt;span class="nl"&gt;N&lt;/span&gt;        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="mi"&gt;08&lt;/span&gt;
  &lt;span class="nl"&gt;P&lt;/span&gt;        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mo"&gt;00&lt;/span&gt; &lt;span class="mo"&gt;00&lt;/span&gt; &lt;span class="mo"&gt;00&lt;/span&gt; &lt;span class="mo"&gt;00&lt;/span&gt;
  &lt;span class="nl"&gt;DSM_ENA&lt;/span&gt;  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nl"&gt;LF&lt;/span&gt;       &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nl"&gt;CP&lt;/span&gt;       &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
  &lt;span class="nl"&gt;FSEL&lt;/span&gt;     &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nl"&gt;nPLL_BYP&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nl"&gt;ADC_ENA&lt;/span&gt;  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;Finished&lt;/span&gt; &lt;span class="n"&gt;configuring&lt;/span&gt; &lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;oscillator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entering&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6250002&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;103430663&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250082&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250082&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250085&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250086&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250086&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250085&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250086&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250084&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250083&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250082&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250082&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250082&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250081&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt; &lt;span class="n"&gt;osc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;156250084&lt;/span&gt; &lt;span class="n"&gt;MHz&lt;/span&gt;
&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see that after the measurement has stabilized, we receive the clock
we configured (156.25 MHz; the printed values are in Hz, despite the MHz label).
The readings sit about 0.5 ppm above nominal and vary by only a few Hz, so both
the reference clock (125 MHz, used to generate the strobe signal) and the
measured clock have a low wander and a low offset.&lt;/p&gt;
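&lt;p&gt;To put numbers on the offset and the wander, the readings can be compared with the nominal 156.25 MHz. The sketch below uses a few of the stabilized values from the output above and treats them as Hz; the helper name is illustrative only.&lt;/p&gt;

```python
NOMINAL_HZ = 156_250_000

# Stabilized readings copied from the program output (values are in Hz).
readings_hz = [156250082, 156250085, 156250086, 156250084, 156250083, 156250081]


def offset_ppm(measured_hz, nominal_hz=NOMINAL_HZ):
    """Relative deviation from nominal in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


offsets = [offset_ppm(f) for f in readings_hz]
spread_hz = max(readings_hz) - min(readings_hz)
print(f"offset: {min(offsets):.2f} .. {max(offsets):.2f} ppm, spread: {spread_hz} Hz")
```

&lt;p&gt;Note that the result is the combined error of both oscillators, since the one-second gate is itself derived from the 125 MHz reference.&lt;/p&gt;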
&lt;h1&gt;Conclusion and future plans&lt;/h1&gt;
&lt;p&gt;I have managed to prepare a setup that will allow me to develop and debug the
Stratix V FPGA directly from Quartus. I am satisfied that I could develop a
software solution and use the on-board FTDI chip, and I did not have to solder
wires for the JTAG to the board.&lt;/p&gt;
&lt;p&gt;With access to useful tools in Quartus (e.g. SignalTap, Console, Nios II
debugger, Transceiver toolkit, ...) I believe bringing up the rest of the board
will be much easier.&lt;/p&gt;
&lt;p&gt;Finally, to validate that the library for the JTAG cable runs reliably, I have
used it to develop a small program that can configure the on-board oscillator
to a desired frequency, in my case 156.25 MHz. I plan to use this for a test
of the transceivers connected to the QSFP slots.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective
owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay, part 2</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-2.html" rel="alternate"></link><published>2020-06-07T16:00:00+02:00</published><updated>2020-06-07T16:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2020-06-07:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay-part-2.html</id><summary type="html">&lt;p&gt;In my previous blog post I have explored the FPGA card I have purchased on eBay,
and in this post I will present an adapter card which I have developed. The
adapter provides PCI Express connection between a normal card-edge slot and the
FPGA card, as well as the access …&lt;/p&gt;</summary><content type="html">&lt;p&gt;In my previous blog post I have explored the FPGA card I have purchased on eBay,
and in this post I will present an adapter card which I have developed. The
adapter provides PCI Express connection between a normal card-edge slot and the
FPGA card, as well as the access to the I2C bus and some additional signals.&lt;/p&gt;
&lt;p&gt;To summarize my previous blog post, the FPGA card has a 160-pin Samtec
connector, providing power (12V), I2C bus for management, and a total of 16
lanes for PCI Express.&lt;/p&gt;
&lt;h1&gt;The adapter&lt;/h1&gt;
&lt;p&gt;To keep the cost down, I have tried to make the adapter as small as possible
and, at this point, to develop only a proof of concept. This first variant
of the adapter connects only one PCIe lane; I plan a second variant where
all 8 or 16 lanes will be connected to the PCIe edge connector.&lt;/p&gt;
&lt;p&gt;The KiCad project for the adapter card is available on &lt;a href="https://github.com/j-marjanovic/ocs-tray-mezzanine-adapter"&gt;my
GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Shown in the image below are the relevant parts of the adapter card.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Block diagram of the adapter card" src="www.j-marjanovic.io/images/2020_fpga_card_part_2/otma.png" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;To connect to the PCIe slot I have purchased one of those &lt;a href="https://lmgtfy.com/?q=bitcoin+pci+express+riser+card&amp;amp;t=i"&gt;Bitcoin mining riser
cards&lt;/a&gt;, which provide a
connection between a PCIe slot and a USB3 cable. Since there are a total of
4 PCIe reference clock inputs per OCS specification, I have distributed
the clock to all inputs using a dedicated IC. I have connected the management
I2C bus (&lt;code&gt;MEZZ_SDA&lt;/code&gt; and &lt;code&gt;MEZZ_SCL&lt;/code&gt;) to a header, which allows me to explore the
bus with a Raspberry Pi for example, as shown in the picture below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Raspberry Pi connected to the management I2C bus" src="www.j-marjanovic.io/images/2020_fpga_card_part_2/adapter_rpi.jpg" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;h1&gt;I2C&lt;/h1&gt;
&lt;p&gt;With a Raspberry Pi connected to the I2C bus, we can first explore which
devices are present.&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;i2cdetect&lt;/code&gt; we can find all devices which have acknowledged their I2C
address:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pi@raspberrypi:~ $ i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- 4c -- -- -- 
50: -- 51 -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- 77         
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We see the EEPROM at the address 0x51, and another two devices at addresses 0x4C
and 0x77 - I would assume that these are some kind of sensors and/or regulators.&lt;/p&gt;
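&lt;p&gt;When scripting the bring-up, the same list can be extracted from the text that &lt;code&gt;i2cdetect -y&lt;/code&gt; prints. The following Python sketch parses the output shown above; it relies only on the fact that responding cells contain the address itself as two hex digits. The function name is made up for this example.&lt;/p&gt;

```python
import re

I2CDETECT_OUTPUT = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- 4c -- -- --
50: -- 51 -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- 77
"""


def responding_addresses(text):
    """Return the 7-bit addresses that acknowledged, from `i2cdetect -y` output."""
    found = []
    for line in text.splitlines():
        row = re.match(r"^([0-9a-f]{2}):(.*)$", line)
        if not row:
            continue  # column header line
        for cell in row.group(2).split():
            if re.fullmatch(r"[0-9a-f]{2}", cell):
                found.append(int(cell, 16))  # the cell shows the address itself
    return found


print([hex(addr) for addr in responding_addresses(I2CDETECT_OUTPUT)])
```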
&lt;h2&gt;IPMI FRU&lt;/h2&gt;
&lt;p&gt;We can now use &lt;code&gt;i2cdump&lt;/code&gt; to dump the content of the EEPROM. As I
expected, the EEPROM is used by the Baseboard Management Controller (or
something similar) and the content is compliant with the &lt;a href="https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/platform-management-fru-document-rev-1-2-feb-2013.pdf"&gt;IPMI Platform Management
FRU Information Storage
Definition&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following information is stored in the EEPROM:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ChassisArea:
  version: 1
  length: 32 bytes
  chassis type: Rack Mount Chassis
  part nr: X907370-001
  serial nr: (len 0)
  checksum: 146 (OK)
BoardArea:
  version: 1
  length: 64 bytes
  lang: 25
  mfg date: 2016-01-06 03:26
  mfgr: Microsoft
  prod_name: PPFPGA
  serial: OLJ60100194
  part: X900563-001
  file id: FRU 1.0
  checksum: 108 (OK)
ProductArea:
  version: 1
  length: 72 bytes
  mfgr: Microsoft
  prod_name: PPFPGA
  part_nr: X900563-001
  part_ver: 1.0
  part_sn: OLJ60100194
  asset_tag:
  file id: 1.0
  checksum: 241 (OK) 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
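&lt;p&gt;Two details of the FRU format are worth spelling out: every area ends with a checksum byte chosen so that all bytes of the area sum to zero modulo 256, and the board manufacturing date is stored as a 3-byte, little-endian count of minutes since 1996-01-01. The Python sketch below shows both. The byte values in it are illustrative (computed to match the date shown above), not copied from the EEPROM, and the function names are my own.&lt;/p&gt;

```python
from datetime import datetime, timedelta

FRU_EPOCH = datetime(1996, 1, 1)  # the mfg date counts minutes from here


def area_checksum_ok(area):
    """An area is valid when all its bytes, checksum included, sum to 0 mod 256."""
    return sum(area) % 256 == 0


def decode_mfg_date(raw3):
    """Decode the 3-byte, little-endian 'minutes since 1996-01-01' field."""
    minutes = int.from_bytes(raw3, "little")
    return FRU_EPOCH + timedelta(minutes=minutes)


# Illustrative bytes, computed to match the date above (not read from the EEPROM).
print(decode_mfg_date(bytes([0x8E, 0x9F, 0xA0])))

# Toy 8-byte area: the last byte is chosen so that the sum wraps to zero.
toy_area = bytes([0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFE])
print(area_checksum_ok(toy_area))
```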


&lt;p&gt;From my understanding of the IPMI standard, the Chassis information should
not be present on this card:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A system can have multiple FRU Information Devices within a chassis, but only
one device should provide the Chassis Info Area. Thus, this area will
typically be absent from most FRU Information Devices.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;PCI Express&lt;/h1&gt;
&lt;p&gt;And finally, the most interesting and the most challenging part, the PCI Express
connection.&lt;/p&gt;
&lt;p&gt;I plugged in the card, as shown in the image below, and turned on the computer.&lt;/p&gt;
&lt;p&gt;&lt;img alt="PCIe connection over USB cable" src="www.j-marjanovic.io/images/2020_fpga_card_part_2/adapter_pcie.jpg" style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;And it works! Using the &lt;code&gt;lspci&lt;/code&gt; command to list all the devices visible to the CPU,
we can spot the Microsoft card:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:16.2 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family IDE-r Controller (rev 04)
00:16.3 Serial controller: Intel Corporation 6 Series/C200 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
00:1f.0 ISA bridge: Intel Corporation Q67 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Unassigned class [ff00]: Microsoft Corporation Device b100 (rev 01)
04:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
05:03.0 FireWire (IEEE 1394): LSI Corporation FW322/323 [TrueFire] 1394a Controller (rev 70)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Using &lt;code&gt;lspci -vv&lt;/code&gt; to get more information, we see that the link is established
at 2.5 GT/s (probably too many connectors in series to go faster) and at
x1 width (as expected, since we only pass one lane through the USB cable):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ sudo lspci -s 03:00 -vv
03:00.0 Unassigned class [ff00]: Microsoft Corporation Device b100 (rev 01)
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &amp;gt;TAbort- &amp;lt;TAbort- &amp;lt;MAbort- &amp;gt;SERR- &amp;lt;PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Region 2: Memory at fb000000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [78] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s &amp;lt;64ns, L1 &amp;lt;1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #1, Speed 8GT/s, Width x8, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
             10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-, TPHComp-, ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [200 v1] Vendor Specific Information: ID=0000 Rev=0 Len=044 &amp;lt;?&amp;gt;
    Capabilities: [300 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
        LaneErrStat: 0
    Capabilities: [800 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
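&lt;p&gt;To get a feeling for what the downgrade costs, the raw link bandwidth can be estimated from the &lt;code&gt;LnkSta&lt;/code&gt; and &lt;code&gt;LnkCap&lt;/code&gt; lines. The Python sketch below accounts only for the line coding (8b/10b up to 5 GT/s, 128b/130b at 8 GT/s) and ignores packet overhead, so the real throughput will be lower; the function name is an assumption of this example.&lt;/p&gt;

```python
from fractions import Fraction

# Line-coding efficiency per transfer rate: payload bits / transmitted bits.
ENCODING = {
    Fraction("2.5"): Fraction(8, 10),   # Gen1, 8b/10b
    Fraction(5): Fraction(8, 10),       # Gen2, 8b/10b
    Fraction(8): Fraction(128, 130),    # Gen3, 128b/130b
}


def link_bytes_per_s(gt_per_s, lanes):
    """Raw link bandwidth per direction, before packet (TLP/DLLP) overhead."""
    rate = Fraction(gt_per_s)
    return rate * 10**9 * ENCODING[rate] * lanes / 8


print(float(link_bytes_per_s("2.5", 1)) / 1e6, "MB/s as negotiated (LnkSta)")
print(float(link_bytes_per_s("8", 8)) / 1e6, "MB/s as advertised (LnkCap)")
```

&lt;p&gt;The single Gen1 lane over the USB cable offers roughly 3% of what the card advertises, which is enough for bring-up but not for anything bandwidth-hungry.&lt;/p&gt;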


&lt;h1&gt;Summary and outlook&lt;/h1&gt;
&lt;p&gt;The adapter card did its job and provided access to the I2C bus and a PCI Express
connection. I have managed to parse the content of the EEPROM (and
realized that there is nothing interesting there) and to establish the PCIe
connection to the FPGA, as a first step of getting the hardware ready for custom
developments.&lt;/p&gt;
&lt;p&gt;Eventually I plan to develop a card with a wider PCIe link and get rid of the
USB cable setup.&lt;/p&gt;
&lt;p&gt;There were some mistakes on the adapter board (rotated Samtec connector, swapped
TX and RX on the USB connector), which I could work around and which serve as a lesson
to be more careful next time and double-check everything.&lt;/p&gt;
&lt;p&gt;As the next step, I would like to understand better how the JTAG chip works. I
know that it can be used with OpenOCD, but I would imagine that one can somehow
also make it talk to Quartus directly.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Stratix V accelerator card from eBay</title><link href="www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay.html" rel="alternate"></link><published>2020-05-03T14:30:00+02:00</published><updated>2020-05-03T14:30:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2020-05-03:www.j-marjanovic.io/stratix-v-accelerator-card-from-ebay.html</id><summary type="html">&lt;p&gt;A couple of weeks ago &lt;a href="https://twitter.com/rombik_su"&gt;@rombik_su&lt;/a&gt; in a &lt;a href="https://twitter.com/rombik_su/status/1250382904074608642"&gt;Twitter
thread&lt;/a&gt; pointed out a
very cheap FPGA accelerator card on eBay. The board contains a proprietary
Samtec board-to-board connector, most likely carrying power, PCI Express, and
auxiliary signals (JTAG, IPMI to BMC, ...), two QSFP cages, DDR3 and a large
FPGA, hidden …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of weeks ago &lt;a href="https://twitter.com/rombik_su"&gt;@rombik_su&lt;/a&gt; in a &lt;a href="https://twitter.com/rombik_su/status/1250382904074608642"&gt;Twitter
thread&lt;/a&gt; pointed out a
very cheap FPGA accelerator card on eBay. The board contains a proprietary
Samtec board-to-board connector, most likely carrying power, PCI Express, and
auxiliary signals (JTAG, IPMI to BMC, ...), two QSFP cages, DDR3 and a large
FPGA, hidden under the heatsink.&lt;/p&gt;
&lt;p&gt;Being passionate about everything FPGA-related, and with Coronavirus lockdown
limiting the number of fun things to do, I decided to purchase the board. An
evaluation kit of this kind can easily cost thousands, and 40 USD is a real
bargain.&lt;/p&gt;
&lt;h1&gt;Initial research&lt;/h1&gt;
&lt;p&gt;While waiting for the board to arrive, I did some initial investigation. The
description on eBay is quite cryptic. The title of the listing included all the
text from the labels on the board, including the label "AIRFLOW" indicating
the direction of the forced air through the board.&lt;/p&gt;
&lt;p&gt;One of the most fruitful clues was the label "Microsoft" on the board. It was
well-publicized a couple of years ago that Microsoft is using FPGAs to
accelerate Bing searches, and this might be one of the boards used in the
servers.&lt;/p&gt;
&lt;p&gt;Remembering that Microsoft went with Altera (now Intel PSG) and that this was
some years back, it is most likely that the card contains a Stratix V FPGA.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.microsoft.com/en-us/research/uploads/prod/2014/06/HC26.12.520-Recon-Fabric-Pulnam-Microsoft-Catapult.pdf"&gt;first
link&lt;/a&gt;
on Google (or should I have used Bing? would the FPGAs be aware that I am
looking for information about them?) for "microsoft catapult stratix v"
presented some conceptually similar cards, but not exactly the same.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/Cloud-Scale-Acceleration-Architecture.pdf"&gt;second
link&lt;/a&gt;,
however, presented the exact card which I have purchased:&lt;/p&gt;
&lt;p&gt;&lt;img alt="FPGA card, from A. Caulfield et al: A Cloud-Scale Acceleration Architecture" src="www.j-marjanovic.io/images/2020_fpga_card/img-002.jpg" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;div style="text-align: center;"&gt;&lt;small&gt;(from A. Caulfield et al: A Cloud-Scale Acceleration Architecture)&lt;/small&gt;&lt;/div&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The paper also contains a block diagram highlighting the main components
of the board:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Block diagram, from A. Caulfield et al: A Cloud-Scale Acceleration Architecture" src="www.j-marjanovic.io/images/2020_fpga_card/block_diagram.png" style="width:60%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;div style="text-align: center;"&gt;&lt;small&gt;(from A. Caulfield et al: A Cloud-Scale Acceleration Architecture)&lt;/small&gt;&lt;/div&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The paper also mentions a "widely-used OpenCompute server", which could reveal
some information about the pinout of the Samtec connector. Since there are
numerous formats for mezzanine cards standardized by the OpenCompute project,
and finding the matching one is not trivial, I decided to leave this task for
later.&lt;/p&gt;
&lt;p&gt;Further digging in the search results, I was also able to find &lt;a href="https://mspoweruser.com/microsoft-talks-about-the-project-that-helped-them-build-the-worlds-first-ai-supercomputer/"&gt;another photo&lt;/a&gt;
of the board with the heatsink removed.&lt;/p&gt;
&lt;p&gt;The markings on the FPGA are removed, but we already know that it is a Stratix V,
and some things were expected from previous documents, e.g. 5 DDR3 chips on the
top side (for a combined total of 9 chips and a 72-bit DDR3 data width).&lt;/p&gt;
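&lt;p&gt;The chip count can be sanity-checked with a little arithmetic. The sketch below
assumes that each DDR3 device is 8 bits wide and holds 4 Gbit (an assumption based
on the part number, not on a measurement) and that the bits beyond 64 are used for ECC:&lt;/p&gt;

```python
# Assumed organization of each DDR3 device: x8, 4 Gbit (hypothetical values
# inferred from the part number, not measured on the board).
BITS_PER_CHIP = 8
GBIT_PER_CHIP = 4
CHIPS_TOTAL = 9

bus_width = CHIPS_TOTAL * BITS_PER_CHIP          # total data bus width in bits
data_chips = 64 // BITS_PER_CHIP                 # chips needed for a 64-bit word
ecc_chips = CHIPS_TOTAL - data_chips             # the remainder, presumably ECC
capacity_gbyte = data_chips * GBIT_PER_CHIP / 8  # usable capacity without ECC

print(f"bus width: {bus_width} bit ({data_chips} data + {ecc_chips} ECC chips)")
print(f"usable capacity: {capacity_gbyte:.0f} GB")
```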
&lt;p&gt;What I found interesting in this picture is a Flash memory (Micron 25Q256) in
the top right corner, which means that the image for the FPGA is most likely
stored on the board and is not downloaded through the connector at the startup.
It is also possible that &lt;a href="https://www.intel.com/content/www/us/en/programmable/support/support-resources/support-centers/devices/cfg-index/cfg-via-protocol.html"&gt;Configuration via
Protocol&lt;/a&gt;
is used, and only the basic image is stored in the Flash.&lt;/p&gt;
&lt;h1&gt;Board overview&lt;/h1&gt;
&lt;p&gt;Then one day, the board finally arrived. It matches the description
in the previously mentioned paper.&lt;/p&gt;
&lt;p&gt;A quick look at the board identifies the following components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;160-pin Samtec SEARAY™ connector&lt;/li&gt;
&lt;li&gt;USB (FT232H, used as a JTAG interface)&lt;/li&gt;
&lt;li&gt;power entry (FDMS0310AS MOSFET, 0.01 Ohm resistor, unidentified IC with the markings "L536FCD")&lt;/li&gt;
&lt;li&gt;power converters (Enpirion® EN2342QI DC-DC converter)&lt;/li&gt;
&lt;li&gt;programmable oscillator (IDT8N4Q001)&lt;/li&gt;
&lt;li&gt;Flash memory for FPGA (Micron N25Q256A)&lt;/li&gt;
&lt;li&gt;I2C EEPROM (ST M24128-BW)&lt;/li&gt;
&lt;li&gt;DDR3 memory (SK hynix H5TC4G83BFR)&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="The board with annotated sections" src="www.j-marjanovic.io/images/2020_fpga_card/9560_annotated.jpg"&gt;&lt;/p&gt;
&lt;h1&gt;USB/JTAG&lt;/h1&gt;
&lt;p&gt;At this point, it was time to start experimenting with the board. The FT232H
seems to be powered from the USB bus voltage: pin 1 of J3 is connected to pin 40
(VREGIN) of the FT232H. This is why I decided to start with the USB. With the
datasheet for FT232H it was trivial to determine the pin assignments on the
connector. I did not want to solder directly on the connector pins, as I would
like to make a proper cable in the future.&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="USB connection with signal annotations" src="www.j-marjanovic.io/images/2020_fpga_card/9570_annotated.jpg"&gt;&lt;/p&gt;
&lt;p&gt;After plugging the cable into the computer, the FT232H is recognized as "USB
Serial Converter":&lt;/p&gt;
&lt;p style="width:40%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Driver information about the FT232H" src="www.j-marjanovic.io/images/2020_fpga_card/device_manager2.png"&gt;&lt;/p&gt;
&lt;h1&gt;Connector, 1st look&lt;/h1&gt;
&lt;p&gt;At this point I could start trying to reverse engineer the connector pinout. I
managed to figure out the following connections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ground&lt;/li&gt;
&lt;li&gt;Input power: there are a couple of pins connected to the Drain of the large
    MOSFET. I also assume that the level is, as is common for PCIe cards,
    12V.&lt;/li&gt;
&lt;li&gt;There are 16 pairs of AC-coupling capacitors near the connector. The PCIe
    standard mandates capacitors on the TX side, so I assume this is PCIE_TX.
    Not knowing the exact lane numbering, I decided to enumerate them with
    letters instead of numbers.&lt;/li&gt;
&lt;li&gt;PCIe RX is TBD, but looking at the unassigned pins a clear pattern is visible.&lt;/li&gt;
&lt;li&gt;Some of the pins are connected to the circuit above the connector. Right
    now I do not have a clear idea what their purpose is, so I have annotated
    them as AUX.&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Samtec connector with partial pinout, 1" src="www.j-marjanovic.io/images/2020_fpga_card/pinout_1.png"&gt;&lt;/p&gt;
&lt;h1&gt;Power&lt;/h1&gt;
&lt;p&gt;I have connected the 12V on the Drain side of the Q12 (FDMS0310AS), which has
also energized the PU12 and +12V pins on the Samtec connector. However, the
current consumption was only 3 mA, and the MOSFET remained off (not conducting).&lt;/p&gt;
&lt;p&gt;I could not find any information about PU12 (L536FCD). I assume it is some
kind of a current-limit protection, measuring the current through the 0.01 Ohm
shunt resistor and controlling the Gate pin of Q12.&lt;/p&gt;
&lt;p&gt;To literally bypass this problem, I have decided to also connect 12 V on the
other side of the Q12. This yielded some results; the current consumption rose
to 695 mA, which is what one would expect from such a board, and the LEDs turned
on.&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Power consumption" src="www.j-marjanovic.io/images/2020_fpga_card/fr_9586_size1024.jpg"&gt;&lt;/p&gt;
&lt;p&gt;On various points in the circuit I could also measure all the voltages one
would expect to find in such a circuit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1.5V on the Enpirion (PU16) output&lt;/li&gt;
&lt;li&gt;1.35 V on C406 (DDR3 voltage)&lt;/li&gt;
&lt;li&gt;0.674 V on C1005 (DDR3 termination voltage)&lt;/li&gt;
&lt;li&gt;2.5 V on C442 (periphery)&lt;/li&gt;
&lt;li&gt;3.3 V on QSFP capacitors&lt;/li&gt;
&lt;li&gt;0.9 V on C354 (FPGA core voltage)&lt;/li&gt;
&lt;/ul&gt;
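&lt;p&gt;The measured values can be cross-checked with a short calculation. This is only a
sketch: it assumes that the input really is 12 V and uses the nominal DDR3 rule that the
termination voltage sits at half of the memory supply voltage:&lt;/p&gt;

```python
V_IN = 12.0    # input voltage in V (assumed, typical for PCIe cards)
I_IN = 0.695   # measured input current in A
V_DDR = 1.35   # measured DDR3 supply voltage in V
V_TT = 0.674   # measured DDR3 termination voltage in V

power_w = V_IN * I_IN                        # total input power
vtt_expected = V_DDR / 2                     # nominal termination voltage
vtt_error_mv = (V_TT - vtt_expected) * 1000  # deviation of the measurement

print(f"input power: {power_w:.2f} W")
print(f"VTT expected {vtt_expected:.3f} V, measured {V_TT:.3f} V ({vtt_error_mv:+.0f} mV)")
```

&lt;p&gt;Roughly 8 W for an idle board, and a termination voltage within about 1 mV of half
the DDR3 supply, are both consistent with the rails being up and regulated.&lt;/p&gt;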
&lt;h1&gt;Memories&lt;/h1&gt;
&lt;h2&gt;FPGA configuration memory&lt;/h2&gt;
&lt;p&gt;The heatsink covers a large part of the top side of the board, including the
Flash memory for the FPGA configuration. One can, however, still reach pins
8 and 9 (DQ1 and DQ2, respectively) with an oscilloscope probe.&lt;/p&gt;
&lt;p&gt;After the power is applied to the board, we can observe that the FPGA gets
programmed in roughly a second. This is above the 100 ms/200 ms limit required
by the PCIe standard, but in this custom form factor the value might be
different.&lt;/p&gt;
&lt;p&gt;DQ2 pin on N25Q256A:&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Data from DQ2 pin after power up" src="www.j-marjanovic.io/images/2020_fpga_card/mem_top_left_pin.png"&gt;&lt;/p&gt;
&lt;p&gt;LEDs on the board are also driven by the FPGA and remain lit for about a second, until
the FPGA is programmed, another indication that the FPGA gets configured
from the memory.&lt;/p&gt;
&lt;h2&gt;EEPROM&lt;/h2&gt;
&lt;p&gt;Another memory on the board is a small 128 Kbit EEPROM, which probably stores MAC
addresses, serial numbers, and other similar information.&lt;/p&gt;
&lt;p&gt;Quite interestingly, SDA and SCL lines remain stuck low after some time.
Maybe the EEPROM is not used at all, or maybe some other part
of the circuit is keeping the EEPROM interface state machine in reset.&lt;/p&gt;
&lt;p&gt;SDA pin on M24128-BW, SCL is very similar:&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="SDA pin on M24128-BW" src="www.j-marjanovic.io/images/2020_fpga_card/M24128_SDA.png"&gt;&lt;/p&gt;
&lt;h1&gt;Connector, 2nd look&lt;/h1&gt;
&lt;p&gt;With the board powered on, I could measure the voltage on the connector pins.
As expected, on 16 differential pairs I can sense a bias voltage of the PCIe
receivers, around 0.7 V.&lt;/p&gt;
&lt;p&gt;Some of the AUX pins have a slight bias, but this is probably caused by
pull-up resistors and other components. It would require more investigation
to fully understand the purpose of these pins.&lt;/p&gt;
&lt;p style="width:100%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Samtec connector with partial pinout, 2" src="www.j-marjanovic.io/images/2020_fpga_card/pinout_2.png"&gt;&lt;/p&gt;
&lt;h1&gt;Components&lt;/h1&gt;
&lt;h2&gt;QSFP&lt;/h2&gt;
&lt;p&gt;Plugging in a QSFP cable only marginally increases the power consumption.
It is clear that the high-speed circuit is disabled, most likely because
of an internal register configuration and less likely because the cable
is "not compatible", i.e. the board parses the EEPROM of the module and decides
not to enable the high-speed circuit.&lt;/p&gt;
&lt;p&gt;Consumption from the 12V input:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;without the QSFP module inserted: 696.7 mA&lt;/li&gt;
&lt;li&gt;with the QSFP module inserted: 697.5 mA&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;IDT oscillator&lt;/h2&gt;
&lt;p&gt;The IDT oscillator is producing a 645 MHz clock.&lt;/p&gt;
&lt;h2&gt;DDR3&lt;/h2&gt;
&lt;p&gt;I have probed what I presume are the DDR3 termination resistors (on the bottom
side, on the other side of the 5th DDR3 component on the top) and observed
no switching activity. It seems that the DDR3 controller is kept in reset.&lt;/p&gt;
&lt;h1&gt;Outlook&lt;/h1&gt;
&lt;p&gt;The first bring-up session was quite successful; I have managed to turn on the
board without damaging it and figure out the basic pinout of the connector.&lt;/p&gt;
&lt;p&gt;For the next step, I plan to check whether the OpenCompute website has
a document that describes the pinout of the connector. Still left to be
determined are the reference clock and an enable signal from the connector.&lt;/p&gt;
&lt;p&gt;Eventually I plan to produce a small PCB that would allow plugging this board
into a normal PCIe card slot.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="FPGA"></category></entry><entry><title>Books read: Synchrotron Radiation Sources</title><link href="www.j-marjanovic.io/books-read-synchrotron-radiation-sources.html" rel="alternate"></link><published>2019-11-02T12:30:00+01:00</published><updated>2019-11-02T12:30:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2019-11-02:www.j-marjanovic.io/books-read-synchrotron-radiation-sources.html</id><summary type="html">&lt;p&gt;Here are my notes on the book "Synchrotron Radiation Sources: A Primer", edited
by H. Winick. Although a little bit dated (the book is from 1995) it gives a
nice overview of all components of a modern synchrotron and helps with better
understanding of how all subsystems interconnect. It …&lt;/p&gt;&lt;/summary&gt;&lt;content type="html"&gt;&lt;p&gt;Here are my notes on the book "Synchrotron Radiation Sources: A Primer", edited
by H. Winick. Although a little bit dated (the book is from 1995) it gives a
nice overview of all components of a modern synchrotron and helps with better
understanding of how all subsystems interconnect. It also serves as a
good introduction to the field of machine physics, which helped me (I have a
diploma degree in electronics) to better understand the
challenges faced in synchrotrons of the 4th generation (diffraction-limited
storage rings).&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Synchrotron Radiation Sources: A Primer" src="www.j-marjanovic.io/images/synchrotron_light_sources.jpg"&gt;&lt;/p&gt;
&lt;h1&gt;Chapter 1&lt;/h1&gt;
&lt;p&gt;page 2: "electron (or positrons)" --&amp;gt; what would an SLS with protons look like?
larger insertion devices? different SR wavelengths?&lt;/p&gt;
&lt;p&gt;page 4: betatron - early machines, vertical magnetic field (spatially const,
time varying)&lt;/p&gt;
&lt;p&gt;page 7: non top-up mode --&amp;gt; from what I heard it took several hours to start
the machine&lt;/p&gt;
&lt;p&gt;page 9: &lt;span class="math"&gt;\(\gamma = \frac{E}{mc^2}\)&lt;/span&gt;&lt;/p&gt;
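&lt;p&gt;A numerical illustration, assuming the usual definition of the Lorentz factor as
total energy divided by rest energy. The 6 GeV beam energy and the rounded rest
energies are illustrative values chosen for this sketch, not numbers from the book:&lt;/p&gt;

```python
ELECTRON_REST_ENERGY_MEV = 0.511   # m*c^2 of the electron, rounded
PROTON_REST_ENERGY_MEV = 938.272   # m*c^2 of the proton, rounded

def lorentz_factor(total_energy_mev, rest_energy_mev):
    """Return gamma = E / (m*c^2) for a particle with the given total energy."""
    return total_energy_mev / rest_energy_mev

# Illustrative 6 GeV beam, once with electrons and once with protons
gamma_electron = lorentz_factor(6000.0, ELECTRON_REST_ENERGY_MEV)
gamma_proton = lorentz_factor(6000.0, PROTON_REST_ENERGY_MEV)

# The natural opening angle of the radiation is roughly 1/gamma
opening_angle_urad = 1e6 / gamma_electron

print(f"electron: gamma = {gamma_electron:.0f}, 1/gamma = {opening_angle_urad:.0f} urad")
print(f"proton:   gamma = {gamma_proton:.1f}")
```

&lt;p&gt;At the same energy the proton is barely relativistic in comparison, which is one
way to see why light sources use electrons (or positrons).&lt;/p&gt;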
&lt;p&gt;page 12: time structure - ESRF hybrid mode: "pulsed experiments in us, ns and ps
time scale" (&lt;em&gt;M. Wulff et al., Time-resolved structures of macromolecules at the
ESRF&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;page 13: is the main advantage of FELs short pulses or higher brightness?&lt;/p&gt;
&lt;p&gt;page 15: TESLA CDR published in 1988, the book is from 1994.&lt;/p&gt;
&lt;h1&gt;Chapter 2&lt;/h1&gt;
&lt;p&gt;page 34: theory on strong focusing --&amp;gt; very interesting, study &lt;em&gt;E. D. Courant et
al., Theory of the alternating-gradient synchrotron&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;page 35: are betatron oscillations periodic or is there a phase advance?&lt;/p&gt;
&lt;p&gt;page 39: (slightly) rotated quadrupoles introduce x-y coupling&lt;/p&gt;
&lt;p&gt;page 41: correctors at the beginning and at the end of the insertion device --&amp;gt;
modern take presented in &lt;em&gt;G. Rehm et al.: First projects at Diamond Light Source
involving MTCA&lt;/em&gt;
(https://indico.desy.de/indico/event/20703/session/0/contribution/77/material/slides/1.pdf, page 9)&lt;/p&gt;
&lt;p&gt;page 41: what is a "tune shift"?&lt;/p&gt;
&lt;p&gt;page 43: planar design --&amp;gt; prevents orbit "growth" in y (apart from coupling from quadrupoles)&lt;/p&gt;
&lt;p&gt;page 44: dispersion-free region&lt;/p&gt;
&lt;p&gt;page 46: dynamic aperture: max betatron osc that can be sustained&lt;/p&gt;
&lt;p&gt;page 48: by how much do insertion devices reduce the energy? is this important
for the orbit?&lt;/p&gt;
&lt;p&gt;page 49: vertical/horizontal emittance ratio for most machines: 0.01 to 0.03&lt;/p&gt;
&lt;h1&gt;Chapter 3&lt;/h1&gt;
&lt;p&gt;page 60: septum magnets: what is the purpose, how do they work&lt;/p&gt;
&lt;p&gt;page 62: off axis &amp;lt;-/-&amp;gt; on axis&lt;/p&gt;
&lt;p&gt;page 68: max booster boost ratio of 50, in reality a little less. example 1:
DESY II (injection at 450 MeV, ejection at 6 GeV - factor 13). example 2: SPS
(max energy 450 GeV) to LHC (max energy 6500 GeV), factor 14&lt;/p&gt;
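&lt;p&gt;The two factors above follow directly from the injection and extraction energies;
a minimal check (the function name is just for illustration):&lt;/p&gt;

```python
def boost_ratio(injection_energy, extraction_energy):
    """Ratio of extraction to injection energy (both in the same unit)."""
    return extraction_energy / injection_energy

desy2 = boost_ratio(0.45, 6.0)    # DESY II: 450 MeV to 6 GeV, values in GeV
lhc = boost_ratio(450.0, 6500.0)  # LHC: 450 GeV to 6500 GeV, values in GeV

print(f"DESY II: {desy2:.1f}, LHC: {lhc:.1f}")
```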
&lt;p&gt;page 65: RF photo cathode gun now more popular&lt;/p&gt;
&lt;p&gt;page 73: "brute force" used at MedAustron&lt;/p&gt;
&lt;p&gt;page 75: 3 kA, 13 kV!!!&lt;/p&gt;
&lt;p&gt;page 78: high-Z = high atomic number (e.g. W, Pb)&lt;/p&gt;
&lt;p&gt;page 81: harmonic nr = number of buckets&lt;/p&gt;
&lt;h1&gt;Chapter 4&lt;/h1&gt;
&lt;p&gt;page 95: first mention of time-resolved measurement&lt;/p&gt;
&lt;p&gt;page 91: synchrotron osc = longitudinal, betatron osc = transversal&lt;/p&gt;
&lt;p&gt;page 96: cavity impedance scales linearly with the number of cells&lt;/p&gt;
&lt;p&gt;page 105: what about traveling-wave cavity? is this used in SLS?&lt;/p&gt;
&lt;p&gt;page 111: How a Klystron amplifier works (https://www.youtube.com/watch?v=Fvud81pYGOg)&lt;/p&gt;
&lt;p&gt;page 117: protection for klystron: Klystron Lifetime Management (http://accelconf.web.cern.ch/AccelConf/ICALEPCS2013/talks/tucoca09_talk.pdf)&lt;/p&gt;
&lt;h1&gt;Chapter 5&lt;/h1&gt;
&lt;p&gt;page 122: photon BPMs to global orbit feedback also possible - Diamond, ELETTRA
(here only local feedback is described)&lt;/p&gt;
&lt;p&gt;page 132: "DIAMOND at Daresbury"&lt;/p&gt;
&lt;p&gt;page 145: "the designer is relying very much on the good will of the steel
company" --&amp;gt; the reality we work in&lt;/p&gt;
&lt;h1&gt;Chapter 6&lt;/h1&gt;
&lt;p&gt;page 159: digital feedback systems have taken over since the book was written&lt;/p&gt;
&lt;h1&gt;Chapter 7&lt;/h1&gt;
&lt;p&gt;page 163: book is from 1994, more modern methods could be used -&amp;gt; PXI or MTCA
crate with motor controller and fast ADC&lt;/p&gt;
&lt;p&gt;page 194: corrector magnets are not mentioned, but it would be convenient (and
interesting) to measure the resp with high freq (e.g. at 1 kHz)&lt;/p&gt;
&lt;h1&gt;Chapter 8&lt;/h1&gt;
&lt;p&gt;page 197: "the beam would propagate only a few meters in atmosphere" --&amp;gt; more
than I would expect&lt;/p&gt;
&lt;p&gt;page 199: desorption &amp;lt;-/-&amp;gt; absorption (photon- and electron-stimulated desorption)&lt;/p&gt;
&lt;p&gt;page 203: beam stop - photons after dipole --&amp;gt; more than 10 kW of power&lt;/p&gt;
&lt;p&gt;page 211: 1e-11 Torr = still about 3e8 molecules per L (= 2.5e22 molecules/L of air
at 1 atm * 1.3e-14, which is 1e-11 Torr in atm)&lt;/p&gt;
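&lt;p&gt;A quick way to check this kind of estimate, assuming an ideal gas at room
temperature (the molar volume of 24.5 L/mol is an assumed round value for about 25 degC):&lt;/p&gt;

```python
AVOGADRO = 6.022e23     # molecules per mole
MOLAR_VOLUME_L = 24.5   # litres per mole of ideal gas at 1 atm, about 25 degC
TORR_PER_ATM = 760.0

def molecules_per_litre(pressure_torr):
    """Number density of an ideal gas at room temperature, in molecules per litre."""
    pressure_atm = pressure_torr / TORR_PER_ATM
    return AVOGADRO / MOLAR_VOLUME_L * pressure_atm

print(f"1 atm:      {molecules_per_litre(760.0):.1e} molecules/L")
print(f"1e-11 Torr: {molecules_per_litre(1e-11):.1e} molecules/L")
```

&lt;p&gt;The important step is converting Torr to atm (dividing by 760) before scaling the
number density at atmospheric pressure.&lt;/p&gt;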
&lt;h1&gt;Chapter 9&lt;/h1&gt;
&lt;p&gt;page 218: AI: last wave of AI, Lisp-based, very advanced but still very limited (=specific)&lt;/p&gt;
&lt;p&gt;page 219: interesting from historical point of view - only EPICS is mentioned&lt;/p&gt;
&lt;p&gt;page 220: communication protocols from the past Bitbus (http://accelconf.web.cern.ch/accelconf/p91/PDF/PAC1991_1496.PDF)
and Multibus &lt;/p&gt;
&lt;p&gt;page 220: reflective memory techniques (for FOFB)&lt;/p&gt;
&lt;p&gt;page 225: drift and negative drift sounds very hackish; wouldn't it be easier to
take the position and angle of the insertion device (2 step simulation)&lt;/p&gt;
&lt;p&gt;page 226: check ref 27: "Computer Codes for Particle Accelerator Design and
Analysis: A Compendium"&lt;/p&gt;
&lt;p&gt;page 227: phase-space 6D - x, x', y, y', dp/p, ds - each coordinate relative
to ideal orbit&lt;/p&gt;
&lt;p&gt;page 228: R matrix sometimes 2x2, should it be 6x6 in "normal" case? find some
examples for individual elements ...&lt;/p&gt;
&lt;p&gt;page 228: beta-function is a solution for single particle motion&lt;/p&gt;
&lt;p&gt;page 229: Twiss parameters &amp;lt;-&amp;gt; beta func and beta' (alpha, beta, gamma)&lt;/p&gt;
&lt;p&gt;page 239: read again G. Strang: Linear Algebra and Its Applications&lt;/p&gt;
&lt;h1&gt;Chapter 10&lt;/h1&gt;
&lt;p&gt;page 245: time-resolved spectroscopy - learn more on this&lt;/p&gt;
&lt;p&gt;page 251: "Signal Processing" chapter was written before DSP became mainstream&lt;/p&gt;
&lt;p&gt;page 262: section on BPM is rather short --&amp;gt; study ref 61: K. Wittenburg "Beam
Loss Detection"&lt;/p&gt;
&lt;p&gt;page 271: wire scan is not mentioned?&lt;/p&gt;
&lt;h1&gt;Chapter 11&lt;/h1&gt;
&lt;p&gt;page 297: successive alignment steps: non-converging, circling around 0 in
N-dim space&lt;/p&gt;
&lt;p&gt;page 301: "Cultural noise at DESY" :D&lt;/p&gt;
&lt;h1&gt;Chapter 12&lt;/h1&gt;
&lt;p&gt;page 306: "resonate for a long time" - wake field decay --&amp;gt; check at FLASH&lt;/p&gt;
&lt;p&gt;page 308: slightly off topic: we are dealing with 18 orders of magnitude&lt;/p&gt;
&lt;p&gt;page 312: wake functions = causal functions; here I do not understand enough
physics, isn't the field also present in front of the bunch? or is this
valid only for ultra-relativistic bunches?&lt;/p&gt;
&lt;h1&gt;Chapter 13&lt;/h1&gt;
&lt;p&gt;page 346: three types of motion, three different time scales: longitudinal osc,
transversal osc and closed orbit errors&lt;/p&gt;
&lt;p&gt;page 349: Fig 13.2 (SSRL) --&amp;gt; feedback too slow to suppress 60 Hz and harmonics&lt;/p&gt;
&lt;p&gt;page 351: phBPM: gap few times RMS of the beam&lt;/p&gt;
&lt;p&gt;page 353: a mention of feedback simulation, no references given&lt;/p&gt;
&lt;p&gt;page 358: Z transform: http://techteach.no/publications/discretetime_signals_systems/discrete.pdf&lt;/p&gt;
&lt;p&gt;page 358: "beyond the Nyquist freq" --&amp;gt; not entirely true, undersampling is
possible&lt;/p&gt;
&lt;p&gt;page 362: MIMO, check:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ref 19&lt;/li&gt;
&lt;li&gt;ref 20&lt;/li&gt;
&lt;li&gt;ref 21&lt;/li&gt;
&lt;li&gt;ref 22&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter 14 and 15&lt;/h1&gt;
&lt;p&gt;no relevant notes&lt;/p&gt;
&lt;h1&gt;Chapter 16&lt;/h1&gt;
&lt;p&gt;page 432: "safety is a part of doing things"&lt;/p&gt;
&lt;p&gt;page 435: general observation: unreliable safety feature (i.e. interlock) will
increase the danger&lt;/p&gt;
&lt;p&gt;page 440: "The OPCOs have [...] the authority to stop any activity where safety
[...] is in question" - everybody has (or should have) the Stop Work Authority&lt;/p&gt;
&lt;p&gt;page 448: tungsten --&amp;gt; impossible to melt with beam&lt;/p&gt;
&lt;p&gt;page 456: interlock testing: each input --&amp;gt; response&lt;/p&gt;
&lt;p&gt;page 457: for PLCs the standard for functional safety (IEC 61508) should be
mentioned. The standard was first published in 1998, while the book is from
1994.&lt;/p&gt;
&lt;script type="text/javascript"&gt;if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
    var align = "center",
        indent = "0em",
        linebreak = "false";

    if (false) {
        align = (screen.width &lt; 768) ? "left" : align;
        indent = (screen.width &lt; 768) ? "0em" : indent;
        linebreak = (screen.width &lt; 768) ? 'true' : linebreak;
    }

    var mathjaxscript = document.createElement('script');
    mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
    mathjaxscript.type = 'text/javascript';
    mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';

    var configscript = document.createElement('script');
    configscript.type = 'text/x-mathjax-config';
    configscript[(window.opera ? "innerHTML" : "text")] =
        "MathJax.Hub.Config({" +
        "    config: ['MMLorHTML.js']," +
        "    TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
        "    jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
        "    extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
        "    displayAlign: '"+ align +"'," +
        "    displayIndent: '"+ indent +"'," +
        "    showMathMenu: true," +
        "    messageStyle: 'normal'," +
        "    tex2jax: { " +
        "        inlineMath: [ ['\\\\(','\\\\)'] ], " +
        "        displayMath: [ ['$$','$$'] ]," +
        "        processEscapes: true," +
        "        preview: 'TeX'," +
        "    }, " +
        "    'HTML-CSS': { " +
        "        availableFonts: ['STIX', 'TeX']," +
        "        preferredFont: 'STIX'," +
        "        styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
        "        linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
        "    }, " +
        "}); " +
        "if ('default' !== 'default') {" +
            "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
            "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
        "}";

    (document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
    (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
&lt;/script&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="Books"></category></entry><entry><title>New Ubuntu, old problems with ModelSim</title><link href="www.j-marjanovic.io/new-ubuntu-old-problems-with-modelsim.html" rel="alternate"></link><published>2019-04-20T08:40:00+02:00</published><updated>2019-04-20T08:40:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2019-04-20:www.j-marjanovic.io/new-ubuntu-old-problems-with-modelsim.html</id><summary type="html">&lt;p&gt;A couple of days ago a new release of Ubuntu, Ubuntu 19.04 Disco Dingo was
released. On my personal laptop I follow the non-LTS line, which brings
me cool features and updated programs (e.g. Python 3.7, GCC 8.3, ...)
out-of-the-box.&lt;/p&gt;
&lt;p&gt;Unfortunately, because the libraries are updated, the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of days ago a new release of Ubuntu, Ubuntu 19.04 Disco Dingo was
released. On my personal laptop I follow the non-LTS line, which brings
me cool features and updated programs (e.g. Python 3.7, GCC 8.3, ...)
out-of-the-box.&lt;/p&gt;
&lt;p&gt;Unfortunately, because the libraries are updated, the update process breaks
some programs. Mentor Graphics ModelSim is, for example,
one of the tools which required some tweaks to make it work on Ubuntu 19.04.&lt;/p&gt;
&lt;p&gt;Described here are the steps which made ModelSim work on Ubuntu 19.04.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Please note: according to &lt;a href="https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/quartus_install.pdf"&gt;Intel® FPGA Software Installation and
Licensing&lt;/a&gt;,
ModelSim - Intel FPGA Edition officially supports RHEL 5, 6 or 7 and Windows.
Ubuntu is not officially supported.&lt;/strong&gt;&lt;/p&gt;
&lt;h1&gt;Initial attempt&lt;/h1&gt;
&lt;p&gt;I have started with a fresh installation of ModelSim*-Intel® FPGA Starter
Edition Software from the Quartus 19.1 package.&lt;/p&gt;
&lt;p&gt;When running &lt;code&gt;vsim&lt;/code&gt; from &lt;code&gt;intelFPGA/19.1/modelsim_ase/bin&lt;/code&gt; I get the
following error message:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ ./vsim
Error: cannot find &lt;span class="s2"&gt;&amp;quot;./../linux_rh60/vsim&amp;quot;&lt;/span&gt;
$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;From the path it is clear that ModelSim thinks it is running on RHEL 6. As
described in an extensive &lt;a href="https://wiki.archlinux.org/index.php/Altera_Design_Software#ModelSim-Altera_Edition"&gt;Wiki entry on Altera
software on Arch Linux Wiki&lt;/a&gt;,
one needs to modify the &lt;code&gt;vco&lt;/code&gt; file and downgrade the &lt;code&gt;freetype&lt;/code&gt; library.&lt;/p&gt;
&lt;p&gt;Once this is settled (by the way, this used to be enough to make ModelSim work on
Ubuntu 18.10) we get the following error:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ ./vsim
Reading pref.tcl
./../linuxaloem/vish: symbol lookup error: /usr/lib/i386-linux-gnu/libfontconfig.so.1: undefined symbol: FT_Done_MM_Var
** Fatal: Read failure in vlm process &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;,0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;This message is new and it required me to do some investigation to get it fixed.&lt;/p&gt;
&lt;h1&gt;Downgrading fontconfig&lt;/h1&gt;
&lt;p&gt;From the error message it is clear that &lt;code&gt;libfontconfig.so&lt;/code&gt; tries to use a function
called &lt;code&gt;FT_Done_MM_Var&lt;/code&gt; and is unable to find it.&lt;/p&gt;
&lt;p&gt;To investigate further I cloned &lt;code&gt;fontconfig&lt;/code&gt; source code from:
https://gitlab.freedesktop.org/fontconfig/fontconfig.git&lt;/p&gt;
&lt;p&gt;A quick &lt;code&gt;grep&lt;/code&gt; finds the following instances of the symbol in question:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$ grep -rn FT_Done_MM_Var .
./README:119:      Use FT_Done_MM_Var &lt;span class="k"&gt;if&lt;/span&gt; available
./src/fcfreetype.c:2261:    FT_Done_MM_Var &lt;span class="o"&gt;(&lt;/span&gt;ftLibrary, mm_var&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
./configure.ac:321:AC_CHECK_FUNCS&lt;span class="o"&gt;(&lt;/span&gt;FT_Get_BDF_Property FT_Get_PS_Font_Info FT_Has_PS_Glyph_Names FT_Get_X11_Font_Format FT_Done_MM_Var&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;One &lt;code&gt;git blame&lt;/code&gt; later, we find the following commit which introduced this function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;commit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;94683&lt;/span&gt;&lt;span class="n"&gt;a1255c065a7f8e7fadee9de605f3eaf9ac7&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nl"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Behdad&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Esfahbod&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;behdad&lt;/span&gt;&lt;span class="nv"&gt;@behdad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Mon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Jan&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2018&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;Use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FT_Done_MM_Var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And then we can find out that release &lt;strong&gt;2.12.92&lt;/strong&gt; is the last one which does
not contain this change.&lt;/p&gt;
&lt;p&gt;I checked out the code from release &lt;strong&gt;2.12.92&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;git checkout -b 2.12.92 2.12.92&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Installed the libraries needed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;sudo apt install libxml2-dev:i386 uuid-dev:i386&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And used the following commands to compile and install an older version
of the &lt;code&gt;fontconfig&lt;/code&gt; library:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;aclocal -I m4&lt;/span&gt;
&lt;span class="err"&gt;autoconf&lt;/span&gt;
&lt;span class="err"&gt;libtoolize&lt;/span&gt;
&lt;span class="err"&gt;./autogen.sh&lt;/span&gt;
&lt;span class="err"&gt;CFLAGS=-m32 LDFLAGS=-L/home/jan/local/packages/freetype-2.4.7-32bit/lib ./configure --prefix=/home/jan/local/packages/fontconfig-2.12.92-32bit --enable-libxml2&lt;/span&gt;
&lt;span class="err"&gt;make&lt;/span&gt;
&lt;span class="err"&gt;make install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Changes to vco&lt;/h1&gt;
&lt;p&gt;Finally, I needed to change &lt;code&gt;vco&lt;/code&gt; in the &lt;code&gt;intelFPGA/19.1/modelsim_ase/bin&lt;/code&gt; folder
to load the freshly recompiled libraries:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;$&lt;/span&gt; &lt;span class="s s-Atom"&gt;diff&lt;/span&gt; &lt;span class="s s-Atom"&gt;vco&lt;/span&gt; &lt;span class="s s-Atom"&gt;vco&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="s s-Atom"&gt;orig&lt;/span&gt;
&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="s s-Atom"&gt;d10&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;#&lt;/span&gt; &lt;span class="s s-Atom"&gt;added&lt;/span&gt; &lt;span class="s s-Atom"&gt;for&lt;/span&gt; &lt;span class="nv"&gt;Ubuntu&lt;/span&gt; &lt;span class="mf"&gt;19.04&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt; &lt;span class="s s-Atom"&gt;recompiled&lt;/span&gt; &lt;span class="s s-Atom"&gt;libraries&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span class="s s-Atom"&gt;=/home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;jan&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;local&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;packages&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;freetype&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;2.4.7&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="s s-Atom"&gt;bit&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nn"&gt;lib&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;jan&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;local&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;packages&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;fontconfig&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;2.12.92&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="s s-Atom"&gt;bit&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nn"&gt;lib&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nv"&gt;LD_LIBRARY_PATH&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; 
&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="s s-Atom"&gt;c13&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;#&lt;/span&gt; &lt;span class="s s-Atom"&gt;changed&lt;/span&gt; &lt;span class="s s-Atom"&gt;for&lt;/span&gt; &lt;span class="nv"&gt;Ubuntu&lt;/span&gt; &lt;span class="mf"&gt;19.04&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt; &lt;span class="s s-Atom"&gt;force&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="s s-Atom"&gt;bit&lt;/span&gt; &lt;span class="s s-Atom"&gt;mode&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;mode=&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;MTI_VCO_MODE&lt;/span&gt;&lt;span class="p"&gt;:-&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;32&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="s s-Atom"&gt;---&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;mode=&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;MTI_VCO_MODE&lt;/span&gt;&lt;span class="p"&gt;:-&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="mi"&gt;213&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;214&lt;/span&gt;&lt;span class="s s-Atom"&gt;d208&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;           &lt;span class="s s-Atom"&gt;#&lt;/span&gt; &lt;span class="s s-Atom"&gt;added&lt;/span&gt; &lt;span class="s s-Atom"&gt;for&lt;/span&gt; &lt;span class="nv"&gt;Ubuntu&lt;/span&gt; &lt;span class="mf"&gt;19.04&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt; &lt;span class="s s-Atom"&gt;if&lt;/span&gt; &lt;span class="s s-Atom"&gt;kernel&lt;/span&gt; &lt;span class="s s-Atom"&gt;version&lt;/span&gt; &lt;span class="mf"&gt;5.&lt;/span&gt;&lt;span class="s s-Atom"&gt;x&lt;/span&gt; &lt;span class="s s-Atom"&gt;then&lt;/span&gt; &lt;span class="s s-Atom"&gt;use&lt;/span&gt; &lt;span class="s s-Atom"&gt;linuxaloem&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;           &lt;span class="mf"&gt;5.&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="s s-Atom"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="s s-Atom"&gt;vco=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;linuxaloem&amp;quot;&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Ta-da, ModelSim now works on Ubuntu 19.04. The font styles are a little bit
broken, but being humble is a good characteristic, and we won't ask too much.&lt;/p&gt;
&lt;p style="width:80%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="ModelSim up and running" src="www.j-marjanovic.io/images/modelsim_on_ubuntu19-04.png"&gt;&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="FPGA"></category><category term="ModelSim"></category></entry><entry><title>Notes from FOSDEM 2019</title><link href="www.j-marjanovic.io/notes-from-fosdem-2019.html" rel="alternate"></link><published>2019-02-03T20:10:00+01:00</published><updated>2019-02-03T20:10:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2019-02-03:www.j-marjanovic.io/notes-from-fosdem-2019.html</id><summary type="html">&lt;p&gt;I attended FOSDEM 2019 in Brussels:&lt;/p&gt;
&lt;p style="width:40%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Brussels" src="www.j-marjanovic.io/images/fosdem2019.jpg"&gt;&lt;/p&gt;
&lt;p&gt;and these are my notes from Quantum Computers, CAD and Open Hardware
and Python tracks:&lt;/p&gt;
&lt;h1&gt;Quantum Computing&lt;/h1&gt;
&lt;h2&gt;Delivering Practical Quantum Computing on the D-Wave System&lt;/h2&gt;
&lt;h3&gt;Intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;marketing slide for D-Wave Leap&lt;/li&gt;
&lt;li&gt;practical = Adiabatic Quantum computer&lt;/li&gt;
&lt;li&gt;Rigetti uses the gate model instead&lt;/li&gt;
&lt;li&gt;physical impl: 3 m …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;I attended FOSDEM 2019 in Brussels:&lt;/p&gt;
&lt;p style="width:40%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Brussels" src="www.j-marjanovic.io/images/fosdem2019.jpg"&gt;&lt;/p&gt;
&lt;p&gt;and these are my notes from Quantum Computers, CAD and Open Hardware
and Python tracks:&lt;/p&gt;
&lt;h1&gt;Quantum Computing&lt;/h1&gt;
&lt;h2&gt;Delivering Practical Quantum Computing on the D-Wave System&lt;/h2&gt;
&lt;h3&gt;Intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;marketing slide for D-Wave Leap&lt;/li&gt;
&lt;li&gt;practical = Adiabatic Quantum computer&lt;/li&gt;
&lt;li&gt;Rigetti uses the gate model instead&lt;/li&gt;
&lt;li&gt;physical impl: 3 m high box + 3 normal racks&lt;/li&gt;
&lt;li&gt;QPU (Quantum Processor Unit): 16x16 grid&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;theory&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;language&lt;ul&gt;
&lt;li&gt;qubit&lt;/li&gt;
&lt;li&gt;coupler (either both in same direction or opposite)&lt;/li&gt;
&lt;li&gt;weights (? initial state)&lt;/li&gt;
&lt;li&gt;strength (for couplers)&lt;/li&gt;
&lt;li&gt;objective (function which gets minimized)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;BQM (?)&lt;/li&gt;
&lt;li&gt;5 us annealing&lt;/li&gt;
&lt;li&gt;problems: noise&lt;/li&gt;
&lt;li&gt;noise can lead you to more than one solution (might be useful for some problems)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Markets&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;portfolio&lt;/li&gt;
&lt;li&gt;internet ad&lt;/li&gt;
&lt;li&gt;high-energy physics&lt;/li&gt;
&lt;li&gt;image recognition&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Q &amp;amp; A&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;clock speed -&amp;gt; no clock speed&lt;/li&gt;
&lt;li&gt;total computation time -&amp;gt; 5 us to 1 s&lt;/li&gt;
&lt;li&gt;Pegasus: proposed arch for a new machine&lt;/li&gt;
&lt;li&gt;new Hamiltonian in Pegasus&lt;/li&gt;
&lt;li&gt;error corrections: from 2048 bits, no error corrections because of what
  they calculate&lt;/li&gt;
&lt;li&gt;classical solutions vs quantum solutions: wall-clock time is the benchmark&lt;/li&gt;
&lt;li&gt;hello world: &lt;/li&gt;
&lt;li&gt;D-Wave: not a threat to normal crypto (factoring a number is not a good problem)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;D-Wave's Software Development Kit&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;tools and utilities for QC development&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;motivation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;solution space is "smooth": good solutions are "grouped together"&lt;/li&gt;
&lt;li&gt;step 1: problem as polynomial&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;equation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;b-terms: linear bias&lt;/li&gt;
&lt;li&gt;a-terms: quadratic bias&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Quantum Machine Instr: biases in certain range: because of physical limitation?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chimera graph and Pegasus graph&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Ocean software&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Python front-end, C++ for high performance&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;mapping methods&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;samplers&lt;/li&gt;
&lt;li&gt;compute resources&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;dimod&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;API for samplers&lt;/li&gt;
&lt;li&gt;BQM&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;cloud client&lt;/h4&gt;
&lt;h4&gt;minorminer&lt;/h4&gt;
&lt;h4&gt;dwavebinarycsp&lt;/h4&gt;
&lt;p&gt;constraint satisfaction&lt;/p&gt;
&lt;h4&gt;dwave-networkx&lt;/h4&gt;
&lt;p&gt;graph theory problem,
same API as networkx&lt;/p&gt;
&lt;h3&gt;Steps&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;translate to binary&lt;/li&gt;
&lt;li&gt;define BQM function&lt;/li&gt;
&lt;li&gt;BQM to matrix form&lt;/li&gt;
&lt;li&gt;BQM through sampler&lt;/li&gt;
&lt;li&gt;post-processing and interpretation&lt;/li&gt;
&lt;/ol&gt;
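&lt;p&gt;A minimal sketch of these steps in plain Python, assuming a made-up toy problem
(pick exactly one of two binary variables). The names &lt;code&gt;linear&lt;/code&gt;,
&lt;code&gt;quadratic&lt;/code&gt; and &lt;code&gt;brute_force_sample&lt;/code&gt; are hypothetical, and the
exhaustive loop only stands in for a real sampler; it is not the Ocean API:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;from itertools import product

# Steps 1 and 2: hypothetical toy problem, choose exactly one of x0 and x1.
# The penalty (x0 + x1 - 1)^2 expands (using x*x == x for binary x) to
# -x0 - x1 + 2*x0*x1 + 1.
linear = {'x0': -1.0, 'x1': -1.0}      # linear biases
quadratic = {('x0', 'x1'): 2.0}        # quadratic biases (couplers)
offset = 1.0


def energy(sample):
    """Step 3: evaluate the BQM objective for one 0/1 assignment."""
    e = offset
    e += sum(bias * sample[v] for v, bias in linear.items())
    e += sum(bias * sample[u] * sample[v] for (u, v), bias in quadratic.items())
    return e


def brute_force_sample():
    """Step 4 stand-in: enumerate every assignment, lowest energy first."""
    variables = sorted(linear)
    samples = [dict(zip(variables, bits))
               for bits in product((0, 1), repeat=len(variables))]
    return sorted(samples, key=energy)


if __name__ == '__main__':
    # Step 5: interpret the result; the lowest-energy rows are the answers.
    for s in brute_force_sample():
        print(s, energy(s))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The two lowest-energy assignments are the ones with exactly one variable set,
which is how the constraint is encoded in the objective.&lt;/p&gt;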
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;pip install dwave-ocean-sdk&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;Q &amp;amp; A&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;resolution: 0.01 resolution&lt;/li&gt;
&lt;li&gt;Fujitsu and Hitachi: other providers&lt;/li&gt;
&lt;li&gt;8 couplers per qubit&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;D-Wave Hybrid Framework&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;decomposition -&amp;gt; split the problem to fit into BQM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;github.com/dwavesystems/dwave-hybrid&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;solver/sampler framework&lt;/li&gt;
&lt;li&gt;uses both quantum and classical resources&lt;/li&gt;
&lt;li&gt;dataflow paradigm&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What is IBMQ&lt;/h2&gt;
&lt;h3&gt;quantum algorithms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;current algorithms --&amp;gt; quantum algorithms&lt;/li&gt;
&lt;li&gt;supra-polynomial speed-up&lt;/li&gt;
&lt;li&gt;Shor's algorithm: polynomial time for factoring numbers&lt;/li&gt;
&lt;li&gt;simulating quantum mechanics (Hamiltonian equations); for chemistry&lt;/li&gt;
&lt;li&gt;factoring 1024-bit number: hours with QC&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;how to program a QC: mapping interference pattern on qubits&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;entanglement -&amp;gt; consistent quantum system -&amp;gt; collapse&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;every quantum program: circuit (no feedback!)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;result is non deterministic (robust algorithms provide good results)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;quantum volume (metric used at IBM)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;coherence time: 100s of us&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;technology is quantum ready&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;quantum technologies&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;superconductive Josephson junction&lt;/li&gt;
&lt;li&gt;entaglion (only hypothetical?)&lt;/li&gt;
&lt;li&gt;QASM (quantum assembly language)&lt;/li&gt;
&lt;li&gt;5 GHz, 240 mK, low noise&lt;/li&gt;
&lt;li&gt;all QC look similar: only way to do it&lt;/li&gt;
&lt;li&gt;current state: oscilloscopes, signal generators, ...&lt;/li&gt;
&lt;li&gt;pizza box for controlling the QC in the future&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;quantum advantage&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;IBM provides a lot of stuff on their GitHub&lt;/li&gt;
&lt;li&gt;Jupyter notebooks, vscode plugin&lt;/li&gt;
&lt;li&gt;Qiskit Aqua: example problem: bonding energy for a molecule&lt;/li&gt;
&lt;li&gt;Qiskit Aer: ...&lt;/li&gt;
&lt;li&gt;publicly available QC&lt;/li&gt;
&lt;li&gt;gate error and readout error are publicly available (~1e-3 for the example)&lt;/li&gt;
&lt;li&gt;IBM has several QC architectures (Tokyo, Melbourne)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Q &amp;amp; A&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;noise problems for QC, do qubits produce noise -&amp;gt; engineering will find a solution&lt;/li&gt;
&lt;li&gt;quantum volume of Tokyo -&amp;gt; "the best, it is not just the number of qubits"&lt;/li&gt;
&lt;li&gt;Qiskit Aqua: chemistry API&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;CAD and Open Hardware&lt;/h1&gt;
&lt;h2&gt;gnucap&lt;/h2&gt;
&lt;h3&gt;intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;mixed-signal simulator&lt;/li&gt;
&lt;li&gt;prototype for Verilog-AMS (~10 years ago)&lt;/li&gt;
&lt;li&gt;analog circuits and digital circuits simulators are different (transient vs event-based)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;analog circuit simulation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;node equations --&amp;gt; matrix form&lt;/li&gt;
&lt;li&gt;often: differential equations, Newton iteration&lt;/li&gt;
&lt;/ul&gt;
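&lt;p&gt;To make "node equations to matrix form" concrete, here is a small sketch in plain
Python. The circuit (1 mA injected into node 0, 1 kOhm from node 0 to node 1,
1 kOhm from node 1 to ground) and the helper names are my own assumptions, not
anything taken from Gnucap:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def stamp_resistor(G, a, b, ohms):
    """Add a resistor between nodes a and b (None means ground) to matrix G."""
    g = 1.0 / ohms
    for n in (a, b):
        if n is not None:
            G[n][n] += g
    if a is not None and b is not None:
        G[a][b] -= g
        G[b][a] -= g


def solve(G, rhs):
    """Solve G * v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for r in reversed(range(n)):
        s = sum(A[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (A[r][n] - s) / A[r][r]
    return v


def example():
    """Build the conductance matrix for the assumed circuit and solve it."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    stamp_resistor(G, 0, 1, 1000.0)      # R1 between node 0 and node 1
    stamp_resistor(G, 1, None, 1000.0)   # R2 from node 1 to ground
    currents = [1e-3, 0.0]               # 1 mA injected into node 0
    return solve(G, currents)


if __name__ == '__main__':
    print(example())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For nonlinear devices the same linear solve would sit inside a Newton iteration,
with the matrix re-stamped from the linearized device models on every pass.&lt;/p&gt;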
&lt;h3&gt;digital circuit simulation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;event based, with evaluation queue&lt;/li&gt;
&lt;li&gt;can be used&lt;/li&gt;
&lt;/ul&gt;
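&lt;p&gt;A rough sketch of an event-based simulator with an evaluation queue, again in
plain Python. The data layout (tuples of time, signal, value and a
&lt;code&gt;gates&lt;/code&gt; dictionary) is a hypothetical choice for illustration and says
nothing about how Gnucap structures its queue:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import heapq


def simulate(events, gates, values):
    """Run an event-driven simulation.

    events: list of (time, signal, new_value) tuples
    gates:  dict mapping an output signal to (function, input_signals, delay)
    values: dict mapping a signal to its current value (modified in place)
    Returns the list of (time, signal, value) changes that took effect.
    """
    queue = list(events)
    heapq.heapify(queue)
    changes = []
    while queue:
        time, signal, value = heapq.heappop(queue)
        if values.get(signal) == value:
            continue                     # no change, nothing to re-evaluate
        values[signal] = value
        changes.append((time, signal, value))
        # Schedule every gate that reads the changed signal.
        for out, (func, inputs, delay) in gates.items():
            if signal in inputs:
                new = func(*(values[i] for i in inputs))
                heapq.heappush(queue, (time + delay, out, new))
    return changes


if __name__ == '__main__':
    inverter = {'y': (lambda a: 1 - a, ('a',), 2)}   # y = not a, delay 2
    print(simulate([(0, 'a', 1)], inverter, {'a': 0, 'y': 1}))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Only gates whose inputs actually changed are re-evaluated, which is the main
reason event-based simulation is cheap compared to solving the whole circuit at
every time step.&lt;/p&gt;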
&lt;h3&gt;Gnucap&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Gnucap decomposes the circuit matrix into L and U&lt;/li&gt;
&lt;li&gt;Gnucap keeps track of the changes to the matrix, schedules an update to
  the circuit matrix &lt;/li&gt;
&lt;li&gt;bypass = not computing something&lt;/li&gt;
&lt;li&gt;Gnucap uses all the tricks to calculate the inverse of the matrix (pivoting)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;architecture of Gnucap&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;the concept (matrix solving + event-based updates) is tightly integrated
  in the codebase&lt;/li&gt;
&lt;li&gt;plugin infrastructure: modeling languages (VHDL, Verilog-AMS, SystemC considered)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;shared library for basic s&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;components are plugins (dlopen)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;components&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;commands&lt;/li&gt;
&lt;li&gt;algorithms &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;plugins&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Turing complete&lt;/li&gt;
&lt;li&gt;examples of plugins&lt;/li&gt;
&lt;li&gt;import module (python)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;insmod module (linux)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;gnucap-python&lt;/code&gt;, e.g. Jupyter, user can access internal data, use Scipy, ...&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Verilog-A in QUCS/gnucsator&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;license for models&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Gnucap supports models from other sources&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;two types:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;distributed as source code: --&amp;gt; just load it into Gnucap (no issues)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;distributed as binary: --&amp;gt; wrapper + blob&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;mixed-mode is faster&lt;/li&gt;
&lt;li&gt;more front-end work needed&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;ngspice&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;talk is not about the details, but about the framework, user interface and future&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;input: standard SPICE text input&lt;/li&gt;
&lt;li&gt;output: transient simulator &lt;/li&gt;
&lt;li&gt;
&lt;p&gt;successor of spice3f5 from Berkeley&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;three flavors:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;standard executable: CLI, file and graphics output, control language&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;shared library for tcl/tk (not used so much)&lt;/li&gt;
&lt;li&gt;C shared library (so/dll): input and output via callbacks&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;scripting language&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;its own library (developers don't like Python)&lt;/li&gt;
&lt;li&gt;94 commands, math functions, loops, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;device models&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;hard-coded models (BJT, MOS, JFET, xFET, trans lines, Verilog A interface via adms)&lt;/li&gt;
&lt;li&gt;B source with built-in functions&lt;/li&gt;
&lt;li&gt;XSPICE shared library (written in C, both analog and digital)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;application areas&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;PCB design&lt;/li&gt;
&lt;li&gt;mix of ICs and discrete components&lt;/li&gt;
&lt;li&gt;requires a comfortable user interface (offered by 3rd parties - e.g. KiCAD)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PSPICE and LTSPICE model requirements&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;IC design&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;models from the foundries (very reliable but complex)&lt;/li&gt;
&lt;li&gt;supports HSPICE&lt;/li&gt;
&lt;li&gt;MOS models, large circuits, certain speed&lt;/li&gt;
&lt;li&gt;integration with other tools ongoing&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;mixed-signal capabilities&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;from XSPICE&lt;/li&gt;
&lt;li&gt;digital: event based, signal strengths and delays&lt;/li&gt;
&lt;li&gt;analog: C coded models, time and freq domain&lt;/li&gt;
&lt;li&gt;simple example: digital is 50x faster than analog&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;experimental developments&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;KLU solver: 2x, 3x faster&lt;/li&gt;
&lt;li&gt;CUDA for GPU: development on-going&lt;/li&gt;
&lt;li&gt;Cider: 1D and 2D TCAD: device structure, solve physics equations&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;licenses&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;core: BSD, LGPL, ... no issues&lt;/li&gt;
&lt;li&gt;Verilog A models: more complicated&lt;/li&gt;
&lt;li&gt;vendor devices: can be used, but not distributed&lt;/li&gt;
&lt;li&gt;IC model data: PDKs are under NDA (also encryption)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;future&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;unicode&lt;/li&gt;
&lt;li&gt;some commands (pz, ...)&lt;/li&gt;
&lt;li&gt;integration with other tools and flows&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;openEMS - An Introduction and Overview&lt;/h2&gt;
&lt;h3&gt;intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;FOSS solver for electromagnetic fields&lt;/li&gt;
&lt;li&gt;simulate and evaluate RF and optical devices&lt;/li&gt;
&lt;li&gt;uses FDTD (finite-difference time-domain)&lt;/li&gt;
&lt;li&gt;co-ordinate systems: cylindrical and cartesian&lt;/li&gt;
&lt;li&gt;lumped elements available&lt;/li&gt;
&lt;li&gt;human body models&lt;/li&gt;
&lt;li&gt;dispersive models&lt;/li&gt;
&lt;li&gt;support for remote simulation (cluster)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;show cases&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;notch filter, very nice demo&lt;/li&gt;
&lt;li&gt;examples: helical antenna, antenna array, MRI antenna design (loop coils)&lt;/li&gt;
&lt;li&gt;small size PCB antenna&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;interfacing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;nice to have: interface to PCB editors&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;problem: link (between EMS and PCB editors) is very weak&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;some examples:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;hyp2mat&lt;/li&gt;
&lt;li&gt;pcb-rnd&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pcbmodelgen (KiCAD to openEMS)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ultimate goal: Circuit simulation &amp;lt;--&amp;gt; PCB design &amp;lt;--&amp;gt; RF simulation&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;status&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;openEMS is a mature EM simulation package&lt;/li&gt;
&lt;li&gt;TODO list: improve the documentation, interface to tools, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Project Trellis and nextpnr&lt;/h2&gt;
&lt;h3&gt;ECP5&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;85k logic cells (4 LUTS, FF, carry), block RAM, 18x18 DSPs, SERDES (up to 5 GTs)&lt;/li&gt;
&lt;li&gt;split into tiles, tiles split into slices&lt;/li&gt;
&lt;li&gt;fixed wires&lt;/li&gt;
&lt;li&gt;arcs and pip&lt;/li&gt;
&lt;li&gt;all arcs and wires are unidirectional - mux topology&lt;/li&gt;
&lt;li&gt;dedicated clock network&lt;/li&gt;
&lt;li&gt;programmable interconnect: pass gates (cascade of 2 mux)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;status&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;bit and routing done&lt;/li&gt;
&lt;li&gt;missing: DSP&lt;/li&gt;
&lt;li&gt;timing documentation for fabric, logic cells, RAM, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;text configuration format&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;tools to convert from and to bitstream&lt;/li&gt;
&lt;li&gt;intermediate format for place &amp;amp; route&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;timing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;not enough vendor support&lt;/li&gt;
&lt;li&gt;delays for the cells extracted from SDF files&lt;/li&gt;
&lt;li&gt;routing delay obtained using least-squares from reports for entire net&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;workflow&lt;/h3&gt;
&lt;h4&gt;yosys&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;support ECP5, iCE40, Xilinx, ...&lt;/li&gt;
&lt;li&gt;uses Berkeley ABC for logic optimization&lt;/li&gt;
&lt;li&gt;formal equivalence checking, assertions&lt;/li&gt;
&lt;li&gt;....&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;nextpnr&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;replacement for arachne-pnr&lt;/li&gt;
&lt;li&gt;developments from May 2018&lt;/li&gt;
&lt;li&gt;timing-driven&lt;/li&gt;
&lt;li&gt;architecture implements an API: useful for different architectures&lt;/li&gt;
&lt;li&gt;each arch has its own binary: a lot of optimization possible&lt;/li&gt;
&lt;li&gt;7-series is VERY experimental (more work planned)&lt;/li&gt;
&lt;li&gt;first implementation:&lt;/li&gt;
&lt;li&gt;SA placer&lt;/li&gt;
&lt;li&gt;A*+ripup router&lt;/li&gt;
&lt;li&gt;future&lt;/li&gt;
&lt;li&gt;analytic placer&lt;/li&gt;
&lt;li&gt;SAT-based placer,&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;li&gt;nice graphical interface&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Design Automation in Wonderland&lt;/h2&gt;
&lt;h3&gt;intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;motivation and goals&lt;/li&gt;
&lt;li&gt;reuse common functionality&lt;/li&gt;
&lt;li&gt;easy to integrate, easy to adapt libraries&lt;/li&gt;
&lt;li&gt;a set of modular libraries&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;based on Berkeley ABC&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;use C++14 or C++17&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;header-only&lt;/li&gt;
&lt;li&gt;well documented, well tested&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;libraries&lt;/h3&gt;
&lt;h4&gt;lorina: parsing library&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;can parse very simple Verilog&lt;/li&gt;
&lt;li&gt;parser reads Verilog and provides data to mockturtle&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;mockturtle: logic network library&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;network interface API&lt;/li&gt;
&lt;li&gt;logic synth, opt, technology mapping&lt;/li&gt;
&lt;li&gt;implementations: and-inverter, kLUT, ...&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;performance tweaks&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;cut the combinatorial network into LUTs, based on cost function (speed/area)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;kitty: truth table&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;manipulation of truth table&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;percy: exact synthesis library&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;re-synthesis&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;conclusion&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;extract the logic function&lt;/li&gt;
&lt;li&gt;optimization&lt;/li&gt;
&lt;li&gt;mapping to tech&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;github.com/lsils&lt;/p&gt;
&lt;h2&gt;Open source virtual prototyping for faster hardware and software co-design&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;virtual prototyping &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;current development&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;idea&lt;/li&gt;
&lt;li&gt;SW and HW developed in parallel&lt;/li&gt;
&lt;li&gt;integration&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;virtual prototype&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;idea&lt;/li&gt;
&lt;li&gt;virtual prototype --&amp;gt; SW can be developed in parallel&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;virtual prototyping: SW environment simulating the HW&lt;/p&gt;
&lt;h3&gt;example&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;model entire SoC (RPi: quad core, peripheral)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;issues:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;models (of the IP) are hard to find&lt;/li&gt;
&lt;li&gt;too many components --&amp;gt; needs to be done: shared effort&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;?&lt;/h3&gt;
&lt;p&gt;"interoperability is the key" --&amp;gt; take advantage of the community&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;toolchain: marketplace for components, GUI, ...&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Q &amp;amp; A&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;interface with SystemC and TLM: support for TLM is there&lt;/li&gt;
&lt;li&gt;modules: how to verify the model: ?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Lesson learned from Retro-uC and search for ideal HDL for open source silicon&lt;/h2&gt;
&lt;h3&gt;intro&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;idea: open source microcontroller (Z80, MOS 65 and 68000, 3 uC in one chip)&lt;/li&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;a development board&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;"VHDL and Verilog are not the right tools for the job"&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;RTL faults&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;clock is logic signal&lt;/li&gt;
&lt;li&gt;if --&amp;gt; mux or flip-flop&lt;/li&gt;
&lt;li&gt;synth vs non-synth&lt;/li&gt;
&lt;li&gt;Verilog: blocking and non-blocking&lt;/li&gt;
&lt;li&gt;FPGA vs ASIC&lt;/li&gt;
&lt;li&gt;RTFLRM&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;improvements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;signed&lt;/li&gt;
&lt;li&gt;process(all)&lt;/li&gt;
&lt;li&gt;generate&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;"putting a lipstick on a pig"&lt;/p&gt;
&lt;h3&gt;new developments&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;TL-Verilog&lt;/li&gt;
&lt;li&gt;SystemC/TLM&lt;/li&gt;
&lt;li&gt;"good tools are proprierty" (Vivado HLS, Catapult)&lt;/li&gt;
&lt;li&gt;Panda Bamboo&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GAUT (gaut.fr)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;MyHDL&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Chisel/SpinalHDL --&amp;gt; "going in the right direction" --&amp;gt; first need to learn Scala&lt;/li&gt;
&lt;li&gt;Migen/MiSoC/nmigen --&amp;gt; preferred by the speaker&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Python&lt;/h1&gt;
&lt;h2&gt;CPython Memory Management&lt;/h2&gt;
&lt;h3&gt;motivation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;what user needs to know&lt;/li&gt;
&lt;li&gt;learn how to control (gc, sys.getrefcount)&lt;/li&gt;
&lt;li&gt;memory leaks &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;allocation of memory&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CPython has PyObject for everything&lt;/li&gt;
&lt;li&gt;size: obj size &amp;lt; 512 bytes --&amp;gt; small, else big&lt;/li&gt;
&lt;li&gt;big objects: system allocator&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;small object: 3 levels, pools and arena&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;8-byte alignment --&amp;gt; size idx: size / 8 - 1&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
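&lt;p&gt;The size-index rule can be written down as a few lines of Python. This is only a
sketch of the arithmetic from the notes, assuming 8-byte alignment and treating
requests of up to 512 bytes as small (the notes only say "below 512"); the
constant and function names are mine, and newer CPython releases may use a
different alignment:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ALIGNMENT = 8            # bytes, as in the notes above
SMALL_THRESHOLD = 512    # assumed upper bound for the small-object allocator


def size_class_index(nbytes):
    """Return the size-class index for a small request, or None if it is big."""
    if nbytes not in range(1, SMALL_THRESHOLD + 1):
        return None      # handled by the system allocator instead
    # Round the request up to the next multiple of ALIGNMENT.
    rounded = -(-nbytes // ALIGNMENT) * ALIGNMENT
    return rounded // ALIGNMENT - 1


if __name__ == '__main__':
    for n in (1, 8, 9, 64, 512, 513):
        print(n, size_class_index(n))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Requests of 1 to 8 bytes share index 0, 9 to 16 bytes share index 1, and so on,
so every pool only ever hands out blocks of a single size.&lt;/p&gt;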
&lt;h4&gt;pools&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;4k size, objects of same size&lt;/li&gt;
&lt;li&gt;blocks &lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;arenas&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;encapsulate pools&lt;/li&gt;
&lt;li&gt;contains 64 pools&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;object specifics&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;string interning (simple string)&lt;/li&gt;
&lt;li&gt;small integers (-5 to 256)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;garbage collection&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;reference counting&lt;/li&gt;
&lt;li&gt;easy to find unused obj&lt;/li&gt;
&lt;li&gt;no marking&lt;/li&gt;
&lt;li&gt;memory overhead&lt;/li&gt;
&lt;li&gt;cannot handle cyclical references&lt;/li&gt;
&lt;/ul&gt;
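&lt;p&gt;A small experiment to see both effects, using only the standard
&lt;code&gt;sys&lt;/code&gt;, &lt;code&gt;gc&lt;/code&gt; and &lt;code&gt;weakref&lt;/code&gt; modules. The
&lt;code&gt;Node&lt;/code&gt; class and the function names are made up for the demo, and the
exact numbers returned by &lt;code&gt;sys.getrefcount&lt;/code&gt; are an implementation
detail, so only the difference is used:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import gc
import sys
import weakref


class Node:
    """Tiny object that can point at another Node."""
    def __init__(self):
        self.other = None


def refcount_demo():
    """Return how much the reference count grows when one alias is added."""
    obj = Node()
    before = sys.getrefcount(obj)
    alias = obj                      # one more strong reference
    after = sys.getrefcount(obj)
    return after - before


def cycle_demo():
    """Show that a reference cycle survives del but not gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()                     # keep the automatic collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b now reference each other
        probe = weakref.ref(a)       # lets us observe a without keeping it alive
        del a, b                     # counts do not drop to zero
        alive_after_del = probe() is not None
        gc.collect()                 # the cycle detector frees both objects
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


if __name__ == '__main__':
    print(refcount_demo())
    print(cycle_demo())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Plain reference counting alone would leak the two nodes; the generational
collector in the &lt;code&gt;gc&lt;/code&gt; module exists to clean up exactly this
case.&lt;/p&gt;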
&lt;h3&gt;tools in python&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;two modules: gc and tracemalloc&lt;/li&gt;
&lt;li&gt;plus: sys._debugmallocstats()&lt;/li&gt;
&lt;/ul&gt;
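&lt;p&gt;A short sketch of how &lt;code&gt;tracemalloc&lt;/code&gt; can be used to see where memory
goes. The function name and the list of strings are arbitrary choices for the
demo; the calls themselves come from the standard library:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import tracemalloc


def measure_allocation(n_items):
    """Trace memory while building a list of n_items strings.

    Returns (current_bytes, peak_bytes, top_statistics, number_of_items).
    """
    tracemalloc.start()
    try:
        data = [str(i) * 10 for i in range(n_items)]
        current, peak = tracemalloc.get_traced_memory()
        # Group the traced allocations by source line, biggest first.
        top = tracemalloc.take_snapshot().statistics('lineno')[:3]
    finally:
        tracemalloc.stop()
    return current, peak, top, len(data)


if __name__ == '__main__':
    current, peak, top, count = measure_allocation(10000)
    print('items:', count, 'current:', current, 'peak:', peak)
    for stat in top:
        print(stat)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;sys._debugmallocstats()&lt;/code&gt; is left out here because it only prints
allocator statistics to stderr and is specific to CPython.&lt;/p&gt;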
&lt;p&gt;&lt;em&gt;That's all folks, 'till next year!&lt;/em&gt;&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Conferences"></category><category term="Quantum computers"></category><category term="Open Hardware"></category><category term="Chisel"></category></entry><entry><title>Chisel tester with overridden step() method</title><link href="www.j-marjanovic.io/chisel-tester-with-overridden-step-method.html" rel="alternate"></link><published>2018-10-14T21:00:00+02:00</published><updated>2018-10-14T21:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2018-10-14:www.j-marjanovic.io/chisel-tester-with-overridden-step-method.html</id><summary type="html">&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://chisel.eecs.berkeley.edu/"&gt;Chisel&lt;/a&gt; is a modern take on Hardware
Description Languages, such as (System)Verilog and VHDL. Both Verilog and VHDL
were conceived in the 80s, and are currently still the main two options when it
comes to describing hardware. From the Developer Experience point-of-view, I
would say that both languages are …&lt;/p&gt;</summary><content type="html">&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://chisel.eecs.berkeley.edu/"&gt;Chisel&lt;/a&gt; is a modern take on Hardware
Description Languages, such as (System)Verilog and VHDL. Both Verilog and VHDL
were conceived in the 80s, and are currently still the two main options when it
comes to describing hardware. From the Developer Experience point-of-view, I
would say that both languages are kind of OK once one gets used to them.&lt;/p&gt;
&lt;h1&gt;Short comparison to VHDL and Verilog&lt;/h1&gt;
&lt;p&gt;Obviously there are still areas where these two languages could be improved. That
is why I have started to experiment with Chisel in my free time. The modules
written in Chisel are shorter and thus more readable.&lt;/p&gt;
&lt;h2&gt;Verbosity&lt;/h2&gt;
&lt;p&gt;Having one implicit clock domain is (in most cases) great: everything is
then clocked from this one clock. This saves a lot of typing compared to
Verilog:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;always&lt;/span&gt; &lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="k"&gt;posedge&lt;/span&gt; &lt;span class="n"&gt;clk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;begin&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proc_smth&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;begin&lt;/span&gt;
    &lt;span class="c1"&gt;// reset logic&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;begin&lt;/span&gt;
    &lt;span class="c1"&gt;// here comes the real useful stuff&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;and VHDL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nc"&gt;proc_smth&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;process&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;begin&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rising_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;reset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
      &lt;span class="c1"&gt;-- reset logic&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="c1"&gt;-- here comes the real useful stuff&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;process&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;I would argue that half of the lines in a typical VHDL module are not needed,
as demonstrated in the previous example. A typical module for me would be some DSP
or protocol processing module, operating in a single clock domain. For special
cases where precise control of clocks is needed, such as an ADC interface with
ISERDES, one can still write the "sensitive" parts in classic HDL.&lt;/p&gt;
&lt;h2&gt;Development tools&lt;/h2&gt;
&lt;p&gt;Another advantage of Chisel is that one can use &lt;a href="https://www.jetbrains.com/idea/download/"&gt;IntelliJ IDEA Community
Edition&lt;/a&gt; to write code. Even compared
to the best VHDL/Verilog IDEs, e.g. &lt;a href="https://www.sigasi.com/"&gt;Sigasi&lt;/a&gt;, IntelliJ
is light-years ahead when it comes to refactoring, autocompletion, integration
with Git and countless little helpers.&lt;/p&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Chisel is based on Scala, and for hardware generation and testing this is a
significant advantage. Chisel provides &lt;code&gt;ChiselFlatSpec&lt;/code&gt; which is based on
&lt;code&gt;FlatSpec&lt;/code&gt; and allows declaring specifications (in the style of &lt;code&gt;"Module" should "do
something"&lt;/code&gt;) which are then evaluated.&lt;/p&gt;
&lt;p&gt;One area where Chisel is seriously lacking compared to VHDL and Verilog is
the implementation of the testbenches (or testers in Chisel-speak). In Verilog
and VHDL one can write the testbench in the same language, with the same constructs,
as the "Device" Under Test. In Chisel, synthesizable logic is written in Chisel,
while testbenches are written in Scala.&lt;/p&gt;
&lt;h1&gt;Better testbenches&lt;/h1&gt;
&lt;p&gt;If we cannot write the testbenches in the same language as the logic, let's explore
other options. Chisel itself provides multiple
&lt;a href="https://github.com/freechipsproject/chisel-testers"&gt;testers&lt;/a&gt;, such as
&lt;code&gt;PeekPokeTester&lt;/code&gt;, &lt;code&gt;SteppedHWIOTester&lt;/code&gt; and &lt;code&gt;OrderedDecoupledHWIOTester&lt;/code&gt;. In my
opinion, &lt;code&gt;OrderedDecoupledHWIOTester&lt;/code&gt; and &lt;code&gt;SteppedHWIOTester&lt;/code&gt; are only suitable
for very small modules, and do not provide enough features to sufficiently test
a DSP module with AXI4-Stream input, AXI4-Stream output and AXI4-Lite slave
port for configuration.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PeekPokeTester&lt;/code&gt; allows &lt;code&gt;poke&lt;/code&gt;-ing the inputs of the DUT and &lt;code&gt;peek&lt;/code&gt;-ing the outputs
of the DUT. It also provides a &lt;code&gt;step()&lt;/code&gt; method to advance simulation time by one
or more clock periods.&lt;/p&gt;
&lt;p&gt;In the previously-described case with a DUT with three ports (two AXI4-Stream
and one AXI4-Lite) one would ideally need three separate Bus Functional Models
(BFMs) which get executed (read their inputs and update their outputs) every
clock cycle. This can be achieved by overriding the &lt;code&gt;step()&lt;/code&gt; method of the
&lt;code&gt;PeekPokeTester&lt;/code&gt;.&lt;/p&gt;
&lt;h1&gt;Overriding &lt;code&gt;step()&lt;/code&gt; method&lt;/h1&gt;
&lt;p&gt;The code for this example is available on my GitHub, in &lt;a href="https://github.com/j-marjanovic/chisel-stuff/tree/master/example-1-override-step"&gt;chisel-stuff/example-1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this simplified (stripped to the minimum) example, we have a DUT with two
interfaces. Each interface consists only of &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;valid&lt;/code&gt; signals; neither
the DUT nor the monitor BFM is able to backpressure the stream of data. The testbench
consists of three logical units: DUT, Driver BFM and Monitor BFM. Both BFMs
are updated every clock cycle, so that the driver BFM can drive the input port of
the DUT while the monitor BFM monitors the output port of the DUT in parallel.&lt;/p&gt;
&lt;p&gt;The core of this example is the following couple of lines (from &lt;a href="https://github.com/j-marjanovic/chisel-stuff/blob/master/example-1-override-step/src/test/scala/overrideStepExample/OverrideStepExampleTester.scala#L126"&gt;OverrideStepExampleTester.scala:126&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;  &lt;span class="c1"&gt;//==========================================================================&lt;/span&gt;
  &lt;span class="c1"&gt;// step&lt;/span&gt;

  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="n"&gt;rm&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runtimeMirror&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getClass&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getClassLoader&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="n"&gt;im&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reflect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;im&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typeSignature&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;members&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;bfms&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;_&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typeSignature&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;:&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;typeOf&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;ChiselBfm&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;stepSingle&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Unit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfm&lt;/span&gt; &lt;span class="k"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;bfms&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;im&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reflectField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asTerm&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asInstanceOf&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;ChiselBfm&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Unit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;_&lt;/span&gt; &lt;span class="k"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;until&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;stepSingle&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Through Scala's &lt;a href="https://www.michaelpollmeier.com/fun-with-scalas-new-reflection-api-2-10"&gt;Reflection API&lt;/a&gt;
we are able to find all members of the tester whose type implements the &lt;code&gt;ChiselBfm&lt;/code&gt; trait,
and then call their &lt;code&gt;update()&lt;/code&gt; methods. This allows both BFMs to read and write
to the ports as they desire, independently of each other.&lt;/p&gt;
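&lt;p&gt;To make the control flow easier to follow, here is a minimal sketch of the same idea in plain Scala, without any Chisel dependency. It is not the code from the repository: &lt;code&gt;MiniTester&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;PeekPokeTester&lt;/code&gt; which only counts clock cycles, and the BFMs are registered explicitly in a list instead of being discovered through reflection. The names &lt;code&gt;Bfm&lt;/code&gt;, &lt;code&gt;BfmTester&lt;/code&gt;, &lt;code&gt;register&lt;/code&gt; and &lt;code&gt;Counter&lt;/code&gt; are assumptions made only for this illustration.&lt;/p&gt;

```scala
// Plain-Scala model of the step() override; no Chisel dependency.
// All names here (Bfm, MiniTester, BfmTester, Counter) are hypothetical.

trait Bfm {
  // Called once per clock cycle with the current simulation time.
  def update(t: Long): Unit
}

// Stand-in for PeekPokeTester: it only keeps track of simulation time.
class MiniTester {
  var t: Long = 0
  def step(n: Int): Unit = { t += n }
}

class BfmTester extends MiniTester {
  // BFMs are registered explicitly instead of being found via reflection.
  private val bfms = scala.collection.mutable.ArrayBuffer.empty[Bfm]

  def register(bfm: Bfm): Unit = bfms.append(bfm)

  // Update every BFM, then advance the simulation by exactly one cycle.
  private def stepSingle(): Unit = {
    bfms.foreach(_.update(t))
    super.step(1)
  }

  // A step of n cycles becomes n single steps, so no BFM misses a cycle.
  override def step(n: Int): Unit = {
    (0 until n).foreach(_ => stepSingle())
  }
}

// Trivial BFM which only counts how often it was updated.
class Counter extends Bfm {
  var calls: Int = 0
  var lastT: Long = -1
  def update(t: Long): Unit = { calls += 1; lastT = t }
}

object Demo extends App {
  val tester = new BfmTester
  val driver = new Counter
  val monitor = new Counter
  tester.register(driver)
  tester.register(monitor)

  tester.step(3)
  // prints "t=3 driver=3 monitor=3"
  println(s"t=${tester.t} driver=${driver.calls} monitor=${monitor.calls}")
}
```

&lt;p&gt;Explicit registration trades the convenience of reflection (a BFM declared as a member is picked up automatically) for less magic; which one is preferable is probably a matter of taste.&lt;/p&gt;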
&lt;p&gt;The instantiation of both BFMs is a little clunky: we need to manually pass
them all the methods from &lt;code&gt;PeekPokeTester&lt;/code&gt; which are needed during the
operation of the BFMs.&lt;/p&gt;
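&lt;p&gt;What "passing the methods" can look like is sketched below in plain Scala. This is an assumption-laden illustration and not the constructor signature used in the repository: the BFMs take &lt;code&gt;peek&lt;/code&gt; and &lt;code&gt;poke&lt;/code&gt; as function values, and a mutable &lt;code&gt;Map&lt;/code&gt; named &lt;code&gt;signals&lt;/code&gt; stands in for the DUT (acting as a plain wire between driver and monitor). All class and parameter names are hypothetical.&lt;/p&gt;

```scala
// Hypothetical sketch: each BFM receives the tester methods it needs as
// function values. A mutable Map stands in for the DUT signals.

trait Bfm { def update(t: Long): Unit }

// Drives one data word per cycle while data is available.
class DriverBfm(poke: (String, BigInt) => Unit, data: Seq[BigInt]) extends Bfm {
  private val it = data.iterator
  def update(t: Long): Unit = {
    if (it.hasNext) {
      poke("valid", BigInt(1))
      poke("data", it.next())
    } else {
      poke("valid", BigInt(0))
    }
  }
}

// Records a data word in every cycle in which valid is asserted.
class MonitorBfm(peek: String => BigInt) extends Bfm {
  val received = scala.collection.mutable.ListBuffer.empty[BigInt]
  def update(t: Long): Unit = {
    if (peek("valid") == BigInt(1)) received += peek("data")
  }
}

object Demo extends App {
  val signals = scala.collection.mutable.Map[String, BigInt](
    "valid" -> BigInt(0),
    "data" -> BigInt(0)
  )

  // In a real tester these lambdas would forward to peek() and poke().
  val driver = new DriverBfm(
    (name, value) => signals.update(name, value),
    Seq(BigInt(1), BigInt(2), BigInt(3))
  )
  val monitor = new MonitorBfm(name => signals(name))

  // The monitor samples before the driver updates, as on a clock edge.
  (0 until 5).foreach { t =>
    monitor.update(t)
    driver.update(t)
  }

  // prints "List(1, 2, 3)"
  println(monitor.received.toList)
}
```

&lt;p&gt;The upside of this clunkiness is that a BFM depends only on a few function types and can be exercised without a simulator, as done here.&lt;/p&gt;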
&lt;p&gt;Running &lt;code&gt;sbt test&lt;/code&gt; in &lt;code&gt;example-1-override-step&lt;/code&gt;, we obtain the following result:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.002&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SEED&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1539634207505&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.023&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;starting&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.271&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.274&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.278&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.278&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.289&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.290&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.310&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.310&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.314&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.315&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.324&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.326&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;65534&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.340&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.340&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;65535&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.344&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;65535&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.348&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Monitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.401&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finished&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;Enabling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;waves&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;Exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;0.409&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RAN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CYCLES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PASSED&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;OverrideStepExampleTest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tester&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obtained&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And this is the display of the waveforms from GTKWave:&lt;/p&gt;
&lt;p&gt;&lt;img alt="GTKWave display" src="www.j-marjanovic.io/images/chisel_override_clock.png"&gt;&lt;/p&gt;
&lt;p&gt;It can be noted that both Driver and Monitor are able to perform their tasks
in parallel.&lt;/p&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Shown here is a convenient method to enhance the Chisel &lt;code&gt;PeekPokeTester&lt;/code&gt;. In
this particular case (when DUT has only one input and one output port), one
could also use &lt;code&gt;OrderedDecoupledHWIOTester&lt;/code&gt;, but it should be obvious that the
method presented here provides more control and flexibility in more complex
cases.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Chisel"></category><category term="FPGA"></category><category term="Scala"></category><category term="BFM"></category></entry><entry><title>Blog restart</title><link href="www.j-marjanovic.io/blog-restart.html" rel="alternate"></link><published>2018-09-15T19:00:00+02:00</published><updated>2018-09-15T19:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2018-09-15:www.j-marjanovic.io/blog-restart.html</id><summary type="html">&lt;p&gt;This blog was left dormant for quite some time now. From the last blog post
in February 2017 a lot has happened. I have moved to Hamburg, Germany, to
start a new position as FPGA Developer at MicroTCA Technology Lab at
&lt;a href="https://www.desy.de/"&gt;DESY&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is me in European XFEL tunnel:&lt;/p&gt;
&lt;p style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Jan in European XFEL tunnel" src="www.j-marjanovic.io/images/jan_at_desy.jpg"&gt;&lt;/p&gt;
&lt;p&gt;There …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This blog has been dormant for quite some time now. Since the last blog post
in February 2017 a lot has happened. I have moved to Hamburg, Germany, to
start a new position as FPGA Developer at MicroTCA Technology Lab at
&lt;a href="https://www.desy.de/"&gt;DESY&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is me in the European XFEL tunnel:&lt;/p&gt;
&lt;p style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Jan in European XFEL tunnel" src="www.j-marjanovic.io/images/jan_at_desy.jpg"&gt;&lt;/p&gt;
&lt;p&gt;There are a couple of topics which I would like to explore as a hobby, and
having a blog is a nice way to organize one's thoughts and outputs. Writing a
blog post at the end of a project forces one to gather one's thoughts
and to write down the conclusions.&lt;/p&gt;
&lt;p&gt;That is all for now, expect more interesting blog posts (probably focusing on
FPGA, mid-level languages as Chisel and high-level synthesis) in the future.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Misc"></category></entry><entry><title>FOSDEM 2017</title><link href="www.j-marjanovic.io/fosdem-2017.html" rel="alternate"></link><published>2017-02-03T18:00:00+01:00</published><updated>2017-02-03T18:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2017-02-03:www.j-marjanovic.io/fosdem-2017.html</id><summary type="html">&lt;p&gt;I am writing this post from Brussels where I am attending &lt;a href="https://fosdem.org/2017/"&gt;FOSDEM 2017&lt;/a&gt;
conference.&lt;/p&gt;
&lt;p&gt;There are a lot of interesting talks, and sometimes it is quite hard to decide
which one to attend. Luckily the talks are recorded, so one can later
catch up on the missed ones.&lt;/p&gt;
&lt;p&gt;Here …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I am writing this post from Brussels where I am attending &lt;a href="https://fosdem.org/2017/"&gt;FOSDEM 2017&lt;/a&gt;
conference.&lt;/p&gt;
&lt;p&gt;There are a lot of interesting talks, and sometimes it is quite hard to decide
which one to attend. Luckily the talks are recorded, so one can later
catch up on the missed ones.&lt;/p&gt;
&lt;p&gt;Here is my list of the talks I plan to attend:&lt;/p&gt;
&lt;h2&gt;Saturday&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;9:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/keynotes_welcome/"&gt;Welcome to FOSDEM 2017&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Janson&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/open_power/"&gt;Let's talk about hardware: The POWER of open&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;H.2215 (Ferrer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/lorawan/"&gt;LoRaWAN for exploring the Internet of Things&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;K.1.105 (La Fontaine)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13:00&lt;/td&gt;
&lt;td&gt;lunch&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/kernel_spi_subsystem/"&gt;Groking the Linux SPI Subsystem&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;UD2.120 (Chavanne)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14:00 (alternative)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/sdr_fpga/"&gt;FPGAs in SDR -- Why, when, and how to use them (with RFNoC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AW1.120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/hello_world/"&gt;Everything You Always Wanted to Know About "Hello, World"*&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Janson&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16:30&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/iot_micropython/"&gt;Scientific MicroPython for Microcontrollers and IoT&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AW1.126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17:00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/libreboot/"&gt;Libreboot&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;K.1.105 (La Fontaine)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18:05&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fosdem.org/2017/schedule/event/copyleft_defense/"&gt;Understanding The Complexity of Copyleft Defense&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Janson&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Sunday&lt;/h2&gt;
&lt;p&gt;On Sunday I plan to spend most of the time in the &lt;a href="https://fosdem.org/2017/schedule/track/electronic_design_automation_eda/"&gt;Electronic Design Automation (EDA) devroom&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here is a selfie of me with the famous &lt;a href="https://en.wikipedia.org/wiki/Manneken_Pis"&gt;Manneken Pis statue&lt;/a&gt;:&lt;/p&gt;
&lt;p style="width:50%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Selfie with Manneken Pis" src="www.j-marjanovic.io/images/fosdem_2017/IMG_4898.JPG"&gt;&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Misc"></category><category term="Conference"></category></entry><entry><title>Books read: E. Stavinov: 100 Power Tips for FPGA Designers</title><link href="www.j-marjanovic.io/books-read-e-stavinov-100-power-tips-for-fpga-designers.html" rel="alternate"></link><published>2016-10-18T23:00:00+02:00</published><updated>2016-10-18T23:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2016-10-18:www.j-marjanovic.io/books-read-e-stavinov-100-power-tips-for-fpga-designers.html</id><summary type="html">&lt;p&gt;I recently found a great book explaining in detail the FPGA workflow for Xilinx
tools, titled 100 Power Tips for FPGA Designers. &lt;a href="http://www.outputlogic.com"&gt;Evgeni
Stavinov&lt;/a&gt; is an experienced FPGA designer who
previously worked for Xilinx. It is not evident from the title, but this book
focuses almost entirely on Xilinx, while …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently found a great book explaining in detail the FPGA workflow for Xilinx
tools, titled 100 Power Tips for FPGA Designers. &lt;a href="http://www.outputlogic.com"&gt;Evgeni
Stavinov&lt;/a&gt; is an experienced FPGA designer who
previously worked for Xilinx. It is not evident from the title, but this book
focuses almost entirely on Xilinx, while Altera, Lattice and Microsemi are
mentioned only briefly in an FPGA vendor list and every once in a while. Due
to the fast-paced development of FPGAs and the corresponding tools, it is clear
that a book from 2011 would be slightly outdated in 2016. The most notable changes
in recent years were a new software suite from Xilinx, called Vivado, and
the slow introduction of C-to-FPGA tools, such as Vivado HLS.&lt;/p&gt;
&lt;p style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Evgeni Stavinov: 100 Power Tips for FPGA Designers" src="www.j-marjanovic.io/images/100_power_tips_fpga_designers.jpg"&gt;&lt;/p&gt;
&lt;p&gt;Nonetheless, this book is ideal for somebody who already has some (formal)
education in FPGAs but lacks real-world experience. The author
manages to touch on every aspect of FPGA design: device selection,
simulation, coding, debugging, communication protocols, FPGA board bring-up
and all the small details one should know about FPGAs.&lt;/p&gt;
&lt;p&gt;The book (as the title suggests) is organized into 100 tips. Here are my
notes and comments on some of the tips this book provides.&lt;/p&gt;
&lt;h2&gt;Tip #9&lt;/h2&gt;
&lt;p&gt;The FPGA field has seen some new tools emerge in the past few years, while
some other tools ceased to exist or were integrated into other software suites.
&lt;em&gt;Sigasi Editor&lt;/em&gt;, an Eclipse-based editor for VHDL and Verilog, should definitely
be added under &lt;strong&gt;Lint tools&lt;/strong&gt;. &lt;strong&gt;Verilator&lt;/strong&gt; is a cycle-based simulator, but
since it can also be used as a linter (&lt;code&gt;--lint-only&lt;/code&gt;), it should be added to
this list as well.&lt;/p&gt;
&lt;p&gt;Another interesting tool worth mentioning is &lt;strong&gt;Doxygen&lt;/strong&gt;, which can create
documentation from the comments in the code and from other Markdown documents. The
original program does not support Verilog, but there is a fork, &lt;strong&gt;Doxverilog&lt;/strong&gt;,
which adds support for Verilog.&lt;/p&gt;
&lt;h2&gt;Tip #15&lt;/h2&gt;
&lt;p&gt;This tip states that &lt;code&gt;initial&lt;/code&gt; blocks are ignored by FPGA synthesis tools. Support
was probably added after the release of the book, but both XST
in ISE 14.7
(http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/xst.pdf)
and Vivado Synthesis
(http://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_2/ug901-vivado-synthesis.pdf)
now support initialization of registers from an &lt;code&gt;initial&lt;/code&gt; block.&lt;/p&gt;
&lt;p&gt;Vivado Synthesis Guidelines go even further and suggest using &lt;code&gt;initial&lt;/code&gt; instead
of reset:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;Avoid operational set/reset logic whenever possible. There may be other,&lt;/span&gt;
&lt;span class="err"&gt;less expensive, ways to achieve the desired effect, such as taking&lt;/span&gt;
&lt;span class="err"&gt;advantage of the circuit global reset by defining an initial content.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h2&gt;Tip #18&lt;/h2&gt;
&lt;p&gt;A small typo on page 81: in SystemVerilog, &lt;code&gt;logic&lt;/code&gt; is a 4-state type which
should replace &lt;code&gt;reg&lt;/code&gt;, especially in cases in which the &lt;code&gt;reg&lt;/code&gt; keyword may cause
confusion (e.g. in an &lt;code&gt;always_comb&lt;/code&gt; block).&lt;/p&gt;
&lt;h2&gt;Tip #19&lt;/h2&gt;
&lt;p&gt;While mentioning code editors for Verilog and VHDL, it is worth
noting that the ones integrated in Xilinx and Altera tools are complete
garbage. Vivado did not even have auto-complete until 2016!&lt;/p&gt;
&lt;p&gt;The list of code editors could also be extended with &lt;strong&gt;Sublime Text&lt;/strong&gt; and
&lt;strong&gt;Atom Editor&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;Tip #22&lt;/h2&gt;
&lt;p&gt;This tip discusses metastability and data coherency in clock-domain crossing
logic. It would probably be a good idea to also mention how to proceed when
state machine transitions are controlled by asynchronous signals. This is
a similar problem to data coherency: all input signals should be re-sampled in
the state machine clock domain before they are connected to the state-transition
logic. Otherwise it is possible for the state machine to enter illegal states due
to different delays from the IO pins to the state registers.&lt;/p&gt;
&lt;h2&gt;Tip #26&lt;/h2&gt;
&lt;p&gt;A small typo on page 124: the last line of the code example for shift registers
should be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;initial&lt;/span&gt; &lt;span class="n"&gt;shift4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;init2&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mh"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// result is 8&amp;#39;hE0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Note that the result of the shift operation should be 8'hE0 (-32) instead of 8'h0E.&lt;/p&gt;
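&lt;p&gt;The sign extension behind this correction can be checked with a few lines of Python. This is only a sketch: the input value 8'h80 is an assumption, chosen because it produces 8'hE0 (the value of &lt;code&gt;init2&lt;/code&gt; in the book is not reproduced here), and &lt;code&gt;arithmetic_shift_right&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def arithmetic_shift_right(value, shift, width=8):
    """Model Verilog's signed `>>>` on a `width`-bit two's-complement value."""
    modulus = 2 ** width
    value %= modulus
    # Reinterpret the raw bits as a signed number (MSB set means negative).
    signed = value - modulus if value >= modulus // 2 else value
    # Python's >> on a negative int is arithmetic, so the sign bit is replicated.
    return (signed >> shift) % modulus


# Assumed input 8'h80 (-128): the arithmetic shift by 2 gives -32.
print(format(arithmetic_shift_right(0x80, 2), "02X"))  # prints E0
```

&lt;p&gt;A logical shift of the same bit pattern would fill the upper bits with zeros and give 8'h20, which is why the signedness of the operand matters here.&lt;/p&gt;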
&lt;h2&gt;Tip #27&lt;/h2&gt;
&lt;p&gt;I have always wondered whether a state machine with a transition defined for the
&lt;code&gt;default&lt;/code&gt; case is equivalent to a state machine generated with the
&lt;code&gt;-safe_implementation&lt;/code&gt; switch.&lt;/p&gt;
&lt;h2&gt;Tip #29&lt;/h2&gt;
&lt;p&gt;I enjoyed the discussion about the various reset mechanisms. When I started
developing with FPGAs I always wrote process blocks the same way
(&lt;code&gt;if !reset_n init_regs_to_something else my_logic_here&lt;/code&gt;), not realizing that
often a reset is not needed. This is especially true in data-processing
pipelines.&lt;/p&gt;
&lt;h2&gt;Tip #34&lt;/h2&gt;
&lt;p&gt;When initializing Block RAM I would suggest using the &lt;code&gt;$readmemb&lt;/code&gt; and &lt;code&gt;$readmemh&lt;/code&gt;
system tasks instead of the proposed Xilinx custom format, since &lt;code&gt;$readmemb&lt;/code&gt; and
&lt;code&gt;$readmemh&lt;/code&gt; also work in simulation.&lt;/p&gt;
&lt;h2&gt;Tips #45-#55&lt;/h2&gt;
&lt;p&gt;These tips discuss ASIC prototyping with FPGA, which is not my area of
interest.&lt;/p&gt;
&lt;h2&gt;Tip #61&lt;/h2&gt;
&lt;p&gt;Here it could also be mentioned that Altera offers a free version of ModelSim,
called ModelSim-Altera Starter Edition. Unlike Xilinx ISIM and Vivado
Simulator, ModelSim-ASE is a stripped-down version of the full ModelSim.
It is therefore easy to migrate from the free to the paid version
if the need for additional features (such as code coverage) arises.&lt;/p&gt;
&lt;h2&gt;Tip #62&lt;/h2&gt;
&lt;p&gt;The figure with the basic testbench components is a good starting point
even for testbenches which do not use any verification methodology, such
as UVM. Here are several properties of what I consider a good testbench (especially for
non-UVM, handcrafted testbenches):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;There is a main procedure which performs the setup, runs the driver and
monitor and at the end runs the checker.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The setup should be done exactly as software would do it, e.g. if there is
an AXI4-Lite port with configuration and status registers, an AXI4-Lite Master BFM
should load the configuration settings through that port. Additionally, it is
good to read back the configuration values and check them against the
values written.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The driver and the monitor should operate completely asynchronously
from one another. When simulating a module which operates on a data stream (e.g. a
DSP module with an AXI-Stream slave port for input and an AXI-Stream master port
for output) I like to additionally throttle the output port, to observe how
the module behaves when the downstream module is temporarily not able to keep up
with the data flow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The checker (as the name suggests) checks the data received by the monitor
against the reference implementation. When there is a mismatch between the
received and expected value, the checker should clearly show the received and
expected value (SystemVerilog assertions are a nice way to do this).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
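&lt;p&gt;Points 1, 3 and 4 above can be illustrated with a small software-only model. This is a sketch in plain Python, not a real HDL testbench: &lt;code&gt;dut_model&lt;/code&gt;, &lt;code&gt;run_stream&lt;/code&gt; and &lt;code&gt;check&lt;/code&gt; are hypothetical names, the "module" simply doubles each sample, and the configuration step from point 2 is left out.&lt;/p&gt;

```python
import random


def dut_model(sample):
    """Stand-in for the module under test (assumption: it doubles each sample)."""
    return 2 * sample


def run_stream(samples, ready_probability, seed=0):
    """Drive samples through the model while the monitor randomly stalls.

    ready_probability must be greater than 0, otherwise the loop never ends.
    """
    rng = random.Random(seed)
    pending = list(samples)
    received = []
    stalled_cycles = 0
    while pending:
        if rng.random() >= ready_probability:
            stalled_cycles += 1  # output throttled: the sample has to wait
            continue
        received.append(dut_model(pending.pop(0)))
    return received, stalled_cycles


def check(received, expected):
    """Return readable mismatch messages; an empty list means the test passed."""
    errors = []
    if len(received) != len(expected):
        errors.append(f"length: received {len(received)}, expected {len(expected)}")
    for index, (got, want) in enumerate(zip(received, expected)):
        if got != want:
            errors.append(f"sample {index}: received {got}, expected {want}")
    return errors


def main():
    samples = [1, 2, 3, 4]
    expected = [2 * s for s in samples]  # reference implementation
    received, stalled = run_stream(samples, ready_probability=0.5)
    for message in check(received, expected):
        print(message)
    return received, stalled


if __name__ == "__main__":
    main()
```

&lt;p&gt;The important part is the error message: it names the sample index together with both the received and the expected value, so a failing run can be understood without opening the waveform viewer.&lt;/p&gt;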
&lt;h2&gt;Tip #63&lt;/h2&gt;
&lt;p&gt;The example which shows the delta cycle delays is fantastic and it also
demonstrates that Verilog is a very powerful, but also very dangerous language.
The example is also based on two different registers being clocked by two
different clocks. Unless absolutely necessary, I would advise against using
different clocks in the same module, and would use the FPGA vendor-provided FIFOs
for synchronization between clock domains.&lt;/p&gt;
&lt;p&gt;At the end of this tip there is Verilog code which stores the state name
string in a separate variable. This is a possible source of errors if the
names of the states are changed or if new states are added. A much better solution
would be to use SystemVerilog &lt;code&gt;enum&lt;/code&gt;s, which add a little bit of type
strictness to this otherwise loosely typed language.&lt;/p&gt;
&lt;h2&gt;Tip #67&lt;/h2&gt;
&lt;p&gt;I would agree with the observation that the IP, TCP and UDP protocols were not
designed to be implemented in hardware. Most problematic is the position of the
checksum word in the packet header, which helps neither the transmitting nor the
receiving side. However, the &lt;a href="https://www.ietf.org/rfc/rfc768.txt"&gt;UDP protocol&lt;/a&gt;
allows sending packets without the checksum calculation, in which case all bits in the
checksum field must be 0. In some cases this may offer an improvement in link
latency (if there is some other method to check the data correctness).&lt;/p&gt;
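&lt;p&gt;To make the checksum rule concrete, here is a minimal Python sketch of the 16-bit ones' complement checksum used by UDP over IPv4. The function names are hypothetical, and the caller is assumed to pass the complete byte sequence covered by the checksum (pseudo header, UDP header with a zeroed checksum field, and payload).&lt;/p&gt;

```python
def internet_checksum(data):
    """Ones' complement of the 16-bit ones' complement sum over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad with a zero byte to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += data[i] * 256 + data[i + 1]  # big-endian 16-bit word
    while total > 0xFFFF:
        total = total % 0x10000 + total // 0x10000  # fold the carries back in
    return 0xFFFF - total


def udp_checksum_field(data, enabled=True):
    """Value to place in the UDP checksum field (RFC 768 rules)."""
    if not enabled:
        return 0x0000  # all zeros: the transmitter generated no checksum
    checksum = internet_checksum(data)
    # A computed value of zero is transmitted as all ones, because zero
    # is reserved for "no checksum".
    return checksum if checksum != 0 else 0xFFFF
```

&lt;p&gt;Since the result depends on every byte of the datagram, a transmitter cannot emit the header until it has seen the whole payload. Sending an all-zero checksum field removes that dependency, which is where the possible latency improvement comes from.&lt;/p&gt;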
&lt;h2&gt;Tip #70&lt;/h2&gt;
&lt;p&gt;This tip describes various FPGA interconnect buses. Due to the Xilinx shift
from PowerPC to ARM and the introduction of Vivado, the bus of choice for an
FPGA designer should be one of the three versions of the AXI bus: AXI4, AXI4-Lite
for configuration registers, or AXI4-Stream for streaming data.&lt;/p&gt;
&lt;p&gt;Missing from this list is the Avalon bus, which is widely used with Altera Qsys. There
are two versions (memory-mapped and streaming), and they provide a very convenient way
to interface registers to a CPU. Only the needed signals have to be specified, while
the others are automatically added by Qsys during the "compilation".&lt;/p&gt;
&lt;h2&gt;Tip #76&lt;/h2&gt;
&lt;p&gt;With the new FPGA family, UltraScale, Xilinx provides a new PCIe DMA
controller (https://www.youtube.com/watch?v=TzzzM97L4HI). This saves FPGA designers a lot of
work or significantly reduces the price of IPs. By providing
various AXI interfaces the PCIe DMA controller enables easy integration with
Vivado Block Diagrams. The interfaces on the PCIe DMA controller are also similar
to the ones on the embedded ARM in Zynq FPGAs. Two different form factors of the same
product can be easily developed by using either the PCIe DMA or the Zynq ARM core. A
tabletop instrument can use a Zynq to run Linux and provide an interface to
the world through a touch-screen display or a TCP/UDP server. On the other hand, a
mezzanine-card-based solution (e.g. MicroTCA) can use the PCIe DMA to provide a
link to the main CPU in the crate.&lt;/p&gt;
&lt;p&gt;Altera also provides similar modules, such as Cyclone V Avalon-MM Interface
(https://www.altera.com/literature/ug/ug_c5_pcie_avmm.pdf).&lt;/p&gt;
&lt;h2&gt;Tip #86&lt;/h2&gt;
&lt;p&gt;When talking about ChipScope, the signal attribute &lt;code&gt;keep&lt;/code&gt; should be mentioned,
since it often provides a way to find the post-synthesis nets. The &lt;code&gt;keep&lt;/code&gt;
attribute can also be combined with the &lt;code&gt;mark_debug&lt;/code&gt; attribute for easier identification
of signals.&lt;/p&gt;
&lt;p&gt;Verilog example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c"&gt;(* keep = &amp;quot;true&amp;quot;, mark_debug = &amp;quot;true&amp;quot; *)&lt;/span&gt; &lt;span class="n"&gt;wire&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;signal_to_be_dbged&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h2&gt;Tip #90&lt;/h2&gt;
&lt;p&gt;UCF constraints are completely out of date; in Vivado they were replaced by XDC
(Xilinx Design Constraints), an equivalent of Synopsys Design Constraints.&lt;/p&gt;
&lt;h2&gt;Tip #93&lt;/h2&gt;
&lt;p&gt;There is a parameter called &lt;code&gt;cost table&lt;/code&gt; among the placer options, but the
description suggests that it might be better called a seed, since it provides
a starting point for a randomized algorithm (placement).&lt;/p&gt;
&lt;h2&gt;Tip #97&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://jenkins.io/"&gt;Jenkins&lt;/a&gt; is a more popular fork of Hudson CI. With good
support for Tcl, Vivado offers a reasonably easy way to automate the compilation
of the entire project directly from the source code in the repository.&lt;/p&gt;
&lt;p&gt;Another tool which could be added to this list is &lt;a href="http://www.ohwr.org/projects/hdl-make"&gt;hdlmake&lt;/a&gt;,
which is meant to be an equivalent of Make for FPGA projects. Currently it is
not able to tackle more complex compilation procedures, such as re-compiling
vendor IP cores or handling Vivado Block Diagrams.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="FPGA"></category><category term="Xilinx"></category><category term="Books"></category></entry><entry><title>Debugging Linux start-up on Altera Cyclone V SoC with OpenOCD</title><link href="www.j-marjanovic.io/debugging-linux-start-up-on-altera-cyclone-v-soc-with-openocd.html" rel="alternate"></link><published>2016-07-12T22:00:00+02:00</published><updated>2016-07-12T22:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2016-07-12:www.j-marjanovic.io/debugging-linux-start-up-on-altera-cyclone-v-soc-with-openocd.html</id><summary type="html">&lt;p&gt;This blog post will show you how one can use the OpenOCD debugger with Altera
Cyclone V SoC. Altera Cyclone V SoC is a very interesting integrated circuit,
combining a dual-core ARM processor and a decent FPGA, allowing a wide variety
of possibilities to partition the application between the two.&lt;/p&gt;
&lt;p&gt;Xilinx …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This blog post will show you how one can use the OpenOCD debugger with Altera
Cyclone V SoC. Altera Cyclone V SoC is a very interesting integrated circuit,
combining a dual-core ARM processor and a decent FPGA, allowing a wide variety
of possibilities to partition the application between the two.&lt;/p&gt;
&lt;p&gt;Xilinx offers Xilinx SDK as the tool to program and debug their MicroBlaze
soft-core and the ARM cores in their Zynq FPGAs. Altera, on the other hand, has two
different tools to program and debug their portfolio of processors. There is
Nios II EDS, which provides support for the Nios soft-core processor, and there is
ARM DS-5 Development Studio, which provides support for the ARM cores in Altera SoCs. While I believe
DS-5 can be a useful tool, unfortunately the free-as-in-beer &lt;a href="https://developer.arm.com/products/software-development-tools/ds-5-development-studio/editions/customized-editions/altera/community-edition"&gt;DS-5 Community Edition&lt;/a&gt;
only allows debugging Linux user-space applications. To use it,
Linux must already be up and running so that gdbserver can run on the processor.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Debug options in DS-5 Altera Community Edition" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/ds5_debug_options.png" style="max-width:100%; width: auto; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;When doing initial bring-up or experimenting this may not be the case. If
there is something wrong with the kernel, the device tree or the drivers, one can
easily end up with a non-responsive system. Even in this case the JTAG
debugger offers side-door access to the system. This greatly simplifies
determining the cause which led to the system halt.&lt;/p&gt;
&lt;h1&gt;Installation of OpenOCD&lt;/h1&gt;
&lt;h2&gt;OpenOCD&lt;/h2&gt;
&lt;p&gt;&lt;a href="http://openocd.org/"&gt;OpenOCD&lt;/a&gt; is a free and open-source on-chip debugger. It
provides a link between hardware components and a command line interface,
which can be used to control and monitor the hardware over the JTAG interface. It
can also be interfaced with GDB (GNU Debugger) integrated with Eclipse, to
provide a graphical way to debug programs. If you want to know more, at the
bottom of the &lt;a href="http://openocd.org/documentation/"&gt;OpenOCD Documentation page&lt;/a&gt; there is
a link to the presentation at FOSDEM 2006.&lt;/p&gt;
&lt;p&gt;First we need to install all the needed tools:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;sudo apt-get install libtool autotools-dev automake libusb-1.0 libhidapi-dev pkg-config git&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Then we get the source code from the OpenOCD SourceForge repository. I have used the
latest available commit in master, which was the one with a git tree-ish value
of &lt;code&gt;12ff09f&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;git&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openocd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;openocd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="nv"&gt;@eee2f562f8f2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openocd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;code2&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HEAD&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;commit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="n"&gt;ff09f7f27a707fe42226262f55b8ce8351cbf9&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nl"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Esben&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Haabendal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;esben&lt;/span&gt;&lt;span class="nv"&gt;@haabendal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dk&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Fri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Nov&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2015&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;0100&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nl"&gt;cfi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;Add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;support&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strangely&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;endianness&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;broken&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SoC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Perform all needed steps to compile the code (have a look in &lt;code&gt;INSTALL&lt;/code&gt; for
detailed instructions):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;cd openocd-code&lt;/span&gt;
&lt;span class="err"&gt;aclocal&lt;/span&gt;
&lt;span class="err"&gt;./bootstrap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;At the end of the configure step make sure that support for Altera USB-Blaster II
and CMSIS-DAP Debugger is enabled. The output should look something like
this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;configure&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;OpenOCD&lt;/span&gt; &lt;span class="n"&gt;configuration&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;
&lt;span class="c1"&gt;--------------------------------------------------&lt;/span&gt;
&lt;span class="n"&gt;MPSSE&lt;/span&gt; &lt;span class="k"&gt;mode&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;FTDI&lt;/span&gt; &lt;span class="n"&gt;based&lt;/span&gt; &lt;span class="n"&gt;devices&lt;/span&gt;        &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Segger&lt;/span&gt; &lt;span class="n"&gt;J&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;           &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ST&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;                 &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;TI&lt;/span&gt; &lt;span class="n"&gt;ICDI&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;                 &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Keil&lt;/span&gt; &lt;span class="n"&gt;ULINK&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;              &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Altera&lt;/span&gt; &lt;span class="n"&gt;USB&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Blaster&lt;/span&gt; &lt;span class="n"&gt;II&lt;/span&gt; &lt;span class="n"&gt;Compatible&lt;/span&gt;        &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Versaloon&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;          &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OSBDM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="k"&gt;only&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;            &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;eStick&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;opendous&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;         &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Andes&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;                   &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;USBProg&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;                 &lt;span class="k"&gt;no&lt;/span&gt;
&lt;span class="n"&gt;Raisonance&lt;/span&gt; &lt;span class="n"&gt;RLink&lt;/span&gt; &lt;span class="n"&gt;JTAG&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;        &lt;span class="k"&gt;no&lt;/span&gt;
&lt;span class="n"&gt;Olimex&lt;/span&gt; &lt;span class="n"&gt;ARM&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;JTAG&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;EW&lt;/span&gt; &lt;span class="n"&gt;Programmer&lt;/span&gt;           &lt;span class="k"&gt;no&lt;/span&gt;
&lt;span class="n"&gt;CMSIS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;DAP&lt;/span&gt; &lt;span class="n"&gt;Compliant&lt;/span&gt; &lt;span class="n"&gt;Debugger&lt;/span&gt;            &lt;span class="n"&gt;yes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Then we just need to compile everything and install the openocd binary.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;make&lt;/span&gt;

&lt;span class="n"&gt;sudo&lt;/span&gt; &lt;span class="n"&gt;make&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h2&gt;&lt;code&gt;udev&lt;/code&gt; rules for USB-Blaster&lt;/h2&gt;
&lt;p&gt;After OpenOCD is installed, we must take care to set the correct &lt;code&gt;udev&lt;/code&gt;
rules (access permissions for the USB device). As a workaround I had been chmod-ing
the &lt;code&gt;/dev/bus/usb/002/&lt;/code&gt; folder to &lt;code&gt;0666&lt;/code&gt;, which gave me the correct permissions
to use the USB-Blaster from the Altera Quartus software.&lt;/p&gt;
&lt;p&gt;A more elegant solution is described in the comment section of &lt;a href="http://www.fpga-dev.com/altera-usb-blaster-with-ubuntu/"&gt;ALTERA USB-BLASTER
WITH UBUNTU 14.04&lt;/a&gt;. The USB-Blaster has multiple personalities (one for FPGA JTAG and
one for ARM JTAG); the udev rule therefore needs to specify both 6010 and
6810 as the targeted devices.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;/etc/udev/rules.d/51-usbblaster.rules&lt;/code&gt; with the following content:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;# For Altera USB-Blaster on SoCkit&lt;/span&gt;
&lt;span class="err"&gt;SUBSYSTEM==&amp;quot;usb&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;ENV{DEVTYPE}==&amp;quot;usb_device&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;ATTR{idVendor}==&amp;quot;09fb&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;ATTR{idProduct}==&amp;quot;6010|6810&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;MODE=&amp;quot;0666&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;NAME=&amp;quot;bus/usb/$env{BUSNUM}/$env{DEVNUM}&amp;quot;,\&lt;/span&gt;
&lt;span class="err"&gt;RUN+=&amp;quot;/bin/chmod 0666 %c&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;code&gt;udev&lt;/code&gt; rules should be reloaded with the following command to take effect
immediately:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;sudo udevadm control --reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
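&lt;p&gt;To check whether the rule took effect, one option is to look the device up in sysfs and inspect the permissions of its node. The following is only a minimal sketch: the vendor and product IDs come from the rule above, while the function names and the assumption that sysfs is mounted at &lt;code&gt;/sys&lt;/code&gt; are mine.&lt;/p&gt;

```python
import os
import stat

# IDs taken from the udev rule above.
VENDOR_ID = "09fb"
PRODUCT_IDS = ("6010", "6810")


def read_attr(dev_dir, name):
    """Return the stripped content of a sysfs attribute, or None if missing."""
    try:
        with open(os.path.join(dev_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return None


def find_blasters(sysfs_root="/sys/bus/usb/devices"):
    """Return (device node path, product ID) for every matching USB device."""
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for entry in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, entry)
        if read_attr(dev_dir, "idVendor") != VENDOR_ID:
            continue
        product = read_attr(dev_dir, "idProduct")
        if product not in PRODUCT_IDS:
            continue
        busnum = read_attr(dev_dir, "busnum")
        devnum = read_attr(dev_dir, "devnum")
        if busnum is None or devnum is None:
            continue
        # Device nodes use zero-padded, three-digit bus and device numbers.
        node = "/dev/bus/usb/{:03d}/{:03d}".format(int(busnum), int(devnum))
        found.append((node, product))
    return found


def is_world_rw(mode):
    """True if the permission bits allow read and write for 'other' users."""
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (stat.S_IMODE(mode) | wanted) == stat.S_IMODE(mode)


if __name__ == "__main__":
    for node, product in find_blasters():
        mode = os.stat(node).st_mode
        print(node, product, oct(stat.S_IMODE(mode)), is_world_rw(mode))
```

&lt;p&gt;If the script prints nothing, no device with a matching ID is attached. If the last column is &lt;code&gt;False&lt;/code&gt;, the rule has probably not been applied to the device yet and re-plugging the board may help.&lt;/p&gt;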


&lt;h2&gt;First checks&lt;/h2&gt;
&lt;p&gt;Now we can try running the Altera &lt;code&gt;jtagconfig&lt;/code&gt; program to check whether the permissions
are OK. With the SoCkit board attached, the output should look something
like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;➜  ~ ~/altera/16.0/quartus/bin/jtagconfig                                                                                            &lt;/span&gt;
&lt;span class="err"&gt;1) CV SoCKit [1-1.1]                          &lt;/span&gt;
&lt;span class="err"&gt;  02D020DD   5CSEBA6(.|ES)/5CSEMA6/..&lt;/span&gt;
&lt;span class="err"&gt;  4BA00477   SOCVHPS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Now we can also try running OpenOCD:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;➜  ~ openocd -f interface/altera-usb-blaster2.cfg -f target/altera_fpgasoc.cfg&lt;/span&gt;
&lt;span class="err"&gt;Open On-Chip Debugger 0.10.0-dev-00324-g12ff09f (2016-06-26-19:19)&lt;/span&gt;
&lt;span class="err"&gt;Licensed under GNU GPL v2&lt;/span&gt;
&lt;span class="err"&gt;For bug reports, read&lt;/span&gt;
&lt;span class="c"&gt;http://openocd.org/doc/doxygen/bugs.html&lt;/span&gt;
&lt;span class="err"&gt;Warn : Adapter driver &amp;#39;usb_blaster&amp;#39; did not declare which transports it allows; assuming legacy JTAG-only&lt;/span&gt;
&lt;span class="err"&gt;Info : only one transport option; autoselect &amp;#39;jtag&amp;#39;&lt;/span&gt;
&lt;span class="err"&gt;adapter speed: 1000 kHz&lt;/span&gt;
&lt;span class="err"&gt;cycv_dbginit&lt;/span&gt;
&lt;span class="err"&gt;Info : Altera USB-Blaster II (uninitialized) found&lt;/span&gt;
&lt;span class="err"&gt;Info : Loading firmware...&lt;/span&gt;
&lt;span class="err"&gt;Info : Waiting for renumerate...&lt;/span&gt;
&lt;span class="err"&gt;Info : Waiting for renumerate...&lt;/span&gt;
&lt;span class="err"&gt;Info : Altera USB-Blaster II found (Firm. rev. = 1.36)&lt;/span&gt;
&lt;span class="err"&gt;Info : This adapter doesn&amp;#39;t support configurable speed&lt;/span&gt;
&lt;span class="err"&gt;Info : JTAG tap: fpgasoc.dap tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd.), part: 0xba00, ver: 0x4)&lt;/span&gt;
&lt;span class="err"&gt;Info : JTAG tap: fpgasoc.fpga.tap tap/device found: 0x02d020dd (mfg: 0x06e (Altera), part: 0x2d02, ver: 0x0)&lt;/span&gt;
&lt;span class="err"&gt;Info : DAP transaction stalled (WAIT) - slowing down&lt;/span&gt;
&lt;span class="err"&gt;Info : DAP transaction stalled (WAIT) - slowing down&lt;/span&gt;
&lt;span class="err"&gt;Info : DAP transaction stalled (WAIT) - slowing down&lt;/span&gt;
&lt;span class="err"&gt;Info : fpgasoc.cpu.0: hardware has 6 breakpoints, 4 watchpoints&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The output shows that the ARM core was recognized, so this is the
correct command to use later with the GDB debugger.&lt;/p&gt;
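&lt;p&gt;The &lt;code&gt;mfg&lt;/code&gt;, &lt;code&gt;part&lt;/code&gt; and &lt;code&gt;ver&lt;/code&gt; fields in the log are simply bit fields of the 32-bit JTAG IDCODE. As a small sketch (the function names and the regular expression are my own, not part of OpenOCD), the same decoding can be reproduced in Python:&lt;/p&gt;

```python
import re

# Matches the tap name and the hexadecimal IDCODE in "tap/device found" lines.
TAP_RE = re.compile(r"JTAG tap: (\S+) tap/device found: 0x([0-9a-fA-F]{8})")


def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into (manufacturer, part, version)."""
    manufacturer = (idcode >> 1) % 0x800   # bits 11..1
    part = (idcode >> 12) % 0x10000        # bits 27..12
    version = (idcode >> 28) % 0x10        # bits 31..28
    return manufacturer, part, version


def parse_taps(log_text):
    """Return {tap name: (manufacturer, part, version)} for a log excerpt."""
    return {
        name: decode_idcode(int(hex_id, 16))
        for name, hex_id in TAP_RE.findall(log_text)
    }


if __name__ == "__main__":
    log = (
        "Info : JTAG tap: fpgasoc.dap tap/device found: 0x4ba00477\n"
        "Info : JTAG tap: fpgasoc.fpga.tap tap/device found: 0x02d020dd\n"
    )
    for tap, fields in parse_taps(log).items():
        print(tap, [hex(field) for field in fields])
```

&lt;p&gt;For the two IDCODEs above this gives manufacturer 0x23b, part 0xba00, version 0x4 for the DAP and manufacturer 0x6e, part 0x2d02, version 0x0 for the FPGA, which agrees with the values printed by OpenOCD.&lt;/p&gt;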
&lt;h2&gt;Eclipse plug-in&lt;/h2&gt;
&lt;p&gt;Now we just have to set up Eclipse with the OpenOCD plug-in. I used the
newest version of Eclipse available at the time of writing, Eclipse Neon.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Eclipse Neon" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_neon.png" style="width:500px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://gnuarmeclipse.github.io/plugins/install/"&gt;This page&lt;/a&gt; describes how to
install the OpenOCD plug-in for Eclipse. The easiest way is to drag and drop
the install icon into a running instance of Eclipse.&lt;/p&gt;
&lt;p&gt;Once this is set up, we can create a new project. For a simpler (bare-metal)
project it would probably make sense to go with "Makefile Project with
Existing Code". For Linux kernel debugging a plain Project is enough.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_new_project.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_new_project_name.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Now let's find the original Linux binary. When building the kernel with Yocto, the
file created in the deploy directory is a &lt;code&gt;zImage&lt;/code&gt; file. This is a compressed image
with all debugging symbols stripped out, optimized for use in an
embedded environment. We need to find the original &lt;code&gt;vmlinux&lt;/code&gt; image from before the
debugging symbols were stripped out:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;➜  tmp git:(jethro) find . -name vmlinux                                       &lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.3+gitAUTOINC+5938523338-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/arch/arm/boot/vmlinux&lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.3+gitAUTOINC+5938523338-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/arch/arm/boot/compressed/vmlinux&lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.3+gitAUTOINC+5938523338-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/vmlinux&lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.4+gitAUTOINC+969478b841-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/arch/arm/boot/vmlinux&lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.4+gitAUTOINC+969478b841-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/arch/arm/boot/compressed/vmlinux&lt;/span&gt;
&lt;span class="err"&gt;./work/cyclone5-poky-linux-gnueabi/linux-altera/4.4+gitAUTOINC+969478b841-r0/&lt;/span&gt;
&lt;span class="err"&gt;linux-cyclone5-standard-build/vmlinux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;There are various images from various different runs (and different versions
of the kernel).&lt;/p&gt;
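&lt;p&gt;With several builds lying around it is easy to pick the wrong file. The sketch below narrows the list down in Python. It rests on my own assumptions: that the copies under &lt;code&gt;arch/.../boot&lt;/code&gt; are boot wrappers or stripped images rather than the ELF with symbols, and that the most recently modified candidate is the one matching the running kernel. The function name is hypothetical.&lt;/p&gt;

```python
import os


def find_vmlinux(root):
    """Return paths of files named vmlinux below root, newest first.

    Assumption: anything below an arch/.../boot directory is skipped, since
    those copies are not expected to carry the full debugging symbols.
    """
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "vmlinux" not in filenames:
            continue
        # Only look at the path relative to root, not at root itself.
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if "arch" in parts and "boot" in parts:
            continue
        candidates.append(os.path.join(dirpath, "vmlinux"))
    return sorted(candidates, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in find_vmlinux("."):
        print(path)
```

&lt;p&gt;Run from the Yocto &lt;code&gt;tmp&lt;/code&gt; directory, this should leave only the top-level &lt;code&gt;vmlinux&lt;/code&gt; of each kernel build, with the most recent one listed first.&lt;/p&gt;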
&lt;p&gt;In Eclipse we then select: &lt;code&gt;Run -&amp;gt; Debug Configurations...&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Then we just need to set up the correct parameters for debugging. Under the "Main"
tab we need to select the right Linux binary image:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_debug_main.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;Under "Debugger tab" we need to set-up the OpenOCD setting and use the correct
gdb (the one produced by Yocto):&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_debug_debugger.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;If we want to attach to a running kernel, we should uncheck the "Initial Reset"
and "Load executable" fields under the "Startup" tab.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_debug_startup.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;And we are good to go. After pressing the "Debug" button, the debugging
perspective will show up. Now we can access the &lt;code&gt;__log_buf&lt;/code&gt; buffer to determine
what is stopping the kernel boot.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Creation of new project in Eclipse" src="www.j-marjanovic.io/images/debugging_cyclone_soc_openocd/eclipse_debug_final.png" style="width:800px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="FPGA"></category><category term="Altera"></category><category term="Linux"></category><category term="OpenOCD"></category></entry><entry><title>Books read: 97 Things Every Programmer Should Know</title><link href="www.j-marjanovic.io/books-read-97-things-every-programmer-should-know.html" rel="alternate"></link><published>2016-06-05T22:00:00+02:00</published><updated>2016-06-05T22:00:00+02:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2016-06-05:www.j-marjanovic.io/books-read-97-things-every-programmer-should-know.html</id><summary type="html">&lt;p&gt;I have seen this book referenced on several occasions when the software
engineering was discussed, but I never had the time to read from first page to
the last. I bought the dead-tree version, which gives you a better motivation
to read the whole book. Here are my comments and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have seen this book referenced on several occasions when the software
engineering was discussed, but I never had the time to read from first page to
the last. I bought the dead-tree version, which gives you a better motivation
to read the whole book. Here are my comments and thoughts about each
contribution in this book.&lt;/p&gt;
&lt;p style="width:70%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="97 Things Every Programmer Should Know" src="www.j-marjanovic.io/images/97_thing_every_programmer_should_know.jpg"&gt;&lt;/p&gt;
&lt;h2&gt;(1) Act with Prudence by Seb Rose&lt;/h2&gt;
&lt;p&gt;Technical debt is (similarly to a cash debt) a good way to kickstart
development, but I agree with the author that it should be kept under control
and paid back as soon as possible. I very much enjoy it when a sub-optimal solution
gets fixed and the worries about possible problems with it vanish.&lt;/p&gt;
&lt;h2&gt;(2) Apply Functional Programming Principles by Edward Garson&lt;/h2&gt;
&lt;p&gt;Yes, I am already excited about this book. Functional programming
principles (even in OO code) greatly improve the testability of smaller components.
They also give a programmer a different insight into how algorithms work, one which
is a little more detached from how a classical computer works. Programming in
C feels like explaining complicated science to a 7-year-old ("let's have a
number i, which is set to 0. each time we will check if this number is less
than some other number n..."), while programming in Haskell feels like a
discussion with a philosopher ("there is this function fact for which the
value for 1 is 1. - oh, i see. - and for every other positive number, the
value is the argument itself multiplied by the value of function for an
argument minus one. - well, this is everything i needed to know about this
function, i think i know how to calculate it.").&lt;/p&gt;
&lt;h2&gt;(3) Ask, "What Would the User Do?" (You Are Not the User) by Giles Colborne&lt;/h2&gt;
&lt;p&gt;This contribution is quite similar to Joel Spolsky's
&lt;a href="http://www.joelonsoftware.com/uibook/chapters/fog0000000064.html"&gt;comment&lt;/a&gt; on
how to test the user interface: "One good way to evaluate the usability of a
program or dialog you've never seen before is to act a little stupid". The
important reminder to take away from this is to be conservative with
the user interface, to minimize the effort of learning how to use your program or
application.&lt;/p&gt;
&lt;h2&gt;(4) Automate Your Coding Standard by Filip van Laenen&lt;/h2&gt;
&lt;p&gt;Nicely formatted code demonstrates that the person who wrote it has put in a
little effort to make it nicer and that the same person is somebody who likes
to keep things organized. Conversely, when you see messy code with
sections commented out, you know that there is almost surely
something wrong with its behavior as well. A clearly defined language-prescribed
formatting standard (such as PEP8) and the tools to support it (such as the pep8
tool and pylint) are very welcome. Meanwhile, the &lt;a href="https://github.com/isocpp/CppCoreGuidelines/blob/5eb0b587af06dc5d1d9cda0dd8c6b399bdc1e230/CppCoreGuidelines.md"&gt;C++ coding guidelines&lt;/a&gt; currently look more like a brainstorming
session (why is there std::endl if you are not supposed to use it?).&lt;/p&gt;
&lt;h2&gt;(5) Beauty Is in Simplicity by Jørn Ølmheim&lt;/h2&gt;
&lt;p&gt;Simple solutions are often the ones that work better, not only in software but
also in other fields, such as rock music. Trying to write the most legendary
rock riff of all time? You only need 4 notes (Yngwie does not agree with this:
&lt;a href="https://www.youtube.com/watch?v=QHZ48AE3TOI"&gt;More is more&lt;/a&gt;). As an example,
let's look at one of the biggest code bases ever made: the Linux kernel. The style
itself (tabs are 8 spaces, a line is 80 characters) prevents you from writing an overly
complicated solution. The style guide for the Linux kernel is a little extreme,
but since the kernel should be as lean as possible it produces indisputably
good results.&lt;/p&gt;
&lt;h2&gt;(6) Before You Refactor by Rajith Attapattu&lt;/h2&gt;
&lt;p&gt;This contribution is again very similar to one of Joel Spolsky's &lt;a href="http://www.joelonsoftware.com/articles/fog0000000069.html"&gt;blog
posts&lt;/a&gt;. Since
reading code is harder than writing it, everybody assumes that a complete
rewrite will create better code. But all the ugly patches in the code are actual
bugfixes, and the software (although with ugly code) works.&lt;/p&gt;
&lt;h2&gt;(7) Beware the Share by Udi Dahan&lt;/h2&gt;
&lt;p&gt;Using already written libraries is a nice practice, and this contribution
talks about the unwanted dependencies it can create in the code base. The discussion
here concerns internally developed libraries. The case described in this
contribution can also be analyzed in terms of the time spent maintaining
reusable code, something which is described in The Mythical Man-Month. The use of
public (external) stable libraries should be encouraged, although in
the controversial case of the leftPad function I would prefer to have the function
embedded directly in the code.&lt;/p&gt;
&lt;h2&gt;(8) The Boy Scout Rule by Robert C. Martin&lt;/h2&gt;
&lt;p&gt;This is one of the most famous contributions in this book, and I don't think there
is much to add to it. It is always nice to look back at the clean
campground (I sometimes like to admire the cleanup in a side-by-side diff).&lt;/p&gt;
&lt;h2&gt;(9) Check Your Code First Before Looking to Blame Others by Allan Kelly&lt;/h2&gt;
&lt;p&gt;I would also like to add that a really good understanding of the language
(read the standard) and the compiler (read the manual) can help you resolve
the problem without blaming the compiler. Personally I know about some minor
differences between the Verilog compilers from Xilinx and Altera, which could lead
someone to think that the compiler is buggy.&lt;/p&gt;
&lt;h2&gt;(10) Choose Your Tools with Care by Giovanni Asproni&lt;/h2&gt;
&lt;p&gt;The author of this contribution mentions that he likes to isolate external
tools from his own business logic. I sometimes see this approach in open
source FPGA development, when a wrapper is built around some commonly used IP cores
(e.g. a wrapper around block RAM for both Altera and Xilinx, where the implementation gets
chosen based on a parameter or a macro).&lt;/p&gt;
&lt;h2&gt;(11) Code in the Language of the Domain by Dan North&lt;/h2&gt;
&lt;p&gt;This contribution could be summarized as a call to use sensible
variable, function, object and method names. This could just as well be extended to
writing informative comments, instead of comments like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;if (maxLim &amp;lt; 0) // checks if max limit is negative&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;h2&gt;(12) Code Is Design by Ryan Brush&lt;/h2&gt;
&lt;p&gt;The abstract process of making virtual artifacts (software) from nothing
cannot be compared with a construction process. The possibility to experiment
with the real product gives direct feedback, which can be easily incorporated
into the design process. I often see everyday physical objects which weren't
optimized to perfection, probably because of the high iteration cost. The location
of the power button on my HTC Desire is one of these design flaws: it cannot be
reached without changing the grip on the phone.&lt;/p&gt;
&lt;h2&gt;(13) Code Layout Matters by Steve Freeman&lt;/h2&gt;
&lt;p&gt;Indentation of the code is important, since it gives an overview of the
complexity at first glance. I look forward to the new GCC 6 warning about wrongly
indented C/C++ code.&lt;/p&gt;
&lt;h2&gt;(14) Code Reviews by Mattias Karlsson&lt;/h2&gt;
&lt;p&gt;Code reviews are important because they introduce an all-seeing eye into the
code-writing process, and this makes programmers care more about code quality. One
cannot just make an ugly patch and hide it somewhere in the code base. The
ultimate code review is the release of the code to the public or to the
clients. In this case one can be sure that the company will be judged by the
quality of the released code. The same is true for examples which
demonstrate the use of one's own tools.&lt;/p&gt;
&lt;p&gt;To be continued in the next blog post...&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="misc"></category><category term="Books"></category><category term="Software Engineering"></category></entry><entry><title>HDL data type for Python parser implementations</title><link href="www.j-marjanovic.io/hdl-data-type-for-python-parser-implementations.html" rel="alternate"></link><published>2015-11-15T22:00:00+01:00</published><updated>2015-11-15T22:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2015-11-15:www.j-marjanovic.io/hdl-data-type-for-python-parser-implementations.html</id><summary type="html">&lt;p&gt;Recently I had to implement a parser for the PCIe protocol. The data was
captured with Xilinx ChipScope and saved as TSV (tab-separated value) text file.
I wanted to implement a parser in Python, my favorite language for this kind of
tasks. I have stumbled to a problem when I …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Recently I had to implement a parser for the PCIe protocol. The data was
captured with Xilinx ChipScope and saved as a TSV (tab-separated values) text file.
I wanted to implement the parser in Python, my favorite language for this kind of
task. I stumbled upon a problem when I needed an elegant way to represent
a vector of bits of arbitrary length. I found several libraries, but none
of them satisfied my needs, so I put together a small class which mimics
SystemVerilog vectors.&lt;/p&gt;
&lt;script src="https://gist.github.com/j-marjanovic/348499e6cae3622554a4.js"&gt;&lt;/script&gt;

&lt;p&gt;Let's have a look at the alternatives which were available but did not
completely suit my needs. I wanted a vector slicing syntax which is similar to
the one in SystemVerilog and which allows typos to be caught quickly.&lt;/p&gt;
&lt;h2&gt;bitstring&lt;/h2&gt;
&lt;p&gt;From their site: &lt;a href="https://pypi.python.org/pypi/bitstring/3.1.3"&gt;bitstring&lt;/a&gt; is a
pure Python module designed to help make the creation and analysis of binary
data as simple and natural as possible.&lt;/p&gt;
&lt;p&gt;A quick test reveals two things which I did not like: taking a slice wider
than the vector length pads the resulting vector with zeros, and the slice indexes
are inverted compared to the more common [higher_limit:lower_limit] notation in HDLs.
The output of the slicing is a closed interval, which is the behavior I would
expect.&lt;/p&gt;
&lt;h2&gt;BitArray&lt;/h2&gt;
&lt;p&gt;The first thing which comes to mind is that there is no easy way to create a
bitarray and initialize it from an int in a single step (using the constructor). The
only way to initialize a BitArray is to use a binary-formatted string. If your data is
stored as an int, this requires a call to the bin() function and dropping the first
two characters. At this point one can already start thinking of implementing
one's own class. The slicing has the same behavior as in bitstring, which I did not
like for my application.&lt;/p&gt;
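&lt;p&gt;The int-to-string detour mentioned above can be sketched with the standard library alone. The helper name &lt;code&gt;int_to_bits&lt;/code&gt; is hypothetical; it only illustrates the conversion step and does not use any bit-array library.&lt;/p&gt;

```python
def int_to_bits(value, width):
    """Return value as a zero-padded binary string of exactly width characters."""
    if value not in range(2 ** width):
        raise ValueError("value does not fit in {} bits".format(width))
    return format(value, "0{}b".format(width))


if __name__ == "__main__":
    # The workaround from the text: bin() plus dropping the "0b" prefix.
    print(bin(0xAB)[2:])         # 10101011
    # bin() drops leading zeros, so the vector width is lost for small values.
    print(bin(5)[2:])            # 101
    # Formatting with an explicit width keeps the vector length.
    print(int_to_bits(5, 8))     # 00000101
```

&lt;p&gt;Keeping the width explicit matters for a parser, because a bus captured from hardware has a fixed number of bits regardless of the value it carries.&lt;/p&gt;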
&lt;h2&gt;A simple solution on Stack Overflow&lt;/h2&gt;
&lt;p&gt;There is a &lt;a href="http://stackoverflow.com/a/150411/4059686"&gt;similar solution&lt;/a&gt; already
posted on Stack Overflow, however it lacks an equality operator.&lt;/p&gt;
&lt;h2&gt;MyHDL&lt;/h2&gt;
&lt;p&gt;Since MyHDL is a way to write HDL in Python, it seems an obvious choice for
a simple Python parser. MyHDL has an &lt;em&gt;intbv&lt;/em&gt; data type which is very
similar to vectors in Verilog and VHDL. However, there are some minor things
which discouraged me from using it in my parser.&lt;/p&gt;
&lt;p&gt;Let's have a look at a modified version of the VerilogBits unit test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;unittest&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;myhdl&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Testmyhdlintbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unittest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TestCase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_equality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x0AB&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertNotEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xCD&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_slicing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ab&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ab&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xA&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ab&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xB&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_unpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;abcd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xABCD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;abcd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;abcd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;abcd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;abcd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xA&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xB&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xC&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xD&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_slice_up_vect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertRaises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ne"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_invalid_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assertRaises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ne"&gt;IndexError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myhdl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intbv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xAB&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;unittest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;This results in 3 failing tests: test_invalid_slice, test_slicing, and
test_unpack. test_invalid_slice fails because taking a slice wider than the
vector fills the missing bits with zeros instead of raising an error. This is similar to
a SystemVerilog vector of bits, which is a 2-level data type (it can be only 0 or
1). I prefer more rigorous behavior when slicing vectors, since errors like that
can be quite hard to catch. VerilogBits throws an exception when an invalid
slice is requested.&lt;/p&gt;
&lt;p&gt;While the zero-padding problem with MyHDL is something I could live with, the other
two failing tests are much more discouraging for someone who sometimes dreams
in (System)Verilog. The &lt;a href="http://docs.myhdl.org/en/stable/manual/hwtypes.html#bit-slicing"&gt;bit slicing in
MyHDL&lt;/a&gt; is half-open,
as is expected in Python, and not a closed interval as expected from HDLs
(e.g. to get the least significant byte one should write [8:0] instead of [7:0]). Again, this is
just a convention and the software world has been using the half-open interval for
decades (&lt;a href="https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF"&gt;E.W.Dijkstra: Why numbering should start at
zero&lt;/a&gt;). But if your
parser in Python is there to find bugs in your SystemVerilog code, it makes much
more sense to use the same notation in both languages.&lt;/p&gt;
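&lt;p&gt;To make the difference between the two conventions concrete, here is a minimal
sketch of a closed-interval slice on a plain Python integer. The helper names
verilog_slice and to_half_open are made up for this illustration, they are not
part of MyHDL or of VerilogBits, and the vector width has to be passed in explicitly.&lt;/p&gt;

```python
def verilog_slice(value, msb, lsb, width):
    """Return bits [msb:lsb], both ends included, of a width-bit vector."""
    # Reject anything outside the vector instead of zero-padding it.
    if msb not in range(width) or lsb not in range(msb + 1):
        raise IndexError(f"[{msb}:{lsb}] is not a valid slice of a {width}-bit vector")
    n_bits = msb - lsb + 1
    # Drop the bits below lsb, then keep only n_bits of what is left.
    return (value // 2**lsb) % 2**n_bits


def to_half_open(msb, lsb):
    """Translate a closed [msb:lsb] pair into the half-open [msb+1:lsb] form."""
    return msb + 1, lsb
```

&lt;p&gt;With these helpers the upper nibble of 0xAB is verilog_slice(0xAB, 7, 4, 8),
the same [7:0] request becomes (8, 0) in the half-open form, and asking for
[9:0] of an 8-bit vector raises IndexError rather than returning padded data.&lt;/p&gt;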
&lt;h2&gt;SystemVerilog&lt;/h2&gt;
&lt;p&gt;SystemVerilog provides all the necessary tools to effectively manipulate bits
(duh), but Python, with its generators, list comprehensions and dictionaries
(well, SystemVerilog does have associative arrays), is a much more elegant language.
The ability to test commands on-the-fly in the interpreter is also much
welcomed.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Python"></category><category term="Verilog"></category><category term="FPGA"></category><category term="HDL"></category></entry><entry><title>Compilation of Linux kernel for Raspberry Pi</title><link href="www.j-marjanovic.io/compilation-of-linux-kernel-for-raspberry-pi.html" rel="alternate"></link><published>2015-03-01T19:00:00+01:00</published><updated>2015-03-01T19:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2015-03-01:www.j-marjanovic.io/compilation-of-linux-kernel-for-raspberry-pi.html</id><summary type="html">&lt;p&gt;Yesterday I got my Raspberry Pi 2, the evolution of the legendary Raspberry Pi. 
The evolution is the right word to describe what has changed compared to 
the previous version. The processor is now a quad-core, it runs faster,
it has got a newer instruction set (ARMv7) and the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Yesterday I got my Raspberry Pi 2, the evolution of the legendary Raspberry Pi. 
The evolution is the right word to describe what has changed compared to 
the previous version. The processor is now a quad-core, it runs faster,
it has got a newer instruction set (ARMv7) and the board now incorporates 1GB
of RAM and 4 USB connectors. The RCA jack is no longer present, the freed
space is occupied by a larger extension header instead. A welcome update
of already great hardware, in my opinion (for people complaining that it
can't play 4K video, run the desktop version of Windows, ..., well, consider
buying a proper computer, the RPi was meant to be cheap enough to tinker with
without fear of breaking something).&lt;/p&gt;
&lt;p&gt;I am the kind of person who thinks that compiling the Linux kernel would be
a great way to spend a Sunday. Since one of my projects with the RPi includes
a custom driver for communication with an FPGA, I needed the Linux headers and the &lt;em&gt;kbuild&lt;/em&gt;
system for building kernel modules (for those who are not familiar with the procedure,
imagine that you are building a program linked with libraries; in this case the
program is a kernel module and the kernel and its modules represent the libraries).&lt;/p&gt;
&lt;p&gt;On Ubuntu (and I think on other Debian-based systems, but I am not 100% sure) you
can get the kernel headers and source directly from the package repository, with something
like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;apt-get install linux-headers-&lt;span class="k"&gt;$(&lt;/span&gt;uname -r&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The above command will install the headers for the currently running
kernel. It copies the header files to /usr/src and creates a link in /lib/modules/,
which gives the user the tools needed to compile kernel modules.&lt;/p&gt;
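&lt;p&gt;Whether the headers for the running kernel are in place can be checked by
looking for that link. The following is only a sketch: it assumes the usual
/lib/modules/RELEASE/build layout, and the function names kbuild_dir and
headers_installed are made up for this example.&lt;/p&gt;

```python
import os
import platform


def kbuild_dir(release=None, modules_root="/lib/modules"):
    """Return the path where the kbuild tree for a kernel release is expected."""
    # platform.release() reports the same string as `uname -r`.
    if release is None:
        release = platform.release()
    return os.path.join(modules_root, release, "build")


def headers_installed(release=None, modules_root="/lib/modules"):
    """True when the build link exists and resolves to a directory."""
    # os.path.isdir follows symbolic links, so a dangling link counts as missing.
    return os.path.isdir(kbuild_dir(release, modules_root))
```

&lt;p&gt;Calling headers_installed() without arguments checks the kernel that is
currently running; passing a release string such as "3.18.7-v7+" checks a specific one.&lt;/p&gt;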
&lt;h2&gt;The hard way&lt;/h2&gt;
&lt;p&gt;Before you continue reading, I would just like to inform you that the procedure
described in this section did not yield the desired result. Although I managed to
get a kernel image, the &lt;em&gt;kbuild&lt;/em&gt; system did not work. In Thomas Alva Edison style,
I want to present a method which does not work, but at least we know it does not work.
The working (and easier) procedure is described in the next section.&lt;/p&gt;
&lt;p&gt;Initially I wanted to get the headers in the same way as I would on Ubuntu, by
using apt (Advanced Packaging Tool). Searching for available packages did not
show the package needed (Raspbian was running the 3.18.7-v7+ kernel).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;pi@raspberrypi ~ $&lt;/span&gt; sudo aptitude search linux-headers
&lt;span class="go"&gt;v   linux-headers                                   -                                                           &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.10-3-all                        - All header files for Linux 3.10 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.10-3-all-armhf                  - All header files for Linux 3.10 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.10-3-common                     - Common header files for Linux 3.10-3                      &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.10-3-rpi                        - Header files for Linux 3.10-3-rpi                         &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.12-1-all                        - All header files for Linux 3.12 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.12-1-all-armhf                  - All header files for Linux 3.12 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.12-1-common                     - Common header files for Linux 3.12-1                      &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.12-1-rpi                        - Header files for Linux 3.12-1-rpi                         &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.18.0-trunk-all                  - All header files for Linux 3.18 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.18.0-trunk-all-armhf            - All header files for Linux 3.18 (meta-package)            &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.18.0-trunk-common               - Common header files for Linux 3.18.0-trunk                &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.18.0-trunk-rpi                  - Header files for Linux 3.18.0-trunk-rpi                   &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.18.0-trunk-rpi2                 - Header files for Linux 3.18.0-trunk-rpi2                  &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.2.0-4-all                       - All header files for Linux 3.2 (meta-package)             &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.2.0-4-all-armhf                 - All header files for Linux 3.2 (meta-package)             &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.2.0-4-common                    - Common header files for Linux 3.2.0-4                     &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.2.0-4-rpi                       - Header files for Linux 3.2.0-4-rpi                        &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.6-trunk-all                     - All header files for Linux 3.6 (meta-package)             &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.6-trunk-all-armhf               - All header files for Linux 3.6 (meta-package)             &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.6-trunk-common                  - Common header files for Linux 3.6-trunk                   &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-3.6-trunk-rpi                     - Header files for Linux 3.6-trunk-rpi                      &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-rpi                               - Header files for Linux rpi configuration (meta-package)   &lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-rpi-rpfv                          - This metapackage will pull in the headers for the raspbian&lt;/span&gt;
&lt;span class="go"&gt;p   linux-headers-rpi2-rpfv                         - This metapackage will pull in the headers for the raspbian&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;I tried my luck with &lt;em&gt;linux-headers-rpi2-rpfv&lt;/em&gt; which installed headers for
kernel 3.18.0. That might sound close enough, but Linux refuses modules which
have not been compiled against exactly the same kernel.&lt;/p&gt;
&lt;p&gt;So I decided to recompile the kernel on my laptop running Ubuntu. The procedure
of compiling on one type of machine (in this case Intel x86_64 processor) for
another (in this case ARM processor) is called cross-compilation.&lt;/p&gt;
&lt;p&gt;The process of cross-compilation is described here: &lt;a href="http://elinux.org/Raspberry_Pi_Kernel_Compilation"&gt;Raspberry Pi Kernel Compilation&lt;/a&gt;&lt;/p&gt;
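&lt;p&gt;In practice, cross-compiling the kernel mostly comes down to passing two variables
to make: ARCH for the target architecture and CROSS_COMPILE for the toolchain prefix.
A small sketch of how such a command line could be assembled follows. The function name,
the default prefix arm-linux-gnueabihf- and the job count are assumptions made for this
example; the prefix depends on the toolchain you have installed, and the linked guide
remains the reference for the actual steps.&lt;/p&gt;

```python
def kernel_make_command(targets, cross_compile="arm-linux-gnueabihf-", jobs=4):
    """Build the argument list for a cross-compiling kernel make call."""
    return [
        "make",
        "ARCH=arm",                        # architecture of the target, not of the host
        f"CROSS_COMPILE={cross_compile}",  # prefix put in front of gcc, ld, ...
        f"-j{jobs}",                       # number of parallel jobs
        *targets,
    ]
```

&lt;p&gt;The resulting list, e.g. kernel_make_command(["zImage", "modules"]), can be handed
to subprocess.run with cwd set to the kernel source directory.&lt;/p&gt;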
&lt;p&gt;Everything went fine, I got the kernel image and /lib/modules directory. The only
thing that the guide does not mention is that &lt;em&gt;make modules_install&lt;/em&gt; creates 
symbolic links to the original directory. I copied the entire kernel directory
to /usr/src and recreated symbolic links, so everything should be fine.&lt;/p&gt;
&lt;p&gt;At this point, if everything went OK, we would have a working kernel and its
&lt;em&gt;kbuild&lt;/em&gt; system, which gives us the tools to compile kernel modules out of the
tree (in a separate directory). The RPi boots up, which is a good sign, meaning
that there is nothing wrong with the kernel image.&lt;/p&gt;
&lt;p&gt;However, when I try to compile the kernel module, I get an error saying that
there is a syntax error in one of the kernel build scripts
(the same problem as described &lt;a href="http://stackoverflow.com/questions/17282461/scripts-recordmcount-syntax-error-when-i-try-to-build-a-linux-kernel-module-o"&gt;here&lt;/a&gt;). The answer by Joe C states:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Anyway, if you cross compile you don't get a /usr/src/linux-header-x.x.x/scripts dir that's usable on your target system.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since I had previously downloaded another set of kernel headers from the Debian repository,
I tried copying the scripts directory from there:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pi@raspberrypi /usr/src/linux-headers-3.18.0-trunk-rpi2 $ sudo cp -rv scripts/ ../linux-sources-3.18.8-v7+/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This fixed the compilation problem, only to cause a complete disaster when I
tried inserting the module. The following kernel message (&lt;em&gt;dmesg&lt;/em&gt; command)
shows nothing promising:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.164155&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Unable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;paging&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;virtual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fe220b30&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.176292&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pgd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b5dd8000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.183812&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fe220b30&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pgd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.192196&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Internal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Oops&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;#1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PREEMPT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SMP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ARM&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.202324&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Modules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;linked&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pi64_dev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_bcm2835&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_pcm512x_i2c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_wm8804&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_pcm512x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_tas5713&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;regmap_spi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;regmap_i2c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_bcm2708_i2s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;regmap_mmio&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_soc_core&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_compress&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_pcm_dmaengine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_pcm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_seq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_seq_device&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd_timer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span 
class="n"&gt;r8712u&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;snd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;spi_bcm2708&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i2c_bcm2708&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.239611&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;CPU&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;PID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3468&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Comm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;insmod&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Tainted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;G&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mf"&gt;3.18.8&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;v7&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;#1&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.252306&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b87aa840&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;ti&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760a000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nl"&gt;ti&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760a000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.262985&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;load_module&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mh"&gt;0x1a64&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mh"&gt;0x1f0c&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.272699&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;load_module&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mh"&gt;0x1a50&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mh"&gt;0x1f0c&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.282373&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;800940d4&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;800940c0&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;psr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90000013&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.282373&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760be88&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f110770&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bf44&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.304401&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;r10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f110600&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80536&lt;/span&gt;&lt;span class="n"&gt;d44&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.314883&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b86dc8c4&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fe220b1c&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f11060c&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bf48&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.326685&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760be70&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b7555280&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;r0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;807&lt;/span&gt;&lt;span class="n"&gt;f2a3c&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.338495&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NzcV&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;IRQs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;on&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;FIQs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;on&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SVC_32&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ISA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ARM&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Segment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.350916&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Control&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;c5387d&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="n"&gt;dd806a&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;DAC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000015&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.361932&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;insmod&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3468&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;limit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xb760a238&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.373206&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xb760be88&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xb760c000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.382808&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;be80&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f11060c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00007&lt;/span&gt;&lt;span class="n"&gt;fff&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8009168&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ffffffff&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bee4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bd949000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.396352&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bea0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f11060c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bedc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;807e4&lt;/span&gt;&lt;span class="n"&gt;d78&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f110648&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760a000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;f110770&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.409938&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bec0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00001&lt;/span&gt;&lt;span class="n"&gt;f4b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;806&lt;/span&gt;&lt;span class="n"&gt;c9bdc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ba7b961c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760a000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b9330000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;808&lt;/span&gt;&lt;span class="n"&gt;ab8fc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;fd3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.423538&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bee0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.437137&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bf00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000080&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00001&lt;/span&gt;&lt;span class="n"&gt;f4b&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.450707&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bf20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;fd3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;f91948&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000080&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="n"&gt;f0a4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760a000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bfa4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bf48&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.464292&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bf40&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80094664&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8009267&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bd949000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00001&lt;/span&gt;&lt;span class="n"&gt;f4b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bd949f3c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bd949de3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bd94acc4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000850&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.477932&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bf60&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000940&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0000001&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000020&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000017&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000014&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.491605&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bf80&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000010&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;eb7861c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7768&lt;/span&gt;&lt;span class="n"&gt;b038&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b760bfa8&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.505297&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bfa0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="n"&gt;ee20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80094588&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;eb7861c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;fd3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00001&lt;/span&gt;&lt;span class="n"&gt;f4b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;f91948&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;fd3000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.519052&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bfc0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;eb7861c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7768&lt;/span&gt;&lt;span class="n"&gt;b038&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000080&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7768&lt;/span&gt;&lt;span class="n"&gt;af80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00001&lt;/span&gt;&lt;span class="n"&gt;f4b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;f91948&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.532810&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;bfe0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;eb785c4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;f88fb4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;ef3ab4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60000010&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="n"&gt;fd3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d466662e&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d5f7abd0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.546635&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;800940d4&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load_module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;80094664&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SyS_init_module&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mh"&gt;0xe8&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mh"&gt;0xfc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.560087&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;80094664&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SyS_init_module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;&amp;lt;8000ee20&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret_fast_syscall&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mh"&gt;0x0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mh"&gt;0x48&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.573935&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e51bc088&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e15c0006&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e2466008&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;a000009&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e5963014&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 4246.604593&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;---[ end trace 8e4bd0982f9f1fd0 ]---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;It basically says that the kernel tried to access an invalid part of
memory while loading the module for the &lt;em&gt;insmod&lt;/em&gt; process.&lt;/p&gt;
&lt;p&gt;At this point I decided to stop experimenting with this approach. I was getting
frustrated by the constant fight with all kinds of errors. When I started
writing this blog post I wanted to describe how to cross-compile the Linux
kernel image and use the &lt;em&gt;kbuild&lt;/em&gt; build system. Unfortunately, although
speeding up the process by cross-compiling on an x86 machine sounds appealing, the
various complications made it more time-consuming than compiling directly
on the Raspberry Pi. I know myself well enough to know that I won't quit this easily.
I will probably return to this problem on some other occasion and try again;
rest assured that I will write a blog post if I succeed.&lt;/p&gt;
&lt;h2&gt;The easy way&lt;/h2&gt;
&lt;p&gt;In the end I dropped the cross-compilation idea and resorted to compiling the kernel
on the RPi itself. This can also serve as a performance test of the new RPi.
I overclocked it to 1000 MHz (in raspi-config I selected the RPi2 setting).&lt;/p&gt;
&lt;p&gt;Compiling the kernel took a little under 2 hours, quite a decent result
compared to the 17 minutes it takes on my laptop with a not-so-new i3 and 6 GB of RAM.&lt;/p&gt;
&lt;p&gt;The following figure shows the temperature during the compilation. The temperature
was captured every minute by a cron job which read it from
&lt;strong&gt;/sys/class/thermal/thermal_zone0/temp&lt;/strong&gt;.&lt;/p&gt;
&lt;p style="width:90%; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="RPi temperature" src="www.j-marjanovic.io/images/pi_temp_during_compile.png"&gt;&lt;/p&gt;
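&lt;p&gt;The once-a-minute logging can be sketched in a few lines of Python. This is not the job I actually ran, just a minimal illustration: the function names and the CSV layout are made up for this example, and the only thing taken from the setup above is the sysfs path, which reports the temperature in millidegrees Celsius.&lt;/p&gt;

```python
import sys
import time
from pathlib import Path

# Path quoted in the post; the kernel reports the value in millidegrees Celsius.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def read_temperature_c(path=THERMAL_PATH):
    """Return the temperature in degrees Celsius read from a sysfs thermal zone."""
    return int(Path(path).read_text().strip()) / 1000.0


def append_sample(log_path, temp_path=THERMAL_PATH, now=None):
    """Append one 'unix_time,temperature' line to log_path and return that line."""
    timestamp = int(time.time() if now is None else now)
    line = f"{timestamp},{read_temperature_c(temp_path):.3f}"
    with open(log_path, "a") as log:
        log.write(line + "\n")
    return line


if __name__ == "__main__":
    # Hypothetical usage: python3 log_temp.py /home/pi/pi_temp.csv
    if len(sys.argv) == 2:
        print(append_sample(sys.argv[1]))
```

&lt;p&gt;A crontab entry along the lines of &lt;em&gt;* * * * * python3 /home/pi/log_temp.py /home/pi/pi_temp.csv&lt;/em&gt; (both paths are hypothetical) would then produce one sample per minute, ready for plotting.&lt;/p&gt;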
&lt;p&gt;Finally, using the freshly compiled kernel on the RPi, I managed to compile my
module and load it into the kernel (debug messages from the kernel):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="go"&gt;[   38.324756] ============================================&lt;/span&gt;
&lt;span class="go"&gt;[   38.324786]        Pi64 driver by Jan Marjanovic        &lt;/span&gt;
&lt;span class="go"&gt;[   38.324796] &lt;/span&gt;
&lt;span class="go"&gt;[   38.324808]   built: Mar  1 2015 11:38:10&lt;/span&gt;
&lt;span class="go"&gt;[   38.324818] ============================================&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Linux"></category><category term="Raspberry Pi"></category></entry><entry><title>Lattice iCEcube2 on Ubuntu 14.04</title><link href="www.j-marjanovic.io/lattice-icecube2-on-ubuntu-1404.html" rel="alternate"></link><published>2015-03-01T19:00:00+01:00</published><updated>2015-03-01T19:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2015-03-01:www.j-marjanovic.io/lattice-icecube2-on-ubuntu-1404.html</id><summary type="html">&lt;p&gt;/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/synplify_pro: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/synplify_pro: 200: [: !=: argument expected
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/c_hdl: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/c_hdl: 200 …&lt;/p&gt;</summary><content type="html">&lt;p&gt;/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/synplify_pro: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/synplify_pro: 200: [: !=: argument expected
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/c_hdl: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/c_hdl: 200: [: !=: argument expected
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/syn_nfilter: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/syn_nfilter: 200: [: !=: argument expected
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/m_generic: 186: [: unexpected operator
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/m_generic: 200: [: !=: argument expected
/home/jan/opt/lscc/iCEcube2.2014.12/synpbase/bin/m_generic: 186: [: unexpected operator&lt;/p&gt;
&lt;p&gt;sed -i 's/\/bin\/sh/\/bin\/bash/g' *&lt;/p&gt;
&lt;p&gt;jan@jan-ThinkPad-T510 ~/opt/lscc/iCEcube2.2014.12/synpbase/bin
 % for file in $(ls); do sed -i 's/\/bin\/sh/\/bin\/bash/g' $file; done    19:33:27 on 2015-03-07 
sed: couldn't edit config: not a regular file
jan@jan-ThinkPad-T510 ~/opt/lscc/iCEcube2.2014.12/synpbase/bin
 % &lt;/p&gt;
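&lt;p&gt;The sed error above appears because the glob also hands it the &lt;em&gt;config&lt;/em&gt; directory. As a rough sketch of the same idea, assuming the scripts fail only because &lt;em&gt;/bin/sh&lt;/em&gt; on Ubuntu is not bash, the Python below touches regular files only. Note that it is deliberately narrower than the sed command: it rewrites just a first line that is exactly &lt;em&gt;#!/bin/sh&lt;/em&gt;, not every occurrence of the string. The function name is made up for this example.&lt;/p&gt;

```python
from pathlib import Path

OLD_SHEBANG = b"#!/bin/sh"
NEW_SHEBANG = b"#!/bin/bash"


def switch_shebang_to_bash(directory):
    """Rewrite a leading '#!/bin/sh' to '#!/bin/bash' in regular files only.

    Returns the sorted names of the files that were changed.
    """
    changed = []
    for entry in Path(directory).iterdir():
        # Skip directories (such as 'config') and other non-regular entries.
        if not entry.is_file():
            continue
        data = entry.read_bytes()
        first_line, newline, rest = data.partition(b"\n")
        # Only an exact '#!/bin/sh' first line is rewritten; binaries and
        # scripts with other interpreters are left alone.
        if first_line.rstrip() != OLD_SHEBANG:
            continue
        entry.write_bytes(NEW_SHEBANG + newline + rest)
        changed.append(entry.name)
    return sorted(changed)
```

&lt;p&gt;Calling &lt;em&gt;switch_shebang_to_bash(".")&lt;/em&gt; from inside the &lt;em&gt;synpbase/bin&lt;/em&gt; directory would return the list of scripts it modified, which makes it easy to check what was changed.&lt;/p&gt;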
&lt;p&gt;for file in $(ls -la | grep -E '^[^d]' | awk -F "/" '{print $NF}' ); do echo $file; done &lt;/p&gt;
&lt;p&gt;for file in $(ls -la | tail -n +2 | grep -E '^[^d]' | awk '{print $NF}' ); do sed -i 's/\/bin\/sh/\/bin\/bash/g' $file; done &lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Lattice"></category><category term="iCE40"></category><category term="Linux"></category><category term="Ubuntu"></category></entry><entry><title>Lattice iCE40 configuration using Raspberry Pi</title><link href="www.j-marjanovic.io/lattice-ice40-configuration-using-raspberry-pi.html" rel="alternate"></link><published>2015-01-18T22:00:00+01:00</published><updated>2015-01-18T22:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2015-01-18:www.j-marjanovic.io/lattice-ice40-configuration-using-raspberry-pi.html</id><summary type="html">&lt;p&gt;As I mentioned in previous post, I started playing around with Lattice iCE40 FPGA. 
In the last post I did a quick overview of the 
iCE40 tools. The iCEcube2 cannot compete with Xilinx ISE and Altera Quartus II, 
not to mention the Vivado, but since this is a low-cost FPGA …&lt;/p&gt;</summary><content type="html">&lt;p&gt;As I mentioned in previous post, I started playing around with Lattice iCE40 FPGA. 
In the last post I gave a quick overview of the
iCE40 tools. iCEcube2 cannot compete with Xilinx ISE or Altera Quartus II,
not to mention Vivado, but since this is a low-cost FPGA the current tool
offers all you need for this kind of simple project. (I would definitely recommend
that beginners stay away from Lattice, as it is not as user-friendly as vendor X
or vendor A; you need some experience to master the workflow.)&lt;/p&gt;
&lt;p&gt;Last time I found out that the iCEcube2 Programmer runs only on Windows; on GNU/Linux
you need to find other solutions. How the Programmer works is another interesting
thing. One would expect that it uses the JTAG port on the FPGA to configure it, but that is
not the case. The Programmer communicates with an Atmel microcontroller, which programs
the serial NOR Flash memory. It then resets the FPGA, which boots in SPI Master mode and
reads its configuration from Flash.&lt;/p&gt;
&lt;p&gt;A quick look at &lt;a href="http://www.latticesemi.com/~/media/Documents/ApplicationNotes/IK/iCE40ProgrammingandConfiguration.pdf?document_id=46502"&gt;TN1248: iCE40 Programming and Configuration&lt;/a&gt; 
shows that it can also be programmed by writing to its SPI port from
another device (e.g. a microprocessor). This is called SPI Slave
programming mode and it is enabled by holding the CS_n line low when the
FPGA is reset.&lt;/p&gt;
&lt;p style="width:600px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="iCE40 development board and Raspberry Pi" src="www.j-marjanovic.io/images/ice40_rpi_conf.jpg"&gt;&lt;/p&gt;
&lt;p&gt;So I tried programming it using a Raspberry Pi. I connected:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;grounds together&lt;/li&gt;
&lt;li&gt;SI on iCE40 to MOSI on RPi&lt;/li&gt;
&lt;li&gt;SO on iCE40 to MISO on RPi (this one is actually not needed)&lt;/li&gt;
&lt;li&gt;SCK on iCE40 to CLK on RPi&lt;/li&gt;
&lt;li&gt;pin GPIO25 on RPi to SS on iCE40 (this one is needed to enter the slave mode)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I wrote the following script and used one of the .bin files from one of the projects
in iCEcube2. &lt;/p&gt;
&lt;script src="https://gist.github.com/j-marjanovic/cb271e861d279a31775d.js"&gt;&lt;/script&gt;

&lt;p&gt;This is the output (well, the real output is the configured board, which blinks its LEDs while
the DONE LED is lit):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;pi@raspberrypi ~/Jan/ice40/test1 $&lt;/span&gt; sudo bash conf_FPGA.sh proto1_top_bitmap.bin
&lt;span class="go"&gt;GPIO 25 not exported, trying to export...&lt;/span&gt;

&lt;span class="go"&gt;spidev does not exist&lt;/span&gt;
&lt;span class="go"&gt;SPI driver not loaded, try to load it...&lt;/span&gt;
&lt;span class="go"&gt;OK: SPI driver loaded&lt;/span&gt;

&lt;span class="go"&gt;Changing direction to out&lt;/span&gt;
&lt;span class="go"&gt;out&lt;/span&gt;
&lt;span class="go"&gt;Setting output to low&lt;/span&gt;
&lt;span class="go"&gt;1&lt;/span&gt;

&lt;span class="go"&gt;Please power cycle the iCE40 FPGA board&lt;/span&gt;
&lt;span class="go"&gt;Press any key...&lt;/span&gt;

&lt;span class="go"&gt;Continuing with configuration procedure&lt;/span&gt;
&lt;span class="go"&gt;63+1 records in&lt;/span&gt;
&lt;span class="go"&gt;63+1 records out&lt;/span&gt;
&lt;span class="go"&gt;32300 bytes (32 kB) copied, 0.606931 s, 53.2 kB/s&lt;/span&gt;
&lt;span class="go"&gt;Setting output to high&lt;/span&gt;
&lt;span class="go"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
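&lt;p&gt;The sequence visible in the log (drive SS low, power cycle the board, stream the image, drive SS high again) can also be sketched in Python. This is not the bash script from the gist, only an illustration of the same steps. It assumes the GPIO has already been exported through sysfs and set to output, that the SPI driver exposes a device such as &lt;em&gt;/dev/spidev0.0&lt;/em&gt;, and that small writes to that device are acceptable; the function name and the 512-byte chunk size are choices made for this example. The exact timing requirements are described in TN1248 and are not checked here.&lt;/p&gt;

```python
from pathlib import Path

CHUNK_SIZE = 512  # assumption: kept well below spidev's default 4096-byte buffer


def configure_fpga(bitstream_path, spi_dev, ss_value, wait_for_reset=input):
    """Stream a bitstream to an iCE40 in SPI Slave mode; return the bytes sent.

    spi_dev  : path of the SPI character device (assumed: /dev/spidev0.0)
    ss_value : sysfs 'value' file of the GPIO wired to SS
               (assumed: /sys/class/gpio/gpio25/value, already set to 'out')
    """
    bitstream = Path(bitstream_path).read_bytes()

    # 1. Hold SS low so the FPGA enters SPI Slave mode when it is reset.
    Path(ss_value).write_text("0")

    # 2. The board is power cycled by hand, as in the log above.
    wait_for_reset("Please power cycle the iCE40 FPGA board, then press Enter...")

    # 3. Send the image in small chunks; each write() is one SPI transfer.
    sent = 0
    with open(spi_dev, "wb", buffering=0) as spi:
        for offset in range(0, len(bitstream), CHUNK_SIZE):
            sent += spi.write(bitstream[offset:offset + CHUNK_SIZE])

    # 4. Release SS again.
    Path(ss_value).write_text("1")
    return sent
```

&lt;p&gt;On the RPi this would be called as &lt;em&gt;configure_fpga("proto1_top_bitmap.bin", "/dev/spidev0.0", "/sys/class/gpio/gpio25/value")&lt;/em&gt; with root privileges. Passing the paths as arguments keeps the function testable on a machine without the hardware.&lt;/p&gt;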


&lt;p&gt;I find this type of configuration very useful when the FPGA is not the main
chip in the system (when there is, as in the example above, an RPi). The
configuration file can be stored on the Raspberry Pi SD card, and the FPGA
gets programmed at each start-up. The image can be updated very easily and there
is no way a user can brick the FPGA (which can easily happen if the FPGA writes
its own configuration image and boots from its Flash; in this case two images
(factory and user) are recommended).&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Lattice"></category><category term="iCE40"></category><category term="FPGA"></category><category term="Raspberry Pi"></category><category term="bash"></category></entry><entry><title>My first encounter with Lattice Semiconductor</title><link href="www.j-marjanovic.io/my-first-encounter-with-lattice-semiconductor.html" rel="alternate"></link><published>2014-12-24T20:00:00+01:00</published><updated>2014-12-24T20:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-12-24:www.j-marjanovic.io/my-first-encounter-with-lattice-semiconductor.html</id><summary type="html">&lt;p&gt;The FPGA market is one of those classical markets where there are two players 
who have nearly 100% market share, e.g. PC processors (Intel and AMD), graphics
cards (NVIDIA and ATI), ... The two major players on the FPGA market are Xilinx
and Altera. Both of this two companies follows …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The FPGA market is one of those classical markets where there are two players 
who have nearly 100% market share, e.g. PC processors (Intel and AMD), graphics
cards (NVIDIA and ATI), ... The two major players on the FPGA market are Xilinx
and Altera. Each of these two companies follows its rival very closely.
When one company announces the next big thing, support for a new technology or
an improvement to its current products, the other shortly follows with a
similar announcement.&lt;/p&gt;
&lt;p&gt;This tempo of development has brought the main programs of these two rivals
to a very high level. Both have a tool (Altera Qsys and Xilinx Vivado) which lets
you build a system from standardized building blocks with standardized
interfaces (Altera Avalon and ARM AXI). They both offer a large palette of IP
ready to be used with their program. Debugging of designs is simplified
by good integration with simulators (Altera uses Mentor Graphics
ModelSim by default, Xilinx has its own XSim but can also use ModelSim), and the integrated
logic analysers (Altera SignalTap and Xilinx ILA) are a great help when
the simulation works but the real-world FPGA does not.&lt;/p&gt;
&lt;p&gt;I have an idea for my next hobby project which needs an FPGA (actually, a CPLD would
do just fine) for translating one communication protocol to another. The board
will be an expansion board for the Raspberry Pi, so the cost should be really low
(I have in mind something below $5). Altera and Xilinx do not offer anything in
this price range, so I remembered the Lattice ads. The iCE40 series are small FPGAs
with a few thousand logic cells and low prices. Another advantage for my project is
the possibility to configure them over SPI, something that most CPLDs do not support.&lt;/p&gt;
&lt;p&gt;So I ordered the &lt;a href="http://www.latticesemi.com/iceblink40-hx1k"&gt;iCEblink40HX1K Evaluation Kit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First I tried installing the iCEcube2 software on CentOS 7, which is my usual
working environment. The installation went smoothly. The problems started
when the licence manager was unable to find the network adapter's MAC address:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Error&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;License&lt;/span&gt; &lt;span class="s s-Atom"&gt;checkout&lt;/span&gt; &lt;span class="s s-Atom"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;


&lt;span class="nv"&gt;Invalid&lt;/span&gt; &lt;span class="s s-Atom"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
 &lt;span class="nv"&gt;The&lt;/span&gt; &lt;span class="s s-Atom"&gt;hostid&lt;/span&gt; &lt;span class="s s-Atom"&gt;of&lt;/span&gt; &lt;span class="s s-Atom"&gt;this&lt;/span&gt; &lt;span class="s s-Atom"&gt;system&lt;/span&gt; &lt;span class="s s-Atom"&gt;does&lt;/span&gt; &lt;span class="o"&gt;not&lt;/span&gt; &lt;span class="s s-Atom"&gt;match&lt;/span&gt; &lt;span class="s s-Atom"&gt;the&lt;/span&gt; &lt;span class="s s-Atom"&gt;hostid&lt;/span&gt;
 &lt;span class="s s-Atom"&gt;specified&lt;/span&gt; &lt;span class="s s-Atom"&gt;in&lt;/span&gt; &lt;span class="s s-Atom"&gt;the&lt;/span&gt; &lt;span class="s s-Atom"&gt;license&lt;/span&gt; &lt;span class="s s-Atom"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nv"&gt;Feature&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt;       &lt;span class="nv"&gt;LSC_ICECUBE2_A&lt;/span&gt;
&lt;span class="nv"&gt;Hostid&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt;        &lt;span class="s s-Atom"&gt;xxxxxxxxxxxx&lt;/span&gt;
&lt;span class="nv"&gt;License&lt;/span&gt; &lt;span class="nn"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;jan&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;lscc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s s-Atom"&gt;license&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nn"&gt;dat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;FLEXnet&lt;/span&gt; &lt;span class="nv"&gt;Licensing&lt;/span&gt; &lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;57.&lt;/span&gt;  &lt;span class="nv"&gt;System&lt;/span&gt; &lt;span class="nv"&gt;Error&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;(null)&amp;quot;&lt;/span&gt;
&lt;span class="nv"&gt;For&lt;/span&gt; &lt;span class="s s-Atom"&gt;further&lt;/span&gt; &lt;span class="s s-Atom"&gt;information&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s s-Atom"&gt;refer&lt;/span&gt; &lt;span class="s s-Atom"&gt;to&lt;/span&gt; &lt;span class="s s-Atom"&gt;the&lt;/span&gt; &lt;span class="nv"&gt;FLEXnet&lt;/span&gt; &lt;span class="nv"&gt;Licensing&lt;/span&gt; &lt;span class="nv"&gt;End&lt;/span&gt; &lt;span class="nv"&gt;User&lt;/span&gt; &lt;span class="nv"&gt;Guide&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="s s-Atom"&gt;available&lt;/span&gt; &lt;span class="s s-Atom"&gt;at&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;www.macrovision.com&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The problem is that the FLEXnet software searches for the MAC address of an eth*
interface, while CentOS uses a completely different naming scheme. Being familiar with the
problem, I used a trick to create a new interface named eth0 (you should replace
the x's with the desired MAC address):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;modprobe dummy&lt;/span&gt;
&lt;span class="err"&gt;ip link set name eth0 dev dummy0&lt;/span&gt;
&lt;span class="err"&gt;ifconfig eth0 hw ether xx:xx:xx:xx:xx:xx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;After setting everything up, the iCEcube2 window finally appeared in front of me.&lt;/p&gt;
&lt;p&gt;&lt;img alt="iCEcube2" src="www.j-marjanovic.io/images/icestudio.png"&gt;&lt;/p&gt;
&lt;p&gt;It is a little Spartan, but it has all the basic functions an FPGA developer needs. The
text editor lacks syntax highlighting and auto-complete, but it is possible to use a third-party
text editor instead. The text in the output window is light violet on a white background
and should be changed to something more visible.&lt;/p&gt;
&lt;p&gt;The iCE family of FPGAs lacks a JTAG port. Instead, the user programs the SPI Flash memory
through a USB-to-SPI converter and the FPGA then boots from the SPI memory. That means
that debugging is done either with the 4 LEDs on the side or with an oscilloscope, not a
very user-friendly approach.&lt;/p&gt;
&lt;p&gt;Under Linux, there is an additional problem:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Note&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Integrated&lt;/span&gt; &lt;span class="n"&gt;Aldec&lt;/span&gt; &lt;span class="n"&gt;Active&lt;/span&gt; &lt;span class="n"&gt;HDL&lt;/span&gt; &lt;span class="n"&gt;simulation&lt;/span&gt; &lt;span class="n"&gt;software&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;iCEcube2&lt;/span&gt; &lt;span class="n"&gt;programming&lt;/span&gt; 
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;only&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;windows&lt;/span&gt; &lt;span class="n"&gt;platform&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Well, this is it for now; I need to switch to Windows. Merry Christmas to everybody!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All trademarks and registered trademarks are the property of their respective owners.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Lattice"></category><category term="iCE40"></category><category term="FPGA"></category></entry><entry><title>Theremin First Demo</title><link href="www.j-marjanovic.io/theremin-first-demo.html" rel="alternate"></link><published>2014-11-18T23:00:00+01:00</published><updated>2014-11-18T23:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-11-18:www.j-marjanovic.io/theremin-first-demo.html</id><content type="html">&lt;p&gt;During the weekend I was able to take some time to do first test of the theremin.
Here are two recordings of my friend Luka playing.&lt;/p&gt;
&lt;iframe width="100%" height="450" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/177559922&amp;amp;auto_play=false&amp;amp;hide_related=false&amp;amp;show_comments=true&amp;amp;show_user=true&amp;amp;show_reposts=false&amp;amp;visual=true"&gt;&lt;/iframe&gt;

&lt;iframe width="100%" height="450" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/177561150&amp;amp;auto_play=false&amp;amp;hide_related=false&amp;amp;show_comments=true&amp;amp;show_user=true&amp;amp;show_reposts=false&amp;amp;visual=true"&gt;&lt;/iframe&gt;

&lt;p&gt;More articles explaining how the digital theremin works are on the way;
 remember to check out my blog.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Theremin"></category></entry><entry><title>I just got a Nintendo 64</title><link href="www.j-marjanovic.io/i-just-got-a-nintendo-64.html" rel="alternate"></link><published>2014-11-16T18:00:00+01:00</published><updated>2014-11-16T18:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-11-16:www.j-marjanovic.io/i-just-got-a-nintendo-64.html</id><content type="html">&lt;p&gt;I just got a Nintendo 64, my friend Rok was kind enough to lend it.&lt;/p&gt;
&lt;p style="width:700px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Nintendo 64" src="www.j-marjanovic.io/images/N64.png"&gt;&lt;/p&gt;
&lt;p&gt;I have an interesting project in mind, I will keep you updated.
Rok will for sure be the first one to get the alpha version.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Nintendo 64"></category></entry><entry><title>Theremin Antenna Measurements</title><link href="www.j-marjanovic.io/theremin-antenna-measurements.html" rel="alternate"></link><published>2014-11-09T09:00:00+01:00</published><updated>2014-11-09T09:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-11-09:www.j-marjanovic.io/theremin-antenna-measurements.html</id><summary type="html">&lt;p&gt;Last week I briefly explained how theremin works. I also presented my idea to develop a 
digital version, using an FPGA as a detector of the distance between the hand and the antenna.&lt;/p&gt;
&lt;p&gt;You have probably already heard a joke about theory and practice. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Theory is when you know everything but nothing works. Practice …&lt;/p&gt;&lt;/blockquote&gt;</summary><content type="html">&lt;p&gt;Last week I briefly explained how theremin works. I also presented my idea to develop a 
digital version, using an FPGA as a detector of the distance between the hand and the antenna.&lt;/p&gt;
&lt;p&gt;You have probably already heard a joke about theory and practice. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Theory is when you know everything but nothing works. Practice is when everything works
 but no one knows why. Here we combine the two, nothing works and no one knows why.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Jokes aside, today we will try to measure the time constant of the antenna. In this 
implementation of the theremin the measured time constant controls the pitch of the tone, 
so a good measurement of the time constant is very important. A good instrument 
produces a stable tone when the hand is held still. The quality of the measurement 
also determines the smallest step between two consecutive tones, and fine control 
of the pitch is desired as well.&lt;/p&gt;
&lt;p&gt;Here you can see this highly advanced test - a ruler strapped to the antenna.
This will allow us to measure the relationship between time constant and distance of
the hand. &lt;/p&gt;
&lt;p style="width:441px; display: block; margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Test setup" src="www.j-marjanovic.io/images/theremin_antena_meas.jpg"&gt;&lt;/p&gt;
&lt;p&gt;Since the ruler is made out of plastic (we should say dielectric, when studying electric
fields) it won't affect the antenna field.&lt;/p&gt;
&lt;p&gt;The FPGA produces a 10 kHz square wave, which is sent to the antenna through
a 2.2 MOhm resistor. The resistor and the antenna capacitance together form an
RC circuit. The voltage on the antenna is fed back to the FPGA through a Schmitt trigger to improve
the measured value. A module in the FPGA measures how long it takes for the voltage on the
antenna to reach a certain value (determined by the Schmitt trigger). This time is directly 
proportional to the time constant and therefore to the capacitance of the antenna.&lt;/p&gt;
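&lt;p&gt;To get a feeling for the numbers, here is a minimal Python sketch of the crossing time of an ideal RC circuit, t = -R C ln(1 - V_th / V_drive). The resistor value and the square wave frequency are the ones mentioned above; the threshold ratio and the capacitance values are assumptions chosen for illustration only, and the sketch assumes that the antenna starts each charge phase fully discharged.&lt;/p&gt;

```python
import math

R_OHMS = 2.2e6         # series resistor from the article
F_SQUARE_HZ = 10e3     # square wave frequency from the article
THRESHOLD_RATIO = 0.6  # assumption: Schmitt trigger threshold / drive voltage


def time_to_threshold(c_farads, r_ohms=R_OHMS, ratio=THRESHOLD_RATIO):
    """Time for an ideal RC circuit, charging from 0 V, to reach ratio * V_drive."""
    return -r_ohms * c_farads * math.log(1.0 - ratio)


def capacitance_from_time(t_seconds, r_ohms=R_OHMS, ratio=THRESHOLD_RATIO):
    """Invert time_to_threshold: recover C from a measured crossing time."""
    return -t_seconds / (r_ohms * math.log(1.0 - ratio))


half_period = 0.5 / F_SQUARE_HZ  # duration of one charge phase
for c_pf in (5.0, 10.0, 15.0):   # assumed antenna capacitances in pF
    t = time_to_threshold(c_pf * 1e-12)
    share = 100.0 * t / half_period
    print(f"C = {c_pf:4.1f} pF: t = {t * 1e6:5.2f} us ({share:4.1f} % of the half period)")
```

&lt;p&gt;Because the crossing time is proportional to C for a fixed resistor and threshold, the second function can turn a measured time back into a capacitance estimate.&lt;/p&gt;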
&lt;p&gt;Here we see what the FPGA measured when I placed my hand at different distances from the antenna.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Time constant without filtering" src="www.j-marjanovic.io/images/theremin_antenna_direct.png"&gt;&lt;/p&gt;
&lt;p&gt;The blue dots denote mean value, the green lines denote one standard deviation and the
black lines denote minimum and maximum value.&lt;/p&gt;
&lt;p&gt;Right now the measurements do not look very promising: we can see some increase of the time
constant as the hand approaches the antenna, but the noise is extremely high.&lt;/p&gt;
&lt;p&gt;If we have a look at the frequency spectrum of the measurement, we see the reason for the
noise.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Time constant spectral analysis" src="www.j-marjanovic.io/images/theremin_antenna_direct_fft.png"&gt;&lt;/p&gt;
&lt;p&gt;The antenna is a 45 cm long copper rod and acts not only as a capacitor for the theremin but also
as a radio antenna. All the frequency components above the Nyquist frequency (which is 5 kHz
in our case) are aliased to lower frequencies in our frequency range. A 50 Hz signal
from the mains is also picked up.&lt;/p&gt;
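&lt;p&gt;The folding can be illustrated with a few lines of Python. This is only a sketch: it assumes one measurement per period of the square wave, i.e. a 10 kHz sample rate, and the input frequencies (including the 1 000 300 Hz "radio" tone) are made-up examples.&lt;/p&gt;

```python
FS_HZ = 10e3  # assumption: one measurement per 10 kHz square wave period


def alias_frequency(f_hz, fs_hz=FS_HZ):
    """Frequency at which a tone of f_hz shows up after sampling at fs_hz."""
    f_folded = f_hz % fs_hz                 # fold into the range [0, fs)
    return min(f_folded, fs_hz - f_folded)  # mirror into the range [0, fs/2]


# Hypothetical input tones: mains hum, two in-between tones, two far above Nyquist.
for f in (50.0, 4_000.0, 6_000.0, 10_050.0, 1_000_300.0):
    print(f"{f:12.1f} Hz is seen at {alias_frequency(f):7.1f} Hz")
```

&lt;p&gt;Anything the rod picks up above 5 kHz lands somewhere between 0 and 5 kHz, on top of the signal we actually want to measure.&lt;/p&gt;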
&lt;p&gt;We need a filter! A filter will reject the undesired frequencies, and that will greatly
improve the measurement.&lt;/p&gt;
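&lt;p&gt;As a rough illustration of the idea (not necessarily the filter used in the FPGA), here is a moving average in Python. It assumes a 10 kHz sample rate and averages over 200 samples, which is exactly one period of the 50 Hz mains, so the hum averages out. The input data is synthetic: a constant value of 20 with a 50 Hz disturbance added on top.&lt;/p&gt;

```python
import math
from collections import deque

FS_HZ = 10e3                    # assumption: 10 kHz sample rate
MAINS_HZ = 50.0                 # mains frequency mentioned in the article
WINDOW = int(FS_HZ / MAINS_HZ)  # 200 samples, one full mains period


def moving_average(samples, window=WINDOW):
    """Mean of the most recent `window` samples, once the window is full."""
    buf = deque(maxlen=window)  # the oldest sample drops out automatically
    out = []
    for x in samples:
        buf.append(x)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out


# Synthetic measurement: constant value plus 50 Hz hum (made-up amplitudes).
raw = [20.0 + 5.0 * math.sin(2 * math.pi * MAINS_HZ * n / FS_HZ) for n in range(1000)]
filtered = moving_average(raw)

print(f"raw spread:      {max(raw) - min(raw):.3f}")
print(f"filtered spread: {max(filtered) - min(filtered):.6f}")
```

&lt;p&gt;A moving average is attractive in an FPGA because it needs only an adder, a subtractor and a delay line, but it also reacts more slowly to hand movement, so the window length is a trade-off.&lt;/p&gt;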
&lt;p&gt;I will write another article on filtering in an FPGA; meanwhile, let's enjoy the much improved 
results:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Time constant with filtering" src="www.j-marjanovic.io/images/theremin_antenna_filtered.png"&gt;&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Theremin"></category></entry><entry><title>Theremin Basics</title><link href="www.j-marjanovic.io/theremin-basics.html" rel="alternate"></link><published>2014-11-06T22:00:00+01:00</published><updated>2014-11-06T22:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-11-06:www.j-marjanovic.io/theremin-basics.html</id><summary type="html">&lt;p&gt;The theremin was one of the first electronic instruments. Leon Theremin invented it around 1920 and patented it in 1928. 
Try to imagine people seeing somebody waving a hand in the middle of the air
and producing extraterrestrial sounds. Theremin should be considered a 
true pioneer of electronic music.&lt;/p&gt;
&lt;p&gt;The operating principle is quite simple; however, a good implementation …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The theremin was one of the first electronic instruments. Leon Theremin invented it around 1920 and patented it in 1928. 
Try to imagine people seeing somebody waving a hand in the middle of the air
and producing extraterrestrial sounds. Theremin should be considered a 
true pioneer of electronic music.&lt;/p&gt;
&lt;p&gt;The operating principle is quite simple; however, a good implementation is
not so trivial. I recall playing one in the Deutsches Museum in Munich, and that
particular model had a sphere as an antenna. The tone frequency depends
on the antenna capacitance. To achieve maximum control, the relationship between
capacitance and the distance between hand and antenna should be linear. However,
the model in Munich had a reciprocal relationship, and that made hitting the right 
notes quite hard.&lt;/p&gt;
&lt;p&gt;I am going to make a new version of the theremin, a reinterpretation for the 21st
century. I will still use an antenna to control the pitch, but the "back-end"
will be completely different. &lt;/p&gt;
&lt;p&gt;The antenna acts as one plate of a capacitor and the hand is the other.
I did a quick simulation in Python to be able to show some numbers. This is
meant to be a demonstration of the operating principle, so it lacks a few
not-so-minor details. The calculations were done in 2D, whereas a real antenna 
exists in three dimensions. &lt;/p&gt;
&lt;p&gt;&lt;img alt="Theremin simulation" src="www.j-marjanovic.io/images/theremin_antenna.gif"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="/drafts/theremin-simulation.html"&gt;Code&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now we know that the antenna acts as a capacitor, so the next challenge is to
produce a tone with a pitch related to the capacitance. The original theremin
uses the antenna as the capacitor of an LC resonator. Here is the first big
difference between my implementation and the original one: I will be measuring
the time constant of an RC circuit with an FPGA. &lt;/p&gt;
&lt;p&gt;The FPGA will generate a square wave and feed it to the antenna through a
resistor. This creates a current which charges the capacitor (the antenna).
Another module in the FPGA will measure the time needed for the voltage on the 
capacitor to reach a certain level. A bigger capacitance (smaller distance 
between hand and antenna) will result in a longer time and a smaller
capacitance (longer distance between hand and antenna) will result in a shorter
time. This difference will generate a different pitch.&lt;/p&gt;
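&lt;p&gt;Inside the FPGA the time is not measured in seconds but in clock cycles, so the resolution of the instrument depends on the clock. The following Python sketch shows how many cycles a counter would reach for a given capacitance. All numbers here (50 MHz clock, 2.2 MOhm resistor, threshold at 60 % of the drive voltage, capacitances of a few pF) are assumptions for illustration, and the circuit is treated as an ideal RC circuit charging from 0 V.&lt;/p&gt;

```python
import math

F_CLK_HZ = 50e6        # assumption: FPGA counter clock
R_OHMS = 2.2e6         # assumption: series resistor
THRESHOLD_RATIO = 0.6  # assumption: threshold / drive voltage


def counter_ticks(c_farads):
    """Whole clock cycles counted before the capacitor voltage crosses the threshold."""
    t_cross = -R_OHMS * c_farads * math.log(1.0 - THRESHOLD_RATIO)
    return math.floor(t_cross * F_CLK_HZ)


# Hypothetical capacitances: hand far away (small C) to hand close (large C).
for c_pf in (8.0, 9.0, 10.0, 11.0):
    print(f"C = {c_pf:4.1f} pF: counter = {counter_ticks(c_pf * 1e-12)} ticks")
```

&lt;p&gt;With these assumed values one picofarad of change moves the counter by roughly a hundred ticks, which gives an idea of how finely the pitch could be controlled before noise is taken into account.&lt;/p&gt;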
&lt;p&gt;&lt;img alt="Theremin settling time" src="www.j-marjanovic.io/images/theremin_settling_time.png"&gt;&lt;/p&gt;
&lt;p&gt;This is all for this part, stay tuned for more.&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Projects"></category><category term="Theremin"></category></entry><entry><title>Hello world</title><link href="www.j-marjanovic.io/hello-world.html" rel="alternate"></link><published>2014-11-02T13:00:00+01:00</published><updated>2014-11-02T13:00:00+01:00</updated><author><name>Jan Marjanovic</name></author><id>tag:None,2014-11-02:www.j-marjanovic.io/hello-world.html</id><summary type="html">&lt;p&gt;Hi and welcome to my blog.&lt;/p&gt;
&lt;p&gt;This is a "hello world" style post, I am currently testing the Pelican static blog generator.
Since I don't know how much time I will have to manage this blog I prefer that everything 
is static. If I some day stop writing this blog, at …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Hi and welcome to my blog.&lt;/p&gt;
&lt;p&gt;This is a "hello world" style post, I am currently testing the Pelican static blog generator.
Since I don't know how much time I will have to manage this blog, I prefer that everything 
is static. If I some day stop writing this blog, at least there won't be any security threats
due to an out-of-date web page.&lt;/p&gt;
&lt;p&gt;This blog will be about one of my biggest passions, electronics and technology in general.
Ever since I was a kid I have been curious about electronic devices, and to this day I enjoy
taking things apart (and sometimes also putting them back together; that is what I am being paid for). &lt;/p&gt;
&lt;p&gt;This is all for now, let's see if it "compiles"...&lt;/p&gt;&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;</content><category term="Misc"></category></entry></feed>