1
0
mirror of https://github.com/antonblanchard/microwatt.git synced 2026-01-13 07:09:54 +00:00

1008 Commits

Author SHA1 Message Date
Michael Neuling
2224b28c2c
Merge pull request #324 from paulusmack/master
Performance and timing improvements
2021-09-13 12:11:24 +10:00
Paul Mackerras
54b0e8b8c8 core: Predict not-taken conditional branches using BTC
This adds a bit to the BTC to store whether the corresponding branch
instruction was taken last time it was encountered.  That lets us pass
a not-taken prediction down to decode1, which for backwards direct
branches inhibits it from redirecting fetch to the target of the
branch.  This increases coremark by about 2%.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-11 13:29:02 +10:00
Paul Mackerras
0cdaa2778f xilinx-mult: Move some registers later in the data flow
This changes s0 to use the P register rather than the A/B/C input
registers, thus improving the timing of the multiplier output.  The
m00, m02 and m03 multipliers now use their P registers rather than the
M registers, moving the addition they do from the second cycle to the
first.

Also, the XOR that inverts the 32 LSBs is moved before the output
register.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-11 13:27:46 +10:00
Paul Mackerras
77d9891d2f
Merge pull request #326 from antonblanchard/dcache-nc-fix
dcache: Loads from non-cacheable PTEs load entire 64 bits
2021-09-11 11:57:04 +10:00
Anton Blanchard
39e1d10069
Merge pull request #325 from paulusmack/fixes
decode1: Fix form of isel marked as single-issue
2021-09-10 21:04:08 +10:00
Anton Blanchard
b29c58f3d1 dcache: Loads from non-cacheable PTEs load entire 64 bits
A non-cacheable load should only load the data requested and no more. We
do the right thing for real mode cache inhibited storage instructions,
but when loading through a non-cacheable PTE we load the entire 64 bits
regardless of the size.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-09-10 20:51:53 +10:00
Paul Mackerras
d4cfdb1bfe decode1: Fix form of isel marked as single-issue
The row in the decode table for isel with BC=0 was inadvertently left
marked as single-issue by commit 813f8340127f ("Add CR hazard
detection", 2019-10-15).  Fix it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-10 09:38:49 +10:00
Michael Neuling
09bd01a49e
Merge pull request #323 from paulusmack/fixes
More instruction fixes
2021-09-06 17:17:37 +10:00
Paul Mackerras
06e07c69a8 decode1: Fix maddld and maddhdu to not set CR0
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-06 09:16:16 +10:00
Paul Mackerras
a68921edca core: Fix mcrxrx, addpcis and bpermd
- mcrxrx put the bits in the wrong order

- addpcis was setting CR0 if the instruction bit 0 = 1, which it
  shouldn't

- bpermd was producing 0 always and additionally had the wrong bit
  numbering

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-06 09:16:13 +10:00
Michael Neuling
18eb029f0a
Merge pull request #322 from paulusmack/fixes
Fix bug with load hitting two previous stores
2021-09-03 12:07:06 +10:00
Paul Mackerras
ba34914465 tests/misc: Add a test for a load that hits two preceding stores
This checks that the store forwarding machinery in the dcache
correctly combines forwarded stores when they are partial stores
(i.e. only writing part of the doubleword, as for a byte store).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-03 09:09:44 +10:00
Paul Mackerras
0b23a5e760 dcache: Simplify data input to improve timing
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-03 09:08:51 +10:00
Paul Mackerras
1a9834c506 dcache: Fix bug with forwarding of stores
We have two stages of forwarding to cover the two cycles of latency
between when something is written to BRAM and when that new data can
be read from BRAM.  When the writes to BRAM result from store
instructions, the write may write only some bytes of a row (8 bytes)
and not others, so we have a mask to enable only the written bytes to
be forwarded.  However, we only forward written data from either the
first stage of forwarding or the second, not both.  So if we have
two stores in succession that write different bytes of the same row,
and then a load from the row, we will only forward the data from the
second store, and miss the data from the first store; thus the load
will get the wrong value.

To fix this, we make the decision on which forward stage to use for
each byte individually.  This results in a 4-input multiplexer feeding
r1.data_out, with its inputs being the BRAM, the wishbone, the current
write data, and the 2nd-stage forwarding register.  Each byte of the
multiplexer is separately controlled.  The code for this multiplexer
is moved to the dcache_fast_hit process since it is used for cache
hits as well as cache misses.

This also simplifies the BRAM code by ensuring that we can use the
same source for the BRAM address and way selection for writes, whether
we are writing store data or cache line refill data from memory.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-09-03 08:59:10 +10:00
Paul Mackerras
f812832ad7 dcache: Move way selection and forwarding earlier
This moves the way multiplexer for the data from the BRAM, and the
multiplexers for forwarding data from earlier stores or refills,
before a clock edge rather than after, so that now the data output
from the dcache comes from a clean latch.  To do this we remove the
extra latch on the output of the data BRAM (i.e. ADD_BUF=false) and
rearrange the logic.  The choice whether to forward or not now depends
not on way comparisons but rather on a tag comparisons, for the sake
of timing.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-28 20:11:55 +10:00
Michael Neuling
a9e6263aab
Merge pull request #319 from antonblanchard/verilator-ci
Add some Verilator CI tests
2021-08-17 12:31:12 +10:00
Michael Neuling
8bbb0018b4
Merge pull request #318 from paulusmack/pmu
PMU enhancements
2021-08-16 13:08:24 +10:00
Michael Neuling
05fe709ffc
Merge pull request #320 from antonblanchard/litedram-regenerate
litedram: Regenerate from upstream litex
2021-08-16 09:23:27 +10:00
Anton Blanchard
c0f7f54276 litedram: Regenerate from upstream litex
This adds Joel's sdcard feature reporting to the firmware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-15 06:20:22 +10:00
Anton Blanchard
ee38a31152 ci: Add verilator tests
Now we have some verilator tests, add them to the CI.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-14 19:47:03 +10:00
Anton Blanchard
c81583c128 makefile: Check environment for MEMORY_SIZE/RAM_INIT_FILE
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-14 19:46:48 +10:00
Anton Blanchard
efb387b0d2 makefile: Add some verilator micropython tests
These are the same micropython tests we use against the ghdl
simulation.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-14 19:36:29 +10:00
Anton Blanchard
8acd5a5607 verilator: Specify top level module
While verilator finds the correct top level module with the current
setup, if we start adding simulation models it can get confused.

Explicitly specify the top level module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-14 19:35:58 +10:00
Anton Blanchard
7e2de602ee makefile: Simplify microwatt-verilator target, add Docker image
Recent versions of verilator support the --build option, allowing
us to remove a step.

Also add a Docker image for verilator.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-14 19:35:53 +10:00
Paul Mackerras
65c43b488b PMU: Add several more events
This implements most of the architected PMU events.  The ones missing
are mostly the ones that depend on which level of the cache hierarchy
data is fetched from.  The events implemented here, and their raw
event codes, are:

    Floating-point operation completed (100f4)
    Load completed (100fc)
    Store completed (200f0)
    Icache miss (200fc)
    ITLB miss (100f6)
    ITLB miss resolved (400fc)
    Dcache load miss (400f0)
    Dcache load miss resolved (300f8)
    Dcache store miss (300f0)
    DTLB miss (300fc)
    DTLB miss resolved (200f6)
    No instruction available and none being executed (100f8)
    Instruction dispatched (200f2, 300f2, 400f2)
    Taken branch instruction completed (200fa)
    Branch mispredicted (400f6)
    External interrupt taken (200f8)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-14 17:35:55 +10:00
Paul Mackerras
8cdb00652b
Merge pull request #316 from antonblanchard/verilator-fix
Rename 'do' signal to avoid verilator System Verilog warning
2021-08-14 10:18:42 +10:00
Paul Mackerras
71af8016da
Merge pull request #317 from antonblanchard/gpio-fix
gpio: Add HAS_GPIO to avoid verilator build errors
2021-08-14 10:18:12 +10:00
Paul Mackerras
e33fb26e7a PMU: Fix PMC5/6 behaviour when MMCR0[PMCC] = 11
The architecture states that when MMCR0[PMCC] = 0b11, PMC5 and PMC6
are not part of the Performance Monitor, meaning that they are not
controlled by bits in MMCRs, and counter negative conditions in PMCs 5
and 6 don't generate Performance Monitor alerts, exceptions or
interrupts.  It doesn't say that PMC5 and PMC6 are frozen in this
case, so presumably they should continue to count run instructions and
run cycles.

This implements that behaviour.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-13 19:50:59 +10:00
Anton Blanchard
591e96d1a2 gpio: Add HAS_GPIO to avoid verilator build errors
The verilator build fails with warnings and errors, because NGPIO
is 0 and we do things like:

        gpio_out : out std_ulogic_vector(NGPIO - 1 downto 0);

Set NGPIO to something reasonable (eg 32) and add HAS_GPIO to avoid
building the macro entirely if it isn't in use.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-13 14:51:15 +10:00
Anton Blanchard
bc0f7cf236 Rename 'do' signal to avoid verilator System Verilog warning
Experimenting with using ghdl to do VHDL to Verilog conversion (instead
of ghdl+yosys), verilator complains that a signal is a SystemVerilog
keyword:

%Error: microwatt.v:15013:18: Unexpected 'do': 'do' is a SystemVerilog keyword misused as an identifier.
        ... Suggest modify the Verilog-2001 code to avoid SV keywords, or use `begin_keywords or --language.

We could probably make this go away by disabling SystemVerilog, but
it's easy to rename the signal in question. Rename di at the same
time.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-13 13:52:51 +10:00
Michael Neuling
2bd00f5119
Merge pull request #315 from paulusmack/pmu
Add basic PMU implementation
2021-08-12 08:24:05 +10:00
Paul Mackerras
1896e5f803
Merge pull request #314 from antonblanchard/yosys-go-fast-bits
Reduce Yosys ECP5 cell usage by 30% with -abc9 -nowidelut
2021-08-11 16:26:29 +10:00
Michael Neuling
400e481ffa
Merge pull request #313 from paulusmack/fixes
Fix bug causing FP unavailable interrupts to be missed
2021-08-11 16:25:35 +10:00
Paul Mackerras
a7873b45f7 core: Add a basic performance monitor unit (PMU) implementation
This is the start of an implementation of a PMU according to PowerISA
v3.0B.  Things not implemented yet include most architected events,
the BHRB, event-based branches, thresholding, MMCR0[TBCC] field, etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-11 16:05:47 +10:00
Anton Blanchard
6254bb5ee9 Reduce Yosys ECP5 cell usage by 30% with -abc9 -nowidelut
We've been investigating why the barrel rotator uses an enormous
number of cells on the yosys ECP5 target. Eventually it was narrowed
down to the -abc9 -nowidelut options, which see the cell count go from
4985 cells to 841 cells.

Using the same options on an Orange Crab build reduces the cell count
from 50864 to 36085. The main differences:

     LUT4                        31040 -> 25270
     PFUMX                        6956 ->     0
     L6MUX21                      1746 ->     0
     CCU2C                        2066 ->  1759

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-11 14:49:30 +10:00
Michael Neuling
aa4e4e77c4
Merge pull request #311 from antonblanchard/litesdcard-nexys-video
Update litesdcard from upstream and add Nexys Video support
2021-08-11 09:48:07 +10:00
Michael Neuling
65c131e89f
Merge pull request #312 from shenki/sdcard-soc-features
litedram: Add sdcard to soc features
2021-08-11 09:33:36 +10:00
Paul Mackerras
f40842d9b2 tests/fpu: Test FPU unavailable interrupt following a load
This adds a load before a floating-point load which should generate a
floating-point unavailable interrupt, to test for the bug where
unavailability interrupts can get dropped while loadstore1 is
executing instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-10 20:13:26 +10:00
Joel Stanley
bc3995804f litedram: Add sdcard to soc features
Signed-off-by: Joel Stanley <joel@jms.id.au>
2021-08-10 18:49:29 +09:30
Paul Mackerras
64e3ce7134 execute1: Handle interrupts during sequences of load/store operations
At present the logic prevents any interrupts from being handled while
there is a load/store instruction (one that has unit=LDST) being
executed.  However, load/store instructions can still get sent to
loadstore1.  Thus an instruction which should generate an interrupt
such as a floating-point unavailable interrupt will instead get
executed.

To fix this, when we detect that an interrupt should be generated but
loadstore1 is still executing a previous instruction, we don't execute
any new instructions, and set a new r.intr_pending flag.  That results
in busy_out being asserted (meaning that no further instructions will
come in from decode2).  When loadstore1 has finished the instructions
it has, the interrupt gets sent to writeback.  If one of the
instructions in loadstore1 generates an interrupt in the meantime, the
l_in.interrupt signal gets asserted and that clears r.intr_pending, so
the interrupt we detected gets discarded.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-08-10 19:18:45 +10:00
Anton Blanchard
7cfbcd5514 litesdcard: Add Nexys Video support
This board has a reset line that needs to be held low to power up the
SD card hardware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-10 14:08:22 +10:00
Anton Blanchard
9caaa3fc46 litesdcard: Use vendor not board type
litesdcard provides a macro per vendor (eg xilinx, lattice) and not per
board, so modify the fusesoc generator to take a vendor. This will make
it easier to add litesdcard to more boards.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-10 14:08:22 +10:00
Paul Mackerras
c198b2b82e
Merge pull request #310 from antonblanchard/liteeth-update-2
Update liteeth from upstream and add Nexys Video support
2021-08-10 14:04:55 +10:00
Anton Blanchard
34e10cc52c liteeth: Regenerate from upstream litex
Unfortunately the CSR layout has shifted on upstream litex, so this
is built with the following litex patch backed out:

aad56a047a33 ("integration/soc: Use CSR automatic allocation.")

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-10 11:49:36 +10:00
Anton Blanchard
12efb51bcc liteeth: Update yaml config
csr_data_width is no longer required. Add ntxslots and nrxslots
parameters but set them to the default value.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-09 21:21:08 +10:00
Anton Blanchard
458dfe01a6 Add liteeth support to Nexys Video
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-09 21:21:04 +10:00
Michael Neuling
cf6df4f17f
Merge pull request #307 from antonblanchard/litedram-update
Updating to latest upstream litedram and some cleanups to associated scripts
2021-08-09 15:42:05 +10:00
Michael Neuling
69a1440204
Merge pull request #309 from antonblanchard/clk-cleanup
Small cleanups to clock definitions
2021-08-09 15:31:09 +10:00
Michael Neuling
b885ee7ed1
Merge pull request #308 from antonblanchard/small-fixes
Fix some whitespace issues
2021-08-09 14:25:06 +10:00
Anton Blanchard
75e06a1e30 Remove -add from xdc files
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2021-08-09 13:30:26 +10:00