mirror of https://github.com/antonblanchard/microwatt.git synced 2026-01-26 11:52:09 +00:00

316 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
79101041d6 wishbone: Add stall signal
Pipelined wishbone needs it

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-30 13:18:58 +11:00
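In Wishbone B4 pipelined mode, a request counts as accepted on a clock edge where STB is high and STALL is low. While STALL is high, the master must hold the request and retry. The Python sketch below models only that request phase (acknowledges are ignored). The function name and the per-cycle `stall` list are hypothetical illustrations, not microwatt's VHDL signals.

```python
from typing import List, Sequence, Tuple


def issue_requests(requests: Sequence[str], stall: Sequence[int]) -> List[Tuple[int, str]]:
    """Cycle-by-cycle sketch of a pipelined Wishbone master's request phase.

    The master keeps STB asserted while it has requests. A request is taken
    on a cycle where STALL is low; otherwise it is held unchanged and retried.
    `stall` is a hypothetical per-cycle trace of the slave's STALL output.
    """
    accepted = []
    pending = list(requests)
    for cycle, stall_bit in enumerate(stall):
        if not pending:
            break
        if not stall_bit:
            # slave can take the request this cycle; next one goes out next cycle
            accepted.append((cycle, pending.pop(0)))
        # else: hold address/data/STB and try again on the following cycle
    return accepted
```

For example, `issue_requests(["A", "B", "C"], [0, 1, 1, 0, 0])` accepts A in cycle 0, then holds B through two stalled cycles before accepting it in cycle 3, and C in cycle 4. Without a stall signal, the master has no way to tell that B was not taken.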
Benjamin Herrenschmidt
559b3bcf2d pp_uart: reformat
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-30 13:18:58 +11:00
Anton Blanchard
9620a76281 Merge pull request #115 from antonblanchard/reduce-wishbone
Reduce wishbone
2019-10-25 17:10:01 +11:00
Anton Blanchard
247d7d4aa0 Merge pull request #113 from mikey/exec-sim-remove
Remove SIM generic from execute1
2019-10-25 15:52:24 +11:00
Anton Blanchard
1b6c246379 Merge pull request #114 from antonblanchard/dcache
Dcache from Ben
2019-10-25 15:49:33 +11:00
Michael Neuling
bd4ac06243 Remove SIM generic from execute1
This does nothing, so remove.

Signed-off-by: Michael Neuling <mikey@neuling.org>
2019-10-25 10:10:34 +11:00
Benjamin Herrenschmidt
6dd0b514ac Reduce wishbone address size to 32-bit
For now... it reduces the routing pressure on the FPGA.

This needs manual adjustment of the address decoder in soc.vhdl, at
least until I can figure out how to deal with std_match.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

# Conflicts:
#	soc.vhdl
2019-10-23 12:37:16 +11:00
Benjamin Herrenschmidt
1a63c39704 Make it possible to change wishbone address size
All that needs to be changed now is the size in wishbone_types.vhdl
and the address decoder in soc.vhdl.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:37:16 +11:00
Benjamin Herrenschmidt
cb4451498f dcache: Add testbench
A very simple one for now...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:37:00 +11:00
Benjamin Herrenschmidt
742b21480e insn: Simplistic implementation of icbi
We don't yet have a proper snooper for the icache, so for now make
icbi simply flush the whole icache.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:31:20 +11:00
Benjamin Herrenschmidt
a0d95e791e insn: Implement isync instruction
The instruction works by redirecting fetch to nia+4 (hopefully using
the same adder used to generate LR) and doing a backflush. Along with
being single issue, this should guarantee that the next instruction
only gets fetched after the pipe's been emptied.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:31:16 +11:00
Benjamin Herrenschmidt
6e0ee0b0db icache & dcache: Fix store way variable
We used the variable "way" in the wrong state of the cache when
updating a line's valid bit after the end of the wishbone transactions;
we need to use the latched "store_way" instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
587a5e3c45 dcache: Cleanup (mostly cosmetic)
Clearly separate the 2 stages of load hits, improve naming and
comments, clarify the writeback controls etc...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
265fbf894b icache/dcache: Make both caches 32 lines, 2 ways
Adding lines seems to cost only a little extra, as the BRAMs aren't
full. 2 ways is our current compromise to limit pressure on small
FPGAs. We could go to 64 lines for a little more, but timing is
becoming a bit too tight for my liking on the tags/LRU path of
the icache, so let's leave it at 32 for now.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
174378b190 dcache: Introduce an extra cycle latency to make timing
This makes the BRAMs use an output buffer, introducing an extra
cycle of latency. Without this, Vivado won't make timing at 100MHz.

We stash all the necessary response data in delayed latches. The
extra cycle is NOT a state in the state machine, so it's fully
pipelined and doesn't involve stalling.

This introduces an extra non-pipelined cycle for loads with update
to avoid a collision on the writeback output between the now-delayed
load data and the register update. We could avoid it by moving
the register update into the pipeline bubble created by the extra
update state, but it's a bit trickier, so I leave that for a later
optimization.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
b513f0fb48 dcache: Add a dcache
This replaces loadstore2 with a dcache

The dcache unit is loosely based on the icache one (same basic cache
layout), but has some significant logic additions to deal with stores,
loads with update, non-cacheable accesses and other differences due to
operating in the execution part of the pipeline rather than the fetch
part.

The cache is store-through, though a hit with an existing line will
update the line rather than invalidate it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
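A store-through (write-through) cache sends every store to memory. On a hit, the commit updates the cached line in place instead of invalidating it. The toy Python model below shows that policy. It assumes word-granular "lines" held in a dict and no allocation on a store miss. The commit does not say how store misses are handled, so that part is a labelled assumption.

```python
class StoreThroughCache:
    """Toy behavioural model of a store-through data cache.

    Assumptions (not taken from the microwatt VHDL): one word per line,
    a dict standing in for the tag/data arrays, and no allocation on a
    store miss.
    """

    def __init__(self, memory):
        self.memory = memory   # backing store: address -> value
        self.lines = {}        # cached copies: address -> value
        self.mem_writes = 0    # number of stores sent to memory

    def load(self, addr):
        if addr not in self.lines:       # miss: fill the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.memory[addr] = value        # every store goes through to memory
        self.mem_writes += 1
        if addr in self.lines:           # hit: update the cached copy in place
            self.lines[addr] = value     # rather than invalidating it
```

Updating the line on a hit keeps a following load of the same address a hit. Invalidating the line would also be correct, but it would force a refill.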
Benjamin Herrenschmidt
7b3df7cb05 icache: Reduce simulation warnings
This might slightly increase the logic in synthesis, but it avoids
us looking at uninitialized tags when not servicing an active
request.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
a38ae503ff cache_ram: Add write-enables
They will be needed by the dcache

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Benjamin Herrenschmidt
e598188aca plru: Improve sensitivity list
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-23 12:30:49 +11:00
Anton Blanchard
b963f8a6af Merge pull request #112 from hughhalf/patch-1
Minor tweaks to README.md
2019-10-21 20:15:37 +11:00
Hugh
96b7f17e52 Minor tweaks to README.md
A few tweaks based on a newcomer's experience getting an Arty A7-100 up and running.

Forgot to add DCO in initial PR, now corrected.

Signed-off-by: Hugh Blemings <hugh@blemings.org>
2019-10-21 17:56:53 +11:00
Anton Blanchard
326dec4b3b Merge pull request #110 from antonblanchard/misc
icache_tb: Improve test and include test file
2019-10-20 10:09:42 +11:00
Benjamin Herrenschmidt
f74e8a4f79 icache_tb: Improve test and include test file
The icache_test.bin file was missing. This adds it (along with a python3
script to generate it).

We also add better error reporting.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-18 16:41:05 +11:00
Anton Blanchard
900c131083 Merge pull request #109 from antonblanchard/misc
Misc updates from Ben
2019-10-17 17:37:49 +11:00
Anton Blanchard
e67924f55e isel takes a CR bit, not a CR field
Fix a GHDL assert in isel.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2019-10-17 17:16:09 +11:00
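In `isel RT,RA,RB,BC`, the BC field selects one of the 32 individual CR bits, not one of the 8 four-bit CR fields. The Power ISA numbers these bits from the most significant end. The Python sketch below models the instruction's result. The function name and argument layout are illustrative only.

```python
def isel(cr: int, bc: int, ra_val: int, rb_val: int, ra_field: int) -> int:
    """Model of isel RT,RA,RB,BC.

    cr       -- 32-bit condition register value
    bc       -- CR *bit* number 0..31 (bit 0 is the most significant bit)
    ra_field -- the RA field of the instruction; RA=0 means the value 0
    """
    assert 0 <= bc < 32
    bit = (cr >> (31 - bc)) & 1        # Power ISA numbers CR bits from the MSB
    a = 0 if ra_field == 0 else ra_val
    return a if bit else rb_val
```

With `cr = 0x20000000`, only the EQ bit of CR0 (bit 2) is set. `isel(cr, 2, 10, 20, 3)` therefore selects RA, giving 10. If BC were misread as a field number, it would instead address CR2.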
Benjamin Herrenschmidt
60b05ee1e5 common: Reformat
No code change

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-17 10:08:05 +11:00
Benjamin Herrenschmidt
bddc9327cc execute1: Remove mux on "write_data" and "rc" outputs
Only "write_enable" needs to change, this shrinks the core a bit more

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-17 10:08:04 +11:00
Benjamin Herrenschmidt
da0bd89c43 crhelpers: Constrain "crnum" integer
This seems to save quite a few LUTs

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-17 10:06:10 +11:00
Benjamin Herrenschmidt
4437487ad0 execute1: Reformat
No functional change

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-17 10:06:09 +11:00
Benjamin Herrenschmidt
858b1e7930 writeback: Remove a mux leg on data_in
Initializing it to 0 forces the mux to have an extra leg fed with zeros.

Instead, initialize data_in to one of the mux inputs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2019-10-17 10:04:25 +11:00
Anton Blanchard
4433118c91 Merge pull request #105 from paulusmack/writeback
Writeback
2019-10-17 07:40:36 +11:00
Paul Mackerras
57b200d6cb writeback: Eliminate inferred latch
This initializes data_in to all zeroes so that it doesn't become a
set of 64 inferred latches.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-16 07:56:15 +11:00
Anton Blanchard
640af89e72 Merge pull request #106 from paulusmack/master
wishbone_debug_master: Improve timing
2019-10-15 21:05:10 +11:00
Paul Mackerras
a27ed0ec27 wishbone_debug_master: Improve timing
The current code has the possibility that we could set reg_addr
or reg_ctrl and then increment reg_addr in the same cycle, resulting
in some long timing paths.  Rearrange the code to make it clear
that we are not trying to add an auto-increment to data from
outside the module; in any given cycle we either set one of
reg_addr and reg_ctrl, or we possibly increment reg_addr.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 18:16:07 +11:00
Paul Mackerras
f49a5a99a5 Remove execute2 stage
Since the condition setting got moved to writeback, execute2 does
nothing aside from wasting a cycle.  This removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 16:37:22 +11:00
Anton Blanchard
63f5dce820 Merge pull request #104 from paulusmack/master
Implement neg using OP_ADD
2019-10-15 16:17:12 +11:00
Paul Mackerras
9646fe28b0 Do sign-extension instructions in writeback instead of execute1
This makes the exts[bhw] instructions do the sign extension in the
writeback stage using the sign-extension logic there instead of
having unique sign extension logic in execute1.  This requires
passing the data length and sign extend flag from decode2 down
through execute1 and execute2 and into writeback.  As a side bonus
we reduce the number of values in insn_type_t by two.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 15:27:00 +11:00
Paul Mackerras
374f4c536d writeback: Do data formatting and condition recording in writeback
This adds code to writeback to format data and test the result
against zero for the purpose of setting CR0.  The data formatter
is able to shift and mask by bytes and do byte reversal and sign
extension.  It can also put together bytes from two input
doublewords to support unaligned loads (including unaligned
byte-reversed loads).

The data formatter starts with an 8:1 multiplexer that is able
to direct any byte of the input to any byte of the output.  This
lets us rotate the data and simultaneously byte-reverse it.
The rotated/reversed data goes to a register for the unaligned
cases that overlap two doublewords.  Then there is per-byte logic
that does trimming, sign extension, and splicing together bytes
from a previous input doubleword (stored in data_latched) and the
current doubleword.  Finally the 64-bit result is tested to set
CR0 if rc = 1.

This removes the RC logic from the execute2, multiply and divide
units, and the shift/mask/byte-reverse/sign-extend logic from
loadstore2.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 15:23:28 +11:00
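The formatter described above routes each output byte through an 8:1 mux, then trims or sign-extends the unused bytes. The Python sketch below models that data path for a single doubleword (the two-doubleword splice for unaligned accesses is left out). It also derives the LT/GT/EQ bits of CR0 from the signed result. The little-endian byte-lane numbering and the function names are assumptions of this sketch, not names from the VHDL.

```python
def format_load(dword, offset, length, byte_reverse=False, sign_extend=False):
    """Behavioural sketch of a byte-lane load formatter.

    dword  -- 64-bit input doubleword; byte lane i is bits 8*i..8*i+7
    offset -- byte offset of the access within the doubleword
    length -- access size in bytes (1, 2, 4 or 8); offset + length <= 8,
              i.e. the unaligned two-doubleword splice is not modelled
    """
    assert length in (1, 2, 4, 8) and offset + length <= 8
    in_bytes = [(dword >> (8 * i)) & 0xFF for i in range(8)]
    out = []
    for i in range(8):
        if i >= length:
            out.append(None)  # lane beyond the access: trimmed, filled below
            continue
        # 8:1 mux: the select both rotates by `offset` and optionally
        # reverses the byte order, in a single step
        src = offset + (length - 1 - i if byte_reverse else i)
        out.append(in_bytes[src])
    msb_byte = out[length - 1]
    fill = 0xFF if sign_extend and (msb_byte & 0x80) else 0x00
    result = 0
    for i, b in enumerate(out):
        result |= (fill if b is None else b) << (8 * i)
    return result


def cr0_bits(result):
    """Return (LT, GT, EQ) for a 64-bit result compared as signed against 0."""
    signed = result - (1 << 64) if result >> 63 else result
    return (int(signed < 0), int(signed > 0), int(signed == 0))
```

For `dword = 0x8877665544332211`, a 2-byte load at offset 0 gives `0x2211`, and the byte-reversed form gives `0x1122`. A sign-extended 1-byte load at offset 7 gives `0xFFFFFFFFFFFFFF88`, for which `cr0_bits` reports LT.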
Anton Blanchard
45271acb35 Merge pull request #103 from paulusmack/divider
Divider
2019-10-15 15:20:34 +11:00
Paul Mackerras
86c53aa3f7 Implement neg using OP_ADD
We have all the machinery in place to implement the neg instruction
as OP_ADD.  Doing that means we can ditch OP_NEG, and saves about
66 slice LUTs on the A7-100.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 15:06:50 +11:00
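Two's complement negation is `-RA = ~RA + 1`. An adder that can already invert its A input and take a carry-in (as subf requires, since `RB - RA = ~RA + RB + 1`) therefore computes neg with B forced to 0. The Python sketch below shows this. The `invert_a`/`carry_in` parameter names are illustrative, not microwatt's decode fields.

```python
MASK64 = (1 << 64) - 1


def op_add(a, b, invert_a=False, carry_in=0):
    """Sketch of an adder with an optional inverted A input and a carry-in.

    Returns (64-bit sum, carry-out). Parameter names are illustrative only.
    """
    a_in = (~a & MASK64) if invert_a else a
    total = a_in + b + carry_in
    return total & MASK64, total >> 64


def neg(ra):
    # neg RT,RA computes -RA = ~RA + 0 + 1
    return op_add(ra, 0, invert_a=True, carry_in=1)[0]


def subf(ra, rb):
    # subf RT,RA,RB computes RB - RA = ~RA + RB + 1 with the same adder
    return op_add(ra, rb, invert_a=True, carry_in=1)[0]
```

Because neg is just subf with B = 0, a separate OP_NEG unit duplicates logic the adder already provides.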
Paul Mackerras
82c19d4e7a divider: Reduce delay in detecting 32-bit overflow
Timing analysis showed that even with the output register, timing
was still a bit tight in the output stage, where the carry has to
propagate all the way through the 64-bit negater, and we were then
testing the top 33 bits to determine if a 32-bit operation had
overflowed.

Instead of detecting overflow at the end, we watch for any 1
bits getting shifted into the top 32 bits of the quotient register
as we are doing the division.  That is relatively easy to do and
simplifies the output stage.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 14:59:15 +11:00
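The overflow check described above can be illustrated with a bit-serial restoring divider that shifts quotient bits in at the bottom. For a 32-bit operation, any 1 that ends up in the top 32 bits must have passed from bit 31 to bit 32 during some shift. Recording that event as it happens replaces a final test of the upper quotient bits. This Python sketch assumes unsigned 64-bit operands, a non-zero divisor and one quotient bit per step, which may differ from the VHDL's sequencing.

```python
def divide_u64(dividend, divisor, is_32bit):
    """Bit-serial restoring division that flags 32-bit overflow on the fly.

    Assumes divisor != 0 and unsigned 64-bit operands. Returns
    (quotient, remainder, overflow).
    """
    mask = (1 << 64) - 1
    quotient = 0
    remainder = 0
    overflow = False
    for i in range(63, -1, -1):
        # bring down the next dividend bit
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # a 1 in bit 31 is about to be shifted into the top 32 bits
        if is_32bit and (quotient >> 31) & 1:
            overflow = True
        quotient = (quotient << 1) & mask
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, overflow
```

`divide_u64(100, 7, True)` returns `(14, 2, False)`. Dividing `1 << 40` by 2 with `is_32bit=True` sets the overflow flag, because the quotient `1 << 39` does not fit in 32 bits.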
Anton Blanchard
6c4edf80ae Merge pull request #102 from antonblanchard/gpr-hazard-5-c
Add CR hazard detection
2019-10-15 12:49:06 +11:00
Anton Blanchard
813f834012 Add CR hazard detection
To keep things simple we treat the CR as a single entity.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2019-10-15 12:05:30 +11:00
Anton Blanchard
58b348deae Merge pull request #101 from antonblanchard/gpr-hazard-5-b
Add GPR hazard detection
2019-10-15 11:22:48 +11:00
Paul Mackerras
c7025f9f28 divider: Add an output register
This puts the output of the divider through a register.  With the
addition of the logic to detect overflow, the combinatorial output
logic of the divider was becoming a critical path.  Adding the
output register adds a cycle to the latency of the divider but
helps make timing at 100MHz on the A7-100.

This also makes the valid, write_reg_enable and write_cr_enable
fields of the output be registered, which eliminates warnings
about register/latch pins with no clock.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-15 10:29:53 +11:00
Anton Blanchard
bb65d0b899 Remove issue restrictions on a number of instructions
Anything that isn't a load or store and anything that doesn't read the
CR can go as soon as its inputs are ready.

While we could also allow SPR read/write and carry read/write, we plan
to change them to be read in decode2 and written in writeback soon and
they will need separate hazard detection to be added.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2019-10-15 09:38:06 +11:00
Anton Blanchard
bdc26b7527 Add GPR hazard detection
Check GPRs against any writers in the pipeline.

All instructions are still marked as single-in-pipeline at
this stage.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2019-10-15 09:03:57 +11:00
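The hazard check amounts to comparing an instruction's source registers against the destinations of every instruction still in the pipeline. The later CR hazard commit (813f834012) treats the whole CR as one entity, so for the CR the same check collapses to a single flag. The Python sketch below models both checks. The `InFlight` record and `can_issue` function are hypothetical illustrations, not microwatt's control unit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class InFlight:
    """An issued instruction that has not yet written back (hypothetical)."""
    write_gpr: Optional[int] = None   # destination GPR number, if any
    writes_cr: bool = False


def can_issue(read_gprs: Sequence[int], reads_cr: bool,
              pipeline: Sequence[InFlight]) -> bool:
    """Return True when no in-flight instruction writes anything we read."""
    for insn in pipeline:
        if insn.write_gpr is not None and insn.write_gpr in read_gprs:
            return False   # GPR read-after-write hazard: stall
        if reads_cr and insn.writes_cr:
            return False   # CR treated as one entity: any CR writer stalls
    return True
```

With r3 being written further down the pipe, an instruction reading r3 must wait, while one reading only r4 and r5 can go. Treating the CR as a single unit is coarser than tracking each field, but it needs only one comparison per in-flight instruction.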
Anton Blanchard
e4c98dce36 Merge pull request #100 from antonblanchard/gpr-hazard-5-a
Separate issue control into its own unit
2019-10-15 09:02:56 +11:00
Anton Blanchard
f181bf31e2 Merge pull request #99 from paulusmack/logical
Logical
2019-10-14 13:14:04 +11:00
Anton Blanchard
d5346d0abf Separate issue control into its own unit
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2019-10-14 13:05:30 +11:00