1
0
mirror of https://github.com/antonblanchard/microwatt.git synced 2026-01-11 23:43:15 +00:00

1129 Commits

Author SHA1 Message Date
Paul Mackerras
3510071d9a Add a second execute stage to the pipeline
This adds a second execute stage to the pipeline, in order to match up
the length of the pipeline through loadstore and dcache with the
length through execute1.  This will ultimately enable us to get rid of
the 1-cycle bubble that we currently have when issuing ALU
instructions after one or more LSU instructions.

Most ALU instructions execute in the first stage, except for
count-zeroes and popcount instructions (which take two cycles and do
some of their work in the second stage) and mfspr/mtspr to "slow" SPRs
(TB, DEC, PVR, LOGA/LOGD, CFAR).  Multiply and divide/mod instructions
take several cycles but the instruction stays in the first stage (ex1)
and ex1.busy is asserted until the operation is complete.

There is currently a bypass from the first stage but not the second
stage.  Performance is down somewhat because of that and because this
doesn't yet eliminate the bubble between LSU and ALU instructions.

The forwarding of XER common bits has been changed somewhat because
now there is another pipeline stage between ex1 and the committed
state in cr_file.  The simplest thing for now is to record the last
value written and use that, unless there has been a flush, in which
case the committed state (obtained via e_in.xerc) is used.

Note that this fixes what was previously a benign bug in control.vhdl,
where it was possible for control to forget an instructions dependency
on a value from a previous instruction (a GPR or the CR) if this
instruction writes the value and the instruction gets to the point
where it could issue but is blocked by the busy signal from execute1.
In that situation, control may incorrectly not indicate that a bypass
should be used.  That didn't matter previously because, for ALU and
FPU instructions, there was only one previous instruction in flight
and once the current instruction could issue, the previous instruction
was completing and the correct value would be obtained from
register_file or cr_file.  For loadstore instructions there could be
two being executed, but because there are no bypass paths, failing to
indicate use of a bypass path is fine.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2022-07-22 22:19:05 +10:00
Paul Mackerras
521a5403a9 execute1: Rename 'r' to 'ex1'
Maybe this will give us slightly better names in critical path reports
and the like.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2022-07-22 12:10:21 +10:00
Paul Mackerras
813e2317bf execute1: Restructure to separate out execution of side effects
We now have a record that represents the actions taken in executing an
instruction, and a process that computes that for the incoming
instruction.  We no longer have 'current' or 'r.cur_instr', instead
things like the destination register are put into r.e in the first
cycle of an instruction and not reinitialized in subsequent busy
cycles.

For mfspr and mtspr, we now decode "slow" SPR numbers (those SPRs that
are not stored in the register file) to a new "spr_selector" record
in decode1 (excluding those in the loadstore unit).  With this, the
result for mfspr is determined in the data path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2022-07-22 12:10:06 +10:00
Paul Mackerras
204fedc63f Move XER low bits out of register file
Besides the overflow and status carry bits, XER has 18 bits which need
to retain the value written by mtxer (in case software wants to
emulate the move-assist instructions (lswi, lswx, stswi, stswx).
Until now these bits (and others) have been stored in the GPR file as
a "fast" SPR, but this causes complications because XER is not really
a fast SPR.

Instead, we now store these 18 bits in the 'ctrl' signal, which exists
in execute1.  This will enable us to simplify the data path in future,
and has the added bonus that with a little bit of plumbing, we can get
the full XER value printed when dumping registers at the end of a
simulation.

Therefore this changes scripts/run_test.sh to remove the greps which
exclude XER from the comparison of actual and expected register
results.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2022-06-29 20:02:36 +10:00
Paul Mackerras
bdd4d04162 Simplify flow control in the dcache and loadstore units
Simplify the flow control by stalling the whole upstream pipeline when
a stage can't proceed, instead of trying to let each stage progress
independently when it can.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2022-06-29 10:55:03 +10:00
Paul Mackerras
35e0dbed34
Merge pull request #353 from tianrui-wei/master
fix: fix icache_tb not finishing correctly
2022-06-17 09:46:57 +10:00
Michael Neuling
cd52390bf1
Merge pull request #373 from antonblanchard/icache-insn-u-state
icache: Don't output X on i_out.insn
2022-06-17 09:13:49 +10:00
Michael Neuling
b983d5080e
Merge pull request #376 from antonblanchard/loadstore-init
loadstore1: reduce U state being output
2022-06-16 16:47:33 +10:00
Michael Neuling
d4db331467
Merge pull request #374 from antonblanchard/icache-unused-sig
core: Remove unused icache_inv signal
2022-06-16 16:45:41 +10:00
Michael Neuling
ee5e3778ed
Merge pull request #364 from shenki/readme-updates
Readme updates
2022-06-16 14:38:12 +10:00
Michael Neuling
c43692f4c7
Merge pull request #372 from antonblanchard/dcache-unused-sig
dcache: remove unused do_write signal
2022-06-16 14:36:50 +10:00
Michael Neuling
956df2c863
Merge pull request #371 from antonblanchard/unused-sig
execute1: sub_mux_sel and result_mux_sel are unused
2022-06-16 14:35:10 +10:00
Michael Neuling
3627f102db
Merge pull request #370 from antonblanchard/divider-init
divider: Fix d_out.overflow U state issue
2022-06-16 14:33:45 +10:00
Paul Mackerras
6e1e763c02
Merge pull request #368 from antonblanchard/icache-pmu-events
icache: Hook up PMU events
2022-06-15 11:02:58 +10:00
Anton Blanchard
1047239a37
Merge pull request #377 from antonblanchard/fpu-init
fpu: Reduce uninitialised signals
2022-06-14 18:10:37 +10:00
Anton Blanchard
9d35340bb1 fpu: Reduce uninitialised signals
Reduce uninitialised signals coming out of the FPU.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-14 15:14:19 +10:00
Michael Neuling
b82eea5933
Merge pull request #366 from antonblanchard/hello-world-bss
Zero BSS in hello world test
2022-06-14 13:09:57 +10:00
Anton Blanchard
d3aff67fa7
Merge pull request #375 from antonblanchard/core_debug-init
core_debug: Initialise gspr_index
2022-06-13 07:15:55 +10:00
Anton Blanchard
b47b71821e loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 22:15:11 +10:00
Anton Blanchard
71d4b5ed20 core_debug: Initialise gspr_index
Another case of U state being driven out of a module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 21:49:13 +10:00
Anton Blanchard
a527d9b959 core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 21:04:16 +10:00
Anton Blanchard
e7f0a7c7ac icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_iout.valid. Play it safe and never output X state.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 11:42:32 +10:00
Anton Blanchard
39220be311 dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 11:39:31 +10:00
Anton Blanchard
843361f2be execute1: sub_mux_sel and result_mux_sel are unused
Remove them.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 10:51:56 +10:00
Anton Blanchard
d3a7517318 divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as remove
some U state across interfaces.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 10:34:20 +10:00
Anton Blanchard
1ff852b012
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
2022-06-12 10:24:54 +10:00
Anton Blanchard
e2438071a1 loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store completes.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 09:34:22 +10:00
Anton Blanchard
b7c4d3c5c3
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
2022-06-12 09:32:59 +10:00
Anton Blanchard
f06abb67ad icache: Hook up PMU events
We weren't connecting the icache PMU events up.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-12 09:21:56 +10:00
Anton Blanchard
64d2def0c6 fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-10 08:10:27 +10:00
Anton Blanchard
ff442d1bdb Zero BSS in hello world test
While trying to reduce U/X state issues, I notice that our BSS is not
being initialised in the hello world test.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-08 15:20:07 +10:00
Anton Blanchard
b8fc5636a4
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
2022-06-08 14:54:48 +10:00
Anton Blanchard
ebdddcc402 Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-07 20:01:14 +10:00
Anton Blanchard
a750365ffa Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-06-07 17:38:24 +10:00
Joel Stanley
9ec22af256 README: Add Linux on Microwatt instructions
These instructions are similar to those at

 https://ozlabs.org/~joel/microwatt/README

except they describe how to build the artifacts from scratch instead of
downloading them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-06-07 13:26:59 +09:30
Joel Stanley
a31725d989 README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when seetting up fusesoc.

Also remove the text about a working directory; fusesoc doesn't need
one.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-06-07 12:48:42 +09:30
Michael Neuling
f5e06c2d4b
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
2022-03-22 11:55:54 +11:00
Anton Blanchard
948f6f43a7 Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-03-22 09:35:17 +11:00
Michael Neuling
8bf48ac094
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
2022-03-18 18:28:34 +11:00
Anton Blanchard
b5accb78b2 wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.

Every other user of log2ceil() gets this right.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-03-17 18:10:36 +11:00
Michael Neuling
30fd936c12
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
2022-03-16 10:49:47 +11:00
Michael Neuling
af1b76d944
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
2022-03-16 10:49:29 +11:00
Michael Neuling
9b96ab730c
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
2022-03-16 10:48:59 +11:00
Anton Blanchard
0b39947f8d Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-03-15 18:27:48 +11:00
Anton Blanchard
00bf0af21c xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-03-15 16:04:18 +11:00
Anton Blanchard
50b4cb9423 fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2022-03-15 16:03:34 +11:00
Tianrui Wei
844ca0e6b5
fix: fix icache_tb not finishing correctly
Setting icache to be privileged and accessing physical memory directly.
And set big_endian to 0 to correspond to the testbench result.

Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
2022-03-01 23:54:24 +08:00
Michael Neuling
f01f3d233a
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
2022-02-28 08:17:50 +11:00
Matt Johnston
c0c00d05bc mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`

Fixes #351

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2022-02-25 17:43:28 +08:00
Michael Neuling
ffcdaaa92d
Update the README Issues (#350)
We've had these for a while now:
 - D/I cache
 - GPR bypassing
 - Supervisor state (and can boot linux)

We still need Vector/VMX/VSX (and probably some other things)

Signed-off-by: Michael Neuling <mikey@neuling.org>
2022-02-25 13:18:38 +11:00