antonblanchard.microwatt

mirror of https://github.com/antonblanchard/microwatt.git synced 2026-04-10 22:31:45 +00:00

Author	SHA1	Message	Date
Paul Mackerras	e08ca4ab8e	countzero: Add a register to help make timing This adds a register in the middle of the countzero computation, so that we now have two cycles to count leading or trailing zeroes instead of just one. Execute1 now outputs a one-cycle stall signal when it encounters a cntlz* or cnttz* instruction. With this, the countzero path no longer fails timing on the Artix-7 at 100MHz. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 22:45:05 +11:00
Paul Mackerras	5422007f83	Plumb loadstore1 input from execute1 not decode2 This allows us to use the bypass at the input of execute1 for the address and data operands for loadstore1. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 22:44:59 +11:00
Paul Mackerras	b14d982011	execute: Implement bypass from output of execute1 to input This enables back-to-back execution of integer instructions where the first instruction writes a GPR and the second reads the same GPR. This is done with a set of multiplexers at the start of execute1 which enable any of the three input operands to be taken from the output of execute1 (i.e. r.e.write_data) rather than the input from decode2 (i.e. e_in.read_data[123]). This also requires changes to the hazard detection and handling. Decode2 generates a signal indicating that the GPR being written is available for bypass, which is true for instructions that are executed in execute1 (rather than loadstore1/dcache). The gpr_hazard module stores this "bypassable" bit, and if the same GPR needs to be read by a subsequent instruction, it outputs a "use_bypass" signal rather than generating a stall. The use_bypass signal is then latched at the output of decode2 and passed down to execute1 to control the input multiplexer. At the moment there is no bypass on the inputs to loadstore1, but that is OK because all load and store instructions are marked as single-issue. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 22:42:50 +11:00
Paul Mackerras	0c714f1be6	execute: Move popcnt and prty instructions into the logical unit This implements logic in the logical entity to calculate the results of the popcnt* and prty* instructions. We now have one insn_type_t value for the 3 popcnt variants and one for the two prty variants, using the length field of the decode_rom_t to distinguish between them. The implementations in logical.vhdl use recursive algorithms rather than the simple functions in ppc_fx_insns.vhdl. This gives a saving of about 140 slice LUTs on the A7-100 and improves timing slightly. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 22:40:39 +11:00
Paul Mackerras	d2ca625b3b	execute: Do comparisons using the main adder This handles OP_CMP like a subtraction; the main adder computes ~RA + RB + 1, and the condition codes are computed from the results. A direct comparison of the two input operands is used to calculate the EQ bit of the condition result. The LT and GT bits are computed from the MSB of the subtraction result, the carry out from the subtraction, and the MSBs of the operands. For a 32-bit comparison, the 32-bit carry and bit 31 of the result and input operands are used; for a 64-bit comparison, the 64-bit carry and bit 63 of the operands and result are used. It turns out to be more convenient to use the 'signed' field of the decode table to distinguish signed from unsigned comparisons, rather than the insn_type. Therefore this uses OP_CMP for both cmp and cmpl, which also has the benefit of reducing the number of values in insn_type_t. Doing this saves over 200 slice LUTs on the Arty A7-100 and improves timing slightly as well. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 22:29:07 +11:00
Paul Mackerras	d956846667	execute1: Move EXTS* instruction back into execute1 This moves the sign extension done by the extsb, extsh and extsw instructions back into execute1. This means that we no longer need any data formatting in writeback for results coming from execute1, so this modifies writeback so the data formatter inputs come directly from the loadstore unit output. The condition code updates for RC=1 form instructions are now done on the value from execute1 rather than the output of the data formatter, which should help timing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 17:20:52 +11:00
Paul Mackerras	c9a2076dd3	execute1: Remember dest GPR, RC, OE, XER for slow operations For multiply and divide operations, execute1 now records the destination GPR number, RC and OE from the instruction, and the XER value. This means that the multiply and divide units don't need to record those values and then send them back to execute1. This makes the interface to those units a bit simpler. They simply report an overflow signal along with the result value, and execute1 takes care of updating XER if necessary. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 17:20:52 +11:00
Paul Mackerras	39d18d2738	Make divider hang off the side of execute1 With this, the divider is a unit that execute1 sends operands to and which sends its results back to execute1, which then sends them to writeback. Execute1 now sends a stall signal when it gets a divide or modulus instruction until it gets a valid signal back from the divider. Divide and modulus instructions are no longer marked as single-issue. The data formatting step that used to be done in decode2 for div and mod instructions is now done in execute1. We also do the absolute value operation in that same cycle instead of taking an extra cycle inside the divider for signed operations with a negative operand. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 17:20:52 +11:00
Paul Mackerras	2167186b5f	Make multiplier hang off the side of execute1 With this, the multiplier isn't a separate pipe that decode2 issues instructions to, but rather is a unit that execute1 sends operands to and which sends the result back to execute1, which then sends it to writeback. Execute1 now sends a stall signal when it gets a multiply instruction until it gets a valid signal back from the multiplier. This all means that we no longer need to mark the multiply instructions as single-issue. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-01-14 17:20:52 +11:00
Anton Blanchard	969245e379	Merge pull request #133 from antonblanchard/ghdl-synth Ghdl synth	2020-01-11 22:40:44 +11:00
Anton Blanchard	729a35967a	Merge pull request #132 from antonblanchard/bin2hex-move Move bin2hex.py to scripts/	2020-01-11 22:07:42 +11:00
Anton Blanchard	9362f2dd10	Move bin2hex.py to scripts/ Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 21:31:48 +11:00
Anton Blanchard	f1d0382587	Fix a ghdlsynth issue in fast_spr_num I've submitted a bug report for this, but we can work around it easily for now. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 17:13:23 +11:00
Anton Blanchard	dcee60a729	Fix a ghdlsynth issue in icache ghdlsynth doesn't like the debug statement, so wrap it in a generate. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:51:11 +11:00
Anton Blanchard	3ad3e2abfd	Removed unused core_terminated signal Right now it's unused. We can add it back when we add an LED to signify the core has terminated. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:43:50 +11:00
Anton Blanchard	14c5cf3b83	Fix some ghdlsynth issues with fpga_bram Use to_integer() instead of conv_integer(). Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:34:25 +11:00
Anton Blanchard	b0212b0bf9	Fix ghdlsynth issue in register file We need to drive sim_dump_done to keep ghdlsynth happy. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:29:39 +11:00
Anton Blanchard	f37ef56d79	Remove unused signal Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:28:20 +11:00
Anton Blanchard	25968951e4	Fix a ghdlsynth inferred latch error in writeback Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 14:20:35 +11:00
Anton Blanchard	ad3db18dce	Fix a ghdlsynth inferred latch error in execute It should never happen in practice, but ghdlsynth is complaining about an inferred latch here. Fix it. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 13:24:14 +11:00
Anton Blanchard	0a6fd0adb5	Merge pull request #131 from antonblanchard/new-tests Dump CTR, LR and CR on sim termination, and update our tests	2020-01-11 12:32:57 +11:00
Anton Blanchard	cc8a9e7893	Upper 32 bits of XER should read as 0s From the architecture: bits 0:31 and 35:43 are treated as reserved and return 0s when read using mfxer Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 12:16:21 +11:00
Anton Blanchard	467630573c	Dump CTR, LR and CR on sim termination, and update our tests Right now our test cases fold the SPRs into the GPRs. That makes debugging fails more difficult than it needs to be, so print out the CTR, LR and CR. We still need to print the XER, but that is in two spots in microwatt and will take some more work. This also adds many instructions to the tests that we have added lately including overflow instructions, CR logicals and mt/mfxer. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 11:09:38 +11:00
Anton Blanchard	115d63eaf3	Merge pull request #127 from tomtor/CR-PR Implement CRNOR and friends	2020-01-11 10:10:13 +11:00
Anton Blanchard	320fc88d56	Merge pull request #130 from antonblanchard/build-fix control: Fix build issue with Fedora 31 version of GHDL	2020-01-11 09:02:43 +11:00
Anton Blanchard	72aac38581	Merge pull request #129 from antonblanchard/update-micropython Point to upstream micropython	2020-01-11 07:35:37 +11:00
Anton Blanchard	1aec1a4b0e	Point to upstream micropython Our changes are now merged upstream, so point there instead. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2020-01-11 07:20:21 +11:00
Tom Vijlbrief	c05441bf47	Implement CRNOR and friends Signed-off-by: Tom Vijlbrief <tvijlbrief@gmail.com>	2020-01-06 11:09:14 +01:00
Anton Blanchard	9a67e3b4fe	Merge pull request #126 from sharkcz/docs document packaged fusesoc for Fedora users	2020-01-04 17:05:37 +11:00
Dan Horák	f552021d19	document packaged fusesoc for Fedora users Signed-off-by: Dan Horák <dan@danny.cz>	2020-01-03 15:11:51 +01:00
Anton Blanchard	1c05f330c6	control: Fix build issue with Fedora 31 version of GHDL I'm hitting an issue with the Fedora 31 version of GHDL that appears to be fixed upstream: control.vhdl:105:39:error: actual expression must be globally static Add a signal to get rid of error. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2019-12-11 12:02:06 +11:00
Anton Blanchard	1a826f077b	Merge pull request #122 from paulusmack/benh-sprs Benh sprs	2019-12-09 22:36:29 +11:00
Anton Blanchard	f5ca58b3c4	Merge pull request #123 from antonblanchard/spi-conf Add SPI configuration to Xilinx constraint files	2019-12-09 20:35:24 +11:00
Anton Blanchard	20674e0d65	Add SPI configuration to Xilinx constraint files Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	2019-12-09 16:12:37 +11:00
Paul Mackerras	23ade0b1c3	decode2: Minor cleanup Remove unused variable is_reg in decode_input_reg_a. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:30:44 +11:00
Benjamin Herrenschmidt	e4f475e17f	sprs: Store common SPRs in register file This stores the most common SPRs in the register file. This includes CTR and LR and a not yet final list of others. The register file is set to 64 entries for now. Specific types are defined that can represent a GPR index (gpr_index_t) or a GPR/SPR index (gspr_index_t) along with conversion functions between the two. In order to deal with some forms of branch updating both LR and CTR, we introduced a delayed update of LR after a branch link. Note: We currently stall the pipeline on such a delayed branch, but we could avoid stalling fetch in that specific case as we know we have a branch delay. We could also limit that to the specific case where we need to update both CTR and LR. This allows us to make bcreg, mtspr and mfspr pipelined. decode1 will automatically force the single issue flag on mfspr/mtspr to a "slow" SPR. [paulus@ozlabs.org - fix direction of decode2.stall_in] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:30:40 +11:00
Benjamin Herrenschmidt	afdd593502	spr: Add translation from SPR to special GPR number We will want to store some SPRs in the register file using a set of "extra" registers. This provides a function for doing the translation along with some SPR definitions. This isn't used yet Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:30:36 +11:00
Paul Mackerras	5a0458dec1	divider: Fix overflow calculation We were signalling overflow when neg_result=1 but the result was zero. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:30:07 +11:00
Paul Mackerras	d04887fdcd	decode1: Add OE=1 forms of add/sub, mul and div instructions Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:29:52 +11:00
Paul Mackerras	ec9b27660f	execute: Copy XER[SO] to CR for cmp[i] and cmpl[i] instructions We were copying in XER[SO] for the dot-form instructions but not the explicit compare instructions. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:29:17 +11:00
Benjamin Herrenschmidt	501b6daf9b	Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates a type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in later commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all types of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exists any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remains single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-07 15:27:53 +11:00
Benjamin Herrenschmidt	f291efa266	decode1: Mark ALU ops using carry as pipelined There is no reason not to that I can think of Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-06 08:53:39 +11:00
Benjamin Herrenschmidt	1249a11349	cr_file: Check write_cr_enable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2019-12-06 08:53:13 +11:00
Anton Blanchard	ac7df6fc04	Merge pull request #120 from antonblanchard/spr-decode-cleanup spr: Cleanup decoding of SPR numbers	2019-11-18 14:07:16 +11:00
Anton Blanchard	726e4db66a	Merge pull request #119 from antonblanchard/reduce-pipe-depth control: Reduce pipeline depth to 1	2019-11-18 14:05:48 +11:00
Anton Blanchard	9b1394e236	Merge pull request #118 from antonblanchard/bus-pipeline Bus pipeline	2019-11-15 16:02:57 +11:00
Benjamin Herrenschmidt	98bd8b73c0	control: Reduce pipeline depth to 1 To match our one stage execute. This might change back if we end up adding 2 stages to match the LSU, but in that case we'll want forwards as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2019-11-14 15:10:10 +11:00
Benjamin Herrenschmidt	83a8bb0238	spr: Cleanup decoding of SPR numbers Use a function to obtain the integer number and use constants with the architected numbers. Replace std_match with a case statement. This also has the side effect of returning 0 instead of some random previous result on mfspr of an unknown SPR. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2019-11-14 15:09:37 +11:00
Benjamin Herrenschmidt	cff4b13a9b	wb_arbiter: Early master selection This flips the arbiter muxes on the same cycle as a new request comes in, thus avoiding a cycle latency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2019-10-31 19:57:06 +11:00
Benjamin Herrenschmidt	bc2acfde2f	wb_arbiter: Make arbiter size parametric Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2019-10-30 13:18:58 +11:00
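
Note on commit d2ca625b3b ("execute: Do comparisons using the main adder"): the scheme described in that message can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions, not the VHDL from execute1. The function name compare_via_adder and its parameters are hypothetical, and the exact way LT and GT are derived from the carry and MSBs is one possible choice consistent with the message, not necessarily the one the commit uses.

```python
def compare_via_adder(ra, rb, width=64, signed=True):
    """Model a compare of RA against RB using one adder pass.

    Returns (lt, gt, eq). Hypothetical helper, not taken from microwatt.
    """
    mask = (1 << width) - 1
    ra &= mask
    rb &= mask

    # The adder computes ~RA + RB + 1, i.e. RB - RA, keeping the carry out.
    total = (~ra & mask) + rb + 1
    carry = (total >> width) & 1               # 1 when RB >= RA (unsigned)
    result_msb = (total >> (width - 1)) & 1    # sign bit of the difference
    msb_a = ra >> (width - 1)
    msb_b = rb >> (width - 1)

    # EQ comes from a direct comparison of the operands.
    eq = ra == rb

    if not signed:
        # Unsigned: a borrow (carry == 0) means RA > RB.
        gt = carry == 0
    elif msb_a == msb_b:
        # Same sign: the subtraction cannot overflow, so a negative
        # difference RB - RA means RA > RB.
        gt = result_msb == 1
    else:
        # Different signs: RA > RB exactly when RB is the negative one.
        gt = msb_b == 1

    lt = not gt and not eq
    return lt, gt, eq
```

For a 32-bit comparison, calling the function with width=32 plays the role of using the 32-bit carry and bit 31 of the operands and result.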

383 Commits