There are four different jumping techniques possible with the two unconditional jump instructions in the RISC-V Base ISA:

  • short jumps: relative (6f/jal)
  • far jumps: absolute (37/lui + 67/jalr)
  • far jumps: relative (17/auipc + 67/jalr)
  • "zero page" jumps: (67/jalr 0/rs/x0)

short jumps: relative (6f/jal)

@TODO: will probably need to explain "0th bit assumed 0" up here already

far jumps: absolute (37/lui + 67/jalr)

@TODO: will need to explain slices up here already

far jumps: relative (17/auipc + 67/jalr)

Consider the following scenario, where execution begins at main/0x13000 and, after performing some other work (two noops), we want to jump to target/0x02080:

== code (0x02080)
target: # we want to jump here
  # infinite loop
  6f/jal 0/rd/x0 target/off20

== code (0x13000)
  13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
  13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop

  # now we are at address 0x13008
  # and we want to jump up to "target"

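The offset is 0x02080 - 0x13008 = -0x10f88. Strictly speaking, 6f/jal could still reach this: its 20-bit immediate (0th bit assumed 0) covers a signed 21-bit range of about ±1 MiB. The numbers are kept small so they are easy to follow by hand; the same steps apply to any target beyond the reach of the short-jump instruction.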

Since instructions are 32 bits wide, we cannot fit the whole jump offset, which is itself 32 bits wide, into a single instruction. Instead we have to split it up into two parts and load it using two instructions:

MSB                             LSB
  SHHHHHHHHHHHHHHHHHHHH____________  upper 20 bits
  S____________________LLLLLLLLLLL0  sign bit + lower 12 bits (0th bit assumed 0)

The two parts are both signed numbers that when added together sum up to the complete offset we want to jump. In the case given above we can split the number like so: -0x10f88 = (-0x10000) + (-0xf88).
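
The split can be checked with a few lines of Python. This is only a sketch of the arithmetic in the text, not of how SubV computes it: the helper name split_offset is made up for illustration, and it splits by magnitude (truncating toward zero) so that both parts keep the sign of the offset, as in the example above.

```python
def split_offset(offset):
    """Split a signed offset into (upper, lower) parts as done in the text.

    Hypothetical helper: the upper part is a multiple of 0x1000, the lower
    part is the remainder, and both carry the sign of the original offset.
    """
    sign = -1 if offset < 0 else 1
    magnitude = abs(offset)
    upper = sign * (magnitude & ~0xFFF)   # everything above the low 12 bits
    lower = sign * (magnitude & 0xFFF)    # the low 12 bits
    return upper, lower


offset = 0x02080 - 0x13008
upper, lower = split_offset(offset)
print(hex(offset), hex(upper), hex(lower))  # -0x10f88 -0x10000 -0xf88
assert upper + lower == offset
```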

The first instruction is 17/auipc (Add Upper Immediate (to) PC), which takes the upper 20 bits of our offset and adds them to the program counter register (pc). By adding to pc, 17/auipc already turns our relative offset literal into an absolute address. The result of 17/auipc is stored in a register of our choice; we will use t0 (register x5) here.

We also have to trim off the lower 12 bits of our first part, -0x10000, to get it to fit into the 20-bit immediate argument: -0x10000 >> 12 becomes -0x10.

17/auipc 5/rd/t0 -0x10/off20

This will store pc + (-0x10 << 12) = 0x13008 - 0x10000 = 0x3008 into the register. At this point the remaining offset is 0x2080 - 0x3008 = -0xf88 - exactly the second part of the offset we have yet to add.

We can add this offset to the register value and jump at once with the 67/jalr instruction, which takes a register and an immediate value, adds them, and jumps to the resulting address.

We will use the prepared address in our chosen register t0 and our remaining offset -0xf88 as the immediate value. However, the instruction only has a 12-bit immediate field, whereas we need to pass our lower 12 bits and a sign bit, a total of 13 bits! Similar to 6f/jal, 67/jalr accounts for this by having us drop off the lowest bit of our offset, which is always going to be 0 anyway, since instructions have to be aligned to 16 bits. Therefore our immediate becomes -0xf88 >> 1 = -0x7c4:

67/jalr 0/rd/x0 0/func 5/rs/t0 -0x7c4/off12

note: the rd output register is used to save the return address when calling functions; it is set to x0 here since we are not using it.

note: the 0/func particle is a fixed part of the 67/jalr instruction that is always 0.

note: for convenience, SubV also allows passing a 13-bit offset and will trim off the excess lowest bit (after validating that it is in fact zero), so we could have also written the following:

67/jalr 0/rd/x0 0/func 5/rs/t0 -0xf88/off13

This will now add our register value to our literal (shifted back up to the 13-bit range we intended) and jump to the resulting address: t0 + (-0x7c4 << 1) = 0x3008 - 0xf88 = 0x2080 - exactly where we intended to go :)
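
To double-check the whole sequence, the two steps can be replayed in Python. The sketch below follows the description in this text only (17/auipc adds its immediate shifted left by 12 to pc, 67/jalr adds its immediate shifted left by 1 to the register); the function names are illustrative and not part of SubV.

```python
def auipc(pc, imm20):
    """Value left in rd by 17/auipc as described above: pc + (imm20 << 12)."""
    return pc + (imm20 << 12)


def jalr_target(rs_value, imm12):
    """Jump destination of 67/jalr as described above: rs + (imm12 << 1)."""
    return rs_value + (imm12 << 1)


pc = 0x13008                      # address of the 17/auipc instruction
t0 = auipc(pc, -0x10)             # 17/auipc 5/rd/t0 -0x10/off20
dest = jalr_target(t0, -0x7c4)    # 67/jalr 0/rd/x0 0/func 5/rs/t0 -0x7c4/off12
print(hex(t0), hex(dest))         # 0x3008 0x2080
```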

label slicing

Obviously, redoing all this arithmetic whenever the addresses change is not something we want to do; we would rather use labels. The SubV syntax affords us some convenience here, so let's work our way from the most to the least explicit form:

  17/auipc 5/rd/t0 target/off32/[31:12]
  67/jalr 0/rd/x0 0/func 5/rs/t0 target+4/off32/[1:11:1]

In this most explicit variant, both offsets are specified as 32-bit (off32). This is important because SubV will verify that the given offset fits in the given field width. If we had used off12 for the 67/jalr immediate, the offset value would not fit into that range and an error would occur.

Instead, we use the slice syntax [H:L]/[S:H:L] to specify which bits we want to extract from each of the 32-bit values to form our shorter immediates.

The H and L indices specify the range of bits to slice out. Both limits are inclusive (the highest bit taken is H and the lowest is L), so the slice [15:8] has a size of 1 + H - L = 8 and includes the bits with indices 15, 14, …, 9, 8 of the original value.

When S is specified as 1, the sign bit (the highest bit) of the original value is also copied as the highest bit of the resulting value. This takes up an extra bit in the result value, so the size also increases by one in this case; [1:14:8] has a size of 8 (S + 1 + H - L = 1 + 1 + 14 - 8) and includes the bits 31, 14, 13, …, 9, 8 for a 32-bit input.

When S is not specified, it defaults to 0 (no sign bit).

Using this slice syntax, we can slice the labels as required:

  • target/off32/[31:12] is a 20-bit slice
  • target+4/off32/[1:11:1] is a 12-bit slice including the original sign bit, the low bits 11 through 1, dropping the lowest bit.
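
The slice rules can be expressed as a small function. This is a sketch of the semantics described above, not SubV's implementation: slice_bits and its parameter names are hypothetical, the input is treated as a raw 32-bit pattern (how SubV represents negative values is an assumption here), and the result is returned as a raw bit pattern together with its size in bits.

```python
def slice_bits(value, high, low, sign=0, width=32):
    """Return (bits, size) for the slice [high:low] or [sign:high:low].

    Hypothetical helper: `value` is reduced to a raw `width`-bit pattern
    (two's complement for negative inputs, which is an assumption).
    """
    pattern = value & ((1 << width) - 1)
    size = 1 + high - low
    bits = (pattern >> low) & ((1 << size) - 1)
    if sign:
        # Copy the highest bit of the original value above the sliced bits.
        top = (pattern >> (width - 1)) & 1
        bits |= top << size
        size += 1
    return bits, size


print(slice_bits(0x12345678, 15, 8))          # (86, 8): bits 15..8 are 0x56
print(slice_bits(0x80004500, 14, 8, sign=1))  # (197, 8): sign bit + 0x45
```

The sizes of the two label slices fall out of the same function: [31:12] yields 20 bits and [1:11:1] yields 12 bits.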

label offsets

You may have noticed the +4 literal offset that is added to the label address. Since our "jump" consists of two instructions, the offset to the target label is different when calculated relative to the first instruction and to the second instruction respectively.

However when building our two offset values (the top 20 and the lower 12 bits), we need to use the offset relative to the 17/auipc instruction both times. This is because the final address is calculated as (hi + pc) + lo, where the (hi + pc) part is calculated by 17/auipc, and the (…) + lo part is done in 67/jalr.

Using the example above, if we left off the +4 offset, our two parts would be:

# 17/auipc at addr 0x13008
= (0x02080 - 0x13008)[31:12]
= (-0x10f88)[31:12]
= -0x10

# 67/jalr at addr 0x1300c
= (0x02080 - 0x1300c)[1:11:1]
= (-0x10f8c)[1:11:1]
= -0xf8c >> 1

If we walk through the instructions, we now get (0x13008 - (0x10 << 12)) - 0xf8c as our jump destination, which resolves to 0x207c - four bytes short of the target! The four bytes are the address difference between the 17/auipc and 67/jalr instructions. By adding 4 to the label address in the 67/jalr immediate, we can cancel out this difference:

# 67/jalr at addr 0x1300c
= (0x02080 + 4 - 0x1300c)[1:11:1]
= (-0x10f88)[1:11:1]
= -0xf88 >> 1

NOTE: While this is probably very rare, you could have an offset other than +4 if there are other instructions between 17/auipc and 67/jalr:

  17/auipc 5/rd/t0 target/off32/[31:12]
  13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
  13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
  67/jalr 0/rd/x0 0/func 5/rs/t0 target+12/off32/[1:11:1]

Here there are three 32-bit instructions between the start of 17/auipc and 67/jalr, hence a 3 * 4 = 12 byte offset.
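
The effect of the label offset can be reproduced numerically. The sketch below reuses the arithmetic from the walkthrough (upper part taken relative to 17/auipc, lower part taken relative to 67/jalr plus a correction) and is an illustration only: landing_address and its parameters are invented names, and the parts are split by magnitude as in the worked example.

```python
def landing_address(target, auipc_addr, jalr_addr, correction):
    """Address reached when the lower part is built from target+correction.

    Hypothetical helper following the text: the final address is
    (hi + pc) + lo, with pc being the address of the 17/auipc instruction.
    """
    def parts(offset):
        sign = -1 if offset < 0 else 1
        mag = abs(offset)
        return sign * (mag & ~0xFFF), sign * (mag & 0xFFF)

    hi, _ = parts(target - auipc_addr)               # 17/auipc immediate
    _, lo = parts(target + correction - jalr_addr)   # 67/jalr immediate
    return auipc_addr + hi + lo


target, auipc_addr = 0x02080, 0x13008
# 67/jalr directly after 17/auipc: a correction of 4 lands on target.
print(hex(landing_address(target, auipc_addr, auipc_addr + 4, 4)))    # 0x2080
# Two nops in between: 67/jalr is 12 bytes after 17/auipc.
print(hex(landing_address(target, auipc_addr, auipc_addr + 12, 12)))  # 0x2080
# Leaving the correction off misses the target by the distance between
# the two instructions.
print(hex(landing_address(target, auipc_addr, auipc_addr + 4, 0)))
```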

"zero page" jumps: (67/jalr 0/rs/x0)

@TODO: Need to test and understand this properly first