jumping
=======

There are four different jumping techniques possible with the two
unconditional jump instructions in the RISC-V Base ISA:

-  short jumps: relative (``6f/jal``)
-  far jumps: absolute (``37/lui`` + ``67/jalr``)
-  far jumps: relative (``17/auipc`` + ``67/jalr``)
-  “zero page” jumps: (``67/jalr 0/rs/x0``)

short jumps: relative (6f/jal)
------------------------------

.. CAUTION::  will probably need to explain "0th bit assumed 0" up here already

far jumps: absolute (37/lui + 67/jalr)
--------------------------------------

.. CAUTION:: will need to explain slices up here already

far jumps: relative (17/auipc + 67/jalr)
----------------------------------------

Consider the following scenario, where execution begins at
``main``/``0x13000`` and after performing some other work (two no-ops) we
want to jump to ``target``/``0x02080``::

   == code (0x02080)
   target: # we want to jump here
     # infinite loop
     6f/jal 0/rd/x0 target/off20

   == code (0x13000)
   main:
     13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
     13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop

     # now we are at address 0x13008
     # and we want to jump up to "target"

The offset is ``0x02080 - 0x13008`` = ``-0x10f88``, which is outside of
the 20-bit range of the short-jump instruction ``6f/jal``.

Since instructions are 32 bits wide, we cannot fit the whole jump offset,
which is already 32 bits wide on its own, into an instruction. Instead we
have to split it up into two parts, and load it using two instructions::


   MSB                             LSB
     SHHHHHHHHHHHHHHHHHHHH____________  upper 20 bits
     S____________________LLLLLLLLLLL0  sign bit + lower 12 bits (0th bit assumed 0)

The two parts are both signed numbers that when added together sum up to
the complete offset we want to jump. In the case given above we can
split the number like so: ``-0x10f88`` = ``(-0x10000) + (-0xf88)``.
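
The split can be checked with a few lines of Python. This is only a
sketch of the arithmetic used in this example: the helper name
``split_offset`` is made up for illustration, and it assumes that both
parts keep the sign of the original offset, as in the split shown above.

.. code-block:: python

   def split_offset(offset):
       """Split a byte offset into (upper, lower) parts that sum to it.

       Assumption: both parts carry the sign of the offset, i.e. the
       magnitude is split at bit 12, as in the worked example.
       """
       sign = -1 if offset < 0 else 1
       magnitude = abs(offset)
       upper = sign * (magnitude & ~0xFFF)  # magnitude bits 12 and above
       lower = sign * (magnitude & 0xFFF)   # magnitude bits 11..0
       return upper, lower

   offset = 0x02080 - 0x13008
   upper, lower = split_offset(offset)
   print(hex(offset), hex(upper), hex(lower))  # prints -0x10f88 -0x10000 -0xf88

Whatever the offset, the two parts returned by this helper add back up
to the original value.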

The first instruction is ``17/auipc`` (**A**\ dd **U**\ pper
**I**\ mmediate (to) **PC**), which takes the upper 20 bits of our
offset and adds them to the program counter register (pc). By adding to
PC, ``17/auipc`` already turns our relative offset literal into an
absolute address. The result of ``17/auipc`` is stored in a register of
our choice; we will use ``t0`` (register ``x5``) here.

We also have to trim off the lower 12 bits of our first part
``-0x10000`` to get it to fit into the 20-bit immediate argument:
``-0x10000 >> 12`` becomes ``-0x10``::

   17/auipc 5/rd/t0 -0x10/off20

This will store ``pc + (-0x10 << 12)`` = ``0x13008 - 0x10000`` =
``0x3008`` into the register. At this point the remaining offset is
``0x2080 - 0x3008 = -0xf88`` - exactly the second part of the offset we
have yet to add.

We can add this offset to the register value and jump at once with the
``67/jalr`` instruction, which takes a register and an immediate value,
adds them, and jumps to the resulting address.

We will use the prepared address in our chosen register ``t0`` and our
remaining offset ``-0xf88`` as the immediate value. However, the
instruction only has a 12-bit immediate field, whereas we need to pass
our lower 12 bits *and* a sign bit, a total of 13 bits! Similar to
``6f/jal``, ``67/jalr`` accounts for this by having us drop off the
lowest bit of our offset, which is always going to be 0 anyway, since
instructions have to be aligned to 16 bits. Therefore our immediate
becomes ``-0xf88 >> 1 = -0x7c4``::

   67/jalr 0/rd/x0 0/func 5/rs/t0 -0x7c4/off12

.. NOTE::
   The ``rd`` output register is used to save the return
   address when calling functions; it is set to ``x0`` here since we
   are not using it. The ``0/func`` particle is a fixed part of the
   ``67/jalr`` instruction that is always ``0``.

.. NOTE::
   For convenience, SubV also allows passing a 13-bit offset
   and will trim off the excess lowest bit (after validating that it is
   in fact zero), so we could have also written the following::

      67/jalr 0/rd/x0 0/func 5/rs/t0 -0xf88/off13

This will now add our register value to our literal (shifted back up to
the 13-bit range we intended) and jump to the resulting address:
``t0 + (-0x7c4 << 1) = 0x3008 - 0xf88 = 0x2080`` - exactly where we
intended to go :)
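
The whole walkthrough can be replayed in Python. This is a sketch of
the arithmetic exactly as described in this section, not an emulator or
an encoder; the function names ``auipc``, ``trim_off13`` and
``jalr_target`` are hypothetical helpers introduced only for this
example.

.. code-block:: python

   def auipc(pc, imm20):
       """Value stored in rd: pc plus the 20-bit immediate shifted up by 12."""
       return pc + (imm20 << 12)

   def trim_off13(off13):
       """Turn a 13-bit offset into the 12-bit immediate, as described in the
       note above: check that the lowest bit is zero, then drop it."""
       if off13 & 1:
           raise ValueError("lowest bit of a 13-bit offset must be 0")
       return off13 >> 1

   def jalr_target(rs_value, off12):
       """Jump target as described above: register value plus the
       immediate shifted back up by one bit."""
       return rs_value + (off12 << 1)

   pc = 0x13008                    # address of the 17/auipc instruction
   t0 = auipc(pc, -0x10)           # 0x13008 - 0x10000
   off12 = trim_off13(-0xf88)      # -0x7c4
   destination = jalr_target(t0, off12)
   print(hex(t0), hex(off12), hex(destination))  # prints 0x3008 -0x7c4 0x2080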

label slicing
~~~~~~~~~~~~~

Obviously, redoing all this arithmetic whenever the addresses change is
not something we want to do; we would rather use labels. The SubV syntax
affords us some convenience here, so let's work our way from the most to
the least explicit form:

::

     17/auipc 5/rd/t0 target/off32/[31:12]
     67/jalr 0/rd/x0 0/func 5/rs/t0 target+4/off32/[1:11:1]

In this most explicit variant, both offsets are specified as 32-bit
(``off32``). This is important because SubV will verify that the given
offset fits in the given field width. If we had used ``off12`` for the
``67/jalr`` immediate, the offset value would not fit into that range
and an error would occur.

Instead, we use the slice syntax ``[H:L]``/``[S:H:L]`` to specify which
bits we want to extract from each of the 32-bit values to form our
shorter immediates.

The ``H`` and ``L`` indices specify the range of bits to slice out. Both
limits are inclusive (the highest bit taken is ``H`` and the lowest is
``L``), so the slice ``[15:8]`` has a size of ``1 + H - L = 8`` and
includes the bits with indices ``15, 14, …, 9, 8`` of the original
value.

When ``S`` is specified as ``1``, the sign bit (the highest bit) of the
original value is also copied as the highest bit of the resulting value.
This takes up an extra bit in the result value, so the size also
increases by one in this case; ``[1:14:8]`` has a size of 8
(``S + 1 + H - L = 1 + 1 + 14 - 8``) and includes the bits
``31, 14, 13, …, 9, 8`` for a 32-bit input.

When ``S`` is not specified, it defaults to ``0`` (no sign bit).

Using this slice syntax, we can slice the labels as required:

-  ``target/off32/[31:12]`` is a 20-bit slice
-  ``target+4/off32/[1:11:1]`` is a 12-bit slice including the original
   sign bit, the low bits 11 through 1, dropping the lowest bit.
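
The size and bit-selection rules above can be written down as a small
Python helper. This is a sketch for checking slice sizes by hand, not
SubV's own implementation; the name ``slice_info`` and its parameters
are assumptions made for this example, and it only reports *which* bits
a slice takes, not the resulting value.

.. code-block:: python

   def slice_info(high, low, sign=0, width=32):
       """Return (size, bit_indices) for a [S:H:L] slice of a width-bit value.

       bit_indices lists the source bits from the highest result bit down.
       """
       if not (0 <= low <= high < width):
           raise ValueError("expected 0 <= L <= H < width")
       size = sign + 1 + high - low
       bits = list(range(high, low - 1, -1))  # H, H-1, ..., L (inclusive)
       if sign:
           bits.insert(0, width - 1)          # sign bit becomes the top bit
       return size, bits

   print(slice_info(15, 8))             # [15:8]
   print(slice_info(14, 8, sign=1))     # [1:14:8]
   print(slice_info(31, 12)[0])         # size of [31:12], prints 20
   print(slice_info(11, 1, sign=1)[0])  # size of [1:11:1], prints 12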

label offsets
~~~~~~~~~~~~~

You may have noticed the ``+4`` literal offset that is added to the
label address. Since our “jump” consists of two instructions, the offset
to the target label is different when calculated relative to the first
instruction and to the second instruction respectively.

However when building our two offset values (the top 20 and the lower 12
bits), we need to use the offset relative to the ``17/auipc``
instruction both times. This is because the final address is calculated
as ``(hi + pc) + lo``, where the ``(hi + pc)`` part is calculated by
``17/auipc``, and the ``(…) + lo`` part is done in ``67/jalr``.

Using the example above, if we left off the ``+4`` offset, our two parts
would be::

   # 17/auipc at addr 0x13008
     target/off32/[31:12]
   = (0x02080 - 0x13008)[31:12]
   = (-0x10f88)[31:12]
   = -0x10

   # 67/jalr at addr 0x1300c
     target/off32/[1:11:1]
   = (0x02080 - 0x1300c)[1:11:1]
   = (-0x10f8c)[1:11:1]
   = -0xf8c >> 1

If we walk through the instructions, we now get
``(0x13008 - (0x10 << 12)) - 0xf8c`` as our jump destination, which
resolves to ``0x207c`` - four bytes too far back! The four bytes extra
are the address difference between the ``17/auipc`` and ``67/jalr``
instructions. By adding ``4`` to the label address in the ``67/jalr``
immediate, we can cancel out this difference::

   # 67/jalr at addr 0x1300c
     target+4/off32/[1:11:1]
   = (0x02080 + 4 - 0x1300c)[1:11:1]
   = (-0x10f88)[1:11:1]
   = -0xf88 >> 1

.. NOTE::
   While this is probably very rare, you could have an offset
   other than ``+4`` if there are other instructions between
   ``17/auipc`` and ``67/jalr``::

        17/auipc 5/rd/t0 target/off32/[31:12]
        13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
        13/opi 0/subop/add 0/rd/x0 0/rs/x0 0/imm12 # nop
        67/jalr 0/rd/x0 0/func 5/rs/t0 target+12/off32/[1:11:1]

   Here there are three 32-bit instructions between the start of
   ``17/auipc`` and ``67/jalr``, hence a ``3 * 4 = 12`` byte offset.
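
The rule behind the literal offset can be sketched in Python: the value
to add to the label is simply the distance between the two
instructions. The helper names ``label_offset`` and ``required_addend``
are hypothetical and only model the ``label+N/off32`` arithmetic
described above.

.. code-block:: python

   def label_offset(label_addr, addend, insn_addr):
       """Offset of label+addend as seen from the instruction at insn_addr."""
       return label_addr + addend - insn_addr

   def required_addend(auipc_addr, jalr_addr):
       """Literal offset to add to the label in the 67/jalr immediate."""
       return jalr_addr - auipc_addr

   target = 0x02080
   auipc_addr = 0x13008

   # 67/jalr directly after 17/auipc, then with two nops in between
   for jalr_addr in (auipc_addr + 4, auipc_addr + 12):
       addend = required_addend(auipc_addr, jalr_addr)
       hi_offset = label_offset(target, 0, auipc_addr)
       lo_offset = label_offset(target, addend, jalr_addr)
       # both instructions now slice the same 32-bit offset
       print(addend, hex(hi_offset), hex(lo_offset))

With the addend chosen this way, both slices are taken from the same
offset (``-0x10f88`` in this example), no matter how many instructions
sit between ``17/auipc`` and ``67/jalr``.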

"zero page" jumps: (67/jalr 0/rs/x0)
------------------------------------

.. CAUTION::  Need to test and understand this properly first