From: Conor Dooley
Subject: Re: [PATCH v3 3/7] hw/riscv/microchip_pfsoc.c: add an Icicle Kit fdt address function
Date: Fri, 20 Jan 2023 00:15:18 +0000

Hey Daniel,

Got through the stuff I wanted to get done tonight faster than
expected...

On Thu, Jan 19, 2023 at 05:17:33PM -0300, Daniel Henrique Barboza wrote:
> Are you testing it by using the command line
> you mentioned in the "qemu icicle kit es" thread?
> 
> $(QEMU)/qemu-system-riscv64 \
>       -M microchip-icicle-kit \
>       -m 2G -smp 5 \
>       -kernel $(vmlinux_bin) \
>       -dtb $(devkit).dtb \
>       -initrd $(initramfs) \
>       -display none \
>       -serial null \
>       -serial stdio

Yah, effectively. It's not quite that, but near enough as makes no real
difference:
qemu-icicle:
        $(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
                -m 2G -smp 5 \
                -kernel $(vmlinux_bin) \
                -dtb $(wrkdir)/riscvpc.dtb \
                -initrd $(initramfs) \
                -display none -serial null \
                -serial stdio \
                -D qemu.log -d unimp

I just tried to make things somewhat more intelligible for that thread.

Also in case it is not obvious, I do work for Microchip. As I mentioned
to Alistair at LPC, I/we don't have the cycles at the moment to do
anything with QEMU, so the bits of fixes I have sent are things I fixed
while debugging other issues etc, mostly in the evenings.

Anyway, I'll attempt to explain what the craic is here...

On Thu, Jan 19, 2023 at 04:17:24PM -0300, Daniel Henrique Barboza wrote:
> The Icicle Kit board works with 2 distinct RAM banks that are separated

Ehh, 2 isn't really true. There are 6 possible "windows" into the DDR on
MPFS, listed here with their start addresses.

32-bit cached     0x0080000000
64-bit cached     0x1000000000
32-bit non-cached 0x00c0000000
64-bit non-cached 0x1400000000
32-bit WCB        0x00d0000000
64-bit WCB        0x1800000000

These are the "bus" addresses, where the harts think the memory is, but
the memory is not actually connected there. There are some runtime
configurable registers which determine what addresses these correspond
to in the DDR itself.
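
To illustrate the windowing (this is just a conceptual sketch of my own,
not the actual MPFS seg register encoding, and the names below are made
up), the original configuration described just below could be modelled
roughly like:

#include <stdint.h>

/* Each window has a fixed bus-side base and a runtime-programmable
 * offset into the DDR - the latter is what the seg registers control. */
typedef struct {
    uint64_t bus_base;   /* where the harts see the window   */
    uint64_t ddr_offset; /* where it lands in the DDR itself */
} ddr_window;

/* 32-bit cached -> 0x0 in DDR, 64-bit cached -> 0x3000_0000 in DDR */
static const ddr_window windows[] = {
    { 0x0080000000ULL, 0x00000000ULL },
    { 0x1000000000ULL, 0x30000000ULL },
};

static uint64_t bus_to_ddr(const ddr_window *w, uint64_t bus_addr)
{
    return (bus_addr - w->bus_base) + w->ddr_offset;
}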

When the QEMU port for MPFS was written, only two of these were in use,
the 32-bit and 64-bit cached regions. The config (seg) registers
were set up so that the 32-bit cached region pointed to 0x0 in DDR and
the 64-bit region pointed to 0x3000_0000 in DDR.
⢰⠒⠒⠒⠒⡖⠒⠒⠒⣶⠒0x80000000
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸⡖⠒⠒⢲⡇   ⡇ 0x40000000
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⠓⠒⠒⠒⠃ ⡇ <-- 64-bit starts here
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇ <-- 32-bit starts at 0x0
⠘⠓⠒0⠚⠒⠒1⠒⠒⠒0x00000000

(These diagrams are a bit crap, I'm copy-pasting them from a TUI tool I
made for myself for visualising these. The ~s can be ignored.
https://github.com/ConchuOD/memory-aperature-configurator)

> by a gap. We have a lower bank with 1GiB size, a gap follows,
> then at 64GiB the high memory starts.

As you correctly pointed out, that lower region is in fact 1 GiB & hence
there is actually an overlapping region of 256 MiB: the 64-bit window
points at 0x3000_0000 (768 MiB) in the DDR, so the last 256 MiB of the
1 GiB low window alias the first 256 MiB of the high one.

The Devicetree at this point in time looked like:
        ddrc_cache_lo: memory@80000000 {
                device_type = "memory";
                reg = <0x0 0x80000000 0x0 0x30000000>;
                clocks = <&clkcfg CLK_DDRC>;
                status = "okay";
        };

        ddrc_cache_hi: memory@1000000000 {
                device_type = "memory";
                reg = <0x10 0x0 0x0 0x40000000>;
                clocks = <&clkcfg CLK_DDRC>;
                status = "okay";
        };

At some point, it was decided that instead we would use a configuration
with ~no memory at 32-bit addresses. I think it was this one here:

⢰⡖⠒⠒⢲⡖⠒⠒⠒⣶⠒0x80000000
⢸⡇  ⢸⡇   ⣿ ⡇         
⢸⠓⠒⠒⠚⡇   ⡟ ⡇ <-- 32-bit starts here
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ 0x40000000
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇ <-- 64-bit starts at 0x0
⠘⠒⠒0⠒⠓⠒1⠒⠓⠒0x00000000

Because of how these windows work, the 32-bit cached region was always
there, just not used, as the Devicetree became:
        ddrc_cache: memory@1000000000 {
                device_type = "memory";
                reg = <0x10 0x0 0x0 0x76000000>;
                status = "okay";
        };

The remaining bit of memory is being used for some WCB buffers etc &
not for the OS itself. This was never upstreamed anywhere AFAIK as it
was a workaround.

The current Devicetree in Linux & U-Boot corresponds to a configuration
like:
⢰⠒⠒⠒⠒⡖⠒⠒⠒⣶⠒0x80000000
⢸    ⡇   ⣿ ⡇         
⢸    ⡇   ⡟ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸⡖⠒⠒⢲⡇   ⡇ 0x40000000
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇ <-- 32- & 64-bit start at 0x0
⠘⠓⠒0⠚⠓⠒1⠒⠓⠒0x00000000

That DT looks like:
        ddrc_cache_lo: memory@80000000 {
                device_type = "memory";
                reg = <0x0 0x80000000 0x0 0x40000000>;
                status = "okay";
        };

        ddrc_cache_hi: memory@1040000000 {
                device_type = "memory";
                reg = <0x10 0x40000000 0x0 0x40000000>;
                status = "okay";
        };

Each of these changes came as part of an FPGA reference design change &
a corresponding compatible change. I believe rtlv2203 was the second
configuration & rtlv2210 the third.

I can't boot the current configuration in QEMU, probably due to some of
the things you point out below.
To get it working, I remove the ddrc_cache_hi from my DT and boot with
the 32-bit cached memory only.
This is what the current changes have broken for me.

IMO it is a perfectly valid thing to boot a system using less than the
memory it *can* use.

I guess you read the other thread in which I stated that the documented
HSS boot doesn't work with recent HSSes. Ideally, and I am most
certainly _not_ expecting anyone to do this, when the HSS writes the
"seg" registers during boot to configure the memory layout as per the
FPGA bitstream, QEMU would configure the memory layout it is emulating
to match.
Since direct kernel boot is a thing too, I was thinking that for that
mode, the config in the dtb should probably be used.
I don't know enough about QEMU to know if this is even possible!
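
For what it's worth, parsing that out of the dtb with libfdt looks
doable to me - a rough, untested sketch (plain libfdt, not actual QEMU
code, and it assumes two address cells and two size cells like the DTs
above):

#include <libfdt.h>
#include <inttypes.h>
#include <stdio.h>

/* Walk every node with device_type = "memory" in a dtb blob and print
 * the (base, size) pairs a machine model would have to cover. */
static void list_memory_banks(const void *fdt)
{
    int node = -1;

    while ((node = fdt_node_offset_by_prop_value(fdt, node, "device_type",
                                                 "memory",
                                                 sizeof("memory"))) >= 0) {
        int len;
        const fdt64_t *reg = fdt_getprop(fdt, node, "reg", &len);

        /* each (base, size) pair is 2 x 64-bit big-endian cells = 16 bytes */
        for (int i = 0; reg && (i + 1) * 16 <= len; i++) {
            printf("memory bank: base 0x%" PRIx64 ", size 0x%" PRIx64 "\n",
                   fdt64_to_cpu(reg[2 * i]),
                   fdt64_to_cpu(reg[2 * i + 1]));
        }
    }
}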

The other possibility I was thinking of was just relaxing the DDR limit
entirely (and ignoring the overlaying) so that QEMU thinks there is 1
GiB at 0x8000_0000 and 16 GiB at 0x10_0000_0000.
Again, I've not had the cycles to look into any of this at all nor am I
expecting anyone else to - just while I am already typing about this
stuff there's no harm in broadcasting the other thoughts I had.
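
To put a rough shape on the "relax it entirely" idea (very much not a
real patch - just an illustration reusing the memmap entry names from
your diff, with a made-up array name):

/* Only the DRAM entries shown; the guest would simply see 1 GiB through
 * the 32-bit window and 16 GiB through the 64-bit one, ignoring the
 * overlaying completely. */
static const MemMapEntry microchip_pfsoc_memmap_relaxed[] = {
    [MICROCHIP_PFSOC_DRAM_LO] = { 0x80000000,      0x40000000 },     /*  1 GiB */
    [MICROCHIP_PFSOC_DRAM_HI] = { 0x1000000000ULL, 0x400000000ULL }, /* 16 GiB */
};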

> MachineClass::default_ram_size is set to 1.5Gb and machine_init() is
> enforcing it as minimal RAM size, meaning that there we'll always have

I don't think that this is right - more on that below.

> at least 512 MiB in the Hi RAM area, and that the FDT will be located
> there all the time.

All the time? That's odd.
I suppose my kernel then remaps the dtb into the memory range it can
access, and therefore things keep ticking.

I don't think that machine_init() should be enforcing a minimum ram size
of 1.5 GiB - although maybe Bin Meng has a reason for that that I don't
understand.

> riscv_compute_fdt_addr() can't handle this setup because it assumes that
> the RAM is always contiguous. It's also returning an uint32_t because
> it's enforcing that fdt address is sitting on an area that is addressable
> to 32 bit CPUs, but 32 bits won't be enough to point to the Hi area of
> the Icicle Kit RAM (and to its FDT itself).
> 
> Create a new function called microchip_compute_fdt_addr() that is able
> to deal with all these details that are particular to the Icicle Kit.
> Ditch riscv_compute_fdt_addr() and use it instead.
> 
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/microchip_pfsoc.c | 46 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index dcdbc2cac3..9b829e4d1a 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -54,6 +54,8 @@
>  #include "sysemu/device_tree.h"
>  #include "sysemu/sysemu.h"
>  
> +#include <libfdt.h>
> +
>  /*
>   * The BIOS image used by this machine is called Hart Software Services (HSS).
>   * See https://github.com/polarfire-soc/hart-software-services
> @@ -513,6 +515,46 @@ static void microchip_pfsoc_soc_register_types(void)
>  
>  type_init(microchip_pfsoc_soc_register_types)
>  
> +static hwaddr microchip_compute_fdt_addr(MachineState *ms)
> +{
> +    const MemMapEntry *memmap = microchip_pfsoc_memmap;
> +    hwaddr mem_low_size = memmap[MICROCHIP_PFSOC_DRAM_LO].size;
> +    hwaddr mem_high_size, fdt_base;
> +    int ret = fdt_pack(ms->fdt);
> +    int fdtsize;
> +
> +    /* Should only fail if we've built a corrupted tree */
> +    g_assert(ret == 0);
> +
> +    fdtsize = fdt_totalsize(ms->fdt);
> +    if (fdtsize <= 0) {
> +        error_report("invalid device-tree");
> +        exit(1);
> +    }
> +
> +    /*
> +     * microchip_icicle_kit_machine_init() does a validation
> +     * that guarantees that ms->ram_size is always greater
> +     * than mem_low_size and that mem_high_size will be
> +     * at least 512MiB.

Again, I don't think it should be doing this at all. I see the comment
saying that size relates to DDR training, but given the overlaying of
memory it's entirely possible to train against 64-bit addresses and then
boot a kernel using only low memory addresses.
Perhaps enforcing it makes sense by default & for booting via the
bootloader, but I don't think it makes sense when the bootloader is not
involved.

If a dtb is used as the source for the memory layout, requiring memory
at high addresses doesn't make sense to me. I have no idea if there is a
mechanism for figuring that out though, nor am I au fait with how these
memory sizes are calculated.
It is getting kinda late here, so I am sending this without having
investigated any of the detail, sorry.

Hopefully that wasn't too deranged and you can at least understand why I
have been doing what I have...

Thanks,
Conor.

> +     *
> +     * This also means that our fdt_addr will be based
> +     * on the starting address of the HI DRAM block.
> +     */
> +    mem_high_size = ms->ram_size - mem_low_size;
> +    fdt_base = memmap[MICROCHIP_PFSOC_DRAM_HI].base;
> +
> +    /*
> +     * In theory we could copy riscv_compute_fdt_addr()
> +     * and put the FDT capped at maximum 3Gb from fdt_base,
> +     * but fdt_base is set at 0x1000000000 (64GiB). We
> +     * make the assumption here that the OS is ready to
> +     * handle the FDT, 2MB aligned, at the very end of
> +     * the available RAM.
> +     */
> +    return QEMU_ALIGN_DOWN(fdt_base + mem_high_size - fdtsize, 2 * MiB);
> +}
> +
>  static void microchip_icicle_kit_machine_init(MachineState *machine)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> @@ -640,9 +682,7 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
>                                      "bootargs", machine->kernel_cmdline);
>          }
>  
> -        /* Compute the fdt load address in dram */
> -        fdt_load_addr = riscv_compute_fdt_addr(memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> -                                              machine->ram_size, machine->fdt);
> +        fdt_load_addr = microchip_compute_fdt_addr(machine);
>          riscv_load_fdt(fdt_load_addr, machine->fdt);
>  
>          /* Load the reset vector */
> -- 
> 2.39.0
> 
> 
> 
