qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for


From: Cédric Le Goater
Subject: Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
Date: Wed, 28 Apr 2021 15:54:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 4/28/21 3:12 PM, Bin Meng wrote:
> Hi Cédric,
> 
> On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>>
>> Hello,
>>
>> On 4/27/21 10:54 AM, Francisco Iglesias wrote:
>>> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>>>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>
>>>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Bin,
>>>>>>>
>>>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Dear Bin,
>>>>>>>>>
>>>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>
>>>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many 
>>>>>>>>>>>>>>>>>> follow-up
>>>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a 
>>>>>>>>>>>>>>>>>> command. For
>>>>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte 
>>>>>>>>>>>>>>>>>> address or
>>>>>>>>>>>>>>>>>> 4-byte address is needed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are 
>>>>>>>>>>>>>>>>>> required after
>>>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be 
>>>>>>>>>>>>>>>>>> counted
>>>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is 
>>>>>>>>>>>>>>>>>> in byte.
>>>>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the 
>>>>>>>>>>>>>>>>>> model has
>>>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. 
>>>>>>>>>>>>>>>>>> The right
>>>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes 
>>>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast 
>>>>>>>>>>>>>>>>>> Read Quad
>>>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 
>>>>>>>>>>>>>>>>>> * 4 / 8).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While not being the original implementor I must assume that 
>>>>>>>>>>>>>>>>> above solution was
>>>>>>>>>>>>>>>>> considered but not chosen by the developers due to it is 
>>>>>>>>>>>>>>>>> inaccuracy (it
>>>>>>>>>>>>>>>>> wouldn't be possible to model exacly 6 dummy cycles, only a 
>>>>>>>>>>>>>>>>> multiple of 8,
>>>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to 
>>>>>>>>>>>>>>>>> generate 7 the error
>>>>>>>>>>>>>>>>> wouldn't be caught and the controller will still be 
>>>>>>>>>>>>>>>>> considered "correct"). Now
>>>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor 
>>>>>>>>>>>>>>>>> of keeping it, this
>>>>>>>>>>>>>>>>> also because the detail is already in use for catching 
>>>>>>>>>>>>>>>>> exactly above error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I found no clue from the commit message that my proposed 
>>>>>>>>>>>>>>>> solution here
>>>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models 
>>>>>>>>>>>>>>>> supporting
>>>>>>>>>>>>>>>> software generation should have been found out seriously 
>>>>>>>>>>>>>>>> broken long
>>>>>>>>>>>>>>>> time ago!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The controllers you are referring to might lack support for 
>>>>>>>>>>>>>>> commands requiring
>>>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other 
>>>>>>>>>>>>>>> commands? If so I
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something 
>>>>>>>>>>>>>> special
>>>>>>>>>>>>>> that needs some special support from the SPI controller. For the 
>>>>>>>>>>>>>> case
>>>>>>>>>>>>>> 1 controller, it's nothing special from the controller 
>>>>>>>>>>>>>> perspective,
>>>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
>>>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and 
>>>>>>>>>>>>>> that's it.
>>>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either 
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, 
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For 
>>>>>>>>>>>>> that we also
>>>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that 
>>>>>>>>>>>>> runs on a HW
>>>>>>>>>>>>> board supported in QEMU should ideally run on that board inside 
>>>>>>>>>>>>> QEMU aswell
>>>>>>>>>>>>> (this can be a bare metal application equaly well as a modified 
>>>>>>>>>>>>> u-boot/Linux
>>>>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock 
>>>>>>>>>>>>> cycles).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy 
>>>>>>>>>>>>> to know which
>>>>>>>>>>>>> intentional or untentional features provided by the functionality 
>>>>>>>>>>>>> are being
>>>>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm 
>>>>>>>>>>>>> aware of that
>>>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle 
>>>>>>>>>>>>> modeling inside
>>>>>>>>>>>>> m25p80 is the be ability to test drivers accurately regarding the 
>>>>>>>>>>>>> dummy clock
>>>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number 
>>>>>>>>>>>>> of dummy clock
>>>>>>>>>>>>> cycles), but there might be others aswell. So by removing this 
>>>>>>>>>>>>> functionality
>>>>>>>>>>>>> above use case will brake, this since those test will not be 
>>>>>>>>>>>>> reliable.
>>>>>>>>>>>>> Furthermore, since users tend to be creative it is not possible 
>>>>>>>>>>>>> to know if
>>>>>>>>>>>>> there are other use cases that will be affected. This means that 
>>>>>>>>>>>>> in case [1]
>>>>>>>>>>>>> needs to be followed the safe path is to add functionality 
>>>>>>>>>>>>> instead of removing.
>>>>>>>>>>>>> Luckily it also easier in this case, see below.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use 
>>>>>>>>>>>> an
>>>>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
>>>>>>>>>>>> about model behavior changes, sure I can update
>>>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>>>>
>>>>>>>>>>> Yes, something like that. My concern is that since this 
>>>>>>>>>>> functionality has been
>>>>>>>>>>> in tree for while, users have found known or unknown features that 
>>>>>>>>>>> got
>>>>>>>>>>> introduced by it. By removing the functionality (and the 
>>>>>>>>>>> known/uknown features)
>>>>>>>>>>> we are riscing to brake our user's use cases (currently I'm aware 
>>>>>>>>>>> of one
>>>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] 
>>>>>>>>>>> states that
>>>>>>>>>>> "In general features are intended to be supported indefinitely once 
>>>>>>>>>>> introduced
>>>>>>>>>>> into QEMU", to me that makes very much sense because the opposite 
>>>>>>>>>>> would mean
>>>>>>>>>>> that we were not reliable. So in case [1] needs to be honored it 
>>>>>>>>>>> looks to be
>>>>>>>>>>> safer to add functionality instead of removing (and riscing the 
>>>>>>>>>>> removal of use
>>>>>>>>>>> cases/features). Luckily I still believe in this case that it will 
>>>>>>>>>>> be easier to
>>>>>>>>>>> go forward (even if I also agree on what you are saying below about 
>>>>>>>>>>> what I
>>>>>>>>>>> proposed).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>>>> implementation forever? I think that's why
>>>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
>>>>>>>>>> feature.
>>>>>>>>>
>>>>>>>>> With the RFC I posted all commands in m25p80 are working for both the 
>>>>>>>>> case 1
>>>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as 
>>>>>>>>> GQSPI).
>>>>>>>>> Because of this, I, with all respect, will have to disagree that this 
>>>>>>>>> is buggy.
>>>>>>>>
>>>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>>>> accuracy for those flashes prevents all SPI controllers that use tx
>>>>>>>> fifo to work with those flashes. Hence it is buggy.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and 
>>>>>>>>>>>>>>> else we should
>>>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the 
>>>>>>>>>>>>>>> lack of support
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I called it "seriously broken" because current implementation 
>>>>>>>>>>>>>> only
>>>>>>>>>>>>>> considered one type of SPI controllers while completely ignoring 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> other type.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we change view and see this from the perspective of m25p80, it 
>>>>>>>>>>>>> models the
>>>>>>>>>>>>> commands a certain way and provides an API that the SPI 
>>>>>>>>>>>>> controllers need to
>>>>>>>>>>>>> implement for interacting with it. It is true that there are SPI 
>>>>>>>>>>>>> controllers
>>>>>>>>>>>>> referred to above that do not support the portion of that API 
>>>>>>>>>>>>> that corresponds
>>>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true 
>>>>>>>>>>>>> that this is
>>>>>>>>>>>>> broken since there is also one SPI controller that has a working 
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>> of m25p80's full API also when transfering through a tx fifo (use 
>>>>>>>>>>>>> case 1). But
>>>>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to 
>>>>>>>>>>>>> m25p80's API
>>>>>>>>>>>>> and allow for toggling the accuracy from dummy clock cycles to 
>>>>>>>>>>>>> dummy bytes [1]
>>>>>>>>>>>>> will still be honored as in the same time making it possible to 
>>>>>>>>>>>>> have full
>>>>>>>>>>>>> support for the API in the SPI controllers that currently do not 
>>>>>>>>>>>>> (please reread
>>>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I 
>>>>>>>>>>>>> myself see this
>>>>>>>>>>>>> as win/win situation, also because no controller should need 
>>>>>>>>>>>>> modifications.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate 
>>>>>>>>>>>> dummy
>>>>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify 
>>>>>>>>>>>> as
>>>>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled 
>>>>>>>>>>>> to
>>>>>>>>>>>> how the SPI controller works.
>>>>>>>>>>>
>>>>>>>>>>> I agree on above. I decided though that instead of posting sample 
>>>>>>>>>>> code in here
>>>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. 
>>>>>>>>>>> About below,
>>>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modication in a first step.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Wait, (see below)
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports 
>>>>>>>>>>>> both
>>>>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
>>>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>>>> given in:
>>>>>>>>>>>>
>>>>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O 
>>>>>>>>>>>> Read
>>>>>>>>>>>> Command (EBh)
>>>>>>>>>>>>
>>>>>>>>>>>> in 
>>>>>>>>>>>> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> If you choose to set the m25p80 device property 
>>>>>>>>>>>> 'model_dummy_bytes' to
>>>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound 
>>>>>>>>>>>> to
>>>>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy 
>>>>>>>>>>>> cycles,
>>>>>>>>>>>> and this is wrong.
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
>>>>>>>>>> your proposal cannot support the complicated controller like Xilinx
>>>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
>>>>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy 
>>>>>>>>>> cycle
>>>>>>>>>> generation, which is wrong.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers 
>>>>>>>>> from 2
>>>>>>>>> locations in the file, in 1 location the transfer referred to above 
>>>>>>>>> is done, in
>>>>>>>>> another location the transfer through the txfifo is done. The 
>>>>>>>>> location where
>>>>>>>>> transfer referred to above is done will not need any modifications 
>>>>>>>>> (and will
>>>>>>>>> thus work equally well as it does currently).
>>>>>>>>
>>>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>>>> using hardware dummy cycle generation?
>>>>>>>
>>>>>>> Sorry, I missunderstod. You are right, that won't work.
>>>>>>
>>>>>> +Edgar E. Iglesias
>>>>>>
>>>>>> So it looks by far the only way to implement dummy cycles correctly to
>>>>>> work with all SPI controller models is what I proposed here in this
>>>>>> patch series.
>>>>>>
>>>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>>>
>>>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>>>> please share your thoughts since you are the one who reviewed the
>>>>>> existing dummy implementation (based on commits history)
>>>>
>>>> I agree with Edgar, in that Francisco and Bin know this better than me
>>>> and that modelling things in cycles is a pain.
>>>
>>> Hi Alistair,
>>>
>>>>
>>>> As Bin points out it seems like currently we should be modelling bytes
>>>> (from the variable name) so it makes sense to keep it in bytes. I
>>>> would be in favour of this series in that case. Do we know what use
>>>> cases this will break? I know it's hard to answer but I don't think
>>>> there are too many SSI users in QEMU so it might not be too hard to
>>>> test most of the possible use cases.
>>>
>>> The use case I'm aware of is regression testing of drivers. Ex: if a
>>> driver is using 10 dummy clock cycles with the commands and a patch
>>> accidentaly changes the driver to use 11 dummy clock cycles QEMU currently
>>> finds the problem, that won't be possible with this series. It's difficult
>>> to say but it is not impossible there are other use cases also.
>>
>>
>> It was breaking the Aspeed machines :
>>
>>   
>> https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/
> 
> Yes, as I mentioned in the series the modification was based on a pure
> guess from existing QEMU codes as I could not find a datasheet of the
> Aspeed SPI controller on the internet. Do you know if this is publicly
> available?

It is not but much of the register bitfields are described in the code.
I should be able to help you in making this work.

Thanks,

C. 


>> QEMU 6.1 should have acceptance tests that will help in detecting
>> regressions in this area.
>>
> 
> Regards,
> Bin
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]