qemu-devel

Re: [RFC PATCH v1 0/8] qapi: add generator for Golang interface


From: Markus Armbruster
Subject: Re: [RFC PATCH v1 0/8] qapi: add generator for Golang interface
Date: Tue, 03 May 2022 09:57:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Andrea Bolognani <abologna@redhat.com> writes:

> On Mon, May 02, 2022 at 01:46:23PM +0200, Markus Armbruster wrote:
>> Andrea Bolognani <abologna@redhat.com> writes:
>> >> > The wire protocol would still retain the unappealing name, but at
>> >> > least client libraries could hide the ugliness from users.
>> >>
>> >> At the price of mild inconsistency between the library interface and
>> >> QMP.
>> >
>> > That's fine, and in fact it already happens all the time when QAPI
>> > names (log-append) are translated to C identifiers (log_append).
>>
>> There's a difference between trivial translations like "replace '-' by
>> '_'" and arbitrary replacement like the one for enumeration constants
>> involving 'prefix'.
>
> Fair enough.
>
> I still feel that 1) users of a language SDK will ideally not need to
> look at the QAPI schema or wire chatter too often

I think the most likely point of contact is the QEMU QMP Reference
Manual.

>                                                          even when
> that ends up being necessary, figuring out that LogAppend and
> logappend are the same thing is not going to be an unreasonable
> hurdle, especially when the status quo would be to work with
> Logappend instead.

Yes, it's "mild inconsistency", hardly an unreasonable hurdle.  I think
it gets in the way mostly when searching documentation.  Differences in
case are mostly harmless, just use case-insensitive search.  Use of '_'
vs '-' would also be harmless (just do the replacement), if the use of
'-' in the schema was consistent.  Sadly, it's not, and that's already a
perennial low-level annoyance.

My point is: a name override feature like the one you propose needs to
be used with discipline and restraint.  Adds to reviewers' mental load.
Needs to be worth it.  I'm not saying it isn't, I'm just pointing out a
cost.

>> > The point is that, if we want to provide a language interface that
>> > feels natural, we need a way to mark parts of a QAPI symbol's name in
>> > a way that makes it possible for the generator to know they're
>> > acronyms and treat them in an appropriate, language-specific manner.
>>
>> It's not just acronyms.  Consider IAmALittleTeapot.  If you can assume
>> that only beginning of words are capitalized, even for acronyms, you can
>> split this into words without trouble.  You can't recover correct case,
>> though: "i am a little teapot" is wrong.
>
> Is there any scenario in which we would care though? We're in the
> business of translating identifiers from one machine representation
> to another, so once it has been split up correctly into the words
> that compose it (which in your example above it has) then we don't
> really care about anything else unless acronyms are involved.
>
> In other words, we can obtain the list of words "i am a little
> teapot" programmatically both from IAmALittleTeapot and
> i-am-a-little-teapot, and in both cases we can then generate
> IAmALittleTeapot or I_AM_A_LITTLE_TEAPOT or i_am_a_little_teapot or
> whatever is appropriate for the context and target language, but the
> fact that in a proper English sentence "I" would have to be
> capitalized doesn't really enter the picture.

My point is that conversion from CamelCase has two sub-problems:
splitting words and recovering case.  Splitting words is easy when
exactly the beginning of words is capitalized.  Recovering case is
guesswork.  Most English words are all lower case, but some start with a
capital letter, and acronyms are all caps.

Wild idea: assume all lower case, but keep a list of exceptions.
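A minimal sketch of that idea, in Python only for illustration.  The
helper names and the contents of the exception list are made up for this
example, not taken from the QAPI generator.  It assumes exactly the
beginning of each word is capitalized:

```python
import re

# Hypothetical exception list: words whose correct spelling is not all
# lower case.  Real entries would have to come from the schema.
CASE_EXCEPTIONS = {"i": "I", "vnc": "VNC", "qmp": "QMP"}


def split_camel(name):
    """Split a CamelCase name into lower-case words.

    Assumes only the first letter of each word is capitalized.
    """
    return [w.lower() for w in re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", name)]


def recover_case(words):
    """Assume all lower case, except for words on the exception list."""
    return [CASE_EXCEPTIONS.get(w, w) for w in words]


words = split_camel("IAmALittleTeapot")
print(words)                # ['i', 'am', 'a', 'little', 'teapot']
print(recover_case(words))  # ['I', 'am', 'a', 'little', 'teapot']
```

Splitting is mechanical here; all the knowledge about case lives in the
exception list, which somebody has to maintain.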

>> "Split before capital letter" falls apart when you have characters that
>> cannot be capitalized: Point3d.
>>
>> Camel case is hopeless.
>
> I would argue that it works quite well for most scenarios, but there
> are some corner cases where it's clearly not good enough. If we can
> define a way to clue in the generator about "Point3d" having to be
> interpreted as "point 3d" and "VNCProps" as "vnc props", then we are
> golden. That wouldn't be necessary for simple cases that are already
> handled correctly.

Hyphenation rules?  *Cough* *cough*
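To make the corner cases concrete, here is a rough Python sketch that
contrasts "split before capital letter" with splitting on hyphens.  The
function names are hypothetical, and the hyphenated spellings are the
ones suggested above, not names that exist in the schema:

```python
import re


def split_before_capitals(name):
    """Naive CamelCase splitting: start a new word at every capital."""
    return [w.lower() for w in re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", name)]


def split_on_hyphens(name):
    """Splitting a hyphenated name needs no guessing at all."""
    return name.lower().split("-")


print(split_before_capitals("Point3d"))   # ['point3d']
print(split_before_capitals("VNCProps"))  # ['v', 'n', 'c', 'props']
print(split_on_hyphens("point-3d"))       # ['point', '3d']
print(split_on_hyphens("vnc-props"))      # ['vnc', 'props']
```

The naive splitter misses the boundary before a digit and shreds
all-caps acronyms; the hyphenated form has neither problem, but it also
carries no information about case.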

> A more radical idea would be to start using dash-notation for types
> too. That'd remove the word splitting issue altogether, at the cost
> of the schema being (possibly) harder to read and more distanced from
> the generated code.

Yes.

> You'd still only be able to generate VncProps from vnc-props though.
>
>> > The obvious way to implement this would be with an annotation along
>> > the lines of the one I proposed. Other ideas?
>>
>> I'm afraid having the schema spell out names in multiple naming
>> conventions could be onerous.  How many names will need it?
>
> I don't have hard data on this. I could try extracting it, but that
> might end up being a bigger job than I had anticipated.

I figure extracting is easier for me than for you.  But let's have a
closer look at the job at hand first.

The QAPI schema language uses three naming styles:

* lower-case-with-hyphens for command and member names

  Many names use upper case and '_'.  See pragma command-name-exceptions
  and member-name-exceptions.

  Some (many?) names lack separators between words (example: logappend).

* UPPER_CASE_WITH_UNDERSCORE for event names

* CamelCase for type names

  Capitalization of words is inconsistent in places (example: VncInfo
  vs. DisplayReloadOptionsVNC).

What style conversions will we need for Go?  Any other conversions come
to mind?

What problems do these conversions have?
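As a starting point for that discussion, a rough Python sketch of
converting all three styles to a Go-style exported identifier (leading
capital, no separators).  The style detection and the function names are
assumptions made for this example; this is not what the proposed
generator does:

```python
import re


def split_words(name):
    """Split a schema name into lower-case words, guessing its style."""
    if "-" in name or "_" in name:
        # lower-case-with-hyphens, UPPER_CASE_WITH_UNDERSCORE, and the
        # exceptions that mix the two
        return [w.lower() for w in re.split(r"[-_]+", name) if w]
    if name.isupper() or name.islower():
        # no separator and uniform case: nothing to split on
        return [name.lower()]
    # CamelCase: start a new word at every capital letter
    return [w.lower() for w in re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", name)]


def go_exported(name):
    """Join the words with each first letter capitalized."""
    return "".join(w.capitalize() for w in split_words(name))


for name in ("log-append", "logappend", "BLOCK_JOB_COMPLETED",
             "VncInfo", "DisplayReloadOptionsVNC"):
    print(name, "->", go_exported(name))
# log-append -> LogAppend
# logappend -> Logappend
# BLOCK_JOB_COMPLETED -> BlockJobCompleted
# VncInfo -> VncInfo
# DisplayReloadOptionsVNC -> DisplayReloadOptionsVNC
```

The output shows the problems listed above: a name without separators
stays one word (Logappend), and an acronym keeps its upper case only
when the schema happens to spell it that way (VncInfo vs.
DisplayReloadOptionsVNC).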

> My guess is that the number of cases where the naive algorithm can't
> split words correctly is relatively small compared to the size of the
> entire QAPI schema. Fair warning: I have made incorrect guesses in
> the past ;)
>
>> Times how many naming conventions?
>
> Yeah, I don't think requiring all possible permutations to be spelled
> out in the schema is the way to go. That's exactly why my proposal
> was to offer a way to inject the semantic information that the parser
> can't figure out itself.
>
> Once you have a way to inform the generator that "VNCProps" is made
> of the two words "vnc" and "props", and that "vnc" is an acronym,
> then it can generate an identifier appropriate for the target
> language without having to spell out anywhere that such an identifier
> would be VNCProps for Go and VncProps for Rust.
>
> By the way, while looking around I realized that we also have to take
> into account things like D-Bus: the QAPI type ChardevDBus, for
> example, would probably translate verbatim to Go but have to be
> changed to ChardevDbus for Rust. Fun :)
>
> Revised proposal for the annotation:
>
>   ns:word-WORD-WoRD-123Word
>
> Words are always separated by dashes; "regular" words are entirely
> lowercase, while the presence of even a single uppercase letter in a
> word denotes the fact that its case should be preserved when the
> naming conventions of the target language allow that.

Is a word always capitalized the same for a single target language?  Or
could capitalization depend on context?

>> Another issue: the fancier the translation from schema name to
>> language-specific name gets, the harder it becomes to find one from the
>> other.
>
> That's true, but at least to me the trade-off feels reasonable.



