qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 11/14] qapi/introspect.py: add type hint annotations


From: Markus Armbruster
Subject: Re: [PATCH v4 11/14] qapi/introspect.py: add type hint annotations
Date: Tue, 09 Feb 2021 10:06:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

John Snow <jsnow@redhat.com> writes:

> On 2/5/21 8:42 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 2/3/21 10:15 AM, Markus Armbruster wrote:
>>>> John Snow <jsnow@redhat.com> writes:
>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>    scripts/qapi/introspect.py | 115 ++++++++++++++++++++++++++-----------
>>>>>    scripts/qapi/mypy.ini      |   5 --
>>>>>    scripts/qapi/schema.py     |   2 +-
>>>>>    3 files changed, 82 insertions(+), 40 deletions(-)
>>>>>
>>>>> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
>>>>> index 60ec326d2c7..b7f2a6cf260 100644
>>>>> --- a/scripts/qapi/introspect.py
>>>>> +++ b/scripts/qapi/introspect.py
>>>>> @@ -30,10 +30,19 @@
>>>>>    )
>>>>>    from .gen import QAPISchemaMonolithicCVisitor
>>>>>    from .schema import (
>>>>> +    QAPISchema,
>>>>>        QAPISchemaArrayType,
>>>>>        QAPISchemaBuiltinType,
>>>>> +    QAPISchemaEntity,
>>>>> +    QAPISchemaEnumMember,
>>>>> +    QAPISchemaFeature,
>>>>> +    QAPISchemaObjectType,
>>>>> +    QAPISchemaObjectTypeMember,
>>>>>        QAPISchemaType,
>>>>> +    QAPISchemaVariant,
>>>>> +    QAPISchemaVariants,
>>>>>    )
>>>>> +from .source import QAPISourceInfo
>>>>>    
>>>>>    
>>>>>    # This module constructs a tree data structure that is used to
>>>>> @@ -57,6 +66,8 @@
>>>>      # generate the introspection information for QEMU. It behaves 
>>>> similarly
>>>>      # to a JSON value.
>>>>      #
>>>>      # A complexity over JSON is that our values may or may not be 
>>>> annotated.
>>>>      #
>>>>      # Un-annotated values may be:
>>>>      #     Scalar: str, bool, None.
>>>>      #     Non-scalar: List, Dict
>>>>      # _value = Union[str, bool, None, Dict[str, TreeValue], 
>>>> List[TreeValue]]
>>>>      #
>>>>      # With optional annotations, the type of all values is:
>>>>      # TreeValue = Union[_value, Annotated[_value]]
>>>>      #
>>>>      # Sadly, mypy does not support recursive types, so we must 
>>>> approximate this.
>>>>      _stub = Any
>>>>      _scalar = Union[str, bool, None]
>>>>      _nonscalar = Union[Dict[str, _stub], List[_stub]]
>>>>>    _value = Union[_scalar, _nonscalar]
>>>>>    TreeValue = Union[_value, 'Annotated[_value]']
>> 
>> I'm once again terminally confused about when to use _lower_case and
>> when to use CamelCase for such variables.
>> 
>
> That's my fault for not using them consistently.
>
> Generally:
>
> TitleCase: Classes, Real Type Names :tm:
> lowercase: instance names (and certain built-in types like str/bool/int)
> UPPERCASE: "Constants". This is an extremely loose idea in Python.
>
> I use the "_" prefix for any of the above categories to indicate 
> something not intended to be used outside of the current scope. These 
> types won't be accessible outside the module by default.
>
> TypeVars I use "T", "U", "V", etc unless I bind them to another type; 
> then I use e.g. NodeT instead.
>
> When it comes to things like type aliases, I believe I instinctively 
> used lowercase because I am not creating a new Real Type and wanted some 
> visual distinction from a real class name. (aliases created in this way 
> cannot be used with isinstance and hold no significance to mypy.)
>
> That's why I used _stub, _scalar, _nonscalar, and _value for those types 
> there. Then I disregarded my own convention and used TreeValue; perhaps 
> that ought to be tree_value for consistency as it's not a Real Type :tm:
>
> ...but then we have the SchemaInfo type aliases, which I named using the 
> same type name as they use in QAPI to help paint the association (and 
> pick up 'git grep' searchers.)
>
> Not fantastically consistent, sorry. Feel free to express a preference, 
> I clearly don't have a universally applied one.
>
> (Current leaning: rename TreeValue to tree_value, but leave everything 
> else as it is.)

https://www.python.org/dev/peps/pep-0484/#type-aliases

    Note that we recommend capitalizing alias names, since they
    represent user-defined types, which (like user-defined classes) are
    typically spelled that way.

I think this wants names like _Scalar, _NonScalar, _Value, TreeValue.

>> The reader has to connect _stub = Any back "we must approximate this".
>> Hmm... "we approximate with Any"?
>> 
>
> I can try to be more explicit about it.
>
>>>>>    
>>>>> +# This is a (strict) alias for an arbitrary object non-scalar, as above:
>>>>> +_DObject = Dict[str, object]
>>>>
>>>> Sounds greek :)
>>>>
>>>
>>> Admittedly it is still not explained well ... until the next patch. I'm
>>> going to leave it alone for now until you have a chance to respond to
>>> these walls of text.
>> 
>> You explain it some futher down.
>> 
>>>> It's almost the Dict part of _nonscalar, but not quite: object vs. Any.
>>>>
>>>> I naively expect something closer to
>>>>
>>>>      _scalar = ...
>>>>      _object = Dict[str, _stub]
>>>>      _nonscalar = Union[_object, List[_stub]
>>>>
>>>> and (still naively) expect _object to be good enough to serve as type
>>>> annotation for dicts representing JSON objects.
>>>
>>> "_object" would be good, except ... I am trying to avoid using that word
>>> because what does it mean? Python object? JSON object? Here at the
>>> boundary between two worlds, nothing makes sense.
>> 
>> Naming is hard.
>> 
>
> Yep. We can skip this debate by just naming the incoming types 
> SchemaInfo and similar... (cont'd below)
>
>> We talked about these names in review of v2.  Let me try again.
>> 
>> introspect.py needs to generate (a suitable C representation of) an
>> instance of QAPI type '[SchemaInfo]'.
>> 
>> Its current choice of "suitable C representation" is "a QLitQObject
>> initializer with #if and comments".  This is a "lose" representation:
>> QLitQObject can encode pretty much anything, not just instances of
>> '[SchemaInfo]'.
>> 
>> C code converts this QLitQObject to a SchemaInfoList object[*].
>> SchemaInfoList is the C type for QAPI type '[SchemaInfo]'.  Automated
>> tests ensure this conversion cannot fail, i.e. the "lose" QLitQObject
>> actually encodes a '[SchemaInfo]'.
>> 
>> introspect.py separates concerns: it first builds an abstract
>> representation of "set of QObject with #if and comments", then generates
>> C code from that.
>> 
>> Why "QObject with #if and comments", and not "QLitQObject with #if and
>> comments"?  Because QLitQObject is *one* way to represent QObject, and
>> we don't care which way outside C code generation.
>> 
>> A QObject represents a JSON value.  We could just as well say "JSON
>> value with #if and comments".
>> 
>> So, the abstract representation of "JSON value with #if and comments" is
>> what we're trying to type.  If you'd rather say "QObject with #if and
>> comments", that's fine.
>> 
>> Our abstract representation is a tree, where
>> 
>> * JSON null / QNull is represented as Python None
>> 
>> * JSON string / QString as str
>> 
>> * JSON true and false / QBool as bool
>> 
>> * JSON number / QNum is not implemented
>> 
>> * JSON object / QDict is dict mapping string keys to sub-trees
>> 
>> * JSON array / QList is list of sub-trees
>> 
>> * #if and comment tacked to a sub-tree is represented by wrapping the
>>    subtree in Annotated
>> 
>>    Wrapping a sub-tree that is already wrapped seems mostly useless, but
>>    the code doesn't care.
>> 
>>    Wrapping dictionary values makes no sense.  The code doesn't care, and
>>    gives you GIGO.
>> 
>>    Making the code reject these two feels out of scope.  If you want to
>>    anyway, I won't object unless it gets in the way of "in scope" stuff
>>    (right now it doesn't seem to).
>> 
>> Let me stress once again: this is *not* an abstract representation of a
>> 'SchemaInfo'.  Such a representation would also work, and you might like
>> it better, but it's simply not what we have.  Evidence: _tree_to_qlit()
>> works fine for *any* tree, not just for trees that encode instances of
>> 'SchemaInfo'.
>> 
>
> ... as long as you don't feel that's incorrect to do. We are free to 
> name those structures SchemaInfo but type _tree_to_qlit() in terms of 
> generic Dict objects, leaving us without a middle-abstract thing to name 
> at all.
>
> Based on your review of the "dummy types" patch, I'm going to assume 
> that's fine.

I guess it's okayish enough.  It still feels more complicated to me than
it needs to be.

QAPISchemaGenIntrospectVisitor an abstract representation of "QObject
with #if and comments" for each SchemaInfo.

This is not really a representation of SchemaInfo.  We can choose to
name it that way regardless, if it helps, and we explain it properly.

Once we hand off the data to _tree_to_qlit(), we can't name it that way
anymore, simply because _tree_to_qlit() treats it as the stupid
recursive data structure it is, and doesn't need or want to know about
SchemaInfo.

I think I'd dispense with _DObject entirely, and use TreeValue
throughout.  Yes, we'd use Any a bit more.  I doubt the additional
complexity to *sometimes* use object instead is worthwhile.  This data
structure is used only within this file.  It pretty much never changes
(because JSON doesn't).  It's basically write-only in
QAPISchemaGenIntrospectVisitor.  This means all the extra typing work
buys us is use of object instead of Any where it doesn't actually
matter.

I would use a more telling name than TreeValue, though.  One that
actually hints at the kind of value "representation of QObject with #if
and comment".

>> Since each (sub-)tree represents a JSON value / QObject, possibly with
>> annotations, I'd like to propose a few "obvious" (hahaha) names:
>> 
>> * an unannotated QObject: QObject
>> 
>> * an annotated QObject: AnnotatedQObject
>> 
>> * a possibly annotated QObject: PossiblyAnnotatedQObject
>> 
>>    Too long.  Rename QObject to BareQObject, then call this one QObject.
>> 
>> This gives us:
>> 
>>      _BareQObject = Union[None, str, bool, Dict[str, Any], List[Any]]
>>      _AnnotatedQObject = Annotated[_QObject]
>>      _QObject = Union[_BareQObject, _AnnotatedQObject]
>> 
>> Feel free to replace QObject by JsonValue in these names if you like
>> that better.  I think I'd slightly prefer JsonValue right now.
>> 
>> Now back to _DObject:
>> 
>>> (See patch 12/14 for A More Betterer Understanding of what _DObject is
>>> used for. It will contribute to A Greater Understanding.)
>>>
>>> Anyway, to your questions;
>>>
>>> (1) _DObject was my shorthand garbage way of saying "This is a Python
>>> Dict that represents a JSON object". Hence Dict-Object, "DObject". I
>>> have also derisively called this a "dictly-typed" object at times.
>> 
>> In the naming system I proposed, this is BareQDict, with an additional
>> complication: we actually have two different types for the same thing,
>> an anonymous one within _BareQObject, and a named one.
>> 
>>> (2) Dict[str, Any] and Dict[str, object] are similar, but do have a
>> 
>> The former is the anonymous one, the latter the named one.
>> 
>
> Kinda-sorta. I am talking about pure mypy here, and the differences 
> between typing two things this way.
>
> Though I think you're right: I used the "Any" form for the anonymous 
> type (inherent to the structure of a JSON compound type) and the 
> "object" form for the named forms (The SchemaInfo objects we build in 
> the visitors to pass to the generator later).
>
>>> semantic difference. I alluded to it by calling this a "(strict) alias";
>>> which does not help you understand any of the following points:
>>>
>>> Whenever you use "Any", it basically turns off type-checking at that
>>> boundary; it is the gradually typed boundary type. Avoid it whenever
>>> reasonably possible.
>>>
>>> Example time:
>>>
>>>
>>> def foo(thing: Any) -> None:
>>>       print(thing.value)  # Sure, I guess! We'll believe you.
>>>
>>>
>>> def foo(thing: object) -> None:
>>>       print(thing.value)  # BZZT, Python object has no value prop.
>>>
>>>
>>> Use "Any" when you really just cannot constrain the type, because you're
>>> out of bourbon or you've decided to give in to the darkness inside your
>>> heart.
>>>
>>> Use "object" when the type of the value actually doesn't matter, because
>>> you are only passing it on to something else later that will do its own
>>> type analysis or introspection on the object.
>>>
>>> For introspect.py, 'object' is actually a really good type when we can
>>> use it, because we interrogate the type exhaustively upon receipt in
>>> _tree_to_qlit.
>>>
>>>
>>> That leaves one question you would almost assuredly ask as a followup:
>>>
>>> "Why didn't you use object for the stub type to begin with?"
>>>
>>> Let's say we define _stub as `object` instead, the Python object. When
>>> _tree_to_qlit recurses on non-scalar structures, the held value there is
>>> only known as "object" and not as str/bool/None, which causes a typing
>>> error at that point.
>>>
>>> Moving the stub to "Any" tells mypy to ... not worry about what type we
>>> actually passed here. I gave in to the darkness in my heart. It's just
>>> too annoying without real recursion.
>> 
>> May I have an abridged version of this in the comments?  It might look
>> quaint in ten years, when we're all fluent in Python type annotations.
>> But right now, at least some readers aren't, and they can use a bit of
>> help.
>> 
>
> Yeah, I'm sympathetic to that.... though I'm not sure what to write or 
> where. I can add some reference points in the commit message, like this one:
>
> https://mypy.readthedocs.io/en/stable/dynamic_typing.html#any-vs-object
>
> maybe in conjunction with the named type aliases patch this is actually 
> sufficient?

I can see two solutions right now:

1. Use Dict[str, Any] throughout

   All we need to explain is

   * What the data structure is about (JSON annotated with ifconds and
     comments; got that, could use improvement perhaps)

   * Your work-around for the lack of recursive types (got that
     already)

   * That the use of Any bypasses type static checking on use (shouldn't
     be hard)

   * Where such uses are (I believe only in _tree_to_qlit(), were Any
     can't be avoided anyway).

2. Use Dict[str, object] where we can

   Now we get to explain a few more things:

   * Why we bother (to get stricter static type checks on use)

   * Where such uses are (I can't see any offhand)

   * Maybe also where we go from one static type to the other.

In either case, we also need to pick names that need no explanation, or
explain them.

>> [*] Actually, we take a shortcut and convert straight to QObject, but
>> that's just laziness.  See qmp_query_qmp_schema()'s "Minor hack:"
>> comment.
>> 
>
> :)




reply via email to

[Prev in Thread] Current Thread [Next in Thread]