qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 11/14] qapi/introspect.py: add type hint annotations


From: John Snow
Subject: Re: [PATCH v4 11/14] qapi/introspect.py: add type hint annotations
Date: Wed, 10 Feb 2021 12:31:48 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 2/9/21 4:06 AM, Markus Armbruster wrote:
John Snow <jsnow@redhat.com> writes:
On 2/5/21 8:42 AM, Markus Armbruster wrote:
John Snow <jsnow@redhat.com> writes:

https://www.python.org/dev/peps/pep-0484/#type-aliases

     Note that we recommend capitalizing alias names, since they
     represent user-defined types, which (like user-defined classes) are
     typically spelled that way.

I think this wants names like _Scalar, _NonScalar, _Value, TreeValue.


Yeah, that seems like the only way to make it consistent and not have pylint yell at me. I will try to adhere to this in the future, but maybe pylint needs a bug report to make it complain in these cases, too.

[...]


... as long as you don't feel that's incorrect to do. We are free to
name those structures SchemaInfo but type _tree_to_qlit() in terms of
generic Dict objects, leaving us without a middle-abstract thing to name
at all.

Based on your review of the "dummy types" patch, I'm going to assume
that's fine.

I guess it's okayish enough.  It still feels more complicated to me than
it needs to be.

QAPISchemaGenIntrospectVisitor an abstract representation of "QObject
with #if and comments" for each SchemaInfo.

This is not really a representation of SchemaInfo.  We can choose to
name it that way regardless, if it helps, and we explain it properly.

In that: SchemaInfo do not have annotations, but we do. Our SchemaInfo objects here are in a kind of superposition in that we have not yet applied the if conditionals.

Still, I do think it is *very* helpful to name those instances after the SchemaInfo types, because that is absolutely the interface we are matching. The keys are not arbitrary. The types of the values associated with those keys are not arbitrary.

So, I am not sure how useful it is to make such a careful distinction. My instinct is "not very, especially for passers-by to this module."


Once we hand off the data to _tree_to_qlit(), we can't name it that way
anymore, simply because _tree_to_qlit() treats it as the stupid
recursive data structure it is, and doesn't need or want to know about
SchemaInfo.


Yes, this is fine: the data is being interpreted in a new semantic context. It keeps the mechanical type but loses the semantic information. That sounds normal to me.

"Why bother, then?"

Mostly for the notational benefit in the code BUILDING the objects. _tree_to_qlit is so generic you can barely describe it, but the objects we build to feed it are quite concrete and have names and definitions that can be referenced.

I think I'd dispense with _DObject entirely, and use TreeValue
throughout.  Yes, we'd use Any a bit more.  I doubt the additional
complexity to *sometimes* use object instead is worthwhile.  This data

I have gotten rid of _DObject entirely in v5; though using "Any" everywhere doesn't seem like an obvious win to me, because I'd need to turn this:

_NonScalar = Union[Dict[str, _Stub], List[_Stub]]
[...]
SchemaInfo = Dict[str, object]
[...]

into this:

_JSONObject = Dict[str, _Stub]
_JSONArray = List[_Stub]
_NonScalar = Union[_JSONObject, _JSONArray]
[...]
SchemaInfo = _JSONObject
[...]

Which doesn't really strike me as any simpler or nicer on either the semantic or mechanical axes.

structure is used only within this file.  It pretty much never changes
(because JSON doesn't).  It's basically write-only in
QAPISchemaGenIntrospectVisitor.  This means all the extra typing work

Write-only variables need typing! mypy will assume these objects are Dict[str, str] otherwise. We have to type them as something.

And the way I typed them ... is correct, and avoided having to name two more intermediary types.

buys us is use of object instead of Any where it doesn't actually
matter.


Maybe so. Comes down to habits. My current belief is "Do not use Any if you do not have to." I did not have to, so I didn't.

I would use a more telling name than TreeValue, though.  One that
actually hints at the kind of value "representation of QObject with #if
and comment".


We discussed this on IRC; ultimately I wasn't convinced of the utility of naming the tree type "QObject" on the logic that if QLit is a QObject, a function named "QObject to QLit" didn't make sense to me anymore.

Since each (sub-)tree represents a JSON value / QObject, possibly with
annotations, I'd like to propose a few "obvious" (hahaha) names:

* an unannotated QObject: QObject

* an annotated QObject: AnnotatedQObject

* a possibly annotated QObject: PossiblyAnnotatedQObject

    Too long.  Rename QObject to BareQObject, then call this one QObject.

This gives us:

      _BareQObject = Union[None, str, bool, Dict[str, Any], List[Any]]
      _AnnotatedQObject = Annotated[_QObject]
      _QObject = Union[_BareQObject, _AnnotatedQObject]

Feel free to replace QObject by JsonValue in these names if you like
that better.  I think I'd slightly prefer JsonValue right now.


On IRC, We agreed to disagree on the semantic name and use the more mechanically suggestive JsonValue instead. I'll give that a spin.

(It's also kinda-sorta wrong, but everything has felt kinda-sorta wrong to me so far. Guess it's no better or worse.)

Now back to _DObject:

(See patch 12/14 for A More Betterer Understanding of what _DObject is
used for. It will contribute to A Greater Understanding.)

Anyway, to your questions;

(1) _DObject was my shorthand garbage way of saying "This is a Python
Dict that represents a JSON object". Hence Dict-Object, "DObject". I
have also derisively called this a "dictly-typed" object at times.

In the naming system I proposed, this is BareQDict, with an additional
complication: we actually have two different types for the same thing,
an anonymous one within _BareQObject, and a named one.

(2) Dict[str, Any] and Dict[str, object] are similar, but do have a

The former is the anonymous one, the latter the named one.


Kinda-sorta. I am talking about pure mypy here, and the differences
between typing two things this way.

Though I think you're right: I used the "Any" form for the anonymous
type (inherent to the structure of a JSON compound type) and the
"object" form for the named forms (The SchemaInfo objects we build in
the visitors to pass to the generator later).

semantic difference. I alluded to it by calling this a "(strict) alias";
which does not help you understand any of the following points:

Whenever you use "Any", it basically turns off type-checking at that
boundary; it is the gradually typed boundary type. Avoid it whenever
reasonably possible.

Example time:


def foo(thing: Any) -> None:
       print(thing.value)  # Sure, I guess! We'll believe you.


def foo(thing: object) -> None:
       print(thing.value)  # BZZT, Python object has no value prop.


Use "Any" when you really just cannot constrain the type, because you're
out of bourbon or you've decided to give in to the darkness inside your
heart.

Use "object" when the type of the value actually doesn't matter, because
you are only passing it on to something else later that will do its own
type analysis or introspection on the object.

For introspect.py, 'object' is actually a really good type when we can
use it, because we interrogate the type exhaustively upon receipt in
_tree_to_qlit.


That leaves one question you would almost assuredly ask as a followup:

"Why didn't you use object for the stub type to begin with?"

Let's say we define _stub as `object` instead, the Python object. When
_tree_to_qlit recurses on non-scalar structures, the held value there is
only known as "object" and not as str/bool/None, which causes a typing
error at that point.

Moving the stub to "Any" tells mypy to ... not worry about what type we
actually passed here. I gave in to the darkness in my heart. It's just
too annoying without real recursion.

May I have an abridged version of this in the comments?  It might look
quaint in ten years, when we're all fluent in Python type annotations.
But right now, at least some readers aren't, and they can use a bit of
help.


Yeah, I'm sympathetic to that.... though I'm not sure what to write or
where. I can add some reference points in the commit message, like this one:

https://mypy.readthedocs.io/en/stable/dynamic_typing.html#any-vs-object

maybe in conjunction with the named type aliases patch this is actually
sufficient?

I can see two solutions right now:

1. Use Dict[str, Any] throughout

    All we need to explain is

    * What the data structure is about (JSON annotated with ifconds and
      comments; got that, could use improvement perhaps)

    * Your work-around for the lack of recursive types (got that
      already)

    * That the use of Any bypasses type static checking on use (shouldn't
      be hard)

    * Where such uses are (I believe only in _tree_to_qlit(), were Any
      can't be avoided anyway).

2. Use Dict[str, object] where we can

    Now we get to explain a few more things:

    * Why we bother (to get stricter static type checks on use)

    * Where such uses are (I can't see any offhand)

    * Maybe also where we go from one static type to the other.

In either case, we also need to pick names that need no explanation, or
explain them.


"that need no explanation" (to whom?) Suspect this is impossible; there's gonna be explanations anyway.

--js




reply via email to

[Prev in Thread] Current Thread [Next in Thread]