qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC 0/7] RFC: Asynchronous QMP Draft


From: John Snow
Subject: Re: [PATCH RFC 0/7] RFC: Asynchronous QMP Draft
Date: Wed, 14 Apr 2021 15:17:48 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

First and foremost, thank you for reviewing this! It is very helpful to me to see what others think of this pet project I've been growing in the privacy of my own mind.

On 4/14/21 2:38 AM, Stefan Hajnoczi wrote:
Below are the API docs that I found helpful for understanding the big
picture.

The QMP.execute() API is nice.


Yes. It mimics (sync) qmp.command(), which I believe Eduardo Habkost wrote. I think it's the correct idea for a generic (QAPI-schema ignorant) QMP client library meant to be "used".

I think raising RPC in-band execution errors as exceptions is a nice "pythonic" way to do it.

(And, if desired, it is possible to use the QAPI generator to generate wrappers around this interface using type-safe arguments in a low-level SDK layer. I think that would be pretty swell. We are not there yet, though, and I'll focus on this layer first.)

Regarding QMP events, I can think of two approaches:
1. Callbacks
2. An async get_event(name=Optional[str]) -> object API
    (plus get_event_nowait(name=Optional[str]) -> object)

(There's probably a third approach using async iterators but it's
similar to get_event().)

Both approaches are useful. The first is good in larger asynchronous
applications that perform many tasks concurrently. The second is good
when there is just one specific thing to do, like waiting for a block
job to complete.

(1) On callbacks:

Callbacks are what I meagerly mocked up; discord.py has a "cute" little hack that works like this:

bot = commands.bot(...)

@bot.event
async def on_ready():
    print("Logged in as")
    print(bot.user.name)
    ...

(See https://github.com/Rapptz/discord.py/blob/master/examples/basic_bot.py )

I find this to be extremely cute: the framework uses the name of the callback to determine which event you are registering, and uses the decorator to merely register the callback.

This makes a nice, uncomplicated way to plug coroutines into the state machine of the client loop in the most basic cases.

I thought it might be nice to try and mimic that design, by perhaps using the names of QMP events as their own 'queues', and then dispatching user callbacks as desired. (Possibly with one mega-queue that exists for ALL callbacks.)

For instance, something like this:

@qmp.event
async def job_status_block_job_ready(qmp, event):
    ...

or more generally,

@qmp.event_handler
async def my_own_event_handler(qmp, event):
    ...

I didn't spend much time on the actual queue or dispatch mechanism in my draft, though, but it could be "bolstered" into a more full-fledged API if desired.

One nice thing about this design is that events aren't "consumed" by a caller, they are just dispatched to anyone waiting on an event of that type.

As I recall, events getting "eaten" at the wrong time was a major burden when writing iotests that exercised multiple jobs, transactions, etc.

(A side note: a higher-level VM class that uses QMP may wish to capture certain events to record state changes, such that the state can be queried at an arbitrary point by any number of callers without needing to have witnessed the state change event personally. That's not really important here in the protocol library, though, which will pretend not to know which events exist -- but it's a consideration for making sure the design that IS chosen can be extensible to support that kind of thing.)


(2) On get_event or async iterators:

This is likely a good ad-hoc feature. Should it only work for events that are delivered from that moment in time, or should there be a "backlog" of events to deliver?

Should waiting on events in this manner "consume" the event from the backlog, if we have one?

My concern::

  await qmp.execute('blockdev-backup', {...etc...})
  async for event in qmp.get_events():
      ...


It's possible that an event we'd like to see has already occurred by the time we get around to invoking the async iterator. You'd really want to start checking for events *before* you issue the job request, but that involves tasks, and the code doesn't "flow" well anymore.

I don't have ideas, at-present, for how to make things like iotests "flow" well in a linear co-routine sense...

...although, maybe it's worth creating something like an Event Listener object that, from its creation, stashes events from that point forward. How about::

  async with qmp.event_listener() as events:
      await qmp.execute('blockdev-backup', {...})
      async for event in events:
          ...

Actually, that seems pretty cool. What do you think? I think it's fairly elegant for ad-hoc use. We could even extend the constructor to accept filtering criteria if we wanted to, later.

Possibly we could also augment the Event Listener object to support a few methods to facilitate blocking until a certain event occurs, like::

  async with qmp.event_listener() as events:
      await qmp.execute('blockdev-backup', {...})
      await events.event('JOB_STATUS_CHANGE', status="pending")
      await qmp.execute('job-finalize', {...})
      ...


I think that's pretty workable, actually! And it could co-exist perfectly well alongside event callback handlers.


My general impression is that the public API is nice and usable but the
implementation is complex and risks discouraging other people from
hacking on the code. There are too many abstractions and while it's
highly structured, there is a cost to having all this infrastructure. I
think simplifying it would make it easier for others to understand and
contribute to the code.


It's a fair point. I am hoping that the protocol.py layers won't need to be touched quite so much. (Famous last words) and that most of the interesting work can happen at the qmp_protocol.py level and above, though.

Ideas: open code or inline simple things instead of defining
abstractions that only have 1 user, drop the pydantic models, drop
classes that just wrap things like Message and the exception hierarchy,
combine protocol and qmp_protocol.


(1) On models:

Pydantic models are definitely optional at this stage, but I am floating them here to prepare people for the idea that I might try to get more mileage out of them in the future to offer a type-safe, QAPI-aware SDK layer.

They're definitely only a mild benefit here, for now, as the strict typing they help provide is not floated upwards or exposed to the user.


(2) On the Message class:

I'll try a draft where I try to drop or simplify the Message class. It seems like a good candidate, but I think I'm subjectively afraid I won't like the inlining. We'll see!


(3) On the Exception hierarchy:

Let's talk about error handling design a little bit more. I want to make sure that the errors that can happen and in which circumstances are obvious and have good names, but there might be better approaches to managing that complexity.


(4) On combining protocol and qmp_protocol:

Maybe. Do you want to look at the qtest implementation? It's somewhat ancillary to this project, but felt it would make a nice companion library. It doesn't benefit as strongly as QMP (As it does not offer anything like OOB), but it does have async messages it can send, so it can re-use the same infrastructure.

(Fully admit that the first draft, of course, did feature a combined protocol/qmp_protocol class. It was split out later.)

Things that might be worth adding:
1. File descriptor passing support.

Do you have an example workflow that I can use to test this? This is a weak spot in my knowledge.

2. Introspection support to easily check if a command/feature is
    available. Users can do this manually by sending QMP commands and
    interpreting the response, but this may be common enough to warrant a
    friendly API.


I think this treads into QAPI-specific domain knowledge, and I might leave such features to a higher-level object.

The QMP spec itself does not define a mechanism by which the QMP protocol itself will reveal the valid commands, and so it might be up to a machine.py-based extension/capsulation of qmp_protocol to provide that.

(Though, I do agree; I want this feature somewhere. We do have such a thing coded into the existing qmp-shell tool, using the query-commands command. Maybe I can offer a subclass that offers some of these convenience features using a best-effort guess-and-check style introspection. Please forgive me if I focus on shoring up the design of the core implementation first.)

Help on module qmp.qmp_protocol in qmp:

NAME
     qmp.qmp_protocol - QMP Client Implementation

DESCRIPTION
     This module provides the QMP class, which can be used to connect and
     send commands to a QMP server such as QEMU. The QMP class can be used to
     either connect to a listening server, or used to listen and accept an
     incoming connection from the server.

CLASSES
     qmp.error.AQMPError(builtins.Exception)
         ExecuteError
     qmp.protocol.AsyncProtocol(typing.Generic)
         QMP
class ExecuteError(qmp.error.AQMPError)
      |  ExecuteError(sent: qmp.message.Message, received: qmp.message.Message, 
error: qmp.models.ErrorInfo)
      |
      |  Execution statement returned failure.
      |
      |  Method resolution order:
      |      ExecuteError
      |      qmp.error.AQMPError
      |      builtins.Exception
      |      builtins.BaseException
      |      builtins.object
      |
      |  Methods defined here:
      |
      |  __init__(self, sent: qmp.message.Message, received: 
qmp.message.Message, error: qmp.models.ErrorInfo)
      |      Initialize self.  See help(type(self)) for accurate signature.
      |
      |  __str__(self) -> str
      |      Return str(self).
      |
      |  ----------------------------------------------------------------------
      |  Data descriptors inherited from qmp.error.AQMPError:
      |
      |  __weakref__
      |      list of weak references to the object (if defined)
      |
      |  ----------------------------------------------------------------------
      |  Static methods inherited from builtins.Exception:
      |
      |  __new__(*args, **kwargs) from builtins.type
      |      Create and return a new object.  See help(type) for accurate 
signature.
      |
      |  ----------------------------------------------------------------------
      |  Methods inherited from builtins.BaseException:
      |
      |  __delattr__(self, name, /)
      |      Implement delattr(self, name).
      |
      |  __getattribute__(self, name, /)
      |      Return getattr(self, name).
      |
      |  __reduce__(...)
      |      Helper for pickle.
      |
      |  __repr__(self, /)
      |      Return repr(self).
      |
      |  __setattr__(self, name, value, /)
      |      Implement setattr(self, name, value).
      |
      |  __setstate__(...)
      |
      |  with_traceback(...)
      |      Exception.with_traceback(tb) --
      |      set self.__traceback__ to tb and return self.
      |
      |  ----------------------------------------------------------------------
      |  Data descriptors inherited from builtins.BaseException:
      |
      |  __cause__
      |      exception cause
      |
      |  __context__
      |      exception context
      |
      |  __dict__
      |
      |  __suppress_context__
      |
      |  __traceback__
      |
      |  args
class QMP(qmp.protocol.AsyncProtocol)
      |  QMP(name: Optional[str] = None) -> None
      |
      |  Implements a QMP connection to/from the server.
      |
      |  Basic usage looks like this::
      |
      |    qmp = QMP('my_virtual_machine_name')
      |    await qmp.connect(('127.0.0.1', 1234))
      |    ...
      |    res = await qmp.execute('block-query')
      |    ...
      |    await qmp.disconnect()
      |

This reminds me.

I was briefly considering the idea that the QMP object could be converted into something like a QMP session factory instead, and you could use a context manager.

e.g.

qmp = QMP(session_factory_settings)
async with qmp.connect(...) as session:
    ...
    ...

with disconnect() being implicitly called upon the destruction of the session object when it goes out of scope.

I could also simply mock this up without creating a factory/session split, just by having the context manager simply return 'self':

proto = QMP(various_settings)
async with proto.connect(...) as qmp:
    ...
    ...

Though it's a little hacky, as with-expressions are expected to return *something*, and we actually already have a handle to that object.

      |  :param name: Optional nickname for the connection, used for logging.
      |
      |  Method resolution order:
      |      QMP
      |      qmp.protocol.AsyncProtocol
      |      typing.Generic
      |      builtins.object
      |
      |  Methods defined here:
      |
      |  __init__(self, name: Optional[str] = None) -> None
      |      Initialize self.  See help(type(self)) for accurate signature.
      |
      |  async execute(self, cmd: str, arguments: Optional[Mapping[str, object]] = 
None, oob: bool = False) -> object
      |      Execute a QMP command and return the response.
      |
      |      :param cmd: QMP command name.
      |      :param arguments: Arguments (if any). Must be JSON-serializable.
      |      :param oob: If true, execute "out of band".
      |
      |      :raise: ExecuteError if the server returns an error response.
      |      :raise: DisconnectedError if the connection was terminated early.
      |
      |      :return: Execution response from the server. The type of object 
depends
      |               on the command that was issued, though most return a dict.
      |
      |  async execute_msg(self, msg: qmp.message.Message) -> object
      |      Execute a QMP message and return the response.
      |
      |      :param msg: The QMP `Message` to execute.
      |      :raises: ValueError if the QMP `Message` does not have either the
      |               'execute' or 'exec-oob' fields set.
      |      :raises: ExecuteError if the server returns an error response.
      |      :raises: DisconnectedError if the connection was terminated early.
      |
      |      :return: Execution response from the server. The type of object 
depends
      |               on the command that was issued, though most return a dict.
      |
      |  on_event(self, func: Callable[[ForwardRef('QMP'), qmp.message.Message], 
Awaitable[NoneType]]) -> Callable[[ForwardRef('QMP'), qmp.message.Message], 
Awaitable[NoneType]]
      |      FIXME: Quick hack: decorator to register event handlers.
      |
      |      Use it like this::
      |
      |        @qmp.on_event
      |        async def my_event_handler(qmp, event: Message) -> None:
      |          print(f"Received event: {event['event']}")
      |
      |      RFC: What kind of event handler would be the most useful in
      |      practical terms? In tests, we are usually waiting for an
      |      event with some criteria to occur; maybe it would be useful
      |      to allow "coroutine" style functions where we can block
      |      until a certain event shows up?
      |
      |  ----------------------------------------------------------------------
      |  Class methods defined here:
      |
      |  make_execute_msg(cmd: str, arguments: Optional[Mapping[str, object]] = 
None, oob: bool = False) -> qmp.message.Message from builtins.type
      |      Create an executable message to be sent by `execute_msg` later.
      |
      |      :param cmd: QMP command name.
      |      :param arguments: Arguments (if any). Must be JSON-serializable.
      |      :param oob: If true, execute "out of band".
      |
      |      :return: An executable QMP message.
      |
      |  ----------------------------------------------------------------------
      |  Data and other attributes defined here:
      |
      |  __orig_bases__ = (qmp.protocol.AsyncProtocol[qmp.message.Message],)
      |
      |  __parameters__ = ()
      |
      |  logger = <Logger qmp.qmp_protocol (WARNING)>
      |
      |  ----------------------------------------------------------------------
      |  Methods inherited from qmp.protocol.AsyncProtocol:
      |
      |  async accept(self, address: Union[str, Tuple[str, int]], ssl: 
Optional[ssl.SSLContext] = None) -> None
      |      Accept a connection and begin processing message queues.
      |
      |      :param address: Address to connect to;
      |                      UNIX socket path or TCP address/port.
      |      :param ssl:     SSL context to use, if any.
      |
      |      :raise: `StateError`   (loop is running or disconnecting.)
      |      :raise: `ConnectError` (Connection was not successful.)
      |
      |  async connect(self, address: Union[str, Tuple[str, int]], ssl: 
Optional[ssl.SSLContext] = None) -> None
      |      Connect to the server and begin processing message queues.
      |
      |      :param address: Address to connect to;
      |                      UNIX socket path or TCP address/port.
      |      :param ssl:     SSL context to use, if any.
      |
      |      :raise: `StateError`   (loop is running or disconnecting.)
      |      :raise: `ConnectError` (Connection was not successful.)
      |
      |  async disconnect(self) -> None
      |      Disconnect and wait for all tasks to fully stop.
      |
      |      If there were exceptions that caused the bottom half to terminate
      |      prematurely, they will be raised here.
      |
      |      :raise: `Exception`      Arbitrary exceptions re-raised on behalf 
of
      |                               the bottom half.
      |      :raise: `MultiException` Iterable Exception used to multiplex 
multiple
      |                               exceptions when multiple tasks failed.
      |
      |  ----------------------------------------------------------------------
      |  Readonly properties inherited from qmp.protocol.AsyncProtocol:
      |
      |  disconnecting
      |      Return True when the loop is disconnecting, or disconnected.
      |
      |  running
      |      Return True when the loop is currently connected and running.
      |
      |  unconnected
      |      Return True when the loop is fully idle and quiesced.
      |
      |      Returns True specifically when the loop is neither `running`
      |      nor `disconnecting`. A call to `disconnect()` is required
      |      to transition from `disconnecting` to `unconnected`.

Any thoughts on these? I think I accidentally created some landmines here, actually.

disconnecting: This one is, I think, consistent with other primitives in Python like "is_closing", where it also does include the "fully closed" state. It hopefully adheres to the principle of least surprise.


running: This one is maybe bad, though. I mean this to be "fully running", i.e. the loop is fully open and nothing is wrong. I believe it is not true until sometime "in the middle" of the accept() or connect() calls; i.e. after the Reader/Writer tasks are started. I worry that this is a confusing point *within* this library.

Maybe it ought to report "true" as soon as we start trying to build a session; and I may need other internal helpers for testing more specific conditions inside the loop.


unconnected: I think this name is just genuinely bad, but I needed some kind of state to represent "fully quisced, not connected, and idle." i.e. that we were prepared and able to issue another accept() or connect() call.

"quiesced"? (Kind of jargon-y.)

"disconnected"? (Viable, but introduces ambiguity between fully-disconnected but waiting for the user to collect loop status/exceptions.)

"idle"? (Might be confused with a loop that's open.)


... Maybe it doesn't need to be externally exposed at all, anyway. The primary purpose for it is to safe-guard calls to connect() and accept() such that if the loop is engaged or not-yet-cleaned up, that these calls will bark an error back at the user.

Perhaps an internal property would suffice in that case.

_has_session True/False might suffice for this concept.

Thoughts?

      |
      |  ----------------------------------------------------------------------
      |  Data descriptors inherited from qmp.protocol.AsyncProtocol:
      |
      |  __dict__
      |      dictionary for instance variables (if defined)
      |
      |  __weakref__
      |      list of weak references to the object (if defined)
      |
      |  ----------------------------------------------------------------------
      |  Class methods inherited from typing.Generic:
      |
      |  __class_getitem__(params) from builtins.type
      |
      |  __init_subclass__(*args, **kwargs) from builtins.type
      |      This method is called when a class is subclassed.
      |
      |      The default implementation does nothing. It may be
      |      overridden to extend subclasses.


From here down, ...

DATA
     Awaitable = typing.Awaitable
         A generic version of collections.abc.Awaitable.
Callable = typing.Callable
         Callable type; Callable[[int], str] is a function of (int) -> str.
The subscription syntax must always be used with exactly two
         values: the argument list and the return type.  The argument list
         must be a list of types or ellipsis; the return type must be a single 
type.
There is no syntax to indicate optional or keyword arguments,
         such function types are rarely used as callback types.
Dict = typing.Dict
         A generic version of dict.
List = typing.List
         A generic version of list.
Mapping = typing.Mapping
         A generic version of collections.abc.Mapping.
Optional = typing.Optional
         Optional type.
Optional[X] is equivalent to Union[X, None]. Tuple = typing.Tuple
         Tuple type; Tuple[X, Y] is the cross-product type of X and Y.
Example: Tuple[T1, T2] is a tuple of two elements corresponding
         to type variables T1 and T2.  Tuple[int, float, str] is a tuple
         of an int, a float and a string.
To specify a variable-length tuple of homogeneous type, use Tuple[T, ...].


... Oops, I need to "hide" some more of those things from help, I think by specifying __all__ in that module I can restrict some of these things that aren't interesting to see in the help menu.

FILE
     /tmp/foo/qmp/qmp_protocol.py



In general, do you feel this design is roughly serviceable and worth pursuing cleanups for? I realize it's a bit "much" but as the audience extends beyond our castle walls, I wanted to be quite thorough. It's a design that's likely overkill for iotests, but hopefully just about correct for external users to prototype toy management scripts with.

At some point, I might try to get it checked in to the QEMU tree as an "alpha" library so that iterations on the design can be debated on their own merit instead of trying to update a giant new-code-blob. I am not sure if it's ready for that just yet, but I think it's close to that point where it needs to not live primarily in a separate repo anymore.

Thanks again,
-- John




reply via email to

[Prev in Thread] Current Thread [Next in Thread]