RFC: kernel trace facility

I've had so many problems with rpctrace that I'm starting work on a kernel trace facility, and I think the consensus on this list is that we need something like this.

Here's the idea.

Add a trace port as a new task special port. The same RPCs that get/set a task's other special ports and exception ports will be used to get/set its trace port. Trace ports will be inherited from the parent on task creation, so children of traced tasks will run traced by default.
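A minimal user-space sketch of the inheritance rule. It uses hypothetical stand-in types and functions (`port_t`, `struct task`, `task_create_model()`, `task_set_trace_port_model()`), not GNU Mach's real `ipc_port_t` or task structures. The child simply starts out with its parent's trace port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the inheritance rule is modeled. */
typedef struct port { int id; } port_t;

struct task {
    port_t *trace_port;   /* NULL plays the role of IP_NULL */
};

/* Model of the set-special-port RPC applied to the new trace port. */
void task_set_trace_port_model(struct task *t, port_t *port)
{
    assert(t != NULL);
    t->trace_port = port;
}

/* Model of task creation: the child inherits the parent's trace port,
 * so children of a traced task are traced by default. A real kernel
 * would also take a new send-right reference here. */
struct task *task_create_model(const struct task *parent)
{
    struct task *child = malloc(sizeof *child);
    if (child == NULL)
        return NULL;
    child->trace_port = (parent != NULL) ? parent->trace_port : NULL;
    return child;
}
```

The sketch ignores reference counting; in the kernel, copying the pointer would also mean accounting for the extra send right.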

All messages sent and received by the task will be copied to the trace port in the same format seen by the task (i.e., using the traced task's port names), prefixed with a trace header. The header will identify the traced task, indicate whether the message was sent or received, and include the return value of the mach_msg() call, so that truncated messages can be recognized by MACH_RCV_TOO_LARGE. Out-of-line memory will not be copied: the trace message will include the address of the out-of-line region in the traced task's address space, but not its contents. In short, the message will be copied verbatim as the traced task sees it.
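To illustrate what "verbatim, without out-of-line memory" means, here is a sketch on a hypothetical, simplified message layout (`struct model_msg`, `struct ool_desc`, `struct trace_record` and `trace_record_make()` are all invented for illustration; real Mach messages start with `mach_msg_header_t`). Copying the message bytes copies the OOL descriptor, so the record carries the region's address and size but never its contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical out-of-line descriptor as the traced task sees it. */
struct ool_desc {
    uintptr_t address;   /* address in the traced task's address space */
    size_t size;         /* length of the out-of-line region */
};

struct model_msg {
    uint32_t remote_name;   /* port names from the traced task's space */
    uint32_t local_name;
    struct ool_desc ool;
};

struct trace_record {
    int sent;               /* 1 = sent, 0 = received */
    int retval;             /* mach_msg() return code */
    struct model_msg msg;   /* verbatim copy of the task's view */
};

/* Copy the message byte for byte. The descriptor is copied as-is, so
 * the OOL region itself is not duplicated. */
struct trace_record *trace_record_make(const struct model_msg *m,
                                       int sent, int retval)
{
    struct trace_record *r;

    assert(m != NULL);
    r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->sent = sent;
    r->retval = retval;
    memcpy(&r->msg, m, sizeof *m);
    return r;
}
```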

Format of the trace messages:

task_t task;
boolean_t send-or-receive;
unsigned int type;
kern_return_t retval;
byte[] message;

'task' is a send right in the port space of the message recipient. This means that any task receiving trace messages will get send rights to task ports, but since you need such a send right to request the messages in the first place, I think that's OK. 'send-or-receive' is self-explanatory, and maybe should be folded into 'type' as a flag bit. 'type' indicates whether we're tracing a message that the task itself exchanged, or a message exchanged on one of its special ports. Possible types: 'normal', 'task', 'thread', 'task-exception', 'thread-exception'.
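One possible encoding of the header, assuming 'send-or-receive' is folded into 'type' as a flag bit. The enum values, the `TRACE_TYPE_MASK`/`TRACE_FLAG_SEND` constants and the helper functions are hypothetical; `retval` is a plain `int` standing in for Mach's `kern_return_t`, and `task` is a port name standing in for the send right.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the five message types. */
enum trace_type {
    TRACE_NORMAL = 0,
    TRACE_TASK = 1,
    TRACE_THREAD = 2,
    TRACE_TASK_EXCEPTION = 3,
    TRACE_THREAD_EXCEPTION = 4
};

#define TRACE_TYPE_MASK 0x0fu   /* low bits: enum trace_type */
#define TRACE_FLAG_SEND 0x10u   /* set: task sent; clear: task received */

struct trace_header {
    uint32_t task;      /* name of a send right to the traced task */
    uint32_t type;      /* enum trace_type, possibly | TRACE_FLAG_SEND */
    int retval;         /* mach_msg() return code (kern_return_t) */
    /* the traced message bytes follow the header */
};

uint32_t trace_type_encode(enum trace_type t, int sent)
{
    assert((uint32_t)t <= TRACE_THREAD_EXCEPTION);
    return (uint32_t)t | (sent ? TRACE_FLAG_SEND : 0u);
}

enum trace_type trace_type_kind(uint32_t type)
{
    return (enum trace_type)(type & TRACE_TYPE_MASK);
}

int trace_type_is_send(uint32_t type)
{
    return (type & TRACE_FLAG_SEND) != 0;
}
```

Keeping the direction as a flag leaves the low bits free for more message types later without changing the header size.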

Since this is a debugging tool, send timeouts will trigger delivery of a trace message with MACH_SEND_TIMED_OUT as the return value in the trace header. Most other error returns from mach_msg() will likewise trigger a trace message indicating the error.

No facility will be provided to edit or block the delivery of messages. However, the trace operation (and thus mach_msg()) will block and wait if the trace port's queue is full.

Resource shortages in the kernel will cause trace messages to be quietly dropped, with nothing more than a printf() to the console.
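The queueing policy (block when the trace port's queue is full, drop with a console message on resource shortage) can be modeled in user space. This is a single-threaded sketch: `TRACE_QLIMIT`, `trace_enqueue()`, `trace_dequeue()` and the wait callback are hypothetical, and the callback stands in for the sleep/wakeup a kernel thread would go through while the tracer drains the queue.

```c
#include <assert.h>
#include <stdio.h>

#define TRACE_QLIMIT 4   /* hypothetical queue limit of the trace port */

struct trace_queue {
    void *items[TRACE_QLIMIT];
    int count;
};

/* Called while the queue is full. In the kernel the sender would sleep
 * until the receiver dequeues; here the callback must free a slot. */
typedef void (*trace_wait_fn)(struct trace_queue *q);

/* Returns 1 if queued, 0 if dropped. A NULL msg models an allocation
 * failure while building the trace message. */
int trace_enqueue(struct trace_queue *q, void *msg, trace_wait_fn wait)
{
    assert(q != NULL && wait != NULL);
    if (msg == NULL) {
        /* Resource shortage: drop quietly, with only a console note. */
        printf("trace: message dropped (resource shortage)\n");
        return 0;
    }
    while (q->count == TRACE_QLIMIT)
        wait(q);                 /* block rather than drop */
    q->items[q->count++] = msg;
    return 1;
}

/* Stand-in for the tracer's receive: removes the oldest message. */
void *trace_dequeue(struct trace_queue *q)
{
    void *m;
    int i;

    if (q->count == 0)
        return NULL;
    m = q->items[0];
    for (i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return m;
}

/* Example wait callback: the tracer wakes up and consumes one message. */
void trace_wait_drain_one(struct trace_queue *q)
{
    (void)trace_dequeue(q);
}
```

Blocking instead of dropping means a slow tracer slows the traced task down but never silently loses messages; only an allocation failure loses one.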

All syscalls that duplicate an RPC will check current_task()'s trace port. If it's not IP_NULL, they will return MACH_SEND_INTERRUPTED to force an actual RPC message to be generated.
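A sketch of the trap-versus-RPC fallback, assuming the user-side stub retries a trap that returns MACH_SEND_INTERRUPTED as a real RPC (the usual pattern for Mach trap shortcuts, but treated as an assumption here). `model_trap()`, `model_rpc()`, `model_call()`, `cur_task` and `rpc_calls` are hypothetical; only the `MACH_SEND_INTERRUPTED` and `KERN_SUCCESS` values mirror real Mach return codes.

```c
#include <assert.h>
#include <stddef.h>

#define MACH_SEND_INTERRUPTED 0x10000007  /* as in <mach/message.h> */
#define KERN_SUCCESS 0

struct task { void *trace_port; };        /* hypothetical stand-in */

struct task *cur_task;                    /* models current_task() */
int rpc_calls;                            /* counts fallback RPCs */

/* Kernel side of a hypothetical trap. A traced caller is refused the
 * fast path so the work goes through a real, traceable message. */
int model_trap(int arg, int *out)
{
    if (cur_task->trace_port != NULL)
        return MACH_SEND_INTERRUPTED;
    *out = arg + 1;                        /* the trap's "real work" */
    return KERN_SUCCESS;
}

/* Message-based equivalent; in the kernel this path gets traced. */
int model_rpc(int arg, int *out)
{
    rpc_calls++;
    *out = arg + 1;
    return KERN_SUCCESS;
}

/* User-space stub: try the trap, fall back to the RPC when told to. */
int model_call(int arg, int *out)
{
    int kr;

    assert(out != NULL);
    kr = model_trap(arg, out);
    if (kr == MACH_SEND_INTERRUPTED)
        kr = model_rpc(arg, out);
    return kr;
}
```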

New routines:

ipc_kmsg_trace_copy() will be passed a kmsg, will ikm_alloc a new kmsg with enough space for the trace header and the old message, will copy the old kmsg into the new one, and return the new one. It's expected to be called right after ipc_kmsg_get(), and right before ipc_kmsg_put().

ipc_kmsg_trace_send() will be passed a trace kmsg and a return code. The return code will be inserted into the trace kmsg and the message will be queued to the task's trace port.
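The two proposed routines can be sketched on a toy message type. `struct kmsg`, `struct trace_hdr`, `struct port_queue`, `trace_copy()` and `trace_send()` are simplified, hypothetical stand-ins for GNU Mach's `ipc_kmsg_t`, `ikm_alloc()` and the proposed `ipc_kmsg_trace_copy()`/`ipc_kmsg_trace_send()`; `malloc()` replaces the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified kmsg: a size plus inline bytes. */
struct kmsg {
    size_t size;                 /* bytes in data[] */
    struct kmsg *next;           /* link in a port's message queue */
    unsigned char data[];        /* flexible array member */
};

struct trace_hdr {
    uint32_t task;
    uint32_t type;
    int32_t retval;              /* filled in by trace_send() */
};

struct port_queue { struct kmsg *head, *tail; };

/* Model of ipc_kmsg_trace_copy(): allocate room for the header plus
 * the original message, and copy the message in after the header. */
struct kmsg *trace_copy(const struct kmsg *orig, uint32_t task, uint32_t type)
{
    size_t n = sizeof(struct trace_hdr) + orig->size;
    struct kmsg *t = malloc(sizeof *t + n);
    struct trace_hdr h = { task, type, 0 };

    if (t == NULL)
        return NULL;             /* caller drops the trace message */
    t->size = n;
    t->next = NULL;
    memcpy(t->data, &h, sizeof h);
    memcpy(t->data + sizeof h, orig->data, orig->size);
    return t;
}

/* Model of ipc_kmsg_trace_send(): store the return code in the header,
 * then append the trace kmsg to the trace port's queue. */
void trace_send(struct kmsg *t, int32_t retval, struct port_queue *q)
{
    struct trace_hdr h;

    assert(t != NULL && t->size >= sizeof h);
    memcpy(&h, t->data, sizeof h);
    h.retval = retval;
    memcpy(t->data, &h, sizeof h);
    if (q->tail != NULL)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}
```

Splitting the copy from the send is what lets the send path take its snapshot early and fill in `retval` later.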

mach_msg() and friends will call ipc_kmsg_trace_copy() right after ipc_kmsg_get() and right before ipc_kmsg_put(). In the get case, we'll wait until the message has been processed a bit to figure out what return code should be associated with it, then call ipc_kmsg_trace_send(). In the put case, we're about to return to user space, so we pretty much know our return code and can call ipc_kmsg_trace_send() right after ipc_kmsg_trace_copy().
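The ordering difference between the two halves of mach_msg() can be made concrete with an event log. Every function and event name here is a hypothetical stand-in; the point is only where the copy and the send fall. Taking the copy immediately after the message is fetched is presumably what keeps the task's own port names in the trace, since the later copyin step translates names into kernel port pointers.

```c
#include <assert.h>
#include <string.h>

/* Event log recording the order of hypothetical mach_msg() steps. */
static char trace_log[256];

static void note(const char *ev)
{
    assert(ev != NULL);
    strncat(trace_log, ev, sizeof trace_log - strlen(trace_log) - 1);
    strncat(trace_log, ";", sizeof trace_log - strlen(trace_log) - 1);
}

/* Send half: snapshot right after the message is fetched from user
 * space, but send the trace only once the result is known. */
int model_send(int send_result)
{
    note("kmsg_get");
    note("trace_copy");
    note("copyin_and_queue");     /* processing that yields the result */
    note(send_result == 0 ? "trace_send:ok" : "trace_send:err");
    return send_result;
}

/* Receive half: the result is already known just before the message
 * goes back to user space, so copy and send back to back. */
int model_receive(int rcv_result)
{
    note("dequeue");
    note("trace_copy");
    note(rcv_result == 0 ? "trace_send:ok" : "trace_send:err");
    note("kmsg_put");
    return rcv_result;
}
```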

ipc_kobject_server() will check its destination to see if it's a task port or a thread port. If so, it will call the ipc_kmsg_trace_copy()/ipc_kmsg_trace_send() routines for both the request and the reply. This ensures that we'll also see messages targeted at the task's control ports, even if they come from another task. They won't be in the same format, however. By the time ipc_kobject_server() runs, the port rights have been translated into kernel pointers, and that's the format the trace will receive. Since a message like vm_map() might include a send right that only exists in some other task's port space, there doesn't seem to be much of an alternative. This will leak some kernel addresses, but I don't think that's too serious, as there's nothing useful the receiver can do with them, barring some kind of Meltdown-type memory leakage bug, and the existence of such a bug is a separate issue.
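A sketch of the proposed check in ipc_kobject_server(), with hypothetical types: `enum kobj_kind` imitates GNU Mach's kernel-object type tags, and the `traced` field stands in for "the owning task has a trace port", which is an assumed extra condition. Both the request and the reply are traced whenever the destination is a traced task or thread port, regardless of which task sent the request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel-object kinds imitating per-port type tags. */
enum kobj_kind { KOBJ_NONE, KOBJ_TASK, KOBJ_THREAD, KOBJ_OTHER };

struct kport {
    enum kobj_kind kind;
    int traced;          /* nonzero if the owning task has a trace port */
};

int traced_count;        /* messages handed to the trace routines */

static void trace_kmsg(const char *what)
{
    (void)what;          /* "request" or "reply" */
    traced_count++;
}

/* Model of the check: requests to a traced task's or thread's port are
 * traced, and so is the reply, no matter who sent the request. */
int model_kobject_server(const struct kport *dest)
{
    int trace;

    assert(dest != NULL);
    trace = (dest->kind == KOBJ_TASK || dest->kind == KOBJ_THREAD)
            && dest->traced;
    if (trace)
        trace_kmsg("request");
    /* ... dispatch to the kernel routine and build the reply ... */
    if (trace)
        trace_kmsg("reply");
    return trace;
}
```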

Exceptions are also traced.

How should threads be identified? Maybe add an extra header field to the trace message to indicate which thread a 'thread' or 'thread-exception' message pertains to.

Comments?

agape

brent

From:	Brent W. Baccala
Subject:	RFC: kernel trace facility
Date:	Fri, 9 Mar 2018 17:52:54 -0500