bug-hurd
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] fully enable rpctrace to trace multitask programs.


From: Da Zheng
Subject: Re: [PATCH] fully enable rpctrace to trace multitask programs.
Date: Mon, 08 Jun 2009 21:27:29 +0800
User-agent: Thunderbird 2.0.0.21 (Macintosh/20090302)

olafBuddenhagen@gmx.net wrote:
Hi,

On Sun, May 31, 2009 at 11:32:25AM +0800, Zheng Da wrote:

I rewrite rpctrace so that it can trace multitask programs.

I wonder, is this inherently a single large change, or would it somehow
be possible to split it into several distinct patches, which could be
applied individually?...

1. tracing the source and destination of the RPC requests/replies. It
is required when we move the port rights and it also allows rpctrace
to show the caller of RPCs (I think it is a nice feature).

It only shows the caller when it is also a traced task, i.e. this is
only meaningful when indeed tracing multitasked stuff, right?
I think it is still useful even when rpctrace is tracing a single task program. rpctrace can know whether a send right is to a traced task or not and thus, don't need to treat mach_port_insert_right() specially and can handle the case below properly. mach_port_allocate (mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &receive_right); mach_port_extract_right (mach_task_self (), receive_right, MACH_MSG_TYPE_MAKE_SEND, &receive_right, &poly); This might be a very stupid case, but I think rpctrace should still handle it properly.
The patch can also fix some bugs reported in savannah, for example,
the traced task hangs when the it gets signals.

Are these fixes a side effect of the new functionality, or have you done
fixes explicitely? In the latter case, they should be split out into
distinct patches...
I provided a patch before to fix the bug that a traced task hangs when it gets signals. see https://savannah.gnu.org/patch/?6830 Another bug (see https://savannah.gnu.org/bugs/?26476) is fixed by the patch naturally.
1. rpctrace hangs when it traces itself. I think the reason is that
the tracer and tracee both need to set the kernel task port and thread
port.

This sounds plausible. I guess it could only be fixed by proxying the
RPCs for setting the kernel ports...

However, AIUI the original rpctrace doesn't use these -- so there must
be other problems as well?...

It seems to me that it would be useful to add some consistency checks at
various places, making sure that things are still in the state rpctrace
believes them to be.
It is certainly nice to fix this problem. I will try.
2. pfinet dies when it is traced by rpctrace. The reason is unknown.

3. gcc/mig reports "server type check failure" when it is traced by
rpctrace.

Would be useful if these could be tracked down... Even if it turns out
that the actual issues are not easily fixable.
This bug and the one above are very strange.
when rpctrace traces pfinet, it dies without any error.
The patch is below. Comments and tests are very welcome.

I don't know enough about rpctrace to do a full review... So this is
only going to be a lot of nitpicking and some strange questions :-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index d80f41d..eb33af7 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -39,6 +39,7 @@
 #include <stdbool.h>
 #include <stddef.h>
 #include <argz.h>
+#include <sys/mman.h>

 const char *argp_program_version = STANDARD_HURD_VERSION (rpctrace);

@@ -59,6 +60,9 @@ static const struct argp_option options[] =
   {0}
 };

+#define UNKNOWN_TASK -1
+#define UNKNOWN_NAME -1

Not a good idea to define these to an arbitrary value -- there is no
guarantee that this value won't appear in normal usage...

You could use one of the reserved values (MACH_PORT_NULL,
MACH_PORT_DEAD) perhaps, if there are no clashes to be expected with the
normal use of these values. Or you could simply allocate a dummy port
that will never be used, except as a placeholder for these values...

Note that according to the GNU Mach documentation, MACH_PORT_DEAD is
actually ~0, i.e. -1 -- so if you use MACH_PORT_DEAD, the effect will be
the same, but the overloading will be explicit then...
OK.
+
 static const char args_doc[] = "COMMAND [ARG...]";
 static const char doc[] = "Trace Mach Remote Procedure Calls.";
 
@@ -82,6 +86,49 @@ msgid_ihash_cleanup (void *element, void *arg)
 static struct hurd_ihash msgid_ihash
   = HURD_IHASH_INITIALIZER (HURD_IHASH_NO_LOCP);

+/* all traced tasks. */
+task_t *traced_tasks;
+size_t nb_traced_tasks;
+
+void
+add_task (task_t task)
+{
+  static size_t capability = 100;

"capability" is a rather confusing name... Commonly such things are
called alloc_size or something like that.

+
+  if (traced_tasks == NULL)
+    traced_tasks = malloc (sizeof (task_t) * capability);

No need to special-case this: realloc() applied on NULL is equivalent to
malloc().

Of course, this requires the start value for alloc_size to be 0, and
consequently the incrementation algorithm needs to change. (e.g.
alloc_size = alloc_size * 2 + 100)

+  else if (nb_traced_tasks == capability)
+    {
+      capability *= 2;
+      traced_tasks = realloc (traced_tasks, capability);
+    }
+  assert (traced_tasks);
+
+  traced_tasks[nb_traced_tasks++] = task;
+}

actually I don't think it's a good idea to use chunked allocation here
at all... Probably just adds unnecessary complexity. Simply realloc()
each time -- it's not *that* inefficient. Or use a proper dynamic data
structure; like a linked list, or perhaps even better an ihash.
I think this is the simplest way to handle it. The entry of the array 'traced_tasks' is just a task port. The array is also relatively static assuming that task creation and termination is relatively rare, so it is really unnecessary to use dynamic data structure, which increases the complexity and reduces the performance (thought the performance isn't an important issue in rpctrace).
+
+void
+remove_task (task_t task)
+{
+  int i,j;
+  for (i = 0, j = 0; i < nb_traced_tasks && j < nb_traced_tasks; i++, j++)
+    {
+      if (traced_tasks[j] == task)
+       j++;
+      traced_tasks[i] = traced_tasks[j];
+    }
+  nb_traced_tasks = i;
+}

Yeah, this definitely looks like a job for a dynamic data structure...
I don't really understand why glibc doesn't provide general data structure such as vector and linked list as STL or Java library does. Though Hurd provides ihash, we cannot use it as a dynamic array or linked list. They are very common structure. We should have a general implementation and include them in some library such as libshouldbeinlibc so Hurd developers don't need to implement them again and again when they need them.
+
+task_t
+get_task (int idx)
+{
+  if (idx >= nb_traced_tasks)
+    return -1;
+  assert (traced_tasks);
+  return traced_tasks[idx];
+}
+
 /* Parse a file of RPC names and message IDs as output by mig's -list
    option: "subsystem base-id routine n request-id reply-id".  Put each
    request-id value into `msgid_ihash' with the routine name as its value.  */
@@ -190,21 +237,40 @@ msgid_trace_replies (const struct msgid_info *info)
 /* We keep one of these structures for each port right we are tracing.  */
 struct traced_info
 {

I had various doubts about this structure, until I realized that the
real issue goes deeper: It is no longer appropriate to use a single
structure here, when tracing multiple tasks.

I hope you are familiar with relational databases a bit, as it's easiest
to explain using that terminology: in the original case the situation
was simple, as for every traced port, there was exactly one sender and
one receiver; one wrapper port, and one forward port. (The role
alternating between the traced task and the "outside", depending on what
end the traced task holds). There was a 1:1 relation, meaning everything
can be stored in a single table.

With multiple traced tasks, the situation is different though: we still
have exactly one receiver for each port, but any number of senders -- a
1:m relationship. You really need two tables now.

You have tried still to keep it in a single data structure, but this is
wrong, and it shows: The bits of data required for the sender part and
the receiver parts are very different. Consequently, you get lots of
fields that are only meaningful in some cases, or ever worse, have
overloaded meaning in different cases. In database-speak, the table is
heavily denormalized.

The solution is simple: Use two different data structures. One for
receiver_info, holding the forward port and other information about the
receiver; and another one for sender_info, holding the wrapper port and
other information about the sender. Many things will become much cleaner
and more obvious this way.

(I'm not quite sure what best to do about send-once wrappers, though...)

On a related note, you are calling the data structures holding receiver
info "receive wrapper"; in every message, you have a "send wrapper" and
a "receive wrapper" involved. This doesn't really make sense: the port
is only wrapped once.

The receiver-side data structure doesn't actually do any wrapping -- it
only holds information about the receiver and the forward port. The
sender side structure holds the wrapper port, so the term "wrapper" is
more appropriate here -- but still, I suggest using the term only for
the actual wrapper port structure, not for all of the sender_info
structure...

I think it would indeed be useful to do the restructuring in separate
patches, and let the actual support for multiple tasks go in last,
building on that. (It seems to me that the bulk of the changes is
actually in the restructuring...)
I thought about separating this structure but I compromised because it needs much bigger modification. If we do separate the structure, we need to have receive_info, send_info, send_once_info because send_once_info needs only a few fields.
+  /* 'pi' is only used for send right or send-once right,
+   * no task can get the send right or send-right to
+   * the wrapper for the receive right.*/
   struct port_info pi;

-  mach_port_t forward;         /* real port */
+  /* real port.
+     it is only used by send-once right and receive right. */
+  mach_port_t forward;
   mach_msg_type_name_t type;

Putting a comment above a block of several lines like this, suggests
that the comment applies to the whole block -- which is probably not
intended here?...

Use a blank line to make it clear that the comment doesn't also apply to
the second line.

Also, serious nitpicking mode: Sentences begin with a capital letter :-)
(Only exception is if the sentence starts with an identifier, in which
case it should not be capitalized of course...)
OK, I see.
-
+  /* the task who has the right.
+     it is only used by the receive right and send right */
+  task_t task;
+  /* the port name in the owner task.
+   * it is only used by the receive right. */
+  mach_port_t portname;
+
+  /* It is only used by the send right and the receive right.
+   * For the send right, it points to the corresponding receive right.
+   * For the receive right, it links to other receive rights. */
+  struct traced_info *receive_right;

Why does it need to link to other receive rights?...
I need to check all receive rights while I am looking for an untraced receive right (see seen_receive_right and discover_receive_right below). But there is a bug in seen_receive_right(). maybe that is why you have this question.
+  /* it is used to form the list of send rights for different tasks.
+   * the head is the receive right. */
+  struct traced_info *next;

Another case of problematic overloading: the head pointer should use a
distinct name from the next pointers; it's very confusing otherwise. But
this issue, like many other ones in this data structure, will go away
when having distinct structers for sender and receiver...

+                               
   char *name;                  /* null or a string describing this */

   union
   {
     struct traced_info *nextfree; /* Link when on free list.  */

-    struct                     /* For a send right wrapper.  */
+    struct                     /* For a receive right wrapper.  */
     {
       hurd_ihash_locp_t locp;  /* position in the traced_names hash table */
-    } send;
+    } receive;

This change looks very strange... But I guess it's just another artifact
from the unhappy conflation of the information in a single data
structure, having a completely different meaning than originally.

     struct                     /* For a send-once right wrapper.  */
     {
@@ -220,11 +286,70 @@ struct traced_info
 };
 #define INFO_SEND_ONCE(info) ((info)->type == MACH_MSG_TYPE_MOVE_SEND_ONCE)

+struct req_info
+{
+  boolean_t valid;
+  mach_msg_id_t req_id;
+  mach_port_t reply_port;
+  task_t from;
+  task_t to;
+  struct req_info *next;
+};
+
+static struct req_info *req_head = NULL;

Perhaps using ihash would be appropriate here as well? It would avoid
the need for custom list handling.
You might be right, but this request list usually has very few requests (it's rare that many RPC requests always come at a very short time and rpctrace cannot handle them fast enough), so a linked list is very enough.
I still think using ihash everywhere is overdo.
+
+static struct req_info *
+add_request (mach_msg_id_t req_id, mach_port_t reply_port,
+            task_t from, task_t to)
+{
+  struct req_info *req = malloc (sizeof (*req));
+  assert (req);
+  req->req_id = req_id;
+  req->from = from;
+  req->to = to;
+  req->reply_port = reply_port;
+  req->valid = TRUE;
+
+  req->next = req_head;
+  req_head = req;
+
+  return req;
+}
+
+static struct req_info *
+remove_request (mach_msg_id_t req_id, mach_port_t reply_port)
+{
+  struct req_info **prev;
+  struct req_info *req;
+
+  if (req_head == NULL)
+    return NULL;

No need to special-case this -- the loop and final check will handle
this case just fine.
right.
+
+  prev = &req_head;
+  while (*prev)
+    {
+      if ((*prev)->req_id == req_id && (*prev)->reply_port == reply_port)
+       break;
+      prev = &(*prev)->next;
+    }

I prefer FOR loops when traversing lists -- IMHO they are more elegant
and easier to read...

But I guess that's a matter of taste, so feel free to ignore this remark
:-)

+  if (*prev == NULL)
+    return NULL;
+
+  req = *prev;
+  *prev = req->next;
+  return req;
+}
+
+struct port_info *notify_pi;

Is this actually related to tracing multiple tasks?...
which one? notify_pi?
I would say, yes. It is related to the strategy that I use to trace port rights: a traced port has a receive info and multiple send wrapper. When the traced port is destroyed, we need to notify all traced tasks "have" send rights to the port. In order to do it, we need to destroy all corresponding send wrappers when 'forward' in the receive info becomes the dead name.
+/* The list of receive wrappers, but only the ones for the traced tasks. */
+struct traced_info *receive_right_list;
+static struct traced_info dummy_wrapper;
 static struct traced_info *freelist;

 struct hurd_ihash traced_names
-  = HURD_IHASH_INITIALIZER (offsetof (struct traced_info, u.send.locp));
+  = HURD_IHASH_INITIALIZER (offsetof (struct traced_info, u.receive.locp));
 struct port_class *traced_class;
+struct port_class *other_class;
 struct port_bucket *traced_bucket;
 FILE *ostream;

@@ -247,42 +372,115 @@ static void print_data (mach_msg_type_name_t type,
                        const void *data,
                        mach_msg_type_number_t nelt,
                        mach_msg_type_number_t eltsize);
+
 
 /*** Mechanics of tracing messages and interposing on ports ***/

-

As always, please avoid gratuitous whitespace changes.
OK
-/* Create a new wrapper port and do `ports_get_right' on it.  */
+/* Create a new wrapper for the receive right.
+ * The wrapper lives until the traced receive right dies. */
 static struct traced_info *
-new_send_wrapper (mach_port_t right, mach_port_t *wrapper_right)
+new_receive_wrapper (mach_port_t right, mach_port_t owner)

Perhaps putting the functions in a different order would have made the
patch less messy...

BTW, Git implements the patience diff algorithm now! The patch becomes
somewhat more readable if you generate it with git diff --patience. I
wonder why it isn't default... (And it isn't even possible to make it
default through the config file, as far as I can see :-( )
OK.
Unfortunately, the new_send_wrapper() vs. new_receive_wrapper() stuff is
still messy even with patience diff. I believe the problem is again the
fact that we have a completely different structure in the wrapping
process now, and the old send_wrapper/receive_wrapper naming simply
doesn't reflect this appropriately -- as explaind when talking about the
traced_info structure above.

(The moving of the locp from "sender" to "receiver" in the traced_info
structure probably is indeed also related to this.)

I hope that when using distinct data structures, and function names
better reflecting the new constellation, things will become clearer...

 {
   error_t err;
   struct traced_info *info;
+  mach_port_t foo;

-  /* Use a free send-once wrapper port if we have one.  */
-  if (freelist)
+  info = malloc (sizeof (*info));
+  assert (info);

Never assert() on malloc() errors!

malloc() failures are perfectly possible in normal use; they do *not*
indicate program errors. Use normal error reporting for them.

(Note that when building with NDEBUG, assert()s are not even compiled at
all.)
Maybe I should use error().
We should let the rpctrace stop if malloc() fails because the tracing cannot continue after the memory allocation failure.
+  info->forward = right;
+  info->type = MACH_MSG_TYPE_PORT_RECEIVE;
+  info->task = owner;
+  info->portname = UNKNOWN_NAME;
+  info->receive_right = NULL;
+  if (owner != UNKNOWN_TASK)
     {
-      info = freelist;
-      freelist = info->u.nextfree;
+      info->receive_right = receive_right_list;
+      receive_right_list = info;
     }
-  else
+  info->next = 0;
+  info->name = 0;
+
+  /* Request the dead-name notification,
+   * so if the receive right is destroyed,
+   * we can destroy the wrapper. */
+  err = mach_port_request_notification (mach_task_self (), right,
+                                       MACH_NOTIFY_DEAD_NAME, 1,
+                                       notify_pi->port_right,
+                                       MACH_MSG_TYPE_MAKE_SEND_ONCE, &foo);
+  assert_perror (err);

Are you sure that mach_port_request_notification() will only fail when
the program messed up, and never in normal use?...
The program might not work properly as I explained above (see the comments on notify_pi), but it depends on the traced task.
So I want rpctrace to stop and everyone knows where the problem is.
+  mach_port_deallocate (mach_task_self (), foo);
+
+  err = hurd_ihash_add (&traced_names, info->forward, info);
+  assert_perror (err);

Same here -- and in many other places too...

Admittedly, the original code does the same... Not sure what to do about
this :-(
I think I got inspired from the original code;-)
but we cannot simply ignore the error as the following tracing might not work correctly. The simplest action is stopping rpctrace and printing the error as we do now.
Maybe you can consider it as a beta version:-)
+  return info;
+}
+
+static void
+destroy_receive_wrapper (struct traced_info *info)
+{
+  struct traced_info *send_wrapper;
+  struct traced_info **prev;
+
+  mach_port_deallocate (mach_task_self (), info->forward);
+  /* Remove it from the receive right list. */
+  prev = &receive_right_list;
+  if (*prev)
     {
-      /* Create a new wrapper port that forwards to *RIGHT.  */
-      err = ports_create_port (traced_class, traced_bucket,
-                              sizeof *info, &info);
-      assert_perror (err);
-      info->name = 0;
+      while (*prev != info && *prev)
+       prev = &((*prev)->receive_right);
+      /* If we find the receive wrapper in the list. */
+      if (*prev)
+       *prev = info->receive_right;
+    }

The outer IF is pointless -- again, the loop and final IF will handle
the emply list case just fine.
right.
+
+  send_wrapper = info->next;
+  while (send_wrapper)
+    {
+      struct traced_info *next = send_wrapper->next;
+      /* we have to complete destroy the port,
+       * so the tasks who have the send right can be notified. */
+      assert (send_wrapper->pi.weakrefcnt == 1);
+      ports_port_deref_weak (send_wrapper);
+      assert (send_wrapper->pi.refcnt == 1);
+      ports_destroy_right (send_wrapper);
+      send_wrapper = next;
     }

-  info->forward = right;
-  info->type = MACH_MSG_TYPE_MOVE_SEND;
+  hurd_ihash_locp_remove (&traced_names, info->u.receive.locp);
+  free (info);
+}

-  /* Store it in the reverse-lookup hash table, so we can
-     look up this same right again to find the wrapper port.
-     The entry in the hash table holds a weak ref on INFO.  */
-  err = hurd_ihash_add (&traced_names, info->forward, info);
+/* Create a new wrapper port and do `ports_get_right' on it.
+ *
+ * The wrapper lives until there is no send right to it,
+ * or the corresponding receive wrapper is destroyed.
+ */
+static struct traced_info *
+new_send_wrapper (struct traced_info *receive, task_t task,
+                 mach_port_t *wrapper_right)
+{
+  error_t err;
+  struct traced_info *info;
+
+  /* Create a new wrapper port that forwards to *RIGHT.  */
+  err = ports_create_port (traced_class, traced_bucket,
+                          sizeof *info, &info);
   assert_perror (err);
+  info->name = 0;
+
+  assert ((info->pi.flags & PORT_HAS_SENDRIGHTS) == 0);
+  asprintf (&info->name, "  %d<--%d(pid%d)",
+           receive->forward, info->pi.port_right, (int) task2pid (task));

I don't think it's really necessary to explicitely cast to int?...
You're right.
I followed the original code.
I thought it might make the compiler complain.
+  info->forward = 0;
+  info->type = MACH_MSG_TYPE_MOVE_SEND;
+  info->task = task;
+  info->portname = UNKNOWN_NAME;
+  info->receive_right = receive;
+  info->next = receive->next;
+  receive->next = info;
+  info->u.nextfree = NULL;
+
   ports_port_ref_weak (info);
-  assert (info->u.send.locp != 0);

   *wrapper_right = ports_get_right (info);
   ports_port_deref (info);
@@ -302,6 +500,7 @@ new_send_once_wrapper (mach_port_t right,
mach_port_t *wrapper_right)

Your mail client wrapped this line (and many more like this), breaking
the patch :-(
I knew thunderbird had some problems so I used gmail directly.
It seems gmail isn't suitable for sending patches, either:-(
     {
       info = freelist;
       freelist = info->u.nextfree;
+      assert (info->pi.weakrefcnt == 0);
     }
   else
     {
@@ -314,6 +513,10 @@ new_send_once_wrapper (mach_port_t right,
mach_port_t *wrapper_right)

   info->forward = right;
   info->type = MACH_MSG_TYPE_MOVE_SEND_ONCE;
+  info->task = UNKNOWN_TASK;
+  info->portname = UNKNOWN_NAME;
+  info->receive_right = NULL;
+  info->next = NULL;

   /* Send-once rights never compare equal to any other right (even
      another send-once right), so there is no point in putting them
@@ -332,6 +535,32 @@ new_send_once_wrapper (mach_port_t right,
mach_port_t *wrapper_right)
   return info;
 }

+static void
+traced_clean (void *pi)
+{

No comment for this function?...
OK.
+  struct traced_info **prev;
+  struct traced_info *info = pi;
+
+  assert (info->type == MACH_MSG_TYPE_MOVE_SEND);
+  if (info->name)
+    free (info->name);

The condition is unnecessary -- if the pointer is NULL, free() will
handle it just fine.

+
+  if (info->receive_right)
+    {
+      /* Remove it from the send right list. */
+      prev = &info->receive_right->next;
+      if (*prev)
+       {
+         while (*prev != info && *prev)
+           prev = &((*prev)->next);
+         assert (*prev);
+         *prev = info->next;
+       }

Another case of unnecessary outer IF...

This code is actually incorrect I believe: surely the assert() should
also trigger when the list is empty?
The outer IF here is necessary because of the code "prev = &info->receive_right->next;".
If the list is empty, the code inside the inner IF cannot be executed.
+
+      info->next = NULL;
+      info->receive_right = NULL;
+    }
+}

 /* This gets called when a wrapper port has no hard refs (send rights),
    only weak refs.  The only weak ref is the one held in the reverse-lookup
@@ -339,74 +568,253 @@ new_send_once_wrapper (mach_port_t right,
mach_port_t *wrapper_right)
 static void
 traced_dropweak (void *pi)
 {
+  struct traced_info **prev;
   struct traced_info *const info = pi;

   assert (info->type == MACH_MSG_TYPE_MOVE_SEND);
-  assert (info->u.send.locp);

-  /* Remove INFO from the hash table.  */
-  hurd_ihash_locp_remove (&traced_names, info->u.send.locp);
+  free (info->name);
+  info->name = 0;
+
+  assert (info->pi.port_right);
+
+  /* Remove it from the send right list. */
+  prev = &info->receive_right->next;
+  assert (*prev);
+  if (*prev)

Here, it's even more obvious that the extra IF is pointless: it follows
right after an assert() for the same condition!

+    {
+      while (*prev != info && *prev)
+       prev = &((*prev)->next);
+      assert (*prev);
+      *prev = info->next;
+    }

Haven't I seen this code before?... ;-)

Even if it's very small, it's probably still good to put it in an extra
function -- redundancy is always problematic. Actually the problem shows
in this case, as you introduced a difference in the code, see above...
OK.
+  /* This function is called because the send wrapper doesn't have
+   * any send right. If we find the all send wrappers are gone,
+   * it means that our traced port doesn't have send right.
+   * We notify the owner of the receive right by deallocating
+   * the forward port. */

I find this comment really hard to understand... The original one in the
old code was clearer, though of course it needs to be adapted.

What you are trying to say is simply, that if this send wrapper was the
last one for the port, we give up the forward port as well, right?
yes:-)
+  if (info->receive_right->next == NULL)
+    {
+      fprintf (ostream, "no senders\n");
+      destroy_receive_wrapper (info->receive_right);
+    }
+
+  info->next = NULL;
+  info->receive_right = NULL;
+
   ports_port_deref_weak (info);
+}

-  /* Deallocate the forward port, so the real port also sees no-senders.  */
-  mach_port_deallocate (mach_task_self (), info->forward);
+boolean_t
+seen_receive_right (task_t task, mach_port_t name)
+{
+  struct traced_info *info = receive_right_list;
+  while (info)
+    {
+      if (info->task == task && info->portname == name)
+       return TRUE;
+      info = info->next;
It is a bug here. It should be
info = info->receive_right;
+    }
+  return FALSE;
+}

-  /* There are no rights to this port, so we can reuse it.
-     Add a hard ref and put INFO on the free list.  */
-  ports_port_ref (info);
+/* This function is to find the receive right for the send right 'send'
+ * among traced tasks. I assume that all receive rights are moved
+ * under the control of rpctrace.
+ *
+ * Note: 'send' shouldn't be the send right to the wrapper.
+ *
+ * Note: the traced_info returned from the function
+ * might not be the receive right in the traced tasks.
+ * */
+struct traced_info *
+discover_receive_right (mach_port_t send)
+{

I'm rather sceptical about this whole function. Is it really necessary
to do such a brute-force search for the receive right? Shouldn't we
actually always know about all receive rights in all traced tasks, as we
are tracing their task ports?
No, we cannot. A receive right is created by mach_port_allocate(), mach_port_allocate_name() or mach_reply_port(). The caller of mach_port_allocate() creates the port, but it dosn't get the receive right but the port name in the target task. The caller of mach_port_allocate_name() is the same except that the port name is decided by the caller. mach_reply_port() cannot be traced by the rpctrace at all since it's a system trap. If we don't treat the first two RPCs specially, rpctrace dosen't know a port has been created and the target task has the receive right. Of course, we can do it (understand them and get the receive right) as I did before. But it becomes unnecessary after discover_receive_right is implemented. This is a more general and elegant (I think:-) way to handle the receive right.
BTW, I don't understand the last part of the comment at all...
The receive rights of most ports are not owned by the traced tasks. Instead, they are owned by tasks that aren't traced, such as the kernel task port, the proc port, etc. I create the receive info for all of these receive rights and mark them as UNKNOWN_TASK and the code below

+  info = hurd_ihash_find (&traced_names, send);

can return these kinds of receive info.
As I said at the beginning, all tasks that aren't traced are considered as a "signal task" called UNKNOWN_TASK.
+  int i;
+  error_t err;
+  struct traced_info *info;

-  free (info->name);
-  info->name = 0;
+  info = hurd_ihash_find (&traced_names, send);
+  /* If we have seen the send right or send once right. */
+  if (info
+      /* If the receive right is in one of traced tasks,
+       * but we don't know its name
+       * (probably because the receive name has been moved),
s/receive name/receive right/
+       * we need to find it out. */

As I said above, I don't see how this could ever happen...
did you? I forget. maybe the patch is really too large:-)
as I explained in the comment, a receive right can be moved from one task to another. When a receive right is moved, I don't try to get its new port name immediately. Instead, I wait until this function is called. To be honest, it is not really necessary to trace the port name, but we can increase the performance by tracing them. If we don't trace them, we need to check all receive rights one by one (by calling if we need to search for new receive right.
+      && !(info->task != UNKNOWN_TASK
+         && info->portname == UNKNOWN_NAME))
+    return info;
+
+  for (i = 0; i < nb_traced_tasks; i++)
+    {
+      int j;
+      mach_port_t *portnames = NULL;
+      mach_msg_type_number_t nportnames = 0;
+      mach_port_type_t *porttypes = NULL;
+      mach_msg_type_number_t nporttypes = 0;
+      struct traced_info *receive_wrapper = NULL;
+      task_t task = traced_tasks[i];
+      err = mach_port_names (task, &portnames, &nportnames,
+                            &porttypes, &nporttypes);

Missing blank line between declarations and statements...
right.
+      if (err)
+       {
+         remove_task (task);
+         continue;
+       }

Is this really a good way to handle errors here? If a task unexpectedly
disappears (or otherwise becomes unaccessible) without our noticing,
doesn't that mean something went seriously wrong, and we better complain
loudly rather than just silently dropping it?...
You are right. Actually, I need to remove the task explicitly when it is terminated and complain loudly here when there is an error. I was lazy and assume the task has been terminated when mach_port_names returns error:-)
-  info->u.nextfree = freelist;
-  freelist = info;
+      for (j = 0; j < nportnames; j++)
+       {
+         mach_port_t send_right;
+         mach_msg_type_name_t type;
+         if (!(porttypes[j] & MACH_PORT_TYPE_RECEIVE) /* not a receive right */
+             || (porttypes[j] & MACH_PORT_TYPE_RECEIVE

Checking again for the same condition in the || part is redundant -- it
must be true here, or the first part would have triggered.
right.
+                 && seen_receive_right (task, portnames[j])))
+           continue;
+
+         err = mach_port_extract_right (task, portnames[j],
+                                        MACH_MSG_TYPE_MAKE_SEND,
+                                        &send_right, &type);
+         assert_perror (err);
+
+         receive_wrapper = hurd_ihash_find (&traced_names, send_right);
+         /* If we have seen this send right before. */
+         if (receive_wrapper)
+           mach_port_deallocate (mach_task_self (), send_right);
+         else
+           receive_wrapper = new_receive_wrapper (send_right, task);

Do I get it right, that this function will create forward ports for any
receive rights it finds along the way, as a side effect?

Seems questionable to me. At least it should be documented.
Yes, you are right.
The code above does have two functions. I will add the comment.
+
+         receive_wrapper->portname = portnames[j];
+
+         /* We have found the wrapper for the receive right. */
+         if (send_right == send)
+           break;
+         receive_wrapper = NULL;
+       }
+      if (portnames)
+       vm_deallocate (mach_task_self (), (vm_address_t) portnames,
+                      nportnames * sizeof (*portnames));
+      if (porttypes)
+       vm_deallocate (mach_task_self (), (vm_address_t) porttypes,
+                      nporttypes * sizeof (*porttypes));
+
+      if (receive_wrapper)
+       return receive_wrapper;
+    }
+  return NULL;
 }

+/* get_send_wrapper searches for the wrapper
+   of the send right for the target task.
+   if it doesn't exist, create a new one. */
+struct traced_info *get_send_wrapper (struct traced_info *receive_wrapper,
+                                     mach_port_t task, mach_port_t *right)
+{
+  struct traced_info *info = receive_wrapper->next;
+
+  while (info)
+    {
+      if (info->task == task)
+       {
+         *right = ports_get_right (info);
+         assert (info->pi.refcnt == 1);
+         return info;
+       }
+      info = info->next;
+    }
+  /* no send wrapper is found. */
+  return new_send_wrapper (receive_wrapper, task, right);
+}

 /* Rewrite a port right in a message with an appropriate wrapper port.  */
 static struct traced_info *
-rewrite_right (mach_port_t *right, mach_msg_type_name_t *type)
+rewrite_right (mach_port_t *right, mach_msg_type_name_t *type,
+              struct req_info *req)
 {
   error_t err;
-  struct traced_info *info;
+  struct traced_info *receive_wrapper;
+  struct traced_info *send_wrapper;
+  task_t dest = UNKNOWN_TASK;
+  task_t source = UNKNOWN_TASK;

   /* We can never do anything special with a null or dead port right.  */
   if (!MACH_PORT_VALID (*right))
     return 0;

-  switch (*type)
+  if (req)
     {
-    case MACH_MSG_TYPE_PORT_SEND:
-      /* See if we are already tracing this port.  */
-      info = hurd_ihash_find (&traced_names, *right);
-      if (info)
+      if (req->valid)    /* it's a RPC request */
        {
-         /* We are already tracing this port.  We will pass on a right
-            to our existing wrapper port.  */
-         *right = ports_get_right (info);
-         *type = MACH_MSG_TYPE_MAKE_SEND;
-         return info;
+         source = req->from;
+         dest = req->to;
+       }
+      else
+       {
+         source = req->to;
+         dest = req->from;
        }
+    }
+
+  switch (*type)
+    {
+    case MACH_MSG_TYPE_PORT_SEND:
+      assert (req);

       /* See if this is already one of our own wrapper ports.  */
-      info = ports_lookup_port (traced_bucket, *right, 0);
-      if (info)
+      send_wrapper = ports_lookup_port (traced_bucket, *right, 0);
+      if (send_wrapper)
        {
-         /* This is a send right to one of our own wrapper ports.
-            Instead, send along the original send right.  */
+         /* This is a send right to one of our own wrapper ports. */
          mach_port_deallocate (mach_task_self (), *right); /* eat msg ref */
-         *right = info->forward;
-         err = mach_port_mod_refs (mach_task_self (), *right,
-                                   MACH_PORT_RIGHT_SEND, +1);
-         assert_perror (err);
-         ports_port_deref (info);
-         return info;
+         /*  If the send right is moved to the task with the receive right,
+          * translate it.*/

"translate it" is not very specific...

Also, you haven't commented at all on the else case.
OK.
+         assert (send_wrapper->receive_right);
+         if (dest == send_wrapper->receive_right->task)
+           {
+             *right = send_wrapper->receive_right->forward;
+             err = mach_port_mod_refs (mach_task_self (), *right,
+                                       MACH_PORT_RIGHT_SEND, +1);
+             assert_perror (err);
+             ports_port_deref (send_wrapper);
+           }
+         else
+           {
+             struct traced_info *send_wrapper2
+               = get_send_wrapper (send_wrapper->receive_right, dest, right);
+             ports_port_deref (send_wrapper);
+             *type = MACH_MSG_TYPE_MAKE_SEND;
+             send_wrapper = send_wrapper2;
+           }
+         return send_wrapper;
        }

-      /* We have never seen this port before.  Create a new wrapper port
-        and replace the right in the message with a right to it.  */
-      *type = MACH_MSG_TYPE_MAKE_SEND;
-      return new_send_wrapper (*right, right);
+      receive_wrapper = discover_receive_right (*right);
+      if (receive_wrapper == NULL)
+       {
+         /* It's unusual to see an unknown send right from a traced task.
+          * We ignore it. */
+         if (source != UNKNOWN_TASK)
+           {
+             error (0, 0, "get an unknown send right from process %d",
+                    (int) task2pid (source));
+             return &dummy_wrapper;
+           }

"Unusual"? What does that mean?...

Isn't this something that should never happen; and if it does, something
must be seriously screwed? In that case, is it even useful to go on at
all?...
Here is one example. see https://savannah.gnu.org/bugs/?26476
I think the best way is to ignore the case. Thus, the program can still work correctly even though we might not be able to trace some RPCs related to it.
+         /* the receive right is owned by an unknown task. */
+         receive_wrapper = new_receive_wrapper (*right, UNKNOWN_TASK);
+         mach_port_mod_refs (mach_task_self (), *right,
+                             MACH_PORT_RIGHT_SEND, 1);
+       }
+      /* if the send right is moved to the task with the receive right,
+       * don't do anything. */

Again, you didn't explain the else case anywhere...
I will add comment.
+      if (dest == receive_wrapper->task)
+       return receive_wrapper;
+      else
+       {
+         assert (*right == receive_wrapper->forward);
+         mach_port_deallocate (mach_task_self (), *right);
+         send_wrapper = get_send_wrapper (receive_wrapper, dest, right);
+         *type = MACH_MSG_TYPE_MAKE_SEND;
+         return send_wrapper;
+       }

     case MACH_MSG_TYPE_PORT_SEND_ONCE:
       /* There is no way to know if this send-once right is to the same
@@ -429,49 +837,86 @@ rewrite_right (mach_port_t *right,
mach_msg_type_name_t *type)
         to A that were sent through and replaced with our wrapper (B).
         If not, we create a new receive right.  */

This comment needs to be rewritten... While the basic idea still holds,
the way it is, it does not really fit the situation with multiple tasks.

The explanation about receive rights from the beginning of your mail
would fit here, although it took me a while to understand it... (And the
explanation about send rights would fit for the send rights case above.)
right.
       {
-       mach_port_t rr;         /* B */
-       char *name;
-
-       info = hurd_ihash_find (&traced_names, *right);
-       if (info)
+       assert (req);
+       receive_wrapper = hurd_ihash_find (&traced_names, *right);
+       if (receive_wrapper)

Avoid using different variable names when changing code, unless the old
name really does not make sense anymore... Such renames make the patch
larger and harder to follow.

If you think that renaming is useful, you can do that in an extra
cleanup patch. But most of the time, it's better to avoid it
alltogether, as it breaks diffing between versions before and after the
rename. (And many operations that depend on diffing, like cherry-picking
for example.)

This actually applies not only to variable names, but also other kinds
of non-functional changes...
The original variable name 'info' cannot fit the code any more because the code now needs a send wrapper and a receive "wrapper". If I still keep 'info', which wrapper should be named 'info'? I think it might cause more confusion.
          {
-           /* This is a receive right that we have been tracing sends to.  */

You dropped the old comment, but haven't added a new one for this code
block at all...
I didn't just drop one comment. I dropped the comment and the code below it.
The new code is explained by other comments.
-           name = info->name;
-           rr = ports_claim_right (info);
-           /* That released the refs on INFO, so it's been freed now.  */
+           struct traced_info *send_wrapper2;
+           char *name;
+           mach_port_t rr;

Why did you move the declarations of these two variables down here?
These variables are only used in this IF clause.
Looks like another change that could go in an extra cleanup patch, but
better should not go into a patch that does non-trivial changes...
I don't understand what you mean here.
+
+           /* The port has at least one send right - the one
+            * in receive_wrapper->forward.
+            * We wrap the port.
+            * if the source task doesn't have the send right, the port
+            * will be destroyed after we deallocate the only send right. */

I don't see what this comment refers to...
It refers to the code starting from here to "ports_port_deref (send_wrapper)" below.
I don't know how to comment this part of code properly.
+           /* We have to deallocate the send right in
+            * receive_wrapper->forward before we import the port to
+            * port_info. So the reference count in the port info will be 1,
+            * if it doesn't have any other send rights. */
+           mach_port_deallocate (mach_task_self (), receive_wrapper->forward);

This comment is very hard to understand; I don't think it is helpful at
all :-(

I can only *guess*, from my understanding of the code, that what you are
trying to say is: the task just gave up the receive right, so we won't
be forwarding to it anymore, and thus need to drop the send right for
the forward port.

+           err = ports_import_port (traced_class, traced_bucket,
+                                    *right, sizeof *send_wrapper,
+                                    &send_wrapper);
+           assert_perror (err);
It comments these three lines (mainly on mach_port_deallocate and ports_import_port).
I should have put an extra blank line, which might make the comment clearer.
If 'forward' has the send right before ports_import_port() is called, the reference count of the new created port info is 2. If I don't call mach_port_deallocate before ports_import_port, the reference count is always 2 because 'forward' always hold the send right to the port, and ports_port_deref will not be able to destroy the port info if the source task doesn't have the send right to the port.
+           send_wrapper->forward = 0;
+           send_wrapper->type = MACH_MSG_TYPE_MOVE_SEND;
+           send_wrapper->task = source;
+           send_wrapper->portname = UNKNOWN_NAME;
+           send_wrapper->name = receive_wrapper->name;
+           /* Initialize them in case that the source task doesn't
+            * have the send right to the port, and the port will
+            * be destroyed immediately. */
+           send_wrapper->receive_right = NULL;
+           send_wrapper->next = NULL;
+           ports_port_ref_weak (send_wrapper);
+           ports_port_deref (send_wrapper);
+
+           hurd_ihash_locp_remove (&traced_names, 
receive_wrapper->u.receive.locp);
+
+           send_wrapper2 = get_send_wrapper (receive_wrapper, dest, &rr);
+           assert (send_wrapper2->pi.refcnt == 1);
+           name = send_wrapper2->name;
+           send_wrapper2->name = NULL;
+           /* The port has been removed from the send wrapper,
+            * so we cannot put it in the free list. */

I don't understand this comment at all :-(
The original rpctrace recycles send wrappers that lose all send rights by increasing the weak reference count of port info and adding the send wrapper in the free list when the reference count becomes 0. However, send_wrapper2 cannot be recycled because its port will be removed from the send wrapper by ports_claim_right(). So I need to decrease its weak reference count with ports_port_deref_weak before calling ports_claim_right.
+           ports_port_deref_weak (send_wrapper2);
+           rr = ports_claim_right (send_wrapper2);
+           /* Get us a send right that we will forward on.  */
+           err = mach_port_insert_right (mach_task_self (), rr, rr,
+                                         MACH_MSG_TYPE_MAKE_SEND);
+           assert_perror (err);
+           receive_wrapper->forward = rr;
+           receive_wrapper->task = dest;
+           if (dest != UNKNOWN_TASK)
+             {
+               receive_wrapper->receive_right = receive_right_list;
+               receive_right_list = receive_wrapper;
+             }
+           /* the port name will be discovered
+            * when we search for this receive right. */
+           receive_wrapper->portname = UNKNOWN_NAME;
+           receive_wrapper->name = name;
+
+           send_wrapper->receive_right = receive_wrapper;
+           send_wrapper->next = receive_wrapper->next;
+           receive_wrapper->next = send_wrapper;
+
+           err = hurd_ihash_add (&traced_names, receive_wrapper->forward,
+                                 receive_wrapper);
+           assert_perror (err);
+           *right = rr;
          }
        else
          {
-           /* This is a port we know nothing about.  */
-           rr = mach_reply_port ();
-           name = 0;
+           /* weird? no send right for the port. */

Again, is that something that could happen in normal operation?...
It might happen, but I bet no one really do it.
It can happen when someone creates a port and moves it to another task before getting the send right to it.
+           err = mach_port_insert_right (mach_task_self (), *right, *right,
+                                         MACH_MSG_TYPE_MAKE_SEND);
+           assert_perror (err);
+           receive_wrapper = new_receive_wrapper (*right, dest);
          }

-       /* Create a new wrapper object that receives on this port.  */
-       err = ports_import_port (traced_class, traced_bucket,
-                                *right, sizeof *info, &info);
-       assert_perror (err);
-       info->name = name;
-       info->type = MACH_MSG_TYPE_MOVE_SEND; /* XXX ? */
-
-       /* Get us a send right that we will forward on.  */
-       err = mach_port_insert_right (mach_task_self (), rr, rr,
-                                     MACH_MSG_TYPE_MAKE_SEND);
-       assert_perror (err);
-       info->forward = rr;
-
-       err = hurd_ihash_add (&traced_names, info->forward, info);
-       assert_perror (err);
-       ports_port_ref_weak (info);
-
-       /* If there are no extant send rights to this port, then INFO will
-          die right here and release its send right to RR.
-          XXX what to do?
-       */
-       ports_port_deref (info);
-
-       *right = rr;
-       return info;
+       return receive_wrapper;
       }

     default:
@@ -482,7 +927,7 @@ rewrite_right (mach_port_t *right,
mach_msg_type_name_t *type)

 static void
 print_contents (mach_msg_header_t *inp,
-               void *msg_buf_ptr)
+               void *msg_buf_ptr, struct req_info *req)
 {
   error_t err;

@@ -523,7 +968,7 @@ print_contents (mach_msg_header_t *inp,
       else
        msg_buf_ptr += ((nelt * eltsize + sizeof(natural_t) - 1)
                        & ~(sizeof(natural_t) - 1));
-
+

Something very strage must have happened here... Perhaps you
inadvertantly added a space, but your mail client ate it when sending
the patch?
You are right. I will modify the code to avoid the problem.
       if (first)
        first = 0;
       else
@@ -549,20 +994,58 @@ print_contents (mach_msg_header_t *inp,
          poly = 0;
          for (i = 0; i < nelt; ++i)
            {
+             mach_port_t orig_port = portnames[i];
              newtypes[i] = name;

-             if (inp->msgh_id == 3215) /* mach_port_insert_right */
+             ti = rewrite_right (&portnames[i], &newtypes[i], req);
+
+             if (inp->msgh_id == 2161 /* the reply message for thread_create */
+                 || inp->msgh_id == 2107 /* for task_create */)

I don't like constructs of the form "if (foo == a || foo == b) { if (foo
== a) {} else {} }"... It's much clearer to use "switch (foo) {}" or "if
(foo == a) {} else if (foo == b) {}" IMHO.

But I guess this is also a matter of taste...
if (foo == a || foo == b) was necessary because handling thread_create and task_create share much the same code.
You can see it https://savannah.gnu.org/patch/?6830
However, those code disappear after I make it trace multitask programs, but this IF isn't removed.
                {
-                 /* XXX
-                  */
-                 fprintf (ostream,
-                          "\t\t[%d] = pass through port %d, type %d\n",
-                          i, portnames[i], name);
-                 continue;
+                 if (inp->msgh_id == 2161)
+                   {
+                     err = mach_port_insert_right (mach_task_self (),
+                                                   portnames[i], portnames[i],
+                                                   MACH_MSG_TYPE_MAKE_SEND);
+                     assert_perror (err);
+                     err = thread_set_kernel_port (orig_port, portnames[i]);
+                     assert_perror (err);
+                     if (ti->name)
+                       free (ti->name);
+                     asprintf (&ti->name, "thread%d(pid%d)",
+                               (int) orig_port, (int) task2pid (req->from));
+                   }
+                 else
+                   {
+                     pid_t pid;
+                     struct traced_info *receive_wrapper;
+                     struct traced_info *task_wrapper;
+                     mach_port_t pseudo_task;
+                     add_task (orig_port);

Missing blank line...
OK.
+                     receive_wrapper = hurd_ihash_find (&traced_names, 
orig_port);
+                     assert (receive_wrapper == ti->receive_right);
+                     task_wrapper = new_send_wrapper (receive_wrapper,
+                                                      orig_port, &pseudo_task);
+                     err = mach_port_insert_right (mach_task_self (),
+                                                   pseudo_task, pseudo_task,
+                                                   MACH_MSG_TYPE_MAKE_SEND);
+                     assert_perror (err);
+                     err = task_set_kernel_port (orig_port, pseudo_task);
+                     assert_perror (err);
+
+                     pid = task2pid (orig_port);
+                     if (ti->name)
+                       free (ti->name);

Unnecessary IF...
OK.
+                     asprintf (&ti->name, "pid%d(pid%d)",
+                               (int) pid, (int) task2pid (req->from));
+                     free (task_wrapper->name);
+                     asprintf (&task_wrapper->name, "pid%d(pid%d)",
+                               (int) pid, (int) pid);
+                   }
+
+                 mach_port_deallocate (mach_task_self (), portnames[i]);
                }

-             ti = rewrite_right (&portnames[i], &newtypes[i]);
-
              putc ((i == 0 && nelt > 1) ? '{' : ' ', ostream);

              if (portnames[i] == MACH_PORT_NULL)
@@ -632,19 +1115,24 @@ print_contents (mach_msg_header_t *inp,
                type->msgt_name = name;
            }
          else if (newtypes[0] != name)
-           if (type->msgt_longform)
-             lt->msgtl_name = newtypes[0];
-           else
-             type->msgt_name = newtypes[0];
+           {
+             if (type->msgt_longform)
+               lt->msgtl_name = newtypes[0];
+             else
+               type->msgt_name = newtypes[0];
+           }

Putting it in braces is definitely preferable, and in fact recommended
by the GNU Coding Standards... However, like all cleanups, this should
go in an extra patch -- it is totally unrelated to the other changes.
I just wanted to satisfy GCC.
OK. I will put the original code.
        }
       else
        print_data (name, data, nelt, eltsize);
+      fflush (ostream);

I guess this is not related to tracing multiple tasks, but rather a
general fix?...
It isn't even a fix.
I just simplify the debugging.
I can see what has been received by rpctrace by flushing the stream.
I can remove it, but it can definitely help the further development of rpctrace.
     }
 }

 int
 trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
 {
+  mach_port_t reply_port;
+
   const mach_msg_type_t RetCodeType =
   {
     MACH_MSG_TYPE_INTEGER_32,  /* msgt_name = */
@@ -671,19 +1159,22 @@ trace_and_forward (mach_msg_header_t *inp,
mach_msg_header_t *outp)
      with a send-once right, even if there have never really been any.  */
   if (MACH_MSGH_BITS_LOCAL (inp->msgh_bits) == MACH_MSG_TYPE_MOVE_SEND_ONCE)
     {
-      if (inp->msgh_id == MACH_NOTIFY_DEAD_NAME)
+      if (inp->msgh_id == MACH_NOTIFY_DEAD_NAME && info == (void *) notify_pi)
        {
-         /* If INFO is a send-once wrapper, this could be a forged
-            notification; oh well.  XXX */
-
+         struct traced_info *receive_wrapper;
          const mach_dead_name_notification_t *const n = (void *) inp;

-         assert (n->not_port == info->forward);
          /* Deallocate extra ref allocated by the notification.  */
          mach_port_deallocate (mach_task_self (), n->not_port);
-         ports_destroy_right (info);
-         ports_port_deref (info);
+         receive_wrapper = hurd_ihash_find (&traced_names, n->not_port);
+         /* The receive wrapper might have been destroyed. */

How?
When all corresponding send wrappers are destroyed, the receive info will be destroyed.
However, the dead-name notification isn't canceled.
Also, this comment is rather confusing: it looks like it would explain
the purpose of the whole IF block, while actually it only explains why
the block needs to be conditional...
Should I say "The receive wrapper hasn't been destroyed"?
+         if (receive_wrapper)
+           {
+             assert (n->not_port == receive_wrapper->forward);
+             destroy_receive_wrapper (receive_wrapper);
+           }
          ((mig_reply_header_t *) outp)->RetCode = MIG_NO_REPLY;
+         ports_port_deref (info);
          return 1;
        }
       else if (inp->msgh_id == MACH_NOTIFY_NO_SENDERS
@@ -698,43 +1189,109 @@ trace_and_forward (mach_msg_header_t *inp,
mach_msg_header_t *outp)
          ((mig_reply_header_t *) outp)->RetCode = MIG_NO_REPLY;
          return 1;
        }
+      /* Get some unexpected notification for rpctrace itself,
+       * TODO ignore them for now. */
+      else if (info == (void *) notify_pi)
+       {
+         ports_port_deref (info);
+         ((mig_reply_header_t *) outp)->RetCode = MIG_NO_REPLY;
+         return 1;
+       }
     }

+  assert (info != (void *) notify_pi);
   assert (MACH_MSGH_BITS_LOCAL (inp->msgh_bits) == info->type);

   complex = inp->msgh_bits & MACH_MSGH_BITS_COMPLEX;

   msgid = msgid_info (inp->msgh_id);

+  /* If it's the request of exec_startup_get_info,
+   * it means that the traced process starts to run */

What's the purpose of this? I thought all task/thread creation is
intercepted anyways?...
No. The first thread is created by exec server, which cannot be traced by rpctrace. So we have to find this first thread. The best way I can find is observing exec_startup_get_info, which is called when the process starts to run. Of course, it has drawback. If the traced program doesn't use glibc (which I believe is really rare), it cannot work.
+  if (inp->msgh_id == 30500)
+    {
+      struct traced_info *thread_send_wrapper;
+      struct traced_info *thread_receive_wrapper;
+      thread_t *threads = NULL;
+      size_t nthreads = 0;
+      err = task_threads (get_task(0), &threads, &nthreads);

Missing blank line...
OK
+      if (err)
+       error (1, err, "task_threads");
+
+      for (int i = 0; i < nthreads; i++)
+       {
+         /* If we hasn't seen the port. */

"Haven't seen the port yet."

Also, it's probably clearer to put the comment immediately before or
even after the IF...
OK.
+         thread_receive_wrapper = hurd_ihash_find (&traced_names, threads[i]);
+         if (thread_receive_wrapper == NULL)
+           {
+             mach_port_t new_thread_port;
+             thread_receive_wrapper = new_receive_wrapper (threads[i], 
UNKNOWN_TASK);
+             thread_send_wrapper = new_send_wrapper (thread_receive_wrapper,
+                                                     info->task, 
&new_thread_port);
+             free (thread_send_wrapper->name);
+             asprintf (&thread_send_wrapper->name, "thread%d(pid%d)",
+                       (int) threads[i], (int) task2pid (info->task));
+       
+             err = mach_port_insert_right (mach_task_self (),
+                                           new_thread_port, new_thread_port,
+                                           MACH_MSG_TYPE_MAKE_SEND);
+             if (err)
+               error (0, err, "mach_port_insert_right");

Is it really useful to go on after this operation fails?...
No. I will let the whole program stops.
+
+             err = thread_set_kernel_port (threads[i], new_thread_port);
+             if (err)
+               error (0, err, "thread_set_kernel_port");
+
+             mach_port_deallocate (mach_task_self (), new_thread_port);
+           }
+       }
+      munmap (threads, nthreads * sizeof (thread_t));

Using vm_deallocate() is clearer.
I see proc use munmap.
I agree with you, it's clearer to use vm_deallocate since both task_threads and vm_deallocate are Mach RPCs.
But there isn't big difference.
+    }
+
   /* Swap the header data like a crossover cable. */
   {
     mach_msg_type_name_t this_type = MACH_MSGH_BITS_LOCAL (inp->msgh_bits);
     mach_msg_type_name_t reply_type = MACH_MSGH_BITS_REMOTE (inp->msgh_bits);
+
+    /* Save the original reply port in the RPC request. */
+    reply_port = inp->msgh_remote_port;

     inp->msgh_local_port = inp->msgh_remote_port;
     if (reply_type && msgid_trace_replies (msgid))
       {
-       struct traced_info *info;
-       info = rewrite_right (&inp->msgh_local_port, &reply_type);
-       assert (info);
-       if (info->name == 0)
-         {
-           if (msgid == 0)
-             asprintf (&info->name, "reply(%u:%u)",
-                       (unsigned int) info->pi.port_right,
-                       (unsigned int) inp->msgh_id);
-           else
-             asprintf (&info->name, "reply(%u:%s)",
-                       (unsigned int) info->pi.port_right, msgid->name);
-         }
-       if (info->type == MACH_MSG_TYPE_MOVE_SEND_ONCE)
+       struct traced_info *reply_pi;

Another gratuitous rename I believe?...
It is. I was trying to debug this part of code. I could see the code clearer by renaming the variable here.
+       /* It must be send-once right. */
+       reply_pi = rewrite_right (&inp->msgh_local_port, &reply_type, NULL);
+       /* The reply port might be dead, e.g., the traced task has died.
+        * If so, we don't need to trace the reply message. */
+       if (reply_pi)
          {
-           info->u.send_once.sent_to = info->pi.port_right;
-           info->u.send_once.sent_msgid = inp->msgh_id;
+           assert (inp->msgh_local_port);
+           if (reply_pi->name == 0)
+             {
+               if (msgid == 0)
+                 asprintf (&reply_pi->name, "reply(%u:%u)",
+                           (unsigned int) reply_pi->pi.port_right,
+                           (unsigned int) inp->msgh_id);
+               else
+                 asprintf (&reply_pi->name, "reply(%u:%s)",
+                           (unsigned int) reply_pi->pi.port_right, 
msgid->name);
+             }
+           if (reply_pi->type == MACH_MSG_TYPE_MOVE_SEND_ONCE)
+             {
+               reply_pi->u.send_once.sent_to = reply_pi->pi.port_right;
+               reply_pi->u.send_once.sent_msgid = inp->msgh_id;
+             }
          }
       }

-    inp->msgh_remote_port = info->forward;
+    if (info->type == MACH_MSG_TYPE_MOVE_SEND_ONCE)
+      inp->msgh_remote_port = info->forward;
+    else
+      {
+       assert (info->receive_right);
+       inp->msgh_remote_port = info->receive_right->forward;
+      }
     if (this_type == MACH_MSG_TYPE_MOVE_SEND_ONCE)
       {
        /* We have a message to forward for a send-once wrapper object.
@@ -744,6 +1301,7 @@ trace_and_forward (mach_msg_header_t *inp,
mach_msg_header_t *outp)
           we are consuming its `forward' right in the message we send.  */
        free (info->name);
        info->name = 0;
+       info->forward = 0;

Is this related to tracing multiple tasks?
It might be unnecessary. But I just wanted to reset it to avoid some possible error.
Yes, you can think it as cleanup or something like that.
        info->u.nextfree = freelist;
        freelist = info;
       }
@@ -762,22 +1320,57 @@ trace_and_forward (mach_msg_header_t *inp,
mach_msg_header_t *outp)
          && info->type == MACH_MSG_TYPE_MOVE_SEND_ONCE
          && inp->msgh_size >= sizeof (mig_reply_header_t)
          && (*(int *) &((mig_reply_header_t *) inp)->RetCodeType
-             == *(int *)&RetCodeType))
+             == *(int *)&RetCodeType)
+         /* The notification message is considered as a request. */
+         && (inp->msgh_id > 72 || inp->msgh_id < 64))
        {
+         struct req_info *req = remove_request (inp->msgh_id - 100,
+                                                inp->msgh_remote_port);
+         assert (req);
+         req->valid = FALSE;
          /* This sure looks like an RPC reply message.  */
          mig_reply_header_t *rh = (void *) inp;
          print_reply_header (info, rh);
          putc (' ', ostream);
-         print_contents (&rh->Head, rh + 1);
+         fflush (ostream);
+         print_contents (&rh->Head, rh + 1, req);
          putc ('\n', ostream);
+         free (req);
        }
       else
        {
+         task_t to = 0;
+         struct req_info *req;
+
          /* Print something about the message header.  */
          print_request_header (info, inp);
-         print_contents (inp, inp + 1);
+         fflush (ostream);
+         /* If it's a nofication message,
+          * I hope there is no port right in the message body. */

"hope"?...

Isn't this comment redundant anyways with the one below?
You are right.
I wrote the comment above first, and then realized that notification message can have port right but forgot to change the comment.
+         if (inp->msgh_id <= 72 && inp->msgh_id >= 64)
+           {
+             assert (info->type == MACH_MSG_TYPE_MOVE_SEND_ONCE);
+             /* mach_notify_port_destroyed message has a port,
+              * TODO how do I handle it? */
+             assert (inp->msgh_id != 69);
+           }
+         /* if it's mach_port RPC,
+          * the port rights in the message will be moved to the target task. */
+         else if (inp->msgh_id >= 3200 && inp->msgh_id <= 3218)
+           to = info->receive_right->forward;
+         else
+           to = info->receive_right->task;
+         req = add_request (inp->msgh_id, reply_port, info->task, to);
+         print_contents (inp, inp + 1, req);
          if (inp->msgh_local_port == MACH_PORT_NULL) /* simpleroutine */
-           fprintf (ostream, ");\n");
+           {
+             /* If it's a simpleroutine,
+              * we don't need the request information any more. */
+             req = remove_request (inp->msgh_id, reply_port);
+             assert (req);
+             free (req);
+             fprintf (ostream, ");\n");
+           }
          else
            /* Leave a partial line that will be finished later.  */
            fprintf (ostream, ")");
@@ -1014,8 +1607,6 @@ print_data (mach_msg_type_name_t type,
 
 /*** Main program and child startup ***/

-task_t traced_task;
-

 /* Run a child and have it do more or else `execvpe (argv, envp);'.  */
 pid_t
@@ -1024,7 +1615,9 @@ traced_spawn (char **argv, char **envp)
   error_t err;
   pid_t pid;
   mach_port_t task_wrapper;
+  task_t traced_task;
   struct traced_info *ti;
+  struct traced_info *receive_ti;
   file_t file = file_name_path_lookup (argv[0], getenv ("PATH"),
                                       O_EXEC, 0, 0);

@@ -1038,6 +1631,7 @@ traced_spawn (char **argv, char **envp)
                     0, &traced_task);
   assert_perror (err);

+  add_task (traced_task);
   /* Declare the new task to be our child.  This is what a fork does.  */
   err = proc_child (getproc (), traced_task);
   if (err)
@@ -1046,9 +1640,12 @@ traced_spawn (char **argv, char **envp)
   if (pid < 0)
     error (2, errno, "task2pid");

+  receive_ti = new_receive_wrapper (traced_task, UNKNOWN_TASK);
   /* Create a trace wrapper for the task port.  */
-  ti = new_send_wrapper (traced_task, &task_wrapper);/* consumes ref */
-  asprintf (&ti->name, "task%d", (int) pid);
+  ti = new_send_wrapper (receive_ti, traced_task, &task_wrapper);

Why did you drop the original comment from this line?...
What comment is dropped?
+  ti->task = traced_task;
+  free (ti->name);
+  asprintf (&ti->name, "pid%d(pid%d)", (int) pid, (int) pid);

   /* Replace the task's kernel port with the wrapper.  When this task calls
      `mach_task_self ()', it will get our wrapper send right instead of its
@@ -1124,6 +1721,7 @@ main (int argc, char **argv, char **envp)
   bool nostdinc = FALSE;
   const char *outfile = 0;
   char **cmd_argv = 0;
+  error_t err;

   /* Parse our options...  */
   error_t parse_opt (int key, char *arg, struct argp_state *state)
@@ -1196,7 +1794,11 @@ main (int argc, char **argv, char **envp)
   setlinebuf (ostream);

   traced_bucket = ports_create_bucket ();
-  traced_class = ports_create_class (0, &traced_dropweak);
+  traced_class = ports_create_class (&traced_clean, &traced_dropweak);
+  other_class = ports_create_class (0, 0);
+  err = ports_create_port (other_class, traced_bucket,
+                          sizeof (*notify_pi), &notify_pi);
+  assert_perror (err);

   hurd_ihash_set_cleanup (&msgid_ihash, msgid_ihash_cleanup, 0);

@@ -1222,6 +1824,8 @@ main (int argc, char **argv, char **envp)
     else
       fprintf (ostream, "Child %d %s\n", pid, strsignal (WTERMSIG (status)));
   }
+
+  ports_destroy_right (notify_pi);

   return 0;
 }

BTW, have you checked that no changes are required to the
documentation?...
The document in where?
The only document I can find is http://www.gnu.org/software/hurd/hurd/debugging/rpctrace.html, which actually isn't document at all.
and I didn't touch its options.

It is a very big patch. It took me almost the whole afternoon to reply it.
But it seems impossible to split it because the modifications are closely related.

The most important comment as far as I remember (I have forgotten many of them:-) is separating the traced_info structure.
But more modification is needed in order to separate it.
The trouble is that receive "wrapper" doesn't need port_info but send wrapper does, and port_info has to be at the beginning of the send wrapper and send_once wrapper.
Maybe I can define a common structure traced_info, which is like:
struct traced_info
{
   struct port_info pi;
   char *name;
   mach_msg_type_name_t type;
};

receive_info, send_info and send_once_info each define their own specific fields and contain traced_info at the beginning of the structure. The only weird thing is port_info won't be used by receive_info, but I guess it is acceptable to reduce the code changing.

Zheng Da




reply via email to

[Prev in Thread] Current Thread [Next in Thread]