Junling Ma, on Sun, 02 Aug 2020 15:32:01 -0700, wrote: 1. Change deliver_user_intr to be a Linux interrupt handler.
Right, this makes sense, but did you actually try it? Check the existing code twice to see why it's done that way. Here linux_intr is looking through the list of actions for an irq. You are making action->handler() call free_irq itself when the handler should be removed. But then linux_intr's loop would be left with a dangling action pointer, since free_irq frees it.
Right, I realized that, and moved the free_irq to intr_thread. I will clean up the patch.
I agree that we should probably remove some of the "user_intr" pieces of linux_intr(). But we cannot afford removing the part that checks the value returned by the handler, otherwise it'll break.
If we move the cleaning-up-dead-notification code to intr_thread, then the code should not break.
Ok. 2. Make user_intr_t represent a specific interrupt line, so that notifications queue on each line. 3. Make struct irqdev store a vector of user_intr_t, indexed by interrupt line.
? But then we can't have several userland translators using the same irq? That'll break.
Each user_intr_t has a queue of notifications. Originally, each notification was queued up on the linux side. In this patch we just move them back to the mach side and sort them by interrupt line (i.e., attach to user_intr_t). I am not sure why it would break.
To be honest, I don't know. Your patch changes so many things that I can't even take the time to track what is being changed and how. What I can see easily, however, is:

+typedef struct {
+	queue_head_t notification_queue;
+	unsigned interrupts;	/* Number of interrupts occurred since last run of intr_thread */
+	unsigned n_unacked;	/* Number of times irqs were disabled for this */
+	decl_simple_lock_data(, lock);	/* a lock to protect the queues */
+} user_intr_t;

So the port is gone indeed, which tends to say that it is now per-intr instead of per-user. But then why call it user_intr_t? Changing the role of an existing structure is really not the way for changes to be reviewable.
user_intr_t to me means a user interrupt. A notification port that is registered represents a notification. Multiple notifications may queue on a single irq. Now, in that struct there is n_unacked. But how can this be per-irq? As I mentioned elsewhere and as is kept here, it is the number of times irqs were disabled for this. But that's for a given *user*, not for a given irq. If it's not per-user, when a user dies you don't know how many times you need to call __enable_irq.
Ok, I can see that in my patch n_unacked duplicates the counter used by __enable_irq. I was distracted by the use of a counter at all in the original code, because an irq cannot be fired more than once before it is acked. What we really need is a boolean_t per notification.
In fact, I was pondering the idea that each irq* device should be opened exclusively.
Then things won't work. Shared IRQ is a thing you can't circumvent with the hardware that people have atm.
You may have missed the part where I say let the user-side [/dev/irq*] translator do the multiplexing. And when MSI/MSI-X is introduced, irq sharing will no longer be needed. I am afraid that an interrupt going through two indirect calls would be too costly and hurt driver performance.
Interrupts *are* inherently costly. A couple of software layers are really not much compared to that. Drivers are already used to having to coalesce notifications so as to spare interrupts.
Ok. 4. The n_unacked counter increases only when an interrupt was successfully delivered.
? It is meant to know how many times __disable_irq was called, and thus how many times we should call __enable_irq if the port dies.
__disable_irq already uses its own counter.
Yes, and the way you do it just duplicates it, that's nonsense. Either it's the same, or it's not. As mentioned above, it's not.
Yes. See the above discussion. The original way of counting means that the interrupt will unfortunately be blocked forever if a message somehow fails to be delivered.
There is no such thing. Either the message gets delivered, or it does not. It can take some time to get delivered. Then other translators using the same IRQ will suffer from the situation. But that's what we *have* to do. Until the userland translator either acknowledges the irq or dies, we *have* to keep the irq disabled (and know that we have done so for the death case). We can't go around it anyway, so there is no point in trying to fix it.
The delivery could have returned an error, for example if ikm_alloc returns NULL, as checked in deliver_intr. This patch increases n_unacked only if a notification was delivered.
That would mean that the responsibility of ack-ing interrupts belongs to userland?
We already require that userland ack the interrupt by calling device_intr_ack, right? The kernel side keeps track of who acked and who hasn't. As I understand it, if a handler has not acked, then n_unacked remains positive, and __enable_irq will not be called, so the irq remains disabled for all other handlers attached to the same irq. I am of course happy to produce a new version of the patches. But I also feel that whether we want to introduce this new scheme still needs more discussion.
So far (except for the first part of your first patch) what you have proposed really does not convince me at all as being an actual improvement over what we currently have, on the contrary.
That is what I felt from the IRC discussion yesterday, too. And that is fine if there is no interest in this patch. But still, newcomers may ask the same question I did: why would any device driver need to implement device_intr_register and device_intr_ack, when they are entirely excluded from the picture (only the irq device has a chance of receiving these calls)? Maybe there are other, better solutions than this patch. Maybe it is not a problem at all for others.
Junling