Discussion:
external events api
Roderic Morris
2011-07-25 00:04:24 UTC
Hey everyone,

While looking into the deadlock detection issue with a friend, we noticed the external events API in the VM and the rts/external-events package. It seems that the deadlock check, when it finds every thread blocked, also checks whether any external events are being waited on. Why don't we change the posix package's wait implementation (and anything else appropriate in there) to use that mechanism?
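
To make that check concrete, here is a toy model in C. It is not the Scheme 48 VM code; the `toy_state` enum and `toy_is_deadlocked` are names made up for this sketch, and it only assumes the behaviour described above: a deadlock is reported only when every thread is blocked and none of them is waiting on an external event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread states, for this sketch only. */
enum toy_state { TOY_RUNNABLE, TOY_BLOCKED, TOY_WAITING_EXTERNAL };

/* Returns true when no thread can make progress: nothing is runnable
   and nothing is waiting on an external event that C code might note. */
bool toy_is_deadlocked(const enum toy_state *threads, size_t n)
{
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == TOY_RUNNABLE)
            return false;   /* someone can still run */
        if (threads[i] == TOY_WAITING_EXTERNAL)
            return false;   /* an external event may still wake someone */
    }
    return true;
}
```

In this model a lone thread blocked on a placeholder counts as a deadlock, while the same thread waiting on an external event does not.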

If that sounds like a good idea, then I just want to ask if anyone knows the right way to use it. It's not documented as far as I see, but looking at its usage in the address package (see c/net/address.c and scheme/net/address.scm), it seems that we need to first create an external event uid with s48_external_event_uid(), call wait-for-external-event on it, then use s48_note_external_event() in whatever thread the event is completed on, and finally release the uid when the original thread wakes up with s48_unregister_external_event_uid(). If that's right, then I wonder if there are equivalents to those s48_* C functions in some Scheme package? It looks like it'd be more straightforward to just create and unregister the uids in Scheme.
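
The four steps can be sketched as a small single-threaded simulation in C. This does not call the real s48_* functions, whose exact signatures are not shown in this thread; every `toy_` name below is hypothetical, and the blocking wait is replaced by a polling check.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_UIDS 8

/* Toy registry standing in for the VM's bookkeeping (hypothetical). */
static bool toy_in_use[TOY_MAX_UIDS];
static bool toy_noted[TOY_MAX_UIDS];

/* Step 1: allocate a fresh uid, or return -1 if none is free. */
long toy_event_uid(void)
{
    for (long uid = 0; uid < TOY_MAX_UIDS; uid++) {
        if (!toy_in_use[uid]) {
            toy_in_use[uid] = true;
            toy_noted[uid] = false;
            return uid;
        }
    }
    return -1;
}

/* Step 2 (polling stand-in for the blocking wait): has the event been noted? */
bool toy_event_ready(long uid)
{
    assert(uid >= 0 && uid < TOY_MAX_UIDS && toy_in_use[uid]);
    return toy_noted[uid];
}

/* Step 3: record that the event has happened. */
void toy_note_event(long uid)
{
    assert(uid >= 0 && uid < TOY_MAX_UIDS && toy_in_use[uid]);
    toy_noted[uid] = true;
}

/* Step 4: release the uid so it can be handed out again. */
void toy_unregister_event_uid(long uid)
{
    assert(uid >= 0 && uid < TOY_MAX_UIDS && toy_in_use[uid]);
    toy_in_use[uid] = false;
    toy_noted[uid] = false;
}
```

The ordering is the point of the sketch: the uid is unregistered only after the waiter has seen the note, never before.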

-Roderic
Marcus Crestani
2011-07-25 05:49:01 UTC
RM> It's not documented as far as I see,

There is a section "External events" in the development version of the
documentation that describes how to use external events, see
doc/src/external.tex.

RM> but looking at its usage in the address package (see c/net/address.c
RM> and scheme/net/address.scm), it seems that we need to first create
RM> an external event uuid with s48_external_event_uid(), call
RM> wait-for-external-event on it, then use s48_note_external_event() in
RM> whatever thread the event is completed on, and finally release the
RM> uid when the original thread wakes up with
RM> s48_unregister_external_event_uid().

That sounds correct.

RM> If that's right, then I wonder if there are equivalents to those
RM> s48_* c functions in some scheme package? It looks like it'd be more
RM> straight forward to just create and unregister the uids in scheme.

Right, since the uids have to be shared between Scheme and C anyway,
they could also be created in Scheme and then exported to C. This
functionality is currently only in the VM, though.
--
Marcus
Roderic Morris
2011-08-03 02:40:52 UTC
Post by Marcus Crestani
There is a section "External events" in the development version of the
documentation that describes how to use external events, see
doc/src/external.tex.

Ahh, thanks, I was looking in 1.8's manual.

Post by Marcus Crestani
Right, since the uids have to be shared between Scheme and C anyway,
they could also be created in Scheme and then exported to C. This
functionality is currently only in the VM, though.

Ok. I just created some C functions that each do only one of those actions (note, create, or unregister) to make experimenting with this easier.

Unfortunately, I've run into what look like bugs in this API. I've tried a few approaches to get wait-for-child-process to use wait-for-external-event. Most have worked in the simple and most common case of only one thread waiting on a particular process id, but all have failed for some reason in the multiple-thread case.

In the first one, I tried noting and unregistering a process id's external event uid as soon as the SIGCHLD came in. That didn't work, and it's reasonable that the external events system would require you to unregister only after all waiting threads are awake.

In the next, I tried having all threads waiting on a given process id wait on one uid that was a field of the process id. I expected that they'd all be woken up when the uid was noted, but only the first thread to wait did. I didn't think that was right, but I tried to get around it, and at least get wait-for-child-process working.

For the next implementation, I tried giving each process id a queue of uids, to which a uid is added whenever a thread waits on that process id. All the uids in the queue are noted when the SIGCHLD comes in, and each thread unregisters the uid it added to the queue. In this case too, only the first thread that waited is woken. Oddly, if I start multiple threads waiting on *different* process ids, they are all woken if the SIGCHLDs come in at sufficiently long intervals. If the SIGCHLDs come in too close together, it has the same effect as waiting for one process id with multiple threads (some aren't woken up). It seems that this api has trouble with notes happening in quick succession.

Does anyone have any insight as to what's going on? I've attached a patch for the last implementation and example code that demonstrates the problem.

-Roderic
Marcus Crestani
2011-08-03 17:59:22 UTC
RM> It seems that this api has trouble with notes happening in quick
RM> succession.

Yes, it is not guaranteed that every `s48_note_external_event' gets
handled immediately. That's why external code has to collect multiple
external events and provide a mechanism for Scheme to obtain *all*
occurred events that have not yet been obtained before.

I took a quick look at your patch and I think you should try the
following:

- In `process-terminated-children' you have to collect all terminated
children, e.g. put them in a list.

- In `really-wait-for-child-process' you have to obtain that list of
terminated children after `wait-for-external-event' returns. Then,
process every terminated child by unblocking previously called
`wait-for-child-process', for example via the placeholder in
process-id.

This way you should not lose any terminated child.
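
A minimal C sketch of that collect-then-drain idea, under stated assumptions: all `toy_` names are hypothetical, the note is modelled as a single flag so that several notes collapse into one wake-up, and signal-safety concerns are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 16

/* Children that have terminated but have not been handed to Scheme yet. */
static int toy_pending[TOY_MAX_PENDING];
static size_t toy_pending_count;

/* Set by a note; several notes in quick succession leave it at 1. */
static int toy_event_flag;

/* Called when a child has terminated: remember it, then note the event.
   Returns 0 on success, -1 if the pending list is full. */
int toy_child_terminated(int pid)
{
    if (toy_pending_count == TOY_MAX_PENDING)
        return -1;
    toy_pending[toy_pending_count++] = pid;
    toy_event_flag = 1;
    return 0;
}

/* Called after the waiter wakes up: copy out up to cap pending pids,
   keep any that did not fit, and return the number copied. */
size_t toy_take_terminated(int *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t n = toy_pending_count < cap ? toy_pending_count : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = toy_pending[i];
    for (size_t i = n; i < toy_pending_count; i++)
        toy_pending[i - n] = toy_pending[i];
    toy_pending_count -= n;
    if (toy_pending_count == 0)
        toy_event_flag = 0;   /* nothing left to report */
    return n;
}
```

Because the waiter drains the whole list on each wake-up, two children terminating before a single wake-up are both reported even though only one wake-up happens.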
--
Marcus
Roderic Morris
2011-08-11 03:36:58 UTC
On Aug 3, 2011, at 1:59 PM, Marcus Crestani wrote:

Should've replied to this sooner, but I've been busy. I think we have a misunderstanding.
Post by Marcus Crestani
RM> It seems that this api has trouble with notes happening in quick
RM> succession.
Yes, it is not guaranteed that every `s48_note_external_event' gets
handled immediately. That's why external code has to collect multiple
external events and provide a mechanism for Scheme to obtain *all*
occurred events that have not yet been obtained before.

I understand that. The problem is that some of those calls are *never* handled. I've left the process idling for quite a while without them being woken. Moreover, after the first time a thread fails to wake up, all subsequent calls to s48_note_external_event have no effect, even on brand new event ids.

Post by Marcus Crestani
I took a quick look at your patch and I think you should try the
following:
- In `process-terminated-children' you have to collect all terminated
children, e.g. put them in a list.
- In `really-wait-for-child-process' you have to obtain that list of
terminated children after `wait-for-external-event' returns. Then,
process every terminated child by unblocking previously called
`wait-for-child-process', for example via the placeholder in
process-id.
This way you should not lose any terminated child.

This is basically going back to what was there before. The reason I'm changing the code to not block on a placeholder is the deadlock detection. If I block on a placeholder with only one thread running, it'll detect a deadlock. I want to use the wait-for-external-event stuff so that won't happen. The problem is not missing terminated children. I'm noting everywhere I should, but the notes are having no effect in certain situations (as the last email said).

-Roderic
Michael Sperber
2011-08-13 14:01:53 UTC
Post by Roderic Morris
Should've replied to this sooner, but I've been busy. I think we have a misunderstanding.
Post by Marcus Crestani
RM> It seems that this api has trouble with notes happening in quick
RM> succession.
Yes, it is not guaranteed that every `s48_note_external_event' gets
handled immediately. That's why external code has to collect multiple
external events and provide a mechanism for Scheme to obtain *all*
occurred events that have not yet been obtained before.
I understand that. The problem is that some of those calls are *never*
handled. I've left the process idling for quite a while without them
being woken. Moreover, after the first time a thread fails to wake up,
all subsequent calls to s48_note_external_event have no effect, even
on brand new event ids.
Details, please.
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
Roderic Morris
2011-08-14 01:27:20 UTC
Post by Michael Sperber
Post by Roderic Morris
I understand that. The problem is that some of those calls are *never*
handled. I've left the process idling for quite a while without them
being woken. Moreover, after the first time a thread fails to wake up,
all subsequent calls to s48_note_external_event have no effect, even
on brand new event ids.
Details, please.
I'm not sure what else to say. The patch and example code I gave a couple emails back demonstrate the behavior I'm talking about, if you'd like a clearer illustration. If the waits are done too close together (and thus the calls to s48_note_external_event are performed close together), some of the threads are not woken and sleep indefinitely. Further calls to wait (which end in calls to wait-for-external-event) are never woken as well.

-Roderic
Michael Sperber
2011-08-14 08:38:35 UTC
Post by Roderic Morris
Post by Michael Sperber
Post by Roderic Morris
I understand that. The problem is that some of those calls are *never*
handled. I've left the process idling for quite a while without them
being woken. Moreover, after the first time a thread fails to wake up,
all subsequent calls to s48_note_external_event have no effect, even
on brand new event ids.
Details, please.
I'm not sure what else to say. The patch and example code I gave a
couple emails back demonstrate the behavior I'm talking about, if
you'd like a clearer illustration. If the waits are done too close
together (and thus the calls to s48_note_external_event are performed
close together), some of the threads are not woken and sleep
indefinitely. Further calls to wait (which end in calls to
wait-for-external-event) are never woken as well.
You mean this, right:

http://article.gmane.org/gmane.lisp.scheme.scheme48/2432

? Could you send the C code that goes with it?
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
Roderic Morris
2011-08-14 11:52:07 UTC
Post by Michael Sperber
http://article.gmane.org/gmane.lisp.scheme.scheme48/2432
? Could you send the C code that goes with it?
There's C code in the patch (changes to posix.c). I simply made some scheme bindings that call the s48_*external_event functions directly.

-Roderic
Roderic Morris
2011-08-14 13:00:36 UTC
changes to c/posix/proc.c I meant.

-Roderic
Post by Michael Sperber
http://article.gmane.org/gmane.lisp.scheme.scheme48/2432
?  Could you send the C code that goes with it?
There's C code in the patch (changes to posix.c). I simply made some scheme bindings that call the s48_*external_event  functions directly.
-Roderic
Marcus Crestani
2011-09-05 18:00:24 UTC
RM> Should've replied to this sooner, but I've been busy. I think we
RM> have a misunderstanding.

Now I've been away, so it took me a while to respond. Sorry for the
misunderstanding.

RM> I understand that. The problem is that some of those calls are *never*
RM> handled. I've left the process idling for quite a while without them
RM> being woken. Moreover, after the first time a thread fails to wake up,
RM> all subsequent calls to s48_note_external_event have no effect, even
RM> on brand new event ids.

I've seen some commits you've made in the last weeks, do these commits
improve the situation? (Sorry, no time to test myself, yet.)
--
Marcus
Roderic Morris
2011-09-05 18:55:44 UTC
Post by Marcus Crestani
I've seen some commits you've made in the last weeks, do these commits
improve the situation? (Sorry, no time to test myself, yet.)
Hey, Marcus. No, they don't fix this particular problem. I've tried to debug it myself, but it seems that package (external-events) has some special privileges, and doesn't print debug-messages or allow me to enter it with ,in at runtime. Moreover, I'm not sure if the bug is in that scheme code or the associated vm code. I haven't looked at it too recently.

-Roderic
Roderic Morris
2011-09-29 02:55:36 UTC
So, to resurrect this, would it be alright to push my patch as it is,
even though it doesn't solve the full problem (i.e. doesn't work for
multiple threads waiting on one process in rapid succession)? It would
make things easier for me, as far as getting people working on scsh
(have a few people interested here at Northeastern). Attaching the
version of the patch I'm talking about so you don't have to hunt for
it in your archives.

-Roderic
Post by Roderic Morris
Post by Marcus Crestani
I've seen some commits you've made in the last weeks, do these commits
improve the situation?  (Sorry, no time to test myself, yet.)
Hey, Marcus. No, they don't fix this particular problem. I've tried to debug it myself, but it seems that package (external-events) has some special privileges, and doesn't print debug-messages or allow me to enter it with ,in at runtime. Moreover, I'm not sure if the bug is in that scheme code or the associated vm code. I haven't looked at it too recently.
-Roderic
Michael Sperber
2011-10-03 12:28:56 UTC
Post by Roderic Morris
So, to resurrect this, would it be alright to push my patch as it is,
even though it doesn't solve the full problem (i.e. doesn't work for
multiple threads waiting on one process in rapid succession)? It would
make things easier for me, as far as getting people working on scsh
(have a few people interested here at Northeastern). Attaching the
version of the patch I'm talking about so you don't have to hunt for
it in your archives.
Looks good to me: Go ahead and push it. (And sorry for the
late-as-usual reply.)
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
Michael Sperber
2011-07-25 06:23:55 UTC
Post by Roderic Morris
while looking into the deadlock detection issue with a friend, we
noticed the external events api in the vm and rts/external-events
package. It seems that the code that checks for deadlocks checks if
there are any external events being waited on if every thread is
blocked. Why don't we change the posix package's wait implementation
(and anything else appropriate in there) to use that mechanism?

That seems appropriate.

Post by Roderic Morris
If that sounds like a good idea, then I just want to ask if anyone
knows the right way to use it.

I do :-)

Post by Roderic Morris
It's not documented as far as I see, but looking at its usage in the
address package (see c/net/address.c and scheme/net/address.scm), it
seems that we need to first create an external event uuid with
s48_external_event_uid(), call wait-for-external-event on it, then use
s48_note_external_event() in whatever thread the event is completed
on, and finally release the uid when the original thread wakes up with
s48_unregister_external_event_uid().

That's the case for temporary events like a DNS lookup. The idea was
that for stuff like the GUI event loop, you'd have a permanent uid.

Post by Roderic Morris
If that's right, then I wonder if there are equivalents to those s48_*
c functions in some scheme package? It looks like it'd be more
straight forward to just create and unregister the uids in scheme.

That may be true - I believe I wanted to avoid having to add a VM
extension, or more external functions.
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
Franklyn A. Turbak
2011-08-11 03:45:17 UTC
I will be out of email contact until Mon. Aug. 22. I will respond to your
message then.

- lyn -