Roderic Morris
2011-04-18 22:58:37 UTC
I've come across a bug in wait-for-child-process in the
posix-processes package. If the process with the given pid hasn't died
and is long running, wait-for-child-process will start to allocate a
ridiculous amount of memory. I've had it make pretty powerful machines
unusable.
I looked into it and traced the problem to the C function
posix_waitpid() in c/posix/proc.c. It fails to handle the case where
waitpid() returns 0 (which means that there are children running, but
no statuses are available for them). In the best case, this causes it
to loop until the child process dies, pegging the CPU. Unfortunately,
there's a space leak somewhere inside the loop, so the problem is even
worse and manifests itself in the way I described.
One of the patches I've attached fixes that problem (although it
doesn't address the space leak), but uncovers a few others. First,
process-terminated-children is actually broken in the case where it's
not given an argument, but it finds a process which is being waited
on. Second, wait-for-child-process will never return in the long
running child case, unless some other code has called
make-signal-queue with sigchld as an argument. os-signal-handler isn't
called for sigchld unless that happens. I've attached another patch
for the first, but I'm not sure how to approach the second.
P.S. Is there a way to disable deadlock detection other than the

  (spawn (lambda ()
           ; Sleep for a year
           (sleep (* 1000 60 60 24 365))))

hack from the manual? I've never had it be helpful, and it's
especially annoying when doing any work with subprocesses.
-Roderic