Bug 380716 - FIXED: warning messages and missings when using mclapply()
Summary: FIXED: warning messages and missings when using mclapply()
Status: RESOLVED FIXED
Alias: None
Product: rkward
Classification: Applications
Component: general
Version: unspecified
Platform: unspecified
OS: All
Importance: NOR normal
Target Milestone: ---
Assignee: RKWard Team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-25 10:02 UTC by RKWard Team
Modified: 2011-10-24 07:14 UTC

See Also:
Latest Commit:
Version Fixed In: 0.5.7


Description RKWard Team 2009-09-25 10:02:19 UTC
-- Originally posted by (AT sourceforge.net): *anonymous --

-- This ticket was imported from http://sourceforge.net/p/rkward/bugs/58 on 2017-05-30 15:26:44 +0100 --
There appears to be a memory leak in RKWard when using the function mclapply() from the multicore package. This eventually crashes RKWard and the session. The mclapply() function does not leak memory outside of RKWard. mclapply is a parallel version of lapply for running computations on multiple CPU cores.
Ubuntu 9.04 'Jaunty'
rkward version 0.5.0d
KDE version 4.2.2
R version 2.9.2
multicore 0.1-3
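
For readers unfamiliar with the function, a minimal sketch of the lapply/mclapply equivalence (an illustrative snippet, assuming the multicore package on a POSIX system):

library (multicore)

# lapply runs sequentially in the main process; mclapply fork()s worker
# processes and collects their results into the same list shape
square <- function (i) i^2
serial <- lapply (1:4, square)
forked <- mclapply (1:4, square, mc.cores=2)
stopifnot (identical (serial, forked))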
Comment 1 Thomas Friedrichsmeier 2009-09-25 13:45:46 UTC
Thanks for reporting. I hope I can still reach you, as it would be nice if you could provide some more info / do some more testing.

I was not able to reproduce this exactly using the current SVN version (but read on, below). Can you provide a reproducible example script to trigger the crash?

You say this is a memory leak. Do you have any specific evidence for that?

Read on below only if you're curious.

-----

**What I can reproduce, and some thoughts on that**

The following script will hang after a few iterations in RKWard (not R), on a single-core machine. gcinfo() does not show growing memory consumption; simply nothing happens after a while, with the main rkward.bin process hogging most of the CPU.

library (multicore)

gcinfo (TRUE)
print (gc ())

for (iteration in 1:100) {
	print (iteration)
	mclapply (1:1000, rnorm, mc.cores=2)
}

print (gc ())
gcinfo (FALSE)


One symptom around this is that a number of not-quite-dead processes are left behind. Those are mostly sleeping, however. The backtrace inside those processes is:

#0  0xb8070430 in __kernel_vsyscall ()
#1  0xb5b58292 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/i686/cmov/libpthread.so.0
#2  0xb5c588b4 in pthread_cond_timedwait () from /lib/i686/cmov/libc.so.6
#3  0xb69caf8e in thread_sleep (ti=0x9a1e7d0) at thread/qthread_unix.cpp:297
#4  0xb69cb0bb in QThread::msleep (msecs=<value optimized out>) at thread/qthread_unix.cpp:323
#5  0x0813abe9 in RThread::handleStandardCallback (this=0x99da3a0, args=0x9a1e8f4) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:487
#6  0x08148b92 in RReadConsole (prompt=0xb71aef7a "Selection: ", buf=0xb720d200 "", buflen=4096, hist=0)
    at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:202
#7  0xb7141f05 in R_ReadConsole () from /usr/lib/R/lib/libR.so
#8  0xb7071a1c in ?? () from /usr/lib/R/lib/libR.so
#9  <signal handler called>
#10 0xb6bf2046 in QDBusAdaptorConnector::relaySlot (this=0xa8ba170, argv=0xb3c5fefc) at qdbusabstractadaptor.cpp:268
#11 0xb6bf29d8 in QDBusAdaptorConnector::qt_metacall (this=0xa8ba170, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0xb3c5fefc)
    at qdbusabstractadaptor.cpp:364
#12 0xb6ad1b33 in QMetaObject::activate (sender=0xabb4580, from_signal_index=0, to_signal_index=1, argv=0xb3c5fefc) at kernel/qobject.cpp:3112
#13 0xb6ad1f60 in QMetaObject::activate (sender=0xabb4580, m=0x8182540, from_local_signal_index=0, to_local_signal_index=1, argv=0xb3c5fefc)
    at kernel/qobject.cpp:3206
#14 0xb6ad1feb in QObject::destroyed (this=0xabb4580, _t1=0xabb4580) at .moc/release-shared/moc_qobject.cpp:143
#15 0xb6ad2df9 in ~QObject (this=0xabb4580, __in_chrg=<value optimized out>) at kernel/qobject.cpp:757
#16 0xb6d9de3d in ~Scheduler (this=0xabb4580, __in_chrg=<value optimized out>) at ../../kio/kio/scheduler.cpp:259
#17 0xb6da1131 in ~SchedulerPrivate () at ../../kio/kio/scheduler.cpp:102
#18 destroy () at ../../kio/kio/scheduler.cpp:209
#19 0xb6cd68db in ~KCleanUpGlobalStatic (this=0xb6e9f1f4, __in_chrg=<value optimized out>) at ../../kdecore/kernel/kglobal.h:62
#20 0xb5b98589 in exit () from /lib/i686/cmov/libc.so.6
#21 0xb28d9e9e in mc_exit (sRes=0xa190538) at fork.c:492
#22 0xb701c689 in ?? () from /usr/lib/R/lib/libR.so
#23 0xb7045b6a in Rf_eval () from /usr/lib/R/lib/libR.so
[...]
#48 0xb7045895 in Rf_eval () from /usr/lib/R/lib/libR.so
#49 0xb7048628 in ?? () from /usr/lib/R/lib/libR.so
#50 0xb7045895 in Rf_eval () from /usr/lib/R/lib/libR.so
#51 0xb7072b33 in R_ReplDLLdo1 () from /usr/lib/R/lib/libR.so
#52 0x08142aed in runUserCommandInternal () at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:950
#53 0xb6ff2f19 in R_ToplevelExec () from /usr/lib/R/lib/libR.so
#54 0x0814406a in REmbedInternal::runCommandInternal (this=0x99da3a8, command_qstring=..., Rf_error=0xb3c622d8, print_result=true)
    at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:1007
#55 0x0813c616 in RThread::doCommand (this=0x99da3a0, command=0xaad53c8) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:189
#56 0x0813d755 in RThread::run (this=0x99da3a0) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:125
#57 0xb69cb582 in QThreadPrivate::start (arg=0x99da3a0) at thread/qthread_unix.cpp:188
#58 0xb5b544b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#59 0xb5c49a5e in clone () from /lib/i686/cmov/libc.so.6

So what we appear to have here is that the main thread has already exited in mc_exit(), and the R thread is somehow left waiting (for a response from the main thread; likely it got a SIGSEGV, though, and is asking for info on that). I'm not sure whether these zombie processes are really the problem, though, or just a symptom.

Are Qt / GUI programs expected to be fork()able at all?
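
As a quick illustration of that forking, mclapply's workers are fork()ed copies of the calling process, each with its own PID. A minimal sketch (assuming the multicore package on a POSIX system; the PIDs will of course vary):

library (multicore)

# each worker is a fork()ed child of the calling process:
# it reports its own PID, distinct from the parent's
parent <- Sys.getpid ()
workers <- unlist (mclapply (1:2, function (i) Sys.getpid (), mc.cores=2))
print (parent)
print (workers)   # child PIDs, different from the parent's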
Comment 2 Thomas Friedrichsmeier 2009-09-25 13:45:49 UTC
- **summary**: Memory leak when using mclapply() --> FEEDBACK: Memory leak when using mclapply()
Comment 3 RKWard Team 2009-09-27 01:44:44 UTC
-- Originally posted by (AT sourceforge.net): nobody --
Hi, thanks for the feedback. The only evidence I have for a memory leak was through my system monitor (conky). I have an Intel dual-core machine. I ran your script both in RKWard and in R in a terminal window. There was no problem running the script in a terminal window. Curiously, when I ran your script in RKWard, gcinfo() did not report growing memory, but RKWard crashed after about 30-odd iterations. However, monitoring conky at the same time showed rkward.bin pegged at 100% on both CPUs and the RAM usage climbing rapidly, reaching 100% before RKWard crashed.

I assume the problem is caused by forking rkward.bin. I really don't have much more insight to offer, unfortunately.
Comment 4 Thomas Friedrichsmeier 2009-09-28 10:10:29 UTC
Ok, so we see slightly different symptoms. This may be due to hardware differences, or to changes in RKWard since 0.5.0d.

I am fairly certain that the cause really is the forking, though. More precisely, what appears to happen is that mclapply forks the thread that R is running in. In the case of RKWard, the R thread assumes it can talk to the main thread of RKWard. That simply can't work.

Unfortunately, this means that this is not easy to fix. For a proper fix we'd need code to detect the special situation of having been forked off, and more code to deal with that properly. I'd say that is probably more trouble than it's worth to cover this special case.

This does not mean the bug is never going to be fixed. In the somewhat longer term I plan to separate the "main" and "R" threads into two different processes, and the R process in that setup would be much more similar to a vanilla R instance in several ways. This will hopefully fix a number of other bugs, and this one should then be much easier to resolve as well. But again, this will take some time. Until then, you can set options("mc.cores"=1) to at least be able to develop the code in RKWard. To actually run it on multiple cores, use a plain R session for the time being.
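
A minimal sketch of that workaround (assuming an mclapply() that honours the mc.cores option, as parallel::mclapply() does in R >= 2.14.0):

library (parallel)

# with mc.cores=1, mclapply falls back to a plain sequential lapply,
# so no fork()ed children can get entangled with the frontend
options ("mc.cores"=1)
x <- mclapply (1:100, rnorm)
stopifnot (sapply (x, length) == 1:100)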
Comment 5 Thomas Friedrichsmeier 2009-09-28 10:10:31 UTC
- **summary**: FEEDBACK: Memory leak when using mclapply() --> CRASH / HANG when using mclapply()
Comment 6 Thomas Friedrichsmeier 2010-11-19 18:20:19 UTC
In the current development version, frontend and backend run in separate processes. This improves the situation a bit, but does not entirely fix it. There are no more crashes / hangs, but warnings reading "select: Interrupted during system call" start to appear.

Occasionally, child processes appear to fail (run the following a number of times):

library (multicore)
for (iteration in 1:100) {
	print (iteration)
	x <- mclapply (1:100, rnorm, mc.cores=2)
	y <- sapply (x, length)
	stopifnot (y == 1:100)
}

So far I could not figure out just which signal interrupts a select (probably in the transmitter thread), or how to block it from doing so.
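
Until that is understood, a user-side mitigation could look roughly like the sketch below (a hypothetical robust_mclapply helper, not part of any fix; it assumes results from failed children show up as NULL entries, which the stopifnot() test above would catch):

library (multicore)

# hypothetical helper: retry mclapply while some children's results are missing
robust_mclapply <- function (X, FUN, ..., retries=3) {
	for (attempt in seq_len (retries)) {
		res <- suppressWarnings (mclapply (X, FUN, ...))
		if (!any (sapply (res, is.null))) return (res)
	}
	stop ("incomplete mclapply result after ", retries, " attempts")
}

x <- robust_mclapply (1:100, rnorm, mc.cores=2)
stopifnot (sapply (x, length) == 1:100)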
Comment 7 Thomas Friedrichsmeier 2010-11-19 18:20:20 UTC
- **summary**: CRASH / HANG when using mclapply() --> warning messages and missings when using mclapply()
- **milestone**: 194503 --> 
Comment 8 Thomas Friedrichsmeier 2011-10-04 11:00:47 UTC
For future reference, here's a console transcript with debug statements turned on. No idea what to make of this so far, though.

-------------------

> options (warn=2)
+ library (parallel)
+ for (i in 1:1000) {
+ x <- mclapply (1:100, rnorm)
+ y <- sapply (x, length)
+ stopifnot (all.equal (1:100, y))
+ }
parent registers new child 32076
child process 32076 started
parent registers new child 32077
select_children: maxfd=14, wlen=2, wcount=2, zombies=0, timeout=1:0
child process 32077 started
child 32076: send_master (20422 bytes)
sr = 1
- read select 1 children: 32076
read_child_ci(32076) - read length returned 4
read_child_ci(32076) - read 20422 at 0 returned 20422
select_children: maxfd=14, wlen=2, wcount=2, zombies=0, timeout=1:0
child 32076: mcexit called
sr = 1
- read select 1 children: 32076
read_child_ci(32076) - read length returned 4
removing child 32076
select_children: maxfd=14, wlen=2, wcount=1, zombies=0, timeout=1:0
child process 32076 got SIGUSR1; child_exit_status=-1
child 32076: exiting
sr = -1
child 32077: send_master (20822 bytes)
child 32077: mcexit called
child 32077 is waiting for permission to exit
Error in selectChildren(ac, 1) :
(converted from warning) error 'Interrupted system call' in select
Calls: mclapply -> selectChildren
Error in selectChildren(ac, 1) :
(converted from warning) error 'Interrupted system call' in select
Calls: mclapply -> selectChildren
select_children: maxfd=14, wlen=1, wcount=1, zombies=0, timeout=0:0
sr = 1
- read select 1 children: 32077
read_child_ci(32077) - read length returned 4
read_child_ci(32077) - read 20822 at 0 returned 20822
select_children: maxfd=14, wlen=1, wcount=1, zombies=0, timeout=0:500000
sr = 1
- read select 1 children: 32077
read_child_ci(32077) - read length returned 4
removing child 32077
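
One reading aid for the transcript: the "(converted from warning)" errors above are an artifact of the options (warn=2) setting at the top, which promotes every warning to an error. A minimal demonstration:

# with warn=2, anything that would normally be a warning is raised
# as an error prefixed with "(converted from warning)"
options (warn=2)
r <- tryCatch (as.numeric ("not a number"),
               error=function (e) conditionMessage (e))
print (r)   # "(converted from warning) NAs introduced by coercion"
options (warn=0)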
Comment 9 Thomas Friedrichsmeier 2011-10-24 07:14:12 UTC
Fixed in version 0.5.7.
Comment 10 Thomas Friedrichsmeier 2011-10-24 07:14:12 UTC
- **assigned_to**: nobody --> tfry
- **summary**: warning messages and missings when using mclapply() --> FIXED: warning messages and missings when using mclapply()
- **status**: open --> closed