debug MPI programs with many slaves

Although ddt is handy, in some cases you might want to debug the old-fashioned way (e.g., the number of nodes exceeds the license limit, or you do not have ddt).

Instead of opening 30 terminals and attaching gdb to each process manually, the procedure below turns out to be useful:

1) I have a small script called gdbwait in my path. It can be run manually to wait for a process to be launched, or used automatically as in steps 2) and 3). pgrep -o picks the oldest matching process and pgrep -n the newest; I am not sure which one is needed to catch the right pid in general, but with a single process on each node it does not matter.

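#!/bin/sh
# Usage: gdbwait <program-name>
# Wait for a process with the given name to appear, then attach gdb to it.
progstr=$1
progpid=$(pgrep -o "$progstr")
# Poll continuously so that gdb attaches as soon as possible after launch.
while [ -z "$progpid" ]; do
    progpid=$(pgrep -o "$progstr")
done
# Attach to the process and let it keep running under gdb.
gdb -ex continue -p "$progpid"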
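
On the -o versus -n question, a small experiment may help. The sketch below is not part of gdbwait: it starts two sleep processes as stand-ins for two same-named processes on one node, then asks pgrep for the oldest and the newest. The -P and -x options (match only children of this shell, match the name exactly) are used here only to keep unrelated processes out of the result.

```shell
#!/bin/sh
# Launch two same-named processes one second apart (stand-ins for MPI ranks).
sleep 300 &
first=$!
sleep 1
sleep 300 &
second=$!

# -P $$ restricts the match to children of this shell, so unrelated
# "sleep" processes elsewhere on the machine are ignored.
oldest=$(pgrep -o -P $$ -x sleep)
newest=$(pgrep -n -P $$ -x sleep)
echo "first=$first  oldest=$oldest"
echo "second=$second newest=$newest"

# Clean up the stand-in processes.
kill "$first" "$second"
```

If several ranks of the same program run on one node, a single gdbwait with -o would therefore attach only to the one that started first.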

2) Suppose the program to be debugged is called testme. Put the lines below in a script file and run it. Each line starts gdbwait on a node inside a detached screen session, so gdb waits remotely without occupying your workspace or terminals.

rsh slave002 screen -d -m gdbwait testme
rsh slave003 screen -d -m gdbwait testme
rsh slave004 screen -d -m gdbwait testme
rsh slave005 screen -d -m gdbwait testme
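
With 30 nodes, writing those lines by hand gets tedious, so they can be generated in a loop. The following is only a sketch: launch_gdbwait and DRY_RUN are hypothetical names made up for this example, and it assumes GNU seq is available and that the nodes follow the slaveNNN naming used above. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Print (or run) one "rsh ... gdbwait" line per node.
# launch_gdbwait and DRY_RUN are hypothetical names used only in this sketch.
launch_gdbwait() {
    prog=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command that would be executed.
            echo "rsh $host screen -d -m gdbwait $prog"
        else
            rsh "$host" screen -d -m gdbwait "$prog"
        fi
    done
}

# seq -f zero-pads the node number: slave002 ... slave005.
DRY_RUN=1 launch_gdbwait testme $(seq -f 'slave%03g' 2 5)
```

The dry run makes it easy to check the host list before anything is started on the cluster.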

3) Wait a while, then launch the MPI program. To examine the gdb session on slave002, use:

ssh -t slave002 screen -r
[NOTE: to detach, press these two keystrokes in sequence: Ctrl-A, then d]
Google “gnu screen” for the usage of the screen utility.
