Release 0.84. Three new primary features: support for mvapich2 versions 1.0 through 1.6rcN, InfiniPath MPI support, and Totalview support for MPICH2. This release is well overdue, but what else is new? More details are given below in the Changes section.
Release 0.83. It has been 15 months since the last release. This overdue release has just a few compilation and bug fixes, and a bit of new support. More detail is found below in the Changes section.
Add support for Portals, as implemented in userspace on TCP.
Add support for mvapich 0.9.9 and 1.0 beta. Each of these last two MPICH on IB releases changed the startup protocol.
Force configure-time selection of a default communication device. Now you must configure using "--with-default-comm=mpich2" or similar.
Release 0.82. A few feature additions and many bug fixes, all of which are explained in more detail below in the Changes section.
Support for Intel MPI version 3 extensions to PMI.
Track individual TM node ids, enabling future NUMA-aware task placement.
New command-line switch '-npernode', generalization of '-pernode'.
The -transform-hostname feature now works on mpich2/pmi.
Release 0.81. Many changes, which are documented in more detail below in the Changes section.
Asynchronous GM and IB startup for much improved scalability.
Support new mvapich startup protocol, but recent mvapich versions are still broken.
Support Myrinet's new MX message passing protocol.
Support for MPI_Spawn and other MPI2 process management through PMI interface.
Redirection of standard IO streams for PBSpro systems through helper code.
Mpiexec is a replacement program for the script mpirun, which is part of the mpich package. It is used to initialize a parallel job from within a PBS batch or interactive environment. Mpiexec uses the task manager library of PBS to spawn copies of the executable on the nodes in a PBS allocation.
Reasons to use mpiexec rather than a script (mpirun) or an external daemon (mpd):
rsh
or ssh
once for each process.
Mpiexec handles creation of the node list file, if required by the message passing library, and the shared-memory file for use on SMP nodes. It also redirects standard input and output to the shell from which it was invoked, bypassing the PBS output and error files if you choose. One handy feature in particular is -allstdin which replicates the contents of an input stream to each process. Support for heterogeneous executables and/or different command lines to each task is provided through the use of an optional configuration file.
Mpiexec works on machines of many architectures, including x86, ia64, alpha, sparc, power4, and with many operating systems, including linux, freebsd, solaris, and darwin. It should be easily portable to any other machine which runs PBS.
Most current MPI implementations are supported:
--comm=pmi
to specify this MPI.
--comm=pmi
).
These two related MPI implementations have their own startup programs that work with PBS. While there is vestigial support for LAM, you are encouraged to try the launcher in the distribution of the MPI library first.
Mpiexec is free software and is licensed for use under the GNU General Public License, version 2.
You probably want to download the latest release, but read on if you need to support older versions of your MPI or communications libraries.
If you prefer to upgrade by patch, the following will apply nicely from within your current mpiexec directory using patch -p1 -sNE or similar. Older tarballs are frequently available as tags in the SVN repository.
Release | Size | Patch | Size
---|---|---|---
mpiexec-0.84.tgz | 224 kB | |
mpiexec-0.83.tgz | 203 kB | mpiexec-0.83-0.84.diff.gz | ?? kB
mpiexec-0.82.tgz | 197 kB | mpiexec-0.82-0.83.diff.gz | 57 kB
mpiexec-0.81.tgz | 191 kB | mpiexec-0.81-0.83.diff.gz | 79 kB
An anonymous read-only subversion repository is maintained for mpiexec, too. No guarantees on the quality of the code you'll find in there at any given moment, but it usually tends to work.
You can browse the source code here, and to check out your own copy, this subversion command will create a new directory called mpiexec and populate it with the latest source tree:
svn co http://svn.osc.edu/repos/mpiexec/trunk mpiexec
Please consult the Frequently Asked Questions (FAQ) for answers to commonly asked mpiexec questions.
The latest version of the README included with the distribution is also available.
There is a mailing list for mpiexec.
Archives of the mailing list are available for browsing.
Send mail to the list with comments, questions, and bug fixes. Be sure to send only plain text mails to the list, no pure HTML or multipart text plus HTML, please.
Subscribe to the list using the standard mailman subscription management form.
Posts from subscribers are relayed immediately to all list recipients; however, posts from non-subscribers are moderated to avoid SPAM and thus may be delayed until someone gets around to approving the mail.
A long overdue release.
This release adds support for Totalview with MPICH2 courtesy of Frank Mietke.
Implementation of new startup protocol for QLogic InfiniPath from Christian Bell.
Frank Mietke also supplied changes that include files from the mvapich release version 1.0. These changes are to support the PMI-like startup protocol that was adopted in that version.
Various bug and compile fixes, and updates to documentation/FAQ.
It has been 15 months since the last release. This overdue release has just a few compilation and bug fixes, and a bit of new support.
Add support for Portals, as implemented in userspace on TCP. It is very unlikely anyone will use this, but it is a good recent example of how to support a new communication device in mpiexec. While Portals is in use on some of the biggest machines around, such as the XT3 at ORNL and other sites, those platforms have their own MPI launch mechanism. The support for Portals here is for the TCP implementation that is used mainly in testing Portals codes targeted to the larger machines.
Add support for mvapich 0.9.9 and 1.0 beta. Each of these last two MPICH on IB releases changed the startup protocol. Jan Ploski was instrumental in the first of these, which added an entire extra communication phase with another round of socket close/accept for each compute process. This could be very slow. Support for the expected 1.0 was implemented by Frank Mietke. It is similar but adds some more data transfers and thankfully avoids the second round of socket accepts.
Force configure-time selection of a default communication device. Now you must configure using "--with-default-comm=mpich2" or similar. In previous releases, mpiexec would choose a default of GM. This is no longer likely to be useful, and led to confusion. The mpiexec binary will still support all MPI libraries that it knows about through the --comm argument or MPIEXEC_COMM environment variable, as before. The configure error message hopefully leads one to make a good guess without too much head-scratching.
A few fixes for bugs and compile problems.
There are four interesting feature additions in this release, along with the usual collection of bug fixes.
Intel sells an MPI library that is based on MPICH2 from ANL, and their latest version 3 adds extensions to the PMI startup protocol used by mpiexec and other job launchers to communicate with the MPI tasks. Thanks to documentation from Intel and a patch from Thomas Zeiser, mpiexec works with Intel MPI version 3 as of this release.
As multi-core processors and large SMPs become more prevalent, issues related to the allocation and scheduling of processing tasks and memory regions become important to achieving good performance. Past mpiexec versions did not distinguish one CPU from another on a given node, where node is defined as a single machine in the PBS sense. This release adds code that is careful to track individual per-CPU identifiers as given from PBS. While users will not notice this change, future support in Torque for cpusets or other placement mechanisms will take advantage of this feature.
Also useful for large SMPs is the new command-line switch '-npernode', which is a generalization of '-pernode' that places no more than a given number of tasks on a single node. This idea and patch are also from Thomas.
The -transform-hostname feature now works on mpich2/pmi, thanks to prodding by Brad Settlemyer, meaning you can cause your MPI program to use a different ethernet interface for message passing than the one PBS uses.
Some minor enhancements:
Shell-style comments starting with '#' are now permitted in config files.
The status summary printed by mpiexec when it exits is more careful to distinguish among the possible failure cases, including when tasks were not started due to previous failures. However, it no longer complains when exit statuses were not received from tasks, for example because PBS terminated them for going over the walltime allocation. This was too misleading and apparently difficult to fix in Torque.
Explain more in the runtests.pl test script about hangs caused by buggy MPI versions. Failing to implement MPI_Abort properly (or at all) is a common error.
Numerous fixes for bugs and compile problems.
This release ended up having lots of changes. It was a long 9 months ago when the previous release happened, so perhaps that is not too surprising.
Startup for GM (or MX) and InfiniBand is now asynchronous, meaning that mpiexec will spawn tasks and pay attention to ones that are starting up at the same time. This greatly increases the speed for large systems, and avoids timeouts in newly created clients. The largest reported machine using this work is an 8000-ish processor InfiniBand cluster at Sandia.
Code was added to support modifications to the startup protocol by mvapich, an MPICHv1 on InfiniBand library. However, the latest mvapich version 0.9.7 does not work with mpiexec. See the next news item below.
Support was added for Myrinet's new message passing protocol, MX. The Myricom developers were nice enough to make MX look a lot like GM as far as mpiexec is concerned, so they are both supported in the same code.
Support for MPI2 process management features was added. You can now call MPI_Spawn and have mpiexec add more processes dynamically to your job. This works with the PMI interface used by MPICH v2 from ANL and vendor releases based on that code. Other MPI2 features such as name publishing are supported too.
Some fixes for PBSPro issues were added, to work around both syntactic and semantic changes in the PBSPro version of the TM and PBS interfaces. One nice new feature for people using PBSPro is the redirection helper. It enables the use of stdio redirection without assistance from PBSPro. If you configure with --enable-pbspro-helper, a second binary will be built and installed. Mpiexec launches this code on each compute node; it takes care of connecting the stdio sockets back to mpiexec, then starts your MPI task.
The source code repository is now SVN, not CVS, mainly due to the good support and encouragement of HPC system staff at OSC.
An assortment of little bug fixes, code cleanups, and compiler warning suppressions for various systems were added.
Sufficiently recent versions (>= 2.3.11) of PBS require the included patch to be applied if you want the standard streams handling functionality to work. It's quite handy stuff to be able to redirect your input and output anywhere, not just to the magic PBS hiding place. To do that, you'll need the source to PBS, and will have to recompile. Things do work just fine with a stock version of PBS, but you will have no stream redirection.
All references to PBS on this page refer to "OpenPBS" available from Veridian Information Solutions at the OpenPBS site. In particular, the latest known version with which we're comfortable is 2.3.15, but changes to OpenPBS are so rare and minimal that it is highly likely that the latest mpiexec will work with a newer OpenPBS. You must apply the patch included in mpiexec to be able to use input and output redirection. For the latest versions of everything, this patch is patch/pbs-2.3.12-mpiexec.diff, as described in the README.
PBS has a long history, having been initially developed in the public domain by the United States government at NASA Ames Research Center and Lawrence Livermore National Laboratory. Veridian and others continued development, then Veridian renamed it in the year 2000 to OpenPBS, to differentiate it from their non-free version called "PBSPro". We don't use the latter, but mpiexec might work with that version.
There have been two reports so far that PBSPro does not work correctly with mpiexec, but there are also plenty of success stories.
Note! Recent information suggests that the PBSPro distributions are faulty in that the executable pbs_demux is not part of the "client" RPM (at least on linux). If you have compute nodes with their own file systems, and you find that they do not have pbs_demux installed, try to copy it by hand from the "server" RPM distribution. Thanks to Stefan Parnell for figuring this out. Any more advice, confirmation, or the results of somebody invoking his PBSPro software maintenance contract would be welcome additions here.
Note 2! Complaints from get_hosts like the following:
mpiexec: Warning: get_hosts: ncpus=2 but nodect=2, pretending nodect=1.
are fixed in the CVS as of February 4.
A branch of OpenPBS is under active development and funded by the Department of Energy. It is called Torque and is available from supercluster.org. The developers have been very good about accepting patches. Note that the PBS patch included with mpiexec will not apply cleanly to Torque version 1.0.1p6. You should expect to have to fix the rejects by hand in an editor.
You must select whether you plan to use shared memory with MPICH/P4 when you compile the mpich library. To use shared memory, add the configure option "--with-comm=shared" when you build mpich.
Then when you configure mpiexec, if you have added that option to the mpich build, it is not necessary to do anything. However, if you choose not to build mpich/p4 to use shared memory, you should add the flag "--disable-p4-shmem" here. Note that you must make sure that mpich and mpiexec are compatible in this regard or applications will not start.
There is more information on this topic in the README, along with information on a command-line option to change the shmem setting in mpiexec for testing.
If you have a very old mpich, before version 1.2.4, you will need to apply a patch to your mpich distribution before it will work with mpiexec. The patch is included with mpiexec and described in the README.
Mpiexec works with GM, GM2, and MX.
Very old versions of MPICH/GM from Myricom (before 1.2.4..8) will not work with modern mpiexec. Version 0.69 is the latest that will support such old MPICH/GM.
The MPI-2 specification suggested in 1997 that the name mpiexec be used by implementations that provide a mechanism to initialize a parallel program. It specifically does not suggest mpirun because that name was widespread in existing practice, in non-standard and non-portable ways, and the MPI Forum did not want to confuse matters.
The existence of this name in any specification was not really a problem as most MPI implementations happily ignored it and continued with their existing mpirun scripts. Now, though, fast forward to 2005, where an MPICH distribution that implements features of MPI-2 begins to see some popularity. The MPICH2 distribution includes six other parallel code launcher programs and scripts, all called mpiexec.
The pages you are reading here discuss the version of mpiexec that was designed specifically to start parallel MPI codes in PBS environments. It has almost nothing to do with these other six versions that are shipped in the MPICH2 distribution. If you are planning to use the mpd that comes with MPICH2 and this version of mpiexec, things will not work. Take a look at the MPICH2 documentation to understand how to use their mpiexecs. Send bug reports to mpich2-maint@mcs.anl.gov, and see the documentation at the MPICH2 page.
If, however, you use the PBS resource manager and would like to take advantage of the features provided by the PBS mpiexec discussed above, support is included for the MPICH2 library. Specify the flag --comm=pmi on the command line (or use configure to make that the default at build time) to launch your MPICH2 executable. See the manual page and included README for more information. Send mail to the list if you run into any problems.
You can do funny, and sometimes useful, things with mpiexec that are not immediately obvious.
Running tasks using rsh or ssh is bad, for all the reasons given above, and is one of the big reasons why we use mpiexec in the first place. But what if you have non-modifiable codes that really expect to be able to use rsh? It should be possible to fool them into using mpiexec. Here are some hints; if anyone comes up with a nice set of wrapper scripts to do this well, please share. It would also be interesting to write a program or script called "rsh" that does the right thing: parsing rsh arguments, running commands, and starting up an mpiexec server if necessary.
Basic rsh: run one task on one node.
echo 'opt0600: hostname' | mpiexec --comm=none -nostdin -config=-
Feeding the config file on stdin requires that the task not read from stdin, which is what "rsh -n" does. But if this isn't the case, put the config in a temporary file somewhere and pass that file as the argument to --config=.
If you need to run more than one remote rsh at a time like this, you'll need to use mpiexec's server mode. At the beginning of the job, start up one instance of:
mpiexec --server &
Then you can spawn off as many instances of mpiexec as above, and leave them in the background until they finish.
echo 'opt0600: sleep 10' | mpiexec --comm=none -nostdin -config=- &
echo 'opt0601: hostname' | mpiexec --comm=none -nostdin -config=-
At the end of the job, kill off mpiexec, or just let it be killed by the batch system when the main script exits.
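The hints above could be wrapped in a small shell function. This is only a sketch: the name rsh_like is hypothetical and not part of mpiexec, and it assumes that the one-line "host: command" config syntax and the --comm=none -nostdin -config=- flags behave as in the examples above.

```shell
# rsh_like HOST COMMAND [ARGS...]
# Hypothetical helper (not part of mpiexec): run one command on one node
# of the current PBS allocation, roughly as "rsh -n HOST COMMAND" would.
rsh_like() {
    if [ "$#" -lt 2 ]; then
        echo "usage: rsh_like host command [args...]" >&2
        return 2
    fi
    rsh_like_host=$1
    shift
    # Feed a one-line config "host: command" on stdin; -nostdin keeps the
    # remote task away from stdin, which is already used for the config.
    printf '%s: %s\n' "$rsh_like_host" "$*" |
        mpiexec --comm=none -nostdin -config=-
}

# Usage inside a PBS job, with server mode for concurrent commands:
#   mpiexec --server &
#   rsh_like opt0600 sleep 10 &
#   pid=$!
#   rsh_like opt0601 hostname
#   wait "$pid"
```

Arguments are joined with single spaces before being handed to mpiexec, so commands that need careful quoting are better placed in a config file by hand.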
Frequently the problem arises of how to move a file from the master node of the PBS job out to all the worker nodes. People tend to write little scripts to loop through the contents of $PBS_NODEFILE and invoke an rcp to each one. This does the same thing, but in parallel.
cd $working-dir
echo some stuff > file.src
mpiexec --allstdin --comm=none --pernode cat \> file < file.src
Note the importance of escaping the redirection character > with a backslash so that it gets evaluated by the shell on each compute node, not by the shell in which mpiexec is invoked. Also notice that file is different from file.src; otherwise you'll end up writing over the same file from which you're reading on the master node, ending up with a zero-length file on all machines.
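The two cautions above can be folded into a small wrapper. This is a sketch, not something shipped with mpiexec: the function name bcast_file is hypothetical, and the source and destination check is a plain string comparison that will not notice two different spellings of the same path.

```shell
# bcast_file SRC DEST
# Hypothetical helper (not part of mpiexec): copy SRC on the master node
# to DEST on every node of the allocation, one task per node.
bcast_file() {
    if [ "$#" -ne 2 ]; then
        echo "usage: bcast_file src dest" >&2
        return 2
    fi
    # Naive guard against reading and writing the same file, which
    # would leave a zero-length file everywhere.
    if [ "$1" = "$2" ]; then
        echo "bcast_file: src and dest must differ" >&2
        return 1
    fi
    # The backslash keeps '>' away from the local shell, so the
    # redirection is carried out on each compute node instead.
    mpiexec --allstdin --comm=none --pernode cat \> "$2" < "$1"
}

# Usage:
#   bcast_file file.src file
```

The destination name is interpreted by a shell on each compute node, so this sketch is only suitable for names without spaces or shell metacharacters.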
To start a separate terminal for each debugger process, you might use
mpiexec xterm -e gdb mycode
with the caveat that this only works for devices which pass information by environment variables, not by command-line arguments. In other words, use the above for anything but MPICH/P4. Each process will start in a separate terminal, and obviously you must ensure that X clients can reach the server on your desktop. (Perhaps you need to use the argument -v DISPLAY when starting an interactive PBS job with qsub -I?)
A variation which works for MPICH/P4 is the following:
mpiexec xterm -e gdb --args mycode
This tells gdb that all the magic MPICH arguments should be ignored, and mpiexec arranges to place them last on the command line. Then the --args switch to gdb says to interpret all the rest of the line as arguments to the debugging target.
Note that with P4 you will see only a single xterm for process zero, then as you use the debugger to run the code to MPI_Init, all the rest of the windows will then pop up and can be manipulated.
In an interactive job, you sometimes have the failure mode where one (or more) processes die. This replicates your typed input to a gdb around each process so that if one stops you can at least type where to get a backtrace.
mpiexec -np 4 -allstdin gdb mycode
You'll get 4 identical gdb prompts, and anything typed to one will be broadcast to all. This does not work with mpich/p4, however, as it requires a special startup order and command-line arguments.
You can always use a config file to put the debugger on only some of the nodes:
mpiexec -np 4 -allstdin -config conf
where file conf contains something like:
-n 1 : mycode
-n 1 : gdb mycode
-n 2 : mycode
to debug only the process with rank 1 out of 4 total.
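A config file of this shape can be generated for any rank. The following is a minimal sketch: debug_conf is a hypothetical helper, not part of mpiexec, and it assumes that config lines are assigned to ranks in the order they are listed, as the example above suggests.

```shell
# debug_conf RANK TOTAL PROGRAM
# Hypothetical helper (not part of mpiexec): print a config that runs
# PROGRAM under gdb for one rank and plainly for all the others.
# Assumes config lines are assigned to ranks in the order listed.
debug_conf() {
    if [ "$#" -ne 3 ]; then
        echo "usage: debug_conf rank total program" >&2
        return 2
    fi
    dc_rank=$1
    dc_total=$2
    dc_prog=$3
    # Number of ranks that follow the debugged one.
    dc_after=$((dc_total - dc_rank - 1))
    if [ "$dc_rank" -lt 0 ] || [ "$dc_after" -lt 0 ]; then
        echo "debug_conf: rank must be in 0..total-1" >&2
        return 1
    fi
    # Ranks before the debugged one, if any.
    if [ "$dc_rank" -gt 0 ]; then
        printf '%s\n' "-n $dc_rank : $dc_prog"
    fi
    printf '%s\n' "-n 1 : gdb $dc_prog"
    # Ranks after the debugged one, if any.
    if [ "$dc_after" -gt 0 ]; then
        printf '%s\n' "-n $dc_after : $dc_prog"
    fi
}

# Usage:
#   debug_conf 1 4 mycode > conf
#   mpiexec -np 4 -allstdin -config conf
```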
Last two years or so.
mpiexec-0.84 2 Aug 2010
* FAQ: Include frequent compilation issue due to missing
'pbs-config'
* mpiexec.1: Small clarification to document '-n' and '-np' as
synonyms.
* Makefile.in: Put FAQ in distribution, and long-forgotten man
page for redir helper.
* FAQ: Copy from README just the frequently asked questions.
* README: Remove FAQ bits.
* pmgr_collective_mpirun.c pmgr_collective_mpirun.h
pmgr_collective_common.c pmgr_collective_common.h: New files,
copied from mvapich release 1.0 and fixed to compile cleanly.
There are four copies of each file in that tree, all currently the
same, but watch for drift. Preferentially track
mpid/ch_gen2/process in the future. Not completely integrated
into mpiexec (error messages, etc.) to make it easier to pull
in changes from mvapich in the future.
* LICENSE.mvapich: New file, license for mvapich files.
* Makefile.in: Build the new files.
* README: Document updated support for mvapich 1.0.
* ib.c: Use the new protocol version 8 implemented here.
These code changes by Frank Mietke.
* mpiexec.h start_tasks.c pmi.c mpiexec.c: Add support for
totalview in MPICH2.
* Makefile.in: Build and distribute new files.
* README.tv: Document new mpich2 and old mpich totalview support.
* tv_attach.c tv_attach.h: New files that provide support functions
for MPICH2-style totalview attachment.
These changes and documentation by Frank Mietke.
* start_tasks.c util.c: Two fixes for mac, as found by Bill Gropp.
* config.c util.c concurrent.c stdio.c start_tasks.c task.c rai.c
mpiexec.c: Fix a bunch of little compile warnings from Intel cc 9.1,
as reported by Denis Anjos.
* psm.c: Implementation of new startup protocol for QLogic InfiniPath.
* start_tasks.c stdio.c: Startup environment and abort messages.
* mpiexec.h mpiexec.c README Makefile.in: Wiring for new protocol.
* configure.in configure: Configure glue.
All these changes by Christian Bell.
* ib.c: Fix bug with version mismatch code when version is also
out of the known range.
* mpiexec.c: Accept lone "--" on command line to indicate end
of options, as suggested by Andrew McNabb.
mpiexec-0.83 22 Feb 2008
* ib.c: Add support for version 6 startup in mvapich 1.0. Patch
provided by Frank Mietke.
* event.c: On a "remote system warning" from tm_poll, look up the
event structure to be able to point at the broken node.
* README: Explain mpich2/smpd unusability. Update PBSPro notes
to point out that version 8 does not work with mpiexec.
* configure.in configure: Force selection of a default comm device.
* README: Explain this required option a bit more.
* Makefile.in: Fix for recent autoconf.
* ib.c: Support version 5 two-phase startup protocol for mvapich
0.9.9. Inspiration and testing by Jan Ploski.
* get_hosts.c: Correct i -> j index problem. Should not have caused
any errors, just perhaps slower and obviously wrong. From Eygene
Ryabinkin.
* mpiexec.h mpiexec.c configure.in configure config.h.in start_tasks.c:
Add basic support for Portals, at least using userspace TCP NAL.
Also a good example of how to add a new device in mpiexec.
* config.c mpiexec.c: Fix two old bugs found by Thomas Svedberg.
The interesting one only affects mpich-p4/shmem systems with
ppn > 2.
* README: Rearrange pbs notes a bit. Add FAQ for missing pbs_iff.
* list.h: PGI compiler does not know typeof. Bug found by
Filippo Spiga.
* gm.c util.h task.c list.h event.c ib.c: Hacks and fixes for
Cray C compiler.
mpiexec-0.82 28 Nov 2006
* get_hosts.c mpiexec.h mpiexec.c mpiexec.1: Implement -npernode
generalization of -pernode, by Thomas Zeiser.
* start_tasks.c: Do not track startup_complete for COMM_NONE jobs.
Bug found by Thomas Zeiser.
* runtests.pl: Add a warning about bad mpich1/p4 behavior, and update
for new output strings.
* contests.pl: Fix obvious bug, initialize spid for debugging.
* mpiexec.h: Track individual TM node ids.
* get_hosts.c: Parse host data in two passes, to track TM node ids
and to use fewer string comparisons.
* config.c spawn.c: Set the cpu_index on each task as it is created.
* concurrent.c: Pass TM node ids to clients. Do not exit on SIGPIPE.
Be a bit more careful with the nodealloc lock.
* start_tasks.c: Spawn on particular TM node id.
* task.c: Look at individual TM node ids to get hostname.
* event.c: Warn on remote system error, see what turns up.
* mpiexec.c: Track CPU indexes. Look at sigaction return value.
* mpiexec.h get_hosts.c concurrent.c spawn.c pmi.c mpiexec.c: Rename
numcpu in preparation for tracking individual TM node ids.
* mpiexec.c: When no exit statuses are obtained, do not complain.
* runtests.pl: Be a bit more precise in explaining the mpich2
MPI_Abort problem.
* pmi.c mpiexec.c spawn.c mpiexec.h: Add support for get_ranks2hosts
PMI command used by Intel MPI version 3. Initial patch by Thomas
Zeiser.
* get_hosts.c task.c: The field tasks[i].done takes on a range
of values, not just true/false.
* runtests.pl: Add a warning explaining that mpich2 MPI_Abort
is broken.
* stdio.c: Likely bug fix when stdio is closed.
* start_tasks.c: Make -transform-hostname work on MPICH2, thanks
to Brad Settlemyer for the prompting and testing.
* mpiexec.1: Document this and explain better.
* contests.pl: Be a little more verbose about expectations; check
for working /bin/true and false.
* pmi.c: Add some warnings about PMI name publishing problems.
* README: Fix typo.
* README: Update open file limit text a bit.
* mpiexec.h start_tasks.c exedist.c mpiexec.c: Add DONE_NOT_STARTED
state to print better messages at task exit time.
* task.c: Avoid killing tasks that are not running, although harmless.
* README: Add mpich/p4 vs mpich2/pmi FAQ item. Rearrange TODOs.
* runtests.pl: Catch new startup-incomplete message.
* get_hosts.c: Fix -nolocal -pernode as reported by Chris Maestas.
* runtests.pl: Add a test to check for this, and make sure it works.
* stdio.c: Update comment hinting at how Torque compile may cause
a weird tty symptom on Mac.
* config.c mpiexec.1: Accept # comments in config spec. Rearrange a
bit to avoid a loop. From Cray.
* stdio.c: Finally get rid of PRINTF macro. Fix a potential negative
pid race, from Cray.
* exedist.c mpiexec.c start_tasks.c: Minor cleanups from Cray.
mpiexec-0.81 19 Apr 2006
* stdio.c: Clear returned events for new fds. Convert some PRINTF
to debug. Maybe optimize a bit by checking n before looping over
everything. Work around a potential Mac bug related to polling
on tty stdin.
* gm.c ib.c: Fix off-by-one error in select for --disable-poll case.
* mpiexec.h stdio.c pmi.c: Pass rfs rather than use global, avoids
collision with rfs use in ib.c and gm.c.
* start_tasks.c gm.c mpiexec.h: Implement asynchronous GM startup.
* stdio.c: Fix bug found by Garrick.
* configure.in configure: Bump version.
* runtests.pl: Try again if qsub fails, to work around slow PBS
servers.
* README: Add FAQ entry for running out of sockets. Thanks to
David Golden for this one.
* mpiexec.h util.c: Remember directory name from where mpiexec was
started, if available.
* config.c spawn.c: Add extra 0 argument to resolve_exe call.
* start_tasks.c mpiexec.c: Look for redir helper in same directory
as mpiexec. Thanks to Thomas Zeiser for the suggestion.
* configure.in configure: Bump to pre4.
* get_hosts.c: Remove extra attributes on hostnames added by PBSPro.
* Makefile.in configure.in configure: Refine use of torque pbs-config
a bit; move specific library names out of Makefile.
* get_hosts.c: Accommodate PBSPro TM API difference in node versus
CPU counting.
* mpiexec.spec mpiexec.spec.in: Rename to auto-generate this file.
* configure.in configure: Auto-gen mpiexec.spec. Bump version for
a prerelease.
* Makefile.in: Remove spec version check, auto-generate instead.
* mpiexec.c: Remove bogus hacked-in Version strings.
* redir-helper.c: New file, to work around PBSPro lack of redirection.
* Makefile.in: Compile this new redir-helper code.
* get_hosts.c: Fix exec_host syntax better.
* configure.in configure config.h.in: Make redir-helper a
configure-time option.
* start_tasks.c: Fork mpiexec-redir-helper in front of the actual
executable.
* README: Some documentation.
* mpiexec-redir-helper.1: New file, man page for the new code,
although it is not for users to run directly.
* get_hosts.c: Account for new exec_host syntax in PBSPro, thanks
to Doug Johnson.
* configure.in configure: Add pbs-config check for new torque, from
Garrick Staples.
Switch from CVS to SVN repository.
* spawn.c: New file, to support MPI_Comm_spawn.
* Makefile.in: Compile new file. Remind distributor to fix spec file.
* concurrent.c event.c exedist.c get_hosts.c: Add indirection to
tasks[].status.
* mpiexec.c: Do hostname lookup here instead of in start_tasks(), now
that it can be called multiple times. Indirection for status.
* mpiexec.h: New structure to group tasks by when they were spawned.
Some new functions.
* pmi.c: Support multiple keypair spaces. Handle getbyidx command
needed for spawning.
* start_tasks.c: Look at start and end indexes in spawns[] rather than
going from 0 to numtasks.
* stdio.c: Functions to communicate between stdio listener and parent
to handle spawning new tasks.
* util.c util.h: New handy functions to communicate strings.
* config.c: Export a function.
* mpiexec.c mpiexec.h config.c: Always return a new string in
resolve_exe to simplify usage.
* pmi.c: Use standard list type for keypairs. Handle mcmd syntax
used by spawn. Handle three name publishing commands for MPI2.
Parse spawn command but do not do anything with it yet. Check
for duplicate keys in put command.
* runtests.pl: Reorganize nicely, add comments on tests that may
fail due to bad mpich2 or PBS configuration.
* config.c mpiexec.h: Switch to using standard list.h and get rid
of the single static "working" config_spec_t structure.
* hello.c: Quiet a couple of compile warnings.
* concurrent.c: Fix compilation on rh73, thanks to Chris Samuel.
* stdio.c event.c task.c mpiexec.c mpiexec.h concurrent.c: Introduce a
pipe between the main process and the stdio listener. Adapt some
mechanisms to use it instead of signals.
* gm.c start_tasks.c: Move abort_fd handling into gm instead of
start_tasks for better symmetry.
* ib.c: Rename abort_fd function.
* concurrent.c mpiexec.h: Release client properly after last event.
Return number of clients terminated to main when killing all.
* mpiexec.c: Exit zero if -server and no connected clients when
signalled, as suggested by Martin Schafföner.
* contests.pl: Test this new behavior.
* concurrent.c: Propagate concurrent client return code properly.
* contests.pl: Test this bug, found by Martin Schafföner.
* gm.c: Remove duplicated code bug from nonblock checkin (never
released). Make error and debug lines mention gm and mx.
* README mpiexec.1: Add documentation that GM and MX are similar.
* configure.in configure: Fix spacing, allow mx or gm.
* mpiexec.c: Allow mx as well as gm.
* start_tasks.c: Set env vars for mpich/mx in COMM_MPICH_GM too.
Thanks to Denis Charland for doing the initial port.
* util.c: Add commented-out code to include a timestamp in each
debugging output line.
* event.c: Move start event processing code into generic dispatch.
* start_tasks.c mpiexec.h: Handle start events while spawning, poke
ib service while waiting for all start messages.
* ib.c: Bail if something dies in start event handler while waiting
for accept.
* start_tasks.c mpiexec.h mpiexec.c event.c: Report if tasks exited
before MPI startup was complete.
* ib.c: Be more asynchronous, check more error codes.
* mpiexec.1: Update out-of-date diagnostic.
* util.c util.h: Add error-returning version of read_full.
* start_tasks.c mpiexec.h mpiexec.c: Return any error code from
task startup and kill all if so.
* ib.c: Remove duplicated \n
* task.c: Include progname and redo a bit.
* hello.c: Do not time out MPI startup so quickly.
* task.c: Do not print all hostnames, just the first couple and a
summary.
* ib.c: Use debug() so printfs go to stderr and to centralize
debug level checks.
* concurrent.c: Include header for AIX, thanks to Chris Samuel.
* config.c: Fix off-by-one errors regarding processor assignment.
* runtests.pl: Add -config vs -np tests. Revert debugging code.
* gm.c: Force new socket to be non-blocking.
* ib.c start_tasks.c: Support new startup protocol in mvapich
>= 0.9.5-112. Note new MPIRUN_PROCESSES env var scales poorly.
Last modified: Wed, 06 Jun 2012 14:19:53 -0400