Sometimes a condor job will return an error beginning with:
condor_exec.exe: error while loading shared libraries
This can happen because the software you need is not installed on the execute node, in which case you should exclude that execute node in your submit file and let us know about the problem. You can safely assume that is the problem if you're still getting the error after following the steps outlined below.
More often, the software is on the execute node but is not in its default library search path. All third-party software is installed in /opt, which is a shared network filesystem common to all execute nodes as well as some other machines, including hooke. Thus if the program works in one place, it should work everywhere IF the default library paths are the same. Sometimes they aren't. We try to keep the nodes consistent, but there are over 100 of them in many different small clusters, and sometimes they diverge.
Fortunately, the problem can be easily remedied in the user's shell environment. In addition to its default locations, the dynamic linker searches the directories listed in the environment variable LD_LIBRARY_PATH.
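To see what the linker is currently being told to search, it can help to print LD_LIBRARY_PATH one directory per line. This is only a small sketch; the function name list_path_dirs is made up for illustration.

```shell
# Print each directory in a colon-separated search path on its own line.
# (list_path_dirs is an illustrative helper, not a standard command.)
list_path_dirs() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Show the current setting; prints an empty line if the variable is unset.
list_path_dirs "${LD_LIBRARY_PATH:-}"
```

Running the same thing on the submit machine and inside a job is one way to check whether the two environments actually differ.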
Here is an example using Gurobi 4.5:
condor_exec.exe: error while loading shared libraries: libgurobi45.so: cannot open shared object file: No such file or directory
The most likely problem here is that the execute node doesn't know where to find the shared library file it needs, libgurobi45.so. If the program runs on wren, the execute node should be able to find the library too, but sometimes it can't, even though the software is actually there.
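One way to confirm which libraries a machine cannot resolve is to run ldd on the executable there. The sketch below assumes ldd is available; the function names and ./my_program are placeholders, not anything provided by Condor. Note that ldd reports what the machine you run it on can find, so to diagnose an execute node it would have to run on that node (for example from a small test job).

```shell
# Keep only the names of libraries that ldd reports as "not found".
# Reads ldd-style output on standard input.
parse_missing() {
    awk '/=> not found/ { print $1 }'
}

# List unresolved shared libraries for the binary given as $1.
# Prints nothing if every library resolves (or if ldd cannot read the file).
missing_libs() {
    ldd "$1" 2>/dev/null | parse_missing
}

# Hypothetical usage; replace ./my_program with your own executable:
# missing_libs ./my_program
```

If the Gurobi library from the example is the only problem, the output would be expected to contain the single name libgurobi45.so.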
Fortunately, this is a very easy fix. We only need to find the location of libgurobi45.so, and add that directory path to our environment on the execute node.
The library should be in a fairly obvious place. All special third-party applications are installed in /opt, and most libraries are in a "lib" or "lib64" subdirectory somewhere in their application's directory tree. If we do an "ls" in /opt we find that Gurobi 4.5 is in /opt/gurobi450. It's quickly obvious from there that the most likely library path is /opt/gurobi450/linux64/lib, and an "ls" on that directory does in fact show that it contains libgurobi45.so.
Alternatively, we could use the "find" command, like this:
wren:~ $ cd /opt/gurobi/latest
wren:/opt/gurobi/latest $ find . -name libgurobi45.so 2>/dev/null
./linux64/lib/libgurobi45.so
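If you don't know which application directory to start from, the same search can be run over all of /opt. The following is a rough sketch; find_lib_dirs is an invented helper name, and searching the whole tree may take a while on a network filesystem.

```shell
# Print the directory of every file named $1 found under $2 (default: /opt).
# (find_lib_dirs is an illustrative helper, not a standard command.)
find_lib_dirs() {
    lib_name=$1
    root=${2:-/opt}
    find "$root" -name "$lib_name" 2>/dev/null | while IFS= read -r path; do
        dirname "$path"
    done | sort -u
}

# Example with the library from the error message above:
# find_lib_dirs libgurobi45.so
```

The directory it prints is the one to add to LD_LIBRARY_PATH. If several directories come back (for instance, from more than one installed version), pick the one that matches the version your program was built against.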
Once we know the directory that contains the needed library, we need to add it to the environment variable LD_LIBRARY_PATH on the execute node. This can be done in two ways:
1. Add it to your login profile (usually via .profile, .bashrc, or .bash_profile) and ensure "getenv = True" is in your submit file so that the execute node uses your login environment. The syntax in your login profile is:
This is usually the best method.
2. If for some reason you don't want your entire login environment exported from the submit node to the execute node, you can add the needed environment variable to the submit file explicitly:
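As a sketch of what these two methods might look like for the Gurobi example, assuming a Bourne-style login shell such as bash and the library directory found above: for method 1, lines along these lines would go in your login profile. The variable name GUROBI_LIB is only for illustration.

```shell
# Directory found in the steps above; adjust for other software.
# (GUROBI_LIB is an illustrative variable name.)
GUROBI_LIB=/opt/gurobi450/linux64/lib

# Prepend it, keeping any existing entries and avoiding a stray trailing
# colon when LD_LIBRARY_PATH was previously unset or empty.
export LD_LIBRARY_PATH="$GUROBI_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

For method 2, the submit file would instead carry the variable itself, with a line of roughly this form (check the condor_submit documentation for the exact quoting rules on your Condor version):

environment = "LD_LIBRARY_PATH=/opt/gurobi450/linux64/lib"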