There is no limit to the number of jobs you can submit. If there are no resources to run jobs, they will simply wait in the queue until resources are made available.
There are a number of reasons jobs might not be running:
• Something goes wrong after the executable begins to run
• The executable is badly formatted
• The job uses too much memory
• The command file is badly formatted
• The job asks for too many resources
You can see your job history with the condor_history command. That's the first place to start.
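As a minimal sketch of a history lookup, the helper below wraps condor_history for a single user. The function name job_history and the limit of 20 records are illustrative assumptions, not part of HTCondor; it only assumes condor_history is on your PATH.

```shell
# Hypothetical helper: show recent job history for one user.
# Usage: job_history [username]   (defaults to the current user)
job_history() {
    user="${1:-$(id -un)}"
    if ! command -v condor_history >/dev/null 2>&1; then
        echo "condor_history not found on PATH" >&2
        return 127
    fi
    # -limit caps the number of records printed (20 is an arbitrary choice);
    # the trailing argument restricts the listing to jobs owned by that user.
    condor_history -limit 20 "$user"
}
```

Calling job_history with no argument lists your own jobs; pass a username to look at someone else's.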
There is also the log file output, which gives some very basic information about job statuses.
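One way to skim that log is to print only the event header lines. The sketch below assumes the usual job log layout, where each event starts with a three-digit code followed by the job id in parentheses. The function name log_events is hypothetical, and the log path is whatever you set in your command file.

```shell
# Hypothetical helper: print only the event header lines of a job log.
# Usage: log_events <path-to-log>
log_events() {
    logfile="${1:-}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read log file: $logfile" >&2
        return 1
    fi
    # Assumption: event headers look like
    #   005 (1234.000.000) <date> <time> Job terminated.
    # Indented detail lines and "..." separators are skipped.
    grep -E '^[0-9]{3} \(' "$logfile"
}
```

This keeps one line per event (submitted, executing, held, terminated and so on), which is often enough to see where a job stopped.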
Analyse a specific job and show the reason why it is in its current state.
condor_q -better-analyze <jobid>
Show only jobs in the "on hold" state and the reason each one was held. Held jobs are those that hit an error and could not finish. The user is expected to take action to solve the problem.
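A minimal sketch of listing held jobs, using the -hold option of condor_q. The wrapper name held_jobs is a hypothetical convenience, and it only assumes condor_q is on your PATH.

```shell
# Hypothetical helper: list held jobs together with their hold reason.
# Usage: held_jobs [jobid ...]
held_jobs() {
    if ! command -v condor_q >/dev/null 2>&1; then
        echo "condor_q not found on PATH" >&2
        return 127
    fi
    # -hold restricts the listing to held jobs and prints the hold reason;
    # any extra arguments (for example a job id) are passed through.
    condor_q -hold "$@"
}
```

Once the cause has been fixed, a held job can be put back in the queue with condor_release <jobid>.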
Show extra detail about what is happening with a job.
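The command for this step is not shown here. One option that fits the description is the -long option of condor_q, which prints every attribute HTCondor holds for the job. Treat the choice of -long as an assumption about what was intended; the function name job_details is hypothetical.

```shell
# Hypothetical helper: print the full attribute list for one job.
# Usage: job_details <jobid>
job_details() {
    if [ "$#" -lt 1 ]; then
        echo "usage: job_details <jobid>" >&2
        return 2
    fi
    if ! command -v condor_q >/dev/null 2>&1; then
        echo "condor_q not found on PATH" >&2
        return 127
    fi
    # -long prints the job's complete set of attributes, one per line.
    condor_q -long "$1"
}
```

The output is long, so piping it through a pager or grep (for example to look for HoldReason) may be more practical.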