The '-o' option to bsub will append the standard output of your job to the specified file. If you would like the file to be overwritten each time you redirect your output to it use the '-oo' option.
To append the standard output of your parallel job to the file 'output.txt'.
$ bsub -q<queue_name> -n<no_of_cores> -o output.txt mpirun -srun ./myjob
To see all the output (stdout) your job has produced so far
$ bpeek <jobid>
To see the output (stdout) of your job as it is produced
$ bpeek -f <jobid>
The LSF job scheduler is enabled with a feature called FAIRSHARE. It keeps track of each user's usage of the cluster and combines that record with the user's assigned shares to decide whose job should be launched next. The system can only work if all the jobs contending for cluster resources are already in the queue when the scheduling decision is made, because once a job has started it cannot be preempted by any other job, even one with higher priority.
Briefly: submit all your jobs into the queue even if they go into the PEND state. This ensures that your jobs are taken into account the next time the LSF Fairshare scheduler makes a scheduling decision.
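As a rough illustration of queueing everything up front, the sketch below prints one bsub command per job instead of running it. The function name, queue name, core count and job names are all hypothetical, and the '-o <job>.out' naming is only an assumption made for the example.

```shell
#!/bin/sh
# Print one bsub command line per job so that every job can be queued
# up front, even if it will sit in the PEND state for a while.
# Usage: print_submissions <queue> <cores> <job> [<job> ...]
# All names below are placeholders for illustration only.
print_submissions() {
    queue=$1; cores=$2; shift 2
    for job in "$@"; do
        # Each job gets its own output file, named after the job.
        printf 'bsub -q%s -n%s -o %s.out mpirun -srun ./%s\n' \
            "$queue" "$cores" "$job" "$job"
    done
}

# Review the generated commands before running them by hand.
print_submissions myqueue 8 job_a job_b job_c
```

Once the jobs are submitted, 'bjobs' lists them together with their current state, so you can confirm that the waiting ones are in PEND.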
The '-e' option to bsub will append the standard error of your job to the specified file. If you would like the file to be overwritten each time you redirect your error to it, use the '-eo' option.
To append the standard error of your parallel job to the file 'error.txt'.
$ bsub -q<queue_name> -n<no_of_cores> -e error.txt mpirun -srun ./myjob
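The output and error options can be given on the same command line. The following is a minimal sketch, not a site-provided tool: it prints a bsub command that sends stdout and stderr to separate files, choosing between the append flags ('-o', '-e') and the overwrite flags ('-oo', '-eo'). The function name, queue name and core count are hypothetical.

```shell
#!/bin/sh
# Build a bsub command line that sends stdout and stderr to separate files.
# mode "append" uses -o/-e; mode "overwrite" uses -oo/-eo.
# Usage: build_bsub_cmd <mode> <queue> <cores> <stdout_file> <stderr_file>
build_bsub_cmd() {
    mode=$1; queue=$2; cores=$3; out=$4; err=$5
    case "$mode" in
        append)    oflag=-o;  eflag=-e  ;;
        overwrite) oflag=-oo; eflag=-eo ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf 'bsub -q%s -n%s %s %s %s %s mpirun -srun ./myjob\n' \
        "$queue" "$cores" "$oflag" "$out" "$eflag" "$err"
}

# Print the command for review rather than running it.
# "myqueue" and 8 are placeholder values.
build_bsub_cmd overwrite myqueue 8 output.txt error.txt
```

Printing the command first makes it easy to check which flags were chosen before anything is submitted.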
If a node is failing or having other issues when your job is launched on it, report it to the system administrator. Until the node is removed from service, you can manually prevent your jobs from launching on that node (or nodes) with the '-ext' option of bsub.
e.g. To prevent your job from launching on node n78, add the following option to your bsub command line
-ext 'SLURM[exclude=n78]'
For more than one node
-ext 'SLURM[exclude=n[78,31,22]]'
Your bsub command line will look like
$ bsub -q<queue_name> -n<no_of_cores> -ext 'SLURM[exclude=<nodelist>]' ./myjob
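Getting the nested brackets right by hand is easy to get wrong, so a small helper can assemble the exclude string. This is only a sketch: the function name is hypothetical, and it assumes all node names share one prefix (here 'n') followed by a number, as in the examples above.

```shell
#!/bin/sh
# Build the argument for bsub's -ext option from a node-name prefix
# and one or more node numbers.
#   one node:      SLURM[exclude=n78]
#   several nodes: SLURM[exclude=n[78,31,22]]
# Usage: build_exclude_ext <prefix> <number> [<number> ...]
build_exclude_ext() {
    prefix=$1; shift
    if [ "$#" -eq 0 ]; then
        echo "need at least one node number" >&2
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        printf 'SLURM[exclude=%s%s]\n' "$prefix" "$1"
        return 0
    fi
    list=""
    for num in "$@"; do
        # Join the numbers with commas, without a leading comma.
        list="${list:+$list,}$num"
    done
    printf 'SLURM[exclude=%s[%s]]\n' "$prefix" "$list"
}

ext=$(build_exclude_ext n 78 31 22)
# Print the full command for review; the queue and core count stay
# as placeholders to be filled in by hand.
printf "bsub -q<queue_name> -n<no_of_cores> -ext '%s' ./myjob\n" "$ext"
```

Keep the single quotes around the generated string when you paste it into the real command line, so the shell does not try to interpret the square brackets.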