ITS/Grid Computing 
California Polytechnic State University



:: Grid Computing ::


Parallel batch job submission:

On this page we show another way to submit jobs: first sequentially, then in parallel. We also show a more efficient way of requesting resources.

Let's begin with an example of a serial batch job request. The comments in the script explain each step:
#!/bin/bash
#
#SERIAL SCRIPT
################################################
# With the line below, we are requesting a single node to run our simple
# job, and from that node we are asking for BOTH CPUs. We are reserving
# one node exclusively for this job, leaving all other nodes free for the
# scheduler and future jobs.
################################################
#PBS -l nodes=1:ppn=2
#
###############################################
# This job will not take more than 30 minutes to run
###############################################
#PBS -l walltime=30:00
#
###############################################
# Join the output and the error file into one file
# named 'myOutputFileName.out', and let's name
# this job 'myJobName'.
###############################################
#PBS -j oe
#PBS -o myOutputFileName.out
#PBS -N myJobName
#
##############################################
# How many procs do I have?
##############################################
NN=$(wc -l < "$PBS_NODEFILE")
echo "Processors received = $NN"
#
##############################################
echo "script running on host `hostname`"
#
##############################################
# cd into the directory where I typed qsub
##############################################
cd "$PBS_O_WORKDIR"
echo
#
##############################################
# Tell me which nodes I got.
##############################################
echo "PBS NODE FILE"
cat "$PBS_NODEFILE"
#
#############################################
# We now want to run the image rendering program called Pov-ray.
# For this job we reserve only 1 node with both CPUs. That is enough,
# since the two renders below run sequentially on the same host;
# reserving more nodes would not improve job performance.
#############################################
/usr/local/bin/povray -V /home/test2/pov/advanced/text2.pov +H480 +W640 +A0.9
/usr/local/bin/povray -V /home/test2/pov/advanced/newdiffract.pov +H480 +W640 +A0.9
#
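
The node-file check in the script above can be extended to report distinct hosts as well as processor slots. The sketch below assumes, as is typical for Torque/PBS, that $PBS_NODEFILE lists one hostname per allocated processor, so a request of nodes=1:ppn=2 yields the same hostname twice. The function name summarize_nodefile is illustrative and is not part of PBS.

```shell
#!/bin/bash
# summarize_nodefile: hypothetical helper, not a PBS command.
# Assumes the node file has one hostname per allocated processor.
summarize_nodefile() {
    local nodefile="$1"
    local procs nodes
    procs=$(wc -l < "$nodefile")
    nodes=$(sort -u "$nodefile" | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "Processors received = $((procs)), distinct nodes = $((nodes))"
}

# Inside a job script you would call:
#   summarize_nodefile "$PBS_NODEFILE"
```

For the serial job above, this should report two processors on one distinct node.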

In the example above, we requested a single node with both of its CPUs. Because the second Pov-ray call starts only after the first one has finished on that node, the total running time of this batch job is the sum of the two render times, so no time is saved.
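
To make the cost concrete, here is a small arithmetic sketch comparing the two renders run back to back with the same renders run at the same time on separate nodes. The render times are made-up numbers for illustration, not measurements from the cluster.

```shell
#!/bin/bash
# Hypothetical render times in seconds (illustrative, not measured).
t_text2=600
t_newdiffract=900

# Sequential on one node: the renders run back to back.
serial=$((t_text2 + t_newdiffract))

# Concurrent on two nodes: the work is done when the slower render finishes.
concurrent=$(( t_text2 > t_newdiffract ? t_text2 : t_newdiffract ))

echo "serial: ${serial}s  concurrent: ${concurrent}s"
```

The comparison ignores any time the jobs spend waiting in the queue.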

The parallel script:

In our next example, we want to run both instances of Pov-ray concurrently on two different nodes. To do that we reserve two nodes, and from each node we ask for both CPUs.
To run both jobs concurrently, we create two separate batch scripts like the one above. Each script requests its own resources and runs on its own node. We then use a "parent" script to submit both jobs:
#!/bin/bash
#
##########################################
# PARENT SCRIPT
#########################################
#
#########################################
# Parent script starts both siblings.
#########################################
JOBID_1=`qsub job1.sh`
JOBID_2=`qsub job2.sh`
#
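
The parent script above does not check whether each qsub call succeeded. A more defensive variant might look like the sketch below. submit_job is an illustrative helper name, and the only assumption about qsub is that it prints the new job ID on standard output and returns a non-zero status on failure.

```shell
#!/bin/bash
# submit_job: hypothetical wrapper around qsub.
# Prints the job ID on success; prints an error and returns 1 on failure.
submit_job() {
    local script="$1"
    local jobid
    if ! jobid=$(qsub "$script"); then
        echo "qsub failed for $script" >&2
        return 1
    fi
    echo "$jobid"
}

# Usage in a parent script:
#   JOBID_1=$(submit_job job1.sh) || exit 1
#   JOBID_2=$(submit_job job2.sh) || exit 1
#   echo "Submitted: $JOBID_1 $JOBID_2"
```

Stopping at the first failed submission avoids starting one sibling when the other could not be queued.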

These are the child scripts:
#!/bin/bash
#
#########################################
#SIBLING #1 (job1.sh)
#########################################
#
#########################################
# We need one node with both CPUs
#########################################
#PBS -l nodes=1:ppn=2
#PBS -l walltime=30:00
#PBS -j oe
#PBS -o sib1.out
#PBS -N sib1
#
#########################################
# How many procs do I have? Where is the script running?
#########################################
NN=$(wc -l < "$PBS_NODEFILE")
echo "Processors received = $NN"
echo "script running on host `hostname`"
#
#########################################
# cd into the directory where I typed qsub and
# print the list of nodes I got.
#########################################
cd "$PBS_O_WORKDIR"
echo
echo "PBS NODE FILE"
cat "$PBS_NODEFILE"
#
#########################################
# Now render the first image. This will be treated as a
# single job.
#########################################
/usr/local/bin/povray -V /home/test2/pov/advanced/text2.pov +H480 +W640 +A0.9
#

and:

#!/bin/bash
#
#########################################
#SIBLING #2 (job2.sh)
#########################################
#
#########################################
# We need one node with both CPUs
#########################################
#PBS -l nodes=1:ppn=2
#PBS -l walltime=30:00
#PBS -j oe
#PBS -o sib2.out
#PBS -N sib2
#
#########################################
# How many procs do I have? Where is the script running?
#########################################
NN=$(wc -l < "$PBS_NODEFILE")
echo "Processors received = $NN"
echo "script running on host `hostname`"
#
#########################################
# cd into the directory where I typed qsub and
# print the list of nodes I got.
#########################################
cd "$PBS_O_WORKDIR"
echo
echo "PBS NODE FILE"
cat "$PBS_NODEFILE"
#
#########################################
# Now render the second image. This will be treated as a
# single job.
#########################################
/usr/local/bin/povray -V /home/test2/pov/advanced/newdiffract.pov +H480 +W640 +A0.9
#
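
The two sibling scripts differ only in the scene file and the job name. One possible way to avoid the duplication, sketched below, is to keep a single job script and pass the scene in through qsub's -v option, which exports variables into the job's environment. The script name render.sh, the variable SCENE, and the helper submit_render are assumptions made for this sketch; they are not part of the setup described on this page.

```shell
#!/bin/bash
# submit_render: hypothetical helper that submits one render job.
#   $1 = job name (also used for the output file), $2 = scene file
# Assumes a single job script, render.sh, that keeps the same
# '#PBS -l' and '#PBS -j oe' lines as the siblings and ends with:
#   /usr/local/bin/povray -V "$SCENE" +H480 +W640 +A0.9
submit_render() {
    local name="$1" scene="$2"
    qsub -N "$name" -o "${name}.out" -v SCENE="$scene" render.sh
}

# Usage in the parent script:
#   JOBID_1=$(submit_render sib1 /home/test2/pov/advanced/text2.pov)
#   JOBID_2=$(submit_render sib2 /home/test2/pov/advanced/newdiffract.pov)
```

Each call still produces its own batch job with its own resources, so the jobs run exactly as the two separate scripts would.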

As we can see, the parent script submits both children, and each child requests its own resources. Consequently, the two jobs run side by side, provided two nodes are free.
Running jobs side by side is an improvement, but it only helps jobs that are independent of each other. What if a job depends on the results of previous jobs? Here is a solution:

The echoFinish.sh script.

We introduce a "flag" (or "indicator") script called echoFinish.sh, which does nothing more than print a message. This script runs only after our previous jobs have finished without error.
Such a script can do more than write a plain text message: it can post-process your output files AFTER the jobs that produce them have run successfully.
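
The contents of echoFinish.sh are not listed on this page. A minimal version might look like the sketch below, which prints the message and then reports whether the siblings' output files (sib1.out and sib2.out, named in the sibling scripts) are present. The resource request, the job name "finish", the output file finish.out, and the helper report_outputs are assumptions chosen for illustration.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=5:00
#PBS -j oe
#PBS -o finish.out
#PBS -N finish

# Fall back to the current directory when run outside PBS (assumption).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# report_outputs: hypothetical helper that checks each named file
# exists and is non-empty.
report_outputs() {
    local f
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f"
        fi
    done
}

echo "Both sibling jobs finished without error."
report_outputs sib1.out sib2.out
```

Any real post-processing step would take the place of the report_outputs call.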
#!/bin/bash
#
##########################################
# PARENT SCRIPT WITH A WAIT FLAG
#########################################
#
#########################################
# Parent script starts both siblings and holds their IDs.
# We will use the IDs to make the final job wait for them.
#########################################
JOBID_1=`qsub job1.sh`
JOBID_2=`qsub job2.sh`
#
#########################################
# The '-W depend=afterok:...' option tells 'qsub' to hold this job
# until the listed jobs have terminated successfully.
#########################################
qsub -W depend="afterok:${JOBID_1}:${JOBID_2}" echoFinish.sh
#
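
When more than two jobs must finish first, the dependency string can be assembled from a list of job IDs. This is a minimal sketch, assuming the same afterok:id1:id2 syntax used in the parent script above; afterok_list is an illustrative helper name.

```shell
#!/bin/bash
# afterok_list: hypothetical helper that joins job IDs into
# an "afterok:id1:id2:..." string for 'qsub -W depend=...'.
afterok_list() {
    local dep="afterok"
    local id
    for id in "$@"; do
        dep="${dep}:${id}"
    done
    echo "$dep"
}

# Usage in the parent script:
#   qsub -W depend="$(afterok_list "$JOBID_1" "$JOBID_2")" echoFinish.sh
```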

Don't forget to make the parent script executable (for example, with 'chmod +x parent.sh'). Run it directly from the shell; do NOT submit it with 'qsub':

./parent.sh

For more information on any flag used here, check the 'qsub' man page.

 

 




ITS/Grid Computing
California Polytechnic State University
San Luis Obispo, Ca 93407
jburdett@calpoly.edu