
About SGE

The Sun Grid Engine (http://gridengine.sunsource.net/) is a free, open-source, complete distributed computing tool that can be used to build a "computing farm" infrastructure. SGE provides a single entry point for job submission and schedules (or dispatches) jobs to the best execution hosts on the network, according to the policies defined by the grid administrator.

Distmake provides some capabilities to interface with SGE. The main idea is to let the grid engine software provide the build host list, instead of defining it manually in the .bldhosts file. The advantages of this integration include:

  • SGE will choose the best (least loaded or most powerful) build servers and provide them to the distmake job.
  • SGE will not start the build process if all available CPUs are busy (this can be configured by the grid administrator). This prevents the build servers from becoming overloaded and gives maximum performance to running jobs.
  • SGE provides accounting and statistics information.
  • SGE may be used to turn end-user workstations into build machines when they are idle (lunch time or during the night).
  • ... and many more...

Configuring SGE

You need to define an SGE parallel environment for distmake (see the Sun Grid Engine manual for details).

E.g.:

pe_name           distmake
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

... and link this parallel environment to one or several execution queues.

E.g.:

hostlist              @distmake
seq_no                0
load_thresholds       np_load_avg=4.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               distmake
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            users_list
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Many options are left at their default values in this example. See the SGE manual for detailed help.
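
The two configuration steps above could be applied from the command line roughly as follows. This is only a sketch: the queue name distmake.q and the file path are hypothetical, the commands need SGE manager privileges, and the 'qconf -aattr' step is only needed when the queue already exists and its pe_list does not yet include distmake.

```shell
#!/bin/sh
# Sketch: register the "distmake" parallel environment and attach it to a queue.
# Assumptions: the queue name "distmake.q" is hypothetical, and the caller has
# SGE manager privileges.

# Write the parallel environment definition shown above to the given file.
# The quoted heredoc keeps "$round_robin" literal.
write_distmake_pe() {
    cat > "$1" <<'EOF'
pe_name           distmake
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min
EOF
}

# Register the PE from the file, then add it to the pe_list of a queue.
register_distmake_pe() {
    pe_file=$1
    queue=$2
    qconf -Ap "$pe_file" &&
        qconf -aattr queue pe_list distmake "$queue"
}

# Example (not run here):
#   write_distmake_pe /tmp/distmake.pe
#   register_distmake_pe /tmp/distmake.pe distmake.q
#   qconf -sp distmake      # display the resulting PE definition
```

Keeping the definition in a file makes it easy to review or version before it is loaded into the grid engine.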

Using distmake with SGE

Use 'qrsh -pe <parallel environment name>' to submit a distmake job to the grid:


qrsh -V -cwd -pe distmake 5-10 distmake <other make's options here>


The parameter 5-10 tells the SGE scheduler that this job requires between 5 and 10 CPUs. If fewer CPUs are available, the job will wait until the requirements are met.

NB: You do not have to specify the -J <n> parameter; distmake will automatically determine this value from the build hosts list.
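
To see what the build host list for a given job looks like, note that SGE exposes the hosts granted to a parallel job through the PE_HOSTFILE environment variable (one line per host: host name, slot count, queue, processor range) and the total slot count through NSLOTS. The sketch below summarises such a file. Whether distmake reads this file in exactly this way is an assumption; the helper names pe_hosts and pe_total_slots are hypothetical and are not part of distmake or SGE.

```shell
#!/bin/sh
# Sketch: summarise the hosts SGE granted to a parallel job.
# Each line of $PE_HOSTFILE is: <hostname> <slots> <queue> <processor range>
# The function names below are hypothetical helpers, not distmake or SGE tools.

# Print one granted host name per line.
pe_hosts() {
    awk '{ print $1 }' "$1"
}

# Print the total number of slots granted across all hosts.
pe_total_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Example, inside a job started with "qrsh -pe distmake 5-10 ...":
#   pe_hosts "$PE_HOSTFILE"
#   pe_total_slots "$PE_HOSTFILE"    # should agree with $NSLOTS
```

Running these inside a job is a quick way to check how many slots the scheduler actually granted within the requested 5-10 range.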

Page last modified on March 15, 2006, at 10:49 AM