Schedulers – some notes

Networking: CSFQ, WFQ, DRR, WF2Q, SFQ, Proportional Fair, I-CSDPS

OS: Linux CFS, proportional sharing, lottery
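
Lottery scheduling is the easiest of these to sketch: each client holds some tickets, and every quantum one ticket is drawn at random, so a client's long-run share is proportional to its ticket count. The version below is only a minimal illustration; the names `lottery_pick` and `simulate` and the ticket dictionary are assumptions made for this sketch, not taken from any particular kernel.

```python
import random


def lottery_pick(tickets, rng=random):
    """Pick one client with probability proportional to its ticket count.

    `tickets` maps a (hypothetical) client name to a non-negative integer.
    """
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("no tickets outstanding")
    winner = rng.randrange(total)  # winning ticket number in [0, total)
    for client, count in tickets.items():
        if winner < count:
            return client
        winner -= count  # skip past this client's block of tickets
    raise AssertionError("unreachable: winner is always below total")


def simulate(tickets, quanta, seed=0):
    """Run `quanta` draws and count how often each client wins."""
    rng = random.Random(seed)
    wins = {client: 0 for client in tickets}
    for _ in range(quanta):
        wins[lottery_pick(tickets, rng)] += 1
    return wins
```

With tickets `{"a": 3, "b": 1}`, client `a` should win roughly three quarters of the quanta over a long run; the guarantee is probabilistic, so short runs can deviate.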

Datacenters/GP Cluster: Hadoop ecosystem (Presto-Cloudera-YARN-Spark) fair scheduler, capacity scheduler, Quincy, LSF, Condor (ignore MPI folks)
Scalability (response time, number of machines)
Flexibility (heterogeneous mix of jobs)
Isolation – fault isolation, resource isolation
Utilization (achieve high cluster resource utilization, e.g., CPU utilization, memory utilization) – Balance the hosts – Meet the constraints of the host

Service or Batch Jobs?

** who|process dominates, which resource to give priority to
** how to catch cheaters?
** How do you pre-empt?
** In case of multiple schedulers – make them aware of each other – shared state to avoid one scheduler/workload dominating
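
One known answer to the "who dominates, which resource gets priority" question is Dominant Resource Fairness (DRF): compute each user's largest fractional share across all resources and keep giving the next task to the user whose dominant share is lowest. The sketch below is a simplified progressive-filling loop under stated assumptions: every user has a fixed per-task demand vector, and a user whose next task no longer fits is simply dropped (the published algorithm is stated somewhat differently). `drf_allocate` and its parameters are hypothetical names for this illustration.

```python
def drf_allocate(capacity, demands, max_tasks=10_000):
    """Return how many tasks each user gets under a simplified DRF loop.

    capacity: {resource: total amount}
    demands:  {user: {resource: amount needed per task}}
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}
    active = set(demands)  # users that may still fit another task

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(tasks[user] * demands[user][r] / capacity[r]
                   for r in capacity)

    for _ in range(max_tasks):
        if not active:
            break
        # Serve the user with the lowest dominant share (ties by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[user] += 1
        else:
            active.discard(user)  # simplification: this user is done
    return tasks
```

For a cluster of 9 CPUs and 18 GB, with user A needing (1 CPU, 4 GB) per task and user B needing (3 CPUs, 1 GB), this loop gives A three tasks and B two, so both end with a dominant share of two thirds. It does nothing about cheaters who misreport demands.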


Global Scheduler needs to have state
Policies + resource availability
How much do we know about job/tasks
Job requirements (throughput, response time?, availability)
Job Plan (DAG of tasks or what?, I/O needs, user affinity)
Estimates of duration?, Input size?, TX
Single vs Multiple scheduler agents + cluster state replicated into the nodes
Monolithic – Platform LSF, Maui, Moab (HPC community)
Multi-step – Partition resources or dynamically allocate them (Mesos NSDI 2011) – can reject the offer
*** How long job has to wait
*** Fair sharing ()
Partition and resolve dependencies (Omega EuroSys 2013)
Issue – Upgrade/patching of scheduler or substrate



Tetris – tetris_sigcomm14.pdf (SIGCOMM 2014)

Corona –
( fair-share scheduling)
“This scheduler is able to provide better fairness guarantees because it has access to the full snapshot of the cluster and jobs when making scheduling decisions. It also provides better support for multi-tenant usage by providing the ability to group the scheduler pools into pool groups. A pool group can be assigned to a team that can then in turn manage the pools within its pool group. The pool group concept gives every team fine-grained control over their assigned resource allocation.”

Docker (Fleet/Citadel/Sampi/Mesos)
sampi –
citadel –

Docker Swarm (CPU|RAM vs random!) (what are the constraints – same storage, same area, some tags?)
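
The two questions in that note (rank by CPU/RAM or pick at random, and which constraints apply) map onto a filter-then-rank placement loop. What follows is a generic sketch of that idea, not Swarm's actual API: the node dictionaries, the label keys, and the `place` function are all assumptions made for illustration.

```python
import random


def place(nodes, cpu, ram, constraints=None, strategy="binpack", rng=random):
    """Return the name of the chosen node, or None if nothing fits.

    nodes: list of dicts with hypothetical keys
           name, cpu, ram, free_cpu, free_ram, labels
    constraints: {label: required value}, e.g. {"storage": "ssd"}
    """
    constraints = constraints or {}
    # Filter step: enough free capacity and every label constraint matches.
    candidates = [
        n for n in nodes
        if n["free_cpu"] >= cpu and n["free_ram"] >= ram
        and all(n["labels"].get(k) == v for k, v in constraints.items())
    ]
    if not candidates:
        return None
    if strategy == "random":
        return rng.choice(candidates)["name"]

    def free_score(n):
        # Sum of free CPU and free RAM fractions; higher means emptier.
        return n["free_cpu"] / n["cpu"] + n["free_ram"] / n["ram"]

    if strategy == "binpack":  # fill the fullest node that still fits
        return min(candidates, key=free_score)["name"]
    if strategy == "spread":   # prefer the emptiest node
        return max(candidates, key=free_score)["name"]
    raise ValueError(f"unknown strategy: {strategy}")
```

Filtering first keeps the ranking step simple: constraints such as "same storage" or "same area" become label matches, and the strategy only has to order the survivors.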

Mesos (
containerization –
Filtering (nw |fs – )
**** MesosContainerizerProcess::isolate
(strings::contains(isolation, "cgroups") ||
strings::contains(isolation, "network/port_mapping") ||
strings::contains(isolation, "filesystem/shared") ||
strings::contains(isolation, "namespaces"))
**** process::reap
**** executorEnvironment (
Isolation –
(Fair scheduler dependent on the co-ordination)
* Mesos/YARN resource managers have a master-slave architecture. Is it just me, or have they both adopted an SMPD MPI rank-x style of job control? Sort of a push (Mesos) and pull (YARN) model.

Slurm – seems to be deployed for very large clusters – the interactivity is pretty cool
Torque – (

Profiling –

Yarn – Apache Hadoop YARN: Yet Another Resource Negotiator. In Proceedings of SoCC, 2013.
Condor –
Quincy –
Vector Bin Packing –
Proactive-Inria –
Omega –
Clustera –
Dryad – eurosys07.pdf (EuroSys 2007)

Reservation-based scheduling –
X-Flex – Alternative to DRF –
Stoica –
WF2Q –
VTRR – (O(1) in less than 100 lines of code? )
– Order the clients in the run queue from largest to smallest share
Lottery – (randomized – based on the draw of a ticket for the client)
Pegasus –
WFQ – Clients are ordered in a queue sorted from smallest to largest Virtual finish time
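
The WFQ ordering by virtual finish time can be sketched with a heap. This is a simplified illustration: the virtual clock here just jumps to the finish tag of whatever was served last (a self-clocked approximation), whereas full WFQ tracks the virtual time of an idealized fluid system. The class name `WFQ` and its methods are hypothetical.

```python
import heapq
import itertools


class WFQ:
    """Serve requests in order of smallest virtual finish time."""

    def __init__(self, weights):
        self.weights = dict(weights)                   # client -> weight
        self.last_finish = {c: 0.0 for c in self.weights}
        self.vtime = 0.0                               # approximate virtual clock
        self._heap = []                                # (finish, seq, client, size)
        self._seq = itertools.count()                  # FIFO tie-breaker

    def enqueue(self, client, size):
        # A request starts when the client's previous one finishes,
        # or now (in virtual time) if the client was idle.
        start = max(self.vtime, self.last_finish[client])
        finish = start + size / self.weights[client]
        self.last_finish[client] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), client, size))

    def dequeue(self):
        if not self._heap:
            return None
        finish, _, client, size = heapq.heappop(self._heap)
        self.vtime = finish  # self-clocked: advance to the served tag
        return client, size
```

With weights `{"a": 2, "b": 1}` and both clients backlogged with unit-size requests, `a` is served about twice as often as `b`, because each of its requests advances its finish tag by half as much.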

Encyclopedia –
Time-stamped scheduler – comparison –

VMware – (gang or co-scheduling)
– related –
HyperV –
Xen –

