Google Summer of Code Proposal: Scheduling Algorithms for Swift
Abstract:
Swift is a tool-kit for managing large-scale work-flows on grids with hundreds of processors. Currently, its scheduler naively picks when and where each job in the task-graph runs, which is usually not optimal. The aim of this project is to increase the performance of Swift grids via a few key implementations: improved execution order through dependence analysis, stagein and stageout reordering, better job throughput metrics, and improved data locality through data-set aware site selection.
Content:
Please answer the following questions. Make sure you have read and understood the note at the end of this application (regarding contributor licenses).
1. Provide a 1-2 paragraph summary of the project you propose to do over the summer.
The execution engine for Swift has a simple scheduling algorithm that naively picks when and where each job in the task-graph runs. Sometimes this works well, but most of the time it is not optimal, and things would be more efficient if the scheduler were 'smarter'. The aim of this project is to give brains to the brawn of Swift grids via a few key implementations: improved execution order motivated by dependencies, reordering of stageins and stageouts, better metrics for job throughput, and improved data locality through data-set aware site selection.
I propose to research an area of parallel scheduling that uses graph vertex coloring to determine which tasks can run in parallel, and to use it to implement the execution reordering. In addition, I'd like to implement a few more metrics to aid in site selection and load balancing, namely: cumulative queue time, cumulative runtime, idle time, jobs finished per site per app, and the total number of jobs finished per site. Starvation will definitely be a concern with this scheduler, so I plan to implement another metric that tracks scheduler-side queue times, so that jobs that have been in line for a long time get higher priority. The heuristics for site selection on the basis of data locality will essentially give the scheduler a sense of whether it is better to send necessary data to a node that doesn't have it yet or to wait for a heavily loaded site that already has the data to free up. These anticipated improvements will then be evaluated for performance gain against relative overhead on several work-flows.
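As a rough illustration of the vertex coloring idea, here is a minimal sketch of greedy coloring on a conflict graph, where an edge means two tasks must not run at the same time, so tasks that end up with the same color could be dispatched together. The class and method names (ConflictColoring, color, addConflict) are hypothetical and are not part of Swift, and greedy coloring is only one simple strategy that does not guarantee the minimum number of colors.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Swift code-base.
class ConflictColoring {

    // Records an undirected edge: tasks a and b must not run concurrently.
    static void addConflict(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Greedy coloring: each task gets the smallest color not used by an
    // already-colored neighbor. Tasks sharing a color have no edge between them.
    static Map<String, Integer> color(Map<String, Set<String>> graph) {
        Map<String, Integer> colors = new LinkedHashMap<>();
        // Sorted visiting order keeps the result deterministic.
        for (String task : new TreeSet<>(graph.keySet())) {
            Set<Integer> used = new HashSet<>();
            for (String neighbor : graph.getOrDefault(task, Collections.emptySet())) {
                Integer c = colors.get(neighbor);
                if (c != null) {
                    used.add(c);
                }
            }
            int candidate = 0;
            while (used.contains(candidate)) {
                candidate++;
            }
            colors.put(task, candidate);
        }
        return colors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        addConflict(graph, "A", "B");
        addConflict(graph, "B", "C");
        addConflict(graph, "C", "D");
        // For the chain A-B-C-D, A and C share one color and B and D the other.
        System.out.println(color(graph));
    }
}
```

In this sketch each color class is a candidate batch of jobs that can be submitted together; how the conflict graph would actually be derived from Swift's task-graph is exactly what the research part of the project would need to settle.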
2. What Globus project (see list in http://dev.globus.org/) does your GSoC project most closely relate to?
My proposal directly relates to the Globus project called Swift.
3. Have you contacted a Globus mentor about this project proposal? If so, who?
I have discussed my ideas with Ben Clifford, who is a Globus mentor.
4. What languages, libraries, toolkits, etc. will you use for this project? If part of the project will require researching technologies to decide which one is better suited, just say so (do mention what technologies you will be looking at, if you already know this)
I'll be writing a new scheduler package for the Swift toolkit, which is written in Java. I'll also be looking for a good library for manipulating directed and undirected graphs, specifically one that does vertex coloring efficiently. During development I'll use Dot from the Graphviz toolkit to help visualize the parts of the task graph that will be fed into my scheduler.
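To show what the Dot visualization step might look like, here is a small sketch that writes a task graph in the Graphviz DOT language. The TaskGraphDot class and its map-based input (task name to the tasks that consume its output) are assumptions made for illustration; Swift's real task-graph representation may differ, and the sketch does not escape quote characters in task names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Swift code-base.
class TaskGraphDot {

    // successors maps each task to the tasks that depend on it.
    static String toDot(Map<String, List<String>> successors) {
        StringBuilder sb = new StringBuilder("digraph tasks {\n");
        // TreeMap gives a stable, sorted node order in the output.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(successors).entrySet()) {
            // Declare the node so that tasks without edges still appear.
            sb.append("  \"").append(e.getKey()).append("\";\n");
            for (String next : e.getValue()) {
                sb.append("  \"").append(e.getKey())
                  .append("\" -> \"").append(next).append("\";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> successors = new HashMap<>();
        successors.put("stagein", Arrays.asList("compute"));
        successors.put("compute", Arrays.asList("stageout"));
        successors.put("stageout", Collections.<String>emptyList());
        System.out.print(toDot(successors));
    }
}
```

Saving the output to a file such as tasks.dot and running `dot -Tpng tasks.dot -o tasks.png` would render the graph as an image.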
5. What would be the main deliverables for your project? Please include a rough timeline for these deliverables. We are not asking you to commit to specific dates right now, and you can certainly tweak the deliverables later on (in fact, we expect you will do so as you interact more with your mentor and the Globus community). However, please give us an approximate idea of what you expect to produce throughout the summer.
I plan to have the code for reporting the metrics written and tested, as well as a skeleton of the final scheduler, by June 26th. From June 26th to August 7th I intend to complete the implementation and some testing of the scheduler, leaving final testing, the write-up, and documentation cleanup for the final week before August 18th.
6. What are your qualifications for this project? Please let us know what previous experience you have with the technologies you listed in question (3). Take into account that having limited knowledge on the Globus Toolkit does not disqualify you from participating; GSoC is as much about learning as it is about writing code, and you will have until the summer to get up to speed.
Through my studies at Colorado State University, as well as my Research Assistantship under the supervision of Dr. Michelle Strout, I have become very familiar with various aspects of High Performance Computing (e.g. loop tiling, the polyhedral model). Although my experience with the Swift toolkit is very limited, the syntax of Swift code seems very intuitive. Because Swift's code-base is largely written in Java, which is my language of choice, understanding the inner workings of Swift should also be fairly straightforward.
7. If you have little or no experience with Globus technologies, or any other technology involved in your project, will you be able to use the "Community Bonding Period" (April 20 - May 23) to get up to speed?
I plan to use the “Community Bonding Period” to become familiar with the build environment for Swift, and to find benchmark computations that reflect typical jobs in Swift. Later in the summer I can use these benchmarks to test ideas for improvements to the scheduling algorithm. In particular, I will be looking for problems with varying degrees of average parallelism and data transfer between nodes. I will also look for examples where differing amounts of the task graph are visible to the scheduler.
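One common way to quantify the average parallelism of a benchmark is to divide the total work by the length of the critical path through the task graph. The sketch below computes that ratio for a small acyclic graph. The AverageParallelism class, its input maps, and the use of estimated runtimes as costs are assumptions for illustration only; the proposal does not commit to this particular definition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; assumes the task graph is acyclic.
class AverageParallelism {

    // deps maps a task to the tasks it depends on; cost maps a task to its runtime.
    static double compute(Map<String, List<String>> deps, Map<String, Double> cost) {
        Map<String, Double> finish = new HashMap<>();
        double totalWork = 0.0;
        double criticalPath = 0.0;
        for (String task : cost.keySet()) {
            totalWork += cost.get(task);
            criticalPath = Math.max(criticalPath, earliestFinish(task, deps, cost, finish));
        }
        return criticalPath == 0.0 ? 0.0 : totalWork / criticalPath;
    }

    // Earliest finish time of a task: its own cost plus the latest finish
    // among its dependencies, memoized so each task is evaluated once.
    static double earliestFinish(String task, Map<String, List<String>> deps,
                                 Map<String, Double> cost, Map<String, Double> memo) {
        Double cached = memo.get(task);
        if (cached != null) {
            return cached;
        }
        double start = 0.0;
        for (String dep : deps.getOrDefault(task, Collections.<String>emptyList())) {
            start = Math.max(start, earliestFinish(dep, deps, cost, memo));
        }
        double result = start + cost.get(task);
        memo.put(task, result);
        return result;
    }

    public static void main(String[] args) {
        // Diamond-shaped graph: a feeds b and c, which both feed d.
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("b", Arrays.asList("a"));
        deps.put("c", Arrays.asList("a"));
        deps.put("d", Arrays.asList("b", "c"));
        Map<String, Double> cost = new HashMap<>();
        for (String t : Arrays.asList("a", "b", "c", "d")) {
            cost.put(t, 1.0);
        }
        // Total work is 4 and the critical path a-b-d has length 3.
        System.out.println(compute(deps, cost));
    }
}
```

A ratio close to 1 indicates a mostly sequential work-flow, while larger values indicate more room for the scheduler to run jobs side by side.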
8. Will you have any other commitments during the summer? In particular, let us know if your school year ends later than May 23 (i.e., if you will still be doing final exams when GSoC starts) and if you are already committed to another job (an internship, a teaching/research assistantship at your university, etc.). This does not disqualify you from participating but you have to be upfront about how much time you'll be able to spend on your GSoC project.
I do not yet have any other commitments for the summer.
9. If you want to provide any additional details about your project, please do so here:
I don't have any additional details at the moment.
NOTE: Globus Contributor's License
If you submit an application to GSoC with Globus as your mentoring organization, please remember that our code is licensed under the Apache 2 License. This means that, for your code to be included in the Globus Toolkit, you can only reuse existing code that is licensed under the Apache 2 License, or a compatible license (most notably, GPL'd code is ineligible for inclusion in the Globus Toolkit). Additionally, if you are accepted to Globus Summer of Code, you will have to sign an Individual Contributor's License (otherwise, your code cannot be committed to our repository). Make sure you read this license. In a nutshell, by signing the license you authorize Globus to distribute any code you contribute. However, it is not a copyright transfer form (you will still hold the copyright over any code you write).