CS 551: Synchronization, Election Algorithms

Election Algorithms

Many of the algorithms for distributed systems require some centralization,
some site with leader/coordinator activities.

All other sites need to recognize this leader;
often there is accomplished by an initial agreement/election.

When the site of the leader fails or goes down for some reason,
it is necessary to elect a new leader.

This is the purpose of election or agreement algorithms.

There are two basic criteria for an election/agreement algorithm.

One way to decide the leader is to use some global priority.
The Bully algorithm by Garcia-Molina (1982)
falls into this category.
The second is a more general, preference-based algorithm,
that permits some nodes to have heavier votes.
Chang & Roberts's Token Ring Election algorithm belongs here.

Assumptions for most election algorithms:

A complete topology,
i.e. one message hop between any two processes.
All process ids are unique and known to all other processes.
All communication networks are reliable,
i.e. only communicating processes may fail.
This assures that no messages are
- lost,
- duplicated,
- corrupted.
A recovering process is aware that it failed.
- Failure is reliably detected by setting the time-out interval to be a little larger than the sum of the round-trip message delay and the message processing time.
- A failed process can rely on the coordinator to poll periodically for recovered process so that they may rejoin the pool of processes.

The following pages discuss a few such algorithms:

Yet another algorithm is related to the Byzantine General paper by Lamport et al.

Comments: