A Primer on Extremal Optimization

A Primer on Extremal Optimization

Sometimes, in the course of coding and problem solving, it becomes necessary for a programmer to find the best possible solution to an intractably difficult problem. We have limited resources, or a specific set of constraints, and need to find the best solution possible that satisfies our requirements. Today we take a deep breath and a shallow dive into the wide and wonderful world of optimization.


What’s optimization for?

You’ve got a complicated problem to solve and you know it’ll have a complex solution; we’re not talking arithmetic here, or a problem you’d solve with a script or two. This is something like “What’s the most efficient way for a delivery driver to deliver his packages?” or “How can I assign a bunch of nurses to various hospital shifts and make everybody happy?” or “What’s the best way to divide this social network into two groups?” These are problems with any number of reasonable solutions, but few optimal ones. It’s hard to find the best possible solution, but we know there must be one.


Is my problem suitable for optimization?

Optimization might be worth exploring if the following are true:

  1. You can easily put together a solution to the problem, even if it’s not necessarily a good one.
  2. You think “I could brute force this, if only it wouldn’t take forever and a day.”
  3. You need the best solution you can get, but it doesn’t need to be absolutely perfect.
  4. You want to look cool, and feel cool doing it.


What do optimization algorithms do?

Optimization algorithms are best understood as simple search algorithms. We visualize a search space, the field of all possible solutions. This space could be vast (and if we need an optimization algorithm, it is), so it becomes clear that finding a good solution here requires some smart navigation as we search. That “smart navigation” is the essence of an optimization algorithm. We need a place within the search space to start, a way to move through it from one possible solution to the next, and we need to be able to figure out whether we’re in a good place or a bad place.

This method for determining if a given solution is “good” is called the fitness function: it accepts a solution as input, and outputs the the fitness of the solution relative to its peers. This is often the most difficult part of composing an optimization algorithm, and is one of a few difficulties we need to address.

Fitness Pizza for the Win



These vast search spaces are typically characterized by many local optima — peaks, hills, and valleys in the fitness as it changes across the space.


Here we see many local optima in a 2D search space, where fitness is shown vertically and by color. Image by Kevin Binz.
Local Optima


Local optima are solutions that are better or worse than all the other solutions around them, and prove to be extremely problematic. Since the space is so huge, it’s very difficult to know whether or not we’ve found the best possible solution. We can only compare it to other solutions nearby, so our algorithm might assume it’s found the best possible solution (a global optimum) when it’s really only found a solution better than similar alternatives (a local optimum). Once you’ve reached the top of a hill, there’s no way to go “up”; we’re trapped, and don’t know where to go from there.

This is ordinarily solved by dividing our optimization process into two phases: explore and exploit. We first explore the space, finding promising areas but never committing to a single one. Our explorer (or explorers) hop around the space, trying many different types of solutions and making note of interesting areas. When we enter the exploit phase, we start in the most promising areas and find their local optima. These still might not be global optima, but they’ll probably be pretty close.

Before we even have to deal with local optima, we encounter the problem of writing our fitness function. Beyond actually coming up with a way to define “fitness” for our problem, we also need a good way to represent solutions. Do we have an array of nurses, and call a solution an array of shift assignments? Or is it better to have a collection of shifts with all nurses assigned to each shift?

Our representation needs to make movement through the space easy. Since we’ll move by changing one solution into another, performing such a change shouldn’t send us clear to the other side of the search space. If it does, we won’t be able to exploit promising areas, and aren’t doing much more than random guessing.

Ultimately, our representation just needs to be easy to work with, and easy to modify. It’ll be gravy so long as it helps us move around the search space without hopping around too much.

At this point, we have a problem to optimize, the search space as a useful metaphor, and an awareness of potential difficulties. We need an algorithm that can handle these difficulties, is easy to implement, and easy to fix.


Extremal Optimization

Like many other optimization algorithms, Extremal Optimization uses a biological metaphor to find its way. We pretend each variable of a solution is a separate species in an ecosystem, and iteratively kill off the weakest species. Sometimes, we kill the the nearly-weakest species, or kill everything close to the weakest species.  In this way, the whole solution changes in a small way at each iteration and moves progressively closer to an optimum.

We can see this through a simple visualization; below, each vertical bar represents a species, and the bar’s height represents its’ fitness. Each species/bar is initialized with a random fitness/height. When you hit “start/stop”, you’ll see the least fit species iteratively replaced with a random species with a new fitness. Over time, the fitness of all species grows.

See the Pen Bak-Sneppen SOC by Travis S (@tstodter) on CodePen.


The trick, and neatest bit of Extremal Optimization is that it benefits from self-organized criticality. In essence, as we kill and reset the weakest parts of our solution, the solution becomes more and more unstable. It eventually collapses, its fitness dropping sharply, and we see it jump to another part of the solution space. This behavior makes it handy for us. It combines both the exploration and exploitation phases, and does this without any provocation or input from us. This is self-organization doing the work for us, and means we have less to worry about.

The essence of the procedure is as follows:

  • Initialize our solution S as any random solution in the search space.
  • While we wish to continue searching, repeat:
    • Apply our fitness function to each component of S, finding the least fit component.
    • Reset the least fit component to a random value.
    • If S is now the best solution we’ve found, make a note of it.

There is a small difference here from what we discussed previously; in Extremal Optimization, the fitness function is applied to each variable of the solution separately, and not to the entire solution at once. This doesn’t generally make it any easier or harder to work with, but does predispose the algorithm towards problems with many discrete components.

As an example, we can easily apply it to a graph partitioning problem, specifically graph bipartitioning. We have a graph with some nodes connected by edges, and we want to separate the nodes into two groups while minimizing the number of edges between groups. Our representation of a solution will simply be a list of nodes, with each node assigned a partition. Our nodes are then a solution’s components/variables, so our fitness function will be a node’s number of good edges divided by all its edges — that is, what proportion of a node’s neighbors are in its partition. If all its neigbors are in its partition, its fitness will be 1; if none of its neighbors are in its partition, its fitness will be 0.

On every iteration, we find the least-fit node (the one most mismatched to its neighbors) and swap its partition with a random node of the opposite partition. The best graph cutsize (the total number of edges in the graph crossing partitions, with one end in partition 1 and the other in partition 2) yet found will gradually shrink.

See the Pen Extremal Optimization, Naive by Travis S (@tstodter) on CodePen.


You’ll notice that it eventually finds a partition, and seems to get stuck at that local optimum. This isn’t performing much of the self-organizing exploration that was promised. We could add a bit more of an exploratory bent by also resetting neighbors of the least-fit node. This reflects interdependent species in an ecosystem: when you get rid of mosquitoes, birds become somewhat less fit for their environment.

In addition to the least-fit node in the graph, we reset the least-fit node in the opposite partition from it (instead of a random node in the opposite partition), and two other nodes randomly selected from their neighbors, one from each partition.

See the Pen Extremal Optimization w/ Neighbors by Travis S (@tstodter) on CodePen.


This seems to help, but may inadvertently lead us away from good solutions before we get to them — too much exploration and not enough exploitation. We could instead stick to two nodes swapped, but make a probabilistic choice instead of a deterministic choice. Instead of resetting the absolute least-fit node, we select a node to reset based on its fitness. The least-fit is still the most likely to be chosen, but we’ll occasionally choose the 2nd or 5th or 31st least-fit node instead.

Here, we setup these probabilities and choose a node from each partition to swap:

See the Pen Probabilistic Extremal Optimization by Travis S (@tstodter) on CodePen.


What next?

We have now a simple but adaptive algorithm for finding optimal solutions to hard problems. It’s not the best method available and it won’t give us the absolute best result, but it’s enough to be going on with. We can easily modify Extremal Optimization to suit our purpose and it will gamely adapt for us, exploring and exploiting through the search space until we have what we need.

The next step would be to find a juicy NP-Hard problem (consider the Travelling Salesman Problem as a classic start) and try applying Extremal Optimization.  It will likely work well, but have some marked deficiencies: too much exploitation and not enough exploration, poor convergence on optima, or perhaps difficulty with adequately representing solutions. This is where more robust algorithms should be investigated and implemented, or perhaps where improvements to Extremal Optimization could be made.

Optimization is not suitable for every problem. Sometimes we need an exact and optimal solution, or can make assumptions that drastically shrink the search space. When that’s not an option, and “good enough” is good enough, a solid optimization algorithm can lead us to the promised land.