So after implementing hill-climbing, I thought it would be a worthwhile exercise to use Python's profile module and see what the slow parts of the code were. To do this I set up a basic hill-climb on the 100-city data set and then called profile's run function:
import tsp
import profile

coords=tsp.read_coords(file('city100.txt'))
init_function=lambda: tsp.init_random_tour(len(coords))
matrix=tsp.cartesian_matrix(coords)
objective_function=lambda tour: -tsp.tour_length(matrix,tour)
move_operator=tsp.reversed_sections
max_iterations=10000

profile.run(
    'tsp.run_hillclimb(init_function,move_operator,objective_function,max_iterations)')
This yielded the following (edited for clarity/brevity) output:
         122433 function calls in 4.710 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    0.060    0.060 hillclimb.py:1(?)
        1    0.190    0.190    4.640    4.640 hillclimb.py:3(hillclimb)
        1    0.000    0.000    4.640    4.640 hillclimb.py:36(hillclimb_and_restart)
      607    0.730    0.001    1.220    0.002 random.py:252(shuffle)
        1    0.000    0.000    0.000    0.000 tsp.py:127(init_random_tour)
        1    0.000    0.000    4.700    4.700 tsp.py:132(run_hillclimb)
    10106    0.110    0.000    1.350    0.000 tsp.py:29(all_pairs)
    10000    0.440    0.000    1.790    0.000 tsp.py:40(reversed_sections)
    10000    2.180    0.000    2.470    0.000 tsp.py:82(tour_length)
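As an aside, this listing is ordered by "standard name" rather than by cost. When the output gets long it can be easier to dump the raw stats to a file and sort them with the pstats module - roughly like the following sketch (the 'hillclimb.prof' filename is just an arbitrary choice, not something from the code above):

import profile
import pstats

# write the raw stats to a file rather than printing them immediately
profile.run(
    'tsp.run_hillclimb(init_function,move_operator,objective_function,max_iterations)',
    'hillclimb.prof')

# show the ten most expensive calls, sorted by cumulative time
stats=pstats.Stats('hillclimb.prof')
stats.sort_stats('cumulative').print_stats(10)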
Now this is pretty close to what I would expect - the tour_length function (our objective function) is taking up most of the time (2.470 out of 4.710 seconds). For optimisation problems it is very common for the objective function to be the most expensive part of the run - which is why, when comparing different optimisation algorithms, a lot of attention is paid to how many times the objective function must be called.
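If you want to count those objective function calls directly, rather than read them off the profiler output, one rough approach is to wrap the objective function in a small counting wrapper before passing it to the hill-climber (the counted helper below is just an illustrative name, not part of tsp.py):

def counted(fn):
    '''wrap fn so that the number of calls to it is recorded'''
    def wrapper(*args):
        wrapper.calls+=1
        return fn(*args)
    wrapper.calls=0
    return wrapper

objective_function=counted(lambda tour: -tsp.tour_length(matrix,tour))
tsp.run_hillclimb(init_function,move_operator,objective_function,max_iterations)
print objective_function.calls,'objective function evaluations'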
However there was one unexpectedly expensive call in there: reversed_sections accounts for 1.790 seconds! Examining the output further shows that most of the time in reversed_sections is spent in all_pairs, and most of the time in that function is spent in random.shuffle. So looking at all_pairs:
def all_pairs(size,shuffle=random.shuffle):
    '''generates all i,j pairs for i,j from 0-size'''
    r1=range(size)
    r2=range(size)
    if shuffle:
        shuffle(r1)
        shuffle(r2)
    for i in r1:
        for j in r2:
            yield (i,j)
We see that we call random.shuffle twice at the start of the generator function. However, quite often when hill-climbing we will not run this generator to completion: if we find a better solution we accept it and move on - creating a new generator. So shuffling two lists of 100 elements each is overkill if we don't always use every element.
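To make that concrete, here is a rough sketch of the sort of loop the hill-climber runs (the real code lives in hillclimb.py - this is just an illustration using the names set up earlier). As soon as a better neighbour is found we break out, so the rest of the generator - and the two freshly shuffled lists behind it - is simply thrown away:

# rough illustrative sketch only - the real loop is in hillclimb.py
best=init_function()
best_score=objective_function(best)
num_evaluations=1
while num_evaluations<max_iterations:
    moved=False
    # move_operator (reversed_sections) drives all_pairs internally
    for candidate in move_operator(best):
        score=objective_function(candidate)
        num_evaluations+=1
        if score>best_score:
            # better tour found: accept it and abandon the generator
            # (along with the two shuffled lists it built up front)
            best,best_score=candidate,score
            moved=True
            break
    if not moved:
        break # local optimum - no neighbour was any better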
Instead we can modify all_pairs to only generate the random elements as it needs them:
def rand_seq(size):
    '''generates values in random order
    equivalent to using shuffle in random,
    without generating all values at once'''
    values=range(size)
    for i in xrange(size):
        # pick a random index into remaining values
        j=i+int(random.random()*(size-i))
        # swap the values
        values[j],values[i]=values[i],values[j]
        # return the swapped value
        yield values[i]

def all_pairs(size):
    '''generates all i,j pairs for i,j from 0-size'''
    for i in rand_seq(size):
        for j in rand_seq(size):
            yield (i,j)
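A quick check in the interpreter (the ordering will of course differ from run to run) confirms that rand_seq still yields every index exactly once, just lazily and in random order, and that all_pairs still produces all size*size pairs:

>>> import tsp
>>> list(tsp.rand_seq(5))        # a random permutation of 0-4
[3, 0, 4, 1, 2]
>>> sorted(tsp.rand_seq(5))      # each value appears exactly once
[0, 1, 2, 3, 4]
>>> len(list(tsp.all_pairs(3)))  # still generates all 3*3 pairs
9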
Profiling this version gives us:
         82205 function calls in 3.970 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    0.050    0.050 hillclimb.py:1(?)
        1    0.230    0.230    3.910    3.910 hillclimb.py:3(hillclimb)
        1    0.000    0.000    3.910    3.910 hillclimb.py:36(hillclimb_and_restart)
        1    0.010    0.010    0.010    0.010 random.py:252(shuffle)
        1    0.000    0.000    0.010    0.010 tsp.py:127(init_random_tour)
        1    0.000    0.000    3.960    3.960 tsp.py:132(run_hillclimb)
    10100    0.150    0.000    0.420    0.000 tsp.py:22(all_pairs)
    10000    0.520    0.000    0.950    0.000 tsp.py:40(reversed_sections)
    10000    2.300    0.000    2.560    0.000 tsp.py:82(tour_length)
    10507    0.190    0.000    0.270    0.000 tsp.py:9(rand_seq)
As can be seen, all_pairs (and with it reversed_sections) now takes up much less of the total running time than before - leaving tour_length as the major bottleneck.
conclusion
The profile module is a pretty handy way of spotting slow points in code that you may not have realised existed. In this case we've been able to speed up code that didn't look like it should be costly - giving us a modest speed boost (~15%) for not much extra effort.
However care is advised, as we didn't address the slowest part of the code. Changes there would probably yield much larger speed improvements - particularly as the problem size increases: with 500 cities, tour_length accounts for an even higher proportion of the total running time (~90%). As ever:
"premature optimisation is the root of all evil"