Monday, July 22, 2013

Optimization techniques - Simulated Annealing & Genetic Algorithms

Chapter 5 of Programming Collective Intelligence (PCI) deals with optimisation problems.

To Quote:
"Optimisation finds the best solution to a problem by trying many different solution and scoring them to determine their quality. Optimisation is typically used in cases where there are too many possible solutions to try them all"

Before embarking on this chapter I decided that it would be best to quickly learn Python, there just seems to be a lot of Python around as soon as you start learning and reading about machine learning and data analysis, it can't actually be ignored.
(Still not sure why this is the case, but set out to get up an running with Python, in 1 weekend)

Some of the resources I used:
http://www.python.org
http://www.stavros.io/tutorials/python/
http://www.diveintopython.net/index.html
http://docs.python-guide.org/en/latest/

As a developer, learning the basics of Python really isn't very difficult,  to be honest it probably took me longer to find an development environment I was happy with, consoles and text editors just don't do it for me.

The main ones I investigated were:
Ninja IDE (Free)
Eclipse + PyDev (Free)
PyCharm ($99)

I spent quite a bit of time playing with Ninja IDE and Eclipse, but there were just little things that kept bugging me, from strange short cuts to highlighting correct code / syntax as incorrect.

10 minutes after installing PyCharm, I was sold. To be fair, I am probably not the best person to judge.
I code in IntelliJ daily and actually ended up converting all the java developers in my department to drop Eclipse and start using IntelliJ.... I also did the majority of my Objective-C work in AppCode, in other words... I am a JetBrains fanboy, happy to hand over my money for an awesome tool.

Getting back to PCI, where were a couple issues with the code in this chapter, which caused me (a person that just learnt Python) a little bit of pain, 'cause I figured the code had to be right and I was just doing something wrong, eventually I went searching and found:

PCI Errata

With that I corrected the issues in the hillclimb and genetic algorithm functions:

The java implementation for the 3 functions ended up twice as long and looking like:

And unlike my previous posts on PCI, I didn't use a whole bunch of open source libraries, only added one.
Java Tuples.

The whole Chapter 5 Optimisation solution is in my Blog Github repo, the concepts used in both the Simulated Annealing and Genetic Algorithm could easily be adapted and used again if looking for a simple example of those concepts.

Now for Chapter 6 ... Document Filtering...

2 comments:

Popular Posts

Followers