Chapter 8 of Programming Collective Intelligence (PCI) explains the usage and implementation of the k-Nearest Neighbours algorithm. (k-NN).
k-NN is a classification algorithm that uses (k) for the number of neighbours to determine what class an item will belong to. To determine the neighbours to be used the algorithm uses a distance / similarity score function, in this example (Euclidian Distance).
PCI takes it a little further to help with accuracy in some scenarios. This includes the usage of a weighted average of the neighbours, as well as then using either simulated annealing or genetic algorithms to determine the best weights, building on Optimization techniques - Simulated Annealing & Genetic Algorithms
As with all the previous chapters the code is in my github repository.
So the similarity score function looked like (slightly different to the one used earlier, which was inverted to return 1 if equals):
The simulated annealing and genetic algorithm code I updated as I originally implemented them using Ints... (lesson learnt when doing anything it do with ML or AI, stick to doubles).
Then finally putting it all together my Java implementation of the PCI example
While reading up some more on k-NN I also stumbled upon the following blog posts
First one describing some of the difficulties around using k-NN.
k-Nearest Neighbors - dangerously simple
And then one giving a great overview of k-NN
A detailed introduction to k-NN algorithm
I make no claim to be a "computer scientist" or a software "engineer", those titles alone can spark some debate, I regar...
I have recently been slacking on content on my blog, between long stressful hours at work and to the wonderful toy that is an iPhone, I have...
I saw an article (well more of a rant) the other day, by Rob Williams Brain Drain in enterprise Dev . I have to say, I do agree with some o...
This series of posts will be about me getting to grips with JBoss Drools . The reasoning behind it is: SAP bought out my company's curre...
I recently finished 97 Things every programmer should know . Well to be completely honest I did skim over a couple of the 97, but all and al...