Continuing with Programming Collective Intelligence (PCI), the next exercise was to use the distance scores to pigeonhole a list of blogs based on the words used within each blog.
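To make the idea of a distance score concrete, here is a minimal sketch in plain Java of a Pearson-based distance between two word-count vectors, which is one of the scores PCI uses in its clustering chapter. The class name `WordDistance`, the method name and the sample counts are illustrative assumptions and are not taken from the code used in this post.

```java
final class WordDistance {

    /**
     * Returns 1 - Pearson correlation of the two vectors.
     * 0 means the word usage rises and falls together, 2 means it is exactly opposite.
     */
    static double pearsonDistance(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        int n = a.length;
        double sumA = 0, sumB = 0, sumASq = 0, sumBSq = 0, sumAB = 0;
        for (int i = 0; i < n; i++) {
            sumA += a[i];
            sumB += b[i];
            sumASq += a[i] * a[i];
            sumBSq += b[i] * b[i];
            sumAB += a[i] * b[i];
        }
        double numerator = sumAB - (sumA * sumB) / n;
        double spread = (sumASq - (sumA * sumA) / n) * (sumBSq - (sumB * sumB) / n);
        if (spread <= 0) {
            // Assumption: a vector with no variation is treated as uncorrelated.
            return 1.0;
        }
        return 1.0 - numerator / Math.sqrt(spread);
    }

    public static void main(String[] args) {
        // Hypothetical counts of three words in two blogs.
        double[] blogA = {1, 2, 3};
        double[] blogB = {2, 4, 6};
        System.out.println(pearsonDistance(blogA, blogB));
    }
}
```

A correlation-based score like this ignores how long a blog is and only looks at whether the same words are relatively common, which suits feeds of very different sizes.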
I had already found Encog as the framework for the AI / machine learning algorithms; for this exercise I also needed an RSS reader and an HTML parser.
The two libraries I ended up using were:
ROME
JSoup
For other general utilities and collection manipulation I used:
Google Guava
I kept the list of blogs short and included some of the software bloggers I follow, just to make testing quick. I had to alter the percentages a little from the implementation in PCI, but still got the desired result.
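Assuming the percentages are the cut-offs PCI applies to drop words that appear in too few or too many blogs, the filtering step could be sketched roughly as below. This is plain Java working on text that has already been extracted from the feeds; the class `WordFilter`, its method names and the sample sentences are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class WordFilter {

    /** Counts lower-cased alphabetic words in one blog's text. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps words used by more than minFraction and fewer than maxFraction of the blogs. */
    static SortedSet<String> selectWords(Collection<Map<String, Integer>> perBlogCounts,
                                         double minFraction, double maxFraction) {
        Map<String, Integer> blogsContaining = new HashMap<>();
        for (Map<String, Integer> counts : perBlogCounts) {
            for (String word : counts.keySet()) {
                blogsContaining.merge(word, 1, Integer::sum);
            }
        }
        SortedSet<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : blogsContaining.entrySet()) {
            double fraction = entry.getValue() / (double) perBlogCounts.size();
            if (fraction > minFraction && fraction < maxFraction) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the text of three blogs.
        List<Map<String, Integer>> blogs = Arrays.asList(
                countWords("the cat sat"),
                countWords("the dog sat"),
                countWords("the bird flew"));
        System.out.println(selectWords(blogs, 0.5, 0.9));
    }
}
```

With only a handful of blogs each one accounts for a large share of the total, so narrow bounds can filter out almost every word, which is one plausible reason the cut-offs need adjusting for a short list.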
Blogs Used:
http://blog.guykawasaki.com/index.rdf
http://blog.outer-court.com/rss.xml
http://flagrantdisregard.com/index.php/feed/
http://gizmodo.com/index.xml
http://googleblog.blogspot.com/rss.xml
http://radar.oreilly.com/index.rdf
http://www.wired.com/rss/index.xml
http://feeds.feedburner.com/codinghorror
http://feeds.feedburner.com/joelonsoftware
http://martinfowler.com/feed.atom
http://www.briandupreez.net/feeds/posts/default
For the implementation I just went with a main class and a reader class:
Main:
The Results:
*** Cluster 1 ***
[http://www.briandupreez.net/feeds/posts/default]
*** Cluster 2 ***
[http://blog.guykawasaki.com/index.rdf]
[http://radar.oreilly.com/index.rdf]
[http://googleblog.blogspot.com/rss.xml]
[http://blog.outer-court.com/rss.xml]
[http://gizmodo.com/index.xml]
[http://flagrantdisregard.com/index.php/feed/]
[http://www.wired.com/rss/index.xml]
*** Cluster 3 ***
[http://feeds.feedburner.com/joelonsoftware]
[http://feeds.feedburner.com/codinghorror]
[http://martinfowler.com/feed.atom]
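A grouping like the one above can be produced by a k-means style algorithm over the per-blog word-count vectors. The following is a minimal plain Java sketch of that technique, not Encog's API and not the code used for these results; the class `SimpleKMeans`, the choice of the first k rows as starting centroids and the toy data are all assumptions made for illustration.

```java
import java.util.Arrays;

final class SimpleKMeans {

    /** Returns, for each row, the index of the cluster it ends up in. */
    static int[] cluster(double[][] rows, int k, int maxIterations) {
        if (k < 1 || k > rows.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of rows");
        }
        int dims = rows[0].length;
        // Assumption: start from the first k rows so the result is deterministic.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = rows[c].clone();
        }
        int[] assignment = new int[rows.length];
        Arrays.fill(assignment, -1);

        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Step 1: assign every row to its nearest centroid.
            boolean changed = false;
            for (int r = 0; r < rows.length; r++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(rows[r], centroids[c])
                            < squaredDistance(rows[r], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[r] != best) {
                    assignment[r] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids would not move
            }
            // Step 2: move each centroid to the mean of its rows.
            double[][] sums = new double[k][dims];
            int[] sizes = new int[k];
            for (int r = 0; r < rows.length; r++) {
                sizes[assignment[r]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[r]][d] += rows[r][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] > 0) {
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / sizes[c];
                    }
                }
            }
        }
        return assignment;
    }

    /** Squared Euclidean distance; enough for comparing which centroid is nearer. */
    static double squaredDistance(double[] a, double[] b) {
        double total = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            total += diff * diff;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical two-word count vectors for four blogs.
        double[][] rows = {{0, 0}, {10, 10}, {0, 1}, {10, 11}};
        System.out.println(Arrays.toString(cluster(rows, 2, 100)));
    }
}
```

Seeding from the first k rows keeps the sketch repeatable, but it can give poor clusters when those rows happen to be similar; a library implementation would normally handle initialisation more carefully.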
Sunday, June 16, 2013
Blog Categorisation using Encog, ROME, JSoup and Google Guava
Labels: AI, Encog, Google, Java, Machine Learning