Saturday, July 19, 2014

TDD, Hamcrest, Shazamcrest

Recently we have started trying to get more of a TDD culture going at work. Having always believed in thorough testing and decent code coverage, I thought it shouldn't be too hard. However... teaching an old dog new tricks can sometimes require quite a bit of patience. Turns out breaking coding habits formed over more than a decade of keyboard bashing is harder than it seems.

So with generating an enormous amount of test code comes the usual task of code and test maintenance and reuse.
One of the tools / libraries we have included is Hamcrest, which not only improves the readability of assertion failures, but allows you to create and extend custom matchers, which you can then reuse across multiple test scenarios.

I am not going to go into too much detail on Hamcrest here, there are a bunch of great resources / blogs / tutorials out there.. just a few:

While creating a custom type-safe matcher for one of our domain objects, I realised that comparing it field by field was insane.. really.. this.getA == that.getA... mmmm no.
So I went searching for something that could help, and after a bit I found: Shazamcrest (bonus points for the name)
What Shazamcrest does is:
Serialize the objects to compare to JSON.
Compare them and, on failure, throw a ComparisonFailure, which lets you use the built-in diff display of the major IDEs.

Great... no manual bean compares.
So I added the Maven dependency and tried it out on our complex domain object....
StackOverflowError.... It was a known limitation at the time: GSON, the JSON provider Shazamcrest was using, does not cater for circular reference serialization.
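
To illustrate why a circular reference trips up a straightforward recursive serializer, here is a minimal plain-Java sketch. It is not how GSON or GraphAdapterBuilder is implemented; the Node class, the output format and the method names are all made up for the illustration. The cycle-safe variant simply remembers which objects it has already visited and writes a reference instead of recursing again.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical domain object with a back-reference, for illustration only.
class Node {
    final String name;
    Node next;

    Node(String name) {
        this.name = name;
    }
}

class CycleDemo {

    // Naive recursive serializer: follows every reference, so a cycle
    // keeps recursing until the stack overflows.
    static String naive(Node node) {
        if (node == null) {
            return "null";
        }
        return "{name:" + node.name + ",next:" + naive(node.next) + "}";
    }

    // Cycle-safe variant: each object gets an id on its first visit, and a
    // repeat visit emits a reference to that id instead of recursing.
    static String cycleSafe(Node node, Map<Node, Integer> seen) {
        if (node == null) {
            return "null";
        }
        Integer id = seen.get(node);
        if (id != null) {
            return "ref#" + id;
        }
        id = seen.size() + 1;
        seen.put(node, id);
        return "{id:" + id + ",name:" + node.name
                + ",next:" + cycleSafe(node.next, seen) + "}";
    }

    static String cycleSafe(Node node) {
        // Identity-based map: objects are tracked by reference, not equals().
        return cycleSafe(node, new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // circular reference

        System.out.println(cycleSafe(a));
        try {
            naive(a);
        } catch (StackOverflowError e) {
            System.out.println("naive serializer overflowed the stack");
        }
    }
}
```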

As both Shazamcrest and GSON are open source, I decided to have a look and see if I could contribute; anything is better than writing a manual bean matcher. After some investigation I found that the guys on the GSON project have created a fix, GraphAdapterBuilder; it is just not distributed with the actual library.

So after forking the Shazamcrest GitHub project, writing a little bit of code and submitting a pull request:

The guys on the Shazamcrest project very quickly merged my changes in and published a new version to the maven repo (Thanks for that). 
So be sure to use the 0.8 version if you are struggling with circular references.

Monday, May 26, 2014

Playing with Java 8 - Lambdas, Paths and Files

I needed to read a whole bunch of files recently, and instead of just grabbing the old utility code that I, and probably most developers, have and copy from project to project, I decided to have a quick look at how else to do it...
Yes, I know there is Commons IO and Google IO, why would I even bother? They probably do it better, but I wanted to check out the NIO JDK classes and play with lambdas as well.. and to be honest, I think this actually ended up being a very neat bit of code.

So I had a specific use case:
I wanted to read all the source files from a whole directory tree, line by line.

The code uses Files.walk to recursively get all the paths from the starting point as a stream, which I then filter down to only the files that end with the required extension. For each of those files, I use Files.lines to create a stream of Strings, one per line. I trim each line, filter out the empty ones and add the rest to the return collection.
All very concise thanks to the new constructs.
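
A minimal sketch of the steps just described, using only JDK classes. This is a reconstruction from the description and not the original listing; the class name, the method names and the sorted() call (added only so the output order is predictable) are my own assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SourceLineReader {

    // Returns every non-empty, trimmed line from the files under 'root'
    // whose path ends with 'extension' (for example ".java").
    static List<String> readLines(Path root, String extension) throws IOException {
        // try-with-resources closes the directory stream opened by Files.walk
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(extension))
                    .sorted()
                    .flatMap(SourceLineReader::lines)
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toList());
        }
    }

    // Files.lines throws a checked exception, so wrap it for use in a stream.
    private static Stream<String> lines(Path file) {
        try {
            return Files.lines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        readLines(root, ".java").forEach(System.out::println);
    }
}
```

Note that Files.lines reads as UTF-8 by default, so files in another encoding would need the overload that takes a Charset.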

Saturday, April 26, 2014

Playing with Java 8 - Lambdas and Concurrency

So Java 8 was released a while back, with a ton of features and changes. All of us Java zealots have been waiting for this for ages, all the way back to when they originally announced all the great features that would be in Java 7, which ended up being pulled.

I have just recently had the time to actually start giving it a real look. I updated my home projects to 8 and I have to say I am generally quite happy with what we got. The java.time API that "mimics" JodaTime is a big improvement and the package is going to be useful. Lambdas are going to change our coding style, which might take a bit of getting used to, and with those changes... the quote "With great power comes great responsibility" rings true. I sense there may be some interesting times in our future, as it is quite easy to write some hard-to-decipher code. As an example, debugging the code I wrote below would be "fun"...

The example file is on my blog GitHub repo.

What this example does is simple: run a couple of threads, do some work concurrently, then wait for them all to complete. I figured while I am playing with Java 8, let me go for it fully...
Here's the output from what I came up with:

0 [pool-1-thread-1] Starting: StringInputTask{taskName='Task 1'}
0 [pool-1-thread-5] Starting: StringInputTask{taskName='Task 5'}
0 [pool-1-thread-2] Starting: StringInputTask{taskName='Task 2'}
2 [pool-1-thread-4] Starting: StringInputTask{taskName='Task 4'}
2 [pool-1-thread-3] Starting: StringInputTask{taskName='Task 3'}
3003 [pool-1-thread-5] Done: Task 5
3004 [pool-1-thread-3] Done: Task 3
3003 [pool-1-thread-1] Done: Task 1
3003 [pool-1-thread-4] Done: Task 4
3003 [pool-1-thread-2] Done: Task 2
3007 [Thread-0] WaitingFuturesRunner  - complete... adding results
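
A minimal sketch of the same pattern (start a handful of tasks on a pool, then wait for all of them) using CompletableFuture. This is not the listing from the repo: the class and method names are assumptions, and the "work" is just a short sleep standing in for the roughly 3 second tasks seen in the log above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WaitingFuturesSketch {

    // Stand-in for real work: sleep briefly, then report which task finished.
    static String work(String taskName, long sleepMillis) {
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "Done: " + taskName;
    }

    // Starts 'taskCount' tasks on a fixed pool, waits for all of them,
    // and returns their results in task order.
    static List<String> runAll(int taskCount, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        try {
            List<CompletableFuture<String>> futures = IntStream.rangeClosed(1, taskCount)
                    .mapToObj(i -> CompletableFuture.supplyAsync(
                            () -> work("Task " + i, sleepMillis), pool))
                    .collect(Collectors.toList());

            // allOf completes once every one of the futures has completed.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            // Every future is done by now, so join() returns immediately.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runAll(5, 100).forEach(System.out::println);
    }
}
```

The tasks finish in whatever order the scheduler picks, but because the results are collected from the list of futures, they come back in the order the tasks were submitted.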

Some of the useful articles / links I found and read while doing this:

Oracle: Lambda Tutorial
IBM: Java 8 Concurrency
Tomasz Nurkiewicz : Definitive Guide to CompletableFuture

Sunday, February 16, 2014

Local Wikipedia with Solr and Spring Data

Continuing with my little AI / Machine Learning research project... I wanted a decent-sized repo of English text that was not in a complete mess like a large percentage of the data on the internet. I figured I would try Wikipedia, but what to do with about 40GB of XML? How do I work with and query all that data? Based on a recent implementation at work, where we load something like 200 000 000 records into a Solr cache, I figured Solr would be the way to go, so this is an example of my basic implementation.

Required for this example:

Wikipedia download (warning: it is a 9.9GB file that extracts to about 42GB)
Spring Data (Great Blog / Examples on Spring Data:  Petri Kainulainen's blog)

All the code and unit tests for this post are on my blog GitHub repo.

When setting up Solr from scratch, you can have a look at Solr's wiki or documentation; their documentation is pretty good. There is also an example of importing Wikipedia here. I started with that and made some minor modifications.

For this specific example the Solr config needed (/conf):
For this example (and in the config files below), I used the following paths:
Solr home: /Development/Solr
Index / Data: /Development/Data/solr_data/wikipedia
Import File: /Development/Data/enwiki-latest-pages-articles.xml

The full import into Solr took about 48 hours on my old 2011 i5 iMac, and the index on my current setup is about 52GB.

Data Config for the import:


Solr Config:

The code for this ended up being quite clean. Spring Data Solr gives you 2 main interfaces, SolrIndexService and SolrCrudRepository; you simply extend / implement these 2, wrap that in a single interface, autowire it from a Spring Java context and you are good to go.

Next thing for me to look at for sourcing data is Spring Social.

Sunday, January 12, 2014

BYG (Bing, Yahoo, Google) Search Wrapper

One small section of my Aria project will be to interface with the current search engines out there. To do this I will require a module that gives me a consistent interface to the 3 main providers: Bing, Yahoo! and Google (and any future ones I may want to add). This is a basic example of that module.

First thing required is to set up accounts / projects and the like with the relevant providers.
I won't describe this process as they were all pretty well documented.

Bing Developer Center
Yahoo Developer Network
Google Developers Console

A couple of tips for the above sites:

  • Bing: Set up both the web and synonym searches.
  • Yahoo: In the BOSS console, under manage account, put in a daily limit $ amount (or turn off the limit), as they only allow 1 free query a day... so otherwise only the first request works.
  • Google: It doesn't seem that you can set it up to search the whole web, but after creating your custom search engine, you can select "Search the entire web but emphasize included sites", so don't worry about that.

All these providers allow for many options while searching (e.g. images, location, news, video etc.); however, in this initial example I have limited it to just a pure and simple web search.

All the code will be available in my blog GitHub repository.

Going through the main points.
There is a BasicWebSearch interface, that takes the search term and returns SearchResults. 
SearchResults contains results in a map based on a result type enum. 
The implementations of BasicWebSearch, namely BingSearch, GoogleSearch and YahooSearch, call the relevant search engine with the search term and then convert the results into a SearchResult. In the case of Yahoo and Bing, I map the JSON result to the SearchResult. Google, however, does that in their search client, which is included in the dependencies.
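
A rough sketch of how these pieces could fit together. Only the names BasicWebSearch, SearchResults and SearchResult come from the description above; the enum values, the fields, the method signatures and the FakeSearch stand-in (which returns canned data instead of calling a real provider) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical result types; the post only says results are keyed by an enum.
enum ResultType { WEB, IMAGE, NEWS }

// A single hit; the fields are assumptions for illustration.
class SearchResult {
    final String title;
    final String url;

    SearchResult(String title, String url) {
        this.title = title;
        this.url = url;
    }
}

// Groups individual results by their type.
class SearchResults {
    private final Map<ResultType, List<SearchResult>> byType =
            new EnumMap<>(ResultType.class);

    void add(ResultType type, SearchResult result) {
        byType.computeIfAbsent(type, t -> new ArrayList<>()).add(result);
    }

    List<SearchResult> get(ResultType type) {
        return byType.getOrDefault(type, Collections.emptyList());
    }
}

// The provider-independent contract: a search term in, SearchResults out.
interface BasicWebSearch {
    SearchResults search(String searchTerm);
}

// Stand-in provider that returns canned data instead of calling a real API.
class FakeSearch implements BasicWebSearch {
    @Override
    public SearchResults search(String searchTerm) {
        SearchResults results = new SearchResults();
        results.add(ResultType.WEB,
                new SearchResult("About " + searchTerm, "http://example.com/" + searchTerm));
        return results;
    }
}

class SearchDemo {
    public static void main(String[] args) {
        BasicWebSearch search = new FakeSearch();
        for (SearchResult hit : search.search("java").get(ResultType.WEB)) {
            System.out.println(hit.title + " -> " + hit.url);
        }
    }
}
```

Callers only ever see BasicWebSearch, so swapping one provider for another (or adding a new one later) does not touch the calling code.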

Now for the main code bits:

As this is just an example, I have included the search settings in the following class; be sure to replace them with the relevant values.

As both Bing and Yahoo use an HttpURLConnection, I figured I would centralise the handling of that. The only difference between the 2 is that Bing uses basic authentication, while for Yahoo I went with the OAuth implementation.
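
As a sketch of what centralising the connection handling might look like with plain JDK classes: a single helper opens the connection, and the caller passes in whichever Authorization header its provider needs. The class and method names are hypothetical, the timeouts are arbitrary, and the OAuth signing for Yahoo is left out entirely.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class HttpHelper {

    // Builds the value of an HTTP Basic "Authorization" header.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Prepares a GET connection without sending anything yet; the request
    // only goes out once the caller reads the response.
    static HttpURLConnection open(URL url, String authorizationHeader) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Authorization", authorizationHeader);
        connection.setConnectTimeout(10_000); // milliseconds, arbitrary
        connection.setReadTimeout(10_000);
        return connection;
    }
}
```

A basic-auth caller would then use something like HttpHelper.open(url, HttpHelper.basicAuthHeader(user, key)), while an OAuth caller would pass its own signed header to the same open method.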

Google has a whole bunch of extra information being returned, so I extended the base SearchResult to add all the information, just in case I ever need it.

Maven Dependencies
