Monday, May 23, 2011

1 Year and 70 000 views later, the stats...

So it has been just over a year since my first blog post. Even though the whole work / life thing has kept me overly busy the last couple of months, reducing me to 1-2 posts a month, blogging has been quite a pleasing experience. I would recommend it to all developer types: it's a good way to keep you investigating and learning, and having your work out there and visible to people makes you put in that little extra effort.

The actual reason for this post is some discussion of the stats... There were one or two percentages I would not have expected.

Views By Country



No big surprise that the US has the most views, this being an English, technology-based blog. It is, however, interesting that Germany is second, beating out the UK and India, both of which have larger populations. South Africa is probably only on the list because I force my colleagues and friends to read it :)

Views By Browser



This being a Java / open-source oriented blog, Firefox is the obvious choice to lead the browser war. However, Chrome taking 2nd place by such a large percentage was a bit unexpected.

Views By OS



On the OS front, Windows is still number one. Apple is competing nicely with Linux / Unix combined, which is probably due to the large Apple following in the US and the boom of the iPad.

Monday, May 16, 2011

Java Compression.

In a recent project, we had to do something I had personally never really had to look at: compression. We needed to take a couple of files and images, zip them up and make them available for FTP, and yes, some days it does feel like we are back in the 90's. Besides the FTP trip into the past, it was a good opportunity to spend a little bit of time on the subject.

Compressing Files
So, in addition to the usual IO classes (BufferedInputStream, FileOutputStream and File), there are:
ZipInputStream - An input stream for reading files in the ZIP file format. Zip entries are not cached, unlike ZipFile.
ZipOutputStream - An output stream for writing files in the ZIP file format. This has a default internal buffer of 512, a BufferedOutputStream can be used to increase this.
ZipEntry - Represents an entry in a zip file.
ZipFile - Used to read entries from a zip file. The entries are cached.
CRC32 - Used to compute the CRC-32 of a data stream.


Below is an example showing how to compress and decompress files in a folder, with and without a checksum:
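Since the original snippet is not shown here, the following is a minimal reconstruction sketch of the idea (class and method names are my own): zip every file in a folder while computing a CRC-32 checksum, then unzip the archive again.

import java.io.*;
import java.util.zip.*;

public class ZipFolderExample {

    // Zip every file directly inside 'folder' into 'zipFile', returning the CRC-32
    // checksum of the bytes written to the zip.
    public static long zipFolder(File folder, File zipFile) throws IOException {
        CheckedOutputStream checked =
                new CheckedOutputStream(new FileOutputStream(zipFile), new CRC32());
        try (ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(checked))) {
            File[] files = folder.listFiles();
            if (files == null) {
                throw new IOException(folder + " is not a readable folder");
            }
            byte[] buffer = new byte[4096];
            for (File file : files) {
                if (!file.isFile()) {
                    continue; // keep the sketch flat: skip sub-folders
                }
                zos.putNextEntry(new ZipEntry(file.getName()));
                try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        zos.write(buffer, 0, read);
                    }
                }
                zos.closeEntry();
            }
        }
        return checked.getChecksum().getValue();
    }

    // Unzip 'zipFile' into 'targetFolder' (assumes flat entries, no directories).
    public static void unzip(File zipFile, File targetFolder) throws IOException {
        try (ZipInputStream zis =
                     new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFile)))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                File out = new File(targetFolder, entry.getName());
                try (OutputStream os = new BufferedOutputStream(new FileOutputStream(out))) {
                    int read;
                    while ((read = zis.read(buffer)) != -1) {
                        os.write(buffer, 0, read);
                    }
                }
            }
        }
    }
}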


Compressing Objects
We didn't end up using object compression, but I had a look at it anyway. I put together a little generic compress / expand util (a rough sketch follows the class list below); I don't know if it will ever be useful. I left the input params as OutputStream and InputStream as this could theoretically be used with any stream implementation, from socket communication to string manipulation.

The compression related classes being used here:
GZIPInputStream - An input stream filter for reading compressed data in the GZIP file format.
GZIPOutputStream - An output stream filter for writing compressed data in the GZIP file format. Default internal buffer of 512, use BufferedOutputStream if you require more.
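Here is a rough sketch of such a util; this is my own reconstruction, not the original code, and the class name is just illustrative.

import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class CompressionUtil {

    // Serialize the object and write it, GZIP-compressed, to any OutputStream.
    public static void compress(Serializable object, OutputStream out) throws IOException {
        ObjectOutputStream oos =
                new ObjectOutputStream(new GZIPOutputStream(new BufferedOutputStream(out)));
        try {
            oos.writeObject(object);
        } finally {
            oos.close(); // finishes the GZIP stream and flushes the underlying stream
        }
    }

    // Read a GZIP-compressed, serialized object back from any InputStream.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T expand(InputStream in)
            throws IOException, ClassNotFoundException {
        ObjectInputStream ois =
                new ObjectInputStream(new GZIPInputStream(new BufferedInputStream(in)));
        try {
            return (T) ois.readObject();
        } finally {
            ois.close();
        }
    }
}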

Tuesday, April 5, 2011

Little Spring Gem: ReflectionTestUtils

I have recently been slacking on blog content; between long, stressful hours at work and the wonderful toy that is an iPhone, I have taken a little break from anything "development" related after leaving the office, to maintain my sanity. Now, with the project delivered and things quieting down again, I can re-focus my excess neuron energy back to processing more IT-related information.

One little thing I discovered in my last project, which I am a little embarrassed about, being a Spring nut, is the little gem that is ReflectionTestUtils.

I mostly try to write unit tests that do not include Spring, for the reason that you should be testing your code and not your Spring config. However, sometimes it's really useful and quite beneficial to have your test code wired up with Spring, be that for integration tests or just to extend your test suite.
One issue was that I always found myself adding "setters" to my component interfaces purely for testing. I always hated doing it, but I would con myself by saying "it's for more testing, and more testing is always a good thing", and move along. I eventually (while procrastinating on work I didn't really want to do) went searching for a cleaner solution.

One search and one minute later, a little red-faced, I kicked myself. I should have guessed straight away that Spring would have thought of this scenario. Since 2.5, Spring has had the ReflectionTestUtils class, which simply lets you set your dependencies / mocks via reflection.

So easy: tell it which object, the name of the field, and the mock / value you want to set. Neater interfaces, good times.
Below is an example using EasyMock to mock interfaces and inject them for my test.
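A minimal sketch of what that looks like; AccountService and AccountRepository are made-up classes for illustration, not from the original post.

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.springframework.test.util.ReflectionTestUtils;

// Hypothetical collaborators, purely for illustration:
interface AccountRepository {
    double findBalance(String accountNumber);
}

class AccountService {
    private AccountRepository accountRepository; // normally wired up by Spring, no setter exposed

    public double getBalance(String accountNumber) {
        return accountRepository.findBalance(accountNumber);
    }
}

public class AccountServiceTest {

    @Test
    public void balanceIsFetchedFromTheMockedRepository() {
        AccountRepository repository = createMock(AccountRepository.class);
        expect(repository.findBalance("12345")).andReturn(100.0);
        replay(repository);

        AccountService service = new AccountService();
        // Inject the mock into the private field, no setter needed.
        ReflectionTestUtils.setField(service, "accountRepository", repository);

        assertEquals(100.0, service.getBalance("12345"), 0.001);
        verify(repository);
    }
}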

Sunday, March 13, 2011

The simple Big-O Notation Post.

I make no claim to be a "computer scientist" or a software "engineer"; those titles alone can spark some debate. I regard myself as a software developer, and I generally don't study the math and science behind everything I do. I generally learn what is relevant and useful to my day-to-day functioning, and only rarely go deeper and dabble in the theory behind it. This is one of those occasions, so I decided to scour the internet and see what I could pick up. I hope to keep this simple, practical and to the point.


Big-O:
  • Describes how the algorithm scales and performs, in terms of either the execution time required or the space used.
  • Is a relative representation of complexity. This allows you to reduce an algorithm to a variable, which in turn allows you to easily compare it to another.
  • Describes an upper limit on the growth of a function, in other words the "worst case scenario".

There is also Big-Omega notation, which looks at the lower bound / "best case scenario", stating that the algorithm will take at least X amount of time, and Big-Theta, which is a tight bound on both the lower and upper bounds / the "average".

Some quick observations in determining Big-O:
  • A sequence of statements, or things like conditional checks, is constant: O(1)
  • A loop of statements results in O(n), n being the number of loop executions.
  • Nested loops are multiplied together: O(n*m), where n is the number of times the outer loop executes and m is the number of times the inner loop executes.

Comparing the common notation examples:
(Thanks to Algorithms: Big-Oh Notation.)
 

n          Constant O(1)   Logarithmic O(log n)   Linear O(n)   Linear Logarithmic O(n log n)   Quadratic O(n^2)   Cubic O(n^3)
1          1               1                      1             1                               1                  1
2          1               1                      2             2                               4                  8
4          1               2                      4             8                               16                 64
8          1               3                      8             24                              64                 512
16         1               4                      16            64                              256                4,096
1,024      1               10                     1,024         10,240                          1,048,576          1,073,741,824
1,048,576  1               20                     1,048,576     20,971,520                      ~10^12             ~10^16


Java code example:
Below are a few simple methods illustrating some of the notations in the table above.
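(These are my own illustrative examples, reconstructed rather than taken from the original post.)

public class BigOExamples {

    // O(1): constant - independent of input size.
    public static int getFirst(int[] values) {
        return values[0];
    }

    // O(log n): logarithmic - binary search halves the remaining range each iteration.
    public static int binarySearch(int[] sorted, int key) {
        int low = 0, high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (sorted[mid] < key) low = mid + 1;
            else if (sorted[mid] > key) high = mid - 1;
            else return mid;
        }
        return -1;
    }

    // O(n): linear - a single pass over the input.
    public static int sum(int[] values) {
        int total = 0;
        for (int value : values) total += value;
        return total;
    }

    // O(n^2): quadratic - nested loops over the same input.
    // (Sorting the array first, e.g. with java.util.Arrays.sort, would be O(n log n).)
    public static boolean hasDuplicates(int[] values) {
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) return true;
            }
        }
        return false;
    }
}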



Common Data Structures and their relevant functions:

Lists and Sets:
Structure      get       add       remove    contains
ArrayList      O(1)      O(1)*     O(n)      O(n)
LinkedList     O(n)      O(1)      O(1)      O(n)
HashSet        O(1)      O(1)      O(1)      O(1)
LinkedHashSet  O(1)      O(1)      O(1)      O(1)
TreeSet        O(log n)  O(log n)  O(log n)  O(log n)
* ArrayList Notes:

Thanks to reader comments, the add method on an ArrayList should be O(1) amortized (and O(n) in the worst case). Useful reference links:

Constant Amortized Time

Linked List vs ArrayList

Maps:

Structure      get       put       remove    containsKey
HashMap        O(1)      O(1)      O(1)      O(1)
LinkedHashMap  O(1)      O(1)      O(1)      O(1)
TreeMap        O(log n)  O(log n)  O(log n)  O(log n)
References:

Algorithms: Big-Oh Notation.

Algorithmic Complexity and Big-O Notation.

Determining Big O Notation: An easier way.

Wikipedia.

Sunday, February 20, 2011

Selecting a new programming language to learn.

I have been itching to learn a new language, but being a Java freak, I always end up convincing myself to spend the time and effort discovering, investigating or playing with something in the Java open source stable: Spring, Hadoop, Joda Time, Hibernate, Maven, Hazelcast, EhCache, etc. Developing in Java these days is almost purely about knowing and wiring together frameworks, which is both a good and a bad thing (as well as a topic for another day).

Now, to stop myself redirecting the "new language" energy into Y.A.F. (yet another framework), I decided to give the languages out there a proper look and see which would be the best fit and most beneficial to my work, marketability and just general 'IT Zen'.

So what do I require from a language:
IDE... My number 1 thing is an IDE; if there isn't a decent IDE for a language, it is frankly not worth the time and effort. I don't see myself as a "scientist" who feels the need to cause myself pain and inconvenience to be "pure". I want a comfortable, productive working environment, and VI or Notepad with a command line utility ain't it.

Established... Every couple years someone somewhere tries to define some new language, and most of those die in obscurity for example brainf*** or anything listed on Esolang.

Popular / In Demand... As with most things, popularity is good; it means an open source community, support and, most importantly, jobs. If you ever want to see the current popularity of a language, Tiobe is the site to visit.

So who are the contenders out there?
Based on Feb 2011 Tiobe index:
Java is still the no.1 most popular, it has awesome IDEs and it's been around for just more than 15 years (January 23, 1996), but thankfully I know Java reasonably well :)... so moving right along... To narrow down the list quickly, I won't look at any languages that are losing popularity, for obvious reasons, so from the top 20 on the Tiobe list that excludes: C, C++, PHP, VB, JS, Perl, Ruby, Delphi, Go.
(C, C++, PHP, VB, JS, Perl, Ruby, Delphi, Go.)

Which leaves behind:
Python, C#, Objective-C, Lisp, NXT-G, Ada, Pascal, Lua, RPG

Now there is a line between established and old; I am going to make a call that could offend some people and say Pascal and RPG are just old. (Pascal, RPG)

Ada: I don't know much about it; after reading the Ada overview it seems okay, but I am going to exclude it based on popularity. (Ada)

Lua: from a quick read, it is a scripting language. (Lua)

NXT-G has something to do with Lego robotics; not very mainstream. (NXT-G)

Lisp, again like Ada, at first glance seems fine, just not popular enough. (Lisp)

Then there are the "new, built on other platforms" functional languages: Scala, F#, Clojure. Although being on the bleeding edge is very tempting, it's not all that profitable or marketable yet. I'll give them some time to standardize, settle down, and see if they are widely adopted. They do appeal greatly to my inner geek, so I will always be keeping an eye on them.

So this leaves me with:
Python, C#, Objective-C, (and Java).

Straight away, based on the above list, we can tick: IDE, Established, and Popular / In Demand. We all know they have decent IDEs: Eclipse, Xcode, Visual Studio (and IntelliJ and NetBeans). They have also been around for a while and are well known.

Now looking at number of jobs:
I found a site (Simply Hired) with a graph that displays the percentage of jobs containing your search terms anywhere in the job listing. Since June 2009, the following has occurred:

Python jobs increased 72%
C# jobs increased 77%
Objective-c jobs increased 268%
Java jobs increased 76%



With the recent boom of iPads and iPhones, the Objective-C percentage is not all that surprising. I do have a problem with Apple, Objective-C and Xcode, and that problem is that you need a Mac to run it. Once you start down that road you end up having to change everything to Apple, and I am not ready to do that. So for now I am going to drop Objective-C out of the running, although if I ever do buy into the whole Apple thing, it will go back on the list.

Leaving me with Python and C#, looking at their salaries compared with Java:
(Data from Payscale).
US Data
Java
PayScale - Java Skill Salary, Average Salaries by Years Experience


Python
PayScale - Python Skill Salary, Average Salaries by Years Experience


C#
PayScale - C# Skill Salary, Average Salaries by Years Experience



South Africa Data

Java
PayScale - Java Skill Salary, Average Salaries by Years Experience


Python
PayScale - Python Skill Salary, Average Salaries by Years Experience


C#
PayScale - C# Skill Salary, Average Salaries by Years Experience


Based on the US data, I would have gone with Python; it's not as popular as C# but the pay is slightly better, and I would also get to keep using Eclipse (PyDev) and Spring. But as soon as I looked at the South African data, I realized something: Python is really not big here. I manually went searching for advertised Python positions... and found a grand total of 2, and the salaries were not good.

(Python)

Leaving C# as the last language standing.

It's got Visual Studio (even a free version, Visual Studio Express), it has proven itself over the last couple of years, it's out-innovating Java at the moment, there's a ton of jobs, a whole range of certifications, and the salaries have closed the gap on Java.
Seems quite a logical choice to me.

To top it off, I have also used C# many years back, so it won't be entirely new. Most of the successful Java open source projects (Spring, Hibernate, etc.) have been ported, so all that knowledge is reusable, which also counted a little in my decision. Now I just need to stop working 12-14 hours a day, and I can focus on getting back to my Microsoft roots with a little C# as a Java developer. Hopefully a couple of months after that I can go through this process again, looking at Python, Objective-C, the mobile platforms (iOS, Android, Windows) or maybe rather a conceptual change to functional programming with the likes of Clojure or Scala.

Monday, February 14, 2011

Validate XML against its XSD

Just a quick code snippet for possible future reference: how to validate an XML file against its XSD.
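A minimal sketch using the standard javax.xml.validation API; the file names are placeholders, and the original snippet may have differed.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class XmlValidator {

    public static void main(String[] args) throws Exception {
        // Build a Schema from the XSD, then validate the XML document against it.
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("example.xsd"));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new File("example.xml")));
            System.out.println("example.xml is valid");
        } catch (SAXException e) {
            System.out.println("example.xml is NOT valid: " + e.getMessage());
        }
    }
}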

Monday, February 7, 2011

Surviving the Wild West Development Process

Now, in a perfect world, we all strive for some kind of development process, be that waterfall development, prototyping, all things agile, or whatever process the CIO / CTO / CEO got sold on by some consultancy.

But in the real world, what sometimes ("sometimes" being quite often, actually) happens is
what I like to call the Wild West Development Process (WWDP)... the main principle is basically “promise first and ask questions later”.

Getting started with WWDP is pretty simple:
1. You need someone, somewhere in a place of power and influence in your organization to promise software by some unachievable date.
2. Add in an eager marketing department and some press releases…
3. Stir and leave to simmer.
4. After simmering for a while, inform the software development department that they must have something that does "stuff", and the release date cannot change.

I am not sure if this is a result of an immature IT industry in my country, but I have spent many years in environments like this. It even seems to find me in the most unlikely employers: the large corporate organizations like insurance and banking institutions, where general red tape and corporate processes should limit this. I started my career in one of those “startup-sell-information-tech-and-business-processes-to-anyone-cheaper-than-anyone-else-type-consultancies", which meant every project was driven by WWDP, so I started in the deep end, and for about 4 years I knew no other way of developing software. I currently find myself in another one of those projects, so I'd like to share some points that I have found helpful in actually delivering some software against unachievable, unmovable deadlines.
(To be fair, some of these are in the standard development processes mentioned earlier, maybe just with different priorities.)

Pick the right developers
Not everyone can cope in this environment, so knowing your staff / team is very important: late nights, constant crises, constant changes and bucket loads of pressure and stress aren't for everyone. Some people thrive, some survive and run for the hills afterwards, and some just end up whimpering in the corner and need to be replaced (which is a disaster).

Have the developers involved.
Projects like these don't allow for the usual requirements / design / develop / test paradigm; all of these functions run concurrently. Having the developers know as much detail as the architects, analysts and testers is vital. Developers will have to make decisions affecting the architecture and business functionality constantly; if everyone has the same picture, you'll save time.

"Prototype" early
I say "Prototype" in quotes because unlike an actual prototype there won't be time to throw this code away. This "prototype" is to have the system in it's entirety up and running very early in the project. It can be a "shell" but having it up means testing can start early even if it's just the basics and no business functionality. Since there are no real requirements, make sure the design caters for adding them as they are defined.

Integrate early
If the system has any integration points between systems, teams or external parties, make sure these are also in the "prototype shell"; integration always takes longer than you think, and having the interfaces up and available to test is crucial.

Keep track of communication
When the shock of not meeting an unachievable deadline kicks in, someone will always look for a scapegoat; always keep at least an email trail making sure you aren't that goat.

Automated builds and deploys
The standard agile practice of continuous integration is as important as ever, and even if you skip other parts of your software development cycle, this is something that should not be dismissed.

Focus
With hundreds of things that need to happen in a matter of days or weeks, it's very easy to try to do 6 things at a time. Don't. Micro-manage your time and tasks, and focus. It's easy for us developers to start a task, see something, change something and lose a day, which you don't have in the first place. If you have to, have someone keep an extensive list of "TODOs" that can be used in the cleanup phase.

Clean up
Once the unachievable deadline has been met, before starting the next phase, have at least one "cleanup" release. This cleanup phase should add very little or preferably no new functionality, but will allow design and development decisions made in haste to be reviewed and refactored.

Pace Yourself
No matter how superhuman you may think you are, we all have a limit to how many hours we can keep going. At some point you are actually being counterproductive, and a couple more hours of sleep will help the project a lot more.
Know your limits.

I am sure there are a couple more, but currently being on a WWDP project I need to re-focus and leave that for the clean up phase later :)

Sunday, January 9, 2011

Joda Time == Good Times

I heard about Joda Time a couple months back, but finally got to use and implement it in a project.

We all know the whole Java date thing is ugly, and I'm sure most people have their own "implementations" and extensions, like Period objects and utility date-diff classes. I know I've written my fair share over the years. I won't be doing that again...

I am not going to go into too much detail as the Joda documentation is pretty good.
I just want to highlight some of the very useful functionality that won me over:

Setting the "System" time, very useful. The financial systems I have worked on throughout my career always had some concept that required the "Current Date" mostly for data and rules with effective date periods.

Joda lets you statically set the "Current Date", making it a pleasure to test with and ensuring that any new DateTime objects used will be based off the set date:
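A minimal sketch (not the original snippet) using DateTimeUtils to freeze "now":

import org.joda.time.DateTime;
import org.joda.time.DateTimeUtils;

public class CurrentDateExample {
    public static void main(String[] args) {
        // Freeze "now" at 1 March 2011: every new DateTime() will be based on it.
        DateTimeUtils.setCurrentMillisFixed(
                new DateTime(2011, 3, 1, 0, 0, 0, 0).getMillis());

        System.out.println(new DateTime()); // always 2011-03-01T00:00:00.000

        // Restore the real system clock when done.
        DateTimeUtils.setCurrentMillisSystem();
    }
}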



Intervals:
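A small sketch of my own showing Interval containment and overlap:

import org.joda.time.DateTime;
import org.joda.time.Interval;

public class IntervalExample {
    public static void main(String[] args) {
        Interval january =
                new Interval(new DateTime(2011, 1, 1, 0, 0, 0, 0),
                             new DateTime(2011, 2, 1, 0, 0, 0, 0));

        System.out.println(january.contains(new DateTime(2011, 1, 15, 0, 0, 0, 0))); // true

        Interval lateJanToFeb =
                new Interval(new DateTime(2011, 1, 20, 0, 0, 0, 0),
                             new DateTime(2011, 3, 1, 0, 0, 0, 0));
        System.out.println(january.overlaps(lateJanToFeb)); // true
        System.out.println(january.overlap(lateJanToFeb));  // the shared interval
    }
}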


The classic Date diff:
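A sketch of the classic "date diff" with Joda's Days / Months helpers (my own example):

import org.joda.time.DateTime;
import org.joda.time.Days;
import org.joda.time.Months;

public class DateDiffExample {
    public static void main(String[] args) {
        DateTime start = new DateTime(2011, 1, 1, 0, 0, 0, 0);
        DateTime end   = new DateTime(2011, 3, 15, 0, 0, 0, 0);

        System.out.println(Days.daysBetween(start, end).getDays());       // 73
        System.out.println(Months.monthsBetween(start, end).getMonths()); // 2
    }
}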


Working with "parts", just the date or the time:

Friday, December 31, 2010

The "all you need to know link" for tuning Hibernate

I stumbled across this awesome article today: Revving Up Your Hibernate Engine.
Well worth the read if you are using, or are planning to use, Hibernate in the near future.

To quote the summary of the article:
"This article covers most of the tuning skills you’ll find helpful for your Hibernate application tuning. It allocates more time to tuning topics that are very efficient but poorly documented, such as inheritance mapping, second level cache and enhanced sequence identifier generators.

It also mentions some database insights which are essential for tuning Hibernate.
Some examples also contain practical solutions to problems you may encounter."

Friday, December 24, 2010

Find all the classes in the same package

It's been a slow month for blogging, due to the release of a new World of Warcraft expansion, crazy year end software releases at work and the general retardation of the human collective consciousness that happens this time of year.

With all of that, I did however have some fun writing the following bit of code. We have a "generic" getter and setter test utility, used to ensure a couple of extra %'s of code coverage with minimal effort. That utility did, however, require that you call it for every class you want to test; being lazy, I wanted to just give it a class from a specific package and have it check all the classes in that package.

I initially thought there would be a simple way to do it with the Java reflection API. Unfortunately I didn't find one; it seems a "Package" does not keep track of its classes. So after a bit of digging on the net and feverish typing (ignoring the actual project I am currently working on for a little while), here is what I came up with. The trick is actually just the chain of methods on the java.lang.Class object: "getProtectionDomain().getCodeSource().getLocation().toURI()", giving you the base to work from; I found that somewhere on Stack Overflow.

There are 2 public methods: findAllClassesInSamePackage and findAllInstantiableClassesInSamePackage. (For my purposes of code coverage I just needed the instantiable classes.)

Usage:
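A rough usage sketch; the static method signatures are my assumption of the finder's API, and MyDomainClass is a placeholder for any class in the package you want to scan.

// MyDomainClass is a hypothetical class inside the target package.
List<Class<?>> allClasses =
        PackageClassInformationFinder.findAllClassesInSamePackage(MyDomainClass.class);
List<Class<?>> instantiableClasses =
        PackageClassInformationFinder.findAllInstantiableClassesInSamePackage(MyDomainClass.class);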


PackageClassInformationFinder:
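Below is a reconstruction sketch of such a finder; it is not the original code, and it keeps jar-file handling and error handling minimal (it assumes a directory-based code source, as you would have when running tests from an IDE or build folder).

import java.io.File;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public final class PackageClassInformationFinder {

    // Find every class in the same package as the given class.
    public static List<Class<?>> findAllClassesInSamePackage(Class<?> clazz) {
        List<Class<?>> classes = new ArrayList<Class<?>>();
        try {
            // Base location of the compiled classes (works for directory-based code sources).
            File codeSource = new File(
                    clazz.getProtectionDomain().getCodeSource().getLocation().toURI());
            String packagePath = clazz.getPackage().getName().replace('.', File.separatorChar);
            File packageFolder = new File(codeSource, packagePath);

            File[] files = packageFolder.listFiles();
            if (files != null) {
                for (File file : files) {
                    String name = file.getName();
                    if (name.endsWith(".class") && !name.contains("$")) {
                        String className = clazz.getPackage().getName() + "."
                                + name.substring(0, name.length() - ".class".length());
                        classes.add(Class.forName(className));
                    }
                }
            }
        } catch (Exception e) {
            throw new RuntimeException("Could not scan package of " + clazz, e);
        }
        return classes;
    }

    // Only the classes that can actually be instantiated (public, concrete, non-interface).
    public static List<Class<?>> findAllInstantiableClassesInSamePackage(Class<?> clazz) {
        List<Class<?>> instantiable = new ArrayList<Class<?>>();
        for (Class<?> candidate : findAllClassesInSamePackage(clazz)) {
            int modifiers = candidate.getModifiers();
            if (!candidate.isInterface() && !Modifier.isAbstract(modifiers)
                    && Modifier.isPublic(modifiers)) {
                instantiable.add(candidate);
            }
        }
        return instantiable;
    }
}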

Sunday, December 5, 2010

Enabling enterprise log searching - Playing with Hadoop

Having a bunch of servers spewing out tons of logs is a real pain when trying to investigate an issue. A custom enterprise-wide search would be one of those awesome little things to have and would literally save developers days of their lives, not to mention their sanity. The "corporate architecture and management gestapo" will obviously be hard to convince, but the chance to write and set up my own "mini Google wannabe" MapReduce indexing is just too tempting. So this will be a personal little project for the next while. The final goal will be distributed log search, using Hadoop and Apache Solr. My environment mostly consists of log files from:
WebLogic Application Server, legacy CORBA components, Apache, Tibco, and then a mixture of JCAPS / OpenESB / GlassFish as well.

Setting Up Hadoop and Cygwin
The first thing to get up and running will be Hadoop. Our environment runs on Windows, and therein lies the first problem: to run Hadoop on Windows you are going to need Cygwin.

Download: Hadoop and Cygwin.

Install Cygwin; just make sure to include the OpenSSH package.


Once installed, run the following from the Cygwin command prompt: ssh-host-config
This sets up the ssh configuration; reply yes to everything, except if it asks
"This script plans to use cyg_server. Do you want to use a different name?", then answer no.

There seem to be a couple of issues with regards to permissions between Windows (Vista in my case), Cygwin and sshd.
Note: Be sure to add your Cygwin "\bin" folder to your Windows path (else it will come back and bite you when trying to run your first MapReduce job),
and, as is typical with Windows, a reboot is required to get it all working.

So once that is done you should be able to start the ssh server: cygrunsrv -S sshd
Check that you can ssh to the localhost without a passphrase: ssh localhost
If that requires a passphrase, run the following:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Now back to the Hadoop configuration:
Assuming that Hadoop was downloaded and unzipped into a working folder, ensure that JAVA_HOME is set by editing [working folder]/conf/hadoop-env.sh.

Then go into [working folder]/conf and add the following to core-site.xml:
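For a typical single-node, pseudo-distributed setup it looks something like this (localhost and port 9000 are the commonly used defaults, my assumption here; adjust to your environment):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>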

To mapred-site.xml add:
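Again, typical pseudo-distributed values (adjust to your environment):

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>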


Go to the hadoop folder: cd /cygdrive/[drive]/[working folder]
format the dfs: bin/hadoop namenode -format
Execute the following: bin/start-all.sh

You should then have the following URLs available:
http://localhost:50070/
http://localhost:50030/

A Hadoop application is made up of one or more jobs. A job consists of a configuration file and one or more Java classes, which interact with the data that exists on the Hadoop distributed file system (HDFS).

Now to get those pesky log files into HDFS. I created a little HDFS wrapper class to allow me to interact with the file system. I have defaulted it to my values (in core-site.xml).

HDFS Wrapper:
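A reconstruction sketch of such a wrapper (not the original class), defaulting to the fs.default.name value from core-site.xml above:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWrapper {

    private final FileSystem fileSystem;

    public HdfsWrapper() throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000"); // default from core-site.xml
        this.fileSystem = FileSystem.get(conf);
    }

    // Copy a local file into HDFS.
    public void upload(String localFile, String hdfsPath) throws IOException {
        fileSystem.copyFromLocalFile(new Path(localFile), new Path(hdfsPath));
    }

    // List the file names in an HDFS folder.
    public String[] list(String hdfsFolder) throws IOException {
        FileStatus[] statuses = fileSystem.listStatus(new Path(hdfsFolder));
        String[] names = new String[statuses.length];
        for (int i = 0; i < statuses.length; i++) {
            names[i] = statuses[i].getPath().getName();
        }
        return names;
    }

    // Print the contents of an HDFS file to stdout.
    public void printFile(String hdfsFile) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(fileSystem.open(new Path(hdfsFile))));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close();
        }
    }

    public void close() throws IOException {
        fileSystem.close();
    }
}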
I also found that a quick way to start searching the uploaded log files is the Grep example included with Hadoop, and I have included it in my HDFS test case below. Simple Wrapper Test:
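A minimal usage / test sketch for the wrapper above (the paths are placeholders, and the Grep example is only referenced here as a command-line invocation rather than called from the code):

public class HdfsWrapperTest {
    public static void main(String[] args) throws Exception {
        HdfsWrapper hdfs = new HdfsWrapper();
        hdfs.upload("C:/logs/server.log", "/logs/server.log"); // hypothetical local log file
        for (String name : hdfs.list("/logs")) {
            System.out.println(name);
        }
        hdfs.printFile("/logs/server.log");
        hdfs.close();

        // The Grep example bundled with Hadoop can then be run against /logs, e.g.:
        //   bin/hadoop jar hadoop-*-examples.jar grep /logs /logs-grep-output "ERROR.*"
    }
}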
