Sunday, September 11, 2016

Oracle Workspace Manager - Basic POC with Spring Boot and Flyway

Defining a process to update configuration and master data within an enterprise is always a challenging task. While investigating possible solutions for letting someone change data via a UI, then having those changes tested, signed off and approved before taking them to production, I stumbled onto Oracle Workspace Manager.
According to the developer docs, it seemed to fit this use case exactly:

Manage a collection of updates and insertions as a unit before incorporating them into production data
Workspace Manager lets you review changes and roll back undesirable ones before making the changes public. Until you make the changes public, they are invisible to other users of the database, who will access only the regular production data. You can organize the changes in a simple set of workspaces or in a complex workspace hierarchy. A typical example might be a life sciences application in which Workspace Manager supports the discovery and quality assurance (QA) processes by managing a collection of updates before they are merged with the production data.


You could think of Oracle Workspace Manager as lightweight, "git"-like DB versioning.
Simply put (a rough sketch of the underlying calls follows the list):

  1. You create a workspace (branch).
  2. You make your changes there; other people can connect to your workspace and also make changes, or alternatively create their own branch from yours.
  3. You can refresh (fetch / merge) your workspace and resolve any conflicts that arise.
  4. Once everything is sorted and all is well with your changes, you merge it back into the "LIVE" (origin: develop / master) workspace.
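Under the hood these steps map onto procedures in Oracle's DBMS_WM PL/SQL package. As a rough sketch of how that flow could be driven from Spring's JdbcTemplate (the service class and method names are my own illustration, not the POC code):

import org.springframework.jdbc.core.JdbcTemplate;

public class WorkspaceService {

    private final JdbcTemplate jdbc;

    public WorkspaceService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // 1. create a workspace (branch) off the current one
    public void create(String workspace) {
        jdbc.update("BEGIN DBMS_WM.CreateWorkspace(?); END;", workspace);
    }

    // 2. switch this session into the workspace and make changes there
    public void gotoWorkspace(String workspace) {
        jdbc.update("BEGIN DBMS_WM.GotoWorkspace(?); END;", workspace);
    }

    // 3. pull changes made in the parent workspace into yours (fetch / merge)
    public void refresh(String workspace) {
        jdbc.update("BEGIN DBMS_WM.RefreshWorkspace(?); END;", workspace);
    }

    // 4. merge the workspace back into its parent (e.g. LIVE)
    public void merge(String workspace) {
        jdbc.update("BEGIN DBMS_WM.MergeWorkspace(?); END;", workspace);
    }

    // tidy up once merged
    public void remove(String workspace) {
        jdbc.update("BEGIN DBMS_WM.RemoveWorkspace(?); END;", workspace);
    }
}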

I used an existing Docker image of Oracle Standard Edition, since Oracle Workspace Manager is unfortunately not available on XE. This is a rather large image, and it does take a couple of minutes to initialise. The image is available here on Docker Hub.


Connect to the database with the following settings:
hostname: localhost
port: 1521
sid: xe
service name: xe.oracle.docker
username: system
password: oracle

To connect using sqlplus:
sqlplus system/oracle@//localhost:1521/xe.oracle.docker
To connect on macOS, install the Instant Client and run from there:

./instantclient_12_1/sqlplus system/oracle@local
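On the Spring Boot side the application just needs a datasource pointing at that container. Normally you would simply set the spring.datasource.* properties; as an explicit sketch using the settings above (and assuming the Oracle thin driver is on the classpath):

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class OracleDataSourceConfig {

    @Bean
    public DataSource dataSource() {
        // points at the dockerised Oracle SE instance described above
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");
        ds.setUrl("jdbc:oracle:thin:@//localhost:1521/xe.oracle.docker");
        ds.setUsername("system");
        ds.setPassword("oracle");
        return ds;
    }
}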


To set up the initial DB I tried out Flyway; all quite simple and easy to implement.
Under resources/db/migration there are a number of SQL files that do the initial database setup:

  1. create the tables: CODE and CODE_TYPE, 
  2. insert initial data 
  3. enable versioning on those tables.

When you enable versioning, the table is renamed and a view with the original table name is created, allowing the "recording" of changes that occur.
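In the POC the versioning step lives in one of the Flyway SQL migrations; the call itself is a one-liner against DBMS_WM. The equivalent from Java would look roughly like this (table names as per this example):

import org.springframework.jdbc.core.JdbcTemplate;

public class VersioningSetup {

    // Enables Workspace Manager versioning on the two POC tables.
    // Oracle renames each table and creates a view with the original name,
    // through which changes are then recorded per workspace.
    public void enableVersioning(JdbcTemplate jdbc) {
        jdbc.execute("BEGIN DBMS_WM.EnableVersioning('CODE,CODE_TYPE'); END;");
    }
}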

Reference: (oracle presentation available here)



So after the initial setup you will see a number of tables and views:



To try this out...

  1. Get the Oracle Docker image.
  2. Once the DB has started, run the Spring Boot DataApplication.
  3. Use Postman to call the REST resources below.

To check the current workspace:
GET: http://localhost:9119/poc/workspace
To select all the information from the code table for the current workspace:
GET: http://localhost:9119/poc/workspace/data/code
Create code in the current workspace:
POST: http://localhost:9119/poc/workspace/data/code
BODY: 
{
  "id": 2,
  "descr": "some code",
  "type": 1
}
To change the workspace (LIVE is the default and always available):
PUT: http://localhost:9119/poc/workspace/{workspaceName}
To create a workspace:
POST: http://localhost:9119/poc/workspace/{workspaceName}
To remove a workspace:
DELETE: http://localhost:9119/poc/workspace/{workspaceName}
To merge a workspace:
PUT: http://localhost:9119/poc/workspace/merge/{workspaceName}


These REST resources just wrap some of the functions from the DBMS_WM package.
This is maybe just the tip of the iceberg, as there is a ton of functionality available in this package.
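As an illustration (not the exact classes from the repo), such a resource can be as thin as the sketch below, delegating to the workspace service shown earlier; the /poc prefix would come from the application's context path:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/workspace")
public class WorkspaceResource {

    private final WorkspaceService workspaceService;

    @Autowired
    public WorkspaceResource(WorkspaceService workspaceService) {
        this.workspaceService = workspaceService;
    }

    // POST /workspace/{workspaceName} - create a workspace
    @RequestMapping(value = "/{workspaceName}", method = RequestMethod.POST)
    public void create(@PathVariable String workspaceName) {
        workspaceService.create(workspaceName);
    }

    // PUT /workspace/{workspaceName} - switch to a workspace
    @RequestMapping(value = "/{workspaceName}", method = RequestMethod.PUT)
    public void changeTo(@PathVariable String workspaceName) {
        workspaceService.gotoWorkspace(workspaceName);
    }

    // PUT /workspace/merge/{workspaceName} - merge a workspace back into its parent
    @RequestMapping(value = "/merge/{workspaceName}", method = RequestMethod.PUT)
    public void merge(@PathVariable String workspaceName) {
        workspaceService.merge(workspaceName);
    }
}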

All the code is available here.


Wednesday, September 7, 2016

Re-Inspired

It has been a very long time since I last posted.
Due to some crazy work pressures and a couple of months of working 2 jobs, I just could not find the time nor the inspiration to contribute to this blog.

However today a work colleague sent me the following: 52 Technologies in 2016

It's awesome.
It reminded me how much fun I had driving myself to learn, try, demo and blog about new tech all the time. Let's hope life does not get in the way.

Tuesday, December 23, 2014

Amazon AWS Elastic Beanstalk, Python, Flask and Sci-Stack Docker

This actually took me longer than I'd like to admit to get working, but in the end the solution is quite neat and simple, so it was probably worth it, and hopefully this could save other people some time.

The Amazon Dockerfile looks like this:
AWS Elastic Beanstalk Dockerfile - Github

This installs the contents of the root folder's requirements.txt before running your Dockerfile, so for my application the basic "non-sci" packages could be installed simply enough.

Root Folder: requirements.txt:
Then, to install the sci-related packages (numpy, scipy, pandas, scikit-learn and nltk), I created another requirements.txt in an aws-post-install folder. This is run once the Amazon Linux OS has been updated and all the required OS dependencies have been installed.

Post Docker requirements.txt:
My custom Dockerfile, which builds on top of the Amazon image, looks as follows:

Docker File:

The next step is to get my Docker image used directly, so that the Elastic Beanstalk app doesn't have to do all the downloads and installs every time; this should be simple enough according to the AWS YouTube channel: https://www.youtube.com/watch?v=pLw6MLqwmew

Tuesday, August 26, 2014

Why Jython when you can microservice with Flask

Over the last little while I have been working on Sibbly, my little pet project to try to summarize, group, filter and target software development information on the web. All in all a rather ambitious task, but the worst thing that could happen is that I learn something, so there is really no risk. It is still currently in a very closed beta; I only occasionally show it to fellow work colleagues to get some input.

After initially starting development for Sibbly on Ubuntu (as I was always planning on deploying on Ubuntu), I had migrated back to Windows, and after a couple of weeks of work, when finally deploying to Ubuntu... surprise! It obviously didn't work right off the bat.

The issue I ended up with was a classpath problem between Spring Boot, its embedded Tomcat instance and Jython. The reason I use Jython is an awesome library called Pygments.

So after much dismay, and checking all the Java alternatives and attempted Pygments ports (jygments, jgments), I started thinking of alternative solutions.
Having recently read Microservices, I decided to look at a way of interacting with Python more indirectly.
This led me to Flask.
Within a couple of minutes, thanks to the Awesome Flask Example,
I had the following up and running:


What this little bit of Python does is wrap and expose the highlight and guess functionality from Pygments via a RESTful service accepting and producing JSON.
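On the Java side, Sibbly can then call this service over HTTP instead of embedding Jython. A rough sketch using RestTemplate (the endpoint path and JSON fields are assumptions for illustration, not the actual service contract; assumes Jackson is on the classpath for the JSON conversion):

import java.util.HashMap;
import java.util.Map;

import org.springframework.web.client.RestTemplate;

public class PygmentsClient {

    private final RestTemplate restTemplate = new RestTemplate();
    private final String baseUrl;

    public PygmentsClient(String baseUrl) {
        this.baseUrl = baseUrl; // e.g. http://localhost:5000
    }

    // asks the Flask service to syntax-highlight a snippet of code
    public String highlight(String code, String lexer) {
        Map<String, String> request = new HashMap<>();
        request.put("code", code);
        request.put("lexer", lexer);
        return restTemplate.postForObject(baseUrl + "/highlight", request, String.class);
    }
}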

I deploy Sibbly on DigitalOcean.
To install Python on my droplet, I followed the process below:

sudo apt-get install python-dev build-essential  
sudo apt-get install zlib1g-dev
sudo apt-get install libssl-dev openssl
sudo apt-get install python-pip
sudo pip install virtualenv
sudo pip install virtualenvwrapper

export WORKON_HOME="$HOME/.virtualenvs"
source /usr/local/bin/virtualenvwrapper.sh

sudo mkdir /opt/python3.4.1
wget http://python.org/ftp/python/3.4.1/Python-3.4.1.tgz
tar xvfz Python-3.4.1.tgz
cd Python-3.4.1
./configure --prefix=/opt/python3.4.1
make  
sudo make install

mkvirtualenv --python /opt/python3.4.1/bin/python3 py-3.4.1

workon py-3.4.1

pip install flask
pip install pygments

Once that was done, to run the Flask app:
python app.py & disown

Sunday, August 10, 2014

Upgrading Spring 3.x and Hibernate 3.x to Spring Platform 1.0.1 (Spring + hibernate 4.x)

I recently volunteered to upgrade our newest project to the latest version of Spring Platform. What Spring Platform gives you is dependency and plugin management across the whole Spring framework's set of libraries.

Since we had fallen behind a little, the upgrade did raise some funnies. Here are the things I ran into:

Maven:
Our pom files were still referencing:
hibernate.jar 
ehcache.jar 
These artefacts don't exist in the latest versions, so I replaced them with
hibernate-core.jar and ehcache-core.jar

We also still use the Hibernate Tools + Maven run plugin to reverse engineer our DB objects.
This I needed to update to a release candidate:


Hibernate:
The code: "Hibernate.createBlob"... no longer exists

replaced with:
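In Hibernate 4 the usual route is the LobHelper on the current session, something along these lines (a sketch, not the exact code from our project):

import java.sql.Blob;

import org.hibernate.Session;

public class BlobHelper {

    // Hibernate 4: create the Blob via the session's LobHelper
    // instead of the removed static Hibernate.createBlob(...)
    public static Blob createBlob(Session session, byte[] bytes) {
        return session.getLobHelper().createBlob(bytes);
    }
}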

On the HibernateTemplate, return types are now List rather than the element type, so I needed to add casts for the lists being returned.
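For example (a generic sketch, not our actual DAO code):

import java.util.List;

import org.springframework.orm.hibernate4.HibernateTemplate;

public class GenericDao {

    private final HibernateTemplate hibernateTemplate;

    public GenericDao(HibernateTemplate hibernateTemplate) {
        this.hibernateTemplate = hibernateTemplate;
    }

    // find(...) now returns List<?>, so the element type has to be cast back
    @SuppressWarnings("unchecked")
    public <T> List<T> findAll(Class<T> entityClass) {
        return (List<T>) hibernateTemplate.find("from " + entityClass.getName());
    }
}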

import org.hibernate.classic.Session;
replaced with:
import org.hibernate.Session;

The reverse engineering works a little differently: it now assigns Long to numeric columns.
Added:

Possible Errors:

  • Caused by: org.hibernate.service.UnknownUnwrapTypeException: Cannot unwrap to requested type [javax.sql.DataSource]
Add a dependency for c3p0:

And configure the settings in the cfg.xml for it:
  • Caused by: java.lang.ClassNotFoundException: org.hibernate.engine.FilterDefinition

You are probably still referencing a hibernate3 factory / bean somewhere; change these to their hibernate4 equivalents:
org.springframework.orm.hibernate3.LocalSessionFactoryBean -> org.springframework.orm.hibernate4.LocalSessionFactoryBean
org.springframework.orm.hibernate3.HibernateTransactionManager -> org.springframework.orm.hibernate4.HibernateTransactionManager

  • Caused by: java.lang.ClassNotFoundException: Could not load requested class : org.hibernate.hql.classic.ClassicQueryTranslatorFactory
There is a minor change in the new APIs, so this can be resolved by replacing the property value with:

org.hibernate.hql.internal.classic.ClassicQueryTranslatorFactory.

Spring:
Amazingly, some of our application context files still referenced the Spring DTDs... I replaced those with the XSD declarations.

In the Spring configs, I added the c3p0 settings:

Spring removed the "local" attribute, so I needed to change those references to "ref".

Spring's HibernateDaoSupport no longer has releaseSession(session), which is a good thing, as it forced me to update the code to work within a transaction.

Possible Errors:

  • getFlushMode is not valid without active transaction; nested exception is org.hibernate.HibernateException: getFlushMode is not valid without active transaction

Removed from hibernate properties:
     <prop key="hibernate.current_session_context_class">thread</prop>

(This property supplies a custom strategy for the scoping of the "current" Session; see the "Contextual sessions" section of the Hibernate documentation for more information about the built-in strategies.)

  • org.springframework.dao.InvalidDataAccessApiUsageException: Write operations are not allowed in read-only mode (FlushMode.MANUAL): Turn your Session into FlushMode.COMMIT/AUTO or remove 'readOnly' marker from transaction definition.

Another option is:
<bean id="productHibernateTemplate" class="org.springframework.orm.hibernate4.HibernateTemplate">
    <property name="sessionFactory" ref="productSessionFactory"/>
    <property name="checkWriteOperations" value="false"/>
</bean>

  • java.lang.NoClassDefFoundError: javax/servlet/SessionCookieConfig
Servlet version update:

  • Then, deploying on WebLogic: javassist: $$_javassist_ cannot be cast to javassist.util.proxy.Proxy

The issue here was that there were different versions of javassist being brought into the ear. I removed all references from all our poms, so that the correct version gets pulled in from Spring/Hibernate...

and then configured WebLogic to prefer our version:




Saturday, July 19, 2014

TDD, Hamcrest, Shazamcrest

Recently we have started trying to build more of a TDD culture at work; having always believed in thorough testing and decent code coverage, I didn't think it would be too hard. However... teaching an old dog new tricks can sometimes require quite a bit of patience. It turns out breaking coding habits formed over more than a decade of keyboard bashing is harder than it seems.

So with generating an enormous amount of test code comes the usual task of code and test maintenance and reuse.
One of the tools / libraries we have included is Hamcrest, which not only improves the readability of assertion failures, but also allows you to create and extend custom matchers, which you can then reuse across multiple test scenarios.

I am not going to go into too much detail on Hamcrest here; there are a bunch of great resources / blogs / tutorials out there... just a few:
http://www.baeldung.com/hamcrest-collections-arrays
https://weblogs.java.net/blog/johnsmart/archive/2011/12/12/some-useful-new-hamcrest-matchers-collections
http://edgibbs.com/junit-4-with-hamcrest/
http://www.planetgeek.ch/2012/03/07/create-your-own-matcher/

While creating a custom type-safe matcher for one of our domain objects, I realised that was insane... really... this.getA == that.getA... mmmm, no.
So I went searching for something that could help, and after a bit I found Shazamcrest (bonus points for the name).
What Shazamcrest does is serialize the objects to compare, compare them, and then on failure throw a ComparisonFailure, which the major IDEs can display in their built-in diff viewer.

Great... no manual bean compares.
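Usage is then a one-liner; roughly (assuming the standard shazamcrest imports):

import static com.shazam.shazamcrest.MatcherAssert.assertThat;
import static com.shazam.shazamcrest.matcher.Matchers.sameBeanAs;

import org.junit.Test;

public class BeanCompareTest {

    static class Person {
        String name = "Bob";
        Person friend; // circular references are where the 0.8 fix below matters
    }

    @Test
    public void comparesWholeBeansWithoutAManualMatcher() {
        Person expected = new Person();
        Person actual = new Person();

        // serialises both objects and, on a mismatch, throws a ComparisonFailure
        // which the major IDEs render in their built-in diff viewer
        assertThat(actual, sameBeanAs(expected));
    }
}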
So I added the Maven dependency and tried it out on our complex domain object...
StackOverflowError... It was a known limitation at the time: the JSON provider Shazamcrest was using,
GSON, does not cater for circular reference serialization.

As both Shazamcrest and GSON are open source, I decided to have a look and see if I could contribute; anything is better than writing a manual bean matcher. After some investigation I found that the guys on the GSON project had created a fix, GraphAdapterBuilder, it is just not distributed with the actual library.

So, after forking the Shazamcrest GitHub project, a little bit of code and submitting a pull request:

The guys on the Shazamcrest project very quickly merged my changes in and published a new version to the maven repo (Thanks for that). 
So be sure to use the 0.8 version if you are struggling with circular references.






Monday, May 26, 2014

Playing with Java 8 - Lambdas, Paths and Files

I needed to read a whole bunch of files recently, and instead of just grabbing my old FileUtils.java (which I, and probably most developers, have and copy from project to project), I decided to have a quick look at how else to do it...
Yes, I know there are Commons IO and Google IO, so why would I even bother? They probably do it better, but I wanted to check out the NIO JDK classes and play with lambdas as well... and to be honest, I think this actually ended up being a very neat bit of code.

So I had a specific use case:
I wanted to read all the source files from a whole directory tree, line by line.

What this code does: it uses Files.walk to recursively get all the paths from the starting point as a stream, which I then filter down to only the files that end with the required extension. For each of those files, I use Files.lines to create a stream of Strings, one per line; I trim each line, filter out the empty ones and add the rest to the returned collection.
All very concise thanks to the new constructs.
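A sketch of the method described above (names are illustrative, not the exact code from the repo):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SourceFileReader {

    // walks the directory tree and returns every non-empty, trimmed line
    // from files ending with the given extension
    public static List<String> readAllLines(Path start, String extension) throws IOException {
        try (Stream<Path> paths = Files.walk(start)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(path -> path.toString().endsWith(extension))
                    .flatMap(SourceFileReader::lines)
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toList());
        }
    }

    private static Stream<String> lines(Path path) {
        try {
            return Files.lines(path);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        readAllLines(Paths.get("src"), ".java").forEach(System.out::println);
    }
}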



Saturday, April 26, 2014

Playing with Java 8 - Lambdas and Concurrency

So Java 8 was released a while back, with a ton of features and changes. All of us Java zealots have been waiting for this for ages, all the way back to when they originally announced all the great features that would be in Java 7, which ended up being pulled.

I have just recently had the time to actually start giving it a real look. I updated my home projects to 8 and I have to say I am generally quite happy with what we got. The java.time API that "mimics" Joda-Time is a big improvement, the java.util.stream package is going to be useful, and lambdas are going to change our coding style, which might take a bit of getting used to. With those changes the quote "With great power comes great responsibility" rings true; I sense there may be some interesting times in our future, as it is quite easy to write some hard-to-decipher code. As an example, debugging the code I wrote below would be "fun"...

The full example is on my GitHub blog repo.

What this example does is simple: run a couple of threads, do some work concurrently, then wait for them all to complete. I figured, while I am playing with Java 8, let me go for it fully...
Here's what I came up with:
Test:
Output:

0 [pool-1-thread-1] Starting: StringInputTask{taskName='Task 1'}
0 [pool-1-thread-5] Starting: StringInputTask{taskName='Task 5'}
0 [pool-1-thread-2] Starting: StringInputTask{taskName='Task 2'}
2 [pool-1-thread-4] Starting: StringInputTask{taskName='Task 4'}
2 [pool-1-thread-3] Starting: StringInputTask{taskName='Task 3'}
3003 [pool-1-thread-5] Done: Task 5
3004 [pool-1-thread-3] Done: Task 3
3003 [pool-1-thread-1] Done: Task 1
3003 [pool-1-thread-4] Done: Task 4
3003 [pool-1-thread-2] Done: Task 2
3007 [Thread-0] WaitingFuturesRunner  - complete... adding results
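A sketch of the idea (the task and runner names are borrowed from the log above; the rest is a simplification, not the exact code from the repo):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WaitingFuturesRunner {

    static class StringInputTask implements Supplier<String> {
        private final String taskName;

        StringInputTask(String taskName) {
            this.taskName = taskName;
        }

        @Override
        public String get() {
            System.out.println("Starting: StringInputTask{taskName='" + taskName + "'}");
            try {
                Thread.sleep(3000); // simulate roughly 3 seconds of work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("Done: " + taskName);
            return taskName + " result";
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(5);

        // kick off all the tasks concurrently
        List<CompletableFuture<String>> futures = Stream.of(1, 2, 3, 4, 5)
                .map(i -> new StringInputTask("Task " + i))
                .map(task -> CompletableFuture.supplyAsync(task, pool))
                .collect(Collectors.toList());

        // wait for every future to complete, then gather the results
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        List<String> results = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

        System.out.println("complete... adding results: " + results);
        pool.shutdown();
    }
}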


Some of the useful articles / links I found and read while doing this:

Oracle: Lambda Tutorial
IBM: Java 8 Concurrency
Tomasz Nurkiewicz : Definitive Guide to CompletableFuture


Sunday, February 16, 2014

Local Wikipedia with Solr and Spring Data

Continuing with my little AI / Machine Learning research project... I wanted a decent-sized repo of English text that was not in a complete mess like a large percentage of the data on the internet. I figured I would try Wikipedia, but what to do with about 40Gb of XML? How do I work with / query all that data? Based on a recent work implementation where we load something like 200 000 000 records into a Solr cache, I figured Solr would be the way to go, so this is an example of my basic implementation.

Required for this example:

Wikipedia download (warning it is a 9.9Gb file, extracts to about 42Gb)
Solr
Spring Data (Great Blog / Examples on Spring Data:  Petri Kainulainen's blog)

All the code and unit test for this post is on my blog GitHub Repo

When setting up Solr from scratch, you can have a look at Solr's wiki or documentation; it is pretty good. There is also an example of importing Wikipedia here; I started with that and made some minor modifications.

For this specific example, the Solr config needed is under /conf.
For this example (and in the config files below):
Solr home: /Development/Solr
Index / Data: /Development/Data/solr_data/wikipedia
Import File: /Development/Data/enwiki-latest-pages-articles.xml

The full import into Solr took about 48 hours on my old 2011 i5 iMac and the index on my current setup is about 52Gb.

Data Config for the import:

Schema:

Solr Config:

The code for this ended up being quite clean. Spring Data Solr gives you 2 main interfaces, SolrIndexService and SolrCrudRepository; you simply extend / implement these 2, wrap them in a single interface, autowire it from a Spring Java context and you are good to go.

Repository:
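A sketch of what the document and repository pair might look like (field names are illustrative, not the exact code from the repo):

import java.util.List;

import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.repository.SolrCrudRepository;

// the Solr document mapped to the wikipedia schema (field names illustrative)
class WikiPage {

    @Id
    @Field
    private String id;

    @Field
    private String title;

    @Field
    private String text;

    // getters / setters omitted
}

// Spring Data Solr generates the implementation, including the derived query
interface WikiPageRepository extends SolrCrudRepository<WikiPage, String> {
    List<WikiPage> findByTitle(String title);
}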

IndexService:

SolrService:


SpringContext:

Next thing for me to look at for sourcing data is Spring Social.

Sunday, January 12, 2014

BYG (Bing, Yahoo, Google) Search Wrapper

One small section of my Aria project will be to interface with the current search engines out there. To do this I will require a module that gives me a consistent interface to work with the 3 main providers: Bing, Yahoo! and Google (and any future ones I may want to add). This is a basic example of that module.

The first thing required is to set up accounts / projects and the like with the relevant providers.
I won't describe this process as they are all pretty well documented.

Bing Developer Center
Yahoo Developer Network
Google Developers Console


A couple of tips for the above sites:

  • Bing: Setup both the web and synonym searches.
  • Yahoo: In the BOSS console, under manage account, put in a daily limit $ amount (or turn off the limit), as they only allow 1 free query a day... so otherwise only the first request works.
  • Google: It doesn't seem that you can set it up to search the whole web, but after creating your custom search engine you can select "Search the entire web but emphasize included sites", so don't worry about that.

All these providers allow for many options while searching (e.g. images, location, news, video, etc.); however, in this initial example I have limited it to just a pure and simple web search.

All the code will be available in my blog Github repository.

Going through the main points:
There is a BasicWebSearch interface that takes the search term and returns SearchResults.
SearchResults contains the results in a map keyed on a result type enum.
The implementations of BasicWebSearch, namely BingSearch, GoogleSearch and YahooSearch, call the relevant search engine with the search term and then convert the results into a SearchResult. In the case of Yahoo and Bing, I map the JSON result to the SearchResult; Google, however, does that in their search client included in the dependencies.
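A sketch of those shapes (simplified and illustrative, not the exact code from the repo):

import java.util.List;
import java.util.Map;

// the contract each provider implements
interface BasicWebSearch {
    SearchResults search(String searchTerm);
}

// only plain web results in this initial example
enum ResultType {
    WEB
}

class SearchResult {
    String title;
    String url;
    String description;
}

class SearchResults {

    // results keyed on the result type enum
    private final Map<ResultType, List<SearchResult>> results;

    SearchResults(Map<ResultType, List<SearchResult>> results) {
        this.results = results;
    }

    List<SearchResult> get(ResultType type) {
        return results.get(type);
    }
}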

Now for the main code bits:

SearchSettings
As this is just an example, I included the search settings in the following class; be sure to replace them with the relevant values.

UrlConnectionHandler
As both Bing and Yahoo use an HttpUrlConnection, I figured I would centralise the handling of that; the only difference between the 2 is that Bing uses basic authentication, while for Yahoo I went with the OAuth implementation.

BingSearch

BingResultParser

YahooSearch

YahooResultParser

GoogleSearch

GoogleSearchResult
Google returns a whole bunch of extra information, so I extended the base SearchResult to add all of it, just in case I ever need it.

Maven Dependencies


Sunday, December 15, 2013

Predicting the next most probable part of speech.

I have recently been spending some of my spare time learning about AI and machine learning. After a couple of books, a bunch of tutorials and most of Andrew Ng's Coursera course, I decided: enough with the theory, time for some real code.
During all my late-night reading I also stumbled across some of the following: the Loebner Prize and its most recent winner Mitsuku, A.L.I.C.E and Cleverbot. To be honest, maybe given my naivety about the field of AI, I expected much more from the above "AI" / technology; most of these current chatbots are easily confused and honestly not very impressive.
Thankfully I also found Eric Horvitz's video of his AI personal assistant, which resonated with what I wanted to achieve with my ventures into AI.

So, with human interaction as a focus point, I started designing "Aria" (Artificially Intelligent Research Assistant, in reverse), which, since most of my development experience is based in the Java enterprise environment, will be built on a distributed enterprise scale using the amazing technologies that ecosystem offers, to mention some: Hadoop, Spark, Mahout, Solr, MySQL, Neo4J, Spring...

My moonshot/daydream goal is to better the interactions of people and computers, but in reality if I only learn to use, implement and enjoy all that is involved with AI and ML, I will see myself as successful.

So, for my first bit of functional machine learning...

Predicting the next most probable part of speech: one of the issues with natural language processing is that words used in different contexts end up having different meanings and synonyms. To try to assist with this, I figured I would train a neural network on the relevant parts of speech, and then use that to assist in understanding user-submitted text.

This full code for this example is available on Github.

I used a number of Java open source libraries for this:
Encog
Neuroph 
Stanford NLP
Google Guava

I used a dataset of 29 000 English sentences that I sourced from a bunch of websites and open corpora. I won't be sharing those as I have no clue what the state of the copyright is, so unfortunately to recreate this you'd need to source your own data.

For the neural network implementation, I tried both Neuroph and Encog. Neuroph got my attention first with its great UI, which allowed me to experiment with my neural network visually in the beginning, but as soon as I created my training data, which ended up being about 300MB of 0s and 1s, it fell over and didn't allow me to use it. I then went back to Encog, which I had used initially when just starting to read about ML and AI.

When using Neuroph in code it worked with the dataset, but only with BackPropagation; the ResilientPropagation implementation never seemed to return.
So I ended up much preferring Encog: its resilient propagation implementation (iRPROP+) worked well and reduced the network error to about 0.018 in under 100 iterations, without me having to fine-tune the settings and network architecture.

How this works: I take the text data and use the Stanford NLP library to generate a list of the parts of speech in each document. I translate their Annotation into an internal enum and then use that to build up a training data set. I currently persist that to file, just to save some time while testing. I then train and persist the neural network and test it.
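A sketch of the Encog training step (layer sizes are illustrative; the real input and output sizes come from the generated training data):

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class PosNetworkTrainer {

    // input: encoded windows of parts of speech, ideal: the encoded "next" part of speech
    public BasicNetwork train(double[][] input, double[][] ideal) {
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, input[0].length));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 50));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, ideal[0].length));
        network.getStructure().finalizeStructure();
        network.reset();

        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);
        ResilientPropagation trainer = new ResilientPropagation(network, trainingSet);

        // iRPROP+ got the error down to ~0.018 in under 100 iterations on the real data
        int epoch = 0;
        do {
            trainer.iteration();
            epoch++;
        } while (trainer.getError() > 0.018 && epoch < 100);
        trainer.finishTraining();

        return network;
    }
}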

The Parts of Speech Enum:

The creation of the training data:

Train the network:

Test:
