Having a bunch of servers spewing out tons of logs is a real pain when you are trying to investigate an issue. A custom enterprise-wide search would be one of those awesome little things to have, and it could save developers days of their lives, not to mention their sanity. The "corporate architecture and management gestapo" will obviously be hard to convince, but the chance to write and set up my own "mini Google wannabe" MapReduce indexing is just too tempting. So this will be a personal little project for the next while. The final goal is distributed log search using Hadoop and Apache Solr. My environment mostly consists of log files from:
WebLogic Application Server, legacy CORBA components, Apache, TIBCO, and a mixture of JCAPS / OpenESB / GlassFish as well.
Setting Up Hadoop and Cygwin
The first thing to get up and running is Hadoop. Our environment runs on Windows, and therein lies the first problem: to run Hadoop on Windows you are going to need Cygwin.
Download: Hadoop and Cygwin.
Install Cygwin, making sure to include the OpenSSH package.
Once installed, run the following from the Cygwin command prompt: ssh-host-config
This sets up the ssh configuration. Reply yes to everything, except when it asks
"This script plans to use cyg_server, Do you want to use a different name?" Answer no to that one.
There seem to be a couple of permission issues between Windows (Vista in my case), Cygwin and sshd.
Note: Be sure to add your Cygwin "\bin" folder to your Windows path (else it will come back and bite you when trying to run your first MapReduce job)
and, as is typical for Windows, a reboot is required to get it all working.
Once that is done, you should be able to start the ssh server: cygrunsrv -S sshd
Check that you can ssh to the localhost without a passphrase: ssh localhost
If that asks for a passphrase, run the following:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Now back to the Hadoop configuration:
Assuming that Hadoop was downloaded and unzipped into a working folder, ensure that JAVA_HOME is set by editing [working folder]/conf/hadoop-env.sh.
Then go into [working folder]/conf, and add the following to core-site.xml:
To mapred-site.xml add:
Go to the Hadoop folder: cd /cygdrive/[drive]/[working folder]
Format the DFS: bin/hadoop namenode -format
Start Hadoop: bin/start-all.sh
You should then have the following URLs available:
A Hadoop application is made up of one or more jobs. A job consists of a configuration file and one or more Java classes, which interact with the data that exists on the Hadoop Distributed File System (HDFS).
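To make the job idea a bit more concrete, here is a minimal sketch of the map, shuffle and reduce phases in plain Java, with no Hadoop classes involved. The class name LogLevelCount, the sample log lines and the assumption that the log level is the third token on each line are all made up for illustration; a real job would implement Hadoop's own Mapper and Reducer types and let the framework do the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java standing in for the phases of a MapReduce job.
final class LogLevelCount {

    // Map phase: emit the log level of a line as the key.
    // Assumes (hypothetically) that the level is the third whitespace-separated
    // token, e.g. "2010-12-05 10:15:01 ERROR Connection refused".
    static String mapLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length >= 3 ? tokens[2] : "UNKNOWN";
    }

    // Shuffle phase: group a value of 1 under each emitted key.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.computeIfAbsent(mapLine(line), k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2010-12-05 10:15:01 ERROR Connection refused",
            "2010-12-05 10:15:02 INFO Retrying",
            "2010-12-05 10:15:03 ERROR Connection refused");
        System.out.println(reduce(shuffle(lines))); // prints {ERROR=2, INFO=1}
    }
}
```

On a cluster the map calls run in parallel next to the data, and the framework takes care of grouping the keys before the reducers see them.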
Now to get those pesky log files into HDFS. I created a little HDFS wrapper class to allow me to interact with the file system. It defaults to my own values (from core-site.xml).
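As a rough idea of what "defaulting to the values in core-site.xml" can look like, the sketch below pulls a single property out of a Hadoop-style configuration file using only the JDK's XML parser. It is a stand-in, not the wrapper itself: Hadoop ships its own Configuration class for this, and the class name SiteConfigReader, the sample value hdfs://localhost:9000 and the fallback are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: reads one property from a Hadoop-style site file.
final class SiteConfigReader {

    // Returns the <value> of the <property> whose <name> matches,
    // or the fallback when no such property exists.
    static String readProperty(InputStream xml, String name, String fallback) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml);
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (name.equals(text(property, "name"))) {
                    String value = text(property, "value");
                    return value != null ? value : fallback;
                }
            }
            return fallback;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read configuration", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Sample content; the host and port are assumed, not the author's values.
        String sample = "<configuration><property>"
            + "<name>fs.default.name</name><value>hdfs://localhost:9000</value>"
            + "</property></configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readProperty(in, "fs.default.name", "file:///"));
        // prints hdfs://localhost:9000
    }
}
```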
I also found that a quick way to start searching the uploaded log file is the Grep example included with Hadoop, so I included it in my HDFS test case below.
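For a feel of what the Grep example produces, here is a small single-machine sketch of the same idea: find every regex match in the log lines, count how often each matched string occurs, and list the most frequent first. The class name LogGrep, the sample lines and the regex are invented for illustration; the real example runs the matching and counting as MapReduce jobs over files in HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: an in-memory version of "count the regex matches".
final class LogGrep {

    // Counts every regex match in the given lines and returns the matches
    // ordered by descending count (ties keep first-seen order).
    static Map<String, Long> countMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1L, Long::sum);
            }
        }
        // List.sort is stable, so equal counts stay in first-seen order.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        Map<String, Long> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "10:15:01 ERROR java.net.ConnectException: Connection refused",
            "10:15:02 INFO Retrying",
            "10:15:03 ERROR java.lang.NullPointerException",
            "10:15:04 ERROR java.net.ConnectException: Connection refused");
        System.out.println(countMatches(lines, "[A-Za-z.]+Exception"));
        // prints {java.net.ConnectException=2, java.lang.NullPointerException=1}
    }
}
```

Searching for exception class names like this is a handy first pass over a log dump, since the most frequent offenders float to the top.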
Simple Wrapper Test:
Sunday, December 5, 2010
Enabling enterprise log searching - Playing with Hadoop