Tuesday, November 11, 2008

Of OLAP and the importance of open standards

In these times of economic crisis, many companies will turn to business intelligence (BI) as a source of wisdom and counsel. Millions of dollars will be invested in an effort to understand the extent of their respective problems and find solutions based on accurate, decision-oriented datasets.

Since I have a fair amount of experience working in heterogeneous environments and tackling data integration challenges, I thought I'd pitch in my two cents.

Why developers and project managers will have a hard time


The root of the problem is this. The Microsoft OLAP toolkit does not integrate well with anything other than .NET technologies. SAS offers a Java API, yet it is not ready for production. (I worked with it for two years, and believe me, they are still a long way from production-quality code.) As a matter of fact, most software vendors in the OLAP world distribute some API to integrate their technologies, but you often end up with black boxes of questionable quality, flexibility or performance. Some even go as far as to obfuscate their libraries... this really doesn't help in the end.

Some vendors like Oracle went for the all-in-the-box solution. They offer a "complete" solution that can fit every possible need. Then again, what they are telling you is: if we don't have it, you probably don't need it. "Probably"? You've got to be kidding. Since when do software vendors know what you need and what your future will be? Better swap "probably" for "hopefully".

In the best case, in order to meet your needs, you'll hack your way through at the expense of your project specifications. The final result can be nothing but disappointing. Your celebration will be bitter and probably short-lived, I fear.

About the importance of collective work


You have a brand new application. Hooray! This is where the production phase kicks in.

What if you need to move your datamart to another OLAP server? What if there are not enough connection licenses to allow both production connections and all of your maintenance personnel on the OLAP server, and they are forced to take turns to debug? What if the CEO decides to migrate to a new platform? What if [insert random but oh so frequent unforeseen event here]? Your thousand dollar code is now rendered useless; you can start crying now, you deserve it. In your quest for more money making, you've created a monster that was expensive and will continue to pump the money out of your institution's pockets.

If you were good enough at systems design, you thought about a data layer. Even so, the data layer still has to be rewritten entirely, and it often represents at least a third of the overall effort. Close, but no cigar. This might sound like a catastrophic scenario, but it is oh so frequent.

Many people got tired of all this nonsense, so we decided to work together. We decided that enough time and money had been wasted on individual efforts that were ruined in the end. It was time to agree on standards and share the product of our collective effort.

Take Hibernate for example. It is now a de facto standard when it comes to data mappers. For the Java version alone, it represents 859 thousand lines of code worth 12.8 million dollars in work hours. Think you can top that with your in-house data layer in times of economic crisis?

About Java OLAP


OLAP is a world in itself. You can't take relational paradigms and apply them to the multidimensional world. The .NET toolbox does have very nice libraries to do some neat OLAP stuff, but then again, you're locked in to SSAS. This is a no-no.

On the Java side, things are even worse. There is currently a big void in the Java OLAP market. No OLAP standard emerged at all. Thanks to the selfishness of the big players of the industry, the JOLAP initiative was a total failure. It never reached the final version, so the JSR-69 specification died quietly.

We at Olap4j tried to fill that gap with an open initiative. Everyone can pitch in. And I mean EVERYONE.

What makes Olap4j so kewl


You know the expression vendor lock-in? I hope you do, I *really* do, or else you'll learn it the hard way. Olap4j aims at solving exactly this problem. You can develop applications on its API and switch the underlying OLAP engine without rewriting a single line of code. Not bad, eh? Olap4j is more than a database driver. It is an open API built right on top of the JDBC industry standard, where everyone collaborates to specify a common base on which to build.

It even includes transformation libraries and testing facilities.
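
To make the "switch the engine without rewriting your code" idea concrete, here is a minimal sketch in plain Java. It is not the Olap4j API: the `CubeEngine` interface, the two toy engines and the `demo:` URL prefixes are hypothetical names made up for this illustration. It only mimics how a JDBC-style connection string can pick the implementation at run time while the application code stays the same.

```java
/** Toy illustration of engine-neutral application code; none of this is Olap4j. */
class EngineSwitchDemo {

    /** Hypothetical engine-neutral interface the application codes against. */
    interface CubeEngine {
        String execute(String mdx);
    }

    /** Hypothetical stand-in for an in-process engine. */
    static class InProcessEngine implements CubeEngine {
        public String execute(String mdx) {
            return "in-process result for: " + mdx;
        }
    }

    /** Hypothetical stand-in for a remote, XML/A-style engine. */
    static class RemoteEngine implements CubeEngine {
        public String execute(String mdx) {
            return "remote result for: " + mdx;
        }
    }

    /** Picks an engine from the URL prefix (the prefixes are invented for this sketch). */
    static CubeEngine connect(String url) {
        if (url.startsWith("demo:inprocess:")) {
            return new InProcessEngine();
        }
        if (url.startsWith("demo:remote:")) {
            return new RemoteEngine();
        }
        throw new IllegalArgumentException("No engine registered for " + url);
    }

    /** Application code: it only knows the interface and a connection string. */
    static String report(String url) {
        return connect(url).execute("SELECT FROM [Sales]");
    }

    public static void main(String[] args) {
        // Switching engines means changing the URL, not the report code.
        System.out.println(report("demo:inprocess:sales"));
        System.out.println(report("demo:remote:sales"));
    }
}
```

In a real setup the connection string would typically come from configuration, so moving to another engine would be a deployment change rather than a code change.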

I want to kick the tires and use it right now


So far, it has two implementations ready to use. The Mondrian driver allows you to run the much acclaimed Mondrian open source OLAP engine as an in-process data provider.

There is also the XML/A generic driver that can connect to pretty much anything that talks XML/A, whether it's over HTTP or anything else you fancy using. This particular driver allows you to build applications that can switch to and from any of these OLAP engines:

  • Hyperion Essbase

  • Microsoft SQL Server Analysis Services

  • Infor

  • Mondrian

  • Palo


The Olap4j project is gaining momentum and we truly hope to see it become the standard in the Java world.

Wednesday, September 24, 2008

Chain blog… they caught me.

If you're reading this, you've been included in this blog chain.

  1. Take a picture of yourself right now.

  2. Don’t change your clothes, don’t fix your hair... just take a picture.

  3. Post that picture with NO editing.

  4. Post these instructions with your picture.


I was caught by three techno blogs I follow, so don't be mad at me. Thanks to Julian, Matt and Nick. I had no webcam at hand but I thought about my phone, so here it is: Luc at the office Wednesday morning, before his coffee.


Cheers!

Monday, August 11, 2008

Data integration challenges tackled

Data integration in business environments can be a painful task. I mean REALLY painful. The volume of data is huge, it does not cross-validate, it is dispersed in many heterogeneous formats, yada yada. You know the song. One day, I stumbled on Pentaho Data Integration (PDI). This was a real breakthrough.

First things first, it's not subject to "vendor lock-in". It can read most data formats out there and can write them back to pretty much anything. This is a huge plus because it means it can be used by a multitude of user types and environments. Being written in Java also gives it an edge as an enterprise tool, for it is platform agnostic.

But the real advantages are not those trivial specifications. My love for PDI has much deeper roots. Simply put: it's powerful. Creating an integration process is a trivial matter. Drag and drop. Link. Execute. Those three simple steps will cover most of your business needs. Really, I mean it. Never again will I write a snippet of code to read a CSV file and write its content to a database. Mark my words; NEVER! This is a waste of time, and a developer who keeps up with the times should know that.

What about the real juicy stuff?

As you suspected, there is much more to PDI than meets the eye. It can be clustered, it can use a database-backed repository for all processes, it has automatic documentation generation tools, and it is supported by a huge community. Many tutorials exist to address most business needs and challenges. It's well made, very stable and easily expandable with plugins for power users.

I strongly recommend giving it a try. The next version should be released soon and it will include many great new features. I met Matt Casters last June and had the chance to see for myself all the new features that will make it into the next release. We're talking about visual performance bottleneck exploration and some more neat stuff you won't find anywhere else.

Cheers, and have a good time integrating!!

Tuesday, July 29, 2008

Pentaho on the iPhone

I successfully added the iPhone extension to my Pentaho platform today and I was more than impressed with the ease with which we can enable the whole platform to work seamlessly on those nifty little phones.

Oh yeah, I bought an iPhone too...

I'm slowly discovering the fun of having a cellular phone in my pocket. This is something that I had never experienced before; I never had a cell phone. I have to say that I'm glad it's a good phone, and sexy too.

The bottom line is: get one.

For those interested, here's the wiki page that says it all. Thanks to Will Gorman, senior developer at Pentaho, who put this all together.

Monday, July 21, 2008

Field lessons - Securing a Linux server

There is this old saying that goes like:

"Linux is safe enough to keep it vanilla. Anything you add weakens its security."


Okay, this is not an actual popular saying, but since most Linux servers I have seen in my career were configured in conformity with this piece of wisdom (sic), I decided to share some experience with basic and mandatory security measures to add to a Linux server... I'm just sooo tired of fixing broken servers that have been hacked.

There is a simple suite of programs to install and you'll be at the very least secured against kiddies and the like. Here it goes.

Securing the OS

Most of the time, the piece of software that was hacked was the OS itself. Not because there are awful flaws in Linux (or in any OS, for that matter), but because simple rules were not respected. How many of you who have configured servers can certify that they are protected against brute force attacks? How many are protected against DoS attacks? Neither Linux nor any other OS I've seen so far (correct me as you wish...) comes with DoS or brute force (BF) detection. Having secured SSH access is mandatory these days, but what's the point of setting passwords when a simple brute force attack will break them?

Here are some solutions. The Advanced Policy Firewall (APF) is a simple Linux firewall that uses the iptables utility to create firewall rules on your system. Why APF and not iptables alone? Because it integrates with a DoS detection tool and Brute Force Detection (BFD). The DoS tool will detect Denial of Service attacks, while BFD will monitor incoming connections and ban any IP that breaks easy-to-set-up access throttling rules. All these tools are free and compatible with most Linux flavors. Try 'em out! There are many more available from R-fx Networks, the company that maintains them.
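
For the curious, the kind of throttling rule described above can be sketched in a few lines of Java. This is not how BFD is implemented or configured; the `BruteForceGuard` class, its thresholds and the choice to count failed logins in a sliding time window are all assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sliding-window throttle: too many failures in the window bans the IP. */
class BruteForceGuard {
    private final int maxFailures;    // failures tolerated inside the window
    private final long windowMillis;  // length of the sliding window
    private final Map<String, Deque<Long>> failures = new HashMap<>();
    private final Set<String> banned = new HashSet<>();

    BruteForceGuard(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    /** Records a failed login and returns true if the IP is banned afterwards. */
    boolean recordFailure(String ip, long nowMillis) {
        if (banned.contains(ip)) {
            return true;
        }
        Deque<Long> times = failures.computeIfAbsent(ip, k -> new ArrayDeque<>());
        times.addLast(nowMillis);
        // Forget failures that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
            times.removeFirst();
        }
        if (times.size() > maxFailures) {
            banned.add(ip);
            failures.remove(ip);
        }
        return banned.contains(ip);
    }

    boolean isBanned(String ip) {
        return banned.contains(ip);
    }

    public static void main(String[] args) {
        BruteForceGuard guard = new BruteForceGuard(3, 60_000L);
        for (long t = 0; t < 4_000L; t += 1_000L) {
            System.out.println("banned after failure at " + t + " ms: "
                + guard.recordFailure("192.0.2.10", t));
        }
    }
}
```

A real tool would read the failures from log files and push the ban into the firewall; the sketch only shows the counting logic.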

As for the setup instructions, google for them; as always. There are many nifty tutorials out there and I won't copy them here :P

Web Applications Security

What if I told you that there is a generic way of applying a minimum security level to all your web applications at the OS level, thus simplifying the life of anyone who administers web servers? You might get frustrated by the fact that you didn't know this at the time you got hacked. You might even wonder how wonderful this would be for your web hosting server.

Well, I'm doing it.

I'll say it.

Ready ? There it is.

ModSecurity

Okay, this was the hard part. Now it will be much easier. It's a simple Apache HTTPD module that you add to your web server configuration, and it will validate all requests against a set of nifty threat detectors. It uses regular expressions to protect your applications against overflows, injections and whatever might be dangerous for them.
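
To give a feel for regex-based request screening, here is a small Java sketch. It is not ModSecurity's rule syntax or engine; the `RequestScreen` class, the rule names, the length limit and the deliberately simplistic patterns are assumptions for illustration only, and real rule sets are far more thorough.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical request screen: checks input against a few named regex detectors. */
class RequestScreen {
    private static final int MAX_LENGTH = 2048; // crude guard against oversized input
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        // Illustrative patterns only; they would miss many real attacks.
        RULES.put("sql-injection", Pattern.compile(
            "(?i)\\bunion\\b.+\\bselect\\b|'\\s*or\\s+'?1'?\\s*=\\s*'?1"));
        RULES.put("xss", Pattern.compile("(?i)<\\s*script\\b"));
        RULES.put("path-traversal", Pattern.compile("\\.\\./"));
    }

    /** Returns the name of the first rule the input trips, or empty if it looks clean. */
    static Optional<String> firstViolation(String input) {
        if (input.length() > MAX_LENGTH) {
            return Optional.of("oversized-input");
        }
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(input).find()) {
                return Optional.of(rule.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstViolation("q=olap+cube"));
        System.out.println(firstViolation("file=../../etc/passwd"));
    }
}
```

The appeal of doing this in the web server rather than in each application is that one set of rules covers everything hosted behind it.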

There is even a console available to monitor many installations and keep an eye out for alerts.

Easy to understand, easy to install. As always, Google has all your answers.

The bottom line

The lesson to remember is that these tools take half a day of work to set up and they will save you sooo much trouble in the future that it is pointless to debate whether to use them. The tools are out there, for free. You'd be a fool not to use them.

QED

Tuesday, July 15, 2008

Olap4j and XML/A - One more step towards a true OLAP systems integration API

I've been working for a month now on some enhancements to the Olap4j project to make it more powerful and compatible. The good news is, I succeeded. The previous version, 0.9.5, lacked some basic features which you would expect from a production-ready XML/A driver.

For one, its HTTP proxy didn't support cookies. This was a big problem, since each of the myriad requests required to populate Olap4j's metadata objects created a new user session on the web service back-end. This was a no-no, but now it's fixed and kicking ass.
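
Here is a minimal sketch of why cookies matter here: a tiny cookie jar that remembers the `Set-Cookie` values from one response and replays them on the next request, so the back-end keeps seeing the same session. This is not the actual Olap4j proxy code; the `SessionCookieJar` class is hypothetical, and it ignores cookie attributes such as Path, Domain and expiry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical minimal cookie jar: keeps name=value pairs and replays them. */
class SessionCookieJar {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Stores the name=value part of each Set-Cookie header, ignoring attributes. */
    void remember(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
    }

    /** Builds the value of the Cookie header to send with the next request. */
    String cookieHeader() {
        return cookies.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        SessionCookieJar jar = new SessionCookieJar();
        jar.remember(List.of("JSESSIONID=abc123; Path=/; HttpOnly"));
        System.out.println(jar.cookieHeader());
    }
}
```

With `java.net.HttpURLConnection`, for instance, the stored value could be sent through `setRequestProperty("Cookie", ...)` and refreshed from the `Set-Cookie` entry of `getHeaderFields()` after each response.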

I also worked on a SOAP query cache. This was a big piece of software engineering for me, since I'm not used to thread-safe coding. Thread-safe thingies usually live in the lower levels of BI application servers, where those issues are tackled from the start. Thanks to Java's java.util.concurrent package, it turned out to be a breeze.
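
As a rough idea of what java.util.concurrent buys you, here is a minimal thread-safe cache keyed by the request text. It is a sketch, not the cache that went into Olap4j: the `SoapQueryCache` class and the `backend` function are hypothetical, and there is no eviction or expiry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical response cache: one back-end call per distinct request. */
class SoapQueryCache {
    private final ConcurrentMap<String, String> responses = new ConcurrentHashMap<>();
    private final Function<String, String> backend; // stands in for the real SOAP call

    SoapQueryCache(Function<String, String> backend) {
        this.backend = backend;
    }

    /** Returns the cached response, calling the back-end only on a miss. */
    String query(String request) {
        // On ConcurrentHashMap, computeIfAbsent is atomic per key, so threads
        // asking for the same request trigger at most one back-end call.
        return responses.computeIfAbsent(request, backend);
    }

    public static void main(String[] args) {
        SoapQueryCache cache = new SoapQueryCache(req -> "response to " + req);
        System.out.println(cache.query("<Discover/>"));
        System.out.println(cache.query("<Discover/>")); // served from the cache
    }
}
```

The design choice worth noting is that no explicit locks are needed: the concurrent map handles the synchronization, which is a big part of why this kind of code stays short.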

Those changes are not part of any release, nor are they in SVN yet. I'm still waiting for peer review before the whole commit, but for people eager to see what it looks like, I've created a neat little package for y'all.

Now I can move back to my next release of the University of Montreal's Pentaho platform... all work and no play makes Luc a dull boy.

Cheers!