Elephants can’t run as fast as the Cheetah: Why we re-engineered the BI technology stack for Instant Analytics on Big Data.

If you are even remotely connected with data, analytics/BI, or data-driven decision-making, it is almost certain that you have heard about Big Data and its supposed silver bullet for analytics: Hadoop. However, for all of Hadoop’s capabilities, it did not kill the fundamental problems that have plagued the BI world for ages.

This is a story of why DataRPM re-engineered the BI technology stack for Instant Analytics.

But before we tell our story, let’s go back to the very beginning. Many years ago, Business Intelligence, or BI as it is commonly known, was born to cater to a simple need: enable any decision maker in any organization, big or small, to analyze any data they own, any way, anywhere, and at any time, getting instant answers to their business questions and thereby making informed decisions.

So the very essence of BI lies in agility!

But the traditional BI tools failed to deliver on this promise. The reasons were many. It started with the dependency on inflexible, fixed-schema data warehouses that required significant upfront data modeling, resulting in long deployment cycles. Add to that complex interfaces with a steep learning curve that made BI unusable for decision makers. The complicated setup process brought in a significant dependency on IT and technology teams, while expensive licenses and costly infrastructure hurt affordability. Combined, these factors made BI inaccessible to even mid-sized companies, let alone small businesses. And even for those who could afford BI, a significant percentage (70-80%) of deployments ended in failure, as research from Gartner has shown.

Things improved with the emergence of data discovery software and the early cloud BI players, which attempted to solve the user experience and cost problems of traditional BI. They tackled the learning curve with more intuitive user interfaces, introduced simpler data modeling and data integration processes, and offered relatively affordable price points for small volumes of data.

Then the world changed! Suddenly, there was an explosive growth of digital data, fueled by the social web, mobile, and other new-age media. Each year, the world started generating more data than it had seen in all of prior human history. Businesses now had access to huge volumes of data, generated from a wide variety of sources and flowing in at amazing velocity. And they wanted their BI tools to analyze all of it, which came to be known as Big Data.

No BI solution was prepared for this revolution, not even the new-age and cloud BI players! And there was a good reason: the technology stacks that existed in the BI world weren’t designed for it. Ever since its inception, BI technology has depended on relational databases (RDBMS) as the underlying storage solution; all innovations were built as layers on top of the RDBMS. Big Data challenged this very storage layer, which found it increasingly difficult to keep pace with the volume, velocity, and variety of Big Data.

It was then that Hadoop came to the rescue. Hadoop emerged as an open-source adaptation of the Big Data storage (GFS) and parallel processing (MapReduce) technologies pioneered by Google. Naturally, Hadoop rose to fame as the tamer of Big Data and earned the (often misinterpreted) accolade of being the single solution to all modern analytics needs on Big Data.

… And this is where DataRPM’s story starts.

The founders of DataRPM come from rich business intelligence, data analytics, data warehousing, data science, and big data backgrounds. We have been closely involved with BI deployments in the past and understand the critical pain points that make CxOs cringe when they hear BI, even today! Gartner, Forrester, and IDC all repeatedly point to the three key pain points of BI: time to market, complexity, and negative ROI. And we know that these pains have only worsened since the advent of Big Data.

This is why we set out on a mission to solve these issues, which we ourselves have experienced on numerous occasions, by making BI:
a)  Plug and Play for any data
b)  Super Fast for ad-hoc analysis
c)  Dead Simple for analysis as well as for deployments
d)  Seamlessly scalable to big data, handling dynamic changes
e)  Affordable…even to small businesses

Using an email client is something any professional is at ease with: just point it to your email server and you are done; it’s all intuitive! We wanted BI to be that simple for any user, not just experts. Our vision was that any BI user should be able to log in from the browser, point to their data sources, and that’s it. After that, users can intuitively analyze any data, of any size, anywhere: slicing, dicing, drilling, and searching from any angle without any code or configuration, all ad-hoc and in real time, with built-in social features for in-place discussions and collaboration around the data.

Hadoop, because of its initial buzz, quite naturally drew our first interest, as handling big data analysis was among its key strengths. As technologists, we love the power Hadoop offers to developers. However, we soon realized it would never deliver the usability, speed, agility, and searchability needed to provide our users with instant analytics. Further, we realized Hadoop is a programmers’ tool. In fact, it has taken analysis further away from people who are not fluent with code: even analysts, let alone business users, have come to depend more on programmers and IT teams for their analysis.

The guiding principle of DataRPM is speed and agility. It was evident to us that Elephants don’t run fast enough, and that we needed the speed and agility of a Cheetah for our requirements.

We evaluated all existing BI technology stacks and concluded that the disruptive solution we wanted to create required a complete rethinking of the stack from the ground up. Traditional databases clearly didn’t meet the flexibility, speed, and scalability requirements, and in-memory storage was expensive to scale.

So, we went back to the drawing board and drew upon our diverse technology experiences. Then we hit our Eureka moment! We had overlooked the original big data technology we all know: search engine technology.

Search engine technology, as leaders like Google, Yahoo, and Microsoft have demonstrated, is a time-tested approach from the consumer web world that has always handled unstructured and big data storage with ease. It also delivers real-time responses to ad-hoc and diverse queries, all on commodity hardware. We have worked extensively on search technologies in the past, and even founded and took a vertical search company to a successful exit. So applying the big data handling capability of search technology to the problems of BI became a clear choice.
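The core idea that makes search-based storage schema-flexible and fast for ad-hoc queries is the inverted index, which maps each term to the records containing it. Here is a minimal toy sketch of that idea in Python (a conceptual illustration only, not DataRPM's actual implementation; all names here are hypothetical):

```python
from collections import defaultdict

# A minimal inverted index: maps each "field:value" term to the set of
# record IDs containing it. Records need no fixed schema up front.
class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> record IDs
        self.records = {}                 # record ID -> original record

    def add(self, rec_id, record):
        self.records[rec_id] = record
        for field, value in record.items():
            self.postings[f"{field}:{value}"].add(rec_id)

    def query(self, *terms):
        # AND query: intersect the posting sets of all terms
        sets = [self.postings[t] for t in terms]
        return set.intersection(*sets) if sets else set()

idx = InvertedIndex()
idx.add(1, {"region": "EMEA", "product": "widget", "revenue": 120})
idx.add(2, {"region": "EMEA", "product": "gadget", "revenue": 80})
idx.add(3, {"region": "APAC", "product": "widget", "revenue": 200})

print(idx.query("region:EMEA", "product:widget"))  # {1}
```

Because each record simply contributes whatever fields it has, adding a record with new fields requires no schema migration, which is what makes ad-hoc slicing and dicing cheap.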

But search technology cannot deliver BI and analytics out of the box. This is where we innovated with our homegrown Analytics Engine, quite aptly code-named the Cheetah. The Cheetah sits on top of the search engine layer and enables real-time statistical computations and aggregations, in-memory, in a parallel and distributed manner.
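Conceptually, an aggregation layer of this kind computes partial statistics per shard and merges them, so the work parallelizes across nodes. The following is a hypothetical in-process sketch of that partial-aggregate pattern (not the Cheetah's actual code; the shard data and field names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Each shard computes a partial (count, sum) over its matched records;
# the partials are then merged into count/sum/mean for the full query.
def partial_agg(records):
    vals = [r["revenue"] for r in records]
    return len(vals), sum(vals)

def aggregate(shards):
    # Run the per-shard aggregation in parallel, as distributed nodes would
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_agg, shards))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return {"count": count, "sum": total, "mean": total / count if count else 0.0}

shards = [
    [{"revenue": 120}, {"revenue": 80}],  # records matched on node A
    [{"revenue": 200}],                   # records matched on node B
]
print(aggregate(shards))
```

The key property is that (count, sum) partials merge associatively, so any number of shards can be combined without a central pass over the raw data.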

The choice of search-based technology lets us deliver real-time responses on data of any size for any ad-hoc query. Add to that the ability to use inexpensive commodity hardware, which is why we can offer such a powerful solution at such low price points to our customers. The search index storage layer also enables us to support flexible schemas and handle dynamic structural changes in the data. Another huge advantage is that it automatically adds a layer of security to our storage: the data doesn’t reside in SQL-queryable storage but in highly fragmented, distributed binary index files that can be accessed and understood only by the Cheetah.

So, does DataRPM have a place for Hadoop in our stack? Of course. We love Hadoop, and we use it where its strengths lie: as our ETL engine behind the scenes, and for data mash-ups across different data sources.
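The mash-up role can be pictured as the classic map/shuffle/reduce pattern, here joining two sources on a shared key. This is a toy, single-process stand-in for what an actual MapReduce job does at scale (the sources and field names are hypothetical):

```python
from collections import defaultdict

# Map: tag each row with its source and emit (key, row) pairs.
def map_phase(source_name, rows, key_field):
    for row in rows:
        yield row[key_field], (source_name, row)

# Shuffle: group all tagged rows by key, as the framework would.
def shuffle(mapped):
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

# Reduce: merge all rows sharing a key into one joined record.
def reduce_join(groups):
    joined = {}
    for key, tagged in groups.items():
        merged = {}
        for _, row in tagged:
            merged.update(row)
        joined[key] = merged
    return joined

crm = [{"cust": "acme", "tier": "gold"}]
sales = [{"cust": "acme", "revenue": 500}]
mapped = list(map_phase("crm", crm, "cust")) + list(map_phase("sales", sales, "cust"))
print(reduce_join(shuffle(mapped)))
```

Since map and reduce touch each row independently, the same shape of job scales out across a cluster, which is exactly the strength Hadoop brings to batch ETL.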

With this innovative technology stack, we have achieved our goal: a truly disruptive and extremely affordable Big Data Instant Analytics platform that actually cures the problems that have plagued the BI world for ages. It further enables anyone to build their own analytics solution in minutes and deploy it in seconds, to their external or internal customers.

And there is a lot more to it. But more on that later!

You must see it to believe it. Contact us today for a live demo!