Why replace Traditional OLAP Cubes?
In my job as a business intelligence specialist and data engineer, I was always using OLAP-Technologies for quick ad-hoc analysis with tools like Tableau, TARGIT, Power BI, Excel, etc. As the Microsoft BI stack is still used widely in small to large companies, the technology for it is mostly SQL Server Analysis Services (SSAS) or Microsoft Cubes. These are a compelling and fast way to aggregate and pre-calculate all of your corporate-related data and make it available to your operational business users. Nevertheless, these cubes are getting more and more to their limits, as you might have experienced in your own company. There are several problems I see or encountered:
- The multi-dimensional model of Analysis Services is not supported in the Azure Cloud and also does not look like it will soon (there is the tabular model of Microsoft Cubes, but they still have some limitations).
- There is a limit to the size of data it can process fast. It was developed a long time ago it is not optimized for the vast amount of data nowadays.
- Does not fit in the open-source and big data ecosystem due to:
- It is not distributed and cannot be parallelized to any new modern technologies (containers, clusters, etc.).
- It uses Microsoft Visual Studio to add or edit the cubes, which is hard to source-control and makes it impossible to modify for any other data-savvy person apart from the BI-Specialist.
- The query language MDX is considerably hard to understand and difficult to write more complex queries.
Therefore, I was researching alternatives on the World Wide Web. But before I go into more dept what I found, I want that you look at the architecture as a whole. One reason is that nowadays countless people wish to make use of data, that’s why you might want to make it available to all of them or at least have a more open architecture. In my opinion, Data Lake is the buzzword for it where everyone easily can access data instead of BI-Specialists only or worse, infrastructure guys having access to the FTP-Server.
To illustrate this I like the architecture from [DWBI1], which shows the architecture as generic as possible:
Generic big data architecture
What you also see is the difference between corporate data that traditionally goes into the Data Warehouse (DWH) and the real-time streaming data like social media, and IoT that goes into the data lake. As Data Lakes makes it easier to access data for data-savvy persons, you should always keep in mind where you put which data. If you want to know more about that topic, read my other blog post about [DataWarehouse vs Data Lake].
Back to the OLAP Cubes which would be on top of the Data Warehouse in the architecture above as In-Memory-Models or Semantic Layer. You understand OLAP is still used for essential tasks such as Data Exploration, Analytics, and Self-Service BI. I would say that Data Lakes do not have enough speed for any of these needs where speed and fast response time is critical in most organizations. On the other hand, Data Lakes are very beneficial for Data Scientists because they are interested in all the data without seconds of query-response time. Therefore a Data Scientist can explore, analyze, and run the machine learning models on top of the Data Lake very conveniently.
But what is coming up next now after, e.g. Microsoft Analysis Services? In the following chapter, I show you the possible different technologies to replace it with modern, scalable and fast replacements.