How data teams struggle to build an Analytics API
Cloud architecture is more complex than ever, especially with the latest explosion of tools and technology. Today, every data team wants data to be readily available to decision-makers in the company. Whether a Data Analyst, Product Manager, Data Scientist, or Business Analyst approaches them, it’s hard to provide a single interface that abstracts all the heterogeneous data stores away and lets them query all the data. On top of that, new principles and architectures are picking up old ideas, for example decentralised data products in a Data Mesh versus a centralised cloud data warehouse.
Xavier Gumara Rigol from Adevinta says that each dataset should have at least two interfaces: SQL for fast access, and programmatic access via notebooks if more complex processing is needed.
On the other hand, if you have a single Postgres database or another simplified architecture, it probably doesn’t make sense to build and route everything through an Analytics API. Let’s have a look at the different data team stakeholders and what they struggle with today:
- Machine Learning folks want an API to experiment with particular data within a Jupyter Notebook (see the sketch after this list).
- Business Intelligence users need to report how the company is doing with their dashboard tool of choice. They need a SQL connector. Response time must be within seconds, as they want to slice and dice in real-time and demo the numbers in meetings. If possible, company-wide KPIs are already precalculated and ready to use.
- Power users want to update and fix some incorrect data. They need an interface or clear documentation on how to do that. More importantly, whether they are doing it on a data lake, an OLAP cube, configs, etc. shouldn’t matter.
- Internal applications and pipelines apply the product logic with different requirements: ingesting new data, fixing invalid states, automatic maintenance such as compacting massive data sources, or implementing complex business logic.
- External customers want to extract data for their data warehouse.
- Managers want to see the overall numbers at a glance.
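For illustration, here is a minimal sketch of the notebook use case from the list above. The endpoint URL, the `orders` field, and its arguments are hypothetical placeholders, not something the original article specifies:

```python
# Minimal sketch: querying a (hypothetical) Analytics API from a Jupyter Notebook.
# The endpoint and the `orders` field are made-up placeholders.
import requests
import pandas as pd

GRAPHQL_ENDPOINT = "https://analytics-api.example.com/graphql"  # hypothetical

query = """
query RecentOrders($since: String!) {
  orders(since: $since) {
    orderId
    customerId
    amount
    createdAt
  }
}
"""

response = requests.post(
    GRAPHQL_ENDPOINT,
    json={"query": query, "variables": {"since": "2022-01-01"}},
    timeout=30,
)
response.raise_for_status()

# Load the result straight into a DataFrame for experimentation.
df = pd.DataFrame(response.json()["data"]["orders"])
print(df.head())
```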
As these stakeholders have different use cases and skills, it is tough to support them all. A standardised GraphQL interface, validated on the spot and with documentation built in, is the best approach we have today. It is also a chance to make updates consistent and safe, instead of giving people direct access :fire_engine:.
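As a hedged sketch of what such a standardised interface could look like, here is a minimal schema using the Python graphene library; the `kpi` field and the `fixRecord` mutation are illustrative assumptions, not something the article prescribes:

```python
# Minimal sketch of a standardised GraphQL interface using graphene (assumed library).
# Field names and types are illustrative, not from the original article.
import graphene


class Kpi(graphene.ObjectType):
    """A precalculated company-wide KPI."""
    name = graphene.String(description="KPI identifier, e.g. 'revenue'.")
    value = graphene.Float(description="Latest precalculated value.")


class Query(graphene.ObjectType):
    kpi = graphene.Field(
        Kpi,
        name=graphene.String(required=True),
        description="Look up a single precalculated KPI.",
    )

    def resolve_kpi(root, info, name):
        # In reality this would hit the warehouse, OLAP cube, or cache.
        return Kpi(name=name, value=42.0)


class FixRecord(graphene.Mutation):
    """Consistent, validated update instead of direct write access."""
    class Arguments:
        record_id = graphene.ID(required=True)
        amount = graphene.Float(required=True)

    ok = graphene.Boolean()

    def mutate(root, info, record_id, amount):
        # Validation and routing to the underlying store would live here.
        return FixRecord(ok=True)


class Mutation(graphene.ObjectType):
    fix_record = FixRecord.Field()


# The schema carries types and descriptions, so clients can introspect it.
schema = graphene.Schema(query=Query, mutation=Mutation)

if __name__ == "__main__":
    result = schema.execute('{ kpi(name: "revenue") { name value } }')
    print(result.data)
```

Because the schema is typed and introspectable, this is where the on-the-spot validation and built-in documentation come from: every client sees the same contract, regardless of which store sits behind it.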
Authorisation and authentication are also noteworthy: instead of creating new groups and users in every system, it’s essential to implement them once. That is very hard if you do not have such an API. Of course, you could integrate your identity and access management solution, but baking it into the central API with GraphQL is a pragmatic and elegant way.
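A minimal sketch of that baked-in authorisation, again with graphene; the context layout (a `user` dict with a `roles` list) and the role name are assumptions for illustration, not part of the original article:

```python
# Sketch: authorisation enforced once, inside the central GraphQL API.
# The context layout ("user" with a "roles" list) is an assumption for illustration.
import graphene


class SalesNumbers(graphene.ObjectType):
    month = graphene.String()
    revenue = graphene.Float()


class Query(graphene.ObjectType):
    sales = graphene.List(
        SalesNumbers, description="Monthly sales, restricted to analysts."
    )

    def resolve_sales(root, info):
        user = (info.context or {}).get("user")
        if not user or "analyst" not in user.get("roles", []):
            raise PermissionError("Not authorised to read sales numbers.")
        # Only after the check do we touch the underlying data stores.
        return [SalesNumbers(month="2022-01", revenue=1000.0)]


schema = graphene.Schema(query=Query)

# The web framework would normally build this context from a verified token.
result = schema.execute(
    "{ sales { month revenue } }",
    context={"user": {"roles": ["analyst"]}},
)
print(result.data)
```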
Origin: Building an Analytics API with GraphQL: The Next Level of Data Engineering? | ssp.sh
References: Analytics API
Last Modified: 2022-02-19
Created: 2022-02-19