Data governance encompasses a set of processes, roles, policies, standards, and metrics that are vital for the effective and efficient utilization of information in achieving an organization’s goals. It lays out the framework for ensuring data quality and data security within a business or organization. Essentially, data governance outlines who is authorized to perform specific actions on various data sets, under particular circumstances, and by which methods.
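As a rough illustration of the "who, which action, which data set, under which circumstances, by which method" framing above, a governance policy can be modeled as a set of explicitly allowed combinations. This is only a minimal sketch: the `Policy` class, the `is_allowed` helper, and every role, data set, purpose, and method name below are hypothetical and not taken from any particular governance product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One allowed combination (all field values here are hypothetical)."""
    role: str      # who is authorized
    action: str    # which action they may perform
    dataset: str   # on which data set
    purpose: str   # under which circumstances
    method: str    # by which method


# Illustrative allow-list; a real organization would manage this elsewhere.
POLICIES = {
    Policy("analyst", "read", "sales_orders", "reporting", "sql_warehouse"),
    Policy("data_engineer", "write", "sales_orders", "pipeline_run", "etl_job"),
}


def is_allowed(role: str, action: str, dataset: str, purpose: str, method: str) -> bool:
    """Return True only if this exact combination is explicitly allowed."""
    return Policy(role, action, dataset, purpose, method) in POLICIES


analyst_can_read = is_allowed("analyst", "read", "sales_orders", "reporting", "sql_warehouse")
analyst_can_write = is_allowed("analyst", "write", "sales_orders", "reporting", "sql_warehouse")
```

The sketch denies by default: any combination not listed is rejected, which is one common design choice for access rules, not the only one.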
From Data & AI Landscape 2021:
The importance of tracking data across different repositories and pipelines is escalating, particularly for troubleshooting, compliance, and data governance purposes. This underscores the necessity for data lineage. The industry is preparing for these demands with initiatives like OpenLineage, started by Datakin, a cross-industry effort to standardize data lineage collection. For more insights, check out my Fireside Chat with Julien Le Dem, CTO of Datakin*, a key figure in the OpenLineage initiative.
The area of data access and governance is another critical component of DataOps (in its broadest sense) witnessing significant growth. Established startups like Collibra and Alation have offered Data Catalog solutions for some time, essentially providing a comprehensive inventory of data to help data analysts locate the data they need. Recently, new players such as Atlan and Stemma, the company behind the open-source data catalog Amundsen (originally developed at Lyft), have entered the market.
Data governance becomes more critical as the volume of stored data grows, because an organization then needs a comprehensive overview of what data it holds, what time frame that data covers, and how good its quality is.
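The data lineage idea from the quoted passage, tracking data across repositories and pipelines for troubleshooting, can be sketched as a small dependency graph. This is a simplified illustration under stated assumptions: it does not use the OpenLineage specification or any real client library, and the `LineageGraph` class, the job names, and the data set names are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: remembers which inputs each output was built from."""

    def __init__(self):
        # output data set -> set of (input data set, job that produced the output)
        self._upstream = defaultdict(set)

    def record_run(self, job, inputs, outputs):
        """Record that one run of `job` read `inputs` and wrote `outputs`."""
        for out in outputs:
            for inp in inputs:
                self._upstream[out].add((inp, job))

    def upstream_of(self, dataset):
        """Return every data set that `dataset` depends on, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries for source data sets
            for parent, _job in self._upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical pipeline: raw orders -> staging -> revenue report.
graph = LineageGraph()
graph.record_run("ingest_orders", ["raw.orders"], ["staging.orders"])
graph.record_run("build_report", ["staging.orders", "staging.customers"], ["reports.revenue"])

report_sources = graph.upstream_of("reports.revenue")
```

If the revenue report looks wrong, `report_sources` lists every upstream data set worth inspecting, which is the troubleshooting use of lineage that the passage describes.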