What is Data Engineering

By Simon Späti · 2 min read

Data engineering is the less famous sibling of data science. Data science is growing like there is no tomorrow, and so is data engineering, but it is much less talked about. Compared to existing roles, a data engineer is a software engineer plus a business intelligence engineer, with added big data skills across the Hadoop ecosystem, streaming, and computation at scale.

As businesses create more reporting artifacts, the need to collect, clean, and update data in near real time is increasing, which drives up day-to-day complexity. As a result, more programmatic skills are required, similar to those in software engineering. The emerging language at the moment is Python (more in The Tool Language, Python), which is used in engineering with tools such as Apache Airflow, Dagster, and other Data Orchestrators, and in data science with its powerful libraries. Today, as a BI engineer, you use SQL for almost everything, except when working with external data, for example from an FTP server; there you would use bash and PowerShell in nightly batch jobs. But this is no longer sufficient, and because developing and maintaining all these requirements and rules (called pipelines) is a full-time job, data engineering is needed.
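To make the idea of a pipeline concrete, here is a minimal sketch in Python of the collect, clean, and update steps described above. Everything in it is an assumption made for illustration: the `RAW_CSV` sample stands in for a file you might fetch from an FTP server, the `orders` table and the function names are hypothetical, and an in-memory SQLite database replaces a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a file fetched from an FTP server.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.90
2,Bob,
3, Carol ,5.00
"""


def extract(raw_text):
    """Collect: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Clean: trim whitespace and drop rows that have no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # illustrative rule: skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].strip(), float(amount)))
    return cleaned


def load(conn, records):
    """Update: upsert records so re-running the job refreshes existing rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def run_pipeline(conn, raw_text):
    """Run all three steps and return the resulting table contents."""
    load(conn, clean(extract(raw_text)))
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection, RAW_CSV))
```

In practice, an orchestrator such as Apache Airflow or Dagster would schedule and monitor steps like these instead of a nightly bash or PowerShell script. The upsert in `load` is one possible design choice that keeps the job safe to re-run.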

# Further Reads


Origin: Data Engineering