What is Data Engineering
Data engineering is the less famous sibling of data science. Both fields are growing quickly, but data engineering gets far less attention. Compared to existing roles, a data engineer combines software engineering with business intelligence engineering, and adds big data capabilities such as the Hadoop ecosystem, streaming, and computation at scale.
As businesses create more reporting artifacts, the need to collect, clean, and update data in near real-time increases, and so does the day-to-day complexity. This calls for more programmatic skills, similar to those in software engineering. The emerging language at the moment is Python (see The Tool Language, Python), which is used in engineering with tools such as Apache Airflow, Dagster, and other Data Orchestrators, and in data science with its powerful libraries. Today, as a BI engineer, you use SQL for almost everything, except when working with external data, for example from an FTP server; for that, you would use bash and PowerShell in the nightly batch jobs. But this is no longer sufficient. Because developing and maintaining all these requirements and rules (called pipelines) is a full-time job, data engineering is needed.
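To make "pipeline" concrete, here is a minimal sketch of the collect, clean, and update steps in plain Python. It is an illustration under stated assumptions, not a reference implementation: the `RAW_CSV` data, the `orders` table, and the function names `extract`, `clean`, `load`, and `run_pipeline` are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a file fetched from an FTP server.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.90
2,Bob,
3,carol,5.00
"""


def extract(raw_text):
    """Collect: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Clean: trim and title-case names, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # rule: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, records):
    """Update: upsert by primary key so reruns do not create duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, raw_text):
    """Chain the three steps; an orchestrator would schedule this call."""
    load(conn, clean(extract(raw_text)))


conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in target
run_pipeline(conn, RAW_CSV)
run_pipeline(conn, RAW_CSV)  # running twice leaves the table unchanged
result = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(result)
```

In practice, an orchestrator such as Apache Airflow or Dagster would schedule and monitor a function like `run_pipeline`; the point here is only that the rules live in testable code instead of ad hoc batch scripts.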
# Further Reads
- History of Data Engineering, see my book chapter: The History and State of Data Engineering - 📖 Patterns of Data Engineering
- Data Engineering Concepts
- Data Engineering Vault
- Getting Started with DE
- Community and Learning with books, people, glossaries, RSS feeds, whitepapers, blogs, YouTube and Learning Data Engineering
Origin: Data Engineering