Aug 16, 2023 Rahul Sharma
The modern data stack in 2023 is filled with bespoke tools for each component in the data engineering journey, from the “E(xtract)” to the “L(oad)” to the “T(ransform)”. While tooling in “T” is relatively new (Hello dbt!), “EL” products have enjoyed consistent user adoption over the years, from SSIS to Airbyte.
Courtesy: LakeFS
While the landscape has certainly become a bit overblown (see above), the modern data stack still boils down to plain old ELT. As storage becomes cheaper, enterprises are happy to move historical data into data lakes, preserve it there, and transform it into useful data products for downstream BI and AI.
In this post, we will focus on the “EL” of “ELT” and how different tools come together to accomplish it.
The first step of the modern data stack is to ingest data from different sources (e.g., on-prem SQL Server, third-party APIs, flat files). The ingestion itself can be divided into two steps:
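As a rough illustration of what hand-written ingestion involves, here is a minimal sketch that assumes the two steps correspond to the "E" and "L" of ELT: extract rows from a source, then load them unchanged into a destination. The in-memory SQLite database, the `orders` table, and the `extract` and `load` helpers are all hypothetical stand-ins chosen for illustration, not part of any tool discussed in this post.

```python
import csv
import io
import sqlite3


def extract(conn, query):
    """Extract step: run a query against the source and return column names and rows."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def load(columns, rows, destination):
    """Load step: write the rows as CSV to the destination, with no transformation."""
    writer = csv.writer(destination)
    writer.writerow(columns)
    writer.writerows(rows)
    return len(rows)


# Hypothetical source: an in-memory SQLite database standing in for an on-prem SQL Server.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Hypothetical destination: an in-memory buffer standing in for a file in a data lake.
lake_file = io.StringIO()

columns, rows = extract(source, "SELECT id, amount FROM orders ORDER BY id")
loaded = load(columns, rows, lake_file)
```

Even in this toy form, the code is tied to one source and one destination; every additional system would need its own extract and load logic.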
A developer's first instinct would be to write the ingestion code using the SDKs of the source and destination systems. However, this quickly becomes inefficient for a few reasons:
This created a market for low-code/no-code ingestion tools that let both technical and non-technical users easily ingest data from a multitude of sources. Prominent tools in this market include Fivetran, Airbyte, and Stitch. As of writing, both Airbyte and Fivetran support 350+ connectors (e.g., Azure SQL, AWS S3, Zendesk, Google Sheets) [1][2]. Below is a detailed comparison of the two.