As I scroll through social media – LinkedIn, X, Reddit, and even Facebook – I see many posts and advertisements for various solutions in the data space. Some are for entirely new data platforms such as Snowflake, Databricks, and BigQuery. Others are for ETL/ELT tools offering to solve an age-old problem: collecting data in one place. Still others are new flavors of tools many veteran data engineers were early adopters of – SQL Server, Oracle, PostgreSQL, and others.
The tools that haven’t changed in my 20 years in the data and technology space aren’t the database systems, the cloud options, or even the ETL/ELT tools. The tools that haven’t changed are the foundational concepts every data engineering team needs to apply to ANY solution for it to be successful. The emergence of exciting new cloud-based technology has drawn many of us into chasing the shiny new tool because we see a demo that solves our current problem effortlessly – or at least that’s what the sales guys tell us.
In the next few sections, I’ll discuss several “tools” that are still critical in today’s data engineering environment but seem to get lost at times in the whirlwind of emerging tech.
Begin With The End In Mind
In his book The 7 Habits of Highly Effective People, Stephen Covey lists “begin with the end in mind” as one of those habits. Applied to data engineering, this simply means that before a single database, table, or ingestion pipeline is coded, take the time to determine what the final data solution should be. The “end” in this case doesn’t need to be a full ERD with all the tables and primary keys established; however, it should be at least a rough idea of the end solution.
Asking the basic questions we all learned in elementary school can help:
- Who is the end consumer of our data?
- Where will the data live?
- What is the footprint of the initial data set?
You get the idea. Spending some time up front to sketch a rough idea of the solution gives the project shape and a north star to reference as work begins.
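That sketch can be as simple as a few draft CREATE TABLE statements with the answers to those questions jotted alongside. Here’s a minimal, hypothetical example in SQL – the schema, tables, and columns are all invented placeholders, not a finished design:

```sql
-- Rough sketch of the "end": a reporting layer for the sales team.
-- All names here are hypothetical placeholders, not a final design.

-- Who:   sales analysts pulling a daily dashboard
-- Where: a reporting schema in the warehouse
-- What:  two source systems; one dimension and one fact to start

CREATE TABLE dim_customer (
    customer_id   INT,           -- key strategy TBD
    customer_name VARCHAR(200),
    region        VARCHAR(50)
);

CREATE TABLE fact_orders (
    order_id      INT,
    customer_id   INT,           -- will reference dim_customer
    order_date    DATE,
    order_total   DECIMAL(12, 2)
);
```

Nothing here is binding – the point is that the team now knows roughly what it is building toward before a single pipeline is coded.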
Rely On Your Expertise
Know your area and tool(s) of expertise and rely on them. With so many different tools and platforms coming to the marketplace daily, it is easy to get caught up in the shiny and new. I’ve been there recently – Snowflake did that to me. It’s everywhere on my social feeds, so I took the plunge, earned my SnowPro Core Certification, and joined the Certification Subject Matter Expert (SME) team. It’s very cool technology; however, it is still new, and I’m still learning.
I am certainly not saying that learning new technology is detrimental; I am saying that constantly changing your team’s technology approach will hurt the team’s ability to deliver. Isn’t embracing new technology a good thing? Yes – but are you suddenly as expert in that technology as you are in your established data tools? Probably not. Knowing a particular toolset, and knowing it well, goes a long way for data engineering teams trying to solve complex data problems. By sticking with your expertise, you spend your time solving the problem with the data, not tracking down whether the limitation is the data or the technology.
Documentation
Ugh! Really? Documentation? Yes, every developer hates it. I certainly do. But even some rough comments in code, a Jira ticket, a wiki page, or a design document will go a long way. Your “future self” will thank you.
I’m a drummer and have played for several band leaders over the years. One of them had a saying that applies here: “A short pencil is always better than a long memory.” For data engineering, that means even rough documentation is better than relying on tribal knowledge of how a process came together. When issues arise in the future (and they will), even a rough outline of what a process is doing or how data flows through the system helps track down a bug.
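To show what a “short pencil” can look like in code, here’s a minimal, hypothetical SQL procedure – the procedure, table, column, and ticket names are all invented for illustration. The header block is rough, but it tells a future debugger what the process does, where the data comes from, and where to find more detail:

```sql
CREATE PROCEDURE load_fact_orders
AS
BEGIN
    /*
      Purpose : Nightly load of fact_orders from staging.
      Source  : staging.orders_raw, landed by the ingestion pipeline.
      Ticket  : DATA-123 (hypothetical) holds the full requirements.
      Gotcha  : staging can contain duplicate order_ids; keep only
                the most recently loaded row per order.
    */
    INSERT INTO fact_orders (order_id, customer_id, order_date, order_total)
    SELECT order_id, customer_id, order_date, order_total
    FROM (
        SELECT order_id, customer_id, order_date, order_total,
               ROW_NUMBER() OVER (PARTITION BY order_id
                                  ORDER BY loaded_at DESC) AS rn
        FROM staging.orders_raw
    ) AS latest
    WHERE rn = 1;
END;
```

Ten lines of comments, written while the logic was fresh, can save hours of reverse engineering when the load breaks a year from now.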
It doesn’t have to be pretty out of the gate. I’ve applied this recently with my team. Our documentation tool of choice is Confluence, which provides a really simple WYSIWYG wiki interface – but the part I like best is how easy it is to edit, update, or even move entire documents within the space. Simply “get it in there” and make it pretty as time permits.
Wrapping Up
Not all data engineering work is the fun stuff like coding procedures, building ETL/ELT flows, or learning new data technology. Some of the most “boring” and basic work, like planning and documentation, continues to pay dividends in the long run.
The prep work should be proportional to the overall task at hand. Spending weeks planning for a project that ends up being a single database with five tables and a few stored procs is likely overkill. But the prep work should still occur so the team knows where to go.
Don’t sacrifice the basics for the promise of a magic bullet.
Follow me on LinkedIn and Medium for more content on Data Management and demos including Snowflake, Streamlit, and SQL Server.