Best Practices to Become a Data Engineer

Steps to go from doing data analysis to ingesting and cleaning data in order to get better insights.

Ralph Brooks
4 min read · May 7, 2021
Image courtesy of pexels.com

Q: I come from a business intelligence background, and I’m looking to make the transition to data engineering. How do I go about doing this? Do I need to learn NumPy? Do I need to learn Pandas? What are the key concepts that I need to understand in order to ramp up on data engineering quickly?

If you’re doing business intelligence, maybe you’re working with data visualization tools such as Power BI or Tableau. Either way, you’re doing a lot of analysis. At some point, you will be ready to ingest new data so that you can derive richer, deeper insights — you’re ready to start the journey to become a data engineer.

AUTHOR NOTE:

See a full copy of the article at https://www.whiteowleducation.com/best-practices-to-become-a-data-engineer/

Master the basics first.

The following are the six main things that you need to do in order to get to the next level in your career as a future data engineer:

Step 1: Learn the Basics of SQL

If you have never done ANY programming before, then the first place to start is learning to sift through data. You do this by learning a language called SQL (Structured Query Language). Among other things, SQL lets you pull relevant information out of TABLES with SELECT statements and filter the rows that come back with WHERE clauses.
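To make SELECT and WHERE concrete, here is a minimal sketch that runs SQL from Python using the built-in sqlite3 module and an in-memory database. The `orders` table, its columns, and the `high_value_orders` function are made up for illustration; they are not from any of the resources listed below, and SQLite is only a stand-in for whatever database you end up using.

```python
import sqlite3


def high_value_orders(min_amount):
    """Return (customer, amount) rows whose amount is at least min_amount."""
    # Hypothetical table and data, created in memory just for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("Ana", 120.0), ("Ben", 35.5), ("Cy", 80.0)],
    )
    # SELECT picks the columns; WHERE keeps only the rows that match.
    rows = conn.execute(
        "SELECT customer, amount FROM orders "
        "WHERE amount >= ? ORDER BY amount DESC",
        (min_amount,),
    ).fetchall()
    conn.close()
    return rows


print(high_value_orders(50))
```

Changing the value passed to `high_value_orders` changes which rows survive the WHERE filter, which is a quick way to build intuition before moving on to larger datasets.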

GOOD RESOURCES TO GET STARTED WITH SQL

  1. khanacademy.org — Khan Academy has a good set of videos that goes through the basics of SQL. The videos cover how to select data, and how to join data together.
  2. Head First SQL — When I was first starting out, I definitely looked at one or two Head First books published by O’Reilly. Head First covers the basics of a topic while focusing on different ways to engage the brain so that you learn the material quickly.
  3. SQL Cookbook — SQL Cookbook gives step-by-step instructions on ways to look at data (“recipes”) and different ways to think about how to analyze data. The book helps you form an intuition about how to approach data analysis.
  4. Google Cloud Reference Documentation for BigQuery

I am a big fan of jumping into the deep end of the pool and learning how to swim quickly. Frankly, there’s no better way to do this than Standard SQL with BigQuery in Google Cloud.

If you start your SQL journey with BigQuery, then you are learning about Google Cloud technology while you are learning data analysis. If you are going this route, then a good place to start is the BigQuery for Data Warehousing tutorial.

When you learn this Google version of SQL (Standard SQL), you will not only learn how to analyze data, but you will also learn how to make predictions on data. For example, you could use this flavor of SQL to predict sales with a linear regression. As another example, you could also use this language to do a basic prediction as to whether or not a customer will make a transaction with a logistic regression.
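To show what "predict sales with a linear regression" means, here is a small sketch in plain Python. It is not BigQuery syntax and it does not use any Google Cloud API; it only illustrates the underlying idea of fitting a straight line to past data and using it to predict a new value. The `fit_line` function and the ad-spend and sales numbers are invented for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: ad spend and the sales that followed.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 5.0 + intercept  # prediction for a spend of 5.0
print(slope, intercept, predicted_sales)
```

A logistic regression follows the same fit-then-predict pattern, except that the output is the probability of a yes/no outcome (such as whether a customer makes a transaction) rather than a number like sales.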

Step 2: Learn the Basics of Python

Python (the programming language) has nothing to do with a snake which has the same name (Image courtesy of pexels.com)

If you want to get to the next level in your career, it is almost essential to learn a programming language. Personally, I would recommend learning Python. It is relatively straightforward to learn, and you can use it to process streaming data with data pipelines, to analyze data with Jupyter notebooks, and to build artificial intelligence models. For me, it is one language that does a lot, and it is super flexible.
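As a taste of what ingesting and cleaning data looks like in Python, here is a minimal sketch of a three-stage pipeline built from ordinary functions and generators. The stage names (`parse`, `clean`, `total_by_name`) and the sample records are hypothetical; real pipelines usually rely on dedicated tools, but the shape (read, clean, aggregate) is the same.

```python
def parse(lines):
    """Split 'name, amount' lines into pairs, skipping malformed lines."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2:
            yield parts[0], parts[1]


def clean(records):
    """Drop records with a missing name or a non-numeric amount."""
    for name, amount in records:
        try:
            value = float(amount)
        except ValueError:
            continue  # e.g. 'n/a' cannot be converted to a number
        if name:
            yield name.title(), value  # normalize capitalization


def total_by_name(records):
    """Sum the cleaned amounts per name."""
    totals = {}
    for name, value in records:
        totals[name] = totals.get(name, 0.0) + value
    return totals


# Hypothetical raw input with the kinds of problems real data has.
raw = ["ana, 120", "ben, n/a", " , 10", "ANA, 30", "broken line"]
totals = total_by_name(clean(parse(raw)))
print(totals)
```

Because each stage is a generator, records flow through one at a time, which is the same idea that lets larger pipelines handle data that does not fit in memory.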

These are some top resources to learn Python, but they are not all free resources. In some cases, you “get what you pay for”, and paying for something may save you a lot of time in the long run.

BEST RESOURCES TO GET STARTED WITH PYTHON

  • Learn Python 3 the Hard Way — White Owl Education has no affiliation with the author of “Learn Python 3 the Hard Way”, but when I was learning Python this one was one of the books that I used in order to ramp up. One note here — The book is very specific about what editor to use and how to get through the class. I would follow the instructions in the book to the letter without deviation.
  • Official Python Tutorial — The official Python tutorial (which is part of the reference documentation) is actually pretty good, but it doesn’t endorse any particular software for writing Python. Because of this, you’re still better off using a book like “Learn Python 3 the Hard Way” before jumping into the official tutorial.
  • Official Python Documentation — Think of the official Python Documentation like a dictionary. You’re not going to read this front to back, but it is definitely a reference that you may want to use from time to time.

Take a look at https://www.whiteowleducation.com/best-practices-to-become-a-data-engineer/ for the remainder of the article, which covers the following:

Step 3: Learn How to Navigate Code Quickly

Step 4: Learn the Basics of NumPy

Step 5: Learn the Basics of Pandas

Step 6: Learn How to Build Data Pipelines

Ralph Brooks

I am the CEO of White Owl Education. Our company just released a course on 3D data visualization. Details at https://www.whiteowleducation.com/