So far in our MLOps journey, we have created ML research and model-building pipelines and saved the resulting models in serialized form. Saving models this way lets us load the serialized ML model into an application.
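To make the save-and-load round trip concrete, here is a minimal sketch using Python's standard `pickle` module. The "model" is a hypothetical dictionary of linear parameters, and `save_model`, `load_model`, and `predict` are illustrative names rather than anything from the earlier articles; a real pipeline would serialize whatever model object it trained.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize the model object to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a serialized model. Only unpickle files you trust:
    pickle can execute arbitrary code while loading."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, x):
    """Apply the hypothetical linear model to one input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical trained parameters standing in for a real model.
model = {"slope": 2.0, "intercept": 1.0}

model_path = Path(tempfile.mkdtemp()) / "model.pkl"
save_model(model, model_path)

# Later, possibly in a different application, load and use it.
restored = load_model(model_path)
print(predict(restored, 3.0))
```

The application that loads the file needs the same libraries (and compatible versions) that produced it, which is one reason a container image is a convenient packaging unit.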

We will now take the saved ML model and deploy it to AWS Lambda. Lambda functions come in two flavors: Docker-based and file-based. We will discuss the file-based option only briefly and focus on the Docker-based deployment. I prefer the Docker-based approach simply because its deployment process is easier and smoother.
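Whichever packaging you pick, the Python code that Lambda runs is a function taking `event` and `context`. The sketch below is one possible shape, not the article's actual handler: it assumes the event carries a JSON string under `"body"` (as an API Gateway proxy integration would send), and `MODEL` and `predict` are hypothetical stand-ins for a model that a real function would load from its serialized file at import time.

```python
import json

# Hypothetical parameters. In a real deployment this would be the
# deserialized model, loaded once at import so warm invocations reuse it.
MODEL = {"slope": 2.0, "intercept": 1.0}


def predict(model, x):
    """Apply the hypothetical linear model to one input."""
    return model["slope"] * x + model["intercept"]


def handler(event, context):
    """Lambda entry point: parse the request, run the model, return JSON."""
    try:
        payload = json.loads(event.get("body") or "{}")
        x = float(payload["x"])
    except (KeyError, TypeError, ValueError):
        # Missing field, non-numeric value, or malformed JSON.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected a JSON body with numeric 'x'"}),
        }
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(MODEL, x)}),
    }
```

Because the handler is a plain function, it can be exercised locally by calling `handler({"body": '{"x": 3}'}, None)` before any deployment happens.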

Containers

Docker is a…


Photo by Joshua Newton on Unsplash

In previous articles, we covered the basics of MLOps and set up our orchestrator. Now we will put together our Python applications and their interactions with Databricks Spark.

Why Databricks Spark?

We will be using Databricks Spark as a general platform to run our code. In some cases, we will use Spark to run code across the “cluster”. …


Photo by Gustavo Espíndola on Unsplash

In our first article, we introduced the basics of MLOps. Now we will talk about the core application in our tech stack, Airflow. Airflow will be the central orchestrator for all batch-related tasks.

Swapping technologies

This tech stack is designed for flexibility and scalability, so alternative tooling should slot in without issues. For example, if you wanted to replace Airflow, you could swap it out for another pipelining tool like Dagster or a cloud-specific pipeline tool.

Why Airflow?

Airflow does very well as a batch processing orchestrator because it’s not a GUI, it’s simple to use, and yet it’s very flexible. Airflow…


Photo by Sigmund on Unsplash

MLOps?

MLOps (Machine Learning Operations) is the practice of combining the lessons learned from DevOps for the productionisation of machine learning. Its role is to fill the gap between the data scientist and the machine learning consumers.

Machine Learning? Data Science?

Machine Learning can be understood as the process of applying a set of techniques to a group of data to create a limited “picture of how the world works”, called a model. This process of creating a model is called training your model. Once you have a trained model, you can use that model with new data to better understand the past (Data Mining)…


Photo by Vincentiu Solomon on Unsplash

“Do not collect weapons or practice with weapons beyond what is useful.” Miyamoto Musashi, Dokkodo

“Students of the Ichi school Way of Strategy should train from the start with the (normal) sword and the long sword in either hand. This is a truth: when you sacrifice your life, you must make fullest use of your weaponry. It is false not to do so, and to die with a weapon yet undrawn.” Miyamoto Musashi, The Book of Five Rings

“Absorb what is useful, discard what is useless and add what is specifically your own” Bruce Lee

The Lakehouse

Databricks introduced the Lakehouse to…


When working on multiple Python projects, it's common to run into issues with Python versioning and package management. I am going to introduce two projects to help you tackle these common issues. I’m not going to talk about the Conda project, simply because in my experience 90% of the time you run into significant issues with Conda and pip resolving package dependencies. My approach here should work 100% of the time and allow you to control your Python environment fully.

Pyenv

Pyenv is a project you can use to control the Python version. More often than not issues can arise…


Photo by Fikri Rasyid on Unsplash

When starting a new project, it's a good idea to evaluate your data storage needs. I’m going to shy away from the term database and use the term data store instead, because labels are often loaded with baggage that will distract us. Before we begin, I will warn you that I’m going to be rather broad with how I define many concepts.

Before we start talking about data stores, let's first discuss a few general concepts.

OLTP

Online Transaction Processing

An OLTP system typically focuses on small transactions, where each transaction is a unit of work treated as a single whole. You…
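As a small illustration of "a unit of work treated as one unit", the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the names in it, and the `transfer` helper are all hypothetical; the point is only that both updates inside a transfer either commit together or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer credits `bob` before the debit on `alice` fails, yet the final balances are unchanged because the whole unit of work is rolled back.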


History

Data Engineering is a relatively new concept, although the skills have been around for some time. If you Google around you will find that the skills, tools, and job responsibilities will vary significantly. My approach is a broad, modern approach to the data engineering role. Many hyperspecialized roles also exist such as Data Warehouse Developer or Big Data Developer. Although those are key components of a modern data engineer, they are but pieces of a larger picture.

Philosophy

My modern data engineering philosophy has 6 pillars:

  • Open standards over closed proprietary tools and languages:

Whenever engineering a solution, following open standards…


Photo by Phil Hearing on Unsplash

If you believe it, they believe it.

267th Ferengi Rule of Acquisition

All war is deception

Sun Tzu

The dangers of testing

Let's face it, testing software can be hard. Even with the best intentions, our tests can easily break. This phenomenon is called brittle unit tests.

Brittle unit tests

Unit testing has a very bad reputation, and I believe one issue so many very talented developers have with unit testing is that it's very easy to write brittle, tightly coupled tests.

My current philosophy on testing is to focus on requirements and test what the code accomplishes, not how it gets there. What I believe…
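A minimal sketch of that idea, using a hypothetical `dedupe` function: the tests below assert only on the result the requirement describes (duplicates removed, first-seen order kept), so they keep passing if the implementation is later rewritten. A brittle version would instead assert that a particular internal helper was called.

```python
def dedupe(items):
    """Return items with duplicates removed, keeping first-seen order."""
    # Implementation detail: dicts preserve insertion order, so the keys
    # come back in first-seen order. The tests do not depend on this choice.
    return list(dict.fromkeys(items))


def test_dedupe_keeps_first_occurrence_order():
    # Checks what the code accomplishes, not how it gets there.
    assert dedupe([3, 1, 3, 2, 1]) == [3, 1, 2]


def test_dedupe_handles_empty_input():
    assert dedupe([]) == []


def test_dedupe_leaves_unique_input_unchanged():
    assert dedupe(["a", "b", "c"]) == ["a", "b", "c"]
```

These functions follow the naming convention that pytest discovers automatically, although any test runner would do.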


Photo by Joshua Sortino on Unsplash

Note: I have avoided discussing the many possible Spark options available on the market and am focusing instead on Databricks, because they offer a very good, easy-to-use product and they are vendor-neutral. In the article, I will refer to Spark on Databricks simply as Spark.

TLDR

Snowflake is a vendor-neutral, easy-to-use, high-concurrency data warehouse in the cloud. Both products follow the adage that you pay for what you use. Snowflake is really a modern iteration of the classic data warehouse. Your Snowflake data and resources will live as a tenant…

Brian Lipp
