Deploy a Machine Learning model on AWS Lambda with Zappa

This guide explains how to deploy a simple Machine Learning model as a REST API on AWS Lambda using the helper framework Zappa.

This is the same method that I use to deploy my free-to-use models. In fact, you can try out the model we'll build in this tutorial right now by visiting the following link:

You'll see some data in your browser that looks like this:

  {
    "inputs": {
      "petal_length": "2",
      "petal_width": "3.5",
      "sepal_length": "1",
      "sepal_width": "2"
    },
    "predicted": "setosa"
  }

If you're new to the idea of web applications, serverless functions, or REST APIs, here's a high-level overview of what's happening:

  1. When you visit the URL above, AWS will run a "one-off" piece of code to determine what data to send back. That one-off piece of code is called a Lambda Function, and AWS makes it easy to run any Python code you want.
  2. The code that the Lambda Function is running will parse the data from the end of the URL (i.e. sepal_length=1), and pass that data into a trained Machine Learning model. The model will then use that data to calculate a prediction, in this case setosa.
  3. The Lambda Function then returns that prediction to the web browser that visited the URL, in the JSON format you see above. The whole process happens in just a fraction of a second.

Of course, you don't have to visit the URL in a web browser. You could also use the model's prediction directly in a script.

import requests

url = ''  # the URL of your deployed model
params = {
  'petal_length': 2,
  'petal_width': 3.5,
  'sepal_length': 1,
  'sepal_width': 2
}
resp = requests.get(url, params=params)
print(resp.json()['predicted'])  # -> setosa

So with AWS Lambda, it's easy to publish your Machine Learning models so that they're publicly available, without requiring the user to download the whole model or retrain it from scratch. (You can also see the code I've used on Github).

Using AWS Lambda to deploy your Machine Learning models has a few advantages over hosting your own web server. First, it's usually cheaper, since you only pay for each request that comes in (and you get about 1 million free requests per month). Second, AWS Lambda is highly reliable and highly scalable, so many users can utilize your model at the same time without you having to worry about whether your own server can handle the traffic spikes.

The biggest downside, however, is that Lambda Functions have a 500MB cap on the size of the whole project, including package dependencies like Pandas or SciPy as well as the project's trained models. While 500MB is more than enough for many use cases, you may run into issues if you're using lots of package dependencies or your trained models are very large.
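If you want to see how close you are to that cap, you can measure the size of your installed dependencies on disk. Here's a minimal sketch; the site-packages path at the bottom is an assumption, so adjust it to match your own virtualenv's layout:

```python
import os

def dir_size_mb(path):
    """Return the total size of all files under `path`, in megabytes."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total / (1024 * 1024)

# Example path only; point this at your own virtualenv.
print(f"{dir_size_mb('venv/lib/python3.7/site-packages'):.1f} MB")
```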

Despite that limitation, I've been very happy with how easy it is to use AWS Lambda, so I wanted to create this guide for anyone else looking to quickly and easily make their models deployable over a REST API.


To follow along with this guide, you should be somewhat comfortable with the Python framework Flask, although you definitely don't need to be an expert. You should also be comfortable with the fundamentals of Machine Learning, including how to train a model with SciKit-Learn and use it to make predictions.

You'll also need an environment with Python 3.7 installed and an AWS account. Make sure that you have an AWS Access Key ID and AWS Secret Access Key which have "admin" permissions for your AWS account, and that those keys are stored in your AWS credentials file.

$ cat ~/.aws/credentials

[default]
aws_access_key_id = ...
aws_secret_access_key = ...

Ensuring your credentials are available is important because we'll be using the wonderful Zappa framework for setting up our AWS Lambda Function and some related services.

While it might seem overwhelming to start with a whole new framework on top of using AWS Lambda for the first time, Zappa makes creating and updating your serverless functions a breeze, and is highly recommended. You won't need to use more than two Zappa commands for this tutorial, and it will save hours of time configuring the necessary AWS services.

Additionally, Zappa provides several very useful features on top of AWS Lambda that are especially relevant for deploying Machine Learning models. For example, it keeps your function "warm" to prevent cold starts, so any data loading your models need to do happens only once rather than on every request, enabling your Function to return predictions in milliseconds.
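That warm-up behaviour is controlled from zappa_settings.json. If you ever need to tune it, the relevant keys look something like the fragment below; keep_warm and keep_warm_expression are Zappa settings, but the four-minute schedule shown is just an example value:

```json
{
    "production": {
        "keep_warm": true,
        "keep_warm_expression": "rate(4 minutes)"
    }
}
```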

Best of all, Zappa will automatically store the project's relatively large dependencies in S3 and download them the first time the function needs to run, allowing your project to be up to 500MB in size. Without that feature, you could only store up to 10MB if you deployed your function through AWS Lambda's web interface, which isn't enough space if you need to use a library like Scikit-Learn.

That's enough preamble, so let's get started!

Install dependencies

Create a new directory for this project, and then set up a virtual environment that uses Python 3.7.

mkdir iris_model
cd iris_model
virtualenv venv --python=python3.7

Activate the virtual environment.

source venv/bin/activate

Install the dependencies for this project, and then reactivate the virtual environment so that the new executables installed by Flask and Zappa are available on your path.

pip install zappa scikit-learn flask
source venv/bin/activate

Create a script to train your model

We'll just use SciKit-Learn's iris dataset and decision tree classifier to build a simple classification model. Additionally, we'll save the model using the joblib library that gets installed with SciKit-Learn.

Copy the following into a file in the top level of your project's directory.


import joblib
from sklearn.datasets import load_iris
from sklearn import tree

X, y = load_iris(return_X_y=True)
clf = tree.DecisionTreeClassifier()
model = clf.fit(X, y)
joblib.dump(model, 'iris.joblib')

Run the script to train the model and save it into a file called iris.joblib.


Later, we'll load that saved model using joblib.load.
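As a quick sketch of that round trip, you can confirm that a model restored with joblib.load behaves just like the one you trained:

```python
import joblib
from sklearn import tree
from sklearn.datasets import load_iris

# Train and save a model, as the training script above does.
X, y = load_iris(return_X_y=True)
model = tree.DecisionTreeClassifier().fit(X, y)
joblib.dump(model, 'iris.joblib')

# Later (e.g. inside the Flask app), restore the trained model
# from disk and use it to make a prediction.
restored = joblib.load('iris.joblib')
print(restored.predict(X[:1]))  # -> [0], i.e. setosa
```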

Set up a basic Flask app

We'll create a simple Flask app with a single route that loads the saved model and calls its predict method.

Place the following into a file in the top-level of your project's directory.

from sklearn.datasets import load_iris
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

@app.route('/')
def iris():
    model = joblib.load('iris.joblib')
    data = request.args
    # Convert the data sent in the request into a
    # list containing a single tuple, so that we can
    # pass the data to the sklearn predict function
    inputs = [(
        data['sepal_length'],
        data['sepal_width'],
        data['petal_length'],
        data['petal_width'],
    )]
    # Calculate the predicted value
    predicted_value = model.predict(inputs).astype(float).tolist().pop()
    predicted_label = load_iris()['target_names'][int(predicted_value)]
    return jsonify(inputs=dict(data), predicted=predicted_label)

if __name__ == "__main__":
    app.run()
You can test that the Flask app is working correctly by running the app locally.

FLASK_ENV=development flask run

Now use the following link to test that the Machine Learning model's predictions come back correctly. Note that if you don't include the query params on the URL, you'll get back an error message that says something similar to 400: Bad Request.
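If you'd rather test from Python than a browser, Flask's built-in test client can issue requests without starting a server. This sketch uses a standalone echo route (hypothetical, not part of the app above) just to show how the query params flow through request.args:

```python
from flask import Flask, request, jsonify

demo = Flask(__name__)

@demo.route('/echo')
def echo():
    # Echo the query parameters back, the same way the
    # iris route reads them from request.args.
    return jsonify(inputs=dict(request.args))

# The test client issues requests directly against the app,
# with no server process involved.
client = demo.test_client()
resp = client.get('/echo?petal_length=2&petal_width=3.5')
print(resp.get_json())
```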

Once your project is working correctly in your local environment, it's time to deploy it on AWS Lambda.

Initialize Zappa

As mentioned before, we'll be using Zappa to help us deploy to AWS. Zappa provides the init command, which will configure all of the necessary AWS resources to deploy your function. Those resources include an S3 Bucket, the Lambda Function itself, and also an AWS API Gateway to control traffic coming into your Lambda Function.

zappa init

The command above will prompt you for several pieces of input.

  • Environment: select the name production. Zappa allows you to deploy to different "environments," so that you can have a staging version of your Lambda Function to test any changes before going into production. Since we're only creating one instance of the Lambda Function, we'll call it production.
  • Bucket name: accept the default. This is the bucket where Zappa will store your project's dependencies.
  • Import path to your Flask app: accept the default. Zappa will automatically detect that this is a Flask app, and will also identify the import path to the Flask instance (i.e. app).
  • Global deploy: select No since you won't need that for this tutorial. Using the global deploy will instruct Zappa to configure an AWS Cloudfront distribution, which will incur some small charges.

Once the configuration has finished running, you should have a new file in your directory called zappa_settings.json.  

You'll want to add one configuration value to that file: Ensure that the slim_handler key has the value true.

{
    "production": {
        ...
        "slim_handler": true
    }
}

That setting ensures that your project's dependencies are slimmed down as much as possible before being deployed.

Deploy the application with Zappa

Your application is now ready to be deployed.

zappa deploy production

At the end of the deploy, Zappa will output a URL that you can use to access your application. It should look something like this.

Append the following path to the end of that URL, and see what response you get back.


Congratulations! Your model is now publicly available and deployed for anyone to use.

If you ever want to update the model, you can redeploy your changes with the command zappa update production.