Examples

This section provides a usage example for each flavor. All modules referenced in the examples below can be found in the examples folder of the GitHub repository.

Models loaded as a pyfunc type generate predictions from a single-row Pandas DataFrame configuration argument. Refer to the API documentation for a description of the supported columns in this configuration DataFrame.
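
As a minimal sketch (the values are hypothetical; each flavor documents its own supported columns), the configuration is a one-row DataFrame whose columns map to keyword arguments of the underlying predict method:

import pandas as pd

# Illustrative single-row configuration DataFrame; each column becomes a
# keyword argument of the wrapped model's predict method.
predict_conf = pd.DataFrame([{"predict_method": "predict", "fh": [1, 2, 3]}])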

Orbit

This example trains an Orbit Bayesian ETS model using the iclaims dataset, which contains weekly initial claims for US unemployment benefits together with a few related Google Trends queries from January 2010 to June 2018.

Installation

pip install mlflavors[orbit]
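
With the dependencies installed, you can take a quick look at the data before training (a sketch):

from orbit.utils.dataset import load_iclaims

# Peek at the sample data: a weekly timestamp column, the claims response,
# and a few Google Trends regressors.
df = load_iclaims()
print(df.head())
print(df.dtypes)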

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from orbit.models import ETS
from orbit.utils.dataset import load_iclaims
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    df = load_iclaims()
    date_col = "week"
    response_col = "claims"

    test_size = 52
    train_df = df[:-test_size]
    test_df = df[-test_size:]

    ets = ETS(
        response_col=response_col,
        date_col=date_col,
        seasonality=52,
        seed=8888,
    )
    ets.fit(df=train_df)

    # Extract parameters
    parameters = {
        k: ets.get_training_meta().get(k)
        for k in [
            "num_of_obs",
            "response_sd",
            "response_mean",
            "training_start",
            "training_end",
            "date_col",
            "response_col",
        ]
    }
    parameters["training_start"] = str(parameters["training_start"])
    parameters["training_end"] = str(parameters["training_end"])

    # Evaluate model
    y_pred = ets.predict(df=test_df, seed=2023)["prediction"]
    y_test = test_df["claims"]

    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Parameters: \n{json.dumps(parameters, indent=2)}")
    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log parameters and metrics
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.orbit.log_model(
        orbit_model=ets,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native orbit flavor and pyfunc flavor
loaded_model = mlflavors.orbit.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.orbit.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = test_df.to_numpy()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_array,
            "X_cols": test_df.columns,
            "X_dtypes": list(test_df.dtypes),
            "decompose": True,
            "store_prediction_array": True,
            "seed": 2023,
        }
    ]
)

# Generate forecasts with native orbit flavor and pyfunc flavor
print(
    f"\nNative orbit 'predict':\n$ \
    {loaded_model.predict(test_df, decompose=True, store_prediction_array=True, seed=2023)}"  # noqa: E501
)
print(f"\nPyfunc 'predict':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from orbit.utils.dataset import load_iclaims

df = load_iclaims()
test_size = 52
test_df = df[-test_size:]

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert DateTime to string for JSON serialization
test_df_pyfunc = test_df.copy()
test_df_pyfunc["week"] = test_df_pyfunc["week"].dt.strftime(
    date_format="%Y-%m-%d %H:%M:%S"
)

# Convert to list for JSON serialization
X_test_list = test_df_pyfunc.to_numpy().tolist()

# Convert column names to a list of strings for JSON serialization
X_cols = list(test_df.columns)

# Convert dtypes to string for JSON serialization
X_dtypes = [str(dtype) for dtype in list(test_df.dtypes)]

predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "X_cols": X_cols,
            "X_dtypes": X_dtypes,
            "decompose": True,
            "store_prediction_array": True,
            "seed": 2023,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation (see the
# payload sketch after this module)
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict':\n${response.json()}")

Sktime

This example trains a Sktime NaiveForecaster model using the Longley dataset, forecasting with exogenous variables.
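
A quick look at the data (a sketch; y is the employment series used as the forecasting target, X holds the macroeconomic exogenous variables):

from sktime.datasets import load_longley

# Inspect the target series and the exogenous regressors.
y, X = load_longley()
print(y.tail())
print(X.tail())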

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute interval forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import (
    mean_absolute_error,
    mean_absolute_percentage_error,
)

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    y, X = load_longley()
    y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3, 4],
    )

    # Extract parameters
    parameters = forecaster.get_params()

    # Evaluate model
    y_pred = forecaster.predict(X=X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Parameters: \n{json.dumps(parameters, indent=2)}")
    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log parameters and metrics
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.sktime.log_model(
        sktime_model=forecaster,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native sktime flavor and pyfunc flavor
loaded_model = mlflavors.sktime.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.sktime.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = X_test.to_numpy()

# Create configuration DataFrame for an interval forecast with nominal coverage
# values [0.9, 0.95], a forecast horizon of 3 periods, and exogenous regressors.
predict_conf = pd.DataFrame(
    [
        {
            "fh": [1, 2, 3],
            "predict_method": "predict_interval",
            "coverage": [0.9, 0.95],
            "X": X_test_array,
        }
    ]
)

# Generate interval forecasts with native sktime flavor and pyfunc flavor
print(
    f"\nNative sktime 'predict_interval':\n$ \
    {loaded_model.predict_interval(fh=[1, 2, 3], X=X_test, coverage=[0.9, 0.95])}"
)
print(f"\nPyfunc 'predict_interval':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Model scoring via REST API requires transforming the configuration DataFrame
# into JSON format. As the numpy ndarray type is not JSON serializable, we need
# to convert the exogenous regressor into a list. The wrapper instance will
# convert the list back to ndarray type as required by the sktime predict
# methods. For more details, read the MLflow deployment API reference
# (https://mlflow.org/docs/latest/models.html#deploy-mlflow-models).
X_test_list = X_test.to_numpy().tolist()
predict_conf = pd.DataFrame(
    [
        {
            "fh": [1, 2, 3],
            "predict_method": "predict_interval",
            "coverage": [0.9, 0.95],
            "X": X_test_list,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict_interval':\n${response.json()}")

StatsForecast

This example trains a StatsForecast AutoARIMA model using the M5 Competition dataset, which contains the daily sales of a product in a Walmart store along with some exogenous regressors.

Installation

pip install datasetsforecast==0.0.8
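
With the dataset dependency installed, the package's load_m5 helper produces the training frame and test split used below (a quick sketch; the frames are assumed to follow StatsForecast's long format with a series id, a date stamp, and the target):

from datasetsforecast.m5 import M5

from mlflavors.utils.data import load_m5

# Download once, then split into training data, future exogenous
# regressors, and the held-out target.
M5.download("./data")
train_df, X_test, Y_test = load_m5("./data")
print(train_df.head())
print(X_test.head())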

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import mlflow
import pandas as pd
from datasetsforecast.m5 import M5
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

import mlflavors
from mlflavors.utils.data import load_m5

ARTIFACT_PATH = "model"
DATA_PATH = "./data"
HORIZON = 28
LEVEL = [90, 95]

with mlflow.start_run() as run:
    M5.download(DATA_PATH)
    train_df, X_test, Y_test = load_m5(DATA_PATH)

    models = [AutoARIMA(season_length=7)]

    sf = StatsForecast(df=train_df, models=models, freq="D", n_jobs=-1)

    sf.fit()

    # Evaluate model
    y_pred = sf.predict(h=HORIZON, X_df=X_test, level=LEVEL)["AutoARIMA"]
    y_test = Y_test["y"]

    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Metrics: \n{metrics}")

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.statsforecast.log_model(
        statsforecast_model=sf,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native statsforecast flavor and pyfunc flavor
loaded_model = mlflavors.statsforecast.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.statsforecast.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = X_test.to_numpy()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_array,
            "X_cols": X_test.columns,
            "X_dtypes": list(X_test.dtypes),
            "h": HORIZON,
            "level": LEVEL,
        }
    ]
)

# Generate forecasts with native statsforecast flavor and pyfunc flavor
print(
    f"\nNative statsforecast 'predict':\n$ \
    {loaded_model.predict(h=HORIZON, X_df=X_test, level=LEVEL)}"  # noqa: E501
)
print(f"\nPyfunc 'predict':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests

from mlflavors.utils.data import load_m5

DATA_PATH = "./data"
HORIZON = 28
LEVEL = [90, 95]
_, X_test, _ = load_m5(DATA_PATH)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert DateTime to string for JSON serialization
X_test_pyfunc = X_test.copy()
X_test_pyfunc["ds"] = X_test_pyfunc["ds"].dt.strftime(date_format="%Y-%m-%d")

# Convert to list for JSON serialization
X_test_list = X_test_pyfunc.to_numpy().tolist()

# Convert column names to a list of strings for JSON serialization
X_cols = list(X_test.columns)

# Convert dtypes to string for JSON serialization
X_dtypes = [str(dtype) for dtype in list(X_test.dtypes)]

predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "X_cols": X_cols,
            "X_dtypes": X_dtypes,
            "h": HORIZON,
            "level": LEVEL,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict':\n${response.json()}")

PyOD

This example trains a PyOD KNN model using a synthetic dataset. The normal data points are drawn from a multivariate Gaussian distribution and the outliers from a uniform distribution.
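
A quick sketch of the generate_data utility used below (the shapes assume its default number of features):

from pyod.utils.data import generate_data

# Ground-truth labels are binary: 0 marks inliers drawn from the Gaussian,
# 1 marks outliers drawn from the uniform distribution.
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, contamination=0.1
)
print(X_train.shape, X_test.shape)  # (200, 2) and (100, 2) by default
print(y_test[:10])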

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the evaluation metrics and the trained model as an artifact) and to compute anomaly scores by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from pyod.models.knn import KNN
from pyod.utils.data import generate_data
from sklearn.metrics import roc_auc_score

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    contamination = 0.1  # percentage of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    X_train, X_test, _, y_test = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination
    )

    # Train kNN detector
    clf = KNN()
    clf.fit(X_train)

    # Evaluate model
    y_test_scores = clf.decision_function(X_test)

    metrics = {
        "roc": roc_auc_score(y_test, y_test_scores),
    }

    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.pyod.log_model(
        pyod_model=clf,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native pyod flavor and pyfunc flavor
loaded_model = mlflavors.pyod.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.pyod.pyfunc.load_model(model_uri=model_uri)

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test,
            "predict_method": "decision_function",
        }
    ]
)

# Generate anomaly scores with native pyod flavor and pyfunc flavor
print(
    f"\nNative pyod 'decision_function':\n$ \
    {loaded_model.decision_function(X_test)}"
)
print(f"\nPyfunc 'decision_function':\n${loaded_pyfunc.predict(predict_conf)[0]}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from pyod.utils.data import generate_data

contamination = 0.1  # percentage of outliers
n_train = 200  # number of training points
n_test = 100  # number of testing points

_, X_test, _, _ = generate_data(
    n_train=n_train, n_test=n_test, contamination=contamination
)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert to list for JSON serialization
X_test_list = X_test.tolist()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "predict_method": "decision_function",
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'decision_function':\n${response.json()}")

SDV

This example trains an SDV SingleTablePreset synthesizer model using a fake dataset. The dataset describes fictional guests staying at a hotel and is available as a single table.
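
You can inspect the demo table and the metadata the synthesizer is fit against (a sketch; the to_dict call assumes SDV's single-table metadata API):

from sdv.datasets.demo import download_demo

# One row per fictional guest; the metadata records the column types SDV
# inferred for the table.
real_data, metadata = download_demo(
    modality="single_table", dataset_name="fake_hotel_guests"
)
print(real_data.head())
print(metadata.to_dict())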

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the evaluation metrics and the trained model as an artifact) and to generate synthetic data by loading the trained model in both the native flavor and the pyfunc flavor:

import mlflow
import pandas as pd
from sdv.datasets.demo import download_demo
from sdv.evaluation.single_table import evaluate_quality
from sdv.lite import SingleTablePreset

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    real_data, metadata = download_demo(
        modality="single_table", dataset_name="fake_hotel_guests"
    )

    # Train synthesizer
    synthesizer = SingleTablePreset(metadata, name="FAST_ML")
    synthesizer.fit(real_data)

    # Evaluate model
    synthetic_data = synthesizer.sample(num_rows=10)
    quality_report = evaluate_quality(
        real_data=real_data, synthetic_data=synthetic_data, metadata=metadata
    )

    metrics = {
        "overall_quality_score": quality_report.get_score(),
    }

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.sdv.log_model(
        sdv_model=synthesizer,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native sdv flavor and pyfunc flavor
loaded_model = mlflavors.sdv.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.sdv.pyfunc.load_model(model_uri=model_uri)

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "modality": "single_table",
            "num_rows": 10,
        }
    ]
)

# Generate synthetic data with native sdv flavor and pyfunc flavor
print(
    f"\nNative sdv sampling:\n$ \
    {loaded_model.sample(num_rows=10)}"
)
print(f"\nPyfunc sampling:\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "modality": "single_table",
            "num_rows": 10,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc sampling:\n${response.json()}")