Examples

This section provides a usage example for each flavor. All modules referenced in the examples below can be found in the examples folder of the GitHub repository.

Models loaded as a pyfunc type generate predictions from a single-row Pandas DataFrame configuration argument. Refer to the API documentation for a description of the supported columns in this configuration DataFrame.
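
As a minimal sketch (the values are hypothetical; each flavor documents its own supported columns), the configuration is a one-row DataFrame whose columns map to keyword arguments of the underlying predict method:

import pandas as pd

# Illustrative single-row configuration DataFrame; each column becomes a
# keyword argument of the wrapped model's predict method.
predict_conf = pd.DataFrame([{"predict_method": "predict", "fh": [1, 2, 3]}])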

Orbit

This example trains an Orbit Bayesian ETS model using the iclaims dataset, which contains weekly initial claims for US unemployment benefits together with a few related Google Trends queries from January 2010 to June 2018.

Installation

pip install mlflavors[orbit]
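
With the dependencies installed, you can take a quick look at the data before training (a sketch):

from orbit.utils.dataset import load_iclaims

# Peek at the sample data: a weekly timestamp column, the claims response,
# and a few Google Trends regressors.
df = load_iclaims()
print(df.head())
print(df.dtypes)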

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from orbit.models import ETS
from orbit.utils.dataset import load_iclaims
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    df = load_iclaims()
    date_col = "week"
    response_col = "claims"

    test_size = 52
    train_df = df[:-test_size]
    test_df = df[-test_size:]

    ets = ETS(
        response_col=response_col,
        date_col=date_col,
        seasonality=52,
        seed=8888,
    )
    ets.fit(df=train_df)

    # Extract parameters
    parameters = {
        k: ets.get_training_meta().get(k)
        for k in [
            "num_of_obs",
            "response_sd",
            "response_mean",
            "training_start",
            "training_end",
            "date_col",
            "response_col",
        ]
    }
    parameters["training_start"] = str(parameters["training_start"])
    parameters["training_end"] = str(parameters["training_end"])

    # Evaluate model
    y_pred = ets.predict(df=test_df, seed=2023)["prediction"]
    y_test = test_df["claims"]

    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Parameters: \n{json.dumps(parameters, indent=2)}")
    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log parameters and metrics
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.orbit.log_model(
        orbit_model=ets,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native orbit flavor and pyfunc flavor
loaded_model = mlflavors.orbit.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.orbit.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = test_df.to_numpy()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_array,
            "X_cols": test_df.columns,
            "X_dtypes": list(test_df.dtypes),
            "decompose": True,
            "store_prediction_array": True,
            "seed": 2023,
        }
    ]
)

# Generate forecasts with native orbit flavor and pyfunc flavor
print(
    f"\nNative orbit 'predict':\n$ \
    {loaded_model.predict(test_df, decompose=True, store_prediction_array=True, seed=2023)}"  # noqa: E501
)
print(f"\nPyfunc 'predict':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from orbit.utils.dataset import load_iclaims

df = load_iclaims()
test_size = 52
test_df = df[-test_size:]

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert DateTime to string for JSON serialization
test_df_pyfunc = test_df.copy()
test_df_pyfunc["week"] = test_df_pyfunc["week"].dt.strftime(
    date_format="%Y-%m-%d %H:%M:%S"
)

# Convert to list for JSON serialization
X_test_list = test_df_pyfunc.to_numpy().tolist()

# Convert column names to a list of strings for JSON serialization
X_cols = list(test_df.columns)

# Convert dtypes to string for JSON serialization
X_dtypes = [str(dtype) for dtype in list(test_df.dtypes)]

predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "X_cols": X_cols,
            "X_dtypes": X_dtypes,
            "decompose": True,
            "store_prediction_array": True,
            "seed": 2023,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation (see the
# payload sketch after this module)
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict':\n${response.json()}")

Sktime

This example trains a Sktime NaiveForecaster model using the Longley dataset, forecasting with exogenous variables.
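
A quick look at the data (a sketch; y is the employment series used as the forecasting target, X holds the macroeconomic exogenous variables):

from sktime.datasets import load_longley

# Inspect the target series and the exogenous regressors.
y, X = load_longley()
print(y.tail())
print(X.tail())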

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute interval forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import (
    mean_absolute_error,
    mean_absolute_percentage_error,
)

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    y, X = load_longley()
    y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3, 4],
    )

    # Extract parameters
    parameters = forecaster.get_params()

    # Evaluate model
    y_pred = forecaster.predict(X=X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Parameters: \n{json.dumps(parameters, indent=2)}")
    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log parameters and metrics
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.sktime.log_model(
        sktime_model=forecaster,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native sktime flavor and pyfunc flavor
loaded_model = mlflavors.sktime.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.sktime.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = X_test.to_numpy()

# Create configuration DataFrame for an interval forecast with nominal coverage
# values [0.9, 0.95], a forecast horizon of 3 periods, and exogenous regressors.
predict_conf = pd.DataFrame(
    [
        {
            "fh": [1, 2, 3],
            "predict_method": "predict_interval",
            "coverage": [0.9, 0.95],
            "X": X_test_array,
        }
    ]
)

# Generate interval forecasts with native sktime flavor and pyfunc flavor
print(
    f"\nNative sktime 'predict_interval':\n$ \
    {loaded_model.predict_interval(fh=[1, 2, 3], X=X_test, coverage=[0.9, 0.95])}"
)
print(f"\nPyfunc 'predict_interval':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Model scoring via REST API requires transforming the configuration DataFrame
# into JSON format. As the numpy ndarray type is not JSON serializable, we need
# to convert the exogenous regressor into a list. The wrapper instance will
# convert the list back to ndarray type as required by the sktime predict
# methods. For more details, read the MLflow deployment API reference
# (https://mlflow.org/docs/latest/models.html#deploy-mlflow-models).
X_test_list = X_test.to_numpy().tolist()
predict_conf = pd.DataFrame(
    [
        {
            "fh": [1, 2, 3],
            "predict_method": "predict_interval",
            "coverage": [0.9, 0.95],
            "X": X_test_list,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict_interval':\n${response.json()}")

StatsForecast

This example trains a StatsForecast AutoARIMA model using the M5 Competition dataset, which contains the daily sales of a product in a Walmart store along with some exogenous regressors.

Installation

pip install datasetsforecast==0.0.8
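
With the dataset dependency installed, the package's load_m5 helper produces the training frame and test split used below (a quick sketch; the frames are assumed to follow StatsForecast's long format with a series id, a date stamp, and the target):

from datasetsforecast.m5 import M5

from mlflavors.utils.data import load_m5

# Download once, then split into training data, future exogenous
# regressors, and the held-out target.
M5.download("./data")
train_df, X_test, Y_test = load_m5("./data")
print(train_df.head())
print(X_test.head())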

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the training hyperparameters, evaluation metrics, and the trained model as an artifact) and to compute forecasts by loading the trained model in both the native flavor and the pyfunc flavor:

import mlflow
import pandas as pd
from datasetsforecast.m5 import M5
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

import mlflavors
from mlflavors.utils.data import load_m5

ARTIFACT_PATH = "model"
DATA_PATH = "./data"
HORIZON = 28
LEVEL = [90, 95]

with mlflow.start_run() as run:
    M5.download(DATA_PATH)
    train_df, X_test, Y_test = load_m5(DATA_PATH)

    models = [AutoARIMA(season_length=7)]

    sf = StatsForecast(df=train_df, models=models, freq="D", n_jobs=-1)

    sf.fit()

    # Evaluate model
    y_pred = sf.predict(h=HORIZON, X_df=X_test, level=LEVEL)["AutoARIMA"]
    y_test = Y_test["y"]

    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mape": mean_absolute_percentage_error(y_test, y_pred),
    }

    print(f"Metrics: \n{metrics}")

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.statsforecast.log_model(
        statsforecast_model=sf,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native statsforecast flavor and pyfunc flavor
loaded_model = mlflavors.statsforecast.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.statsforecast.pyfunc.load_model(model_uri=model_uri)

# Convert test data to 2D numpy array so it can be passed to pyfunc predict using
# a single-row Pandas DataFrame configuration argument
X_test_array = X_test.to_numpy()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_array,
            "X_cols": X_test.columns,
            "X_dtypes": list(X_test.dtypes),
            "h": HORIZON,
            "level": LEVEL,
        }
    ]
)

# Generate forecasts with native statsforecast flavor and pyfunc flavor
print(
    f"\nNative statsforecast 'predict':\n$ \
    {loaded_model.predict(h=HORIZON, X_df=X_test, level=LEVEL)}"  # noqa: E501
)
print(f"\nPyfunc 'predict':\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests

from mlflavors.utils.data import load_m5

DATA_PATH = "./data"
HORIZON = 28
LEVEL = [90, 95]
_, X_test, _ = load_m5(DATA_PATH)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert DateTime to string for JSON serialization
X_test_pyfunc = X_test.copy()
X_test_pyfunc["ds"] = X_test_pyfunc["ds"].dt.strftime(date_format="%Y-%m-%d")

# Convert to list for JSON serialization
X_test_list = X_test_pyfunc.to_numpy().tolist()

# Convert column names to a list of strings for JSON serialization
X_cols = list(X_test.columns)

# Convert dtypes to string for JSON serialization
X_dtypes = [str(dtype) for dtype in list(X_test.dtypes)]

predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "X_cols": X_cols,
            "X_dtypes": X_dtypes,
            "h": HORIZON,
            "level": LEVEL,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'predict':\n${response.json()}")

PyOD

This example trains a PyOD KNN model using a synthetic dataset. The normal data points are drawn from a multivariate Gaussian distribution and the outliers from a uniform distribution.
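
A quick sketch of the generate_data utility used below (the shapes assume its default number of features):

from pyod.utils.data import generate_data

# Ground-truth labels are binary: 0 marks inliers drawn from the Gaussian,
# 1 marks outliers drawn from the uniform distribution.
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, contamination=0.1
)
print(X_train.shape, X_test.shape)  # (200, 2) and (100, 2) by default
print(y_test[:10])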

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the evaluation metrics and the trained model as an artifact) and to compute anomaly scores by loading the trained model in both the native flavor and the pyfunc flavor:

import json

import mlflow
import pandas as pd
from pyod.models.knn import KNN
from pyod.utils.data import generate_data
from sklearn.metrics import roc_auc_score

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    contamination = 0.1  # percentage of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    X_train, X_test, _, y_test = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination
    )

    # Train kNN detector
    clf = KNN()
    clf.fit(X_train)

    # Evaluate model
    y_test_scores = clf.decision_function(X_test)

    metrics = {
        "roc": roc_auc_score(y_test, y_test_scores),
    }

    print(f"Metrics: \n{json.dumps(metrics, indent=2)}")

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.pyod.log_model(
        pyod_model=clf,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native pyod flavor and pyfunc flavor
loaded_model = mlflavors.pyod.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.pyod.pyfunc.load_model(model_uri=model_uri)

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test,
            "predict_method": "decision_function",
        }
    ]
)

# Generate anomaly scores with native pyod flavor and pyfunc flavor
print(
    f"\nNative pyod 'decision_function':\n$ \
    {loaded_model.decision_function(X_test)}"
)
print(f"\nPyfunc 'decision_function':\n${loaded_pyfunc.predict(predict_conf)[0]}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests
from pyod.utils.data import generate_data

contamination = 0.1  # percentage of outliers
n_train = 200  # number of training points
n_test = 100  # number of testing points

_, X_test, _, _ = generate_data(
    n_train=n_train, n_test=n_test, contamination=contamination
)

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Convert to list for JSON serialization
X_test_list = X_test.tolist()

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "X": X_test_list,
            "predict_method": "decision_function",
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc 'decision_function':\n${response.json()}")

SDV

This example trains an SDV SingleTablePreset synthesizer model using a fake dataset. The dataset describes fictional guests staying at a hotel and is available as a single table.
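
You can inspect the demo table and the metadata the synthesizer is fit against (a sketch; the to_dict call assumes SDV's single-table metadata API):

from sdv.datasets.demo import download_demo

# One row per fictional guest; the metadata records the column types SDV
# inferred for the table.
real_data, metadata = download_demo(
    modality="single_table", dataset_name="fake_hotel_guests"
)
print(real_data.head())
print(metadata.to_dict())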

Model logging and loading

Run the train.py module to create a new MLflow experiment (logging the evaluation metrics and the trained model as an artifact) and to generate synthetic data by loading the trained model in both the native flavor and the pyfunc flavor:

import mlflow
import pandas as pd
from sdv.datasets.demo import download_demo
from sdv.evaluation.single_table import evaluate_quality
from sdv.lite import SingleTablePreset

import mlflavors

ARTIFACT_PATH = "model"

with mlflow.start_run() as run:
    real_data, metadata = download_demo(
        modality="single_table", dataset_name="fake_hotel_guests"
    )

    # Train synthesizer
    synthesizer = SingleTablePreset(metadata, name="FAST_ML")
    synthesizer.fit(real_data)

    # Evaluate model
    synthetic_data = synthesizer.sample(num_rows=10)
    quality_report = evaluate_quality(
        real_data=real_data, synthetic_data=synthetic_data, metadata=metadata
    )

    metrics = {
        "overall_quality_score": quality_report.get_score(),
    }

    # Log metrics
    mlflow.log_metrics(metrics)

    # Log model using pickle serialization (default).
    mlflavors.sdv.log_model(
        sdv_model=synthesizer,
        artifact_path=ARTIFACT_PATH,
        serialization_format="pickle",
    )
    model_uri = mlflow.get_artifact_uri(ARTIFACT_PATH)

# Load model in native sdv flavor and pyfunc flavor
loaded_model = mlflavors.sdv.load_model(model_uri=model_uri)
loaded_pyfunc = mlflavors.sdv.pyfunc.load_model(model_uri=model_uri)

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "modality": "single_table",
            "num_rows": 10,
        }
    ]
)

# Generate synthetic data with native sdv flavor and pyfunc flavor
print(
    f"\nNative sdv sampling:\n$ \
    {loaded_model.sample(num_rows=10)}"
)
print(f"\nPyfunc sampling:\n${loaded_pyfunc.predict(predict_conf)}")

# Print the run id which is used for serving the model to a local REST API endpoint
# in the score_model.py module
print(f"\nMLflow run id:\n{run.info.run_id}")

To view the newly created experiment and logged artifacts, open the MLflow UI:

mlflow ui

Model serving

This section illustrates serving the pyfunc flavor to a local REST API endpoint and subsequently requesting a prediction from the served model. To serve the model, run the command below, substituting the run id printed during execution of the train.py module:

mlflow models serve -m runs:/<run_id>/model --env-manager local --host 127.0.0.1

Open a new terminal and run the score_model.py module to request a prediction from the served model:

import pandas as pd
import requests

# Define local host and endpoint url
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Create configuration DataFrame
predict_conf = pd.DataFrame(
    [
        {
            "modality": "single_table",
            "num_rows": 10,
        }
    ]
)

# Create dictionary with pandas DataFrame in the split orientation
json_data = {"dataframe_split": predict_conf.to_dict(orient="split")}

# Score model
response = requests.post(url, json=json_data)
print(f"\nPyfunc sampling:\n${response.json()}")