This post describes how trained models can be persisted and reused across machine learning libraries and environments, i.e. how they can interoperate. To be more specific, let's first introduce some definitions: a trained model is an artefact produced by a machine learning algorithm as part of training which can be used for inference. A library is a software package like scikit-learn or Tensorflow. An environment is, roughly speaking, a combination of hardware and operating system. We will distinguish between training and inference environments: an example of a training environment is a data scientist's MacBook running macOS, while an inference environment could be a production server in the cloud.

There are multiple cases when model interoperability is important:

In this article, we will explore various options for model interoperability and look at model interchange formats: those provided by machine learning libraries, those native to programming languages, and designated interchange formats. We'll also briefly touch upon Apple CoreML, Baidu MDL, NNVM, and the kinds of models they support.

Machine learning libraries

This section gives an overview of the ML library interoperability features.

scikit-learn

scikit-learn's recommended way of persisting models is Pickle. The alternative is the sklearn.externals.joblib module with its dump and load functions, which may be more efficient for models carrying large numpy arrays. Refer to the documentation for a code example. Such a persisted model cannot be directly used by other libraries, but it can be deployed to an inference environment if that environment happens to be able to run Python.
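
A minimal sketch of the joblib route (the estimator choice and file name are arbitrary):

from sklearn import datasets, svm
from sklearn.externals import joblib

iris = datasets.load_iris()
clf = svm.SVC()
clf.fit(iris.data, iris.target)

# persist the fitted estimator to disk...
joblib.dump(clf, 'svc.joblib')

# ...and load it back, e.g. in the inference environment
clf2 = joblib.load('svc.joblib')
print(clf2.predict(iris.data[:3]))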

Linear and logistic regression, SVM, tree models and some of the preprocessing transformers can be converted to the CoreML format.

sklearn-porter allows transpiling trained scikit-learn models into C, Java, JavaScript and other languages. Scikit-Learn Compiled Trees generates C++ code for sklearn decision trees and ensembles but hasn't been updated for 11 months at the time of writing. A sketch of sklearn-porter usage follows below.
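
This is only an illustration of sklearn-porter's documented Porter class; the classifier and target language are arbitrary choices:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn_porter import Porter

iris = load_iris()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)

# transpile the fitted tree into Java source code
porter = Porter(clf, language='java')
print(porter.export())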

There's the sklearn2pmml library by the Openscoring.io team that allows exporting scikit-learn models to PMML. It is implemented as a Python wrapper around a Java library, and it is somewhat unclear which types of sklearn models are supported. It is also worth noting that Openscoring.io software is dual-licensed: a free copy is available under the viral GNU Affero General Public License, but it is possible to purchase their products under the 3-Clause BSD License.
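
A sketch of the intended usage, based on the library's documented PMMLPipeline wrapper (the estimator and file name are arbitrary; a Java runtime must be available):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml, PMMLPipeline

iris = load_iris()
# sklearn2pmml expects the estimator to be wrapped in a PMMLPipeline
pipeline = PMMLPipeline([("classifier", LogisticRegression())])
pipeline.fit(iris.data, iris.target)

sklearn2pmml(pipeline, "iris_lr.pmml")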

XGBoost

XGBoost is a gradient boosted decision tree library whose core is written in C++, with APIs available for Python, R, Java and Scala. Models are saved to and loaded from a library-specific format via a pair of methods; there are examples in Python and in R. Because the model persistence logic is delegated to the library core, an XGB model trained in R or Python can then be exported and loaded into an XGB inference module written in a different, possibly more performant language such as C++:

XGBoost language interoperability
Figure 1. A data scientist trains the model using the XGB R API and saves it in the XGB internal format. The model is later imported using the XGB C++ API and used in a C++ desktop application.
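
A minimal Python sketch of the save/load round-trip (file name and training parameters are arbitrary):

import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)
bst = xgb.train({'objective': 'multi:softmax', 'num_class': 3}, dtrain)

# save to the library-internal binary format...
bst.save_model('iris.xgb')

# ...which any XGBoost binding (Python, R, C++, ...) can load back
bst2 = xgb.Booster()
bst2.load_model('iris.xgb')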

XGBoost models can be converted to Apple CoreML using coremltools.

Models can also be converted to PMML using jpmml-xgboost by Openscoring.io.

LightGBM

LightGBM is another popular decision tree gradient boosting library, created by Microsoft. Just like XGBoost, its core is written in C++ with APIs in R and Python. Code examples in R and Python show how to save and load models into the LightGBM internal format.
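
A minimal Python sketch, analogous to the XGBoost one above (file name and training parameters are arbitrary):

import lightgbm as lgb
from sklearn.datasets import load_iris

iris = load_iris()
train_set = lgb.Dataset(iris.data, label=iris.target)
bst = lgb.train({'objective': 'multiclass', 'num_class': 3}, train_set)

# save to LightGBM's internal text-based format...
bst.save_model('iris.txt')

# ...and load it back later
bst2 = lgb.Booster(model_file='iris.txt')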

The library's command-line interface can be used to convert models to C++.

LightGBM models can be converted to PMML using jpmml-lightgbm by Openscoring.io.

CatBoost

CatBoost is a library by Yandex implementing gradient boosting on decision trees. It features a C++ core with Python, R and command-line interfaces.

In R, the built-in RDS serialization won't work for CatBoost models, but there are save_model and load_model methods covering basic import/export needs.

The CatBoost Python package has a familiar pair of methods to save the model and to load it back. The Python API also supports saving trained CatBoost models to the Apple CoreML format:

import catboost
from sklearn import datasets

wine = datasets.load_wine()
cls = catboost.CatBoostClassifier(loss_function='MultiClass')

cls.fit(wine.data, wine.target)
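# export the trained classifier directly to the CoreML format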
cls.save_model("wine.mlmodel",
               format='coreml',
               export_parameters={'prediction_type': 'probability'})

produces a CoreML model

$ file wine.mlmodel
wine.mlmodel: PDP-11 pure executable not stripped - version 101

that can later be imported into a macOS or iOS app.

Spark MLLib

Starting from Spark 1.6, MLLib transformers and models can be persisted and later loaded.

MLLib Pipeline API also has export/import functionality which requires every component of the pipeline to implement the save/load methods.
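
A minimal PySpark sketch of the pipeline export/import round-trip, assuming a DataFrame df with feature columns f1, f2 and a label column (all names are placeholders):

from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# assemble raw columns into a feature vector and fit a classifier
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression()
model = Pipeline(stages=[assembler, lr]).fit(df)

# persist the fitted pipeline...
model.save("/models/lr-pipeline")

# ...and load it back in another Spark application
same_model = PipelineModel.load("/models/lr-pipeline")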

Spark has a PMML export feature for linear, ridge and lasso regression models, k-means clustering, SVM and binary logistic regression models.

Theano

Theano recommends Pickle for short-term model serialization and offers advice on long-term storage.

As for production deployments, a custom solution with Theano packaged in a Docker container is possible, although that would still require having a C++ compiler in the container. There are some more bits of advice on the Theano issue tracker, but it's unlikely we will see more on this now that the library's development has been discontinued.

Tensorflow

Tensorflow offers a language-agnostic SavedModel format, meaning that you can store a model trained with the Tensorflow Python package and then load it from the C++ API. The format uses Google Protocol Buffers for the schema definition, a logical step given that Tensorflow is a Google product.
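
A minimal TF 1.x sketch using the SavedModelBuilder API, assuming sess is a session holding the trained graph (the export directory is arbitrary):

import tensorflow as tf

# assume `sess` is a tf.Session holding the trained graph
builder = tf.saved_model.builder.SavedModelBuilder('/tmp/my_model/1')
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING])
builder.save()

# later, possibly in a different process
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], '/tmp/my_model/1')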

The default choice for serving Tensorflow models server-side would be Tensorflow Serving. For edge devices, a trained model can be converted to the CoreML format and deployed as part of a macOS or iOS app.

Keras

Keras is a deep learning library written by François Chollet in Python; it provides high-level abstractions for building neural network models. It allows defining the network architecture in clean, human-readable code and delegates the actual training to Theano, Tensorflow or CNTK.

Keras documentation recommends using its built-in model persistence, which gives the user a choice of storing both the neural network architecture and the trained weights, or just one of the two components. Keras uses the HDF5 format, popular in scientific applications.
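
A minimal sketch, assuming model is a trained Keras model (file names are arbitrary):

from keras.models import load_model

# architecture + weights + optimizer state in one HDF5 file
model.save('net.h5')

# restore the model later, possibly on another machine
model = load_model('net.h5')

# alternatively, persist the two components separately
json_architecture = model.to_json()  # architecture only
model.save_weights('weights.h5')     # trained weights only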

Here's a good guide on how to persist Keras models in R.

From an interoperability viewpoint, Keras takes a special place because it allows using the same model training code and architecture across Theano, Tensorflow and CNTK. That works as long as the user does not utilize the underlying framework's API directly, which is also a valid Keras use case! As for the trained weights, they can also be saved and loaded across Keras backends; some restrictions apply.

PyTorch

Like Keras, PyTorch supports exporting either the entire model or just the parameters, as documented here. The implementation uses Python Pickle under the hood.
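
A minimal sketch, assuming model is a trained torch.nn.Module (file names are arbitrary):

import torch

# option 1: serialize the whole model object
# (the class definition must be importable at load time)
torch.save(model, 'model.pt')
model = torch.load('model.pt')

# option 2: serialize only the learned parameters
torch.save(model.state_dict(), 'params.pt')
model.load_state_dict(torch.load('params.pt'))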

Deeplearning4j

Deeplearning4j, as the name suggests, is a deep learning library for Java. The underlying computation is not done in the JVM but rather in C and C++. It has an additional Keras API and can import trained Keras models, allowing the user to choose between importing just the model architecture from a .json file and importing the model together with its weights from the .h5 file. Once loaded into DL4J, a model can be further trained or deployed into an inference environment for predictions.

DL4J also has its own model persistence format. Direct import of Tensorflow models is planned for later in 2017 (right now it is only possible to import a Tensorflow model if it was created in Keras).

Other R libraries

For those R libraries that don't have their own way of saving and loading models, R offers object serialization/deserialization using saveRDS/readRDS or save/load. The resulting format, while available to all R users, does not score too high on the interoperability scale.

R models can nevertheless be deployed server-side by utilizing Rserve. A setup with a thin web service on top of Rserve is known to work well in practice; the web service and Rserve can optionally be containerized. A REST interface to the model is then exposed by the web service, which can be implemented in Java, C, C++ or another language supporting Rserve.

Exposing R model with Rserve
Figure 2. An arbitrary R model is deployed as a web service. Rserve is used to access R code from the web service application.

A library-specific save/load feature should normally be preferred, since the serialized R object might be missing some necessary data or carry extra data that is not needed for storing a trained model.

Other Python libraries

Python's built-in persistence mechanism, an analogue of R object serialization, is called Pickle. scikit-learn has a concise code snippet showing the usage of Pickle to save and load a model.
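
Along the same lines, a minimal sketch:

import pickle

from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC().fit(iris.data, iris.target)

# serialize the fitted estimator to a byte string...
s = pickle.dumps(clf)

# ...and restore it later
clf2 = pickle.loads(s)
print(clf2.predict(iris.data[:3]))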

Pickled models can be easily served by a Python-based web app. Here's a guide on how to do it with Flask, a lightweight web application framework for Python; a minimal sketch follows Figure 3.

Exposing sklearn model with Flask
Figure 3. An arbitrary Python model is deployed as a web service implemented with Flask.
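
The model file name and the route here are assumptions made for the sake of the example:

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# load the pickled model once at application startup
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    # expects a JSON body like {"features": [5.1, 3.5, 1.4, 0.2]}
    features = request.get_json()['features']
    prediction = model.predict([features])
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    app.run()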

Note that pickling might not be a suitable approach for long-term model storage, because it stores only class instances, not the class definitions themselves. Therefore it may not be possible to deserialize a model trained with an older version of a library using a newer one. Model persistence functionality provided by the machine learning library is preferable when it exists.

Inference time software packages

Tensorflow Serving

Tensorflow Serving is a product whose purpose is, unsurprisingly, to serve Tensorflow models. It exposes a gRPC endpoint that can be deployed into production infrastructure and called by other components that need to run machine learning models. It supports model versioning, enabling A/B testing and rolling upgrades.

Tensorflow Serving is not limited to Tensorflow models and can be tailored to execute arbitrary model code while providing nice abstractions around it. Such a custom model needs a wrapper written in C++ called a Servable. It is a viable alternative for production deployments of any machine learning models, especially if your organization is service-oriented and has experience integrating gRPC endpoints.

Tensorflow Serving XGB model
Figure 4. A possible low-latency deployment scenario for a trained XGB model where Tensorflow Serving uses a custom servable. The XGBoost Servable is something the developer would have to come up with: Tensorflow Serving allows custom servables and XGBoost offers a C++ API, but in practice this hasn't been done yet.

Apple CoreML

Apple has recently released CoreML, a framework for executing trained machine learning models on iOS and macOS. In contrast to the libraries mentioned above, CoreML itself can't be used for training models. It is shipped with a set of pre-trained models for computer vision and natural language processing tasks. Models trained using other libraries can be converted into the CoreML .mlmodel format, a binary format that uses Google Protocol Buffers to describe the schema and has a publicly available specification. A great thing about building on top of Protocol Buffers is that they have support for virtually all major programming languages.

The introductory WWDC 2017 presentation on CoreML lists the ML libraries supported by the CoreML conversion tool: Caffe and Keras for neural nets (only outdated major versions at the time of the announcement), scikit-learn and XGBoost for tree ensembles, LIBSVM and scikit-learn for SVM, and some more models from scikit-learn.

CoreML supported model formats
Figure 5. An overview of formats supported by Apple CoreML.

There's a page explaining how to convert a model into the CoreML format, and the coremltools conversion utilities themselves are open source.
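
As an illustration, a sketch of converting a scikit-learn model with coremltools (the estimator choice and file name are arbitrary):

import coremltools
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
clf = LogisticRegression().fit(iris.data, iris.target)

# convert the fitted sklearn estimator and save it as .mlmodel
coreml_model = coremltools.converters.sklearn.convert(clf)
coreml_model.save('iris_lr.mlmodel')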

An unofficial MXNet to Core ML converter tool is available; refer to Bring Machine Learning to iOS apps using Apache MXNet and Apple Core ML to get started.

It is interesting that Apple decided not to support PMML as one of the import formats.

Baidu Mobile Deep Learning

Baidu's MDL is the new kid on the block. Similar to Apple CoreML, it is a library for deploying deep learning models to mobile devices, with support for iOS and Android. Models trained in PaddlePaddle and Caffe can be converted into the MDL format.

Clipper

Clipper is a prediction serving system by the UC Berkeley RISE Lab. It allows deploying models as microservices with REST interfaces that can be invoked by other systems within the organization's IT landscape. For example, a rules engine may call out to a feature service to fetch the variables and then call Clipper for inference. Clipper is similar to Tensorflow Serving:

with some differences:

NNVM

While this post was being written, AWS announced NNVM, a software package for compiling models created in deep learning libraries into optimized machine code (LLVM intermediate representation for CPUs, CUDA and others for GPUs). The process consists of two stages:

  1. NNVM compiler, based on a trained model from a deep learning library such as Keras or MXNet, creates and optimizes a computational graph.
  2. TVM takes the computational graph intermediate representation as input and optimizes it for execution on a target platform (CPU, GPU or mobile).

NNVM is still in its early stages and currently supports MXNet and CoreML models. Caffe and Keras are supported indirectly via conversion to CoreML, and explicit support for Keras is planned.

NNVM supported libraries and targets
Figure 6. An overview of frontend and backend formats supported by NNVM+TVM.

Note that the announcement referenced above employs special terminology: the ML library used for modelling is referred to as the frontend, and the piece that executes the model is called the backend. However, such terminology can be slightly confusing for two reasons:

It is not exactly correct to list NNVM among the inference software packages, but at the moment it seems to target this segment while relying on the frontend frameworks for model design and training.

Designated model interchange formats

Open Neural Network Exchange (ONNX)

ONNX is a neural network model interchange format with the goal of enabling interoperability across deep learning libraries. Like CoreML, it uses Google Protocol Buffers for the schema definition. Announced in September 2017 by Microsoft and Facebook, it is in its early stages; support for PyTorch, Caffe2 and CNTK is planned.
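
As an illustration of the intended workflow, a sketch of PyTorch's ONNX export as previewed at the time of writing; model and its input shape are assumptions:

import torch
from torch.autograd import Variable

# assume `model` is a trained torch.nn.Module expecting a 1x3x224x224 input
dummy_input = Variable(torch.randn(1, 3, 224, 224))

# tracing the model with a dummy input produces an ONNX graph
torch.onnx.export(model, dummy_input, 'model.onnx')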

ONNX provides an open source format for AI models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Initially we focus on the capabilities needed for inferencing (evaluation).

Soon after the initial announcement, IBM joined the initiative. Update: and so did AWS.

Predictive Model Markup Language (PMML)

PMML is a model exchange format developed and maintained by the Data Mining Group.

Some of the model types that PMML supports are neural networks, SVM, Naive Bayes classifiers and others. The Data Mining Group website features a list of supported products and models.

PMML has third-party support, most often implemented by Openscoring.io, for many open-source machine and deep learning libraries. It is sometimes referred to as the "de facto standard" for model interoperability. That being said, Apple went for a custom model exchange format in CoreML, and Facebook and Microsoft teamed up to create ONNX; the position of PMML in the area of deep learning is not as strong. And XML is no longer engineers' favourite format for exchanging data.

Portable Format for Analytics (PFA)

PMML's successor PFA is developed by the same organization, the Data Mining Group. It uses Avro, which is in many ways similar to Google Protocol Buffers. It is unclear whether PFA is currently supported by any software package.

Last words

In the classical "shallow" machine learning area, PMML seems to be the only independent standard with a noticeable degree of adoption. Even without a good model interchange format, there's usually a way to deploy a model server-side by wrapping it in a custom web service or using a product like Tensorflow Serving or Clipper.

Mobile devices can access such a model using client-server communication. Alternatively, a model can be transpiled into one of the languages supported by the mobile platform. CoreML supports some machine learning libraries and allows the trained models to be used for inference on mobile devices.

The world of deep learning is changing fast; just over the last few months a number of new and promising projects were announced, including ONNX and NNVM. It will be interesting to see whether ONNX and NNVM are widely accepted by the community. A mature combination of the two could lead to a situation where the user's choice of deep learning library for training does not limit the target inference environment. I can only speculate, but the CoreML model format is likely here to stay, and it will benefit from ONNX and NNVM support.