Tensorflow models

modelkit provides several ways to use TF models and makes it easy to switch between them:

  • calling the TF model using the tensorflow module
  • requesting predictions from TensorFlow Serving synchronously via a REST API
  • requesting predictions from TensorFlow Serving asynchronously via a REST API
  • requesting predictions from TensorFlow Serving synchronously via gRPC

TensorflowModel class

All TensorFlow-based models should derive from the TensorflowModel class. This class provides a number of functions that help with loading and serving TF models.

At initialization time, a TensorflowModel has to be provided with definitions of the tensors predicted by the TF model:

  • output_tensor_mapping: a dict mapping arbitrary keys to the names of the output tensors.
  • output_tensor_shapes and output_dtypes: dicts giving the shape and dtype of each of these tensors.


Note that _tensorflow_predict returns a single dict of np.ndarray, each of shape (len(items),?), whereas _predict_batch is expected to return a list of len(items) dicts of np.ndarray. The output therefore has to be split per item.
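
As a minimal sketch of that conversion, the helper below splits a dict of batched arrays into one dict per item. The name split_batch and the example keys "scores" and "labels" are hypothetical and not part of modelkit; only NumPy is assumed.

```python
from typing import Dict, List

import numpy as np


def split_batch(outputs: Dict[str, np.ndarray]) -> List[Dict[str, np.ndarray]]:
    """Turn a dict of batched arrays into a list with one dict per item."""
    if not outputs:
        return []
    # Every array is assumed to share the same leading (batch) dimension.
    n_items = len(next(iter(outputs.values())))
    return [
        {name: array[i] for name, array in outputs.items()}
        for i in range(n_items)
    ]


# Hypothetical batched output for three items.
batched = {
    "scores": np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]),
    "labels": np.array([[1], [0], [0]]),
}
per_item = split_batch(batched)
```

Each element of per_item then holds the arrays for a single item, which is the shape of result described for _predict_batch.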

Other convenience methods

Post processing

After the TF call, _tensorflow_predict_* returns a dict of np.ndarray of shape (len(items),?).

These can be further manipulated by reimplementing the TensorflowModel._post_processing function, for example to reshape arrays, change their dtype, or select a subset of features.
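
The kind of manipulation meant here might look like the standalone function below. It is only an illustration: the exact signature of _post_processing is not given on this page, so the function name post_process, its signature, and the keys "scores" and "debug" are assumptions.

```python
from typing import Dict

import numpy as np


def post_process(outputs: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Hypothetical post-processing: keep one output, flatten it and cast it."""
    scores = outputs["scores"]                    # select a subset of the outputs
    scores = scores.reshape(scores.shape[0], -1)  # reshape to (len(items), n_features)
    return {"scores": scores.astype(np.float32)}  # change the dtype


# Hypothetical raw output for a batch of two items.
raw = {
    "scores": np.ones((2, 3, 4), dtype=np.float64),
    "debug": np.zeros((2, 1)),
}
processed = post_process(raw)
```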

Empty predictions

Items are often manipulated before being fed to TF, e.g. with text cleaning or vectorization. This sometimes makes the prediction trivial, in which case there is no need to call TF at all.

modelkit provides a built-in mechanism to deal with these "empty" examples, and the default implementation of predict_batch uses it.

To make use of it, override the _is_empty method:

def _is_empty(self, item) -> bool:
    return item == ""

This will fill in missing values with zeroed arrays when empty strings are found, without calling TF.

To fill in values with another array, also override the _generate_empty_prediction method:

def _generate_empty_prediction(self) -> Dict[str, Any]:
    """Function used to fill in values when rebuilding predictions with the mask"""
    # One zero-filled array per output tensor, with a leading batch dimension of 1
    return {
        name: np.zeros((1,) + self.output_shapes[name], self.output_dtypes[name])
        for name in self.output_tensor_mapping
    }
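
To illustrate the idea behind the mask, here is a simplified, self-contained sketch that skips empty items, calls the predictor only on the rest, and then rebuilds the full list of predictions. It is not modelkit's implementation: predict_with_mask, fake_predict, and the "length" output are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, List

import numpy as np

Prediction = Dict[str, Any]


def predict_with_mask(
    items: List[str],
    is_empty: Callable[[str], bool],
    predict_non_empty: Callable[[List[str]], List[Prediction]],
    empty_prediction: Callable[[], Prediction],
) -> List[Prediction]:
    """Call the predictor on non-empty items only, then rebuild all predictions."""
    mask = [not is_empty(item) for item in items]
    non_empty = [item for item, keep in zip(items, mask) if keep]
    # The predictor is not called at all when every item is empty.
    results = iter(predict_non_empty(non_empty)) if non_empty else iter([])
    return [next(results) if keep else empty_prediction() for keep in mask]


calls: List[List[str]] = []


def fake_predict(batch: List[str]) -> List[Prediction]:
    """Stand-in for the TF call: records its input and returns string lengths."""
    calls.append(batch)
    return [{"length": np.array([len(text)])} for text in batch]


predictions = predict_with_mask(
    ["hello", "", "tf"],
    is_empty=lambda item: item == "",
    predict_non_empty=fake_predict,
    empty_prediction=lambda: {"length": np.zeros((1,), dtype=np.int64)},
)
```

In this sketch the stand-in predictor only ever sees the two non-empty strings, and the empty string receives a zeroed array in its original position.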

Keras model

The TensorflowModel class allows you to build an instance of keras.Model from the underlying saved tensorflow model via the method get_keras_model().

TF Serving

modelkit provides an easy way to query TensorFlow models served via TF Serving. When TF Serving is enabled, the TF models are not run in the main process; predictions are requested from the TF Serving instance instead.
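
For orientation, the sketch below builds (without sending) a request against the TF Serving REST predict endpoint, which has the form /v1/models/<name>:predict. It only illustrates the shape of such a request and is not modelkit's client code; the model name "my_model" and the input key "input_1" are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib import request


def build_predict_request(
    host: str, port: int, model_name: str, instances: List[Dict[str, Any]]
) -> request.Request:
    """Build a POST request for the TF Serving REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and input; 8501 is the REST port published by the
# docker command on this page.
req = build_predict_request("localhost", 8501, "my_model", [{"input_1": [1.0, 2.0]}])
# Sending it with urllib.request.urlopen(req) requires a running TF Serving instance.
```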

Running TF serving container locally

To run a TF Serving container locally, one first needs to download the models and write a configuration file.

This can be achieved by running:

modelkit tf-serving local-docker --models [PACKAGE]

The CLI creates a configuration file for TF Serving in which the model locations are given relative to the container's file system. As a result, the TF Serving container expects MODELKIT_ASSETS_DIR to be mounted at the /config directory inside the container.

Specifically, the CLI:

  • Instantiates a ModelLibrary with all configured models in PACKAGE
  • Downloads all necessary assets to the MODELKIT_ASSETS_DIR
  • Writes a configuration file under the local MODELKIT_ASSETS_DIR listing all configured TF models
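
As a rough picture of what such a configuration file contains, the sketch below renders a TF Serving model_config_list whose base paths point inside the container's /config directory. The function render_model_config, the model name "my_model", and the path "tf_models/my_model" are hypothetical; the file actually written by the CLI may differ in its details.

```python
from typing import Dict


def render_model_config(models: Dict[str, str], container_root: str = "/config") -> str:
    """Render a TF Serving model config with container-side base paths."""
    entries = []
    for name, relative_path in sorted(models.items()):
        entries.append(
            "  config {\n"
            f'    name: "{name}"\n'
            f'    base_path: "{container_root}/{relative_path}"\n'
            '    model_platform: "tensorflow"\n'
            "  }\n"
        )
    return "model_config_list {\n" + "".join(entries) + "}\n"


# Hypothetical model name and path, relative to the assets directory.
config_text = render_model_config({"my_model": "tf_models/my_model"})
```

Because the paths are written relative to /config, the same file works as long as the assets directory is mounted at that location in the container.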

The container can then be started by pointing TF Serving to the generated configuration file with --model_config_file=/config/config.config:

docker run \
        --name local-tf-serving \
        -d \
        -p 8500:8500 -p 8501:8501 \
        -v ${MODELKIT_ASSETS_DIR}:/config \
        -t tensorflow/serving \
        --model_config_file=/config/config.config

Internal TF serving settings

Several environment variables control how modelkit requests predictions from TF serving.

  • MODELKIT_TF_SERVING_ENABLE: Whether to request predictions from TF Serving or to run TF locally as a library
  • MODELKIT_TF_SERVING_HOST: Host to connect to when requesting TF predictions
  • MODELKIT_TF_SERVING_PORT: Port to connect to when requesting TF predictions
  • MODELKIT_TF_SERVING_MODE: Either grpc (uses grpc) or rest (uses requests for TensorflowModel, or aiohttp for AsyncTensorflowModel)
  • MODELKIT_TF_SERVING_ATTEMPTS: Number of attempts made while waiting for a TF Serving response
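
As a minimal sketch, these variables could be set from Python before the ModelLibrary is created. The values below are illustrative assumptions, in particular the "True" spelling of the boolean and the number of attempts; adapt them to your setup.

```python
import os

# Illustrative values only: the REST port matches the one published by the
# docker command on this page, the rest are assumptions.
tf_serving_env = {
    "MODELKIT_TF_SERVING_ENABLE": "True",
    "MODELKIT_TF_SERVING_HOST": "localhost",
    "MODELKIT_TF_SERVING_PORT": "8501",
    "MODELKIT_TF_SERVING_MODE": "rest",
    "MODELKIT_TF_SERVING_ATTEMPTS": "5",
}

# Environment variables are always strings, hence the quoted numbers.
os.environ.update(tf_serving_env)
```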

All of these parameters can be set programmatically (and passed to the ModelLibrary's settings):

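lib_serving_grpc = ModelLibrary(
    models=...,
    settings={"tf_serving": {  # assumed name of the settings key holding the TF serving options
        "enable": True,
        "port": 8500,
        "mode": "grpc",
        "host": "localhost",
    }},
)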

Using TF Serving during tests

modelkit provides a fixture to run TF serving during testing:

import pytest

@pytest.fixture(scope="session")
def tf_serving(request):
    lib = ModelLibrary(models=..., settings={"lazy_loading": True})
    yield tf_serving_fixture(request, lib, tf_version="2.8.0")

This will configure and run TF serving during the test session, provided docker is present.