
TorchServe: How to convert bytes output to tensors

I have a model served with TorchServe, and I communicate with the TorchServe server over gRPC. The postprocess method of the custom handler returns a list, which is converted into bytes for transfer over the network.

The postprocess method:

def postprocess(self, data):
    # data type - torch.Tensor
    # data shape - [1, 17, 80, 64] and data dtype - torch.float32
    return data.tolist()

The main issue is on the client side, where converting the bytes received from TorchServe into a torch.Tensor is done inefficiently via ast.literal_eval:

import torch
from ast import literal_eval

# This takes 0.3 seconds
response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
            response.prediction.decode('utf-8')))

Using numpy.frombuffer or torch.frombuffer returns the following errors.

import numpy as np

np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

Using torch:

import torch
torch.frombuffer(response.prediction, dtype=torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)

Is there an alternative, more efficient way of converting the received bytes into a torch.Tensor?


Answer

One hack I’ve found that significantly improves performance when sending large tensors is to return the data as JSON (a list containing a dict) from the handler.

In your handler’s postprocess function:

def postprocess(self, data):
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]

On the client side, when you receive the gRPC response, decode it using json.loads:

import json
import torch

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
# The handler wrapped the tensor in a dict, so pull out the 'data' field first
preds = torch.as_tensor(json.loads(decoded_output)['data'])

preds should now hold the output tensor.
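As a quick sanity check, the decoded tensor should match the shape and dtype reported in the question (these values are specific to this particular model):

# Shape/dtype from the question: [1, 17, 80, 64], float32.
# torch.as_tensor builds the tensor with torch's default float dtype (float32).
print(preds.shape)   # torch.Size([1, 17, 80, 64])
print(preds.dtype)   # torch.float32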

Update:

There’s an even faster method that should completely remove the bottleneck: use tf.io.serialize_tensor from TensorFlow to serialize your tensor inside postprocess.

import tensorflow as tf

def postprocess(self, data):
    # Serialize the tensor to a compact binary string (move it to CPU first)
    return [tf.io.serialize_tensor(data.cpu()).numpy()]

On the client, decode it using tf.io.parse_tensor:

import tensorflow as tf
import torch

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
prediction = response.prediction
preds = torch.as_tensor(tf.io.parse_tensor(prediction, out_type=tf.float32).numpy())
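For what it’s worth, np.frombuffer and torch.frombuffer failed in the question because response.prediction held the UTF-8 text of a Python list rather than raw float data. If you want to avoid the TensorFlow dependency, a raw-bytes round trip also works, provided the client knows the dtype and shape ahead of time. The following is only a sketch of that idea (the hard-coded shape comes from the question), not part of the original answer:

import numpy as np
import torch

# Handler side (sketch): return the raw float32 bytes instead of a stringified list.
def postprocess(self, data):
    # data: torch.Tensor of shape [1, 17, 80, 64], dtype torch.float32
    return [data.cpu().numpy().tobytes()]

# Client side: rebuild the tensor from the raw bytes.
# The dtype and shape must be agreed on out of band (values taken from the question).
flat = np.frombuffer(response.prediction, dtype=np.float32)
preds = torch.from_numpy(flat.copy()).reshape(1, 17, 80, 64)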