TorchServe: How to convert bytes output to tensors

I have a model served with TorchServe, and I communicate with the TorchServe server over gRPC. The postprocess method of my custom handler returns a list, which is converted into bytes for transfer over the network.

The postprocess method:

def postprocess(self, data):
    # data type - torch.Tensor
    # data shape - [1, 17, 80, 64] and data dtype - torch.float32
    return data.tolist()

The main issue is on the client, where converting the bytes received from TorchServe into a torch.Tensor is done inefficiently with ast.literal_eval:

# This takes 0.3 seconds
response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
            response.prediction.decode('utf-8')))

Using numpy.frombuffer or torch.frombuffer raises the following errors.

import numpy as np

np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

Using torch

import torch
torch.frombuffer(response.prediction, dtype = torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)
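
The errors above are consistent with the payload being text rather than raw float32 data: a [1, 17, 80, 64] float32 tensor occupies 1 * 17 * 80 * 64 * 4 = 348160 bytes, far fewer than the 2601542 bytes reported. The sketch below illustrates the difference on a hypothetical 2x2 example; the variable names are illustrative, and it assumes the nested list reaches the client as JSON-like text.

```python
import json
import struct

# Hypothetical 2x2 example standing in for the real [1, 17, 80, 64] output.
values = [[0.5, 1.25], [2.0, -3.75]]

# Assumption: a handler returning a nested list sends it as text like this.
text_payload = json.dumps(values).encode("utf-8")

# What frombuffer expects instead: 4 raw bytes per float32 element.
flat = [v for row in values for v in row]
raw_payload = struct.pack("<4f", *flat)

print(len(raw_payload))    # 4 elements * 4 bytes each
print(text_payload[:12])   # readable digits and brackets, not binary floats

# Size a raw float32 payload would have for the real tensor shape.
expected_raw_bytes = 1 * 17 * 80 * 64 * 4
```

Because the text payload has no fixed bytes-per-element layout, frombuffer cannot interpret it regardless of the dtype passed.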

Is there a more efficient way to convert the received bytes into a torch.Tensor?

Answer

One workaround I’ve found that significantly improves performance when sending large tensors is to return the data wrapped in a JSON-serializable dict.

In your handler’s postprocess function:

def postprocess(self, data):
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]

On the client side, when you receive the gRPC response, decode it with json.loads:

import json

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
# The handler wrapped the list in a dict, so read it back from the 'data' key
preds = torch.as_tensor(json.loads(decoded_output)['data'])

preds should now hold the output tensor.
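
As a minimal sketch of the JSON round trip described above, the snippet below uses only the standard library and a hypothetical 2x2 payload in place of the real model output. It assumes the dict returned by the handler reaches the client as UTF-8 JSON text, so the nested list sits under the 'data' key.

```python
import json
from ast import literal_eval

# Hypothetical small payload standing in for the real model output.
output = {"data": [[0.5, 1.25], [2.0, -3.75]]}

# Server side (assumption): the dict is JSON-encoded into bytes.
wire_bytes = json.dumps(output).encode("utf-8")

# Client side: parse the JSON and pull the nested list out of the "data" key.
decoded = json.loads(wire_bytes.decode("utf-8"))
nested = decoded["data"]

# literal_eval yields the same nested list on this payload; json.loads is
# the parser built for this format.
same = literal_eval(wire_bytes.decode("utf-8"))["data"]
```

Passing nested to torch.as_tensor then produces the tensor on the client.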

Update:

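There’s an even faster method that should remove the bottleneck entirely: use tf.io.serialize_tensor from TensorFlow to serialize the tensor inside postprocess.

def postprocess(self, data):
    # Requires `import tensorflow as tf` in the handler module
    # Convert to a NumPy array first, then serialize it to a bytes object
    return [tf.io.serialize_tensor(data.detach().cpu().numpy()).numpy()]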

On the client, decode it with tf.io.parse_tensor:

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
prediction = response.prediction
# out_type must match the dtype that was serialized (float32 here)
predictions = torch.as_tensor(tf.io.parse_tensor(prediction, out_type=tf.float32).numpy())
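
If adding TensorFlow as a dependency is undesirable, a similar effect may be achievable with raw bytes alone. The sketch below is an assumption, not something tested against TorchServe: it supposes the handler can return a bytes object (for example `[data.detach().cpu().numpy().tobytes()]`) that arrives unchanged in response.prediction, and that the client already knows the shape and dtype. The round trip itself is shown with NumPy only.

```python
import numpy as np

# Shape and dtype taken from the question; the client must know both,
# because raw bytes carry no metadata.
shape = (1, 17, 80, 64)

# Stand-in for the model output on the server.
original = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)

# Server side: dump the array as raw float32 bytes.
payload = original.tobytes()

# Client side: reinterpret the bytes and restore the shape.
# The result is a read-only view over the payload.
restored = np.frombuffer(payload, dtype=np.float32).reshape(shape)
```

With this layout the payload length is a multiple of the element size, so frombuffer no longer raises. On the client, torch.from_numpy(restored.copy()) would give a writable tensor.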

Answer