
TorchServe: How to convert bytes output to tensors

I have a model served with TorchServe, and I communicate with the TorchServe server over gRPC. The postprocess method of the custom handler returns a list, which is converted into bytes for transfer over the network.

The postprocess method:

def postprocess(self, data):
    # data type - torch.Tensor
    # data shape - [1, 17, 80, 64] and data dtype - torch.float32
    return data.tolist()

The main issue is on the client side, where converting the bytes received from TorchServe into a torch.Tensor is done inefficiently via ast.literal_eval:

import torch
from ast import literal_eval

# This takes 0.3 seconds
response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
            response.prediction.decode('utf-8')))

Using numpy.frombuffer or torch.frombuffer returns the following errors.

import numpy as np

np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

Using torch:

import torch
torch.frombuffer(response.prediction, dtype=torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)

Is there an alternative, more efficient way of converting the received bytes into a torch.Tensor?


Answer

One hack I’ve found that significantly improves performance when sending large tensors is to return the data as JSON (a list containing a dict) from the handler.

In your handler’s postprocess function:

def postprocess(self, data):
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]

On the client side, when you receive the gRPC response, decode it using json.loads:

import json
import torch

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
# The handler wrapped the tensor in a dict, so pull out the 'data' field first
preds = torch.as_tensor(json.loads(decoded_output)['data'])

preds should now hold the output tensor.
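As a quick sanity check, the decoded tensor should match the shape and dtype reported in the question (these values are specific to this particular model):

# Shape/dtype from the question: [1, 17, 80, 64], float32.
# torch.as_tensor builds the tensor with torch's default float dtype (float32).
print(preds.shape)   # torch.Size([1, 17, 80, 64])
print(preds.dtype)   # torch.float32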

Update:

There’s an even faster method that should completely remove the bottleneck: use tf.io.serialize_tensor from TensorFlow to serialize your tensor inside postprocess.

import tensorflow as tf

def postprocess(self, data):
    # Serialize the tensor to a compact binary string (move it to CPU first)
    return [tf.io.serialize_tensor(data.cpu()).numpy()]

On the client, decode it using tf.io.parse_tensor:

import tensorflow as tf
import torch

response = self.inference_stub.Predictions(
            inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
prediction = response.prediction
preds = torch.as_tensor(tf.io.parse_tensor(prediction, out_type=tf.float32).numpy())
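For what it’s worth, np.frombuffer and torch.frombuffer failed in the question because response.prediction held the UTF-8 text of a Python list rather than raw float data. If you want to avoid the TensorFlow dependency, a raw-bytes round trip also works, provided the client knows the dtype and shape ahead of time. The following is only a sketch of that idea (the hard-coded shape comes from the question), not part of the original answer:

import numpy as np
import torch

# Handler side (sketch): return the raw float32 bytes instead of a stringified list.
def postprocess(self, data):
    # data: torch.Tensor of shape [1, 17, 80, 64], dtype torch.float32
    return [data.cpu().numpy().tobytes()]

# Client side: rebuild the tensor from the raw bytes.
# The dtype and shape must be agreed on out of band (values taken from the question).
flat = np.frombuffer(response.prediction, dtype=np.float32)
preds = torch.from_numpy(flat.copy()).reshape(1, 17, 80, 64)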