gRPC for Data Engineers

If you’ve been around data engineering for a while, like me, you’ve noticed a few trends in the industry at large and in individual data engineers themselves. There seem to be a few types of data engineers, and which type you are depends on where you’ve worked and what your projects have looked like. Some data engineers focus on general ETL, data warehousing, and the like. They move data around and transform it using a myriad of tools. The other set of data engineers is more focused on low-level infrastructure: they provide the underlying tools and services others use to move and transform that data.

Which are you? One topic you may or may not be familiar with, depending on your background, is RPC, or more specifically gRPC. What is it?

What is gRPC?

This might be confusing, or not, but stay with me. RPC, and gRPC in particular, is a method of client-server communication, similar to yet different from the popular options you know as REST and OpenAPI. With those, a client and server communicate via HTTP, and the HTTP details are exposed to the developer. Think of the popular Python package requests. It usually has to do with URLs and sending data over port 80 via some URL-encoded call. (gRPC itself runs over HTTP/2, but it hides that from you.)

All code is available on GitHub.

Ok, so if that isn’t gRPC, what is it?

gRPC is that same client-server communication, usually using protobufs, where the client can actually call a method on the server application … directly.

Many times, in software architecture, if you have two applications sitting on different machines, and one needs to pass information or data to the other to take action … say to add a new user, this would have been done by building a REST API, perhaps described with OpenAPI. Some applications would build JSON messages and call an API endpoint via HTTP methods.
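To make that REST-style approach concrete, here is a minimal sketch of what building such a call by hand might look like. It uses only the standard library instead of requests, and the /users endpoint, the localhost URL, and the username field are all made up for illustration. The request is built but never sent.

```python
import json
import urllib.request


def build_add_user_request(base_url: str, username: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a hypothetical /users endpoint."""
    payload = json.dumps({"username": username}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/users",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address; nothing needs to be listening for this to run.
request = build_add_user_request("http://localhost:8000", "paul")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be urllib.request.urlopen(request), assuming a server exists.
```

Notice how much of this is about URLs, headers, and encoding rather than about adding a user.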

The benefit of gRPC is abstracting away some of that busy work, and maybe getting a little smarter and faster about the communication. Using a pre-defined format, you “directly” exercise code on that target machine.
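If the idea of “directly” calling code on another machine feels abstract, here is a tiny sketch of the RPC style. To be clear, this is XML-RPC from the Python standard library, not gRPC, and the add_user function is hypothetical. It just shows the shape of the idea: the client calls what looks like a normal method and the work happens on the server.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add_user(username: str) -> str:
    """Runs on the server side; the client only knows the method name."""
    return f"created user {username}"


# Port 0 asks the OS for any free port, so this does not clash with anything.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add_user)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client calls add_user as if it were a local method.
with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
    result = proxy.add_user("paul")

server.shutdown()
server.server_close()
print(result)
```

gRPC gives you the same feel, with a stricter contract (the .proto file) and a compact binary format underneath.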

gRPC pieces and parts.

gRPC uses Google’s open-sourced format for “serializing structured data” called Protocol Buffers, or protobuf.

You define what you want the data message serialized to look like in a .proto file.

Once these definitions are complete, you use a compiler, protoc, to automatically generate the classes you need for your language of choice, like Python. These generated classes/files allow your code to send and receive (serialize and deserialize) the data/messages.

Next, you will also use the .proto files to define services. This concept should be familiar to you. A service is going to be a logical grouping of your methods/messages.

In review…

  • define a “service”
  • define your protobuf “messages”
  • compile (protoc) your .proto file(s) into code for your language.

Try out gRPC with Python.

What better way to try out gRPC than with Python? First things first, install what we need.

 pip3 install grpcio grpcio-tools

Let’s start with a simple example.

Learn gRPC with Dune … be an evil Harkonnen trying to find the Muad’dib.

We are going to learn to use gRPC with Python by playing a little Dune together. Since you are clearly an evil genius, let’s pretend you are the Mentat for the House of Harkonnen and your very life depends on finding that sneaky Muad’dib.

There are of course Harkonnen soldiers scouring the face of Arrakis in search of the Muad’dib. They send you a message asking if this person they found is the Muad’dib or just some unfortunate Fremen. Being an evil genius Mentat, you decide to write a gRPC service to respond to these requests.

The first step is to define in our .proto file the service and the request and response to and from that service.

syntax = "proto3";

package dune;

service Dune{
 rpc isMuadDib(WeFoundHim) returns (DidYouReally) {}
}

message WeFoundHim{
 string message = 1;
}

message DidYouReally{
 string message = 1;
}

You can see we defined a service Dune. We can send an isMuadDib request for someone and get a DidYouReally response. Pretty straightforward.
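If you are curious what a WeFoundHim message looks like once serialized, here is a hand-rolled sketch of the protobuf encoding for a single string field. The generated classes do all of this for you, so the helper functions below are purely illustrative and are not part of any library. Field number 1 with a string value is encoded as a tag, a length, and the UTF-8 bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, then length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 means length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_string_field(buf: bytes) -> tuple[int, str]:
    """Decode a buffer that holds exactly one length-delimited string field."""
    tag, pos = decode_varint(buf)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:
        raise ValueError("expected a length-delimited field")
    length, pos = decode_varint(buf, pos)
    return field_number, buf[pos:pos + length].decode("utf-8")


# Equivalent in spirit to WeFoundHim(message="Paul Atreides") on the wire.
encoded = encode_string_field(1, "Paul Atreides")
print(encoded.hex())
print(decode_string_field(encoded))
```

The whole message is the name plus two bytes of overhead, which is part of why protobuf is attractive compared to JSON.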

Our next step is to use our pip installed grpc_tools to compile and push out all our code needed to run this service.

python -m grpc_tools.protoc -I protos --python_out=. --grpc_python_out=. example.proto

Here I am simply stating that my proto file lives in a folder called protos, and that the Python files should be output into the current directory.
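If you would rather run the same compile step from a Python script, a sketch along these lines should work. It assumes grpcio-tools is installed and that the folder layout matches the command above. The two helper functions are my own names, not part of grpc_tools.

```python
def build_protoc_args(proto_dir: str, proto_file: str, out_dir: str = ".") -> list[str]:
    """Build the same arguments as the command line above."""
    return [
        "grpc_tools.protoc",  # stands in for argv[0]
        f"-I{proto_dir}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        proto_file,
    ]


def compile_protos(proto_dir: str, proto_file: str, out_dir: str = ".") -> int:
    """Run protoc in-process; returns its exit code (0 on success)."""
    from grpc_tools import protoc  # imported lazily; needs grpcio-tools

    return protoc.main(build_protoc_args(proto_dir, proto_file, out_dir))


args = build_protoc_args("protos", "example.proto")
print(" ".join(args))
# To actually generate the files: compile_protos("protos", "example.proto")
```

This can be handy in a build script so nobody has to remember the flags.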

Two files were generated: example_pb2.py and example_pb2_grpc.py.

The first file contains the classes for the request and response messages. The second file contains the client and server classes (DuneStub and DuneServicer).

Now that the base code has been generated for us we need to actually implement the logic of our isMuadDib. I mean we need something to happen when a Harkonnen soldier sends his request to us to know if we found the Muad’dib or not.

Let’s create a new file called dune_server.py and populate it with a new class and method definition as follows.

import example_pb2_grpc
import example_pb2
import random
import grpc
from concurrent import futures


class Dune(example_pb2_grpc.DuneServicer):
    def __init__(self):
        self.choices = ["yes", "no"]

    def isMuadDib(self, request, context):
        response = random.choice(self.choices).upper()
        name = request.message
        return example_pb2.DidYouReally(message=f"Hello minion, You ask me if this {name} is the Muad'dib .... {response}!")
    
    
def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    example_pb2_grpc.add_DuneServicer_to_server(Dune(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    serve()

So we imported the files that the compiler pushed out for us. We made a class Dune and implemented our isMuadDib method, which uses the power of a Mentat to decide if this is the Muad’dib or not. Notice in the return of our method we call out the DidYouReally message definition that was defined in our .proto as the response. Also, our Dune class needs to inherit from the DuneServicer class that was autogenerated.
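One optional idea, if you want to test the Mentat’s logic without starting a server: pull the decision into a plain function and pass in the random generator. This is only a sketch of a possible refactor, not how the code above is written. The name mentat_verdict is made up, and the servicer method would simply wrap its result in a DidYouReally message.

```python
import random

CHOICES = ("yes", "no")


def mentat_verdict(name: str, rng: random.Random) -> str:
    """Build the reply text; no gRPC objects needed, so it is easy to test."""
    response = rng.choice(CHOICES).upper()
    return f"Hello minion, You ask me if this {name} is the Muad'dib .... {response}!"


# Seeding the generator makes the "random" verdict repeatable in a test.
first = mentat_verdict("Daniel Beach", random.Random(42))
second = mentat_verdict("Daniel Beach", random.Random(42))
print(first)
print(first == second)
```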

The serve definition can be taken from the gRPC example and quick start guide, and is straightforward.

Finally, we are going to make our last file … dune_client.py and populate it to run this whole thing.

import grpc
import example_pb2_grpc
import example_pb2


def run():
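    # Open a plaintext channel to the server started by dune_server.py.
    channel = grpc.insecure_channel('localhost:50051')
    dune = example_pb2_grpc.DuneStub(channel)
    # The request type for isMuadDib is WeFoundHim; DidYouReally is the response.
    response = dune.isMuadDib(example_pb2.WeFoundHim(message='Daniel Beach'))
    print("Greeter client received: " + response.message)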

if __name__ == '__main__':
    run()

The client is simply creating a gRPC channel on the wire, sending an isMuadDib request, and getting a DidYouReally response. Simple enough.

Final task.

Start the server in a terminal window …

python3 dune_server.py

While that bugger is running, try running the client that is going to check if Daniel Beach is the Muad’dib or not.

python3 dune_client.py

And it works!!!

danielbeach@Daniels-MacBook-Pro gRPC % python3 dune_client.py 
Greeter client received: Hello minion, You ask me if this Daniel Beach is the Muad'dib .... NO!

This is great … first, I’m not the Muad’dib and won’t be executed by the Harkonnen soldiers! Second, this gRPC wasn’t bad at all, was it?!

Musings on gRPC and Dune, Harkonnens and the Muad’dib.

Honestly, I was quite impressed with the ease of implementing a gRPC project in Python. It was smooth and very simple.

It seems to me that if you are building a service or project and you’re thinking about adding a REST API of some sort using something out of the box, your code base will probably explode with boilerplate and complexity just from adding the REST service. On the surface, this doesn’t appear to be the case with gRPC.

Fewer lines of code are always good in my book. Fewer things to break and have to manage.

Also, the way you define a .proto file is genius. It really distills into a single and simple file what EXACTLY you are trying to do and the expected calls and responses. I’ve had to dig through REST API code before trying to figure out what is happening and it was never this easy or straightforward.

Also, if you haven’t read Dune yet, shame on you. The movie is coming out soon, get on it.
