How We Pushed a Million Keys to Redis in Seconds
Dealing with a lot of keys? Redis’ Pipe Mode is your friend.
By Parampreet Singh
Hello there!
In this post, I’ll share my ideas on how we populated Redis (running in a Kubernetes cluster)… in a matter of seconds.
Here’s what you can expect from this post:
1. How to connect to a Redis server running in a Kubernetes cluster?
2. What is port-forwarding?
3. How to use Redis mass insertion and push millions of keys in seconds?
4. How to generate the Redis protocol?
5. How to read and parse a CSV in Ruby?
Wait, but why do I need to do this? 🤔
At Gojek, we use Redis in one of our services to cache drivers for faster lookups. When we deployed this service to new clusters, we needed to populate Redis with ~81K keys.
What we didn’t do (and what you shouldn’t do either)
Well, this. 👇
$ redis-cli -h "hostname" -p 6379 set "key" "value"
This simple and easy way of storing a key through redis-cli is okay, but not for thousands or millions of keys. You don’t want to end up waiting for hours unless you are Regina Phalange! 😛
Using a normal Redis client to perform mass insertion is not a good idea. The naive approach of sending one command after the other is slow, because you have to pay for the round trip time for every command.
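To get a feel for that cost, here is a back-of-the-envelope sketch. The round trip times are made-up illustrative values, not measurements from our clusters, and `sequential_insert_seconds` is a hypothetical helper.

```ruby
# Estimates the time spent waiting on the network when commands are sent one
# at a time and each one blocks until its reply arrives.
# Ignores server processing time and the cost of starting redis-cli per key.
def sequential_insert_seconds(command_count, round_trip_ms)
  command_count * round_trip_ms / 1000.0
end

# Illustrative round trip times only (assumed, not measured).
[0.2, 1.0, 20.0].each do |rtt_ms|
  secs = sequential_insert_seconds(1_000_000, rtt_ms)
  puts format("RTT %.1f ms: about %.0f s of waiting for a million SETs", rtt_ms, secs)
end
```

Even with a 1 ms round trip, a million sequential commands would spend roughly 17 minutes doing nothing but waiting.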
Let’s do something different!
We will use Redis mass insertion, but before getting to that, let’s talk a bit about the Redis protocol.
Redis clients communicate with the Redis server using a protocol called RESP (REdis Serialization Protocol).
With that said, let’s go write some code! I like toying around with Ruby, so this was my language of choice.
The gen_redis_proto function will generate the protocol required for mass insertion.
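A minimal sketch of such a function, modeled on the helper shown in the Redis mass insertion documentation (it may differ in detail from the one we ran):

```ruby
# Encodes one Redis command (name plus arguments) as a RESP array of bulk strings.
def gen_redis_proto(*cmd)
  proto = "*#{cmd.length}\r\n" # array header: number of arguments
  cmd.each do |arg|
    arg = arg.to_s
    # bytesize, not length: RESP lengths are counted in bytes, not characters
    proto << "$#{arg.bytesize}\r\n#{arg}\r\n"
  end
  proto
end
```

Using `bytesize` matters as soon as a key or value contains multi-byte characters, because the server reads exactly that many bytes for the argument.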
2.6.3 > puts gen_redis_proto("SET","mykey","Hello World!").inspect
Running the above command in the Ruby console gives us the following protocol.
"*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$12\r\nHello World!\r\n"
Well, this is how a command is represented and sent to the Redis Server through Redis Protocol.
*<args><cr><lf>
$<len0><cr><lf>
<arg0><cr><lf>
$<len1><cr><lf>
<arg1><cr><lf>
...
$<lenN><cr><lf>
<argN><cr><lf>
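Where <cr> means "\r" (or ASCII character 13) and <lf> means "\n" (or ASCII character 10). <args> is the number of arguments, and every argument is preceded by a $ line that carries its length in bytes.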
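A script like redis_mass_insert.rb can be little more than a loop that reads rows from a CSV and prints each one as a SET command in this protocol. Here is a minimal sketch, assuming a hypothetical CSV file with a header row and `key` and `value` columns. The file layout, the column names, and the `emit_csv_as_resp` helper are illustrative assumptions, not our actual data or code.

```ruby
require "csv"

# Encoder with the same behavior as the gen_redis_proto call shown earlier,
# included here so the script runs on its own.
def gen_redis_proto(*cmd)
  proto = "*#{cmd.length}\r\n"
  cmd.each do |arg|
    arg = arg.to_s
    proto << "$#{arg.bytesize}\r\n#{arg}\r\n"
  end
  proto
end

# Reads the CSV row by row (so large files are never loaded fully into memory)
# and writes one RESP-encoded SET per row to the given output stream.
# Assumed columns: "key" and "value".
def emit_csv_as_resp(csv_path, out = $stdout)
  CSV.foreach(csv_path, headers: true) do |row|
    out.write(gen_redis_proto("SET", row["key"], row["value"]))
  end
end

# Run as: ruby redis_mass_insert.rb path/to/file.csv
emit_csv_as_resp(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Writing to standard output keeps the script free of any Redis client dependency: whatever consumes the stream takes care of the connection.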
We can now run this script, but there’s a catch. Our Redis server runs in a Kubernetes cluster, and we didn’t want to install Ruby and its gems inside the cluster. So, what now?
Enter port-forwarding! 👍
$ kubectl -n "namespace" port-forward "pod-name" 7000:6379
Connections made to local port 7000 are forwarded to port 6379 of the pod that is running the Redis server. With this connection in place we can use our local workstation to debug the database that is running in the pod.
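Before piping any data through, it can be worth checking that the tunnel really reaches Redis. Here is a minimal sketch, assuming the port-forward above is running on local port 7000 and the server does not require authentication. `redis_ping` is a hypothetical helper, not part of any library.

```ruby
require "socket"

# Hypothetical helper: sends a RESP-encoded PING over an already open socket
# and returns the first reply line without its trailing CRLF.
# A reachable, unauthenticated Redis answers "+PONG".
def redis_ping(sock)
  sock.write("*1\r\n$4\r\nPING\r\n")
  sock.gets("\r\n").chomp
end

# Usage against the forwarded port (assumes the port-forward is running):
#   TCPSocket.open("127.0.0.1", 7000) { |sock| puts redis_ping(sock) }
```

If the reply starts with `-` instead, the server returned an error (for example, because it expects authentication first).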
Finally, we run our script to populate Redis 😬
$ ruby redis_mass_insert.rb | redis-cli -p 7000 --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 81003
We ran this script and it completed in a fraction of a second!
But, how?
In Redis 2.6 and later, the redis-cli utility supports a mode called pipe mode, which was designed to perform mass insertion.
Under the hood of pipe mode
According to the official doc:
- redis-cli --pipe tries to send data as fast as possible to the server.
- At the same time it reads data when available, trying to parse it.
- Once there is no more data to read from stdin, it sends a special ECHO command with a random 20 bytes string: we are sure this is the latest command sent, and we are sure we can match the reply checking if we receive the same 20 bytes as a bulk reply.
- Once this special final command is sent, the code receiving replies starts to match replies with these 20 bytes. When the matching reply is reached it can exit with success.
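The steps above can be sketched in Ruby. This is a simplified illustration of the idea, not the redis-cli source: `pipe_commands` is a hypothetical helper, it only understands single-line and bulk replies (array replies are not handled), and leaving the final ECHO reply out of the reply count is a choice made here.

```ruby
require "securerandom"
require "socket"

# Hypothetical helper: streams pre-encoded RESP commands to the server, then
# uses a random ECHO payload as a sentinel to recognise the last reply.
def pipe_commands(sock, payload)
  token = SecureRandom.alphanumeric(20) # 20 random ASCII bytes
  # Write from a separate thread so replies are read while data is still going out.
  writer = Thread.new do
    sock.write(payload)
    sock.write("*2\r\n$4\r\nECHO\r\n$20\r\n#{token}\r\n")
  end

  replies = 0
  errors = 0
  loop do
    line = sock.gets("\r\n")
    raise EOFError, "connection closed before the sentinel reply" if line.nil?

    case line[0]
    when "+", ":" # simple string or integer reply
      replies += 1
    when "-"      # error reply
      replies += 1
      errors += 1
    when "$"      # bulk string reply: "$<len>\r\n<data>\r\n"
      len = Integer(line[1..-1].chomp)
      data = len.negative? ? nil : sock.read(len + 2)[0, len]
      break if data == token # sentinel reached: all earlier commands were answered
      replies += 1
    end
  end
  writer.join
  { replies: replies, errors: errors }
end

# Usage (assumes a Redis server reachable on local port 7000):
#   TCPSocket.open("127.0.0.1", 7000) { |sock| p pipe_commands(sock, $stdin.read) }
```

Because the token is random, a bulk reply that happens to belong to an earlier command is very unlikely to be mistaken for the sentinel.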
Naice, what’s next?
Well, I tried populating Redis locally with a million keys.
It worked like a charm, in just ~2 seconds. 😄
That’s it!
I really hope that this post gave you some new insights.
Thanks for reading! 💚