Remote RSYNC over High-Speed (but High-Latency) WAN (UDR/UDT)

Edit 2022-12-23: I have stopped using UDR/UDT in favor of setting my server’s TCP congestion control mechanism to BBR – the results are really good using BBR and it’s far less work.
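
For reference, switching a Linux server to BBR usually comes down to two sysctl settings. The sketch below is not taken from my own config; it assumes a kernel that ships the tcp_bbr module (4.9 or newer), and the helper name bbr_sysctl_conf and the file name 99-bbr.conf are purely illustrative.

```shell
# Print the two sysctl settings commonly used to enable BBR.
# Assumption: kernel 4.9+ with the tcp_bbr module available.
bbr_sysctl_conf() {
    printf '%s\n' \
        'net.core.default_qdisc = fq' \
        'net.ipv4.tcp_congestion_control = bbr'
}

# Check what is active right now (read-only):
#   sysctl net.ipv4.tcp_congestion_control
# Review the settings, then apply them as root, for example:
#   bbr_sysctl_conf | sudo tee /etc/sysctl.d/99-bbr.conf
#   sudo sysctl --system
bbr_sysctl_conf
```

Printing the settings first and applying them by hand keeps the change easy to review and easy to undo by deleting the file.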

Introduction

For a while now I’ve been managing and maintaining Ethereum Archive Nodes for the ArchiveNode.io project. This means that I find myself managing massive databases (6.4TB at the time of this writing). While usually the nodes just sit there doing their thing, sometimes I find myself in need of spinning up a new server or migrating to a different provider.

While I could just re-sync the nodes, it takes on the order of 1-2 months to sync an Archive Node (Turbo-Geth notwithstanding). It’s far faster for me to simply transfer the databases over high speed WAN links, assuming that I can get decent speeds between my systems.

That’s usually not a problem if two nodes are near each other, e.g. both are in the US, or both are in Europe. But it does become a problem when I need to transfer across the pond.

With two 1Gbps connections using a normal rsync:

rsync -az --log-file=/home/nethermind/mainnet.log -e ssh root@archive02.archivenode.io:/nethermind/database/mainnet /data/nethermind/database/

I might get 500Mbps or more for servers in close proximity. However, I get about 100Mbps transferring between Europe and the US.

This isn’t a new problem: high latency has been known to be a killer for TCP connections for a long time now. So I’ve scoured the internet multiple times for ways to speed this up and tried various tools. I’ve finally found the one.
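
One way to see why latency hurts: to keep a link full, a TCP sender has to keep roughly bandwidth × round-trip time bytes in flight (the bandwidth-delay product). Here is a rough sketch of that arithmetic. The 1Gbps figure is from above, but the RTT values (10 ms and 100 ms) are illustrative assumptions, not measurements from my servers, and bdp_bytes is just a name I made up for the helper.

```shell
# bdp_bytes <mbps> <rtt_ms>: bytes that must be in flight to fill the link.
bdp_bytes() {
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

bdp_bytes 1000 10     # nearby servers (assumed 10 ms RTT)
bdp_bytes 1000 100    # across the Atlantic (assumed 100 ms RTT)
```

The longer path needs ten times as much unacknowledged data in flight, and loss-based congestion control tends to back off long before it gets there.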

UDR

UDR (https://github.com/martinetd/UDR) is a wrapper for rsync that uses the UDT protocol (https://udt.sourceforge.io).

It’s mind-bogglingly easy to use. Let’s set it up. I’m going to assume that you already have ssh key authentication turned on, so that running ssh user@server already connects successfully from the client. I’m using Client/Server interchangeably here; it doesn’t much matter which end you start this on.

# Client & Server (assuming Ubuntu)

# Install prereqs (build-essential provides make and g++ for the build)
sudo apt update
sudo apt install -y build-essential libssl-dev git rsync

# Make the udr program
git clone https://github.com/martinetd/UDR.git
cd UDR
make

# "Install" udr to everyone's path
sudo mv ./src/udr /usr/local/bin/udr

udr --help

OK, assuming you did the above and didn’t run into any dependency or build issues, udr --help should spit back the help for the udr app, and we’re good to go.
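
Since udr has to be present on both ends, a quick sanity check before the first transfer can save some head-scratching. This is only a sketch: have_cmd is a hypothetical helper, and user@server is a placeholder for your own host.

```shell
# have_cmd <name>: succeed if the command is on PATH, otherwise report it.
have_cmd() {
    command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

# Check the local machine; keep going so every missing tool gets reported.
for c in ssh rsync udr; do
    have_cmd "$c" || true
done

# Check the remote end the same way (user@server is a placeholder):
#   ssh user@server 'command -v udr rsync'
```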

Now I can just slightly modify my rsync command, wrapping it in udr:

udr rsync -az --log-file=/home/nethermind/mainnet.log root@archive02.archivenode.io:/nethermind/database/mainnet /data/nethermind/database/

Notice that I’m no longer using the -e ssh flag as udr will take care of the transport stuff.

What happens here is udr makes an ssh connection into the remote host, starts the udr process/server and then starts the rsync process over the UDT tunnel.

I’m now getting between 250 and 300Mbps between my servers, an almost 3x increase in transfer performance. When you’re talking about 6.4TB of data, that’s a massive improvement.
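
To put that in wall-clock terms, here is a back-of-the-envelope sketch using the 6.4TB database size from the introduction. It assumes decimal terabytes and a perfectly sustained rate, so real transfers will take longer; transfer_hours is just an illustrative helper.

```shell
# transfer_hours <terabytes> <mbps>: hours to move the data at a steady rate.
# Assumes 1 TB = 10^12 bytes and no protocol overhead or stalls.
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", tb * 8e12 / (mbps * 1e6) / 3600 }'
}

transfer_hours 6.4 100    # plain rsync over ssh, Europe to US
transfer_hours 6.4 250    # udr, low end
transfer_hours 6.4 300    # udr, high end
```

That works out to roughly six days at 100Mbps versus about two days with udr.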

Firewall Stuff:
By default UDR starts at port 9000 and looks for an open port up to 9100, so you’re going to want to make sure that the client can connect to the server over one of these ports (or configure a different range).
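
As a sketch of what opening that range might look like: UDT runs over UDP, so the rule needs to allow UDP rather than TCP. This assumes ufw is the firewall in use, which may not match your setup; udr_port_rule is a hypothetical helper that only prints the rule so you can review it before applying.

```shell
# udr_port_rule [start] [end]: print a ufw rule for the UDR port range.
# Assumptions: ufw is the firewall in use; defaults match UDR's 9000-9100.
udr_port_rule() {
    start="${1:-9000}"
    end="${2:-9100}"
    printf 'ufw allow %s:%s/udp\n' "$start" "$end"
}

# Review the rule, then apply it on the server as root, for example:
#   sudo $(udr_port_rule)
udr_port_rule
```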

3 Comments

  1. Ron Trickey

    Any chance you can help me get this working on CentOS? The problem I’m running into is that CentOS doesn’t provide the libssl-dev package. I’ve installed the openssl dev packages, but during the make, I get “crypto.h:30:10: fatal error: openssl/evp.h: No such file or directory”.

  2. Hello,

    I tried to configure the udr client on both servers, but when I start udr I get this error:
    [udr sender] connect: Connection setup failure: connection time out.
    rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
    rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6]

    Can you please help me resolve this issue?
