Remote RSYNC over High Speed (but latent) WAN (UDR/UDT)

Edit 2022-12-23: I have stopped using UDR/UDT in favor of setting my server’s TCP congestion control mechanism to BBR – the results are really good using BBR and it’s far less work.
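For reference, switching a Linux server to BBR usually comes down to two sysctl settings. What follows is a minimal sketch of the typical approach, not a record of the exact commands used on these servers. The `enable_bbr` helper name is made up for illustration, it assumes a kernel with the `tcp_bbr` module available (4.9 or newer), and pairing BBR with the `fq` qdisc is the common recommendation rather than a hard requirement on newer kernels.

```shell
# Hypothetical helper: switch the running kernel to BBR congestion control.
# Assumes Linux 4.9+ with the tcp_bbr module available.
enable_bbr() {
    # fq is the queueing discipline commonly paired with BBR for pacing
    sudo sysctl -w net.core.default_qdisc=fq
    sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
    # Print the algorithm that is active now
    sysctl -n net.ipv4.tcp_congestion_control
}

# Usage (takes effect immediately, but does not survive a reboot):
#   enable_bbr
```

To make the change persistent, the same two `key=value` settings can go in a file under `/etc/sysctl.d/`.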

Introduction

For a while now I’ve been managing and maintaining Ethereum Archive Nodes for the ArchiveNode.io project. This means that I find myself managing massive databases (6.4TB at the time of this writing). While usually the nodes just sit there doing their thing, sometimes I find myself in need of spinning up a new server or migrating to a different provider.

While I could just re-sync the nodes, it takes on the order of 1-2 months to sync an Archive Node (Turbo-Geth notwithstanding). It’s far faster for me to simply transfer the databases over high-speed WAN links, assuming that I can get decent speeds between my systems.

That’s usually not a problem if two nodes are near each other, e.g. both are in the US or both are in Europe. But it does become a problem when I need to transfer across the pond.

With two 1Gbps connections using a normal rsync:

rsync -az --log-file=/home/nethermind/mainnet.log -e ssh root@archive02.archivenode.io:/nethermind/database/mainnet /data/nethermind/database/

I might get 500Mbps or more for servers in close proximity. However, I get about 100Mbps transferring between Europe and the US.

This isn’t a new problem: high latency has long been known to be a killer for TCP throughput. So I’ve scoured the internet multiple times for ways to speed this up and tried various tools. I’ve finally found the one.
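To see why latency hurts so much: a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so throughput tops out at roughly window size divided by round-trip time. The sketch below just does that arithmetic. The 1 MiB window and the 10 ms and 80 ms round-trip times are illustrative assumptions, not measurements from these servers, and `max_tcp_mbps` is a made-up helper name.

```shell
# Upper bound on single-stream TCP throughput: window / RTT.
# Arguments: window size in bytes, round-trip time in milliseconds.
max_tcp_mbps() {
    awk -v window="$1" -v rtt_ms="$2" \
        'BEGIN { printf "%.1f\n", (window * 8) / (rtt_ms / 1000) / 1000000 }'
}

max_tcp_mbps 1048576 10   # assumed RTT between nearby servers
max_tcp_mbps 1048576 80   # assumed RTT across the Atlantic
```

With those assumed numbers the ceiling drops from about 839 Mbps to about 105 Mbps, even though neither link got any slower.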

UDR

UDR (https://github.com/martinetd/UDR) is a wrapper for rsync that uses the UDT protocol (https://udt.sourceforge.io).

It’s mindbogglingly easy to use. Let’s set it up. I’m going to assume that you already have ssh key authentication turned on, so that running ssh user@server already connects successfully from the client. I’m using Client/Server interchangeably here; it doesn’t much matter which end you start this on.

# Client & Server (assuming Ubuntu)

# Install Prereqs
sudo apt update
# build-essential provides make and g++, which the build step below needs
sudo apt install -y build-essential libssl-dev git rsync

# Make the udr program
git clone https://github.com/martinetd/UDR.git
cd UDR
make

# "Install" udr to everyone's path
sudo mv ./src/udr /usr/local/bin/udr

udr --help

OK, assuming you did the above and didn’t run into any dependency or build issues, udr --help should spit back the help for the udr app, and we’re good to go.

Now I can just slightly modify my rsync command, wrapping it in udr:

udr rsync -az --log-file=/home/nethermind/mainnet.log root@archive02.archivenode.io:/nethermind/database/mainnet /data/nethermind/database/

Notice that I’m no longer using the -e ssh flag as udr will take care of the transport stuff.

What happens here is udr makes an ssh connection into the remote host, starts the udr process/server and then starts the rsync process over the UDT tunnel.
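Since udr depends on both a working ssh login and a udr binary on the remote side, a quick preflight check can save some head-scratching. This is a sketch under a few assumptions: `udr_preflight` is a hypothetical helper, not part of UDR, and it assumes udr ends up on the remote user's non-interactive PATH after the install step above.

```shell
# Hypothetical preflight helper, not part of UDR.
# Argument: the same user@host you would pass to udr rsync.
udr_preflight() {
    local remote="$1"
    # udr must be installed locally
    command -v udr >/dev/null \
        || { echo "udr not found locally" >&2; return 1; }
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so this also confirms that key authentication works
    ssh -o BatchMode=yes "$remote" 'command -v udr' >/dev/null \
        || { echo "ssh login failed or udr missing on $remote" >&2; return 1; }
    echo "ok: udr present locally and on $remote"
}

# Usage:
#   udr_preflight root@archive02.archivenode.io
```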

I’m now getting 250-300Mbps between my servers, roughly a 2.5-3x increase in transfer performance. When you’re talking about 6.5TB of data, that’s a massive improvement.
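To put that in wall-clock terms, here's a rough back-of-the-envelope calculation. It assumes decimal units (1TB = 10^12 bytes), a perfectly steady rate, and no help from rsync's compression, so treat the output as ballpark figures only. `transfer_hours` is just an illustrative helper.

```shell
# Estimated transfer time in hours for a given size (TB) and rate (Mbps).
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", (tb * 1e12 * 8) / (mbps * 1e6) / 3600 }'
}

transfer_hours 6.5 100   # plain rsync between Europe and the US
transfer_hours 6.5 275   # udr, taking the midpoint of 250-300Mbps
```

That works out to roughly 144 hours (about six days) versus roughly 53 hours (a bit over two days).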

Firewall Stuff:
By default, UDR starts at port 9000 and looks for an open port up to 9100, so make sure the client can reach the server on one of these ports (or configure a different range).
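As an example, on Ubuntu with ufw the default range could be opened on the receiving end like this. A few assumptions here: ufw is the firewall in use, `open_udr_ports` is a made-up helper name, and since UDT runs on top of UDP the rule opens UDP rather than TCP. The initial ssh connection still needs its usual TCP port.

```shell
# Hypothetical helper: allow the UDR/UDT port range through ufw.
# Arguments (optional): first and last port; defaults match UDR's 9000-9100.
open_udr_ports() {
    local first="${1:-9000}" last="${2:-9100}"
    # UDT is UDP-based, so the rule targets UDP
    sudo ufw allow "${first}:${last}/udp"
}

# Usage:
#   open_udr_ports            # default range
#   open_udr_ports 9000 9010  # narrower range, if udr is configured to match
```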

3 Comments

  1. Martin

    Excellent, I will try it!

  2. Ron Trickey

    Any chance you can help me get this working on CentOS? The problem I’m running into is that CentOS doesn’t provide the libssl-dev package. I’ve installed the openssl dev packages, but during the make, I get “crypto.h:30:10: fatal error: openssl/evp.h: No such file or directory”.

  3. hello,

    I tried to configure the udr client on both servers, but when I start udr I get an error:
    [udr sender] connect: Connection setup failure: connection time out.
    rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
    rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6]

    Can you please help me resolve this issue?
