For a while now I’ve been managing and maintaining Ethereum Archive Nodes for the ArchiveNode.io project. This means that I find myself managing massive databases (6.4TB at the time of this writing). While usually the nodes just sit there doing their thing, sometimes I find myself in need of spinning up a new server or migrating to a different provider.
While I could just re-sync the nodes, it takes on the order of 1-2 months to sync an Archive Node (Turbo-Geth notwithstanding). It’s far faster for me to simply transfer the databases over high-speed WAN links, assuming that I can get decent speeds between my systems.
That’s usually not a problem if two nodes are near each other, e.g. both are in the US or both are in Europe. But it does become a problem when I need to transfer across the pond.
With two 1Gbps connections, using a normal rsync command like
rsync -az --log-file=/home/nethermind/mainnet.log -e ssh email@example.com:/nethermind/database/mainnet /data/nethermind/database/
I might get 500Mbps or more for servers in close proximity. However, I only get about 100Mbps transferring between Europe and the US.
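Why does distance hurt so much? A single TCP stream can have at most one window of data in flight per round trip, so its throughput tops out at roughly the window size divided by the round-trip time. Here's a rough sketch of that arithmetic. The 1 MiB window and the 10 ms and 80 ms round-trip times are illustrative assumptions, not measurements from my servers, and `tcp_max_mbps` is just a helper name made up for this example.

```shell
#!/usr/bin/env bash
# Upper bound on single-stream TCP throughput: window / RTT.
# tcp_max_mbps is a hypothetical helper; all inputs below are assumed values.
tcp_max_mbps() {
  local window_bytes=$1 rtt_ms=$2
  # (bits per window) / (seconds per round trip), converted to Mbps
  awk -v w="$window_bytes" -v rtt="$rtt_ms" \
    'BEGIN { printf "%.1f\n", (w * 8) / (rtt / 1000) / 1000000 }'
}

tcp_max_mbps 1048576 10   # assumed same-continent round trip
tcp_max_mbps 1048576 80   # assumed transatlantic round trip
```

With those assumed inputs the ceiling drops from roughly 840 Mbps to roughly 105 Mbps, purely because of the longer round trip. Real-world numbers also depend on packet loss, congestion control and buffer tuning, so treat this as intuition rather than a prediction.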
This isn’t a new problem: high latency has been known to be a killer for TCP connections for a long time now. So I’ve scoured the internet multiple times for ways to speed this up and tried various tools. I’ve finally found the one: UDR.
It’s mindbogglingly easy to use. Let’s set it up. I’m going to assume that you already have ssh key authentication turned on, so that running
ssh user@server already connects successfully from the client. I’m using client and server interchangeably here; it doesn’t much matter which end you start this on.
# Client & Server (assuming Ubuntu)

# Install Prereqs
sudo apt update
sudo apt install -y libssl-dev git rsync

# Make the udr program
git clone https://github.com/martinetd/UDR.git
cd UDR
make

# "Install" udr to everyone's path
sudo mv ./src/udr /usr/local/bin/udr
udr --help
OK, assuming you did the above and didn’t run into any dependency or build issues, udr --help should spit back the help for the udr app, and we’re good to go.
Now I can just slightly modify my rsync command, wrapping it in udr:
udr rsync -az --log-file=/home/nethermind/mainnet.log firstname.lastname@example.org:/nethermind/database/mainnet /data/nethermind/database/
Notice that I’m no longer using the -e ssh flag, as udr will take care of the transport stuff.
What happens here is that udr makes an ssh connection into the remote host, starts the udr process/server there, and then runs the rsync process over the UDT tunnel.
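Since udr depends on both a working ssh login and a udr binary on the remote side, a quick pre-flight check can save some head scratching. This is only a sketch: `preflight` and `SSH_BIN` are names invented for this example, and `user@server` is a placeholder for your own login.

```shell
#!/usr/bin/env bash
# Pre-flight check before running udr (hypothetical helper, not part of UDR).
# SSH_BIN can be overridden, e.g. to point at a different ssh binary.
SSH_BIN=${SSH_BIN:-ssh}

preflight() {
  local remote=$1
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # so this only succeeds when key authentication already works.
  # 'command -v udr' succeeds only if udr is in the remote PATH.
  if ! "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=10 "$remote" 'command -v udr' >/dev/null; then
    echo "preflight failed: no key-based ssh to $remote, or udr is not in the remote PATH" >&2
    return 1
  fi
  echo "ok: $remote is reachable and has udr in its PATH"
}
```

Run it as `preflight user@server` from the machine where you plan to start the transfer. If it fails, try plain `ssh user@server` first to tell an authentication problem apart from a missing udr install.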
I’m now getting 250-300Mbps between my servers, an almost 3x increase in transfer performance. When you’re talking about 6.5TB of data, that’s a massive improvement.
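To put that in wall-clock terms, here's a back-of-the-envelope calculation. It assumes decimal units (1TB = 10^12 bytes, 1Mbps = 10^6 bits per second), a perfectly sustained rate and no protocol overhead. `transfer_hours` is a made-up helper name, and 275Mbps is simply the midpoint of the 250-300Mbps range.

```shell
#!/usr/bin/env bash
# Estimate transfer time in hours for a given size (TB) and rate (Mbps).
# transfer_hours is a hypothetical helper; it ignores overhead and rate dips.
transfer_hours() {
  local size_tb=$1 mbps=$2
  # TB -> megabits is a factor of 8,000,000; megabits / Mbps = seconds
  awk -v tb="$size_tb" -v rate="$mbps" \
    'BEGIN { printf "%.1f\n", tb * 8000000 / rate / 3600 }'
}

transfer_hours 6.5 100   # plain rsync over ssh across the Atlantic
transfer_hours 6.5 275   # udr, midpoint of the observed range
```

That works out to about 144 hours (six days) at 100Mbps versus about 52 hours (a bit over two days) at 275Mbps.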
By default, UDR starts with port 9000 and looks for an open port up to 9100, so you’re going to want to make sure that the client can connect to the server over one of these ports (or configure a different range).
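If the server sits behind a host firewall, something like the following should open that range. This is a sketch that assumes you're using ufw (the stock Ubuntu firewall front end) and relies on the fact that UDT runs over UDP, so the rule is for UDP rather than TCP. If you use a cloud provider's security groups or plain iptables instead, you'll need the equivalent rule there.

```shell
# On the server: allow UDR's default UDT port range (UDP, 9000-9100).
# Assumes ufw is installed and active; adjust the range if you changed it.
sudo ufw allow 9000:9100/udp

# Confirm the rule is in place
sudo ufw status
```

The ssh port still needs to be reachable as well, since udr uses ssh to start the remote side.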