Reinventing the wheel: Automatic Bidirectional Directory Synchronization

As already mentioned, I am the proud owner of an online hard drive and I love it. Enough space for everything that's worth keeping, secure access over lots of different protocols (including rsync), an automatic backup function... It would be perfect if I had an Internet connection with an upstream of at least 100 Mbit/s. But I don't. (Actually, the thing almost feels like a local disk when I use it from my computer at work. But that is a university network.)

So I had to come up with a slightly more sophisticated architecture. I bought a GuruPlug server and attached an external USB hard drive to it. The basic idea is to use the USB drive over the LAN while at home, but still have all data synchronized to the online hard drive, which gives me automatic backup and fast access from everywhere else. I thought that the synchronization part would be the easiest, as there is a plethora of software available for this task, but when I had a closer look at the tools, I ran into problems:

  • rsync, the best-known synchronization software, only does unidirectional syncing. (So it's a mirroring tool, not a synchronization tool, to be correct.) But I want to be able to change files at home on the local drive and on the go on the online drive, and have them automatically synchronized overnight.
  • Unison does bidirectional synchronization, also using the heavily traffic-optimized rsync protocol. Sounds perfect. But Unison does not really work on locally mounted remote drives, as it apparently calculates a hash sum for each and every file, which results in downloading the entire online hard drive. (Currently 135 GB in my case.) Usually one would run a Unison server for that, so that the hashes are calculated locally on both sides, but I do not have shell access to the server - it's only a hard drive.
  • JFileSync, my personal favorite, does bidirectional synchronization and can run from the shell. But when I tried to use it for my "135 GB, 70000 files" directory, it exhausted my server's memory, because it does a complete comparison in the first step and, as far as I can tell, puts the pending actions for all files into an XML structure. (I have to add that the directories were almost in sync at that time. So it's not just because of the huge number of actions for the first synchronization.)

Is it really that hard to keep two directories synchronized, with the files compared only by their size and their timestamp? I ended up writing my own SyncTool in Java. And while I was at it, I added the option to receive the synchronization log via Jabber.
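
To make the "size and timestamp only" idea concrete, here is a minimal sketch of such a comparison. It is not the SyncTool's actual code: the class name `QuickCompare`, the `Verdict` values and the two-second tolerance are all assumptions made for illustration. The point is that only file metadata is read, so nothing has to be hashed or downloaded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of the SyncTool: decides what to do with a
// pair of files by looking at size and modification time only.
public class QuickCompare {

    // Assumed tolerance: some file systems store timestamps with coarse precision.
    static final long TOLERANCE_MS = 2000;

    enum Verdict { SAME, LEFT_NEWER, RIGHT_NEWER, CONFLICT }

    // Pure comparison on metadata values.
    static Verdict compare(long leftSize, long leftMillis, long rightSize, long rightMillis) {
        long diff = leftMillis - rightMillis;
        if (Math.abs(diff) <= TOLERANCE_MS) {
            // Same timestamp: equal sizes mean "in sync", different sizes need a human.
            return leftSize == rightSize ? Verdict.SAME : Verdict.CONFLICT;
        }
        return diff > 0 ? Verdict.LEFT_NEWER : Verdict.RIGHT_NEWER;
    }

    // Convenience overload that reads the metadata of two existing files.
    static Verdict compare(Path left, Path right) throws IOException {
        BasicFileAttributes l = Files.readAttributes(left, BasicFileAttributes.class);
        BasicFileAttributes r = Files.readAttributes(right, BasicFileAttributes.class);
        return compare(l.size(), l.lastModifiedTime().toMillis(),
                       r.size(), r.lastModifiedTime().toMillis());
    }
}
```

The obvious trade-off of this approach is that two files with identical size and timestamp but different content would be treated as equal.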

Libraries used

I am standing on the shoulders of giants here, I have to say. The tool uses H2 to keep track of the files, Apache Commons and log4j for the actual copy procedure and log output, JSAP to parse the command line and Smack to connect to Jabber. So all I did was implement the actual synchronization algorithm: read the file list, compare and synchronize, recurse. Everything was built with Eclipse and Maven.
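
The "read the file list, compare and synchronize, recurse" loop can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the SyncTool itself: it uses only the JDK's `java.nio.file` API instead of Apache Commons, it keeps no H2 database (so it cannot detect deletions; a file missing on one side is simply copied over from the other), and the names `SyncSketch`, `sync` and `TOLERANCE_MS` are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical sketch of a bidirectional sync walk; the newer file wins.
public class SyncSketch {

    // Assumed tolerance for file systems with coarse timestamp precision.
    static final long TOLERANCE_MS = 2000;

    static void sync(Path left, Path right) throws IOException {
        // Throws if a name is a file on one side and a directory on the other.
        Files.createDirectories(left);
        Files.createDirectories(right);

        // Read the file list: the union of the entry names on both sides.
        Set<String> names = new TreeSet<>();
        names.addAll(list(left));
        names.addAll(list(right));

        for (String name : names) {
            Path l = left.resolve(name);
            Path r = right.resolve(name);
            if (Files.isDirectory(l) || Files.isDirectory(r)) {
                sync(l, r);                       // recurse into subdirectories
            } else if (!Files.exists(r)) {
                copy(l, r);                       // only on the left
            } else if (!Files.exists(l)) {
                copy(r, l);                       // only on the right
            } else {
                long diff = Files.getLastModifiedTime(l).toMillis()
                          - Files.getLastModifiedTime(r).toMillis();
                if (diff > TOLERANCE_MS) copy(l, r);
                else if (diff < -TOLERANCE_MS) copy(r, l);
                // otherwise: treated as in sync (size conflicts are ignored here)
            }
        }
    }

    // Names of the direct children of a directory.
    static Set<String> list(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }

    // Copy and keep the modification time, so the next run sees both sides as equal.
    static void copy(Path from, Path to) throws IOException {
        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.COPY_ATTRIBUTES);
    }
}
```

Because only one directory level is listed at a time, memory use stays proportional to the largest single directory rather than to the whole tree.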

Where to get it

If you want to do something similar and don't have the time to reinvent the wheel once again, you can download the SyncTool from my homepage. It is available as open source under the Apache License 2.0.
