Linux: Powerful Server Administration

Synchronizing files with Rsync

In this recipe, we will learn how to use the Rsync utility to synchronize files between two directories or between two servers.

How to do it…

Follow these steps to synchronize files with Rsync:

  1. Set up key-based authentication between source and destination servers. We can use password authentication as well, which is described later in this recipe.
  2. Create a sample directory structure on the source server. You can use existing files as well:
    ubuntu@src$ mkdir sampledir
    ubuntu@src$ touch sampledir/file{1..10}
    
  3. Now, use the following command to synchronize the entire directory from the source server to your local system. Note the / after sampledir: it tells rsync to copy the contents of sampledir directly into backup. Without the /, the sampledir directory itself is copied into backup (as backup/sampledir):
    ubuntu@dest$ rsync -azP -e ssh ubuntu@10.0.2.8:/home/ubuntu/sampledir/ backup
    

    As this is the first run, all files from sampledir on the remote server will be downloaded into the backup directory on your local system. The output of the command should look like the following screenshot:

  4. You can check the downloaded files with the ls command:
    $ ls -l backup
    
  5. Add one new file on the remote server under sampledir:
    ubuntu@src$ touch sampledir/file22
    
  6. Now re-execute the rsync command on the destination server. This time, rsync will download only the new file and any other updated files. The output should look similar to the following screenshot:
    ubuntu@dest$ rsync -azP -e ssh ubuntu@10.0.2.8:/home/ubuntu/sampledir/ backup
    
  7. To synchronize two local directories, specify the source and destination paths. The -a flag is included because rsync skips directories unless recursion is enabled:
    $ rsync -a /var/log/mysql ~/mysql_log_backup
    

How it works…

Rsync is a well-known command-line file synchronization utility. With Rsync, you can synchronize files between two local directories, as well as between two servers. It is commonly used as a simple backup utility to copy or move files between systems. The advantage of Rsync is that synchronization happens incrementally, that is, only new and modified files are transferred. This saves bandwidth as well as time. You can quickly schedule a daily backup with cron and Rsync. Open your crontab file with crontab -e and add the following line to enable daily backups:

$ crontab -e # open crontab file
@daily rsync -aze ssh ubuntu@10.0.2.50:/home/ubuntu/sampledir /var/backup

In the preceding example, we have used a pull operation, where we are downloading files from the remote server. Rsync can be used to upload files as well. Use the following command to push files to the remote server:

$ rsync -azP -e ssh backup ubuntu@10.0.2.50:/home/ubuntu/sampledir

Rsync provides many command-line options. Among those used in the preceding examples, -a stands for archive and is a combination of several other flags. It enables recursive synchronization and preserves symbolic links, permissions, modification times, owner, and group. Option -z enables compression while transferring files, while option -P enables progress reports and keeps partially transferred files so that interrupted transfers can be resumed.

We have used one more option, -e, which specifies the remote shell to use for the transfer. In the preceding commands, we are using SSH with public key authentication. If you have not set up public key authentication between the two servers, you will be asked to enter the password for your account on the remote server. On current rsync versions, SSH is already the default remote shell, so transfers that use the host:path syntax are encrypted even if you skip the -e flag. Data and login credentials travel over a non-encrypted connection only when you connect directly to an rsync daemon (the host::module or rsync:// syntax).

Note that the SSH connection is established on the default SSH port, port 22. If your remote SSH server runs on a port other than 22, then you can use a slightly modified version of the preceding command as follows:

rsync -azP -e "ssh -p port_number" source destination

Another common option is --exclude, which specifies a pattern for file names to be excluded. If you need multiple exclusion patterns, you can list them in a text file, one per line, and pass that file to the command with the option --exclude-from=filename. Similarly, if you need to include only some specific files, you can specify inclusion patterns with the options --include=pattern or --include-from=filename.

Exclude a single file, or all files matching a single pattern:

$ rsync -azP --exclude 'dir*' source/ destination/

Exclude a list of patterns or file names:

$ rsync -azP --exclude-from 'exclude-list.txt' source/ destination/

By default, Rsync does not delete destination files, even if they have been deleted from the source location. You can override this behavior with the --delete flag. To keep a copy of files before they are deleted or overwritten, use the --backup and --backup-dir options. To delete files from the source directory once they have been transferred, you can use the --remove-source-files flag. Another handy option is --dry-run, which simulates a transfer with the given flags and displays the output, but does not modify any files. You should use --dry-run before using any deletion flags.

Preview the removal of source files with --dry-run:

$ rsync --dry-run --remove-source-files -azP source/ destination/

There's more…

Rsync is a great tool to quickly synchronize files between a source and a destination, but it does not provide bidirectional synchronization. This means that changes are synchronized from source to destination and not vice versa. If you need bidirectional synchronization, you can use another utility, Unison. You can install Unison on Debian-based systems with the following command:

$ sudo apt-get -y install unison

Once installed, Unison is invoked much like Rsync, with the two locations to keep in sync as arguments:

$ unison /home/ubuntu/documents ssh://10.0.2.56//home/ubuntu/documents

You can get more information about Unison in the manual pages with the following command:

$ man unison

If you wish to have your own Dropbox-like mirroring tool that continuously monitors local file changes and quickly replicates them to network storage, you can use Lsyncd. Lsyncd is a live synchronization or mirroring tool that monitors the local directory tree for events (using inotify or fsevents) and then, after a few seconds, spawns a synchronization process to mirror all changes to a remote location. By default, Lsyncd uses Rsync for synchronization.

As always, Lsyncd is available in the Ubuntu package repository and can be installed with a single command, as follows:

$ sudo apt-get install lsyncd

To get more information about Lsyncd, check the manual pages with the following command:

$ man lsyncd

See also