Backup of a personal Ecryptfs home directory

This post is in English since it includes some information I didn't find on the Internet, and that could be of some interest to non-Italians too. It's just about sharing a hack I wrote in order to back up my personal eCryptfs home directory, so not much theory here.

Some Linux distributions, e.g. Ubuntu, offer a way to encrypt users' home directories, based on eCryptfs. A user's home directory tree is encrypted and kept in a directory under /home/.ecryptfs/username/.Private, while keys and some other information are kept in files under /home/.ecryptfs/username/.ecryptfs. The keys are encrypted with the user's password, which is always weak from a cryptographic perspective, so while the files in username/.Private can be exposed without relevant risk, the directory username/.ecryptfs must be protected. When the user logs in and provides their password to the system, the encryption keys are recovered from the username/.ecryptfs files and used to transparently mount the files in username/.Private as cleartext files in the user's home directory. More details are in the eCryptfs documentation.
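As a quick check, the layout described above can be probed from a shell. This is just a sketch: it only tests for the two directories mentioned here, nothing more.

```shell
# Probe for the eCryptfs home layout of the current user
u=$(id -nu)
if [ -d "/home/.ecryptfs/$u/.Private" ] && [ -d "/home/.ecryptfs/$u/.ecryptfs" ]; then
    echo "home of $u is eCryptfs-encrypted"
else
    echo "no eCryptfs setup found for $u"
fi
```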

Now to backups. Note that I'm talking about backups for one single "personal" Linux system. I have different sets of backups. One is based on nightly full backups, encrypted with PGP. I keep a copy of the PGP key in a safe place (protected with a very long passphrase). I started making this kind of backup many years ago and I'm used to it: full backups are very convenient when I need to recover files, burn DVDs for historical backups, carry them with me and so on. I only use full backups locally: I have plenty of disk space, and full backups are much more robust. However, they are not convenient for remote backups. A single, huge, compressed and encrypted file can be totally different from the previous day's one, even if little in the encrypted files has changed. This means a lot of data to upload to the remote server, which is a big problem over ADSL (I have a 384 Kbps uplink, which is a little more than 40 Kbytes/s of actual upload speed: a 1 Gbyte upload takes more than 6 hours).

Remote backups are important: should somebody steal both my computer and the external drive, or in case of fire or some other accident, a remote copy would save my data. Should I have some problem with the remote server or provider instead, I still have my local copies.

I don't want to use one of the many programs provided by remote backup services: I want a backup tool/procedure that I can fully control and verify, where files are encrypted locally and then transferred through a protected channel, and where I can easily change provider without relevant changes to my procedures. The simplest way is to transfer encrypted files with rsync over ssh.
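The transfer itself can be sketched with a single rsync invocation. The user, host and remote path below are placeholders, not my actual ones; the local path is the backup directory on the USB drive.

```shell
# Sync the locally encrypted backup to a remote host over ssh
# (user, host and remote path are placeholders).
# -a preserves times and attributes so unchanged files are skipped
# on the next run; --delete removes remote files no longer present locally.
rsync -a --delete -e ssh /media/46491C21645E50CE/backup-e/1/ \
    user@backup.example.com:backup-e/
```

Since the files are already encrypted locally, the provider never sees any cleartext, and switching providers only means changing the destination.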

The .Private directory seems perfect for this task: files are already encrypted one by one, as are filenames. Synchronizing this directory with the remote server is all I need, provided that I store a copy of the username/.ecryptfs directory in the same safe place where I store the PGP key for the other backup set.

But eCryptfs has some very annoying problems, related to the way it encrypts filenames. The first one is that there is no easy way to map encrypted filenames to cleartext filenames and vice versa. I know the algorithm is somewhere in the source code, but it's not trivial (to me) to extract it, and it seems that nobody has created a useful utility yet. By the way, documentation is scarce, and this is a bad indicator when dealing with security components. Anyway, if I look for a specific file that I need to restore, there is no easy way to find out which encrypted file it is. This shouldn't be a big problem, since I still have my local full PGP-encrypted backups, but I don't want to depend on them: each procedure should be effective by itself.

The second, much more critical problem is that eCryptfs encrypted filenames are very long, easily over 100 bytes each, so the full path of a file can easily be longer than 1024 bytes. It turns out that this is a problem for more filesystems (and drivers, and daemons) than I expected. As an example, my external USB disk has issues with long paths when formatted with NTFS, even though NTFS itself doesn't seem to have this limit. So does my current remote backup provider.

So I wrote a simple script that deals with these issues for me. First, I use a simple trick to have the eCryptfs driver translate the filenames for me: when I create (touch) a file with an encrypted name in a directory under .Private, a file with the corresponding cleartext name appears in the corresponding directory in the home directory. By applying this trick to every file name and directory name in the .Private directory, I can build a mapping between the encrypted file/directory names and the corresponding cleartext names. I can safely store this mapping in a file in the home directory, since it will be encrypted like any other file.

As an example, if a directory in my home is ~/map, and its corresponding encrypted directory is e.g.

.Private/ECRYPTFS_FNEK_ENCRYPTED.FXZCVqi.KZ-Vi-TZVKGFyqpSmgUAfA67jeWWugxfldBjDVhapYOUXLWrV0-

then if I touch a file named ECRYPTFS_FNEK_ENCRYPTED.FXZCVqi.KZ-Vi-TZVKGFyqpA67jeWWugxfldBjDVhapYOUXLWrV0SmgUAf-

and a file named claudio appears in ~/map, then I can store the mapping between claudio and

ECRYPTFS_FNEK_ENCRYPTED.FXZCVqi.KZ-Vi-TZVKGFyqpA67jeWWugxfldBjDVhapYOUXLWrV0SmgUAf-

I solved the second problem this way: each encrypted file is saved in a backup directory whose name is a hash of its dirname (the path name not including the filename). Again, a mapping between paths and hashes is stored in a file. This way, all files in the backup are in directories of depth at most 1, so path length is no longer an issue.
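For a single file, the renaming scheme can be sketched like this (the encrypted names below are made-up placeholders):

```shell
# Replace the directory part of an encrypted path with its md5 hash,
# keeping the file name: the result is the file's location in the backup
path='./ECRYPTFS_FNEK_ENCRYPTED.AAAA/ECRYPTFS_FNEK_ENCRYPTED.BBBB'
hash=$(dirname "$path" | md5sum | awk '{print $1}')
echo "$hash/$(basename "$path")"
```

Since every file lands in hash/encrypted-name, no backup path is more than one directory deep, whatever the depth of the original tree was.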

So I can create a full backup of the .Private directory on my NTFS-formatted external drive, and synchronize it with the remote server.

And here is the script. Note that there are almost no safety checks, so be sure to understand what it does and change it to fit your needs and your directories, or you may end up losing your whole home directory. I know it could be improved in many ways but… well, I'm not that good with scripting, and had very little time to spend on this. Maybe one day I will make something better, or maybe you could send me some improvements 😉 After the script has run, the directory LDIR can be synchronized with the remote backup via rsync.


#!/bin/bash
WDIR=/home/$(id -nu)/rsync-tmp # working directory
LDIR=/media/46491C21645E50CE/backup-e/1 # local backup directories on a USB drive
LDIR2=/media/46491C21645E50CE/backup-e/2
EDIR=/home/.ecryptfs/$(id -nu)/.Private
DFILE=$WDIR/dfile # mapping between directories and hashes
FFILE=$WDIR/ffile # mapping between files and paths in the backup
MAPFILE=mapfile # mapping between tokens (file and directory names) and their encrypted form; note that only directories and files are copied (no devices, links etc.)
TDIR=map # directory for the trick used to discover file names
TDIRE=ECRYPTFS_FNEK_ENCRYPTED.FWZCVqy.zZ-FyqpSmgVi-UAfATZVKG67jeYp8fLmL0f8vf6I9OY4EnKmZT--/ECRYPTFS_FNEK_ENCRYPTED.FWZCVqy.zZ-Vi-TZVKAfA67jeYpoRaIQ9Natmxm4XRn6Xg1UUGFyqpSmgU-- # encrypted path to TDIR (this is not my actual one 😉 )

# We check that the backup drive is mounted and writable, then rotate the
# previous backup from $LDIR to $LDIR2
if touch $LDIR/test ; then
rm -f $LDIR/test
rm -rf $LDIR2
mv $LDIR $LDIR2
else
echo backup failed >&2
exit 1
fi

mkdir -p $WDIR
mkdir -p $LDIR

# Now we collect the encrypted file and directory names; note that there
# are no spaces in encrypted file names

cd $EDIR

find . |tr / \\n |sort -u | tail -n +2 >$WDIR/tokens

# And now we find the corresponding cleartext names

cd $WDIR/$TDIR
rm -f $WDIR/$MAPFILE
rm -f $WDIR/$TDIR/*

while read -r i ; do
touch $EDIR/$TDIRE/$i
name=$(ls -A)
echo "$name $i" >>$WDIR/$MAPFILE
rm $EDIR/$TDIRE/$i
done <$WDIR/tokens

# Now we build the list of mappings between directories and the hashed directories, and create the actual hashed directories

cd $EDIR
rm -f $DFILE

find . -type d |while read -r i ; do DIR=$(echo $i|md5sum|awk '{print $1}') ; echo $i $DIR >> $DFILE; mkdir $LDIR/$DIR ; done

# Now we create the list of files and put them in the proper directories

rm -f $FFILE

find . -type f|while read -r i ; do DIR=$LDIR/$(dirname $i|md5sum|awk '{print $1}')/$(basename $i); echo $DIR $i >>$FFILE; cp -a $i $DIR; done

cp $DFILE $WDIR/$TDIR
cp $FFILE $WDIR/$TDIR
cp $WDIR/$MAPFILE $WDIR/$TDIR
cp $TDIRE/* $LDIR


2 Responses to Backup of a personal Ecryptfs home directory

  1. Leif says:

    I think ecryptfs uses the same inode number for the upper and lower files, so instead of mapping the names the way you do you could run find to get all the inodes from the upper and lower files and correlate them. It would probably be faster and wouldn't need to write to the disk.

    Also, see the utility ecryptfs-find which will do that for one file.

  2. blogadmin says:

    It does. I have actually found some posts on ecryptfs-find after writing this. Mapping file names through inodes wouldn’t work if the file is hardlinked with more than one filename, but this is something you hardly find in a home directory.
    So now I have this code that I’m using for mapping by hand. I’ll try to update the script sooner or later 😉 as well as write a “recover” script. Thanks 🙂
    The original file is ecryptfs-find by Dustin Kirkland and Sergio Mena de la Cruz, GPL2, so this code is GPL2 too 😉 The original code looks for the file in the whole system, I removed that part and just look in the home directory instead:


    #!/bin/sh -e
    if [ ! -e "$1" ]; then
        echo "usage: $0 /path/to/file" 1>&2
        exit 1
    fi
    # The upper (cleartext) and lower (encrypted) files share the same
    # inode number, so one utility maps names in both directions
    inum=$(ls -aid "$1" | awk '{print $1}')
    case "$1" in
        *ECRYPTFS_FNEK_ENCRYPTED.*)
            find /home/claudio -inum "$inum"
            ;;
        *)
            find /home/.ecryptfs/claudio/.Private -inum "$inum"
            ;;
    esac
