2012-08-28

PHP - File Synchronization Script

Edit

2013-01-24

The code was moved to GitHub: sergio-bobillier/php-file-sync

Also I added a new feature:

USE_CHECKSUM
It will cause the script to use a checksum to compare files. This way the script won't copy files that haven't changed even if their modification dates differ. (Can be slow in some cases).
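
To give an idea of what a checksum comparison involves, here is a minimal sketch. The function name filesAreIdentical is hypothetical and MD5 is only an assumption; the actual code in the repository may work differently.

```php
<?php
/**
 * Returns true when both files have the same contents, regardless of their
 * modification dates. Hypothetical helper, not taken from the actual script.
 */
function filesAreIdentical(string $pathA, string $pathB): bool
{
    // Make sure we do not work with stale cached file sizes.
    clearstatcache();

    // Files of different sizes can never match, so skip hashing them.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }

    // md5_file() reads the whole file, which is why this can be slow
    // for large files or large collections.
    return md5_file($pathA) === md5_file($pathB);
}
```

Checking the size first is a cheap way to avoid hashing files that obviously differ.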

From now on all changes will be posted to GitHub, so check the repository there to stay up to date.



Edit

2012-11-05

Some bug fixes.

  • When a whole directory is deleted, each file being deleted is now listed.
  • Fixed a bug where, when removing a whole directory, only the last file in the directory was listed instead of the directory itself.



In brief:


Synchronizing files between two or more computers has never been easier: you can now synchronize files between various locations using a cloud service like Dropbox as a hub. However, these services, although useful, have their limits. For me the biggest one is size: 5 GB is not enough for my music, so I decided to write a little script to keep my music in sync between my desktop and my laptop.

The script is very simple and well documented, so for those of you who just want to see the code, here it is:

PHP - File Synchronization Script on GitHub

How to use:


To use the script:

Note:
Please do not put the script in one of the folders you are going to sync, this will cause unexpected results. Put it somewhere else like in the parent folder or your home directory.

  1. Copy settings-sample.php as settings.php and edit it to fit your needs. The options you can tweak are:
    • debug mode: This mode causes the script to print every action it takes to the console, so you can see what it is doing or check what it is going to do before it actually does it.
    • simulate: This causes the script not to take any real actions; it won't copy, delete or overwrite any file.
  2. After you have adjusted all the settings, you can run the script by typing this command in your terminal:

    $ php -f sync-files.php
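
To illustrate how the debug mode and simulate options can work together, here is a rough sketch. The function copyFileGuarded and its two boolean parameters are hypothetical; the real option names are the ones in settings-sample.php.

```php
<?php
/**
 * Copies a file while honouring two hypothetical flags that mirror the
 * "debug mode" and "simulate" options.
 * Returns true only when a real copy took place.
 */
function copyFileGuarded(string $source, string $target, bool $debug, bool $simulate): bool
{
    if ($debug) {
        // Report the action first so it can be reviewed in the output.
        echo "COPY $source -> $target" . PHP_EOL;
    }

    if ($simulate) {
        // Simulation: report only, never touch the file system.
        return false;
    }

    return copy($source, $target);
}
```

With both flags set to true, a call only prints the planned copy and leaves the target untouched.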

Tip:
You can set the script to debug mode and simulate at the same time and send all its output to a file to check what the script is going to do before running it for real. I know it is hard to trust a script written by someone else, especially if you are entrusting it with your precious tunes. ;)

To do it use this command:

$ php -f sync-files.php > sync-result.txt

Then check the contents of sync-result.txt

When you run the script for the first time, it guarantees that after the sync is done both locations will have exactly the same files and folders. After the first synchronization the script saves a file called .last-sync, containing the timestamp of the last synchronization, in the path it was run from.

Once the script has run for the first time and the .last-sync file has been saved, it will keep the two paths synchronized in subsequent runs. This means the script will copy any new files you create in either path to the other one. If you delete a file in one of the locations, it will be deleted from the other. If you modify a file in one location, the most recent version will be copied to the other path. And of course, if you rename or move files, the script will carry these changes over to the other path.
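
Keeping track of the last synchronization can be sketched like this. The helper names readLastSync and writeLastSync are hypothetical, and storing a plain Unix timestamp as text is an assumption about the file format, not something taken from the script.

```php
<?php
/**
 * Reads the timestamp of the last synchronization from the .last-sync file
 * in $dir, or returns 0 if the file does not exist yet (first run).
 * Assumes the file holds a plain Unix timestamp.
 */
function readLastSync(string $dir): int
{
    $file = $dir . DIRECTORY_SEPARATOR . '.last-sync';

    if (!is_file($file)) {
        return 0; // Nothing has been synchronized yet.
    }

    return (int) trim(file_get_contents($file));
}

/**
 * Saves $timestamp as the time of the last synchronization.
 */
function writeLastSync(string $dir, int $timestamp): void
{
    file_put_contents($dir . DIRECTORY_SEPARATOR . '.last-sync', (string) $timestamp);
}
```

After a successful run you would call something like writeLastSync(getcwd(), time()).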

About the script


Here are some details about the script if you are interested:

What it does:


The script is very simple, what it does is basically this:

  1. It scans all files and folders in the first location. For each file it checks whether the same file exists in the other path. If the file does not exist then:
    • If the file was modified after the last synchronization, it copies the file to the other location (assuming it is a new file).
    • If it was not modified after the last synchronization, the script deletes it from the current path (assuming it was deleted from the other path).
  2. If the same file exists in both paths then:
    • If the file is a directory then the script recursively synchronizes the directories in both paths
    • If the file is a regular file then it is copied to the other path (overwriting the other file) if it is newer than the file in the other path
  3. Then it does all this the other way around
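
The decision made for each entry in the steps above can be sketched roughly as follows. The function decideAction and its return values are hypothetical, and the extra safety checks the real script performs are left out.

```php
<?php
/**
 * Decides what to do with one entry found in the path being scanned.
 * Hypothetical sketch of the basic algorithm only.
 *
 * @return string 'copy', 'delete', 'recurse' or 'skip'
 */
function decideAction(string $sourceFile, string $targetFile, int $lastSync): string
{
    if (!file_exists($targetFile)) {
        // Missing on the other side: modified after the last sync means it
        // is new (copy it); otherwise it was deleted over there (delete it).
        return filemtime($sourceFile) > $lastSync ? 'copy' : 'delete';
    }

    if (is_dir($sourceFile)) {
        // Exists on both sides and is a directory: synchronize its contents.
        return 'recurse';
    }

    // Regular file on both sides: the newer copy overwrites the older one.
    return filemtime($sourceFile) > filemtime($targetFile) ? 'copy' : 'skip';
}
```

Running the same decision with the two paths swapped covers step 3, the other way around.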

Of course the script performs some checks to avoid wildly overwriting and deleting files, but yes, this basic algorithm lets the script keep the two paths in sync. Note, for example, that if you rename a file the script will think the file with the old name was deleted and will delete it from the other path, and it will think the file with the new name is a new file and will copy it to the other path. That is how it handles renaming and moving.

You can tell, of course, that the script is very simple and that in some particular cases it won't be able to handle all modifications well and will cause data loss. For example, if the same file is changed in both locations, the one that was modified last will be kept and the other will be overwritten.
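
The script does not detect this situation, but a check for it could look something like the sketch below. The function isConflict is hypothetical and is not part of the script.

```php
<?php
/**
 * Returns true when both copies of a file were modified after the last
 * synchronization, meaning one of them would be silently overwritten.
 * Hypothetical check; the script itself does not do this.
 */
function isConflict(string $fileA, string $fileB, int $lastSync): bool
{
    // On the very first run there is no previous sync to compare against.
    if ($lastSync === 0) {
        return false;
    }

    return filemtime($fileA) > $lastSync && filemtime($fileB) > $lastSync;
}
```

A caller could use this to print a warning, or to skip the file, instead of overwriting the older copy.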

How the script came to be:


As I explained above, I have my music on both my desktop and laptop computers. At first I just copied the music from my desktop to my laptop; with a simple Linux command it is easy to copy only the files that are new or have been modified since the last copy was made.

The problem arises when you start adding, modifying or removing files in both locations. The cp command cannot handle that, so you need a more elaborate tool.

So I wrote the script and started using it to keep my music in sync between my two computers. Then I realized that the script might be useful for someone else or maybe for a different purpose like keeping files in sync between two servers running a web application or something like that.

Finally


The script is very primitive. Currently it performs just fine at keeping my music in sync, but I know it can be improved to better handle more complex modifications to files, like merging files that were modified in both locations or warning about conflicts between files.

Also, I haven't fully tested it in every particular case, so I can't guarantee that you won't lose a file or two, or some modifications you made to a file in one of the locations. That is why I added the simulate and debug modes.

Any comments and suggestions are welcome, feel free to leave yours in the comment section below. Also if you want to use the script in your own application please feel free to do it, don't worry about licenses or giving credit or anything like that, sharing the knowledge is the way of the future.