clink 1.1.1 review

by rbytes.net on

License: GPL (GNU General Public License)
File size: 7K
Developer: Michael Opdenacker
0 stars award from rbytes.net

clink is a simple Python script that replaces duplicate files in Unix filesystems with symbolic links.

Here are some key features of "clink":
clink saves space. It works particularly well with automatically generated directory structures, such as compiled toolchains.
clink uses relative links, making it possible to move processed directory structures without breaking the links.
clink is fast. It reads each file only once and its runtime is mainly the time taken to read files.
clink is light. It consumes very little RAM, so running it on huge filesystems is no problem!
clink is easy to use. Just download the script and run it!
clink is free. It is released under the terms of the GNU General Public License.
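
The point about relative links can be illustrated with a minimal sketch. This is not clink's actual code: the function names `relative_link_target` and `replace_with_relative_link` are hypothetical, and the sketch assumes the kept file and the duplicate live in the same tree.

```python
import os


def relative_link_target(kept_path, link_path):
    """Return the target a symlink at link_path needs in order to reach kept_path."""
    # A symlink target is resolved relative to the directory that holds the link.
    link_dir = os.path.dirname(link_path) or "."
    return os.path.relpath(kept_path, start=link_dir)


def replace_with_relative_link(kept_path, duplicate_path):
    """Replace duplicate_path with a relative symlink pointing at kept_path."""
    target = relative_link_target(kept_path, duplicate_path)
    os.remove(duplicate_path)
    os.symlink(target, duplicate_path)
    return target
```

Because the stored target is something like `../a/f.txt` rather than an absolute path, the link keeps working when the whole tree is moved elsewhere.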

Usage:

usage: clink [options] [files or directories]

Compacts folders by replacing identical files by symbolic links

Options:

  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -d, --dry-run  just reports identical files, doesn't make any change.
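
For readers curious how such a command line might be declared, here is a sketch using Python's standard `argparse` module. It only mirrors the options listed above; it is not taken from clink's source, and the `build_parser` name and the `paths` destination are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="clink",
        usage="clink [options] [files or directories]",
        description="Compacts folders by replacing identical files by symbolic links",
    )
    # -h/--help is added automatically by argparse.
    parser.add_argument("--version", action="version", version="%(prog)s 1.1.1")
    parser.add_argument(
        "-d", "--dry-run", action="store_true",
        help="just reports identical files, doesn't make any change.",
    )
    # Hypothetical name for the positional arguments.
    parser.add_argument("paths", nargs="*", help="files or directories to process")
    return parser
```

Parsing `["-d", "build"]` with this parser yields `dry_run=True` and `paths=["build"]`, which a script could then use to decide whether to report duplicates only.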

How it works

clink reads the files one by one and computes the SHA (20 bytes) and MD5 (16 bytes) checksums of each. To find identical files easily, it keeps a dictionary of file lists indexed by SHA checksum.
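
The single-pass hashing and the dictionary index could look roughly like the sketch below. It assumes the 20-byte SHA is SHA-1, and the names `checksums`, `index_by_sha` and `CHUNK_SIZE` are hypothetical rather than taken from clink.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 64 * 1024  # assumed read size, not a value taken from clink


def checksums(path):
    """Read a file once and return its (SHA-1, MD5) digests as raw bytes."""
    sha = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Feed both hashes from the same chunk so the file is read only once.
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            sha.update(chunk)
            md5.update(chunk)
    return sha.digest(), md5.digest()


def index_by_sha(paths):
    """Map each SHA-1 digest to the list of (path, md5) entries that share it."""
    index = defaultdict(list)
    for path in paths:
        sha_digest, md5_digest = checksums(path)
        index[sha_digest].append((path, md5_digest))
    return index
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which fits the low-RAM claim made earlier.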

Files with the same SHA checksum are not immediately considered identical: their MD5 checksums and sizes are compared as well. The probability that files matching on all three criteria are actually different is extremely low. You are much more likely to face file corruption because of a hardware failure on your computer!
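
The three-criteria check can be sketched as follows. This is an illustration under assumptions, not clink's implementation: the SHA is taken to be SHA-1, the helper names are hypothetical, and each file is read whole for brevity.

```python
import hashlib
from collections import defaultdict


def fingerprint(path):
    """Return (sha1, md5, size) for a file."""
    with open(path, "rb") as fh:
        data = fh.read()  # whole-file read keeps the sketch short
    return hashlib.sha1(data).digest(), hashlib.md5(data).digest(), len(data)


def duplicate_groups(paths):
    """Group paths whose SHA-1, MD5 and size all match."""
    by_sha = defaultdict(list)
    for path in paths:
        sha, md5, size = fingerprint(path)
        by_sha[sha].append((md5, size, path))
    groups = []
    for candidates in by_sha.values():
        # Same SHA-1 is only a candidate match: confirm with MD5 and size.
        confirmed = defaultdict(list)
        for md5, size, path in candidates:
            confirmed[(md5, size)].append(path)
        groups.extend(g for g in confirmed.values() if len(g) > 1)
    return groups
```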

Hard links to the same contents are treated as regular files. Keeping one instance and replacing the others with symbolic links is harmless. Files replaced by symbolic links also have the advantage of not having their contents duplicated in tar archives.
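
The tar remark can be checked with Python's standard `tarfile` module, which by default stores a symbolic link as a link entry rather than following it. The sketch below archives a directory in memory and reports what each member occupies; the function name `archive_member_sizes` is hypothetical.

```python
import io
import os
import tarfile


def archive_member_sizes(directory):
    """Tar a directory in memory and report each member's kind and stored size."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.add(directory, arcname=".")  # symlinks are not followed by default
    buffer.seek(0)
    report = {}
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        for member in tar.getmembers():
            if member.issym():
                kind = "symlink"
            elif member.isfile():
                kind = "file"
            else:
                kind = "other"
            report[os.path.normpath(member.name)] = (kind, member.size)
    return report
```

For a directory holding a 1000-byte file and a symlink to it, the file is reported with size 1000 and the symlink with size 0, since only the link target name is stored.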

Limitations:
File permissions: clink keeps only one copy of each set of duplicate files. The permissions of this copy may be less strict than those of the other duplicates. If permissions matter, enforce them yourself after running clink.
Directory structure: even when entire directories are identical, clink only creates links between individual files. This is not fully optimal in this case, but it keeps clink simple.
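
One way to handle the permissions limitation yourself is to record the modes of the duplicates before running clink and afterwards apply the strictest combination to the copy that was kept. The sketch below is an assumption about how one might do this, not a clink feature: "strictest" is taken to mean the bitwise AND of the permission bits, and all function names are hypothetical.

```python
import os
import stat


def strictest_mode(modes):
    """Keep only the permission bits that every given mode grants."""
    modes = list(modes)
    if not modes:
        raise ValueError("at least one mode is required")
    result = stat.S_IMODE(modes[0])
    for mode in modes[1:]:
        result &= stat.S_IMODE(mode)
    return result


def record_modes(paths):
    """Capture the permission bits of each path before deduplication."""
    return {path: stat.S_IMODE(os.stat(path).st_mode) for path in paths}


def enforce_strictest(kept_path, recorded_modes):
    """Apply the strictest recorded mode to the single copy that was kept."""
    mode = strictest_mode(recorded_modes.values())
    os.chmod(kept_path, mode)
    return mode
```

For example, duplicates with modes 0644 and 0600 would leave the kept copy at 0600.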

What's New in This Release:
This bugfix release ignores non-regular files (such as device files or named pipes) instead of aborting in an ugly way.
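
Skipping non-regular files can be sketched with the standard `stat` module. This is an illustration, not the code from the release: the generator name `regular_files` is hypothetical, and skipping existing symbolic links (a consequence of using `lstat`) is an assumption of this sketch.

```python
import os
import stat


def regular_files(root):
    """Yield paths of regular files under root, skipping everything else."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so links, named pipes and
            # device files all fail the S_ISREG test and are ignored.
            if stat.S_ISREG(os.lstat(path).st_mode):
                yield path
```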
