DupeFinder 1.0.2 review

by rbytes.net

License: GPL (GNU General Public License)
File size: 22K
Developer: Matt Heinzen
0 stars award from rbytes.net

DupeFinder is a simple application for locating, renaming, moving and deleting duplicate files in a directory structure.

It's perfect both for users who haven't kept their hard drives very well organized and need to do some cleaning to free space, and for users who like to keep lots of backup copies of important data "just in case" something bad should happen.

Here are some key features of "DupeFinder":
Although DupeFinder is quite a small application, it should have all of the features you need to clean up and reorganize large directories full of duplicate files:

Well-designed graphical interface with full tooltip and "What's This?" question button support, useful in an application that you probably won't need to use often
Quick processing: file extension filtering skips analysis of unwanted data
View files in external applications by double-clicking
Rename files in place or move to new locations
Default settings disallow deletion of all copies of duplicate files to prevent accidental data loss
Generate simple reports identifying groups of duplicate files for later processing

While everything works pretty well in most cases, there are a few issues with DupeFinder to be aware of. I hope to fix most of the following bugs sometime soon:

May crash if files containing "~" or ":" characters are encountered
May crash if self-referencing symlinks are encountered
Zero byte files cannot be deleted
May not be able to delete files with Unicode characters in filename
Display does not update if identified duplicates are moved, renamed or modified external to DupeFinder

DupeFinder is built on two primary tools: the Python language and the Qt application toolkit. A Python interpreter and the Qt libraries are included in most desktop Linux, BSD and UNIX distributions. Mac OS X (at least the newer versions) includes Python, and Qt is also available for free, though it is not part of a standard install.

Qt is primarily a C++ toolkit, so the PyQt bindings for Python are also required. These are not installed by default on many Linux and similar distributions, though they are available for all of the systems mentioned.

Finally, the md5sum utility must be available. This utility is standard on Linux and similar systems, though I've read that on Mac OS X it goes by the name md5 instead. I have not confirmed this, but if so, simply change the single occurrence of md5sum in FindDupFiles.py to md5 to run the app on a Mac. Later versions of DupeFinder may use built-in code to calculate MD5 sums to eliminate this requirement.
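
As an illustration of what "built-in code" for MD5 sums could look like, here is a minimal sketch using Python's standard hashlib module. The helper name file_md5 and the chunk size are assumptions made for this example; this is not DupeFinder's actual code.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration; not part of DupeFinder.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing inside the interpreter like this would avoid depending on an external md5sum or md5 binary, and it also handles zero-byte files (the digest of empty input is well defined).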

Running DupeFinder on Windows should be possible but probably isn't worth the effort, unless most of the components are already in place for other applications. Qt and PyQt for Windows are only available with a commercial license (this will change when Qt 4 is released). Python is a separate install. An md5sum utility is needed (one does appear to be available from ActiveState). Alternatively it is probably possible to satisfy all of the dependencies through X11 on Cygwin.

One more thing: although DupeFinder is intended to be run graphically and interactively, the FindDupFiles.py script can be run standalone from the console. It takes a root search directory followed by any number of file extension filters as command line arguments and outputs the identified duplicate file groups (in no particular order) to STDOUT. This output can be piped to a pager such as less for immediate inspection or redirected straight to a text file using the ">" shell operator (on UNIX-like systems) for logging/reporting.
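
The console behaviour described above (a root directory, optional extension filters, duplicate groups written to STDOUT) can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of FindDupFiles.py: the function names find_duplicates and main, the blank-line group separator, and the use of hashlib instead of the md5sum utility are all choices made for this example.

```python
import hashlib
import os
import sys
from collections import defaultdict


def find_duplicates(root, extensions=()):
    """Group files under `root` by MD5 digest; return groups of two or more.

    Hypothetical sketch; `extensions` such as [".txt"] limits the search.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Extension filtering: skip files that do not match any filter.
            if wanted and not name.lower().endswith(wanted):
                continue
            path = os.path.join(dirpath, name)
            # Only regular files are hashed; symbolic links are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            groups[digest].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def main(argv):
    """Print each duplicate group to STDOUT, separated by a blank line."""
    if len(argv) < 2:
        print("usage: SCRIPT ROOT [EXTENSION ...]", file=sys.stderr)
        return 2
    for group in find_duplicates(argv[1], argv[2:]):
        print("\n".join(group))
        print()
    return 0
```

Calling main(sys.argv) from a script entry point would give output that can be piped to a pager or redirected to a file, as described above. For brevity the sketch reads each file whole; a real tool would likely hash in chunks.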

What's New in This Release:
A bug that prevented the file dialog from opening with newer versions of Qt was fixed.
This bug prevented selecting search directories, renaming files, and saving reports.
Some bugs relating to symbolic links were also fixed.
For safety reasons, symbolic links are ignored. (If symbolic links were included in search results along with normal files, it is possible that a user could delete the only true instance of a file while leaving only a symbolic link.)
Broken symbolic links (including self-referential symlinks), which previously caused the duplicate search to fail, are now ignored.
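
The symlink rule in these release notes can be illustrated with a short check using Python's standard os.path functions. The helper name is_safe_candidate is hypothetical, and this is only a sketch of the idea, not the code shipped in this release.

```python
import os


def is_safe_candidate(path):
    """Return True only for regular files that are not symbolic links.

    Hypothetical helper for illustration; not part of DupeFinder.
    """
    # islink() examines the link itself, so it is True even when the
    # target is missing or the link points back at itself.
    if os.path.islink(path):
        return False
    # isfile() rejects directories and other non-regular entries.
    return os.path.isfile(path)
```

Testing for a link before anything else means broken and self-referential links are rejected without ever being followed, which is the failure case the notes above describe.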
