Dupseek is an interactive command-line Perl program that finds and removes duplicate files.
A few strategies are possible for finding duplicate files in a big set, such as a heavily populated directory.
One of the most widely used consists of grouping files by size (because files of different sizes can't be identical) and then computing a short digital fingerprint (such as an MD5 checksum) for each file.
Files with different fingerprints are different, and files with the same fingerprint are very probably the same. To be sure, one can then compare the possible duplicates directly.
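The size-then-fingerprint strategy described above can be sketched in a few lines of Python. This is only an illustration of the general approach, not Dupseek's own code (Dupseek is written in Perl); the function names md5_of and find_duplicate_candidates and the 64 KiB block size are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, block_size=1 << 16):
    """Return the MD5 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicate_candidates(root):
    """Group regular files under root by size, then by MD5 fingerprint."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[md5_of(path)].append(path)
        # Keep only fingerprints shared by at least two files.
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group lists files that share both size and fingerprint. Note that this approach reads every file in a same-size group from start to end, which is costly for large files.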
Here are some key features of "Dupseek":
Dupseek's algorithm is much more efficient than that of competing tools when dealing with large files of the same size. When files differ, reading usually stops after very few reads.
Dupseek (and destroy) can be interrupted at any moment. The user is then presented with partial results and can either intervene manually or go on with the reading and computation, on a group-by-group basis. Since subsequent reads happen sparsely in the file, if some files are still in the same group after many iterations, they are most probably identical, unless the differences are very small.
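The idea of splitting a group of same-size files through a few sparse reads can be sketched as follows. This is a hedged illustration in Python, not Dupseek's actual implementation: the helper names refine_group and sparse_offsets, the 4096-byte chunk size, and the evenly spaced offset schedule are all assumptions chosen for the example.

```python
from collections import defaultdict


def refine_group(paths, offsets, chunk_size=4096):
    """Split same-size files into subgroups that agree on every sampled chunk."""
    groups = [list(paths)]
    for offset in offsets:
        next_groups = []
        for group in groups:
            by_chunk = defaultdict(list)
            for path in group:
                with open(path, "rb") as handle:
                    handle.seek(offset)
                    by_chunk[handle.read(chunk_size)].append(path)
            # A file alone in its subgroup has no possible duplicate left.
            next_groups.extend(g for g in by_chunk.values() if len(g) > 1)
        groups = next_groups
        if not groups:
            break  # every file turned out to be unique: stop reading early
    return groups


def sparse_offsets(size, samples=8):
    """Return evenly spread sample offsets (a hypothetical schedule)."""
    step = max(size // samples, 1)
    return list(range(0, size, step))
```

Files that differ are usually separated by the first chunk or two, so most of their content is never read. Files that survive every sampled offset are only probably identical: a difference that falls between two sampled chunks goes unnoticed, which is why a final full comparison is still useful before deleting anything.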
Download Dupseek 1.3