Dupseek 1.3


Developer:   Antonio Bellezza
Price:  0.00
License:   GPL (GNU General Public License)
File size:   13K


Dupseek is an interactive command-line Perl program to find and remove duplicate files.

A few strategies are possible for finding duplicate files in a big set, such as a heavily populated directory.

One of the most widely used consists of grouping files by size (because files of different sizes can't be identical) and then computing a short digital fingerprint (such as an MD5 checksum) for each file.

Files with different fingerprints are different, and files with the same fingerprint are very probably identical. To be certain, one can compare the candidate duplicates byte by byte.
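
The size-then-fingerprint strategy described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Dupseek's code: the function names (group_by_size, md5_of, candidate_duplicates) and the 64 KiB read block are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(paths):
    """Map each file size to the list of paths that have that size."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return by_size


def md5_of(path, block_size=65536):
    """Return the MD5 hex digest of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def candidate_duplicates(paths):
    """Return sorted lists of paths that share both size and fingerprint."""
    groups = []
    for same_size in group_by_size(paths).values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Note that every file sharing its size with another file is read from start to end here, even when two files already differ in their first bytes.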

Dupseek uses a different strategy:
  • It starts by grouping files by size.
  • It then reads small chunks from the files of the same size and compares them, splitting each group into smaller groups according to the result.
  • It goes on with bigger and bigger chunks (up to a hard-coded size limit).
  • It stops reading from a file as soon as the file forms a single-element group or has been read completely (which only happens when it very probably has duplicates).

This algorithm is much more efficient than whole-file fingerprinting when dealing with large files of the same size, because files that differ are usually told apart after very few reads.
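
The chunk-by-chunk grouping can be illustrated with the following Python sketch. It is a simplified approximation under stated assumptions, not Dupseek's implementation: the names refine and find_duplicates, the starting chunk size, the doubling schedule and the 1 MiB limit are all invented for the example, and the sketch reads each file sequentially rather than sparsely.

```python
import os
from collections import defaultdict

FIRST_CHUNK = 512        # assumed starting chunk size, in bytes
MAX_CHUNK = 1024 * 1024  # assumed upper limit on the chunk size


def refine(paths, size):
    """Split same-size files into groups that match chunk after chunk."""
    # One open handle per file; a very large group could hit the
    # operating system's limit on open files.
    handles = {path: open(path, "rb") for path in paths}
    try:
        pending = [list(paths)]
        chunk, offset = FIRST_CHUNK, 0
        while pending and offset < size:
            next_pending = []
            for group in pending:
                by_chunk = defaultdict(list)
                for path in group:
                    by_chunk[handles[path].read(chunk)].append(path)
                # Single-element groups are dropped here, so those
                # files are never read again.
                next_pending.extend(
                    sub for sub in by_chunk.values() if len(sub) > 1
                )
            pending = next_pending
            offset += chunk
            chunk = min(chunk * 2, MAX_CHUNK)  # bigger and bigger chunks
        return [sorted(group) for group in pending]
    finally:
        for handle in handles.values():
            handle.close()


def find_duplicates(paths):
    """Group files by size, then refine each group by comparing chunks."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    result = []
    for size, group in by_size.items():
        if len(group) > 1:
            result.extend(refine(group, size))
    return result
```

In this sketch a file that differs from all others in its first 512 bytes costs a single read, whereas the fingerprint approach would have read it in full.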

Dupseek (and destroy) can be interrupted at any moment. The user is then presented with partial results and can either intervene manually or continue the reading and comparison, group by group. Because subsequent reads are spread sparsely across the file, files that are still in the same group after many iterations are most probably identical, unless the differences are very small.

Requirements:
  • File::Find (directory recursion)
  • IO::File (object-oriented file handles)
  • Getopt::Std (option parsing)

Download Dupseek 1.3:
http://www.beautylabs.net/software/dupseek-1.3.tgz

