dedup
Revision as of 08:48, 27 March 2012
A simple utility script for de-duplicating data pools such as your pictures or documents. The original files are never deleted; instead, all unique data is copied to a separate directory so the originals can then be deleted, backed up, or handled however you like. Like our other bash-script-based software, this project relies on our clAPI framework for various functionality, so be sure this dependency is satisfied before using it. Note also that each OPTION must follow its respective ACTION (or parent script); the correct placement can be determined from the --help output. It may also help to read over the basics of clAPI to get a better understanding of running software from the command line.
Terms
This project's codebase is licensed under the AGPLv3 unless a valid CPL has been purchased. More information about both licenses can be found under the "Our Licenses" link on our homepage.
ACTIONs
In addition to our standard 'help', 'version', and 'update' ACTIONs, this project contains two others: 'install' and 'show'. The 'install' ACTION simply installs the script in the "~/.bin" directory for XiniX and in "/usr/bin" for typical GNU/Linux distros. To see how easy it is to install, see the examples section.
The other ACTION, 'show', performs most of the desired work. It's important to note that the returned information differs based on the value provided for the --target OPTION. For simplicity and ease of use, if --target contains just a server name, IP address, or FQDN (e.g. --target=servername.mydomain.local, --target="192.168.0.10", etc.), the results will show all of the shares currently being offered by that server. However, if the --target value includes a share name and optional path (e.g. --target='\\servername\share', --target='\\192.168.0.10\share\dir\path'), lssmb will display the entire contents of that directory (which can be refined further with the --match OPTION). Notice that when a share and optional path were provided in the examples, each value was enclosed in single quotes and the server was preceded by double back-slashes, as this is the common syntax for network interaction within a Microsoft Windows OS environment. If the network share value isn't preceded by double back-slashes, an error will occur.
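The single quotes matter because of the shell, not the script: an unquoted back-slash escapes the character after it, so the double back-slashes would be collapsed before the script ever receives them. A minimal sketch of the effect follows; the show_arg helper is hypothetical and only prints the argument it was given.

```shell
# Hypothetical helper (not part of the project): prints its first
# argument exactly as the shell delivered it.
show_arg() {
    printf '%s\n' "$1"
}

# Single quotes keep every back-slash literal.
show_arg '\\servername\share'    # prints \\servername\share

# Unquoted, the shell removes one back-slash from each escape pair.
show_arg \\servername\share      # prints \servernameshare
```

In the second call the share separator is lost entirely, which is why the quoted form is the safe one to use on the command line.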
OPTIONs
To help the local device communicate its access credentials to the --target server, the script provides three additional OPTIONs to specify these values: --targetUser, --targetPass, and --targetDom. Each is case-sensitive and provides the username, password, and optional domain/workgroup name, respectively, used to authenticate to the server.
The final two OPTIONs control what the output contains. The --details OPTION provides more information on the results and does NOT take any value. The --match OPTION, on the other hand, takes a word or phrase as its value and returns any matching file or directory names contained within the --target.
Examples
Installation is a simple 2-step process...
$ cd /path/to/uncompressed/package
$ ./dedup install
Showing the contents of the data pools prior to a 'dedup' run...
/tmp/data $ ls -alR
.:
total 36
drwxrwxr-x 4 dave users 4096 2012-03-27 11:00 .
drwxrwxrwt 18 dave users 20480 2012-03-27 11:37 ..
drwxrwxr-x 4 dave users 4096 2012-03-27 10:59 dataset1
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 dataset2

./dataset1:
total 140268
drwxrwxr-x 4 dave users 4096 2012-03-27 10:59 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:00 ..
drwxrwxr-x 3 dave users 4096 2012-03-13 16:43 a
-rw-r--r-- 1 dave users 143614369 2008-11-07 18:05 flash.tar.gz
drwxrwxr-x 2 dave users 4096 2012-03-13 16:03 original

./dataset1/a:
total 12
drwxrwxr-x 3 dave users 4096 2012-03-13 16:43 .
drwxrwxr-x 4 dave users 4096 2012-03-27 10:59 ..
drwxrwxr-x 2 dave users 4096 2012-03-27 10:36 b

./dataset1/a/b:
total 16
drwxrwxr-x 2 dave users 4096 2012-03-27 10:36 .
drwxrwxr-x 3 dave users 4096 2012-03-13 16:43 ..
-rwxr-xr-x 1 dave users 642 2010-07-22 11:45 test.sh
-rwxrwxr-x 1 dave users 517 2009-02-17 09:05 test.txt

./dataset1/original:
total 500744
drwxrwxr-x 2 dave users 4096 2012-03-13 16:03 .
drwxrwxr-x 4 dave users 4096 2012-03-27 10:59 ..
-rw-r--r-- 1 dave users 512753664 2011-05-10 09:37 flash.img

./dataset2:
total 140272
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:00 ..
-rw-rw-r-- 1 dave users 10 2012-03-27 11:39 a_new_one.txt
-rw-r--r-- 1 dave users 143614369 2008-11-07 18:05 flash.tar.gz
-rwxrwxr-x 1 dave users 517 2009-02-17 09:05 test.txt
Executing a de-duplication run...
$ dedup --noprompts sort --source=/tmp/data --target=/tmp/dedup
Beginning the de-duplication process @ Tue Mar 27 11:39:34 EDT 2012
Checking system environment...
(i) Directories...
Temp: [checking] [exists] [writable] [success] [done]
(i) Variables: [done]
Beginning the 'sort' module...
Entering "/tmp/data"...
Entering "/tmp/data/dataset1"...
Processing "flash.tar.gz": [unique] [checking] [creating] [success] [copying] [success] [done]
Entering "/tmp/data/dataset1/a"...
Entering "/tmp/data/dataset1/a/b"...
Processing "test.sh": [unique] [checking] [creating] [success] [copying] [success] [done]
Processing "test.txt": [unique] [copying] [success] [done]
** Finished, returning to "/tmp/data/dataset1/a".
** Finished, returning to "/tmp/data/dataset1".
Entering "/tmp/data/dataset1/original"...
Processing "flash.img": [unique] [checking] [creating] [success] [copying] [success] [done]
** Finished, returning to "/tmp/data/dataset1".
** Finished, returning to "/tmp/data".
Entering "/tmp/data/dataset2"...
Processing "a_new_one.txt": [unique] [checking] [creating] [success] [copying] [success] [done]
Processing "flash.tar.gz": [duplicate]
Processing "test.txt": [duplicate]
** Finished, returning to "/tmp/data".
** Finished, returning to "/tmp".
Calling exit routines for the modules...
(i) dedup script...
Cleanup: [deleting] [success] [deleting] [success] [done]
The job has completed successfully @ Tue Mar 27 11:39:59 EDT 2012
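The run above labels each file as [unique] or [duplicate] and copies only the unique ones. How dedup makes that decision is not described on this page, so the following is only a minimal sketch of one common approach (comparing SHA-256 checksums), not the script's actual implementation. The function name dedup_sketch, the use of sha256sum (GNU coreutils), and the simplified output are all assumptions made for illustration; file names containing newlines are not handled.

```shell
# Sketch only: copy the first occurrence of each distinct file content
# from a source tree into a target tree, keeping relative paths.
# The body runs in a subshell so its variables stay private.
dedup_sketch() (
    src=$1
    dst=$2
    seen=$(mktemp) || exit 1    # digests already copied, one per line

    find "$src" -type f | sort | while IFS= read -r file; do
        # Hash via stdin so the output is always "<digest>  -".
        sum=$(sha256sum < "$file") || exit 1
        sum=${sum%% *}          # keep only the hex digest
        rel=${file#"$src"/}     # path relative to the source root

        if grep -qxF -- "$sum" "$seen"; then
            printf 'Processing "%s": [duplicate]\n' "$rel"
        else
            printf '%s\n' "$sum" >> "$seen"
            mkdir -p -- "$dst/$(dirname -- "$rel")" || exit 1
            cp -- "$file" "$dst/$rel" || exit 1
            printf 'Processing "%s": [unique]\n' "$rel"
        fi
    done
    status=$?

    rm -f -- "$seen"
    exit "$status"
)
```

Called as dedup_sketch /tmp/data /tmp/dedup, this visits files in sorted path order, so the copy under dataset1 is treated as the unique one and a later identical file under dataset2 is reported as a duplicate, which mirrors the result shown above.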
Showing the contents of the de-duplicated data...
/tmp/dedup $ ls -alR
.:
total 44
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 .
drwxrwxrwt 18 dave users 20480 2012-03-27 11:45 ..
-rw-rw-r-- 1 dave users 224 2012-03-27 11:39 20120327113934.db
-rw-rw-r-- 1 dave users 1439 2012-03-27 11:39 20120327113934.log
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 dataset1
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 dataset2

./dataset1:
total 140272
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 ..
drwxrwxr-x 3 dave users 4096 2012-03-27 11:39 a
-rw-r--r-- 1 dave users 143614369 2012-03-27 11:39 flash.tar.gz
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 original

./dataset1/a:
total 12
drwxrwxr-x 3 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 ..
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 b

./dataset1/a/b:
total 16
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 3 dave users 4096 2012-03-27 11:39 ..
-rwxr-xr-x 1 dave users 642 2012-03-27 11:39 test.sh
-rwxrwxr-x 1 dave users 517 2012-03-27 11:39 test.txt

./dataset1/original:
total 500748
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 ..
-rw-r--r-- 1 dave users 512753664 2012-03-27 11:39 flash.img

./dataset2:
total 12
drwxrwxr-x 2 dave users 4096 2012-03-27 11:39 .
drwxrwxr-x 4 dave users 4096 2012-03-27 11:39 ..
-rw-rw-r-- 1 dave users 10 2012-03-27 11:39 a_new_one.txt
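Before deleting any originals, it may be worth confirming that the target really holds no repeated content. The following is a hedged sketch of such a check and is not part of dedup itself; the function name duplicate_digests and the reliance on sha256sum (GNU coreutils) are assumptions. It hashes every regular file under a directory, including the .db and .log files shown above, and prints each digest that occurs more than once.

```shell
# Sketch only: print every SHA-256 digest that appears more than once
# among the regular files below the given directory.
duplicate_digests() {
    # Hash each file via stdin so every output line is "<digest>  -",
    # then keep the digest column and report repeated values.
    find "$1" -type f -exec sh -c 'for f; do sha256sum < "$f"; done' sh {} + \
        | cut -d' ' -f1 \
        | sort \
        | uniq -d
}
```

Running duplicate_digests /tmp/dedup and getting no output would suggest that every file in the target has distinct content; any digest it prints identifies content stored more than once.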
Developers
Dave Henderson [dhenderson (at) cliquesoft (dot) org]