A Simple File Deduplication Script

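I recently needed to deduplicate some files, so I wrote a small MD5-based script. Files whose contents have the same MD5 digest are treated as duplicates, and all but one copy is deleted.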
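Before deleting anything, it can help to see which files MD5 actually considers identical. Below is a minimal, read-only sketch. It assumes GNU coreutils (`md5sum`, `uniq -w`, `uniq --all-repeated`), and the function name `list_duplicates` is made up for this example. It prints each group of matching files and deletes nothing.

```shell
#!/bin/bash
# Read-only preview: list groups of files with identical MD5 digests, delete nothing.
# Assumes GNU coreutils; list_duplicates is a hypothetical helper name.
list_duplicates() {
    local dir=${1:-.}
    # 1. Hash each regular file directly inside $dir (-print0 / -0 keep odd names intact,
    #    -r stops xargs from running md5sum when there are no files).
    # 2. Sort so lines with the same hash become adjacent.
    # 3. Compare only the first 32 characters (the hash) and print every line of each
    #    repeated group, with a blank line between groups.
    find "$dir" -maxdepth 1 -type f -print0 |
        xargs -0 -r md5sum |
        LC_ALL=C sort |
        uniq -w 32 --all-repeated=separate
}
```

Usage: `list_duplicates /path/to/dir`. Files with a unique hash do not appear in the output at all.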

Script

The script, deduplicate.sh:

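#!/bin/bash
# Version 1: hash every file, then, for each hash that occurs more than once,
# rescan the directory, keep the first matching file and delete the rest.
WORK_DIR=/path/to/dir

file_num=0
del_num=0
OIFS=$IFS
IFS=$'\n'                         # split the ls output on newlines only (names may contain spaces)
cd "$WORK_DIR" || exit 1
: >"$WORK_DIR/.temp.txt"          # start from an empty hash list (ls hides dotfiles, so it is never hashed)
# Pass 1: record the MD5 of every regular file
while read -r line; do
    [[ -f $line ]] || continue
    md5sum "$line" | awk '{print $1}' >>"$WORK_DIR/.temp.txt"
    ((file_num = file_num + 1))
done <<<"$(ls "$WORK_DIR")"

# Pass 2: for each duplicated hash, keep the first file that has it and delete the others
while read -r line; do
    [[ -n $line ]] || continue
    if_del="false"
    while read -r name; do
        [[ -f $name ]] || continue
        md5_val=$(md5sum "$name" | awk '{print $1}')
        if [[ $md5_val == "$line" && $if_del == "false" ]]; then
            if_del="true"
        elif [[ $md5_val == "$line" && $if_del == "true" ]]; then
            rm -- "$name"
            ((del_num = del_num + 1))
        fi
    done <<<"$(ls "$WORK_DIR")"
done <<<"$(sort "$WORK_DIR/.temp.txt" | uniq -d)"
IFS=$OIFS
rm -f "$WORK_DIR/.temp.txt"
echo "Found $file_num files in the directory; removed $del_num duplicates."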

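Not very efficient. 😮‍💨 With only a few files it is tolerable, but with many files it is painfully slow: for every hash that appears more than once, the script rescans the whole directory and recomputes the MD5 of every file. I'll look for a better way to write it some other day. 🥱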
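One way to avoid the rescans, sketched here as an alternative and not taken from the original post, is to hash each file exactly once and remember the first path seen for every digest in a Bash associative array. This assumes Bash 4 or newer (associative arrays are unavailable in the Bash 3.2 that ships with macOS) and GNU `sort -z`. The helper name `dedup_dir` is hypothetical.

```shell
#!/bin/bash
# Single-pass sketch (hypothetical helper, not from the original post): hash each
# file once and remember the first path per digest in an associative array
# (needs Bash 4+). Every later file with an already-seen digest is deleted.
dedup_dir() {
    local dir=$1 path hash del_num=0
    local -A seen=()              # digest -> path of the copy being kept
    # Process substitution keeps the loop in this shell, so del_num is not lost;
    # sort -z makes the kept copy the alphabetically first path
    while IFS= read -r -d '' path; do
        hash=$(md5sum <"$path")
        hash=${hash%% *}          # md5sum on stdin prints "<hash>  -"; keep only the hash
        if [[ -n ${seen[$hash]} ]]; then
            rm -- "$path"
            del_num=$((del_num + 1))
        else
            seen[$hash]=$path
        fi
    done < <(find "$dir" -maxdepth 1 -type f -print0 | sort -z)
    echo "Removed $del_num duplicate file(s) from $dir"
}
```

Usage: `dedup_dir /path/to/dir`. It still starts one `md5sum` process per file, but the total work now grows linearly with the number of files instead of being repeated for every duplicated hash.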

Update, 2022.5.7

I found a version written by a far more experienced scripter and adapted it slightly:

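#!/bin/bash
# Version 2: hash every file exactly once, sort by hash, and let uniq/comm
# pick out the extra copies. Requires GNU coreutils (uniq -w).
WORK_DIR=/path/to/dir
del_num=0

cd "$WORK_DIR" || exit 1
# Keep the temporary lists outside WORK_DIR so find cannot hash them
all_list=$(mktemp)
uniq_list=$(mktemp)
find . -maxdepth 1 -type f -print0 | xargs -0 -r md5sum | sort >"$all_list"
# Compare only the first 32 characters (the hash) and keep the first line of each group
uniq -w 32 "$all_list" >"$uniq_list"
# Lines only in all_list are the redundant copies; "hash" + 2 spaces = 34 characters,
# so the path starts at column 35 (assumes names without newlines or backslashes)
while IFS= read -r line; do
    if [[ -n $line ]]; then
        rm -- "$line"
        ((del_num = del_num + 1))
    fi
done <<<"$(comm -23 "$all_list" "$uniq_list" | cut -c 35-)"
rm -f "$all_list" "$uniq_list"
echo "The directory now holds $(ls -l | grep "^-" | wc -l) files; removed $del_num duplicates."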

Much better: each file is hashed exactly once, and finding the duplicates is left to sort, uniq and comm.
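To check what the second script would remove before running it for real, its pipeline can be wrapped as a dry run that only prints paths. This is a sketch under stated assumptions: GNU coreutils, and a made-up helper name `redundant_files`. It also pins `sort` and `comm` to byte order with `LC_ALL=C` so the two tools are guaranteed to agree on the sort order.

```shell
#!/bin/bash
# Dry run of the second script's pipeline (redundant_files is a made-up name):
# print the paths it would delete without touching them. Assumes GNU coreutils.
# An md5sum line is "<32 hex chars><2 spaces><path>": 32 + 2 = 34, so the path
# starts at column 35, which is what cut -c 35- relies on.
redundant_files() {
    local dir=$1 all_list uniq_list
    all_list=$(mktemp)            # temp lists live outside $dir
    uniq_list=$(mktemp)
    # Same steps as the script: hash, sort in byte order, keep the first line per hash
    (cd "$dir" && find . -maxdepth 1 -type f -print0 | xargs -0 -r md5sum) |
        LC_ALL=C sort >"$all_list"
    uniq -w 32 "$all_list" >"$uniq_list"
    # comm -23 keeps only lines that appear in all_list but not in uniq_list: the extra copies
    LC_ALL=C comm -23 "$all_list" "$uniq_list" | cut -c 35-
    rm -f "$all_list" "$uniq_list"
}
```

Usage: `redundant_files /path/to/dir` prints one `./name` per line. Within each group of identical files, the path that sorts first is the one kept.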

OVER

Updated: 2025-01-02