<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Find duplicate files by content not name</title>
	<atom:link href="http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/</link>
	<description>Use a pencil, let's not build another space pen.</description>
	<pubDate>Fri, 21 Nov 2008 07:53:18 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: T</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1914</link>
		<dc:creator>T</dc:creator>
		<pubDate>Tue, 05 Aug 2008 01:59:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1914</guid>
		<description>It has to be the coolest command ever!
Why didn't I think of it?

Two similar files will have the same md5, won't they?</description>
		<content:encoded><![CDATA[<p>It has to be the coolest command ever!<br />
Why didn&#8217;t I think of it?</p>
<p>Two similar files will have the same md5, won&#8217;t they?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jakub Szypulka</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1661</link>
		<dc:creator>Jakub Szypulka</dc:creator>
		<pubDate>Fri, 04 Jul 2008 15:58:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1661</guid>
		<description>Would be cool to have a GUI for that, or even integrated in the file browser.</description>
		<content:encoded><![CDATA[<p>Would be cool to have a GUI for that, or even integrated in the file browser.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: IAnjo</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1641</link>
		<dc:creator>IAnjo</dc:creator>
		<pubDate>Thu, 03 Jul 2008 19:44:08 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1641</guid>
		<description>Yeah, fdupes works very well too!</description>
		<content:encoded><![CDATA[<p>Yeah, fdupes works very well too!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ted Hanney</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1635</link>
		<dc:creator>Ted Hanney</dc:creator>
		<pubDate>Thu, 03 Jul 2008 17:26:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1635</guid>
		<description>A C program that achieves the same thing:
http://en.wikipedia.org/wiki/Fdupes</description>
		<content:encoded><![CDATA[<p>A C program that achieves the same thing:<br />
<a href="http://en.wikipedia.org/wiki/Fdupes" rel="nofollow">http://en.wikipedia.org/wiki/Fdupes</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam Merrell</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1557</link>
		<dc:creator>Sam Merrell</dc:creator>
		<pubDate>Wed, 02 Jul 2008 04:25:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1557</guid>
		<description>If I remember right, xargs is easier on memory usage as well.</description>
		<content:encoded><![CDATA[<p>If I remember right, xargs is easier on memory usage as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1542</link>
		<dc:creator>John</dc:creator>
		<pubDate>Tue, 01 Jul 2008 12:38:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1542</guid>
		<description>We use Duplicate Finder from Ashisoft to find and remove duplicate files.

You can find the free trial version at: http://www.ashisoft.com</description>
		<content:encoded><![CDATA[<p>We use Duplicate Finder from Ashisoft to find and remove duplicate files.</p>
<p>You can find the free trial version at: <a href="http://www.ashisoft.com" rel="nofollow">http://www.ashisoft.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas Schneider</title>
		<link>http://blog.sontek.net/2008/06/30/find-duplicate-files-by-content-not-name/#comment-1538</link>
		<dc:creator>Andreas Schneider</dc:creator>
		<pubDate>Tue, 01 Jul 2008 08:56:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.sontek.net/?p=38#comment-1538</guid>
		<description>Normally you want to use xargs instead of -exec. With -exec ... \; find starts a new md5sum process for every file found. With xargs, md5sum is started only once per batch of file names, which for a few hundred files usually means a single call.

Tested with cold caches on ~200 files:

time find . -type f -exec md5sum '{}' \; &#124; sort &#124; awk 'dup[$1]++{print $2}'
...

real    0m6.567s
user    0m0.636s
sys     0m0.604s

---

time find . -type f -print0 &#124; xargs -0 md5sum &#124; sort &#124; awk 'dup[$1]++{print $2}'
...

real    0m5.454s
user    0m0.620s
sys     0m0.272s</description>
		<content:encoded><![CDATA[<p>Normally you want to use xargs instead of -exec. With -exec ... \; find starts a new md5sum process for every file found. With xargs, md5sum is started only once per batch of file names, which for a few hundred files usually means a single call.</p>
<p>Tested with cold caches on ~200 files:</p>
<p>time find . -type f -exec md5sum '{}' \; | sort | awk 'dup[$1]++{print $2}'<br />
&#8230;</p>
<p>real    0m6.567s<br />
user    0m0.636s<br />
sys     0m0.604s</p>
<p>&#8212;</p>
<p>time find . -type f -print0 | xargs -0 md5sum | sort | awk 'dup[$1]++{print $2}'<br />
&#8230;</p>
<p>real    0m5.454s<br />
user    0m0.620s<br />
sys     0m0.272s</p>
]]></content:encoded>
	</item>
</channel>
</rss>
