How do I eliminate duplicate lines from my "data" file?
Sep 25th, 2002 20:04
Brian Coogan, Jonathan de Boyne Pollard
This is really an exercise in simple text file processing with
Unix tools, and not something that is specific to "djbdns".
There are three popular answers:
1. sort -u
This is the most common answer to this question.
Unfortunately, this has the side effect of sorting the records
in the file as well, which may not be desirable if one wants
all of the records (of different types) for a particular domain
name to be grouped together.
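A short demonstration of that side effect, using a hypothetical sample of tinydns-style records (the domain names and addresses are illustrative, not from the original). The sample groups each domain's NS and A records together; "sort -u" removes the duplicate line but reorders everything by leading character, interleaving the two domains:

```shell
# Hypothetical sample: records for two domains, grouped by domain,
# with one duplicate line.
printf '%s\n' \
    '.example.com::a.ns.example.com' \
    '=www.example.com:192.0.2.1' \
    '=www.example.com:192.0.2.1' \
    '.example.org::a.ns.example.org' \
    '=www.example.org:192.0.2.2' > data.sample

# The duplicate is gone, but records are now sorted by record type
# ('.' before '='), so each domain's records are no longer adjacent.
LC_ALL=C sort -u data.sample
```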
2. Dan Bernstein's "cleanup" script
http://cr.yp.to/dnsroot/cleanup
This, too, has the side effect of sorting the records in the
file. Furthermore, it is unable to cope with the more unusual
record types, failing if it encounters them.
3. nawk '!x[$0]++'
It's simple. It doesn't sort the data. It's faster than the
"cleanup" script. It's one line long. And it works. (-:
The Perl equivalent for those without nawk is:
perl -ne 'print unless $seen{$_}++;'
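A sketch of using the one-liner on the "data" file itself (the filenames are illustrative; any POSIX awk should work, not just nawk). The array subscript x[$0] counts how many times each whole line has been seen: it is 0 (false) the first time, so "!x[$0]++" prints a line only on its first appearance and drops every repeat, preserving the original order:

```shell
# Create a hypothetical "data" file containing a duplicate record.
printf '%s\n' \
    '=www.example.com:192.0.2.1' \
    '@example.com:192.0.2.3:a' \
    '=www.example.com:192.0.2.1' > data

# Print each line only the first time it is seen, then replace the
# original file with the deduplicated copy.
awk '!x[$0]++' data > data.dedup && mv data.dedup data
cat data
```

Note that awk cannot safely write to the same file it is reading, hence the temporary file and the rename.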