How do I eliminate duplicate lines from my "data" file?
Sep 25th, 2002 20:04
Brian Coogan, Jonathan de Boyne Pollard
This is really an exercise in simple text file processing with
Unix tools, and not something that is specific to "djbdns".
There are three popular answers:
1. sort -u
This is the most common answer to this question.
Unfortunately, this has the side effect of sorting the records
in the file as well, which may not be desirable if one wants
all of the records (of different types) for a particular domain
name to be grouped together.
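A short demonstration of that side effect, using a hypothetical sample of tinydns-style records (the domain names and addresses are illustrative, not from the original). The sample groups each domain's NS and A records together; "sort -u" removes the duplicate line but reorders everything by leading character, interleaving the two domains:

```shell
# Hypothetical sample: records for two domains, grouped by domain,
# with one duplicate line.
printf '%s\n' \
    '.example.com::a.ns.example.com' \
    '=www.example.com:192.0.2.1' \
    '=www.example.com:192.0.2.1' \
    '.example.org::a.ns.example.org' \
    '=www.example.org:192.0.2.2' > data.sample

# The duplicate is gone, but records are now sorted by record type
# ('.' before '='), so each domain's records are no longer adjacent.
LC_ALL=C sort -u data.sample
```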
2. Dan Bernstein's "cleanup" script
http://cr.yp.to/dnsroot/cleanup
This, too, has the side effect of sorting the records in the
file. Furthermore, it is unable to cope with the more unusual
record types, failing if it encounters them.
3. nawk '!x[$0]++'
It's simple. It doesn't sort the data. It's faster than the
"cleanup" script. It's one line long. And it works. (-:
The Perl equivalent for those without nawk is:
perl -ne 'print unless $seen{$_}++;'
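A sketch of using the one-liner on the "data" file itself (the filenames are illustrative; any POSIX awk should work, not just nawk). The array subscript x[$0] counts how many times each whole line has been seen: it is 0 (false) the first time, so "!x[$0]++" prints a line only on its first appearance and drops every repeat, preserving the original order:

```shell
# Create a hypothetical "data" file containing a duplicate record.
printf '%s\n' \
    '=www.example.com:192.0.2.1' \
    '@example.com:192.0.2.3:a' \
    '=www.example.com:192.0.2.1' > data

# Print each line only the first time it is seen, then replace the
# original file with the deduplicated copy.
awk '!x[$0]++' data > data.dedup && mv data.dedup data
cat data
```

Note that awk cannot safely write to the same file it is reading, hence the temporary file and the rename.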