Re: file with duplication ignor anything where there is a duplicate. (466 Views)
Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 1 of 16 (466 Views)
Accepted Solution

file with duplication ignor anything where there is a duplicate.

I have a flat file with "names" in it.

See below:

aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury

For records that have duplicates, I want to ignore them altogether and only get the records where there is a single entry.
The file runs a-z, so I can't just do a simple grep to ignore them.
Any insight appreciated.

Rex Mueller - Unix System ESU#3
Honored Contributor
Peter Nikitka
Posts: 1,575
Registered: ‎02-10-2003
Message 2 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Hi,

Since the file seems to be sorted, you can use 'uniq' (see the man page).
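A minimal sketch of how that could look, assuming the input really is sorted and holds one name per line. Plain `uniq` keeps one copy of every duplicated name, whereas the POSIX `-u` option prints only the lines that are not repeated, which seems to be what is asked for. The temporary file and the variable names are placeholders for illustration.

```shell
# Placeholder sample data; uniq only compares adjacent lines, so input must be sorted.
names=$(mktemp)
printf '%s\n' aanderson abergman abergman aboell aboell abone > "$names"

# -u prints only the lines that are not repeated in the input.
singles=$(uniq -u "$names")
printf '%s\n' "$singles"

rm -f "$names"
```

If the file is not sorted, `sort file | uniq -u` should behave the same way.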

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 4 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex:

# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my %names;
my $key;
# Count how many times each line occurs.
while (<>) {
    $names{$_}++;
}
# Print only the lines that occurred exactly once.
for $key (sort keys %names) {
    print $key if $names{$key} == 1;
}
1;

...run as:

# ./report filename

Regards!

...JRF...
Honored Contributor
Sandman!
Posts: 2,220
Registered: ‎01-13-2005
Message 5 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

So the requirement is to ignore names that appear more than once in the input file and print only those that occur once? If so, try the awk construct below (assuming the file has single-column records only):

# awk '{x[$1]++}END{for(i in x) if(x[i]==1) print i}' file
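One point worth keeping in mind with this construct: awk does not guarantee the order in which `for (i in x)` visits the keys, so the output may not come back alphabetically. The sketch below, assuming one name per line and that sorted output is wanted, simply pipes the result through `sort`. The sample file and variable names are placeholders.

```shell
# Placeholder sample data taken from the names listed in the question.
names=$(mktemp)
printf '%s\n' aanderson abergman abergman aboell aboell abone \
    abridwell abridwell aburks achowdhury > "$names"

# Count each name, print those seen exactly once, then sort,
# because the traversal order of "for (n in count)" is unspecified.
singles=$(awk '{ count[$1]++ } END { for (n in count) if (count[n] == 1) print n }' "$names" | sort)
printf '%s\n' "$singles"

rm -f "$names"
```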
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 6 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Hi (again) Rex:

If you prefer, the Perl script I offered can be reduced to a command-line one-liner:

# perl -ne '$names{$_}++;END{for $key (sort keys %names) {print $key if $names{$key}==1}}' filename

Regards!

...JRF...
Honored Contributor
OldSchool
Posts: 3,372
Registered: ‎09-09-2004
Message 7 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

perhaps something like:

sort filename | uniq > outfilename

would work for you?
Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 8 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Jim,

I tried the script, but the duplicated names remain. Any ideas?

Honored Contributor
Sandman!
Posts: 2,220
Registered: ‎01-13-2005
Message 9 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Did you try the awk script I posted? Does the file contain mixed-case names or does it have all lowercase names?

Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 10 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Sandman You DA MAN!!! I will run it past the recipient to see if this is the data they are looking for.

THANKS!! Kudos to all
Honored Contributor
Sandman!
Posts: 2,220
Registered: ‎01-13-2005
Message 11 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

If the file has mixed-case names and you want to keep it that way, then the awk script I posted earlier will suffice. If you want to ignore the case of the names, modify the awk construct as follows:

# awk '{x[tolower($1)]++}END{for(i in x) if(x[i]==1) print i}' file
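Note that the construct above prints the lowercased key rather than the name as it appears in the file. If the original spelling matters, a possible variant (a sketch only, assuming one name per line) remembers the first spelling seen for each lowercased key. The array name `first` and the sample data are made up for illustration.

```shell
# Placeholder sample data with mixed-case duplicates.
names=$(mktemp)
printf '%s\n' Aanderson abergman ABergman abone > "$names"

# Count case-insensitively, but print the spelling first seen in the file.
singles=$(awk '{
    key = tolower($1)
    count[key]++
    if (!(key in first)) first[key] = $1   # remember the original spelling
}
END { for (k in count) if (count[k] == 1) print first[k] }' "$names" | sort)
printf '%s\n' "$singles"

rm -f "$names"
```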

~cheers
Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 12 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

It's in the awk vault.. Thanks Sandman, I can see the others are useful, I can find places for them as well.

Merry Christmas all.
Honored Contributor
spex
Posts: 1,367
Registered: ‎05-14-1996
Message 13 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex,

This can be accomplished by commands alone:

$ sort file | uniq -c | grep '1 ' | cut -c6-
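Two details of this pipeline may be fragile: `grep '1 '` also matches counts such as 11 or 21, and `cut -c6-` assumes a fixed width for the count column, which can differ between `uniq` implementations. A sketch that tests the count field directly, assuming single-word names, could look like this. The sample data is a placeholder.

```shell
# Placeholder sample data: one name occurs 11 times to exercise the count test.
names=$(mktemp)
{
    printf '%s\n' aanderson abergman abergman abone
    for i in 1 2 3 4 5 6 7 8 9 10 11; do echo aboell; done
} > "$names"

# uniq -c prefixes each distinct line with its count; keep lines whose count is exactly 1.
singles=$(sort "$names" | uniq -c | awk '$1 == 1 { print $2 }')
printf '%s\n' "$singles"

rm -f "$names"
```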

Merry Christmas!

PCS
Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 14 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Spex, I tried that, but it leaves the dups in place. I need none of the records that have duplicates.

Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 15 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex:

OK, silly me, I thought that your file contained only records with the listed fields.

Consider this file:

aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10

Now use:

# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift;
my %names;
my @fields;
open (FH, "<", $file) or die "Can't open '$file': $!\n";
# First pass: count occurrences of the first field.
while (<FH>) {
    @fields = split;
    $names{$fields[0]}++;
}
# Rewind and print only the lines whose first field occurred once.
seek( FH, 0, 0);
while (<FH>) {
    @fields = split;
    print if $names{$fields[0]} == 1;
}
1;

...thus:

# ./report file
aanderson line-1
abone line-6
aburks line-9
achowdhury line-10

...Perl counts the first field as zero whereas 'awk' would count it as one.
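For comparison, the same two-pass idea can be sketched in awk, where the first field is `$1`. This assumes whitespace-separated fields and reads the file twice by naming it twice on the command line. The sample data is a shortened placeholder version of the file above.

```shell
# Placeholder sample data: a key in field 1 followed by other fields.
data=$(mktemp)
cat > "$data" <<'EOF'
aanderson line-1
abergman line-2
abergman line-3
abone line-6
EOF

# First pass (NR == FNR) counts field 1; the second pass prints
# whole lines whose first field was seen exactly once.
singles=$(awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$data" "$data")
printf '%s\n' "$singles"

rm -f "$data"
```

Unlike the single-pass awk construct with `for (i in x)`, this keeps the complete lines and their original order.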

Regards!

...JRF...
Valued Contributor
rmueller58
Posts: 851
Registered: ‎02-19-2001
Message 16 of 16 (466 Views)

Re: file with duplication ignor anything where there is a duplicate.

That's the deal, Jim! Thanks AGAIN!