Re: join problem with awk/printf (407 Views)
Regular Advisor
Scott Lindstrom_2
Posts: 153
Registered: ‎06-02-2002
Message 1 of 13 (407 Views)
Accepted Solution

join problem with awk/printf

I have a script that outputs the result of the last backup for each host in this format:

hostnamea retcode policy_name date time

I now have a new requirement to join this with a file that contains the description of what runs on that host, e.g.:

hostnamea HR dev, DR dev

Up until now, I have been successful using join, and awk with printf. But now that the second file has a freeform 'second' field, I am having problems. Any ideas on how I can end up with the following output (formatted with printf):

hostnamea retcode policy_name date time HR dev, DR dev

TIA,
Scott
Honored Contributor
harry d brown jr
Posts: 8,418
Registered: ‎12-12-2000
Message 2 of 13 (407 Views)

Re: join problem with awk/printf

Can you post examples of what you mean by "freeform"? I suspect that you mean it can have any number of words.

live free or die
harry d brown jr
Regular Advisor
Scott Lindstrom_2
Posts: 153
Registered: ‎06-02-2002
Message 3 of 13 (407 Views)

Re: join problem with awk/printf

This phrase was an example:
HR dev, DR dev

(ie, HR development, Data Repository development)

Yes - the remainder of the line after the hostname can contain anything, including spaces and commas. That is where my problem lies.

Scott
Honored Contributor
harry d brown jr
Posts: 8,418
Registered: ‎12-12-2000
Message 4 of 13 (407 Views)

Re: join problem with awk/printf

If you are saying that a line in the second file contains something like this:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

then you can use "sed" or cut to grab the hostname out:

sed:
sed "s/^\([A-Za-z0-9]*\) \(.*\)/\1/"

cut:
cut -d" " -f1

to grab the additional stuff use cut again:

cut -d" " -f2-
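As a small sketch of the two cut calls working together (the sample line is borrowed from this thread; the variable names host and desc are made up for illustration, and it assumes exactly one space between the hostname and the description):

```shell
line='hostnamea HR development, Data Repository development, crazy stuff'

# Field 1 is the hostname; fields 2 onward are the free-form description.
host=$(printf '%s\n' "$line" | cut -d" " -f1)
desc=$(printf '%s\n' "$line" | cut -d" " -f2-)

printf '%s|%s\n' "$host" "$desc"
```

If the description is indented with several spaces, cut treats each space as its own delimiter, so the extra spaces stay at the front of the description.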

If you want to transform the various strings like "HR development" into "HR dev" and "Data Repository development" into "DR dev" then that poses another challenge, especially if this is a free form field that some user is typing the information into, and especially if they can't spell.

live free or die
harry d brown jr
Regular Advisor
Scott Lindstrom_2
Posts: 153
Registered: ‎06-02-2002
Message 5 of 13 (407 Views)

Re: join problem with awk/printf

The second file is exactly as you state:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

The problem is as soon as I use join I lose formatting. So I use awk with printf, but then I lose anything after the first word in field2 (I would only get "HR" output).

Basically I need to join and pipe into an awk printf when file2 has a variable number of fields.

Here is what I'm playing with that does not work:

join -j1 1 -j2 1 /tmp/std_backup_list3 /tmp/swinfo | awk '{printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", $1, $2, $3, $4, $5, $6, $7}'

Scott
Honored Contributor
harry d brown jr
Posts: 8,418
Registered: ‎12-12-2000
Message 6 of 13 (407 Views)

Re: join problem with awk/printf

So the "joined file" has a first line that contains the host name
and a second line that contains some free form stuff, like this:

---------------------
hostnamea
HR development, Data Repository development, crazy stuff, more crazy stuff
---------------------

If this is the case, then try this:

"join stuff here" |
awk ' BEGIN { firsttime = 1 }
{
if ( firsttime == 1 ) {
hostis = $0
firsttime = 0
} else
{
print hostis, $0
exit
}
}
'

live free or die
harry d brown jr

[root@vpart1 /var/appl/perlscripts]# ./daher
hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff
[root@vpart1 /var/appl/perlscripts]#
Honored Contributor
harry d brown jr
Posts: 8,418
Registered: ‎12-12-2000
Message 7 of 13 (407 Views)

Re: join problem with awk/printf

I was a little confused, but now I think this:

[root@vpart1 /var/appl/perlscripts]# cat dah1
hostnameA 0 policy_name date time
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime
hostnameZ 8 good_policy goodday goodtime


[root@vpart1 /var/appl/perlscripts]# cat dah2
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB stupid stuff, more stupid stuff
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ Security Respository stuff, more backup stuff


[root@vpart1 /var/appl/perlscripts]# cat daher

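sort -k 1 dah1 dah2 |
awk ' BEGIN { firsttime = 1 }
{
    if ( firsttime == 1 ) {
        # first line seen for a host: keep the backup result fields
        std_hostis = $1
        std_retcode = $2
        std_policy_name = $3
        std_date = $4
        std_time = $5
        firsttime = 0
    } else {
        if ( std_hostis == $1 ) {
            # same host again: this is the description line, so drop the
            # leading hostname and print the merged record
            desc = $0
            sub(/^[^ ]+ +/, "", desc)
            printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", std_hostis, std_retcode, std_policy_name, std_date, std_time, desc
            firsttime = 1
        } else {
            # different host: the previous one had no description line
            std_hostis = $1
            std_retcode = $2
            std_policy_name = $3
            std_date = $4
            std_time = $5
            firsttime = 0
        }
    }
}
'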

live free or die
harry d brown jr
Regular Advisor
Scott Lindstrom_2
Posts: 153
Registered: ‎06-02-2002
Message 8 of 13 (407 Views)

Re: join problem with awk/printf

Harry -

That looks like what I need! Let me give it a try and let you know.

Thanks!

Scott
Honored Contributor
Sandman!
Posts: 2,220
Registered: ‎01-13-2005
Message 9 of 13 (407 Views)

Re: join problem with awk/printf

IMHO you need not use join or printf to get the proper formatting. Try the awk construct below; it does what you're trying to accomplish.

The file containing "hostnamea retcode policy_name date time" must precede the file containing "hostnamea HR dev, DR dev", otherwise the output will be...
"hostnamea HR dev, DR dev retcode policy_name date time"
instead of...
"hostnamea retcode policy_name date time HR dev, DR dev"

===============================================
awk '{
if(x[$1]=="")
x[$1]=$0
else
for(i=2;i<=NF;++i)
x[$1]=x[$1]" "$i
} END{for(i in x) print x[i]}' firstfile secondfile
===============================================
~hope it helps
Regular Advisor
Scott Lindstrom_2
Posts: 153
Registered: ‎06-02-2002
Message 10 of 13 (407 Views)

Re: join problem with awk/printf

Harry - I think because my data is a bit different, the sort works differently, and the script gives the wrong results. The output of your sort command is like this:

sort -k 1 dah1 dah2
hostnameA 0 policy_name date time
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB 1 bad_policy nodate never
hostnameB stupid stuff, more stupid stuff
hostnameC 2 old_policy someday sometime
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ 8 good_policy goodday goodtime
hostnameZ Security Respository stuff, more backup stuff

Always the backup results file followed by the host description file.

My sort output looks more like this, regardless of which file is specified first in the sort command:

host1 (leading spaces) DTP QTP
host1 0 STD_host1 07/10/2006 12:36:09
host2 (leading spaces) BW DTP QTP
host2 0 STD_host2 07/11/2006 01:57:38
host3 (leading spaces) Non-SAP Development
host3 0 STD_host3 07/10/2006 12:26:33

Unless I can get the sort to operate the same, I think I need to move on from this task. I thank you for all your assistance; this has been a learning experience for me!
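
One likely reason for the difference: "sort -k 1" starts the key at field 1 and runs it to the end of the line, so the text after the hostname takes part in the comparison, and in many locales a run of spaces sorts ahead of a digit. A possible way to force "backup line first, description second" is to tag each line with its source file before sorting. This is only a sketch with made-up sample data and temporary file names, not something tested against the real files:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'host1 0 STD_host1 07/10/2006 12:36:09' \
              'host2 0 STD_host2 07/11/2006 01:57:38' > "$tmp/backup"
printf '%s\n' 'host1      DTP QTP' \
              'host2      BW DTP QTP' > "$tmp/swinfo"

# Prefix every line with "hostname tag" (tag 1 = backup file, 2 = description
# file), sort on just those two keys, then cut the prefix off again.
out=$(awk '{ print $1, (NR == FNR ? 1 : 2), $0 }' "$tmp/backup" "$tmp/swinfo" |
      sort -k1,1 -k2,2n |
      cut -d" " -f3-)

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the keys limited to the hostname and the tag, the rest of the line no longer influences the order.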

Scott
Honored Contributor
Sandman!
Posts: 2,220
Registered: ‎01-13-2005
Message 11 of 13 (407 Views)

Re: join problem with awk/printf

Hi Scott,

I'm inclined to pursue this a wee bit more owing to the intriguing nature of the problem, and because IMHO I think I've finally hit the nail on the head :)

1. sort each of the files individually on the first field
# sort -k1,1 /tmp/std_backup_list3 > /tmp/std_backup_list3.out
# sort -k1,1 /tmp/swinfo > /tmp/swinfo.out

2. join the sorted output files from above into a single output file
# join -1 1 -2 1 /tmp/std_backup_list3.out /tmp/swinfo.out > /tmp/all.out
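
To get the printf formatting back after the join, one option is to glue fields 6 through NF together again inside awk. The following is a rough sketch only; the sample data and temporary file names are invented, and it assumes the backup file always has exactly five fields:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'hostnameA 0 policy_name date time' \
              'hostnameB 1 bad_policy nodate never' > "$tmp/backup"
printf '%s\n' 'hostnameA HR development, Data Repository development' \
              'hostnameB stupid stuff, more stupid stuff' > "$tmp/swinfo"

# Both files are already sorted on the hostname, as join requires.
out=$(join "$tmp/backup" "$tmp/swinfo" |
awk '{
    # Fields 1-5 come from the backup file; everything from field 6 on
    # is the free-form description, so rebuild it word by word.
    desc = $6
    for (i = 7; i <= NF; i++) desc = desc " " $i
    printf "%-10s\t%s\t%-30s\t%s %s %s\n", $1, $2, $3, $4, $5, desc
}')

printf '%s\n' "$out"
rm -rf "$tmp"
```

Note that join and awk both split on runs of blanks here, so any multiple spaces inside the description come out as single spaces.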

~cheers
Respected Contributor
Greg Vaidman
Posts: 252
Registered: ‎09-12-2000
Message 12 of 13 (407 Views)

Re: join problem with awk/printf

have you tried just using a different field separator to do the join?
for example:
sed 's/ /|/g' file1 > file1a
sed 's/ /|/' file2 > file2a
join -t"|" file1a file2a | tr '|' ' '
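
If the printf layout is still wanted, the same idea can feed awk directly instead of tr, since with "|" as the separator the whole description arrives as a single field. A sketch under a few assumptions: the data and temporary file names are invented, the inputs are already sorted on the hostname, and no description contains a "|" character:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'hostnameA 0 policy_name date time' \
              'hostnameB 1 bad_policy nodate never' > "$tmp/file1"
printf '%s\n' 'hostnameA HR development, Data Repository development' \
              'hostnameB stupid stuff, more stupid stuff' > "$tmp/file2"

sed 's/ /|/g' "$tmp/file1" > "$tmp/file1a"   # every space becomes a separator
sed 's/ /|/'  "$tmp/file2" > "$tmp/file2a"   # only the first space does

# With -F"|" the description is field 6, spaces and commas included.
out=$(join -t"|" "$tmp/file1a" "$tmp/file2a" |
awk -F"|" '{ printf "%-10s\t%s\t%-30s\t%s %s %s\n", $1, $2, $3, $4, $5, $6 }')

printf '%s\n' "$out"
rm -rf "$tmp"
```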
Honored Contributor
Hein van den Heuvel
Posts: 6,585
Registered: ‎05-19-2003
Message 13 of 13 (407 Views)

Re: join problem with awk/printf

Here is another approach, similar to Sandman's...

It treats s.txt as a reference file to 'cross' with.

The file b.txt is the backup log.

Awk does all the work, by storing records from the software file in an associative array.

No need to sort... the data will be in the backup log order:

C:\Temp>type s.txt
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB stupid stuff, more stupid stuff
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ Security Respository stuff, more backup stuff

C:\Temp>type b.txt
hostnameA 0 policy_name date time
hostnameZ 8 good_policy goodday goodtime
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime

C:\Temp>awk 'NR==FNR {key=$1; sub(key,""); S[key]=$0}
NR!=FNR {printf "%-10s\t%s\t%-30s\t%s %s %s\n", $1, $2, $3, $4, $5, S[$1]}' s.txt b.txt

hostnameA 0 policy_name date time HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameZ 8 good_policy goodday goodtime Security Respository stuff, more backup stuff
hostnameB 1 bad_policy nodate never stupid stuff, more stupid stuff
hostnameC 2 old_policy someday sometime

The awk script decides which file a record comes from by comparing NR, the overall record number, with FNR, the record number within the current file. They are equal only while the first file is being read.
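
A tiny illustration of that NR/FNR test, using two throwaway files (the file names and contents are invented for the example):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmp/first"
printf '%s\n' 'c'     > "$tmp/second"

# NR keeps counting across files; FNR restarts at 1 for each new file.
out=$(awk '{ print NR, FNR, (NR == FNR ? "first" : "later"), $0 }' \
      "$tmp/first" "$tmp/second")

printf '%s\n' "$out"
rm -rf "$tmp"
```

This prints "1 1 first a", "2 2 first b" and "3 1 later c". One caveat: if the first file is empty, NR and FNR also match while the second file is read, so the test assumes a non-empty first file.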

fwiw,
Hein.