Re: scripting question (450 Views)
Frequent Advisor
Gary Glick
Posts: 93
Registered: ‎02-24-2004
Message 1 of 11 (450 Views)
Accepted Solution

scripting question

Hi all,

Here's what I need help with:
I receive a data file with several records inside and each record has several lines.

The file has the following format:
H0001
D0001
D0001
D0001
T0001
H0002
D0002
D0002
T0002
H0003
...., etc.


The H indicates a Header Record, the D a Detail Record and T a Terminator Record.
The H,D or T is actually the first character of the line and the number of Detail lines varies.
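
A quick way to sanity-check a file against this layout, before or after splitting it, might look like the sketch below. It assumes every record is exactly one H line, zero or more D lines, then one T line, and that nothing else appears in the file. The function name check_records is made up for illustration.

```shell
#!/bin/sh
# check_records FILE
# Returns 0 if FILE is a clean sequence of H, D..., T groups, 1 otherwise.
# Assumption: one H line, zero or more D lines, one T line per record.
check_records() {
    awk '
        /^H/ { if (inrec) bad = 1; inrec = 1; next }    # header while a record is still open
        /^D/ { if (!inrec) bad = 1; next }              # detail line outside any record
        /^T/ { if (!inrec) bad = 1; inrec = 0; next }   # terminator without a header
             { bad = 1 }                                # any other first character
        END  { if (bad || inrec) exit 1 }               # also fails on a missing final T
    ' "$1"
}
```

Usage would be something like: check_records datafile && echo "layout looks OK"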

What I need to do is split the single file into multiple files so that each H-D-D-D-T section of the file ends up in a separate file with a different name. The file name doesn't matter much, perhaps a file.timestamp format. I'm not getting very far with it and was wondering if I could get some help or direction.

I've got perl on the system but I'm not very conversant with it, so a shell script would be preferred for longer term support. Or, of course, I could start learning perl ;-)

Thanks a Lot

Gary
Honored Contributor
curt larson_1
Posts: 764
Registered: ‎08-23-2002
Message 2 of 11 (450 Views)

Re: scripting question

Just a quickie, so it is untested:

file="f"
num=0

# ksh: an H line starts a new file; D and T lines are appended to it
cat yourFile |
while read -r var
do
    case $var in
    H*) ((num = num + 1))
        print -r -- "$var" > "${file}${num}"
        ;;
    D*|T*) print -r -- "$var" >> "${file}${num}"
        ;;
    esac
done
Honored Contributor
Patrick Wallek
Posts: 13,731
Registered: ‎06-21-2000
Message 3 of 11 (450 Views)

Re: scripting question

Hmmm.....

Here's something off the top of my head ---

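#!/usr/bin/sh

# Each H line starts a new file named after the current date and time.
# Headers read within the same second get the same name and share a file.
while read LINE
do
    FIRST=$(echo "$LINE" | cut -c 1)
    if [ "${FIRST}" = "H" ] ; then
        FILE=H_$(date +%m%d%Y)_$(date +%H%M%S)
    fi
    echo "${LINE}" >> "${FILE}"
done < datfile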


This should give you a file named like H_05192004_154715 that starts with an H line. The script writes to that file until it finds another record with H as the first character, then it starts a new file. I haven't tested it, but I think it should work.
Honored Contributor
curt larson_1
Posts: 764
Registered: ‎08-23-2002
Message 4 of 11 (450 Views)

Re: scripting question

awk might be a bit faster

cat yourFile | awk '
BEGIN {name="f";num=0;}
/^H/ {
num += 1;
fname=sprintf("%s%d",name,num);
print $0 > fname;
next;
}
/^D/ {
print $0 >> fname
next;
}
/^T/ {
print $0 >> fname
}'
Honored Contributor
John Poff
Posts: 2,448
Registered: ‎05-22-2001
Message 5 of 11 (450 Views)

Re: scripting question

Hi,

Here is one way to do it in Perl:

#!/usr/bin/perl
while (<>)
{
if (/^H/){
close (OUTF);
$count++;
$outfile="FILE." . $count;
open(OUTF,">$outfile") or die "Can't open output file $outfile";
}
print OUTF $_;
}


JP
Honored Contributor
Dave La Mar
Posts: 829
Registered: ‎03-27-2001
Message 6 of 11 (450 Views)

Re: scripting question

Gary -
We do just this thing on a similar data file.
Attached is a snip of the process with our naming convention edited.
The array lets you cycle through the records and use sed to print the lines you want to separate files.
Note that this snip assumes each new record starts with H and that H does not appear in the data portion.
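
Since the attachment isn't reproduced in the thread, here is a rough sketch of what a "line numbers plus sed" approach could look like. This is not the attached script, just one way to do it under the same assumption (every record starts with an H line). The function name split_by_sed and the PREFIX.N file names are made up for the example, and the positional parameters stand in for the array.

```shell
#!/bin/sh
# split_by_sed INPUT PREFIX
# Writes each record of INPUT to PREFIX.1, PREFIX.2, ...
# Assumption: a record runs from one H line up to the line before the next H line.
split_by_sed() {
    input=$1
    prefix=$2
    # Collect the line number of every header line, e.g. "1 6 10"
    set -- $(grep -n '^H' "$input" | cut -d: -f1)
    n=0
    while [ $# -gt 0 ]; do
        start=$1
        shift
        if [ $# -gt 0 ]; then
            end=$(( $1 - 1 ))   # the line just before the next header
        else
            end='$'             # last record: sed address for end of file
        fi
        n=$(( n + 1 ))
        sed -n "${start},${end}p" "$input" > "${prefix}.${n}"
    done
}
```

Note that this rereads the input once per record, so it is likely slower than the single-pass awk and perl answers on a big file.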

Best of luck.

Regards,

dl
"I'm not dumb. I just have a command of thoroughly useless information."
Honored Contributor
Marvin Strong
Posts: 492
Registered: ‎03-01-2004
Message 7 of 11 (450 Views)

Re: scripting question

perl -ne 'if(/H00(\d+)/.../T00($1)/){open O, ">>$1";print O;close O;}' inputfile

This will create files named after the digits that follow "H00" in each header (01, 02, 03, etc. for the sample data).

One of the other ways might be better.
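
For anyone who wants the same "name the file after the header number" idea in plain shell, a minimal sketch follows. It assumes header lines look like H followed by digits, as in the sample data, and that the numbers are unique (a repeated number would overwrite the earlier file). The function name split_by_header_id and the prefix argument are made up for the example.

```shell
#!/bin/sh
# split_by_header_id INPUT PREFIX
# Writes each record to PREFIX<digits>, where <digits> come from its H line.
# Assumption: headers look like "H0001"; lines before the first H are skipped.
split_by_header_id() {
    out=
    while IFS= read -r line; do
        case $line in
            H*)
                id=${line#H}          # drop the leading H
                id=${id%%[!0-9]*}     # keep only the leading run of digits
                out="$2$id"
                : > "$out"            # start the record with an empty file
                ;;
        esac
        if [ -n "$out" ]; then
            printf '%s\n' "$line" >> "$out"
        fi
    done < "$1"
}
```

For example, split_by_header_id datafile rec_ would give rec_0001, rec_0002 and so on for the sample data.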

Honored Contributor
Francisco J. Soler
Posts: 378
Registered: ‎06-15-1999
Message 8 of 11 (450 Views)

Re: scripting question

Hi Gary,

My two-line awk script:

awk '
/^H/ {count++ ; filename="prefix_" count}
{ print >> filename }' filein

where "prefix_" is the prefix you want for the output file names.
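
One thing that may matter on large inputs: some awk implementations limit how many files can be open at once, so it can help to close each output file when the next header arrives. A hedged variant wrapped in a shell function is sketched below; the function name split_with_close is made up, and it uses > rather than >>, so files left over from an earlier run are overwritten instead of appended to.

```shell
#!/bin/sh
# split_with_close INPUT PREFIX
# Same idea as the two-line awk script, but closes each file when done with it.
# Assumption: lines before the first H line (if any) are simply skipped.
split_with_close() {
    awk -v prefix="$2" '
        /^H/ {
            if (filename != "") close(filename)   # finished with the previous record
            count++
            filename = prefix count
        }
        filename != "" { print > filename }       # truncates on first write, then appends
    ' "$1"
}
```

Called as split_with_close filein prefix_ it should produce prefix_1, prefix_2 and so on.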

Frank.
Linux?. Yes, of course.
Honored Contributor
Patrick Wallek
Posts: 13,731
Registered: ‎06-21-2000
Message 9 of 11 (450 Views)

Re: scripting question

Here's an updated version of my script. This one DOES work, as I just had a chance to do some quick testing and debugging of it.

#!/usr/bin/sh

COUNT=0
while read LINE
do
    FIRST=$(echo "$LINE" | cut -c 1)
    if [ "${FIRST}" = "H" ] ; then
        # New record: COUNT keeps the names unique even within the same second
        FILE=H_${COUNT}_$(date +%m%d%Y)_$(date +%H%M%S)
        let COUNT=$COUNT+1
    fi
    echo "${LINE}" >> "${FILE}"
done < datfile

Honored Contributor
Michael Schulte zur Sur
Posts: 4,040
Registered: ‎06-18-2003
Message 10 of 11 (450 Views)

Re: scripting question

Hi,

There must be an easy solution with csplit. I'm going from memory, so please don't hit me. ;-)

hth,

Michael

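# {99} repeats the split at each further H line; -k keeps the pieces if there are fewer than 99
csplit -k -f spl filetosplit '/^H[0-9]*/' '{99}'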
Frequent Advisor
Gary Glick
Posts: 93
Registered: ‎02-24-2004
Message 11 of 11 (450 Views)

Re: scripting question

Thanks for all the responses.
I haven't had the opportunity to try them yet. I'll give 'em a shot tomorrow and see how it goes.

Don't fret, points will be forthcoming :D
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation