Help with perl
Honored Contributor
Eric Antunes
Posts: 1,931
Registered: ‎06-15-2003
Message 1 of 11 (396 Views)

Help with perl

[ Edited ]

I'm new to Perl scripting and I need to limit the String in the following pattern, in an XML file, to 60 characters:

 

<CompanyName>String</CompanyName>

 

But Perl doesn't seem to recognize the instr(big, little) function:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @c = ();
my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      push @a, instr($_,">");
      push @b, instr($_,"</");
      push @c, substr(substr($_,@a+1,@b - @a-1),0,60);
      push @d, substr($_,0,@a)||@c||substr($_,@b,14);
      print @d;
      @a = ();
      @b = ();
      @c = ();
      @d = ();
      next;
    }
    print;
  }
  print;
}
1;
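
A note on instr: Perl has no function of that name. The closest built-in is index(STR, SUBSTR), which returns the zero-based position of the first match, or -1 if there is none. A minimal sketch of the intended truncation with index and substr, assuming the whole element sits on one line (the helper name truncate_company is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: truncate the text of a one-line
# <CompanyName>...</CompanyName> element to $max characters.
sub truncate_company {
    my ($line, $max) = @_;
    my $open  = '<CompanyName>';
    my $close = '</CompanyName>';

    my $start = index($line, $open);          # -1 when the tag is absent
    return $line if $start < 0;
    $start += length $open;                   # first character of the content

    my $end = index($line, $close, $start);   # search from $start onwards
    return $line if $end < 0;

    my $text = substr($line, $start, $end - $start);
    return $line if length($text) <= $max;

    # 4-argument substr replaces the content in place
    substr($line, $start, $end - $start, substr($text, 0, $max));
    return $line;
}

print truncate_company("<CompanyName>" . ("x" x 70) . "</CompanyName>\n", 60);
```

Lines without the element, or with content of 60 characters or fewer, come back unchanged.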
 

 

Eric

Each and every day is a good day to learn.
Honored Contributor
Eric Antunes
Posts: 1,931
Registered: ‎06-15-2003
Message 2 of 11 (387 Views)

Re: Help with perl

Now, it erases the entire pattern:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
#my @b = ();
#my @c = ();
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      push @a, $_;
      next;
    }
    if (m{>}+1..m{</-1}) {
      push @a, substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      push @a, $_;
      next;
    }
    print @a;
    @a = ();
    next;
  }
  print;
}
1;

Honored Contributor
Eric Antunes
Posts: 1,931
Registered: ‎06-15-2003
Message 3 of 11 (385 Views)

Re: Help with perl

Now, it is almost working but it still doesn't limit the string to 60 characters:

 

#!/usr/bin/perl
use strict;
use warnings;
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      print;
      next;
    }
    if (m{>}+1..m{</-1}) {
      print substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      print;
      next;
    }
  }
  print;
}
1;

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 4 of 11 (376 Views)

Re: Help with perl

[ Edited ]

Why jump through difficult hoops?

 

while (<>) {
s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};
}

 

 If the content for this tag is longer than 60 characters, truncate to 60

 

(/me still thinks you should use XML::Parser)

Enjoy, Have FUN! H.Merijn
Honored Contributor
Eric Antunes
Posts: 1,931
Registered: ‎06-15-2003
Message 5 of 11 (362 Views)

Re: Help with perl

Hi Merijn,

 

With your last script I get an empty file.

 

This is working, but just for the first occurrence:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      if (length($_) gt 87) {
        print substr($_,0,73);
        print "</CompanyName>\n";
        next;
      }
      print;
      next;
    }
  }
  print;
}
1;
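
A side note on the comparison in the script above: gt compares its operands as strings, so any length of 100 or more sorts before "87" and the test is false for exactly the longest lines. That might be why only some occurrences were truncated, though without the input file this is a guess. The numeric operator is >. A small sketch of the difference (the value 120 is just a sample length):

```perl
use strict;
use warnings;

my $len = 120;   # sample line length, longer than the 87-character limit

# String comparison: "120" vs "87", decided by the first characters "1" and "8"
my $as_string = ($len gt 87) ? "true" : "false";

# Numeric comparison: 120 vs 87
my $as_number = ($len > 87) ? "true" : "false";

print "gt: $as_string\n";   # false
print ">:  $as_number\n";   # true
```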

 

Eric

Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 6 of 11 (352 Views)

Re: Help with perl


Eric Antunes wrote:

Hi Merijn,

 

With your last script I get an empty file.

 


Hi Eric:

 

Try this:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};
    print;
}
1;

Regards!

 

...JRF...

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 7 of 11 (345 Views)

Re: Help with perl

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 

Enjoy, Have FUN! H.Merijn
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 8 of 11 (338 Views)

Re: Help with perl


H.Merijn Brand (procura wrote:

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 


Yes, my friend, I missed the doubled angle brackets :-( and needlessly complicated the regex :-((

 

Yes, too, the missing print was obvious.

 

BUT, your original version did not limit the string :-(

 

You had:

 

s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

whereas I should have used:

 

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Regards!

 

...JRF...

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 9 of 11 (332 Views)

Re: Help with perl

That is what one gets for not testing code :/

I indeed obviously had one pair of parens too many.

 

For completeness' sake (we both made too many simple mistakes), here is the full version:

$ cat modify.pl
use strict;
use warnings;

while (<>) {
    s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
    # other modifications here
    print;
    }
$ perl -wc modify.pl
modify.pl syntax OK
$ perl modify.pl myfile.xml > modified.xml
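
One case the script above leaves open is a line that holds more than one CompanyName element: without the g modifier, s{}{} rewrites only the first match on each line. A minimal sketch with /g, assuming such lines can occur in the file, and using a limit of 5 instead of 60 to keep the sample short:

```perl
use strict;
use warnings;

# Sample line with two elements (made-up data)
my $line = "<CompanyName>abcdefgh</CompanyName><CompanyName>ijklmnop</CompanyName>\n";

# Same substitution as in modify.pl, with a limit of 5 and the g modifier
(my $out = $line) =~ s{<(CompanyName)>(.{0,5}).*?</\1>}{<$1>$2</$1>}g;

print $out;   # <CompanyName>abcde</CompanyName><CompanyName>ijklm</CompanyName>
```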

 

 

Enjoy, Have FUN! H.Merijn
Honored Contributor
Eric Antunes
Posts: 1,931
Registered: ‎06-15-2003
Message 10 of 11 (314 Views)

Re: Help with perl

Exactly Merijn, you just posted the right script.

 

Although I didn't understand the s{} part, it worked wonderfully!

 

But I will try to understand it.

 

Thank you,

 

Eric

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 11 of 11 (312 Views)

Re: Help with perl

lemme (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

To make that more readable and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, ignore rest to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. If captures are nested, the outermost capture gets the lowest index: the index of a capture is the position of its opening paren, counted from the left. (Unless you use (?|...) in newer perls, but we do not use that here.)
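
The numbering of nested captures can be checked with a small sketch (the sample line is made up). It also shows why the earlier version with the extra pair of parens produced doubled angle brackets: there, $1 held the whole tag including < and >.

```perl
use strict;
use warnings;

my $line = "<CompanyName>Acme</CompanyName>";

# Two nested groups: the outer paren opens first, so it becomes $1;
# the inner one opens second, so it becomes $2.
my ($outer, $inner) = $line =~ m{(<(CompanyName)>)};

print "\$1 = $outer\n";   # <CompanyName>
print "\$2 = $inner\n";   # CompanyName
```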

So after "<CompanyName>" matched, $1 now contains "CompanyName".

The next line captures .{0,60}, which means "any character, 0 to 60 times"; it is greedy, so it takes 60 characters whenever that many are available. The pattern .*? is a non-greedy match on any number of characters: it stops at the first place where the next part of the pattern can match, whereas a plain .* (without the ?) would be greedy and run on to the last possible closing tag on the line.

As that part is not in parens, it is just forgotten.
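
A small sketch of greedy versus non-greedy matching, using a made-up line with two short <a> elements and replacing each match with X:

```perl
use strict;
use warnings;

my $line = "<a>one</a><a>two</a>";

# Greedy: .* runs to the end of the line and backs off to the LAST </a>
(my $greedy = $line) =~ s{<a>.*</a>}{X};

# Non-greedy: .*? stops at the FIRST </a>
(my $lazy = $line) =~ s{<a>.*?</a>}{X};

print "greedy: $greedy\n";   # X
print "lazy:   $lazy\n";     # X<a>two</a>
```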

The last part of the match is </\1>, where \1 is the content of $1. We cannot use $1 there, as we are still inside the matching part. </\1> in this case is essentially the same as matching on </CompanyName>, but spelling the tag name out a second time is more typing and more error-prone.

After the closing } of the match, the substitution pattern puts it all together again. The x after the last } enables us to split up the matching pattern over several lines and add whitespace and comments.
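
To see the whole thing at work, here is a minimal sketch that applies the commented /x form to two made-up lines, one with a 75-character name and one with a short name. Note that . does not match a newline, so this assumes the opening and closing tag are on the same line.

```perl
use strict;
use warnings;

# Apply the substitution to a copy of the line and return the result.
sub limit_company {
    my ($line) = @_;
    $line =~ s{ <(CompanyName)>    # opening tag, tag name kept in $1
                (.{0,60})  .*?     # keep up to 60 characters in $2, skip the rest
                </\1>              # closing tag
              }{<$1>$2</$1>}x;
    return $line;
}

my $long  = limit_company("  <CompanyName>" . ("A" x 75) . "</CompanyName>\n");
my $short = limit_company("  <CompanyName>Acme</CompanyName>\n");

print $long;    # content cut to 60 characters
print $short;   # unchanged
```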

 

Enjoy, Have FUN! H.Merijn