Help with perl
Honored Contributor
Eric Antunes
Posts: 1,944
Registered: ‎06-15-2003
Message 1 of 11 (557 Views)

Help with perl

I'm new to Perl scripting, and I need to limit the string in the following element of an XML file to 60 characters:

 

<CompanyName>String</CompanyName>

 

But perl doesn't seem to recognize the instr(big, little) function:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @c = ();
my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      push @a, instr($_,">");
      push @b, instr($_,"</");
      push @c, substr(substr($_,@a+1,@b - @a-1),0,60);
      push @d, substr($_,0,@a)||@c||substr($_,@b,14);
      print @d;
      @a = ();
      @b = ();
      @c = ();
      @d = ();
      next;
    }
    print;
  }
  print;
}
1;
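
Perl has no built-in instr; the closest built-in is index(STR, SUBSTR), which returns the zero-based position of the first occurrence, or -1 if there is none. Below is a minimal sketch of the index/substr approach attempted above. The helper name truncate_company and the sample input are made up for illustration, and it assumes the opening and closing tags are on the same line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: truncate the text between <CompanyName> and
# </CompanyName> to at most $max characters. Assumes both tags are on one line.
sub truncate_company {
    my ($line, $max) = @_;
    my $open  = '<CompanyName>';
    my $close = '</CompanyName>';

    my $start = index($line, $open);          # -1 if not found
    return $line if $start < 0;
    $start += length $open;                   # first character of the content

    my $end = index($line, $close, $start);   # search from the content onwards
    return $line if $end < 0;

    my $content = substr($line, $start, $end - $start);
    return $line if length($content) <= $max;

    # Replace only the content; everything around it is left untouched.
    substr($line, $start, $end - $start) = substr($content, 0, $max);
    return $line;
}

# Made-up sample: a 70-character name is cut down to 60 characters.
print truncate_company("<CompanyName>" . ("x" x 70) . "</CompanyName>\n", 60);
```

Lines without the element, or with content of 60 characters or fewer, come back unchanged.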
 

 

Eric

Each and every day is a good day to learn.
Honored Contributor
Eric Antunes
Posts: 1,944
Registered: ‎06-15-2003
Message 2 of 11 (548 Views)

Re: Help with perl

Now, it erases the entire pattern:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
#my @b = ();
#my @c = ();
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      push @a, $_;
      next;
    }
    if (m{>}+1..m{</-1}) {
      push @a, substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      push @a, $_;
      next;
    }
    print @a;
    @a = ();
    next;
  }
  print;
}
1;

Each and every day is a good day to learn.
Honored Contributor
Eric Antunes
Posts: 1,944
Registered: ‎06-15-2003
Message 3 of 11 (546 Views)

Re: Help with perl

Now, it is almost working but it still doesn't limit the string to 60 characters:

 

#!/usr/bin/perl
use strict;
use warnings;
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      print;
      next;
    }
    if (m{>}+1..m{</-1}) {
      print substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      print;
      next;
    }
  }
  print;
}
1;

Each and every day is a good day to learn.
Honored Contributor
H.Merijn Brand (procura
Posts: 6,189
Registered: ‎10-13-1997
Message 4 of 11 (537 Views)

Re: Help with perl

Why jump through difficult hoops?

 

while (<>) {
s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};
}

 

 If the content for this tag is longer than 60 characters, truncate to 60

 

(/me still thinks you should use XML::Parser)

Enjoy, Have FUN! H.Merijn
Honored Contributor
Eric Antunes
Posts: 1,944
Registered: ‎06-15-2003
Message 5 of 11 (523 Views)

Re: Help with perl

Hi Merijn,

 

With your last script I get an empty file.

 

This is working, but only for the first occurrence:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      if (length($_) gt 87) {
        print substr($_,0,73);
        print "</CompanyName>\n";
        next;
      }
      print;
      next;
    }
  }
  print;
}
1;
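
One possible explanation, assuming the lines that escape truncation are 100 characters or longer: gt compares its operands as strings, so length($_) gt 87 is false for a length of 100. The numeric operator is >. A small sketch of the difference (the numbers are only illustrative):

```perl
use strict;
use warnings;

# gt compares its operands as strings, character by character,
# so "100" sorts before "87" because "1" comes before "8".
my $string_cmp  = (100 gt 87) ? "true" : "false";

# > compares its operands as numbers.
my $numeric_cmp = (100 > 87) ? "true" : "false";

print "100 gt 87 is $string_cmp\n";    # false
print "100 >  87 is $numeric_cmp\n";   # true
```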

 

Eric

Each and every day is a good day to learn.
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 6 of 11 (513 Views)

Re: Help with perl


Eric Antunes wrote:

Hi Merijn,

 

With your last script I get an empty file.

 


Hi Eric:

 

Try this:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};
    print;
}
1;

Regards!

 

...JRF...

Honored Contributor
H.Merijn Brand (procura
Posts: 6,189
Registered: ‎10-13-1997
Message 7 of 11 (506 Views)

Re: Help with perl

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 

Enjoy, Have FUN! H.Merijn
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 8 of 11 (499 Views)

Re: Help with perl


H.Merijn Brand (procura wrote:

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 


Yes, my friend, I missed the doubled angle brackets :-( and needlessly complicated the regex :-((

 

Yes, too, the missing print was obvious.

 

BUT, your original version did not limit the string :-(

 

You had:

 

s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

whereas I should have used:

 

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Regards!

 

...JRF...

Honored Contributor
H.Merijn Brand (procura
Posts: 6,189
Registered: ‎10-13-1997
Message 9 of 11 (493 Views)

Re: Help with perl

That is what one gets for not testing code :/

I indeed obviously had one pair of parens too many.

 

For completeness' sake (we both made too many simple mistakes), here is the full version:

$ cat modify.pl
use strict;
use warnings;

while (<>) {
    s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
    # other modifications here
    print;
    }
$ perl -wc modify.pl
modify.pl syntax OK
$ perl modify.pl myfile.xml > modified.xml

 

 

Enjoy, Have FUN! H.Merijn
Honored Contributor
Eric Antunes
Posts: 1,944
Registered: ‎06-15-2003
Message 10 of 11 (475 Views)

Re: Help with perl

Exactly Merijn, you just posted the right script.

 

Although I didn't understand the s{} part, it worked wonderfully!

 

But I will try to understand it.

 

Thank you,

 

Eric

Each and every day is a good day to learn.
Honored Contributor
H.Merijn Brand (procura
Posts: 6,189
Registered: ‎10-13-1997
Message 11 of 11 (473 Views)

Re: Help with perl

Let me (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Make that more readable and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, ignore rest to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. If captures are nested, the outermost capture gets the lowest index: the index of a capture is the position of its opening paren, counted from the left (unless you use (?|...) in newer perls, but we do not use that here).
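
To see the numbering in action, here is a small sketch using the nested pattern from earlier in the thread. The company name Acme is made up for illustration. In list context the match returns the captures in order, so $groups[0] is what $1 would hold, and so on.

```perl
use strict;
use warnings;

my $line = "<CompanyName>Acme</CompanyName>";   # made-up sample line

# Three groups, numbered by the position of their opening paren:
#   1: (<(CompanyName)>)   outer group, includes the angle brackets
#   2: (CompanyName)       inner group, just the tag name
#   3: (.{0,60})           the content
my @groups = $line =~ m{(<(CompanyName)>)(.{0,60}).*?</\2>};

print "capture 1: $groups[0]\n";   # <CompanyName>
print "capture 2: $groups[1]\n";   # CompanyName
print "capture 3: $groups[2]\n";   # Acme
```

This is also why using the first capture inside <...> in the replacement doubles the angle brackets.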

So after "<CompanyName>" matched, $1 now contains "CompanyName".

The next line captures .{0,60}, which means "any character (except a newline), between 0 and 60 times". The pattern .*? is a non-greedy match on any number of characters: it matches as few characters as needed for the next part of the pattern to match. Without the ?, .* would be greedy and match as many characters as possible.

As that part is not in parens, it is not captured, so it is just forgotten.
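
The difference shows up when a line holds more than one element. A small sketch, assuming two made-up elements on one line and a limit of 2 instead of 60 to keep the sample short:

```perl
use strict;
use warnings;

# Made-up sample with two elements on the same line.
my $line = "<CompanyName>Acme</CompanyName><CompanyName>Bolt</CompanyName>";

# Non-greedy: .*? stops at the first closing tag.
(my $lazy = $line) =~ s{<(CompanyName)>(.{0,2}).*?</\1>}{<$1>$2</$1>};

# Greedy: .* runs on to the last closing tag and swallows the second element.
(my $greedy = $line) =~ s{<(CompanyName)>(.{0,2}).*</\1>}{<$1>$2</$1>};

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```

The non-greedy version shortens only the first element and leaves the second one alone; without the g modifier, s{}{} replaces just the first match on a line.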

The last part of the match is matching </\1> where \1 is the content of $1. We cannot use $1 there, as we are still inside the matching part. </\1> in this case is essentially the same as matching on </CompanyName>, but spelling the name out again is more typing and more error-prone.

After the closing } of the match, the substitution pattern puts it all together again. The x after the last } enables us to split up the matching pattern over several lines and add whitespace and comments.
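
As a quick check, the compact and the spread-out forms can be run side by side. This is only a sketch: the 70-character name is made up, and the comments inside the pattern avoid $ variables because those are still interpolated in an x-mode comment.

```perl
use strict;
use warnings;

# Made-up input: a name of 70 characters.
my $long = "<CompanyName>" . ("A" x 70) . "</CompanyName>\n";

# Compact form.
(my $compact = $long) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

# Same pattern with the x modifier: whitespace and comments are ignored.
(my $spaced = $long) =~ s{
    <(CompanyName)>     # opening tag, tag name captured in group 1
    (.{0,60}) .*?       # up to 60 characters in group 2, rest not captured
    </\1>               # closing tag, using the name from group 1
}{<$1>$2</$1>}x;

print $compact;                                  # name cut to 60 characters
print "same result\n" if $compact eq $spaced;
```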

 

Enjoy, Have FUN! H.Merijn