Frequent Advisor
allanm77
Posts: 88
Registered: ‎06-27-2011
Message 1 of 8 (300 Views)
Accepted Solution

sed/awk/perl help

Hi All,

I have the following piece of output from a wget command and I want to strip the HTML tags.

    <td>INACCESSIBLE</td>
    <td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a> <a title="debugger" href="http://hostname:port/debug">[D]</a> <a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&amp;serviceUrl=http%3A//hostname%3A1port/service">[SB]</a></td>

I want the output to look something like this:

INACCESSIBLE

hostname:port

I have been successful with a chain of awk/sed commands, but it would be great if it could be done in a single command.

Thanks,

Allan

Acclaimed Contributor
Dennis Handly
Posts: 24,385
Registered: ‎03-06-2006
Message 2 of 8 (287 Views)

Re: sed/awk/perl help

>Want the output something like this:

>hostname:port

Don't you want something like this?

hostname:port [I] [D] [SB]

Or do you just want to ignore anything in the <a ...> ... </a> blocks?
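
A minimal sketch of the two readings, assuming POSIX sed and using the sample row from the question with the long third link left out for brevity. The variable names are only for illustration:

```shell
# Sample row from the question (third link omitted for brevity).
row='<td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a> <a title="debugger" href="http://hostname:port/debug">[D]</a></td>'

# Reading 1: remove only the tags and keep the link text.
keep_links=$(printf '%s\n' "$row" | sed 's/<[^>]*>//g')

# Reading 2: remove each <a ...>...</a> block first, then the remaining
# tags, then the blanks left behind at the end of the line.
drop_links=$(printf '%s\n' "$row" |
    sed -e 's/<a [^>]*>[^<]*<\/a>//g' \
        -e 's/<[^>]*>//g' \
        -e 's/[[:space:]]*$//')

printf '%s\n' "$keep_links"   # hostname:port [I] [D]
printf '%s\n' "$drop_links"   # hostname:port
```

Both variants rely on each tag sitting entirely on one line, which holds for the sample but is not guaranteed for HTML in general.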

Frequent Advisor
allanm77
Posts: 88
Registered: ‎06-27-2011
Message 3 of 8 (281 Views)

Re: sed/awk/perl help

Yes, that would do too, if the HTML works out and points to the right links.

Thanks,
Allan.
Frequent Advisor
allanm77
Posts: 88
Registered: ‎06-27-2011
Message 4 of 8 (258 Views)

Re: sed/awk/perl help

Dennis, Perl would do too, but sed/awk is preferable.

Thanks.

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 5 of 8 (237 Views)

Re: sed/awk/perl help

I'd advise against digging into HTML with regular expressions (in whatever language). You'll end up changing them all the time as the HTML changes. Been there, done that.

There are several fine ways to "convert" HTML to something else. If you just want text, use the proper tools (e.g. lynx):

$ lynx -dump file.html
$ lynx -dump http://some.host.com/index.htm

If you want to use a scripting language, WWW::Mechanize, LWP, LWP::Simple, LWP::UserAgent, and HTML::TreeBuilder are your friends in Perl:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;

# get () returns undef when the page cannot be fetched
my $site = get ("http://h30499.www3.hp.com/t5/Languages-and-Scripting/bd-p/itrc-150");
defined $site or die "Cannot fetch page\n";
my $tree = HTML::TreeBuilder->new;
$tree->parse_content ($site) or die "Cannot parse as HTML\n";

# Print the whole page formatted
print $tree->as_HTML (undef, "  ", {});

# Print all <a> tags pointing to something with scripting in it
for ($tree->look_down (_tag => "a", href => qr{scripting}i)) {
    print "A: ", $_->as_text, "\t=> ", $_->attr ("href"), "\n";
    }

Enjoy, Have FUN! H.Merijn
Frequent Advisor
allanm77
Posts: 88
Registered: ‎06-27-2011
Message 6 of 8 (231 Views)

Re: sed/awk/perl help

Thanks, Merijn!

But the problem is that the wget is part of a script, so a Perl one-liner or a sed/awk combination is preferable.

Allan.

Frequent Advisor
allanm77
Posts: 88
Registered: ‎06-27-2011
Message 7 of 8 (217 Views)

Re: sed/awk/perl help

I used sed to fix this:

sed 's/<[^>]*>//g'
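
For use inside a script, the substitution can be wrapped in a small function. The following is a minimal sketch, assuming POSIX sed and the global (g) flag so that every tag on a line is removed rather than only the first. strip_tags is a hypothetical helper name, and the two extra expressions that trim leading and trailing blanks are an addition, not part of the command above:

```shell
# strip_tags: hypothetical helper; reads HTML on stdin, writes text on stdout.
strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//'
}

# Two rows modelled on the sample in the question (second row shortened).
out=$(printf '%s\n' \
    '    <td>INACCESSIBLE</td>' \
    '    <td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a></td>' |
    strip_tags)

printf '%s\n' "$out"
```

In a script this could sit directly after the download, for example wget -q -O - "$url" | strip_tags, where $url stands for whatever address the script already fetches.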

 

Thanks All.

Honored Contributor
H.Merijn Brand (procura
Posts: 6,185
Registered: ‎10-13-1997
Message 8 of 8 (212 Views)

Re: sed/awk/perl help

The "get ()" in the Perl example does almost exactly what wget does. My Perl scriptlet is just an example. Modify it to your heart's content.
One-liners to analyze HTML? Forget it! (It works once, but you'll end up with very, very long lines if you stick to a single line.)
Enjoy, Have FUN! H.Merijn
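
To illustrate how quickly a one-liner outgrows itself, here is a hedged sketch (assuming POSIX sed and paste) of one case where the line-based substitution stops working: a tag whose attributes wrap onto a second line. The sample markup is made up for the illustration:

```shell
# A tag whose attributes wrap onto a second line (made-up sample).
html='<td>hostname:port <a title="inspect"
href="dest/hostname:port">[I]</a></td>'

# Line by line, neither line holds a complete <a ...> tag,
# so fragments of the tag survive in the output.
per_line=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g')

# Joining the lines first lets the same expression see the whole tag,
# at the cost of turning the entire input into one long line.
joined=$(printf '%s\n' "$html" | paste -s -d ' ' - | sed 's/<[^>]*>//g')

printf 'per line: %s\n' "$per_line"
printf 'joined:   %s\n' "$joined"
```

Each such workaround adds another stage to the pipeline, which is where a parser-based tool starts to pay off.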