Thursday 15 December 2011

Easy way to extract HTML text from a .htm file


Suppose I have a .htm file that contains a lot of complex HTML. I want to run a command from the shell that says: "For the file index.htm, extract every anchor tag, from the opening <a href... through the closing </a>, including everything inside it, and print each one to standard output on its own line." What utility should I use for this? Should I use sed? awk? vi?
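One possible approach, offered as a minimal sketch rather than a definitive answer: sed and awk read their input one line at a time by default, so anchors that span several lines are awkward for them, and a plain regular expression will also match <a ...> text sitting inside comments or scripts. The script below swaps in Python's standard-library html.parser module instead. It records where each <a> element starts and ends in the original file and prints that exact source text. The script name extract_anchors.py, the UTF-8 encoding, and the assumption that anchors are not nested (valid HTML forbids nesting them anyway) are all assumptions of this sketch.

```python
import sys
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects the raw source text of every <a ...>...</a> element."""

    def __init__(self, source):
        super().__init__(convert_charrefs=True)
        self._source = source
        # Absolute index at which each line begins, so that getpos()
        # (1-based line number, 0-based column) can be turned into an index.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self._open_at = None  # index of the currently open <a>, if any
        self.anchors = []

    def _index(self):
        line, col = self.getpos()
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <A HREF=...> matches too.
        # Anchors may not nest, so only the first open <a> is tracked.
        if tag == "a" and self._open_at is None:
            self._open_at = self._index()

    def handle_endtag(self, tag):
        if tag == "a" and self._open_at is not None:
            # getpos() points at the "<" of "</a>"; include through its ">".
            close = self._source.find(">", self._index())
            end = len(self._source) if close == -1 else close + 1
            self.anchors.append(self._source[self._open_at:end])
            self._open_at = None


def extract_anchors(html):
    """Return the source text of each anchor element, in document order."""
    parser = AnchorExtractor(html)
    parser.feed(html)
    parser.close()
    return parser.anchors


if __name__ == "__main__":
    # Usage (script name is hypothetical): python3 extract_anchors.py index.htm
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for anchor in extract_anchors(f.read()):
                print(anchor)
```

Run it as python3 extract_anchors.py index.htm. Each anchor is printed with its original line breaks, so an anchor that spans several lines in the file will also span several lines in the output. An <a> that is never closed before the end of the file is silently dropped.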
