I want to refine some HTML code using sed, as an extra step after cleaning it up with HTML Tidy, since HTML Tidy doesn't seem flexible enough for some requirements.
I used these commands to add tabs and/or line breaks to some tags and remove them from others:
s/<li>/\t&/g
s/\n<\/li>/<\/li>/g
However, li may have an attribute, so how can I target an opening tag regardless of whether it has an attribute or not? I also want </li> to end up at the end of the previous line.

Consider this sample file:
$ cat sample.html
<li a=x>Point One
</li>
<li>Point Two
</li>
I believe that this sed command does what you ask (this may require GNU sed):
$ sed -Ez 's|<li\b|\t<li|g; s|\n</li\b|</li|g' sample.html
	<li a=x>Point One</li>
	<li>Point Two</li>
-E
Use extended regex.
-z
Read NUL-delimited data. Since a proper HTML file contains no NUL characters, this has the effect of reading in the whole file at once, which lets a pattern match across line breaks.
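To see why -z matters here, the following sketch runs the second substitution with and without it. It assumes GNU sed; the variable names sample, without_z and with_z are only for illustration.

```shell
# A two-line sample, built inline so the example is self-contained.
sample='<li>Point Two
</li>
'

# Without -z, sed handles one line at a time, so the pattern space
# never contains a newline and \n</li cannot match.
without_z=$(printf '%s' "$sample" | sed -E 's|\n</li\b|</li|g')

# With -z (GNU sed), the whole input is a single record, so the
# newline before </li is visible to the regex and gets removed.
with_z=$(printf '%s' "$sample" | sed -Ez 's|\n</li\b|</li|g')

printf 'without -z:\n%s\n' "$without_z"
printf 'with -z:\n%s\n' "$with_z"
```

Without -z the text should come back unchanged, while with -z the closing tag should join the previous line.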
s|<li\b|\t<li|g
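This puts a tab in front of every occurrence of <li followed by a word boundary. The word boundary means <li> and <li a=x> both match, while a longer tag name such as <link does not.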
s|\n</li\b|</li|g
This replaces every occurrence of a newline followed by </li followed by a word boundary with </li.
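As a rough check of the word-boundary behavior, this sketch wraps each match of <li\b in square brackets instead of inserting a tab, so the matches are easy to see. It assumes GNU sed for \b; the <link rel=x> line and the variable names are made up for the demonstration.

```shell
# Sample with a plain <li>, an <li> with an attribute, and a <link> tag.
input='<li a=x>One</li>
<li>Two</li>
<link rel=x>'

# Wrap every match of "<li" followed by a word boundary in brackets.
# & in the replacement stands for the matched text.
marked=$(printf '%s\n' "$input" | sed -E 's|<li\b|[&]|g')

printf '%s\n' "$marked"
```

Both opening li tags should be bracketed, with or without an attribute, while <link and the closing </li> tags are left alone.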
To put <li> on its own line:
$ sed -Ez 's|<li[^>]*>|&\n|g; s|\n</li\b|</li|g' sample.html
<li a=x>
Point One</li>
<li>
Point Two</li>
HTML can be complex, and these sed commands are only intended to work on simple cases.