Adding/removing some tabs and line breaks in HTML code using sed

debugcn Published at Dev

Anas R.

I want to clean up some HTML code with sed, as an extra pass after running it through HTML Tidy, because HTML Tidy does not seem flexible enough for some of my requirements.

I used these commands to add tabs and/or line breaks around some tags and remove them around others:

s/<li>/\t&/g
s/\n<\/li>/<\/li>/g

The first command works fine unless the li tag has an attribute. How can I target an opening tag regardless of whether it has attributes?
The second command doesn't work at all. Here I want to move the closing tag </li> to the end of the previous line.

John1024

Consider this sample file:

$ cat sample.html 
<li a=x>Point One
</li>
<li>Point Two
</li>

I believe that this sed command does what you ask (this may require GNU sed):

$ sed -Ez 's|<li\b|\t<li|g; s|\n</li\b|</li|g' sample.html
        <li a=x>Point One</li>
        <li>Point Two</li>

How it works

-E

Use extended regular expressions.

-z

Read NUL-delimited data. Since a proper HTML file contains no NUL characters, this has the effect of reading in the whole file at once.

s|<li\b|\t<li|g

This puts a tab in front of every occurrence of <li followed by a word boundary. Because the match stops before any attributes, it works whether or not the tag has them.

s|\n</li\b|</li|g

This replaces every occurrence of a newline followed by </li and a word boundary with </li, which moves the closing tag to the end of the previous line.
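
As a rough illustration of why -z matters here, the sketch below runs the second command from the question with and without -z. The two-line input is a made-up sample modeled on sample.html, and the variable names are only for illustration. Without -z, sed processes one line at a time, so the pattern space never contains a newline and \n has nothing to match. This assumes GNU sed.

```shell
# Hypothetical two-line sample, modeled on sample.html above.
sample='<li>Point Two
</li>'

# Default mode: sed handles one line at a time, so the pattern space
# never contains "\n" and the substitution never matches.
line_mode=$(printf '%s\n' "$sample" | sed 's/\n<\/li>/<\/li>/g')

# With -z (GNU sed) the whole input is one record, so "\n" can match.
whole_file=$(printf '%s\n' "$sample" | sed -z 's/\n<\/li>/<\/li>/g')

printf 'line mode:\n%s\n\n-z mode:\n%s\n' "$line_mode" "$whole_file"
```

In line mode the text comes back unchanged; with -z the closing tag ends up on the same line as "Point Two".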

A variation: putting `<li>` on its own line

$ sed -Ez 's|<li[^>]*>|&\n|g; s|\n</li\b|</li|g' sample.html
<li a=x>
Point One</li>
<li>
Point Two</li>
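
One thing to watch in this variation: <li[^>]*> has no word boundary after li, so it would also match a tag such as <link ...>. A possible adjustment, sketched below under the assumption of GNU sed, is to keep the \b from the first command. The input containing a <link> tag is a made-up example, and the variable names are only for illustration.

```shell
# Hypothetical input: a <link> tag next to a list item.
input='<link rel=x>
<li a=x>Point One
</li>'

# Pattern from the variation: "<li[^>]*>" also matches "<link rel=x>",
# so an extra newline is inserted after the <link> tag.
loose=$(printf '%s\n' "$input" | sed -Ez 's|<li[^>]*>|&\n|g; s|\n</li\b|</li|g')

# With \b after "li" (GNU sed), "<link" is no longer matched.
strict=$(printf '%s\n' "$input" | sed -Ez 's|<li\b[^>]*>|&\n|g; s|\n</li\b|</li|g')

printf 'without \\b:\n%s\n\nwith \\b:\n%s\n' "$loose" "$strict"
```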

Obligatory warning

HTML can be complex, and these sed commands are intended to work only on simple cases.
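
Given that caveat, one possible sanity check (a sketch, not part of the original answer) is to confirm that the sed pass changed nothing but whitespace. The helper name strip_ws and the temporary file names are hypothetical; the check compares input and output after deleting every space, tab, newline, and carriage return.

```shell
# Hypothetical helper: print a file's content with all whitespace removed.
strip_ws() { tr -d ' \t\n\r' < "$1"; }

# Recreate the sample file in a temporary directory and run the sed pass.
tmpdir=$(mktemp -d)
printf '<li a=x>Point One\n</li>\n<li>Point Two\n</li>\n' > "$tmpdir/sample.html"
sed -Ez 's|<li\b|\t<li|g; s|\n</li\b|</li|g' "$tmpdir/sample.html" > "$tmpdir/out.html"

# If the non-whitespace content differs, sed touched more than formatting.
if [ "$(strip_ws "$tmpdir/sample.html")" = "$(strip_ws "$tmpdir/out.html")" ]; then
    check=same
else
    check=different
fi
echo "non-whitespace content: $check"
rm -r "$tmpdir"
```

This only shows that no non-whitespace characters were added or lost. It does not show that the result still renders the same way, since whitespace can be significant inside elements such as pre.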

edited at 2021-07-5
