I have parsed all rows containing URLs from a text file and appended line breaks, and I want to make the links clickable in a new file.
How do I wrap <a href> tags around only the URLs, using standard Linux tools, preferably awk? It needs to be automatable in cron.
For example,
source file chaturls.txt:
12:30 <user> check this: https://link.to/stuff.jpg</br>
13:47 <user4> https://another.link.lol eyyyy</br>
desired output in new file, chatlinkified.html:
12:30 <user> check this: <a href='https://link.to/stuff.jpg'>https://link.to/stuff.jpg</a></br>
13:47 <user4> <a href='https://another.link.lol'>https://another.link.lol</a> eyyyy</br>
I tried
awk '{printf "<a href=\"%s\">%s</a><br>", $0,$0}' chaturls.txt > chatlinkified.html
but this makes the whole line an (invalid) clickable link.
sed -E 's@(https?://[^[:space:]/$.?#].[^[:space:]<]*)@<a href="\1">\1</a>@g' chaturls.txt > chatlinkified.html
You can use sed and refer back to the matched group with \1. Note that I use @ as the delimiter instead of / (as in s/../../g); you are free to use any character, and this choice avoids having to escape the slashes in the URL pattern.
The regex for finding the URL does a small validation check on the first character after https?:// (it must not be whitespace or one of / $ . ? #) and then continues the match until whitespace or the opening bracket of another tag.
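To illustrate what that validation does, here is a small sketch that feeds three made-up lines (not from the question) through the same sed command. Only the first line contains a URL that passes the first-character check, so only it should be rewritten.

```shell
# Hypothetical sample lines: one valid URL and two malformed ones.
out=$(printf '%s\n' \
    'ok https://link.to/x.jpg</br>' \
    'bad https:///nohost' \
    'bad http://#frag' |
  sed -E 's@(https?://[^[:space:]/$.?#].[^[:space:]<]*)@<a href="\1">\1</a>@g')

# Print the result: the first line is linkified, the other two pass through unchanged.
printf '%s\n' "$out"
```

The second and third lines are left alone because the character right after :// is / or #, which the first bracket expression excludes.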
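If you prefer, you can use a simpler regex for the URL, such as the one given in one of the comments, (https?://[^ ]*), which doesn't include this small validation. Be aware that it matches everything up to the next space, so a tag that directly follows the URL (such as </br>) becomes part of the link.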
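Since the question prefers awk, the same idea can also be sketched with awk's gsub, where & in the replacement stands for the matched text. This is a minimal sketch, not a tested production script: the temporary directory is an assumption made so the example is self-contained, and the pattern is a simplified one that stops at a space, a tab, or < and does no validation.

```shell
# Hypothetical sample input, mirroring the question's chaturls.txt.
workdir=$(mktemp -d)
cat > "$workdir/chaturls.txt" <<'EOF'
12:30 <user> check this: https://link.to/stuff.jpg</br>
13:47 <user4> https://another.link.lol eyyyy</br>
EOF

# Wrap every URL on each line in an anchor tag.
# "&" in the gsub replacement is the matched URL; the match ends
# at a space, a tab, or the "<" that opens the next tag.
awk '{ gsub(/https?:\/\/[^ \t<]+/, "<a href=\"&\">&</a>"); print }' \
    "$workdir/chaturls.txt" > "$workdir/chatlinkified.html"

cat "$workdir/chatlinkified.html"
```

Because this is a single non-interactive command, it should be usable from a script run by cron, with the file paths adjusted to the real locations.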
You can find more thoroughly validated URL regexes here: https://mathiasbynens.be/demo/url-regex (but you have to convert them from PHP regex syntax to sed extended regex syntax).