Extract text content from HTML in Golang

user3173591

What's the best way to extract inner substrings from strings in Golang?

input:

"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

output (one paragraph per line):

this is paragraph
this is paragraph 2

Is there any string package/library for Go that already does something like this?

package main

import "fmt"

func main() {
    longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

    newString := getInnerStrings("<p>", "</p>", longString)

    fmt.Println(newString)
    // desired output:
    // this is paragraph
    // this is paragraph 2
}

// getInnerStrings should return the text between each start/end pair,
// one match per line.
func getInnerStrings(start, end, str string) string {
    // Brain freeze: a regex? A loop over the bytes?
    return "" // placeholder so the program compiles
}

thanks
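For input as simple and regular as the example, the stub can be filled in with the standard strings package alone. This is a minimal sketch, assuming the delimiters never nest, are written exactly as given (no attributes such as <p class="x">), and normally come in pairs. It is exactly the kind of naive text matching that breaks on real-world HTML.

```go
package main

import (
	"fmt"
	"strings"
)

// getInnerStrings returns each substring of str that lies between a start
// and an end delimiter, with surrounding whitespace trimmed, joined by "\n".
// Plain string matching only: it does not understand HTML, so nested tags,
// attributes, comments, or entities such as &amp; are not handled.
func getInnerStrings(start, end, str string) string {
	var parts []string
	for {
		i := strings.Index(str, start)
		if i < 0 {
			break // no more opening delimiters
		}
		str = str[i+len(start):]
		j := strings.Index(str, end)
		if j < 0 {
			break // opening delimiter without a matching close: stop
		}
		parts = append(parts, strings.TrimSpace(str[:j]))
		str = str[j+len(end):]
	}
	return strings.Join(parts, "\n")
}

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	fmt.Println(getInnerStrings("<p>", "</p>", longString))
}
```

Each strings.Index call only scans the remaining tail of the input, so the loop is linear in the length of the string.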

thwd

Don't use regular expressions to try and interpret HTML. Use a fully capable HTML tokenizer and parser.

I recommend you read this article on CodingHorror.


Related

Extract text from html tag
Extract HTML content from string in JQuery
Extract text from html page in Go
Extract text from html tags in an rss feed
PHP Extract all text from html page
Parse and extract data from a HTML Text
Extract text from href link in html in xpath
How to extract text from html page?
Cheerio: Extract Text from HTML with separators
How to extract text from html using XPATH
Extract Text from HTML String Java
Extract main text from HTML using Cheerio
extract tag info from html text
how to extract text from html using beautifulsoup?
Remove html entities and extract text content using regex
How to extract the img src content from a text in ruby
Extract text content from cell (With bold, italic, etc)
Extract text content from Tika without specifying the file header
How to extract all td's text content from an element in the array?
Extract News article content from stored .html pages
Extract html tag from content:encoded in Yahoo Pipes
How to extract content of html tags from a string using javascript or angularjs?
JQuery: extract content from ajax-read html page
Extract a certain content from html using python BeautifulSoup
Extract the content from a file between two match patterns (Extract only HTML from a file)
