
Unmarshal HTML nested in XML

長風秋雁 2022-08-01 16:44:57

I received an XML file from a third party, and one of its XML tags contains HTML elements. I can't figure out how to unmarshal it to get the href URL.

Sample XML:

<SOME_HTML>
    <a href="http://www.google.com" target="_blank"> google</a>
</SOME_HTML>

This is what I have so far, but nothing gets added to the struct:

type Href struct {
	Link string `xml:"href"`
}

type Link struct {
	URL []Href `xml:"a"`
}

type XmlFile struct {
	HTMLTag []Link `xml:"SOME_HTML"`
}

myFile := []byte(`<?xml version="1.0" encoding="utf-8"?>
<SOME_HTML>
    <a href="http://www.google.com" target="_blank"> google</a>
</SOME_HTML>`)

var output XmlFile
err := xml.Unmarshal(myFile, &output)
fmt.Println(output) // {[]}


3 Answers

青春有我

Contributed 1784 experience points, received more than 8 likes

You can do it like this (https://play.golang.org/p/MJzAVLBFfms):

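package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// aElement captures the href attribute of an <a> element.
type aElement struct {
	Href string `xml:"href,attr"`
}

// content maps onto the root element (<SOME_HTML>) and picks out its <a> child.
type content struct {
	A aElement `xml:"a"`
}

func main() {
	test := `<SOME_HTML><a href="http://www.google.com" target="_blank">google</a></SOME_HTML>`

	var result content
	if err := xml.Unmarshal([]byte(test), &result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}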

2022-08-01

瀟湘沐

Contributed 1816 experience points, received more than 6 likes

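This parses everything in the XML, assuming the HTML may contain multiple a tags, or other tags such as div.

If you don't need that, just replace XmlFile.Links with XmlFile.Link of type Link (not []Link).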

package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

func main() {
	// Link maps an <a> element: its attributes and its text content.
	type Link struct {
		XMLName xml.Name `xml:"a"`
		URL     string   `xml:"href,attr"`
		Target  string   `xml:"target,attr"`
		Content string   `xml:",chardata"`
	}

	// Div maps a <div> element.
	type Div struct {
		XMLName xml.Name `xml:"div"`
		Classes string   `xml:"class,attr"`
		Content string   `xml:",chardata"`
	}

	// XmlFile maps the root element and collects its <a> and <div> children.
	type XmlFile struct {
		XMLName xml.Name `xml:"SOME_HTML"`
		Links   []Link   `xml:"a"`
		Divs    []Div    `xml:"div"`
	}

	myFile := []byte(`<?xml version="1.0" encoding="utf-8"?>
<SOME_HTML>
    <a href="http://www.google.com" target="_blank">google</a>
    <a href="http://www.facebook.com" target="_blank">facebook</a>
</SOME_HTML>`)

	var output XmlFile
	err := xml.Unmarshal(myFile, &output)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(output)
}


Edit: added more tags to the XML to show how to parse different tag types.

2022-08-01

蕭十郎

Contributed 1815 experience points, received more than 13 likes

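You can parse the sample you posted with a regular XML parser, but there are many deviations from XML syntax that are commonly accepted as valid HTML.

The simplest example I can think of: every HTML interpreter I know of treats <br> (an unclosed tag) the same as <br /> (a self-closing tag).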
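To make the <br> point concrete, here is a sketch using only encoding/xml. The input brSample and the helpers strictHref and lenientHref are hypothetical. The strict default rejects the unclosed <br>, while a Decoder with Strict set to false and AutoClose set to xml.HTMLAutoClose gets past it. That mode only covers the tags listed in xml.HTMLAutoClose, so treat it as a partial workaround and not as a full HTML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// brSample is a hypothetical input: HTML-style markup with unclosed <br> tags.
const brSample = `<p> line1 <br> line2 <a href="http://www.google.com">Some <br> text</a></p>`

// para maps the root <p> element and picks out the href of its <a> child.
type para struct {
	A struct {
		Href string `xml:"href,attr"`
	} `xml:"a"`
}

// strictHref uses xml.Unmarshal, which requires well-formed XML.
func strictHref(data string) (string, error) {
	var p para
	err := xml.Unmarshal([]byte(data), &p)
	return p.A.Href, err
}

// lenientHref uses a non-strict Decoder that auto-closes the tags
// listed in xml.HTMLAutoClose (which includes "br").
func lenientHref(data string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(data))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	var p para
	err := d.Decode(&p)
	return p.A.Href, err
}

func main() {
	_, err := strictHref(brSample)
	fmt.Println("strict error:", err)

	href, err := lenientHref(brSample)
	fmt.Println("lenient href:", href, "err =", err)
}
```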

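If you don't know how the HTML on the other side of the service is generated, you are better off using an HTML parser.

For example, there is the golang.org/x/net/html package, which provides several functions for parsing HTML: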

https://play.golang.org/p/3hUogiwdRPO

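package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

// findFirstHref walks the node tree depth-first and returns the href
// of the first <a> element it finds, or "" if there is none.
func findFirstHref(n *html.Node, indent string) string {
	if n.Type == html.ElementNode {
		fmt.Println(" * scanning:" + indent + n.Data)
	}
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				return a.Val
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		href := findFirstHref(c, indent+" ")
		if href != "" {
			return href
		}
	}
	return ""
}

func main() {
	doc1, err := html.Parse(strings.NewReader(sample1))
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("href in sample1:", findFirstHref(doc1, ""))
	}

	doc2, err := html.Parse(strings.NewReader(sample2))
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("href in sample2:", findFirstHref(doc2, ""))
	}
}

const (
	// sample1 is the document from the question.
	sample1 = `<?xml version="1.0" encoding="utf-8"?>
<SOME_HTML>
<a href="http://www.google.com" target="_blank">google</a>
</SOME_HTML>`

	// sample2 is an invalid XML document (it has unclosed "<br>" tags).
	// NOTE: the opening <a ...> tag was lost from the source text; the one
	// below reuses the question's link and is an assumption.
	sample2 = `
<p> line1 <br> line2
<a href="http://www.google.com" target="_blank">Some <br> text
</a>
</p>`
)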

2022-08-01
