
What is the best way to match HTML text outside of HTML tags in Go?

小唯快跑啊 2022-04-26 10:48:37

I have a bunch of HTML that I'm parsing, and I need to remove <a> tags if they contain certain text. Normally I would use Goquery, but the text I'm searching for is often not within the bounds of the HTML tag itself. For example, this HTML:

<html><body>This is the start. <a href="http://example.com/path">We don't want to match this text.</a><a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"></a> match this text.<a href="blah">We also don't want to match this text</a></body></html>

I'm using this regular expression, but it fails and matches text I don't want to match:

(?is)<a[^>]+href=["'](?P<link>.*?)["']*.?> match this text\.

https://regex101.com/r/iEXpqc/1


1 Answer

回首憶惘然


Like this, using XPath (this is not Go, but the logic can be reimplemented):

xmlstarlet ed -d '//a[contains(text(), "want to match")]' file.html

Output:

<?xml version="1.0"?>

<html>

<body>

This is the start.

<a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"/> match this text.

</body>

</html>

Note

Add the -L switch if you want to edit the file in place.
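The same logic can be sketched in Go using only the standard library. This is a minimal sketch, not the answerer's code. It assumes the input is well-formed XML (as xmlstarlet does), and the function name removeAnchors and the shortened sample document are made up for illustration. Unlike the XPath expression, which looks at the first text node of each <a>, it checks all text inside the element. It cuts the matching elements out of the original string by byte offset, so everything else is left exactly as it was.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// removeAnchors (hypothetical helper) returns doc with every <a> element
// whose text contains needle cut out. Assumption: doc is well-formed XML.
func removeAnchors(doc, needle string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out, text strings.Builder
	depth := 0            // nesting depth inside the current <a>; 0 when outside
	var start, kept int64 // start of the current <a>; end of the last removed span
	for {
		before := dec.InputOffset() // byte offset where the next token begins
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 {
				depth++ // element nested inside an <a>
			} else if t.Name.Local == "a" {
				depth, start = 1, before
				text.Reset()
			}
		case xml.CharData:
			if depth > 0 {
				text.Write(t) // collect the text inside the <a>
			}
		case xml.EndElement:
			if depth == 0 {
				continue
			}
			depth--
			if depth == 0 && strings.Contains(text.String(), needle) {
				out.WriteString(doc[kept:start]) // keep what precedes the <a>
				kept = dec.InputOffset()         // skip past its closing tag
			}
		}
	}
	out.WriteString(doc[kept:])
	return out.String(), nil
}

func main() {
	// Shortened version of the document from the question.
	doc := `<html><body>This is the start. <a href="http://example.com/path">We don't want to match this text.</a><a href="blah"></a> match this text.</body></html>`
	cleaned, err := removeAnchors(doc, "want to match")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cleaned)
}
```

Real-world HTML is often not well-formed XML, in which case the decoder returns an error. An HTML parser such as golang.org/x/net/html would be the more robust choice there.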

Answered 2022-04-26
