Extract text from HTML with BeautifulSoup, excluding the contents of script tags

Python

呼啦一陣風 2021-09-11 20:23:40

I have HTML like this:

<span class="age"> Ages 15 <span class="loc" id="loc_loads1"> </span> <script> getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1); </script></span>

I am trying to extract Ages 15 using BeautifulSoup, so I wrote the following Python code:

from bs4 import BeautifulSoup as bs
import urllib3

URL = 'html file'
http = urllib3.PoolManager()
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')
age = soup.find("span", {"class": "age"})
print(age.text)

Output:

Ages 15 getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);

I only want the Ages 15 text, not the function call inside the script tag. Is there a way to get only the text Ages 15, or some way to exclude the contents of script tags?

PS: There are many script tags and the URLs differ, so I would rather not strip the unwanted text out of the output with a replace.

2 Answers

幕布斯7119047

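Contributed 1794 experience points, received more than 8 likes

Use .find(text=True), which returns the first text node inside the matched tag.

Example: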

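from bs4 import BeautifulSoup

# Same markup as in the question
html = """<span class="age">
    Ages 15
    <span class="loc" id="loc_loads1">
    </span>
    <script>
        getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
    </script>
</span>"""

soup = BeautifulSoup(html, "html.parser")

# find(text=True) returns only the first text node under the span
print(soup.find("span", {"class": "age"}).find(text=True).strip())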

Output:

Ages 15
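
Note that .find(text=True) returns only the first text node, so any text that follows the nested tags would be missed. As a minimal sketch, assuming the wanted text always sits directly inside the span, the direct text children can be collected instead. The helper name direct_text is hypothetical (not part of BeautifulSoup), the trailing word "years" was added to the question's markup purely for illustration, and string=True is the newer spelling of text=True.

```python
from bs4 import BeautifulSoup

# The question's markup plus a trailing "years" text node (added for illustration)
html = """<span class="age">
    Ages 15
    <span class="loc" id="loc_loads1"></span>
    <script>getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);</script>
    years
</span>"""


def direct_text(tag):
    """Join the text nodes that are direct children of `tag` (hypothetical helper)."""
    # recursive=False limits the search to direct children, so text nested
    # inside <script> or the inner <span> is never visited.
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in pieces if p.strip())


soup = BeautifulSoup(html, "html.parser")
age = soup.find("span", {"class": "age"})
print(direct_text(age))
```

This leaves the parsed tree untouched, but it also ignores text inside any nested non-script tags, which may or may not be what you want.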

Answered 2021-09-11

臨摹微笑

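Contributed 1982 experience points, received more than 2 likes

Late answer, but for future reference, you can also use decompose() to remove all script elements from the HTML, i.e.: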

from bs4 import BeautifulSoup

# html: the markup from the question
soup = BeautifulSoup(html, "html.parser")

# remove script and style elements from the parsed tree
for script in soup(["script", "style"]):
    script.decompose()

print(soup.find("span", {"class": "age"}).text.strip())
# Ages 15
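
Keep in mind that decompose() modifies the parsed tree in place, so the script elements are gone for any later lookups. If the tree needs to stay intact, one possible alternative is to walk the text nodes and skip those whose parent is a script or style tag. This is only a sketch: the helper name text_without and its skip parameter are hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Same markup as in the question
html = """<span class="age"> Ages 15 <span class="loc" id="loc_loads1"> </span>
<script> getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1); </script></span>"""


def text_without(tag, skip=("script", "style")):
    """Collect text under `tag`, ignoring text whose parent tag is in `skip`
    (hypothetical helper)."""
    parts = []
    for node in tag.find_all(string=True):
        # Skip text that lives directly inside an excluded tag
        if node.parent.name in skip:
            continue
        stripped = node.strip()
        if stripped:
            parts.append(stripped)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
age = soup.find("span", {"class": "age"})
print(text_without(age))
```

Unlike the decompose() approach, the script element is still present in soup afterwards.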

Answered 2021-09-11
