
Python 3 BeautifulSoup: Get the value of a span tag with specific text that is also randomly placed in HTML5

呼如林 2024-01-22 15:50:52

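I tried searching for this here but honestly couldn't find an answer. This should be easy to do with Selenium, but since performance is an important factor, I'm considering using BeautifulSoup instead.

Scenario: I need to scrape the prices of different items, which are generated in random order based on user input. See the HTML below:

```html
<div class="sk-expander-content" style="display: block;">
  <ul>
    <li> <span>Third Party Liability</span> <span>€756.62</span> </li>
    <li> <span>Fire & Theft</span> <span>€15.59</span> </li>
  </ul>
</div>
```

If these options were static and always appeared in the same place in the HTML, scraping the prices would be easy. But since they can be placed anywhere inside `div.sk-expander-content`, I'm not sure how to find them dynamically. The best approach would be a method that takes the span text we are looking for and returns the euro value. The structure of the span tags is always the same: the first span is always the item name and the second span is always the price.

My first idea was the code below, but I'm not sure whether it is robust enough or even makes sense:

```python
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
div_i_need = soup.find_all("div", class_="sk-expander-content")[1]

def price_scraper(text_to_find):
    for el in div_i_need.find_all(['ul', 'li', 'span']):
        if el.name == 'span':
            if el[0].text == text_to_find:
                return(el[1].text)
```

Any help would be greatly appreciated.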
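A minimal sketch of the helper described above, assuming (as the question states) that each `li` holds the item name in its first `span` and the price in its second. Note that in the attempt above, `el[0]` on a BeautifulSoup `Tag` looks up an HTML attribute named `0`, not the first child, so it cannot work as intended. The `container` parameter and the sample HTML are illustrative assumptions; `find_all()`, `get_text()` and `find_next_sibling()` are standard BeautifulSoup methods.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed representative of the real page).
HTML = """
<div class="sk-expander-content" style="display: block;">
  <ul>
    <li> <span>Third Party Liability</span> <span>€756.62</span> </li>
    <li> <span>Fire &amp; Theft</span> <span>€15.59</span> </li>
  </ul>
</div>
"""

def price_scraper(container, text_to_find):
    """Return the price text next to the span labelled text_to_find, or None.

    Assumes the price is the next <span> sibling of the label span.
    """
    for label in container.find_all("span"):
        # Compare stripped text so surrounding whitespace does not matter.
        if label.get_text(strip=True) == text_to_find:
            price = label.find_next_sibling("span")
            return price.get_text(strip=True) if price is not None else None
    return None  # label not present in this container

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find("div", class_="sk-expander-content")
print(price_scraper(container, "Fire & Theft"))           # €15.59
print(price_scraper(container, "Third Party Liability"))  # €756.62
```

Because the lookup is by label text rather than by position, the order of the `li` elements does not matter.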


2 Answers

猛跑小豬

Contributed 1858 experience points, received 8+ likes

Use a regular expression.

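```python
import re
from bs4 import BeautifulSoup

html = '''<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Fire & Theft</span>
<span>€756.62</span>
</li>
<li>
<span>Third Party Liability</span>
<span>€15.59</span>
</li>
</ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Visit every container, then every span whose text looks like a euro price
for item in soup.find_all(class_="sk-expander-content"):
    for span in item.find_all('span', string=re.compile(r"€\d+\.\d+")):
        # The item name is the span immediately before the price span
        print(span.find_previous_sibling('span').text)
        print(span.text)
```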

Output:

```
Third Party Liability
€756.62
Fire & Theft
€15.59
Fire & Theft
€756.62
Third Party Liability
€15.59
```

Update: if you only want the values from the first container, use `find()` instead of `find_all()`.

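```python
import re
from bs4 import BeautifulSoup

html = '''<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Fire & Theft</span>
<span>€756.62</span>
</li>
<li>
<span>Third Party Liability</span>
<span>€15.59</span>
</li>
</ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching container (or None if there is none)
first = soup.find(class_="sk-expander-content")
for span in first.find_all('span', string=re.compile(r"€\d+\.\d+")):
    print(span.find_previous_sibling('span').text)
    print(span.text)
```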
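If you need to look prices up by name instead of printing them, the same regex match can feed a dictionary per container. This is a sketch built on the answer's technique; `prices_by_name` is a hypothetical helper name and the sample HTML reuses the two containers from the answer.

```python
import re
from bs4 import BeautifulSoup

# Two containers listing the same items in a different order (sample data).
HTML = """
<div class="sk-expander-content"><ul>
  <li><span>Third Party Liability</span><span>€756.62</span></li>
  <li><span>Fire &amp; Theft</span><span>€15.59</span></li>
</ul></div>
<div class="sk-expander-content"><ul>
  <li><span>Fire &amp; Theft</span><span>€756.62</span></li>
  <li><span>Third Party Liability</span><span>€15.59</span></li>
</ul></div>
"""

PRICE = re.compile(r"€\d+\.\d+")

def prices_by_name(container):
    """Map item name -> price text inside one container (hypothetical helper)."""
    result = {}
    for price in container.find_all("span", string=PRICE):
        label = price.find_previous_sibling("span")
        if label is not None:  # skip a price span with no name span before it
            result[label.get_text(strip=True)] = price.get_text(strip=True)
    return result

soup = BeautifulSoup(HTML, "html.parser")
tables = [prices_by_name(div) for div in soup.find_all("div", class_="sk-expander-content")]
print(tables[0]["Fire & Theft"])  # €15.59
print(tables[1]["Fire & Theft"])  # €756.62
```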

Answered 2024-01-22

慕蓋茨4494581

Contributed 1850 experience points, received 11+ likes

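```python
from bs4 import BeautifulSoup
import re

html = """
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector and returns every matching div
target = soup.select("div.sk-expander-content")

for tar in target:
    # Keep only the spans whose text contains a euro sign, i.e. the prices
    data = [item.text for item in tar.find_all("span", string=re.compile("€"))]
    print(data)
```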

Output:

```
['€756.62', '€15.59']
```

Note: I used `select`, which returns a `ResultSet`, to find all the matching `div`s.
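Since this answer already uses CSS selectors, one descendant selector can go a level deeper and pair each name with its price. A sketch, assuming each `li` contains the name span followed by the price span as described in the question; the sample HTML is taken from the question.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="sk-expander-content" style="display: block;">
  <ul>
    <li> <span>Third Party Liability</span> <span>€756.62</span> </li>
    <li> <span>Fire &amp; Theft</span> <span>€15.59</span> </li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
pairs = []
# A descendant selector reaches every li inside every matching div in one call.
for li in soup.select("div.sk-expander-content li"):
    spans = li.select("span")
    if len(spans) < 2:  # skip list items without a name/price pair
        continue
    pairs.append((spans[0].get_text(strip=True), spans[1].get_text(strip=True)))
print(pairs)  # [('Third Party Liability', '€756.62'), ('Fire & Theft', '€15.59')]
```

Wrapping the result in `dict(pairs)` then gives a lookup by item name, e.g. `dict(pairs)["Fire & Theft"]`.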

Answered 2024-01-22
