How to extract text based on a condition in Python

Python

慕尼黑8549860 2023-03-30 10:28:10

I have soup data like the following.

<a href="/title/tt0110912/" title="Quentin Tarantino">Pulp Fiction</a>
<a href="/title/tt0137523/" title="David Fincher">Fight Club</a>
<a href="blablabla" title="Yet to Release">Yet to Release</a>
<a href="something" title="Movies">Coming soon</a>

I need the text from these a tags, perhaps by matching href=/title/*wildcharacter*. My code looks something like this.

titles = []
for a in soup.find_all("a", href=True):
    if a.text:
        titles.append(a.text.replace('\n', " "))
print(titles)

But in this case I get the text from all a tags. I only need the ones whose href has "/title/***".
3 Answers

守著一只汪

Contributed 1872 experience points, received more than 4 likes

I guess you want something like this:

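from bs4 import BeautifulSoup

html = '''<a href="/title/tt0110912/" title="Quentin Tarantino">
Pulp Fiction
</a>
<a href="/title/tt0137523/" title="David Fincher">
Fight Club
</a>
<a href="blablabla" title="Yet to Release">
Yet to Release
</a>
<a href="something" title="Movies">
Coming soon
</a>
'''

soup = BeautifulSoup(html, 'html.parser')

titles = []
# Select only the <a> tags whose href contains "/title/"
for a in soup.select('a[href*="/title/"]'):
    if a.text:
        titles.append(a.text.replace('\n', " "))
print(titles)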

Output:

[' Pulp Fiction ', ' Fight Club ']

2023-03-30

三國紛爭

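Contributed 1804 experience points, received more than 7 likes

You can use a regular expression to search the content of an attribute (href in this case).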
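The answer above does not include code, so here is a minimal sketch of what it may have had in mind. It assumes the HTML from the question and passes a compiled pattern as the href filter to find_all; the pattern r'^/title/' and the variable names are illustrative choices, not taken from the answer.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = '''<a href="/title/tt0110912/" title="Quentin Tarantino">Pulp Fiction</a>
<a href="/title/tt0137523/" title="David Fincher">Fight Club</a>
<a href="blablabla" title="Yet to Release">Yet to Release</a>
<a href="something" title="Movies">Coming soon</a>'''

soup = BeautifulSoup(html, 'html.parser')

# A compiled regex as an attribute filter keeps only tags whose href matches it.
# Assumed pattern: href must start with "/title/".
pattern = re.compile(r'^/title/')
titles = [a.get_text(strip=True) for a in soup.find_all('a', href=pattern)]
print(titles)  # ['Pulp Fiction', 'Fight Club']
```

A regex is probably most useful when the condition is more involved than a simple prefix or substring, for example requiring an ID of a particular shape after /title/.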

反對回復 2023-03-30

慕哥9229398

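Contributed 1877 experience points, received more than 6 likes

1.) To get all <a> tags whose href starts with "/title/", you can use the CSS selector a[href^="/title/"].

2.) To strip the surrounding whitespace from the text inside the tags, you can use .get_text() with the parameter strip=True.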

from bs4 import BeautifulSoup

# html_text holds the HTML from the question
soup = BeautifulSoup(html_text, 'html.parser')
out = [a.get_text(strip=True) for a in soup.select('a[href^="/title/"]')]
print(out)

Prints:

['Pulp Fiction', 'Fight Club']
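The answers on this page use two slightly different selectors: a[href*="/title/"] (href contains the string) and a[href^="/title/"] (href starts with it). On the question's data both give the same result, but they can differ. The sketch below uses hypothetical markup, not taken from the question, where the second link only contains "/title/" in the middle of its href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second href contains "/title/" but does not start with it
html = '''<a href="/title/tt0110912/">Pulp Fiction</a>
<a href="/archive/title/old/">Archived page</a>'''

soup = BeautifulSoup(html, 'html.parser')

# *= matches when the attribute value contains the substring anywhere
contains = [a.get_text(strip=True) for a in soup.select('a[href*="/title/"]')]
# ^= matches only when the attribute value begins with the string
starts_with = [a.get_text(strip=True) for a in soup.select('a[href^="/title/"]')]

print(contains)     # ['Pulp Fiction', 'Archived page']
print(starts_with)  # ['Pulp Fiction']
```

If the requirement is strictly "href begins with /title/", the ^= form is likely the safer choice.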

2023-03-30
