How to retrieve website links in Python using BeautifulSoup

Python

慕的地6264312 2022-06-22 18:00:09

I want to collect links from a site (https://www.vanglaini.org/), for example /hmarchhak/102217, and print them as full URLs such as https://www.vanglaini.org/hmarchhak/102217. Please help. This is my code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('article'):
    headline = article.a.text
    summary = article.p.text
    link = article.a.href
    print(headline)
    print(summary)
    print(link)
print()

1 Answer

慕無忌1623718

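Unless I'm missing something, the headline and the summary appear to be the same text. With bs4 4.7.1+ you can use :has to make sure each article contains an element with an href attribute. This also seems to drop the article elements that are not part of the main body, which I suspect is what you are actually after.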

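from bs4 import BeautifulSoup as bs
import re
import requests

base = 'https://www.vanglaini.org'
r = requests.get(base)
soup = bs(r.content, 'lxml')

# Keep only <article> elements that contain an element with an href attribute
for article in soup.select('article:has([href])'):
    headline = article.h5.text.strip()
    # Replace runs of newlines or carriage returns in the summary with a space
    summary = re.sub(r'\n+|\r+', ' ', article.p.text.strip())
    # Prefix the relative href with the site root
    link = f"{base}{article.a['href']}"
    print(headline)
    print(summary)
    print(link)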
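As a side note, the `link = article.a.href` line in the question prints None because BeautifulSoup treats dotted access as a search for a child tag named href. Attributes are read with subscript syntax instead. Below is a minimal offline sketch of that point, assuming a made-up HTML snippet (`SAMPLE_HTML`) and a hypothetical helper name (`extract_links`); the real page's markup may differ. It swaps the string concatenation for `urljoin` from the standard library, which is one common way to build the absolute URL, and uses the built-in `html.parser` so lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "https://www.vanglaini.org"

# Hypothetical markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<article>
  <h5><a href="/hmarchhak/102217">Sample headline</a></h5>
  <p>Sample summary.</p>
</article>
<article>
  <p>No link in this one.</p>
</article>
"""


def extract_links(html, base=BASE):
    """Return an absolute URL for the first <a> of each article that has a link."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # :has([href]) skips articles that contain no element with an href attribute.
    for article in soup.select("article:has([href])"):
        # article.a.href would search for a child <href> tag and yield None;
        # the attribute itself is read with article.a["href"].
        links.append(urljoin(base, article.a["href"]))
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

Note that `:has([href])` matches any element carrying an href, so on a page where an article holds an href on something other than an `<a>` tag, `article.a` could still be None and would need a check.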

Answered 2022-06-22
