
Extracting the elements under a specific tag with beautiful-soup


Html5

呼喚遠方 2023-10-04 16:02:01

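I want to extract the elements that belong to specific tags. For example, a page has four h2 tags, and each h2 has sibling tags such as p, h3, h4 and ul. I want to view the elements under h2[1] and the elements under h2[2] separately. This is what I have so far. I know the for loop doesn't make sense. I also tried appending the text, without success. Then I tried searching for a specific string, but that returned only the one tag containing that string, not all the other elements.

    import requests
    from bs4 import BeautifulSoup

    page = "https://www.us-cert.gov/ics/advisories/icsma-20-079-01"
    resp = requests.get(page)
    soup = BeautifulSoup(resp.content, "html5lib")
    content_div = soup.find('div', {"class": "content"})
    all_p = content_div.find_all('p')
    all_h2 = content_div.find_all('h2')
    i = 0
    for h2 in all_h2:
        print(all_h2[i], '\n\n')
        print(all_p[i], '\n')
        i = i + 1

I also tried using append:

    tags = soup.find_all('div', {"class": "content"})
    container = []
    for tag in tags:
        try:
            container.append(tag.text)
            print(tag.text)
        except:
            print(tag)

I'm new to programming, so please excuse my poor coding. I just want to see everything under "Mitigations", so that if I store it in a database, everything related to the mitigations is parsed into a single column.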


1 Answer

ITMISS


You can use findNext with a static list of tags, ["p","ul","h2","div"], and recursive=False to stay at the top level:

    import json

    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://www.us-cert.gov/ics/advisories/icsma-20-079-01")
    soup = BeautifulSoup(resp.content, "html.parser")
    content_div = soup.find('div', {"class": "content"})
    h2_list = content_div.find_all("h2")
    result = []
    search_tags = ["p", "ul", "h2", "div"]

    def getChildren(tag):
        # Collect the text of each matching tag that follows this heading,
        # stopping at the next h2 or div.
        text = []
        while tag:
            # find_next is the current name of the older findNext alias
            tag = tag.find_next(search_tags, recursive=False)
            if tag is None:
                break
            elif tag.name == "div" or tag.name == "h2":
                break
            else:
                text.append(tag.text.strip())
        return "".join(text)

    # Build one entry per h2 heading: its title plus the text below it
    for i in h2_list:
        result.append({
            "name": i.text.strip(),
            "children": getChildren(i)
        })

    print(json.dumps(result, indent=4, sort_keys=True))
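Since the question asks only for what sits under "Mitigations", the same idea can be sketched with find_next_siblings, which visits only tags at the same nesting level as the heading. This is a minimal sketch, not the answer's own method: SAMPLE_HTML is made-up markup standing in for the advisory page, and sections_by_heading and its container_class parameter are hypothetical names chosen for illustration. The real page may nest its headings differently, so the result should be checked against the live markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the advisory page (assumption).
SAMPLE_HTML = """
<div class="content">
  <h2>Overview</h2>
  <p>Summary text.</p>
  <h2>Mitigations</h2>
  <p>Apply the vendor patch.</p>
  <ul><li>Restrict network access.</li><li>Monitor logs.</li></ul>
  <h2>Contact</h2>
  <p>Report incidents.</p>
</div>
"""


def sections_by_heading(html, container_class="content"):
    """Map each h2 heading to the text of the sibling tags that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=container_class)
    sections = {}
    if container is None:
        return sections
    for h2 in container.find_all("h2"):
        parts = []
        # Passing True matches every tag; only later siblings are visited.
        for sibling in h2.find_next_siblings(True):
            if sibling.name == "h2":
                break  # the next section starts here
            parts.append(sibling.get_text(" ", strip=True))
        sections[h2.get_text(strip=True)] = "\n".join(parts)
    return sections


sections = sections_by_heading(SAMPLE_HTML)
print(sections["Mitigations"])
```

Each value in the dictionary is a single string, so sections["Mitigations"] could go into one database column as the question describes. Joining with a newline keeps the paragraph and list text apart instead of running them together.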

Answered 2023-10-04
