Parsing XML with Python: keeping the text inside an element while removing the tags around it

Python

瀟瀟雨雨 2022-07-05 19:45:21

Input:

<milestone n="14" unit="verse" /> The name of the third river is <placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: this is the one which flows in front of Assyria. The fourth river is the <placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>.

Expected output:

<milestone n="14" unit="verse" /> The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates.

Hi, I'm looking for a way to take the text out of a child element (placeName) and put it back into the larger body of text. I have similar problems elsewhere in the XML file, for example with people's names. I want to be able to extract the names and places without getting rid of the milestones. Thanks for your help!

Current code:

# chapter, body and file come from the surrounding code (not shown)
for p in chapter.findall('p'):
    i = 1
    for text in p.itertext():
        file.write(body.attrib["n"] + " " + chapter.attrib["n"] + ":" + str(i) + text)
        i = i + 1

1 Answer

繁華開滿天機

This can be done with BeautifulSoup and its unwrap() method:

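from bs4 import BeautifulSoup as bs

# paste the XML fragment from the question here
snippet = """your html above"""

# 'lxml' is lxml's HTML parser; it lowercases tag names, hence 'placename' below
soup = bs(snippet, 'lxml')

pl = soup.find_all('placename')

for p in pl:
    # unwrap() replaces the tag with its contents, so the text stays in place
    p.unwrap()

print(soup)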

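Output (because 'lxml' parses the input as HTML, it wraps the fragment in <html><body>, which is where the closing tags at the end come from):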

The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates.
</body></html>

2022-07-05
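Since the question's code already uses ElementTree (findall, itertext), the same unwrapping can also be sketched with the standard library alone. This is a minimal sketch, not a built-in ElementTree feature: `unwrap` and `_append_text` are hypothetical helper names, and it assumes the tags are not namespaced, as the question's findall('p') suggests. Because ElementTree stores text in each element's `text` and `tail`, removing an element means moving its text onto the previous sibling's tail (or the parent's text).

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Hypothetical helper: append `text` just before parent[index]."""
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, tags):
    """Hypothetical helper: remove every descendant of `parent` whose tag is in
    `tags`, keeping its text (and any child elements) in place."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap(child, tags)  # flatten matches nested inside this child first
        if child.tag not in tags:
            i += 1
            continue
        grandchildren = list(child)
        parent.remove(child)
        # The child's own text goes where the child used to start...
        _append_text(parent, i, child.text or "")
        # ...its child elements (if any) move up into the parent...
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(i + offset, grandchild)
        i += len(grandchildren)
        # ...and its tail follows whatever was moved up.
        _append_text(parent, i, child.tail or "")


# The fragment from the question, wrapped in a <p> as in the asker's file.
source = (
    '<p><milestone n="14" unit="verse" /> The name of the third river is '
    '<placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: '
    "this is the one which flows in front of Assyria. The fourth river is the "
    '<placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>.</p>'
)
p = ET.fromstring(source)
unwrap(p, {"placeName"})
print(ET.tostring(p, encoding="unicode"))
# prints: <p><milestone n="14" unit="verse" /> The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates.</p>
```

In the asker's loop, calling something like unwrap(p, {"placeName", "persName"}) before `for text in p.itertext()` should make the text after each milestone come out as a single string instead of being broken at every name; persName is only an assumption about the tag used for people's names in that file.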
