Input:
<p><milestone n="14" unit="verse" /> The name of the third river is <placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: this is the one which flows in front of Assyria. The fourth river is the <placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>. </p>
Expected output:
<p><milestone n="14" unit="verse" /> The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates. </p>
Hello, I am looking for a way to extract the text from a child element (placeName) and put it back into the larger body of text. I have similar problems elsewhere in the XML file, for example with names of people. I would like to be able to extract names and places without getting rid of the milestones. Thanks for your help!
Current code:
for p in chapter.findall('p'):
    i = 1
    for text in p.itertext():
        file.write(body.attrib["n"] + " " + chapter.attrib["n"] + ":" + str(i) + text)
        i = i + 1
1 Answer

繁華開滿天機
You can do this with BeautifulSoup and its unwrap() method, which removes a tag but leaves its contents in place:
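from bs4 import BeautifulSoup as bs

snippet = """your html above"""
# The 'lxml' HTML parser lowercases tag names, so <placeName> is found as 'placename'
soup = bs(snippet, 'lxml')
pl = soup.find_all('placename')
for p in pl:
    # unwrap() drops the tag itself and keeps its text inside the parent
    p.unwrap()
print(soup)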
Output:
<html><body><p>
<milestone n="14" unit="verse"></milestone>
The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates.
</p>
</body></html>
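Because the question's own code uses ElementTree-style calls (findall, itertext), the same unwrapping can probably be done with the standard library alone, which avoids the added html/body wrapper and the lowercased tag names seen above. The following is only a minimal sketch: the helper name unwrap_tags is made up for illustration, persName is a guessed tag name for people (the question does not say which tag is used), and any markup nested inside an unwrapped element is flattened to plain text.

```python
import xml.etree.ElementTree as ET

def unwrap_tags(root, tag_names):
    """Remove elements whose tag is in tag_names, keeping their text in place.

    Hypothetical helper, not part of ElementTree. Markup nested inside an
    unwrapped element is flattened to its text content.
    """
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag in tag_names:
                # Text inside the element plus the text that follows it.
                merged = "".join(child.itertext()) + (child.tail or "")
                if previous is None:
                    # No kept sibling before it: text belongs to the parent.
                    parent.text = (parent.text or "") + merged
                else:
                    # Otherwise it continues the tail of the previous sibling.
                    previous.tail = (previous.tail or "") + merged
                parent.remove(child)
            else:
                previous = child

snippet = (
    '<p><milestone n="14" unit="verse" /> The name of the third river is '
    '<placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: '
    'this is the one which flows in front of Assyria. The fourth river is the '
    '<placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>. </p>'
)

root = ET.fromstring(snippet)
# "persName" is an assumed tag name for people; adjust to the real file.
unwrap_tags(root, {"placeName", "persName"})
result = ET.tostring(root, encoding="unicode")
print(result)
```

The milestone element is left untouched, so verse numbers can still be read from it afterwards; only the listed wrapper tags are removed.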