
Python: getting inner XML without whitespace

Python

九州編程 2023-06-06 15:59:15

I have an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <head>
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
  </head>
  <file name="helloworld.py"/>
  <file name="helloworld.ps1"/>
  <file name="helloworld.bat"/>
</data>

I need to get the data inside the head element with no whitespace between the elements, like this:

<version>1.0</version><project>hello, world</project><date>2020-08-15</date>

and then hash it. Right now I have to do some string manipulation to collapse it onto one line:

root = ET.parse('myfile.xml').getroot()
header = ET.tostring(root[0]).decode('utf-8')
import re
header = re.sub('\n','',header)
header = re.sub('>\s+<','><',header)
header = header.replace('<head>','')
header = header.replace('</head>','')
header = header.strip()

Is there a simpler way to do this? The PowerShell XML object has a simple InnerXml property that gives you the XML inside an element as a string, without whitespace. Does Python have something that makes this easier?


3 Answers

精慕HU


See below (no external libraries, just core Python):

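import xml.etree.ElementTree as ET

# Parse the file and locate the <head> element anywhere in the tree
tree = ET.parse('input.xml')
head = tree.find('.//head')

# Rebuild each child as <tag>text</tag> with nothing between the elements.
# Note: attributes, nested children and XML escaping are not handled here.
combined = ''.join('<{}>{}</{}>'.format(e.tag, e.text, e.tag) for e in head)
print(combined)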

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <head>
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
  </head>
</data>

Output

<version>1.0</version><project>hello, world</project><date>2020-08-15</date>
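The format-string approach above rebuilds each child from its tag and text only, so attributes and nested elements would be dropped. Below is a minimal sketch of a more general helper, still using only the standard library. The name `inner_xml` is hypothetical (it is not part of ElementTree), and the sketch assumes that whitespace-only text between elements is insignificant and that the parent element has no meaningful text of its own before its first child.

```python
import copy
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Serialize the children of `element`, dropping whitespace-only gaps.

    Hypothetical helper, not an ElementTree API. Works on copies so the
    original tree is left untouched.
    """
    parts = []
    for child in element:
        clone = copy.deepcopy(child)
        for node in clone.iter():
            # Whitespace-only text/tail is just indentation; real text is kept.
            if node.text is not None and not node.text.strip():
                node.text = None
            if node.tail is not None and not node.tail.strip():
                node.tail = None
        parts.append(ET.tostring(clone, encoding='unicode'))
    return ''.join(parts)


xml_doc = """<data>
  <head>
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
  </head>
  <file name="helloworld.py"/>
</data>"""

head = ET.fromstring(xml_doc).find('head')
print(inner_xml(head))
```

Because each child is serialized with `ET.tostring`, attributes and nested elements survive and special characters are escaped. One side effect: an element that contained only whitespace is written in self-closing form.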

2023-06-06

開滿天機


If you can use an external library, BeautifulSoup handles this well.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup

Here is an example using your document.

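from bs4 import BeautifulSoup as bs

xml_doc = """<?xml version="1.0" encoding="UTF-8"?>
<data>
  <head>
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
  </head>
</data>"""

# Name the parser explicitly; 'html.parser' ships with Python
page_soup = bs(xml_doc, 'html.parser')

# Text of head's children, still containing the original newlines and indentation
page_soup.head.getText()

# Strip newlines and spaces (this also removes spaces inside the values)
page_soup.head.getText().strip().replace('\n','').replace(' ','')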

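This returns the text content of the head tag's child tags (the text only, not the tags themselves), with line breaks and spaces stripped.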
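One thing to watch when stripping whitespace with a blanket `replace(' ', '')`: it also removes spaces inside element values, so "hello, world" becomes "hello,world". The sketch below compares that with removing only the whitespace that sits between a closing `>` and the next `<`. The variable names are illustrative, and the sample string is the inner content of head from the question.

```python
import re

raw = """
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
"""

# Removing every newline and space also damages the text inside elements.
too_aggressive = raw.strip().replace('\n', '').replace(' ', '')

# Removing only whitespace runs between tags leaves element text intact.
between_tags = re.sub(r'>\s+<', '><', raw.strip())

print(too_aggressive)
print(between_tags)
```

The second form is the same substitution the question already uses. It is still a text-level operation, so it assumes the whitespace between tags carries no meaning in the document.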

2023-06-06

紅糖糍粑


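Every approach can have problems. Some also remove whitespace that matters, and some run into trouble when nodes have attributes. So I'll offer a third approach. It may well be imperfect too :)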

from simplified_scrapy import SimplifiedDoc, utils

# xml_doc = utils.getFileContent('myfile.xml')
xml_doc = """<?xml version="1.0" encoding="UTF-8"?>
<data>
  <head>
    <version>1.0</version>
    <project>hello, world</project>
    <date>2020-08-15</date>
  </head>
</data>"""

doc = SimplifiedDoc(xml_doc)
headXml = doc.head.html.strip()  # Get the inner content of head
print(doc.replaceReg(headXml, r'>[\s]+<', '><'))  # Remove whitespace between tags with a regex

Result:

<version>1.0</version><project>hello, world</project><date>2020-08-15</date>
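The question mentions hashing the collapsed string afterwards. It does not say which hash is wanted, so the sketch below assumes SHA-256 over the UTF-8 bytes; both choices are assumptions, and the variable names are illustrative.

```python
import hashlib

inner = ('<version>1.0</version>'
         '<project>hello, world</project>'
         '<date>2020-08-15</date>')

# Hash functions operate on bytes, so encode the string first.
digest = hashlib.sha256(inner.encode('utf-8')).hexdigest()
print(digest)

# Any leftover whitespace changes the digest, which is why the
# string has to be normalized the same way every time.
with_gap = inner.replace('><', '> <')
print(hashlib.sha256(with_gap.encode('utf-8')).hexdigest() == digest)
```

Whichever normalization is chosen, it has to be applied identically on both sides of a comparison, since the digest depends on every byte.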

2023-06-06
