2 Answers

This works. It is not very elegant, but it does the job. I extended your sample HTML with a few more problematic nodes:
test = """
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<table>
<tr>
<th>test</th>
<td>abc</td>
</tr>
<tr>
<th>test1</th>
<td>abc</td>
<td>abc</td>
</tr>
<tr>
<th>test2</th>
<td>abc</td>
</tr>
<tr>
<a>test3</a>
<td>abcd</td>
</tr>
<tr>
<td>test4</td>
<td>abcd</td>
</tr>
</table>
</body> """
import lxml.html

doc = lxml.html.fromstring(test)
good_tags = ['th', 'td']
targs = doc.xpath('//tr')
for targ in targs:
    # Collect every element nested anywhere inside this row
    tr = targ.xpath('.//*')
    # Keep the row only if it holds exactly two elements: one <th> and one <td>
    if len(tr) == 2 and (tr[0].tag != tr[1].tag) and tr[0].tag in good_tags and tr[1].tag in good_tags:
        print(lxml.html.tostring(targ).decode())
Output:
<tr>
<th>test</th>
<td>abc</td>
</tr>
<tr>
<th>test2</th>
<td>abc</td>
</tr>
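One thing to be aware of: `.//*` selects every descendant of the row, so a row whose cell contains nested markup (for example `<th><b>test</b></th>`) has more than two elements and gets rejected. If that is not what you want, here is a rough sketch of a variation that looks only at the direct children of each `<tr>`. The function name `rows_with_header_and_cell` and the `sample` fragment are made up for illustration; they are not part of the original question.

```python
import lxml.html


def rows_with_header_and_cell(html_text):
    """Return the <tr> elements whose direct children are exactly one <th> and one <td>.

    Hypothetical helper: it checks direct children only, so nested markup
    inside a cell does not disqualify the row.
    """
    doc = lxml.html.fromstring(html_text)
    matches = []
    for row in doc.xpath('//tr'):
        # Iterating an element yields its direct children; comments and
        # processing instructions have a non-string tag, so skip them.
        child_tags = [child.tag for child in row if isinstance(child.tag, str)]
        if sorted(child_tags) == ['td', 'th']:
            matches.append(row)
    return matches


# Illustrative fragment, loosely modelled on the test HTML above
sample = """
<table>
<tr><th>test</th><td>abc</td></tr>
<tr><th>test1</th><td>abc</td><td>abc</td></tr>
<tr><th><b>test2</b></th><td>abc</td></tr>
<tr><a>test3</a><td>abcd</td></tr>
<tr><td>test4</td><td>abcd</td></tr>
</table>
"""

for row in rows_with_header_and_cell(sample):
    print(lxml.html.tostring(row).decode().strip())
```

The same direct-children check can also be written as a single XPath expression, `//tr[count(*)=2 and th and td]`, if you would rather not loop in Python.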