I have an HTML file with some markup, and I need to add an ID of the form id="rule_1", id="rule_1.1", id="rule_1.2", id="rule_1.2.1", etc. to each tag. For example, the current HTML is:

<div style="styles">
  <p class="classname">TEXT</p>
  <p class="classname">TEXT</p>
  <ul style="styles">
    <li> <p class="classname">TEXT</p> </li>
    <li> <p class="classname">TEXT</p> </li>
  </ul>
</div>

I need the HTML to look like this:

<div style="styles" id="rule_1">
  <p class="classname" id="rule_1.1">TEXT</p>
  <p class="classname" id="rule_1.2">TEXT</p>
  <ul style="styles" id="rule_1.3">
    <li id="rule_1.3.1"> <p class="classname" id="rule_1.3.1.1">TEXT</p> </li>
    <li id="rule_1.3.2"> <p class="classname" id="rule_1.3.2.1">TEXT</p> </li>
  </ul>
</div>

I could write these by hand, but I'd rather use an existing HTML parser library. Is this possible with BeautifulSoup or another module? I tried something like this:

from bs4 import BeautifulSoup as html_parser

with open('outputs/HTML/{}.html'.format(deal), 'r') as read_file:
    html_source = read_file.read()

soup = html_parser(html_source, 'html.parser')
html_tags = soup.find_all(['div', 'p', 'span', 'ul', 'li'])
for each_tag in html_tags:
    each_tag.attrs['id'] = html_tags.index(each_tag)

with open('outputs/HTML/{}-id.html'.format(deal), 'w') as save_file:
    save_file.write(str(soup))

But this only adds id="1", id="2", and so on. How can I make the numbering nested, like 1, 1.1, 1.1.1, etc.?
1 answer

夢里花落0921
Never mind, I figured it out:
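# html_tags is the list from soup.find_all(['div', 'p', 'span', 'ul', 'li']) in the question;
# find_all returns tags in document order, so each parent is numbered before its children
curr_tags = {}  # how many children have been numbered so far, keyed by the parent's id
for i, each_tag in enumerate(html_tags):
    if i == 0:
        # the first (outermost) tag is the root of the numbering
        each_tag.attrs['id'] = 'rule_1'
    else:
        parent_id = each_tag.parent.attrs['id']
        curr_tags[parent_id] = curr_tags.get(parent_id, 0) + 1
        each_tag.attrs['id'] = parent_id + '.{0}'.format(curr_tags[parent_id])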