慕的地6264312
2021-07-05 05:11:27
I am trying to scrape various websites using BeautifulSoup in Python. Suppose I have the following HTML excerpt:

    <div class="member_biography">
    <h3>Biography</h3>
    <span class="sub_heading">District:</span> AnyState - At Large<br/>
    <span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
    <span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
    <span class="sub_heading">Residence:</span> Some Town<br/>
    <span class="sub_heading">Religion:</span> Episcopalian<br/>
    <span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
    <span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
    <span class="sub_heading">Elected:</span> 19XX<br/>
    </div>

I need the result in the following format:

    District: AnyState - At Large
    Political Highlights: AnyTown City Council, 19XX-XX
    Born: June X, 19XX; AnyTown, Calif.
    Residence: Some Town
    Religion: Episcopalian
    Family: Wife, Some Name; two children
    Education: Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
    Elected: 19XX

However, so far I have only managed to get this:

    District:
    Political Highlights:
    Born:
    Residence:
    Religion:
    Family:
    Education:
    Elected:

using the following code:

    import urllib.request
    import sys
    from bs4 import BeautifulSoup

    def main(url):
        fp = urllib.request.urlopen(url)
        site_bytearray = fp.read()
        fp.close()
        #bs_data = BeautifulSoup(site_str,features="html.parser")
        bs_data = BeautifulSoup(site_bytearray,'lxml')
        tmplist = bs_data.find_all('span',{'class':'sub_heading'})
        for item in tmplist:
            print(item.text)
        sys.exit(0)

    if __name__ == "__main__":
        main(sys.argv[1])

In short, how do I extract both District and AnyState - At Large from <span class="sub_heading">District:</span> AnyState - At Large<br/> and accumulate the results in a list for further processing?
2 Answers
慕桂英546537
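The value you want is not inside the span; it is the text node right after it, which BeautifulSoup exposes as item.next_sibling. Replace your print statement with: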
Python 3.6+:
print(f'{item.text:<25} {item.next_sibling}')
Python 3 - 3.5:
print('{:<25} {}'.format(item.text, item.next_sibling))
Output:
District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
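
The question also asks about accumulating the results in a list for further processing rather than just printing them. A minimal sketch of one way to do that is below. The function name extract_fields and the trimmed HTML constant are my own, not from the original code, and I use the built-in html.parser instead of lxml so the example runs without an extra dependency. It assumes each value is the bare text node directly after its span, as in the excerpt in the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup from the question (three of the eight fields).
HTML = (
    '<div class="member_biography"><h3>Biography</h3>'
    '<span class="sub_heading">District:</span> AnyState - At Large<br/>'
    '<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>'
    '<span class="sub_heading">Elected:</span> 19XX<br/>'
    '</div>'
)


def extract_fields(html):
    """Return a list of (label, value) tuples, one per sub_heading span.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    fields = []
    for span in soup.find_all("span", class_="sub_heading"):
        # Label text without surrounding whitespace or the trailing colon.
        label = span.get_text(strip=True).rstrip(":")
        sibling = span.next_sibling
        # Only a text node counts as the value; a tag (or nothing) means empty.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        fields.append((label, value))
    return fields


fields = extract_fields(HTML)
for label, value in fields:
    print(f"{label}: {value}")
```

Because the pairs are kept as tuples in document order, dict(fields) gives a lookup by label if the labels are unique on the page.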
