1 Answer

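I was able to use the find_previous method on each table to find the preceding headline in the example HTML you provided. A headline belongs to a table only if that table is the first one that follows it, so I added an extra attribute, idx, to each table and use it when checking whether a headline belongs to that table. I also added two tables with no preceding headline of their own, one at the beginning and one at the end of the HTML.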
from bs4 import BeautifulSoup

html = '''
<table class='striped'></table>
<h3>
<span></span>
<span class='headline'>Headline #1</span>
</h3>
<table class='striped'></table>
<h4>
<span class='headline'>Headline #2</span>
</h4>
<table class='striped'></table>
<p>
<span class='headline'>Headline #3</span>
</p>
<ul></ul>
<center>
<table class='striped'></table>
</center>
<table class='striped'></table>
</div>
'''.replace('\n', '')
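soup = BeautifulSoup(html, 'lxml')

table_query = ('table', {'class': 'striped'})
headline_query = ('span', {'class': 'headline'})

for idx, table in enumerate(soup.find_all(*table_query)):
    # Tag each table with its position so it can be recognized later.
    table.attrs['idx'] = idx

    # Nearest headline that appears before this table, if any.
    previous_headline = table.find_previous(*headline_query)

    # The headline belongs to this table only if this table is the
    # first one that follows the headline.
    if (previous_headline and
            previous_headline.find_next(*table_query).attrs['idx'] == idx):
        print(previous_headline.text)
    else:
        print('No headline found.')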
Output:
No headline found.
Headline #1
Headline #2
Headline #3
No headline found.
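If you would rather not write an idx attribute into the parsed tree, the same check can probably be done by comparing object identity. The sketch below is only one way to do it: the helper names headline_for and headlines_by_table and the shortened sample markup are made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

TABLE_QUERY = ('table', {'class': 'striped'})
HEADLINE_QUERY = ('span', {'class': 'headline'})


def headline_for(table):
    """Return the text of the headline that belongs to `table`, or None."""
    headline = table.find_previous(*HEADLINE_QUERY)
    # Use `is` (identity), not `==`: Beautiful Soup treats two tags with the
    # same name, attributes and contents as equal, so two empty striped
    # tables would compare equal with `==`.
    if headline is not None and headline.find_next(*TABLE_QUERY) is table:
        return headline.text
    return None


def headlines_by_table(html):
    """Return one entry per striped table: its headline text or None."""
    soup = BeautifulSoup(html, 'html.parser')
    return [headline_for(table) for table in soup.find_all(*TABLE_QUERY)]


# Shortened sample (hypothetical): the first and last tables have no
# headline of their own.
sample = (
    "<table class='striped'></table>"
    "<h3><span class='headline'>Headline #1</span></h3>"
    "<table class='striped'></table>"
    "<center><table class='striped'></table></center>"
)

result = headlines_by_table(sample)
print(result)  # [None, 'Headline #1', None]
```

Returning None instead of printing lets the caller decide what to show when a table has no headline.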