3 回答

TA貢獻1816條經驗 獲得超6個贊
利用 .findAll
前任:
from bs4 import BeautifulSoup
html = """<div id="previewImages">
<div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
<div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>"""
soup = BeautifulSoup(html, "html.parser")
images_box = soup.find('div', attrs={'id': 'previewImages'})
for link in images_box.findAll("img"):
print link.get('image-file')
輸出:
/image/15.jpg
/image/2.jpg
/image/0.jpg
/image/3.jpg
/image/4.jpg

TA貢獻1815條經驗 獲得超13個贊
我認為將 id 與傳遞給的屬性選擇器一起使用會更快 select
from bs4 import BeautifulSoup as bs
html = '''
<div id="previewImages">
<div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
<div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
<div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>
'''
soup = bs(html, 'lxml')
links = [item['image-file'] for item in soup.select('#previewImages [image-file]')]
print(links)

TA貢獻1757條經驗 獲得超8個贊
如果我們對 lxml 執行相同的場景,則加起來,
import lxml.html
tree = lxml.html.fromstring(sample)
images = tree.xpath("//img/@image-file")
print(images)
輸出 ['/image/15.jpg', '/image/2.jpg', '/image/0.jpg', '/image/3.jpg', '/image/4.jpg']
添加回答
舉報