慕尼黑5688855
2023-04-11 15:32:01
I'm trying to extract all values of the attribute data-src-mp3 (they are links) from content1, which is generated from url. Each link is contained in an element like
<a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>
One way is to use the regular expression 'data-src-mp3="(.*?)"':
import re
import requests
from bs4 import BeautifulSoup
session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
r = session.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
output = re.findall('data-src-mp3="(.*?)"', str(content1))
print(output)
The result is
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3',
 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3',
 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3',
 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3',
 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
How can I use BeautifulSoup and the structure <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a> to get the same result without a loop? Thank you so much!
1 Answer

BIG陽
You can combine the selectors when using .select:
mp3s = [tag.attrs['data-src-mp3'] for tag in soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')]
Or:
mp3s = list(map(lambda tag: tag.attrs['data-src-mp3'],
                soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')))
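[data-src-mp3] selects only elements that have a data-src-mp3 attribute (with any value). The space before it is a descendant combinator, so only elements inside the .cB.cB-def.dictionary.biling container are matched.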
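To see the combined selector in isolation, here is a minimal sketch that runs on a small inline HTML snippet instead of the live page. The markup and the example.com URLs are made up for illustration; only the class names and the data-src-mp3 attribute come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure in the question (not the real page).
html = """
<div class="cB cB-def dictionary biling">
  <a class="hwd_sound" data-src-mp3="https://example.com/a.mp3" data-lang="en_GB"></a>
  <span>no audio here</span>
  <a class="hwd_sound" data-src-mp3="https://example.com/b.mp3" data-lang="fr"></a>
</div>
<a class="hwd_sound" data-src-mp3="https://example.com/outside.mp3"></a>
"""

soup = BeautifulSoup(html, "html.parser")

# Container classes, then a space (descendant combinator), then an attribute selector.
selector = ".cB.cB-def.dictionary.biling [data-src-mp3]"
mp3s = [tag["data-src-mp3"] for tag in soup.select(selector)]
print(mp3s)
```

The last <a> element sits outside the container, so the selector skips it, and the <span> is skipped because it has no data-src-mp3 attribute.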
A small change to keep 'data-src-mp3' in one place:
mp3_tag = 'data-src-mp3'
mp3s = list(map(lambda tag: tag.attrs[mp3_tag],
                soup.select('.cB.cB-def.dictionary.biling [{}]'.format(mp3_tag))))
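This solution may look more intimidating at first glance, but it is far better than relying on the wrong tool (such as regular expressions for parsing HTML).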
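As a rough illustration of why the regular expression is the more fragile tool, the sketch below runs both approaches on the same raw HTML string. The snippet is hypothetical and deliberately awkward: one attribute uses single quotes, one URL contains an escaped ampersand, and one link is commented out. Unlike the question, the regex here is applied to the raw markup rather than to str(content1).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with three cases that trip up a naive regex.
html = """
<div class="cB">
  <a data-src-mp3='https://example.com/one.mp3'></a>
  <a data-src-mp3="https://example.com/two.mp3?x=1&amp;y=2"></a>
  <!-- <a data-src-mp3="https://example.com/old.mp3"></a> -->
</div>
"""

# Regex from the question: expects double quotes and knows nothing about comments.
regex_result = re.findall('data-src-mp3="(.*?)"', html)

# Parser-based approach: attribute selector scoped to the container.
soup = BeautifulSoup(html, "html.parser")
parser_result = [tag["data-src-mp3"] for tag in soup.select(".cB [data-src-mp3]")]

print("regex: ", regex_result)
print("parser:", parser_result)
```

On this input the regex misses the single-quoted link, leaves the &amp; entity undecoded, and picks up the commented-out link, while the parser returns the two live links with the ampersand decoded.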