Using BeautifulSoup to open product pages in separate tabs for a search entered on Amazon

Python

Cats萌萌 2023-05-16 14:55:38

I'm very new to Python and to web scraping. I'm currently reading Al Sweigart's book "Automate the Boring Stuff with Python", and one of the suggested practice projects is to write a program that does the following:

- takes a product name as input to search for on Amazon
- uses requests.get() and .text() to get the HTML of that search page
- uses BeautifulSoup to search the HTML for the CSS selector that identifies links to product pages
- opens the top five products from the search results, each in a separate tab

Here is my code:

    #! python3
    # Searches amazon for the inputted product (either through command line or input) and opens 5 tabs with the top
    # items for that search.

    import requests, sys, bs4, webbrowser

    if len(sys.argv) > 1:  # if there are system arguments
        res = requests.get('https://www.amazon.com/s?k=' + ''.join(sys.argv))
        res.raise_for_status
    else:  # take input
        print('what product would you like to search Amazon for?')
        product = str(input())
        res = requests.get('https://www.amazon.com/s?k=' + ''.join(product))
        res.raise_for_status

    # retrieve top search links:
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    print(res.text)  # TO CHECK HTML OF SITE, GET RID OF DURING ACTUAL PROGRAM

    # open a new tab for the top 5 items, and get the css selector for links
    # a list of all things on the downloaded page that are within the css selector 'a-link-normal a-text-normal'
    linkElems = soup.select('a-link-normal a-text-normal')
    numOpen = min(5, len(linkElems))
    for i in range(numOpen):
        urlToOpen = 'https://www.amazon.com/' + linkElems[i].get('href')
        print('Opening', urlToOpen)
        webbrowser.open(urlToOpen)

I think I've chosen the right CSS selector ("a-link-normal a-text-normal"), so I suspect the problem is with res.text(). When I print it to see what it looks like, the HTML seems incomplete: it doesn't contain what I see when I look at the same page with Inspect Element in Chrome. In addition, none of this HTML contains anything like "a-link-normal a-text-normal".

1 Answer

慕后森

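This is a classic case: if you fetch the site directly and parse the response with a tool like BeautifulSoup, you won't find anything.

The way this site works is that an initial chunk of code is downloaded to your browser first, like the HTML you posted for `big pencil`, and then the rest of the elements on the page are loaded via JavaScript.

You need to load the page with Selenium WebDriver first and then take the HTML from the browser. In everyday terms, this is the equivalent of opening your browser's developer console, going to the "Elements" tab, and looking for the classes you mentioned.

To see the difference, I suggest viewing the page source and comparing it with the code shown in the "Elements" tab.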
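The comparison can also be done in code. The sketch below uses only the standard library to check whether a class name appears anywhere in a piece of HTML. `ClassCollector`, `has_class`, and the two sample strings are hypothetical names and data made up for illustration; in practice the first string would be `res.text` and the second the HTML copied from the browser.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute may hold several names
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_class(html, class_name):
    """Return True if any tag in `html` carries the given class."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Hypothetical stand-ins for the downloaded source and the rendered page
static_html = '<div id="root"></div><script src="app.js"></script>'
rendered_html = (
    '<div id="root">'
    '<a class="a-link-normal a-text-normal" href="/dp/X">Item</a>'
    '</div>'
)

print(has_class(static_html, "a-link-normal"))    # False
print(has_class(rendered_html, "a-link-normal"))  # True
```

If the class is missing from the downloaded source but present in the rendered page, the elements are being added after the initial download.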

Here, you need to hand BS4 the data that has been loaded into the browser:

    import bs4
    from selenium import webdriver

    url = 'https://www.amazon.com/s?k=big+pencil'  # example search URL

    # Opens a new Chrome instance. Selenium 4.6+ finds a matching chromedriver by itself;
    # older versions need the path to chromedriver (more info in the docs).
    browser = webdriver.Chrome()
    browser.get(url)  # fetch the URL in the browser so its JavaScript runs
    # Load the rendered HTML into BS4 and go on with extracting the elements
    soup = bs4.BeautifulSoup(browser.page_source, 'html.parser')
    browser.quit()
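For the extraction step, one more detail may matter: the selector string in the question, `'a-link-normal a-text-normal'`, has no leading dots, so BeautifulSoup reads it as tag names rather than class names and matches nothing even on fully rendered HTML. The sketch below shows a class-based selector instead. The sample markup is a made-up stand-in for `browser.page_source`, `top_product_urls` is a hypothetical helper, and the class names are taken from the question; Amazon's real markup may differ or change.

```python
from urllib.parse import urljoin

import bs4

# Made-up markup standing in for browser.page_source (not real Amazon HTML)
rendered_html = """
<div>
  <a class="a-link-normal a-text-normal" href="/dp/AAA111">First</a>
  <a class="a-link-normal" href="/help">Help</a>
  <a class="a-link-normal a-text-normal" href="/dp/BBB222">Second</a>
</div>
"""


def top_product_urls(html, limit=5, base="https://www.amazon.com/"):
    """Return absolute URLs for the first `limit` links that carry both classes."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # 'a.a-link-normal.a-text-normal' selects <a> tags that have BOTH classes
    links = soup.select("a.a-link-normal.a-text-normal")
    # urljoin avoids the double slash produced by plain string concatenation
    return [urljoin(base, a["href"]) for a in links[:limit] if a.get("href")]


for url in top_product_urls(rendered_html):
    print("Opening", url)
```

Each returned URL can then be passed to `webbrowser.open()` as in the original script.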

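This is very basic code for getting to know Selenium. In a production use case, however, you will probably want a headless browser such as PhantomJS. (PhantomJS is no longer maintained, and Chrome and Firefox can now run headless themselves.)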

2023-05-16
