
Unable to scrape a table from a site


Python

守著一只汪 2024-01-04 16:48:49

I am trying to scrape the rankings table from this page, but I can't get the data:

https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/scores_overall/sort_order/asc/cols/scores

This is my code so far:

import scrapy
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from logzero import logfile, logger


class ScrapeTableSpider(scrapy.Spider):
    name = "scrape-table"
    allowed_domains = ["toscrape.com"]
    start_urls = ['http://quotes.toscrape.com']

    def start_requests(self):
        # headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # driver = webdriver.Chrome()
        options = webdriver.ChromeOptions()
        options.add_argument("headless")
        desired_capabilities = options.to_capabilities()
        driver = webdriver.Chrome('C:/chromedriver', desired_capabilities=desired_capabilities)
        driver.get("https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/scores_overall/sort_order/asc/cols/scores")
        driver.implicitly_wait(2)
        for table in driver.find_element_by_xpath('//*[contains(@id,"datatable-1")]//tr'):
            data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
            print(data)

Any insight into how to get the data out of the table would be greatly appreciated.

查看完整描述

2 回答

繁花不似錦

1851 experience points, 4+ upvotes

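I don't really understand why you are using both Scrapy and Selenium, but let's assume we only use Selenium. To get the text out of the table, you can do something as simple as this: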

from selenium import webdriver

# Run Chrome without a visible window
options = webdriver.ChromeOptions()
options.add_argument("headless")
desired_capabilities = options.to_capabilities()
driver = webdriver.Chrome('C:/chromedriver', desired_capabilities=desired_capabilities)
driver.get("https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/scores_overall/sort_order/asc/cols/scores")
# When locating elements, keep retrying for up to 1 second before giving up
driver.implicitly_wait(1)
# Grab the whole table and print its visible text
table = driver.find_element_by_xpath('//*[@id="datatable-1"]')
print(table.text)
# Close the browser when done
driver.quit()

Now, if you want the table's contents split into separate parts, just use the find_element_by_xxx functions and select those other parts by XPath.
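Note that the `find_element_by_*` helpers and the positional driver path used above are Selenium 3 style; later Selenium 4 releases removed them (the `find_element_by_*` helpers went away in 4.3). Below is a minimal sketch of the same approach in the Selenium 4 `find_element(By..., ...)` form, also splitting the table into rows and cells as suggested above. It assumes the table still has the id `datatable-1`, that Chrome is installed, and that chromedriver is on PATH or found automatically by Selenium Manager (Selenium 4.6 and later). The function names are hypothetical.

```python
from typing import Dict, List


def fetch_table_rows(url: str, table_id: str = "datatable-1") -> List[List[str]]:
    """Return the visible text of every cell, row by row, from the table with id `table_id`.

    Selenium is imported inside the function (not at module level) so that
    rows_to_records below can be used without a browser installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # Assumption: chromedriver is on PATH or resolvable by Selenium Manager.
    driver = webdriver.Chrome(options=options)
    try:
        driver.implicitly_wait(5)  # applies to every find_element(s) call below
        driver.get(url)
        table = driver.find_element(By.ID, table_id)
        rows = []
        for tr in table.find_elements(By.XPATH, ".//tr"):
            # Header rows use <th>, body rows use <td>; take both in document order.
            cells = [cell.text for cell in tr.find_elements(By.XPATH, "./*[self::th or self::td]")]
            if any(cells):  # skip rows whose cells are all empty or hidden
                rows.append(cells)
        return rows
    finally:
        driver.quit()


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and pair it with each following row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

Typical use would be `rows_to_records(fetch_table_rows(url))` with the rankings URL from the question. Keeping the browser part separate from the plain-list processing means the parsing logic can be checked without launching Chrome.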

2024-01-04

慕慕森

1856 experience points, 17+ upvotes

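If you need to iterate over the results, you should select elements, not element: `find_element_by_xpath` returns only the first matching WebElement, which cannot be iterated over, while `find_elements_by_xpath` returns a list of all matching rows. Change this line of your code:

for table in driver.find_element_by_xpath('//*[contains(@id,"datatable-1")]//tr'):

to:

for table in driver.find_elements_by_xpath('//*[contains(@id,"datatable-1")]//tr'):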
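The question's code already imports `WebDriverWait` and `expected_conditions` but never uses them. Here is a minimal sketch that combines the `find_elements` fix with an explicit wait instead of `implicitly_wait`, written in the Selenium 4 `find_elements(By.XPATH, ...)` form. The two XPaths are the question's own; the function name is hypothetical, and Chrome plus a resolvable chromedriver are assumed.

```python
from typing import List

# XPaths taken from the question's code
ROW_XPATH = '//*[contains(@id,"datatable-1")]//tr'
CELL_XPATH = ".//*[self::td or self::th]"


def fetch_rows_with_explicit_wait(url: str, timeout: float = 10) -> List[List[str]]:
    """Wait until the table rows exist, then return each row's cell texts."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Like find_elements, this condition produces a list of matching elements.
        # WebDriverWait keeps polling while the list is empty and raises
        # TimeoutException if no row appears within `timeout` seconds.
        trs = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, ROW_XPATH))
        )
        return [[cell.text for cell in tr.find_elements(By.XPATH, CELL_XPATH)] for tr in trs]
    finally:
        driver.quit()
```

An explicit wait ties the delay to the condition you actually care about (rows being present) rather than to every element lookup, which tends to suit tables that are filled in by JavaScript after the page loads.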

2024-01-04
