
Why do I get a 429 response in Scrapy even though the request count is only 1?


暮色呼如 2022-12-27 16:49:24
I am scraping a website with Scrapy, but I get a 429 response. Here is the log output:

2020-06-06 21:39:45 [scrapy.core.engine] INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-06-06 21:39:45 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
INFO:scrapy.extensions.telnet:Telnet console listening on 127.0.0.1:6023
2020-06-06 21:39:45 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
DEBUG:scrapy.core.engine:Crawled (429) <GET https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1> (referer: None)
2020-06-06 21:39:46 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1> (referer: None)
INFO:scrapy.spidermiddlewares.httperror:Ignoring response <429 https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1>: HTTP status code is not handled or not allowed
2020-06-06 21:39:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <429 https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1>: HTTP status code is not handled or not allowed
INFO:scrapy.core.engine:Closing spider (finished)
2020-06-06 21:39:46 [scrapy.core.engine] INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 343,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2030,
 'downloader/response_count': 1,
 'downloader/response_status_count/429': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 6, 6, 11, 39, 46, 255540),
 'httperror/response_ignored_count': 1,
 'httperror/response_ignored_status_count/429': 1,
 'log_count/DEBUG': 1,
 'log_count/INFO': 10,
 'memusage/max': 50941952,
 'memusage/startup': 50941952,
 'response_received_count': 1,
 'scheduler/dequeued': 1,

As you can see, downloader/request_count is only 1.

1 Answer

斯蒂芬大帝


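Status code 429 means Too Many Requests. The downloader's request count is 1 because that single request was sent and the server answered it with 429; Scrapy then ignored the response, so nothing else was scheduled. The site is wrongly returning 429 to any request it believes comes from a bot, whether or not the client actually exceeded a rate limit.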


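After some experimenting, I found it rejected me because the Cookie header was missing. That cookie is set by the Set-Cookie header in the response to the initial GET request. Here are a few things to try; keep Selenium as the last resort in any scraping project.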
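To illustrate the cookie round trip described above, here is a minimal standard-library sketch of turning the Set-Cookie values from a first response into the Cookie header for the next request. The cookie names and values are hypothetical (the actual cookie the site sets is not given here), and `cookie_header_from_set_cookie` is a helper name made up for this example. Scrapy's cookie handling should do the equivalent for you when cookies are enabled.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request-header value from Set-Cookie response headers."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Each Set-Cookie header carries one cookie plus its attributes.
        jar.load(value)
    # Only name=value pairs go back to the server; attributes such as Path
    # or HttpOnly are instructions to the client and are not echoed.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie values, standing in for the first GET's response.
first_response_set_cookie = [
    "session_token=abc123; Path=/; HttpOnly",
    "visitor_id=42; Path=/; Secure",
]

cookie_header = cookie_header_from_set_cookie(first_response_set_cookie)
print(cookie_header)
```

If the second request goes out without this header, the server sees a client that ignored its cookie, which is one of the signals a site can use to flag a bot.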


Try sending a full set of headers like the ones below, together with COOKIES_ENABLED = True.

Host: www.realestate.com.au
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://duckduckgo.com/
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers
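One way the header list above might be expressed in a Scrapy project is through its settings module. This is a sketch, not a guaranteed fix: `COOKIES_ENABLED`, `COOKIES_DEBUG`, `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings, but leaving out Host, Accept-Encoding, Connection and TE is my own assumption, on the grounds that the HTTP client derives Host from the URL and Scrapy's compression middleware advertises the encodings it can actually decode.

```python
# settings.py (sketch): values copied from the header list above.

# Keep the cookie middleware on so cookies received via Set-Cookie
# are sent back on later requests in the same crawl.
COOKIES_ENABLED = True

# Optional: log the Cookie / Set-Cookie headers while debugging.
COOKIES_DEBUG = True

# Browser-like User-Agent instead of Scrapy's default one.
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) "
    "Gecko/20100101 Firefox/77.0"
)

# Headers applied to every request unless the request overrides them.
# Host, Accept-Encoding, Connection and TE are deliberately omitted
# (an assumption of this sketch, see the note above the block).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://duckduckgo.com/",
    "Upgrade-Insecure-Requests": "1",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}
```

The same dictionary could instead be passed per request through the `headers` argument of `scrapy.Request` if only one spider needs it.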


Answered 2022-12-27