

Getting rid of 'urllib.error.HTTPError:' from urlReq(url)


aluckdog 2021-12-21 16:48:06
Hey guys, what's up? :)

I'm trying to scrape a website using some URL parameters. If I use url1, url2, or url3, it WORKS properly and prints the regular output I want (HTML):

import bs4
from urllib.request import urlopen as urlReq
from bs4 import BeautifulSoup as soup

# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

# opening up connection on each url, grabbing the page
uClient = urlReq(url4)
page_html = uClient.read()
uClient.close()

# parsing the downloaded html
page_soup = soup(page_html, "html.parser")

# print the html
print(page_soup.body.prettify())

But when I try "url4",

url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

it gives me the error below. What am I doing wrong?

- Maybe it has something to do with cookies? But then why does it work for the other URLs...
- Maybe they just block scraping attempts?
- How can I use multiple parameters in a URL and avoid this error?

urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Temporarily

Thanks in advance for your help!

Cheers,
Allen

What I have tried:

I tried the requests library:

import requests

url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
r = requests.get(url)
html = r.text
print(html)

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /sale
on this server.</p>
</body></html>
[Finished in 0.375s]

1 Answer

繁花不似錦


If you use the requests package and add a User-Agent to the headers, it looks like you get a 200 response for all 4 links. So try adding a User-Agent header:


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

import requests
from bs4 import BeautifulSoup as soup

# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

url_list = [url1, url2, url3, url4]

for url in url_list:
    # opening up connection on each url, grabbing the page
    response = requests.get(url, headers=headers)
    print(response.status_code)
Output:

200
200
200
200

So:


import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

r = requests.get(url, headers=headers)
html = r.text
print(html)
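
If you would rather stay with urllib as in the question, the same idea can be sketched with the standard library only. This is a rough sketch, not something verified against the site: the helper names build_url, make_request, and fetch are made up for illustration, and the cookie jar is an assumption on my part (a redirect loop can sometimes come from a cookie not being sent back, but the thread does not confirm that is the cause here). It also shows one way to pass multiple query parameters without hand-writing the query string.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Same User-Agent string as in the answer above.
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
)


def build_url(base, params):
    """Append URL-encoded query parameters to base (hypothetical helper)."""
    if not params:
        return base
    return base + '?' + urlencode(params)


def make_request(url):
    """Create a urllib Request that carries the User-Agent header."""
    return Request(url, headers={'User-Agent': USER_AGENT})


def fetch(url, timeout=10):
    """Download a page and return its raw bytes.

    Assumption: the opener keeps cookies across redirects, which may or
    may not be needed for this particular site.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(make_request(url), timeout=timeout) as response:
        return response.read()


# Building the URL and the request does not touch the network.
url4 = build_url('https://en.titolo.ch/sale',
                 {'category_styles': '31066', 'limit': 108})
request = make_request(url4)
```

Calling fetch(url4) would then return the page bytes, which can be handed to soup(page_html, "html.parser") exactly as in the question. With requests, passing params={'category_styles': '31066', 'limit': 108} to requests.get does the same URL building for you.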


Replied 2021-12-21