
Getting rid of 'urllib.error.HTTPError:' from urlReq(url)


Python

aluckdog 2021-12-21 16:48:06

Hey guys, what's up? :)

I'm trying to scrape a website using some URL parameters. If I use url1, url2 or url3, it works properly and prints the regular output I want (the HTML):

import bs4
from urllib.request import urlopen as urlReq
from bs4 import BeautifulSoup as soup

# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

# opening up connection on each url, grabbing the page
uClient = urlReq(url4)
page_html = uClient.read()
uClient.close()

# parsing the downloaded html
page_soup = soup(page_html, "html.parser")

# print the html
print(page_soup.body.prettify())

But when I try url4,

url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

it gives me the error below. What am I doing wrong?

- Maybe it has something to do with cookies? But then why does it work for the other URLs?
- Maybe they are just blocking scraping attempts?
- How can I use multiple parameters in the URL and avoid this error?

urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Temporarily

Thanks in advance for your help!

Cheers,
Alan

What I have already tried: I tried the requests library.

import requests

url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
r = requests.get(url)
html = r.text
print(html)

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /sale
on this server.</p>
</body></html>
[Finished in 0.375s]


1 Answer

繁花不似錦


If you use the requests package and add a User-Agent to the headers, it looks like all 4 links return a 200 response. So try adding a User-Agent header:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

import requests
from bs4 import BeautifulSoup as soup

# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

url_list = [url1, url2, url3, url4]

for url in url_list:
    # opening up connection on each url, grabbing the page
    response = requests.get(url, headers=headers)
    print(response.status_code)

Output:

200
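
Since url4 also comes back with 200 once the User-Agent header is sent, having several query parameters does not appear to be what triggers the error. If you would rather not write the query strings by hand, here is a minimal sketch using urllib.parse.urlencode from the standard library. build_url and BASE_URL are hypothetical names introduced for illustration; they are not from the question.

```python
from urllib.parse import urlencode

BASE_URL = 'https://en.titolo.ch/sale'


def build_url(base, params=None):
    """Return base with params appended as a query string (hypothetical helper)."""
    if not params:
        return base
    # urlencode joins the key/value pairs with '&' and escapes unsafe characters
    return base + '?' + urlencode(params)


# rebuilds url4 from the question out of its two parameters
url4 = build_url(BASE_URL, {'category_styles': '31066', 'limit': 108})
print(url4)
```

requests.get also accepts a params= dictionary and builds the query string itself, so with requests the helper may not be needed at all.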

So, for the URL from the question:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
r = requests.get(url, headers=headers)
html = r.text
print(html)
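
If you want to stay with urllib as in the question, the same header can be attached through urllib.request.Request. This is only a sketch and I have not run it against the site. The cookie jar is there on the assumption that the 302 loop is cookie-related, which the question raises as a possibility but nothing here confirms. make_request, make_opener and fetch_html are hypothetical helper names.

```python
import http.cookiejar
import urllib.error
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36')


def make_request(url):
    """Build a Request that carries the browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


def make_opener():
    """Build an opener that keeps cookies across redirects.

    Assumption: the redirect loop is caused by a cookie the site expects back.
    """
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_html(url):
    """Return the page body as bytes, or None if the server answers with an HTTP error."""
    try:
        with make_opener().open(make_request(url)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # covers the 302 loop and the 403 seen in the question
        print('HTTP error', err.code, 'for', url)
        return None
```

The bytes returned by fetch_html(url) could then be passed to soup(page_html, "html.parser") exactly as in the original script.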

Answered 2021-12-21
