3 Answers

Answerer has 1848 experience points and over 10 upvotes
The following works:
import requests
from bs4 import BeautifulSoup

# The site answers the default python-requests User-Agent with 403, so send a browser-like one.
response = requests.get('https://occ.ca/our-publications/', headers={'User-Agent': 'Mozilla'})
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # Each publication is wrapped in a div with class "publicationoverlay".
    pdfs = soup.find_all('div', {"class": "publicationoverlay"})
    links = [pdf.find('a').attrs['href'] for pdf in pdfs]
    print(links)
Output:
['https://occ.ca/wp-content/uploads/The-Great-Mosaic-Reviving-Ontarios-Regional-Economies.pdf', 'https://occ.ca/wp-content/uploads/OCC-Letter-in-support-of-the-OPG-Pickering-Nuclear-Nomination.pdf', 'https://occ.ca/wp-content/uploads/OCC-Beverage-Alcohol-Report.pdf', 'https://occ.ca/wp-content/uploads/Industrial-Electricity-Rates.pdf', 'https://occ.ca/wp-content/uploads/OCC-Letter_Strategic-Approach-to-Alcohol-Sales.pdf', 'https://occ.ca/wp-content/uploads/OCC-Submission-Modernizing-Ontarios-Environmental-Assessment-Program.pdf', 'https://occ.ca/wp-content/uploads/OCC-Letter-on-Ticket-Sales-Act.pdf', 'https://occ.ca/wp-content/uploads/2018-2019-Policy-Report-Card.pdf', 'https://occ.ca/wp-content/uploads/Letter-on-Right-to-Repair-May-1.pdf', 'https://occ.ca/wp-content/uploads/Federal-Carbon-Tax-Transparency-Act-2019-OCC.pdf', 'https://occ.ca/wp-content/uploads/Waste-and-Litter-Submission-_-Final.pdf', 'https://occ.ca/wp-content/uploads/Supporting-Ontarios-Budding-Cannabis-Industry.pdf']
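
One caveat with the list comprehension above: pdf.find('a') returns None when a div has no link, so .attrs would raise an AttributeError. A minimal sketch of a more defensive variant follows, using a CSS selector so that only anchors with an href are matched. The function name extract_links, the SAMPLE_HTML snippet, and the example.com URLs are made up for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the answer relies on.
SAMPLE_HTML = """
<div class="publicationoverlay"><a href="https://example.com/a.pdf">A</a></div>
<div class="publicationoverlay"><span>no link here</span></div>
<div class="other"><a href="https://example.com/b.pdf">B</a></div>
"""


def extract_links(html):
    """Return the href of every <a> inside a div.publicationoverlay."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector skips divs without a link and anchors without an href.
    anchors = soup.select("div.publicationoverlay a[href]")
    return [a["href"] for a in anchors]


links = extract_links(SAMPLE_HTML)
print(links)  # ['https://example.com/a.pdf']
```

In the original script, the same function could be called as extract_links(response.text).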

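Answerer has 1827 experience points and over 9 upvotes
The page returns 403 (Forbidden) along with an error page. If you add a User-Agent header, it returns 200 (OK) and the page you need: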
requests.get(url, headers={'User-Agent': 'Mozilla'})
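
If you want to confirm which headers will actually go out before hitting the server, you can prepare the request without sending it. This is only a sketch: build_request is a hypothetical helper name, and the URL is the one from the question.

```python
import requests


def build_request(url, user_agent="Mozilla"):
    """Prepare (but do not send) a GET request with a custom User-Agent."""
    session = requests.Session()
    request = requests.Request("GET", url, headers={"User-Agent": user_agent})
    # prepare_request merges the session defaults with the per-request headers;
    # the per-request User-Agent overrides the default one.
    return session.prepare_request(request)


prepared = build_request("https://occ.ca/our-publications/")
print(prepared.headers["User-Agent"])  # Mozilla
```

Sending it afterwards would be requests.Session().send(prepared), which should behave like the requests.get call above.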

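Answerer has 1859 experience points and over 6 upvotes
That is because your original request received a 403 Forbidden. By default, Python requests adds headers like the following (the Content-Length and Content-Type entries appear only when the request carries a JSON body, not on a plain GET):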
{
'User-Agent': 'python-requests/2.21.0',
'Accept-Encoding': 'gzip, deflate',
'Accept': '*/*',
'Connection': 'keep-alive',
'Content-Length': '40',
'Content-Type': 'application/json'
}
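Some websites block requests that carry such headers, in particular the default python-requests User-Agent, so you get a 403 HTTP error.
source = requests.get(url, headers={'User-Agent': 'Mozilla'})
Adding this header resolves the problem, and you will get the content you need.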
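
To check what your installed version of requests sends by default, you can print its default headers. A small sketch, assuming nothing beyond the requests library itself; the exact version string in the User-Agent depends on your installation.

```python
import requests

# default_headers() returns the headers a new Session starts with.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")
```

The User-Agent value starts with python-requests/, which is what some sites use to reject the request.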