IndexError when scraping web pages with BeautifulSoup


四季花海 2023-07-05 10:03:47
When I fetch data with BeautifulSoup, I get an IndexError at some point. I can extract a lot of data, but the script breaks on certain listings. How can I fix this?

import requests
from bs4 import BeautifulSoup

totalCar = 0
for pageNumber in range(3, 7):
    r = requests.get("https://www.autoscout24.com/lst/bmw?sort=standard&desc=0&offer=U&ustate=N%2CU&size=20&page="+
        str(pageNumber)+"&cy=D&mmm=47%7C%7C&mmm=9%7C%7C&atype=C&")
    r.status_code
    r.content
    soup = BeautifulSoup(r.content,"lxml")
    #soup.prettify
    car_details = soup.find_all("div",attrs={"class":"cl-list-element cl-list-element-gap"})
    for detail in car_details:
        car_link = "https://www.autoscout24.com"+detail.a.get("href")
        #print(car_link)
        car_r = requests.get(car_link)
        car_soup = BeautifulSoup(car_r.content,"lxml")
        car_make = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd:nth-of-type(1)")[0].text
        #car_model = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd:nth-of-type(2)")[0].text
        car_model = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd > a")[0].text
        car_year = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd > a")[1].text
        car_color = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd > a")[2].text
        car_body = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"}).select("dl > dd > a")[3].text
        print("Make:{} Model:{} Year:{} Color:{} Body:{}".format(car_make,car_model,car_year,car_color,car_body))
        print("-"*20)
        totalCar+=1
    print(totalCar)

1 Answer

DIEA


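Sometimes the body-type information is missing from a listing, so the list of links has fewer than four entries and a[3] raises the IndexError. You need to check for that: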


import requests
from bs4 import BeautifulSoup

totalCar = 0
for pageNumber in range(3, 7):
    r = requests.get("https://www.autoscout24.com/lst/bmw?sort=standard&desc=0&offer=U&ustate=N%2CU&size=20&page="+
        str(pageNumber)+"&cy=D&mmm=47%7C%7C&mmm=9%7C%7C&atype=C&")
    soup = BeautifulSoup(r.content,"lxml")
    car_details = soup.find_all("div",attrs={"class":"cl-list-element cl-list-element-gap"})
    for detail in car_details:
        car_link = "https://www.autoscout24.com"+detail.a.get("href")
        car_r = requests.get(car_link)
        print(car_link)
        car_soup = BeautifulSoup(car_r.content,"lxml")
        # Look up the details section once and reuse it for every field
        section = car_soup.find("div",attrs={"class":"cldt-categorized-data cldt-data-section sc-pull-right"})
        car_make = section.select("dl > dd:nth-of-type(1)")[0].text
        # All links inside the definition list: model, year, color and (optionally) body
        a = section.select("dl > dd > a")
        car_model = a[0].text
        car_year = a[1].text
        car_color = a[2].text
        car_body = a[3].text if len(a) > 3 else '-'  # <-- check whether body information is present
        print("Make:{} Model:{} Year:{} Color:{} Body:{}".format(car_make,car_model,car_year,car_color,car_body))
        print("-"*20)
        totalCar+=1
    print(totalCar)

Prints:

...
--------------------
https://www.autoscout24.com/offers/mercedes-benz-a-180-blueefficiency-limousine-5tuerig-gasoline-grey-73cbbad4-ab1c-4163-a7cf-76037408fcb8
Make:
Mercedes-Benz
 Model:A 180 Year:2009 Color:Grey Body:Sedans
--------------------
https://www.autoscout24.com/offers/audi-a4-ambiente-1-8-ahk-xenon-sitzh-pdc-tempom-8fach-gasoline-black-f6517012-9dfb-4d93-a7dd-d0b9b9bdbbc6
Make:
Audi
 Model:A4 Year:2008 Color:Black Body:Sedans
--------------------
80
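
If other fields can be missing as well, the same length check could be wrapped in a small helper instead of being repeated for every index. This is only a minimal sketch: text_at is a hypothetical name, not part of BeautifulSoup, and it assumes the items behave like BeautifulSoup tags, i.e. they expose a .text attribute. Unlike the code above, it also strips surrounding whitespace.

```python
from types import SimpleNamespace


def text_at(elements, index, default="-"):
    """Return the stripped text of elements[index], or default if the index is out of range."""
    if 0 <= index < len(elements):
        return elements[index].text.strip()
    return default


# Stand-ins for the <a> tags returned by select(); a real run would pass that list instead.
links = [
    SimpleNamespace(text="A 180"),
    SimpleNamespace(text="2009"),
    SimpleNamespace(text="Grey"),
]

car_model = text_at(links, 0)
car_body = text_at(links, 3)  # index 3 does not exist here, so the default is used
print(car_model, car_body)
```

In the loop above, this would be used as car_body = text_at(a, 3). Note that this only guards against a short list; it does not detect the case where a missing field shifts the remaining values to a different position.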

