Why doesn't the code below manage to scrape any images? (I'm trying to scrape pictures of girls ^_^)
from urllib import request
from bs4 import BeautifulSoup
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
f = request.urlopen("https://www.zhihu.com/question/29815334")
html = f.read()
soup = BeautifulSoup(html, "html5lib")
# print(soup.prettify())
p = soup.select("img")
# print(p[2])
# for i in p:
#     print(i)
# x = p[0]["src"]
# print(x)
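# Download every <img> whose src is an absolute URL.
for i, img in enumerate(p):
    x = img.get("src")  # None when the tag has no src attribute
    if not x or not x.startswith('http'):
        continue
    name = "/Users/lcycq/Desktop/BT/%d.jpg" % i
    # Pass the variable name, not the literal string 'name'.
    request.urlretrieve(x, filename=name)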
2017-05-27
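Print the raw html first. If the output looks wrong or garbled, decode it with decode('utf-8') before parsing. Then look closely at the images that come back without a URL: the problem is probably in how the page's images are being parsed, since the address is not necessarily in the src attribute.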
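A minimal sketch of that advice, assuming the real address may sit in an attribute other than src. The attribute names in CANDIDATE_ATTRS are guesses to be checked against the printed html, not something confirmed for this page, and the sample markup is made up. To stay self-contained it uses the standard library's html.parser instead of BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical attribute names that might hold the real image address;
# adjust this tuple after inspecting the printed html.
CANDIDATE_ATTRS = ("data-original", "data-actualsrc", "data-src", "src")


class ImgCollector(HTMLParser):
    """Collect at most one absolute URL per <img> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        for key in CANDIDATE_ATTRS:
            value = attr_map.get(key)
            if value and value.startswith("http"):
                self.urls.append(value)
                break  # the first usable attribute wins


def image_urls(html_bytes, encoding="utf-8"):
    """Decode the raw bytes, then return the image URLs found in them."""
    text = html_bytes.decode(encoding, errors="replace")
    parser = ImgCollector()
    parser.feed(text)
    parser.close()
    return parser.urls


# Made-up sample: a placeholder src, a normal src, and a tag with no address.
sample = (
    b'<img src="data:image/svg+xml;utf8,placeholder" '
    b'data-original="https://example.com/a.jpg">'
    b'<img src="https://example.com/b.png">'
    b'<img alt="no address at all">'
)
print(image_urls(sample))
```

In the question's script, the bytes returned by f.read() could be passed to image_urls, and each resulting URL handed to request.urlretrieve.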