I'm just starting to learn Python web scraping, and I want to crawl Sogou's WeChat official-account (公眾號) search results. The problem is that the page I fetch is less complete than what the same URL shows when opened directly in a browser. Below is my basic implementation; the code is very simple, but the fetched page contains no "最近文章" (recent articles) section, even though visiting the URL directly in a browser does show it. Could someone please point me in the right direction? Thanks!

# -*- coding: utf-8 -*-
import urllib2
import sys
import urllib
from bs4 import BeautifulSoup
reload(sys)
sys.setdefaultencoding('utf8')   # Python 2 workaround so UTF-8 text can be written without encode errors
url = 'http://weixin.sogou.com/gzh?openid=oIWsFt5l9RDYeAjdXZBYtGzbH0JI'
print url
i_headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"}
req = urllib2.Request(url, headers=i_headers)
content = urllib2.urlopen(req).read()
soup = BeautifulSoup(content, 'html.parser')    # explicit parser avoids the BeautifulSoup warning
print soup
siteUrls = soup.findAll(attrs={'class': 'img_box2'})   # blocks that should hold the account info
print siteUrls
file_object = open('test.htm', 'w+')   # save the fetched HTML for offline inspection
file_object.write(content)
file_object.close()
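One possible cause, offered here only as an assumption, is that the "最近文章" block is filled in by JavaScript after the initial HTML arrives (or is only served when Sogou's anti-crawler cookies are present), while urllib2 returns just the raw HTML and never executes any scripts. A minimal sketch of how one might verify this by rendering the same URL in a real browser, assuming Selenium and a matching ChromeDriver are installed (variable names below are illustrative):

# -*- coding: utf-8 -*-
# Render the page in a real browser, then check for the content missing from the urllib2 result.
from selenium import webdriver
from bs4 import BeautifulSoup

url = 'http://weixin.sogou.com/gzh?openid=oIWsFt5l9RDYeAjdXZBYtGzbH0JI'

driver = webdriver.Chrome()      # requires ChromeDriver on PATH (assumption)
driver.get(url)
rendered = driver.page_source    # HTML after any JavaScript has run
driver.quit()

soup = BeautifulSoup(rendered, 'html.parser')
print len(soup.findAll(attrs={'class': 'img_box2'}))   # compare with the urllib2 result
print u'最近文章' in rendered                           # does the rendered page contain the section?

If the rendered page does contain "最近文章" while the dump in test.htm does not, the content is being loaded dynamically, and the options are to drive a real browser as above or to find the AJAX endpoint the page calls and request that URL directly.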