
Looking for some expert advice: a list slicing problem inside a Python for loop


滄海一幻覺 2019-10-25 19:53:52
This program scrapes information about every movie on the Douban Top 250 pages (title, score, number of raters, and quote). The problem is in the parse_page function: Top 250 spans ten pages, and the first eight are extracted successfully, but extracting the last two fails with "list index out of range". The data is displayed when logged inside the for loop, yet the error appears when it is accessed outside the loop. Any help is appreciated.

import socket
import ssl


def log(*args, **kwargs):
    print('log:', *args, **kwargs)


def parse_url(url):
    # Extract the protocol and URI
    protocol = url.split('://')[0]
    if protocol == 'http':
        protocol = 'http'
        uri = url.split('://')[1]
    elif protocol == 'https':
        protocol = 'https'
        uri = url.split('://')[1]
    else:
        uri = url
    # Extract the host address
    index = uri.find('/')
    if index == -1:
        host = uri
    else:
        host = uri.split('/')[0]
    # Extract the port number
    http_ports = {
        'http': 80,
        'https': 443,
    }
    if protocol in http_ports:
        port = http_ports[protocol]
    else:
        port = uri.split(':')[1]
    # Extract the path
    if index == -1:
        path = '/'
    else:
        path = '/' + uri.split('/')[1]
    return protocol, host, port, path


def socket_by_protocol(protocol):
    if protocol == 'http':
        s = socket.socket()
    elif protocol == 'https':
        s = ssl.wrap_socket(socket.socket())
    return s


def response_by_socket(s):
    buffer_size = 1024
    all_data = b''
    while True:
        response = s.recv(buffer_size)
        if len(response) == 0:
            break
        all_data += response
    return all_data.decode()


def parse_response(response):
    errors = ''
    if response:
        header, body = response.split('\r\n\r\n', 1)
        header_line = header.split('\r\n')
        status_code = header_line[0].split()[1]
        headers = {}
        for line in header_line[1:]:
            k, v = line.split(':')
            headers[k] = v
    else:
        errors = 'response is null value.'
        headers = {}
        body = ''
    return status_code, headers, body


def construct_request(host, path):
    request = 'GET {} HTTP/1.1\r\nhost: {}\r\nconnection: close\r\n\r\n'.format(path, host)
    return request.encode()


def get(url, query):
    protocol, host, port, path = parse_url(url)
    s = socket_by_protocol(protocol)
    s.connect((host, port))
    cons_path = '{}?{}={}'.format(path, query[1], query[0])
    request = construct_request(host, cons_path)
    s.send(request)
    response = response_by_socket(s)
    status_code, header, body = parse_response(response)
    return status_code, header, body


def parse_page(source=''):
    mv_name = []
    mv_score = []
    mv_people = []
    mv_quot = []
    first_split = str(source.split('').pop(1))
    second_split = str(first_split.split('').pop(0))
    third_split = second_split.split('')
    del third_split[0]
    for line in third_split:
        line = line.split('')
        del line[1]
        # Extract the title
        raw_single_mv_name = line[0].split('')[0].split('')[1]
        single_mv_name = raw_single_mv_name.split('')[0]
        mv_name.append(single_mv_name)
        # Extract the score and the number of raters
        raw_single_mv_evaluate = line[0].split('')[1].split('')
        single_mv_score = raw_single_mv_evaluate[1].split('">')[1]
        mv_score.append(single_mv_score)
        single_mv_people = raw_single_mv_evaluate[3].split('')[1]
        mv_people.append(single_mv_people)
        # Extract the quote
        # log(mv_name, mv_score, mv_people, line[0])
        # log(line[0].split('')[1])
        raw_singe_mv_quot = line[0].split('')[1]
        # log(raw_singe_mv_quot)
        single_mv_quot = raw_singe_mv_quot.split('')[0]
        # log(single_mv_quot)
        mv_quot.append(single_mv_quot)
        # mv_quot has a value here
        log(mv_quot)
    # Why does mv_quot report "list index out of range" here?
    log(mv_quot)
    # log(len(mv_name), len(mv_score), len(mv_people), len(mv_quot))
    return mv_name, mv_score, mv_people, mv_quot


def main():
    url = "https://movie.douban.com/top250"
    protocol, host, port, path = parse_url(url)
    log(protocol, host, port, path)
    queries = {}
    for v in [value for value in range(250, 0, -25)]:
        queries[v] = 'start'
    log(queries)
    i = 0
    for q in queries.items():
        try:
            status_code, header, body = get(url, q)
            """
            if i == 8:
                log(status_code, header, body)
            """
            mvo_name, mvo_score, mvo_people, mvo_quot = parse_page(source=body)
            # log(mvo_name)
            # log(mvo_score)
            # log(mvo_people)
            log(mvo_quot)
            i += 1
        except Exception as e:
            log(e)
            continue


if __name__ == '__main__':
    main()

2 Answers

四季花海


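This line is the problem:
raw_singe_mv_quot = line[0].split('')[1]
Broken into two steps, it is equivalent to:
tmp_list = line[0].split('')
raw_singe_mv_quot = tmp_list[1]
When the separator does not occur in line[0], split returns a list with a single element, so tmp_list may have length 1. In that case tmp_list[1] raises IndexError: list index out of range.
I have not traced the rest of the logic, so you will need to debug that part yourself.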
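One way to guard against this is to confirm that the separator was actually found before indexing into the result. The following is only a minimal sketch: the helper name `text_between` and the `<q>` / `</q>` markers are hypothetical, because the real separators are not shown in the question.

```python
def text_between(text, start, end, default=''):
    """Return the text between the first `start` and the next `end`.

    Falls back to `default` when either marker is missing, instead of
    raising IndexError the way text.split(start)[1] would.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return default
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return default
    return value


# Hypothetical sample entries: one with a quote, one without.
with_quote = 'title ... <q>Hope sets you free.</q> ...'
without_quote = 'title ... (this entry has no quote)'

# split() on a missing separator yields a single-element list,
# so indexing [1] on it is what triggers the error.
print(len(without_quote.split('<q>')))                 # 1

print(text_between(with_quote, '<q>', '</q>'))         # Hope sets you free.
print(repr(text_between(without_quote, '<q>', '</q>')))  # ''
```

In parse_page, a helper like this could replace the chained split(...)[1] calls, so that an entry with a missing field appends an empty string and the four lists stay the same length.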
                            
2019-10-25