

The extractText() function in PyPDF2 throws an error


Cats萌萌 2021-03-15 12:13:19
I am trying to extract text from a PDF so that I can analyse it, but when I try to extract the text from a page I get the following error.

Traceback (most recent call last):
  File "C:\Program Files (x86)\eclipse\plugins\org.python.pydev_2.7.4.2013051601\pysrc\pydevd_comm.py", line 765, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\Program Files (x86)\eclipse\plugins\org.python.pydev_2.7.4.2013051601\pysrc\pydevd_vars.py", line 376, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\pdf.py", line 1701, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\pdf.py", line 1783, in __init__
    stream = StringIO(stream.getData())
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\generic.py", line 801, in getData
    decoded._data = filters.decodeStreamData(self)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 228, in decodeStreamData
    data = ASCII85Decode.decode(data)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 170, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 170, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int

The relevant section of code is:

from PyPDF2 import PdfFileReader

for PDF_Entry in self.PDF_List:
    Pdf_File = PdfFileReader(open(PDF_Entry, "rb"))
    for pg_idx in range(0, Pdf_File.getNumPages()):
        page_Content = Pdf_File.getPage(pg_idx).extractText()
        for line in page_Content.split("\n"):
            self.Analyse_Line(line)

The error is thrown on the extractText() line.

2 Answers

吃雞游戲

Contributed 1829 experience points · received more than 7 upvotes

You are doing two things on one line. Try splitting them apart to narrow down the problem. Change:


page_Content = Pdf_File.getPage(pg_idx).extractText()

to


page = Pdf_File.getPage(pg_idx)

page_Content = page.extractText()

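and see which of the two lines the error occurs on. Also run the program from the command line instead of from Eclipse, just to make sure you get the same error. You say it happens at extractText(), but the line of your code that calls it does not appear in the traceback.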
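As a side note on the last frame of the traceback: the failing expression is `y in ' \n\r\t'`, and the message says `y` is an int. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that the stream data reaches the decoder as `bytes`, and iterating over `bytes` in Python 3 yields ints. The sketch below uses only the standard library to reproduce that behaviour. The function names and the bytes-aware variant are hypothetical illustrations, not PyPDF2 code and not the library's actual fix.

```python
# Whitespace to skip, once as text and once as bytes.
WHITESPACE_TEXT = " \n\r\t"
WHITESPACE_BYTES = b" \n\r\t"


def strip_whitespace_text_only(data):
    """Same list comprehension as in the traceback (filters.py, line 170)."""
    return [y for y in data if not (y in WHITESPACE_TEXT)]


def strip_whitespace_bytes_safe(data):
    """Hypothetical variant that accepts both str and bytes input."""
    if isinstance(data, str):
        return "".join(ch for ch in data if ch not in WHITESPACE_TEXT)
    # Iterating over bytes yields ints; `int in bytes` is a valid test.
    return bytes(b for b in data if b not in WHITESPACE_BYTES)


if __name__ == "__main__":
    sample = b"ab c\nd"  # made-up data standing in for a PDF stream

    try:
        strip_whitespace_text_only(sample)
    except TypeError as exc:
        # On Python 3 this is the same kind of TypeError as in the traceback.
        print("text-only filter failed:", exc)

    print("bytes-safe filter result:", strip_whitespace_bytes_safe(sample))
```

If this reading is right, the error is raised inside the library rather than in the calling code, so splitting the line as suggested above would show it on the `page.extractText()` line.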


Replied 2021-03-29