I am trying to convert an AGL bill into a DataFrame so that I can put the values I need into an Excel spreadsheet. I have been trying to use .replace() to swap the text in the rows for nothing so that only the numbers are left (in other words, to strip every word out of the DataFrame). A further problem is that each cell holds several words and numbers.

from tabula import read_pdf
import openpyxl
from openpyxl import load_workbook
import pandas as pd
import numpy as np

df1 = tabula.read_pdf('C:/Users/Blake/Desktop/Python/AGL_Bill.pdf', guess=False, pages=2)
df1.columns = ['Description', 'Blank', 'Values']
df1.drop(labels=None, axis=None, index=[0,1,3,4,7,8,25,26,19,15,16,20,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62], columns=None, level=None, inplace=True, errors='raise')
df1.drop(labels=None, axis=1, columns=['Values'], level=None, inplace=True, errors='raise')
df1['Description'].str.replace('kWh', '')
print (df1)
df1.to_csv('Tableone.csv', encoding='utf-8')

wb2 = load_workbook('C:/Users/Blake/Desktop/ETemplate.xlsx')
wb2.create_sheet('DATA')
wb2.save('C:/Users/Blake/Desktop/Template.xlsx')
1 Answer

米脂
If you are trying to replace the characters with nothing, run a regex over each cell to pull out the numbers, then join them back together.
import re
import pandas as pd

# Sample data standing in for the text cells of the bill
data = {'1': 'Some dumb data $200.22 for me', '2': 'Some more really dumb data $5.23'}
df = pd.DataFrame.from_dict(data, orient='index')
df.columns = ['Data']

def Num_Only(val):
    # Keep only the runs of digits and dots, joined by spaces
    return ' '.join(re.findall(r'[\d.]+', val))

df['New'] = df.Data.apply(Num_Only)
print(df)
This should output a new DataFrame like the one below. I have dropped the $ because it serves no purpose.

                               Data     New
1     Some dumb data $200.22 for me  200.22
2  Some more really dumb data $5.23    5.23
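For the 'Description' column in the question, the same idea can probably be written with pandas' vectorized string methods instead of apply. This is only a sketch: the sample rows are made up, and the 'Numbers' column name is hypothetical. Note that str.replace returns a new Series, so the result has to be assigned back, otherwise the column is left unchanged. The pattern here is also a little stricter than [\d.]+ in that it needs at least one digit, so a stray full stop is not picked up.

```python
import pandas as pd

# Made-up rows standing in for the 'Description' column of the bill (assumption).
df = pd.DataFrame({
    "Description": [
        "Peak usage 1234.5 kWh",
        "Usage 12.5 kWh at $0.30",
        "Supply charge 90 days",
        "No numbers here",
    ]
})

# str.replace returns a new Series; assign it back to keep the change.
df["Description"] = (
    df["Description"].str.replace("kWh", "", regex=False).str.strip()
)

# Find every number in each cell and join the matches with a space.
# \d+(?:\.\d+)? needs at least one digit, so a lone "." is never matched.
pattern = r"\d+(?:\.\d+)?"
df["Numbers"] = df["Description"].str.findall(pattern).str.join(" ")

print(df)
```

Cells with no digits end up as an empty string in 'Numbers', so they are easy to filter out before writing to Excel.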
Hope that gets you going.