How to remove the decimal point from a string using pandas

Python

絕地無雙 2021-12-09 15:15:50

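I'm reading an xls file and converting it to a csv file in Databricks using PySpark. In the xls file my input data is the string 101101114501700, but after converting it to CSV with pandas and writing it to the data lake folder, the data shows up as 101101114501700.0. My code is below. Can anyone explain why my data gets a decimal part?

import os
import time

import pandas as pd

for file in os.listdir("/path/to/file"):
    if file.endswith(".xls"):
        filepath = os.path.join("/path/to/file", file)
        filepath_pd = pd.ExcelFile(filepath)
        names = filepath_pd.sheet_names
        # Read every sheet and stack them into one DataFrame
        df = pd.concat([filepath_pd.parse(name) for name in names])
        # Write the result as CSV (to_csv returns None when given a path, so nothing to assign)
        df.to_csv(os.path.join("/path/to/file", file.split('.')[0] + ".csv"), sep=',', encoding='utf-8', index=False)
print(time.strftime("%Y%m%d-%H%M%S") + ": XLS files converted to CSV and moved to folder")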


2 Answers

尚方寶劍之說


I think the field is automatically parsed as a float when the Excel file is read. I would correct it afterwards:

df['column_name'] = df['column_name'].astype(int)

If your column contains null values, it cannot be converted to an integer, so you need to fill the nulls first:

df['column_name'] = df['column_name'].fillna(0).astype(int)
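A small sketch of why the null check matters, using a hypothetical float column with one empty cell. It uses "int64" explicitly rather than int, since plain int can map to a 32-bit type on some platforms and 101101114501700 would not fit. As a swapped-in alternative to fillna(0), which turns the empty cell into a real 0, pandas' nullable "Int64" dtype keeps the missing value as <NA> and writes it as an empty CSV field:

```python
import numpy as np
import pandas as pd

# Hypothetical column that pandas parsed as float64 because one cell is empty.
s = pd.Series([101101114501700.0, np.nan], name="column_name")

# A plain int64 column cannot represent NaN, so this cast raises a ValueError.
try:
    s.astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("astype failed:", exc)

# The answer's approach: fill first, then cast (the empty cell becomes 0).
filled = s.fillna(0).astype("int64")

# Alternative: the nullable "Int64" dtype keeps the empty cell as <NA>.
nullable = s.astype("Int64")

print(filled.tolist())
print(nullable.to_frame().to_csv(index=False))
```

Which one fits depends on whether a 0 is an acceptable stand-in for "no value" in your data.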

Then you can concatenate and save it the way you do now.

2021-12-09

繁華開滿天機


Your problem has nothing to do with Spark or PySpark. It is related to pandas.

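This happens because pandas automatically interprets and infers the data type of each column. Since all the values in your column are numbers, pandas reads it as a numeric column, and a numeric column that also contains empty cells (NaN) can only be stored as float, so to_csv writes 101101114501700.0.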
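One common way a column of whole numbers ends up as float is concatenating sheets whose columns differ, which is what the pd.concat call in the question does if the sheets are not identical. A minimal reproduction, assuming two hypothetical in-memory "sheets" and a hypothetical account_id column:

```python
import pandas as pd

# Two hypothetical sheets: only the first one has the 'account_id' column.
sheet1 = pd.DataFrame({"account_id": [101101114501700], "name": ["a"]})
sheet2 = pd.DataFrame({"name": ["b"]})

# Concatenating frames with different columns fills the gap with NaN,
# and an integer column that must hold NaN is upcast to float64.
df = pd.concat([sheet1, sheet2])
print(df["account_id"].dtype)

# The float value is then written with a trailing ".0".
out = df.to_csv(index=False)
print(out)
```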

To avoid this, the pandas.ExcelFile.parse method accepts a parameter called converters, which you can use to tell pandas the data type of specific columns, like this:

# if you want one specific column as string

df = pd.concat([filepath_pd.parse(name, converters={'column_name': str}) for name in names])
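A runnable sketch of the same idea without an Excel file. It assumes pd.read_csv as a stand-in, since read_csv accepts the same converters parameter and an in-memory CSV needs no Excel engine. One difference: with read_csv the converter receives the raw cell text, whereas ExcelFile.parse hands it the value the Excel reader produced, so this only approximates the xls case. 'column_name' and 'other' are hypothetical:

```python
import io

import pandas as pd

# In-memory stand-in for a sheet; the second 'column_name' cell is empty.
raw = "column_name,other\n101101114501700,a\n,b\n"

# Default inference: the empty cell forces float64, so the CSV gets ".0".
default = pd.read_csv(io.StringIO(raw))
print(default.to_csv(index=False))

# converters={'column_name': str}: each cell is passed through str,
# so the column stays text and no float is ever created.
as_text = pd.read_csv(io.StringIO(raw), converters={"column_name": str})
print(as_text.to_csv(index=False))
```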

Or

# if you want all columns as string,
# and you have multiple sheets that do not have the same columns;
# this merges all sheets into one dataframe
def get_converters(excel_file, sheet_name, dt_cols):
    # read the sheet once to learn its column names
    cols = excel_file.parse(sheet_name).columns
    converters = {col: str for col in cols if col not in dt_cols}
    # columns listed in dt_cols are parsed as dates instead of strings
    for col in dt_cols:
        converters[col] = pd.to_datetime
    return converters

df = pd.concat([filepath_pd.parse(name, converters=get_converters(filepath_pd, name, ['date_column'])) for name in names]).reset_index(drop=True)
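The converters-building pattern can be tried on an in-memory CSV as well. This is a sketch under the same assumption as before (pd.read_csv standing in for ExcelFile.parse), and the columns id, amount and date_column are hypothetical. It reads only the header to get the column names, maps every column to str except the date columns, which go through pd.to_datetime one cell at a time:

```python
import io

import pandas as pd

# In-memory stand-in for one sheet; the column names are hypothetical.
raw = "id,amount,date_column\n101101114501700,12,2021-12-09\n,7,2021-12-10\n"

# Read just the header (nrows=0) to learn the column names.
cols = pd.read_csv(io.StringIO(raw), nrows=0).columns
dt_cols = ["date_column"]

# Every column as str, except date columns, which are parsed as dates.
converters = {col: str for col in cols if col not in dt_cols}
for col in dt_cols:
    converters[col] = pd.to_datetime

df = pd.read_csv(io.StringIO(raw), converters=converters)
print(df.to_csv(index=False))
```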

Or

# if you want all columns as string
# and all your sheets have the same columns
cols = filepath_pd.parse().columns  # columns of the first sheet
dt_cols = ['date_column']
converters = {col: str for col in cols if col not in dt_cols}
for col in dt_cols:
    converters[col] = pd.to_datetime

df = pd.concat([filepath_pd.parse(name, converters=converters) for name in names]).reset_index(drop=True)
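If every column should simply stay text, a swapped-in alternative to building a converters dict is the dtype parameter, which both pd.read_excel and pd.read_csv accept. Unlike converters=str, empty cells stay NaN and are written back as empty fields. A sketch on an in-memory CSV, with hypothetical column names:

```python
import io

import pandas as pd

# In-memory stand-in for a sheet; column names are hypothetical.
raw = "id,amount\n101101114501700,12\n,7\n"

# dtype=str keeps every column as text; the empty 'id' cell stays NaN.
df = pd.read_csv(io.StringIO(raw), dtype=str)

out = df.to_csv(index=False)
print(out)
```

The trade-off is that date columns are also left as text, so any date parsing has to happen afterwards.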

2021-12-09
