How do I concatenate multiple CSV files into an xarray and define the coordinates?

Python

慕桂英546537 2022-06-22 15:34:32

I have multiple CSV files with the same rows and columns; the data they contain varies by date. Each CSV file belongs to a different date, which is given in its name, e.g. data.2018-06-01.csv. A minimal example of my data: I have 2 files, data.2018-06-01.csv and data.2019-06-01.csv, which contain, respectively,

    user_id, weight, status
    001, 70, healthy
    002, 90, healthy

and

    user_id, weight, status
    001, 72, healthy
    002, 103, obese

My question: how can I concatenate the CSV files into an xarray and define user_id and date as its coordinates? I tried the following code:

    df_all = []
    date_arr = []
    for f in ['data.2018-06-01.csv', 'data.2019-06-01.csv']:
        date = f.split('.')[1]
        df = pd.read_csv(f)
        df_all.append(df)
        date_arr.append(date)
    x_arr = xr.concat([df.to_xarray() for df in df_all], coords=[date_arr, 'user_id'])

but coords=[...] causes an error. What can I do? Thanks.

2 Answers

慕的地8271018

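Recall that although xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, it is inspired by pandas. So, to answer the question, you can proceed as follows.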

    import os
    from glob import glob

    import pandas as pd

    # Assumption: data_path is the directory that holds the CSV files
    data_path = "."

    # Get the list of all the CSV files in data_path
    csv_flist = glob(os.path.join(data_path, "*.csv"))

    df_list = []
    for _file in csv_flist:
        # get the file name from the full path
        file_name = os.path.basename(_file)
        # extract the date from the file name, e.g. "data.2018-06-01.csv"
        date = file_name.split(".")[1]
        # read the data in _file
        df = pd.read_csv(_file)
        # add a date column; every row in df was recorded on the same date
        df["date"] = pd.to_datetime(date)
        # append df to df_list
        df_list.append(df)
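One detail worth checking when reading the files: the sample data in the question has a space after each comma and zero-padded IDs such as 001. With the default settings, pandas would keep the leading space in the column names and parse 001 as the integer 1. A minimal sketch, assuming the files look exactly like the sample (the io.StringIO text stands in for a real file), of two pandas.read_csv options that handle this:

```python
import io

import pandas as pd

# Sample content copied from the question, including the spaces after commas.
raw = "user_id, weight, status\n001, 70, healthy\n002, 90, healthy\n"

# skipinitialspace=True drops the blank after each comma, so the columns are
# named "weight" and "status" rather than " weight" and " status".
# dtype={"user_id": str} keeps the IDs as "001"/"002" instead of 1/2.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True, dtype={"user_id": str})

print(df.columns.tolist())
print(df["user_id"].tolist())
```

Whether to keep the IDs as strings is a design choice; the rest of this answer shows them as integers.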

Let's check, for example, the first df in df_list:

    print(df_list[0])

        status  user_id  weight       date
    0  healthy        1      72 2019-06-01
    1    obese        2     103 2019-06-01

Concatenate all the dfs along axis=0:

    df_all = pd.concat(df_list, ignore_index=True).sort_index()
    print(df_all)

        status  user_id  weight       date
    0  healthy        1      72 2019-06-01
    1    obese        2     103 2019-06-01
    2  healthy        1      70 2018-06-01
    3  healthy        2      90 2018-06-01

Set the index of df_all to a two-level MultiIndex with levels[0] = "date" and levels[1] = "user_id".

    data = df_all.set_index(["date", "user_id"]).sort_index()
    print(data)

                         status  weight
    date       user_id
    2018-06-01 1        healthy      70
               2        healthy      90
    2019-06-01 1        healthy      72
               2          obese     103

You can then convert the resulting pandas.DataFrame into an xarray.Dataset using .to_xarray(), as follows.

    xds = data.to_xarray()
    print(xds)

    <xarray.Dataset>
    Dimensions:  (date: 2, user_id: 2)
    Coordinates:
      * date     (date) datetime64[ns] 2018-06-01 2019-06-01
      * user_id  (user_id) int64 1 2
    Data variables:
        status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
        weight   (date, user_id) int64 70 90 72 103
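Once date and user_id are coordinates, values can be looked up by label. The following is a minimal end-to-end sketch rather than a definitive implementation: it writes the two sample files from the question into a temporary directory (the file contents are written without spaces after commas, and the names samples, tmp_dir and frames are only illustrative), builds the Dataset with the steps above, and selects one weight.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate the two sample files from the question in a temporary directory.
samples = {
    "data.2018-06-01.csv": "user_id,weight,status\n001,70,healthy\n002,90,healthy\n",
    "data.2019-06-01.csv": "user_id,weight,status\n001,72,healthy\n002,103,obese\n",
}
tmp_dir = Path(tempfile.mkdtemp())
for name, text in samples.items():
    (tmp_dir / name).write_text(text)

frames = []
for csv_path in sorted(tmp_dir.glob("data.*.csv")):
    df = pd.read_csv(csv_path)
    # "data.2018-06-01.csv" -> "2018-06-01"
    df["date"] = pd.to_datetime(csv_path.name.split(".")[1])
    frames.append(df)

# Same steps as above: concatenate, index by (date, user_id), convert.
xds = pd.concat(frames, ignore_index=True).set_index(["date", "user_id"]).to_xarray()

# Label-based lookup on both coordinates.
weight = xds["weight"].sel(date=pd.Timestamp("2019-06-01"), user_id=2).item()
print(weight)
```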

This fully answers the question.
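As for the original attempt: as far as I know, the coords argument of xarray.concat controls which existing coordinate variables get concatenated (values such as "minimal", "different", "all", or a list of names). It does not accept coordinate values, which is probably why coords=[date_arr, 'user_id'] raises an error. The new dimension is given through dim instead. A minimal sketch, assuming each frame is first indexed by user_id (the in-memory frames below stand in for the CSV files):

```python
import pandas as pd
import xarray as xr

# In-memory stand-ins for the two CSV files from the question.
frames = {
    "2018-06-01": pd.DataFrame(
        {"user_id": [1, 2], "weight": [70, 90], "status": ["healthy", "healthy"]}
    ),
    "2019-06-01": pd.DataFrame(
        {"user_id": [1, 2], "weight": [72, 103], "status": ["healthy", "obese"]}
    ),
}

# Index each frame by user_id so that to_xarray() uses it as the dimension.
datasets = [df.set_index("user_id").to_xarray() for df in frames.values()]

# Passing a named pandas.Index as `dim` creates the new "date" dimension
# and its coordinate in one step.
dates = pd.Index(pd.to_datetime(list(frames)), name="date")
x_arr = xr.concat(datasets, dim=dates)

print(x_arr["weight"].values)
```

This keeps the structure of the code in the question; the set_index/to_xarray route above gets to the same (date, user_id) layout through pandas.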

Answered 2022-06-22

阿晨1998

Try this:

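    import glob

    import pandas as pd

    # placeholder: replace with the directory that holds the CSV files
    path = r'ur file'
    all_file = glob.glob(path + "/*.csv")

    li = []
    for filename in all_file:
        # read each file, using its first row as the header
        df = pd.read_csv(filename, index_col=None, header=0)
        li.append(df)

    # stack all the frames row-wise once the loop has finished
    frame = pd.concat(li, axis=0, ignore_index=True)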
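Note that this concatenation does not record which date each row came from. One possible way to keep it, sketched here under the assumption that the date has already been extracted from each file name (the in-memory frames stand in for the results of pd.read_csv), is to pass a dict to pd.concat so that its keys become an outer index level:

```python
import pandas as pd

# In-memory stand-ins for frames read with pd.read_csv, keyed by the date
# taken from each file name.
frames = {
    "2018-06-01": pd.DataFrame({"user_id": [1, 2], "weight": [70, 90]}),
    "2019-06-01": pd.DataFrame({"user_id": [1, 2], "weight": [72, 103]}),
}

# Concatenating a dict uses its keys as the outer index level; `names`
# labels that level, and the inner level keeps the name "user_id".
frame = pd.concat(
    {pd.Timestamp(date): df.set_index("user_id") for date, df in frames.items()},
    names=["date"],
)

print(frame.index.names)
```

With xarray installed, frame.to_xarray() then turns the two index levels into the date and user_id coordinates.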

Answered 2022-06-22
