
Slice each DataFrame row into 3 windows with different slicing ranges

Python

慕碼人2483693 2023-03-16 16:24:53

I want to slice each row of my DataFrame into 3 windows. The slice indices are stored in a second DataFrame and differ from row to row. Afterwards I want to return a single DataFrame that holds the windows under MultiIndex columns. Within each window, rows that are shorter than the longest row of that window should be padded with NaN. My actual DataFrame has about 100,000 rows and 600 columns, so I care about an efficient solution.

Consider the following example. This is the DataFrame I want to split into 3 windows:

>>> df
    0   1   2   3   4   5   6   7
0   0   1   2   3   4   5   6   7
1   8   9  10  11  12  13  14  15
2  16  17  18  19  20  21  22  23

The second DataFrame holds my slice indices and has the same number of rows as df:

>>> df_slice
   0  1
0  3  5
1  2  6
2  4  7

I tried slicing the windows like this:

first_window = df.iloc[:, :df_slice.iloc[:, 0]]
first_window.columns = pd.MultiIndex.from_tuples([("A", c) for c in first_window.columns])

second_window = df.iloc[:, df_slice.iloc[:, 0] : df_slice.iloc[:, 1]]
second_window.columns = pd.MultiIndex.from_tuples([("B", c) for c in second_window.columns])

third_window = df.iloc[:, df_slice.iloc[:, 1]:]
third_window.columns = pd.MultiIndex.from_tuples([("C", c) for c in third_window.columns])

result = pd.concat([first_window, second_window, third_window], axis=1)

This gives me the following error:

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [0    3
1    2
2    4
Name: 0, dtype: int64] of <class 'pandas.core.series.Series'>

My expected output looks like this:

>>> result
     A                   B                   C
     0    1    2    3    4    5    6    7    8    9   10
0    0    1    2  NaN    3    4  NaN  NaN    5    6    7
1    8    9  NaN  NaN   10   11   12   13   14   15  NaN
2   16   17   18   19   20   21   22  NaN   23  NaN  NaN

Is there an efficient solution to my problem that does not iterate over every row of the DataFrame?
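The error arises because a slice bound passed to iloc has to be a single integer, while df_slice.iloc[:, 0] is a whole Series. To pin down what the expected output means, here is a minimal row-by-row baseline. It is only a sketch for checking results on small inputs, not the efficient solution the question asks for; the function name slice_rows_loop is hypothetical and does not appear in the original post.

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame({0: [3, 2, 4], 1: [5, 6, 7]})


def slice_rows_loop(df, df_slice):
    """Hypothetical reference implementation: cut every row at its own two indices."""
    values = df.to_numpy()
    starts = df_slice.iloc[:, 0].to_numpy()
    stops = df_slice.iloc[:, 1].to_numpy()
    n_cols = values.shape[1]
    # Each window is as wide as the longest piece any row contributes to it.
    widths = [int(starts.max()), int((stops - starts).max()), int((n_cols - stops).max())]
    blocks = [np.full((len(df), w), np.nan) for w in widths]
    for i, (row, s, e) in enumerate(zip(values, starts, stops)):
        # Left-align each piece; the rest of the block stays NaN.
        for block, piece in zip(blocks, (row[:s], row[s:e], row[e:])):
            block[i, :len(piece)] = piece
    out = pd.DataFrame(np.hstack(blocks), index=df.index)
    labels = np.repeat(["A", "B", "C"], widths)
    out.columns = pd.MultiIndex.from_arrays([labels, np.arange(out.shape[1])])
    return out


expected = slice_rows_loop(df, df_slice)
print(expected)
```

Because it loops over rows in Python, this baseline is likely too slow for 100,000 rows, but it gives a reference to compare a vectorized result against.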


1 Answer

隔江千里

Contributed 1906 experience points, received more than 10 likes

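Here is a solution that uses melt and then pivot_table, plus some logic to:

Determine the three groups "A", "B", and "C".
Shift the columns to the left so that NaN only appears on the right-hand side of each window.
Rename the columns to get the expected output.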
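The first two steps can be illustrated on the example data. The sketch below builds the long format, labels each cell with its window, and computes the left-shifted column position. The column names c_0 and c_1 for df_slice are an assumption (the question's frame uses 0 and 1), and np.select is used here only as a compact way to assign the groups.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(3, 8))
# Assumption: the slice columns are named c_0 and c_1.
df_slice = pd.DataFrame({"c_0": [3, 2, 4], "c_1": [5, 6, 7]})

# Long format: one row per (row index, column position, value).
long = df.reset_index().melt(id_vars="index")
long["variable"] = pd.to_numeric(long["variable"])
long = long.merge(df_slice, left_on="index", right_index=True)

# Step 1: label each cell with its window.
long["group"] = np.select(
    [long["variable"] < long["c_0"], long["variable"] < long["c_1"]],
    ["A", "B"],
    default="C",
)

# Step 2: shift each row's window so it starts at the window's leftmost column.
row_start = long.groupby(["group", "index"])["variable"].transform("min")
group_start = long.groupby("group")["variable"].transform("min")
long["shifted"] = long["variable"] - (row_start - group_start)

row0 = long[long["index"] == 0].sort_values("variable")
print(row0[["variable", "value", "group", "shifted"]])
```

For row 0, window B covers columns 3 and 4, while the earliest B column over all rows is 2, so those two cells move one position to the left.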

import pandas as pd

# df and df_slice are the frames from the question. The code below refers to
# the slice columns as c_0 and c_1, so rename them first (assumed names).
slices = df_slice.set_axis(["c_0", "c_1"], axis=1)

# reshape to long format: one row per (row index, column, value)
t = df.reset_index().melt(id_vars="index")
t = pd.merge(t, slices, left_on="index", right_index=True)
t["variable"] = pd.to_numeric(t["variable"])

# assign every cell to one of the three groups
t.loc[t.variable < t.c_0, "group"] = "A"
t.loc[(t.variable >= t.c_0) & (t.variable < t.c_1), "group"] = "B"
t.loc[t.variable >= t.c_1, "group"] = "C"

# shift relevant values to the left
shift_val = t.groupby(["group", "index"]).variable.transform("min") - t.groupby(["group"]).variable.transform("min")
t["variable"] = t["variable"] - shift_val

# extract a, b, and c groups, and create a multi-level index for their columns
df_a = pd.pivot_table(t[t.group == "A"], index="index", columns="variable", values="value")
df_a.columns = pd.MultiIndex.from_product([["a"], df_a.columns])

df_b = pd.pivot_table(t[t.group == "B"], index="index", columns="variable", values="value")
df_b.columns = pd.MultiIndex.from_product([["b"], df_b.columns])

df_c = pd.pivot_table(t[t.group == "C"], index="index", columns="variable", values="value")
df_c.columns = pd.MultiIndex.from_product([["c"], df_c.columns])

# combine the windows and renumber the second column level from 0
res = pd.concat([df_a, df_b, df_c], axis=1)
res.columns = pd.MultiIndex.from_tuples([(c[0], i) for i, c in enumerate(res.columns)])

print(res)

The output is:

          a                       b                       c
          0     1     2     3     4     5     6     7     8     9    10
index
0       0.0   1.0   2.0   NaN   3.0   4.0   NaN   NaN   5.0   6.0   7.0
1       8.0   9.0   NaN   NaN  10.0  11.0  12.0  13.0  14.0  15.0   NaN
2      16.0  17.0  18.0  19.0  20.0  21.0  22.0   NaN  23.0   NaN   NaN
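The long format holds one row per cell, which may get heavy for a frame of about 100,000 rows and 600 columns. As an alternative sketch, not part of the answer above, the windows can be gathered directly with NumPy broadcasting and np.take_along_axis, looping only over the three windows rather than over rows. The function name slice_rows_numpy and its labels parameter are hypothetical, and the sketch assumes df_slice holds two integer columns with 0 <= first <= second <= number of columns.

```python
import numpy as np
import pandas as pd


def slice_rows_numpy(df, df_slice, labels=("A", "B", "C")):
    """Hypothetical helper: split each row at its own cut points, NaN-padded."""
    values = df.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    cuts = df_slice.to_numpy()
    # Per-row window boundaries: [0, cut_0, cut_1, n_cols].
    bounds = np.column_stack(
        [np.zeros(n_rows, dtype=int), cuts, np.full(n_rows, n_cols)]
    )
    blocks = []
    for k in range(len(labels)):
        start, stop = bounds[:, k], bounds[:, k + 1]
        width = int((stop - start).max())
        # Column positions to read for each row, starting at that row's window start.
        pos = start[:, None] + np.arange(width)
        valid = pos < stop[:, None]
        # Clip so positions past the window still index legally, then blank them.
        picked = np.take_along_axis(values, np.minimum(pos, n_cols - 1), axis=1)
        blocks.append(np.where(valid, picked, np.nan))
    widths = [b.shape[1] for b in blocks]
    out = pd.DataFrame(np.hstack(blocks), index=df.index)
    out.columns = pd.MultiIndex.from_arrays(
        [np.repeat(labels, widths), np.arange(out.shape[1])]
    )
    return out


df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame({0: [3, 2, 4], 1: [5, 6, 7]})
result = slice_rows_numpy(df, df_slice)
print(result)
```

On the example data this yields the same values as the output above, with uppercase window labels as in the question. Whether it is actually faster on the full data set would need to be measured.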

2023-03-16
