How to split a pandas DataFrame based on the frequency of values

Python

蕪湖不蕪 2024-01-27 15:19:56

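I am interested in splitting this dataframe into 20 smaller dataframes based on the frequency of the entries in column B. B has numeric entries, some of which are repeated many times, as shown below.

A (index)    B (Column of interest)
0            1
1            2
2            2
3            2
4            3
...          ...
25643        5238
25644        5238
25645        5238
25646        5238
25647        5238

I would like one dataframe per frequency range: 1-10, 11-20, 21-30, ..., 191-200. That is, the 1-10 dataframe contains all entries of B that appear between 1 and 10 times in the whole dataframe. Likewise, the 11-20 dataframe contains all entries that appear between 11 and 20 times in the whole dataframe. In the end I should have 20 dataframes that together partition the main dataframe.

All I have managed so far is to filter column B against these ranges with the following code:

df.loc[(df['B'] > 0) & (df['B'] < 11)]
df.loc[(df['B'] > 10) & (df['B'] < 21)]
...
df.loc[(df['B'] > 190) & (df['B'] < 201)]

I have been thinking about using the groupby() function, but I have not found a way to group the column entries by their frequency. Any help is appreciated!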

1 Answer

慕容708150

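Count the occurrences of each value in the dataframe, group the frequencies into ranges of 10, then create a dict of DataFrames with one entry per range.
- The bin labels become the dict keys.
- The bins column is categorical, so .groupby creates a group for every label, even when the group is empty. pandas.DataFrame.empty is therefore used so that only non-empty groups are added to the dict of DataFrames.
- Replace g: dfg with g: pd.DataFrame(dfg.B) to keep only column B in the dict.
- Use dfg.reset_index(drop=True) or pd.DataFrame(dfg.B).reset_index(drop=True) to drop the original index.
- labels is used because the labels are easier to work with as dict keys.
- Without labels, the dict keys would be Interval objects such as Interval(10, 20, closed='right'), which are cumbersome.
- df.B.map(df.groupby('B')['B'].count()) also works, but is not necessary.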
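The notes above can be checked on a small, hand-countable example. The following is only a sketch: the toy data and the names `no_labels` and `interval_keys` are made up for illustration, and `observed` is passed explicitly to `groupby` on the assumption that a reasonably recent pandas version is in use.

```python
import pandas as pd

# Hypothetical toy data: the value 1 appears once, the value 2 appears 12 times
df = pd.DataFrame({'B': [1] + [2] * 12})
df['counts'] = df.B.map(df.B.value_counts())

bins = list(range(0, 201, 10))

# Without labels, pd.cut yields Interval categories, so the group keys are Intervals
no_labels = pd.cut(df.counts, bins=bins, right=True)
interval_keys = [key for key, _ in df.groupby(no_labels, observed=True)]

# With integer labels, keep only column B and drop the original index
df['bins'] = pd.cut(df.counts, bins=bins, right=True, labels=list(range(10, 201, 10)))
dfd = {
    g: dfg[['B']].reset_index(drop=True)
    for g, dfg in df.groupby('bins', observed=False)
    if not dfg.empty
}
```

Here `dfd[10]` holds the single row for the value 1 and `dfd[20]` holds the 12 rows for the value 2, each indexed from 0.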

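Use pandas.Series.value_counts() and pandas.Series.map to create a counts column in df that gives the frequency of each value in column B.
Use pd.cut to bin the frequencies into ranges.
Use pandas.DataFrame.groupby with a dict comprehension to create the dict of DataFrames, keyed by bin label.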

import pandas as pd
import numpy as np

# set up a test dataframe
np.random.seed(365)
df = pd.DataFrame({'B': np.random.randint(5238, size=(200000))})

# add a counts column to the dataframe
df['counts'] = df.B.map(df.B.value_counts())

# create a bins column for the frequency range
bins = range(0, 201, 10)
labels = range(10, 201, 10)
df['bins'] = pd.cut(df.counts, bins=bins, right=True, labels=labels)

# display(df.head())
      B  counts bins
0  2740      37   40
1  4897      41   50
2  4955      45   50
3   428      31   40
4   226      34   40

# create a dict of dataframes for the non-empty bins
dfd = {g: dfg for g, dfg in df.groupby('bins') if not dfg.empty}

# print dict keys
dfd.keys()

[out]:
dict_keys([20, 30, 40, 50, 60, 70])

# display(dfd[20].head())
          B  counts bins
5350   4986      19   20
5646   4952      20   20
11232  3728      19   20
11707  2819      20   20
13547  3728      19   20
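As a possible variation, not part of the answer above, the counts column can also be built with a groupby transform, and `observed=True` can stand in for the emptiness check. This sketch assumes a pandas version whose `groupby` accepts the `observed` argument, and it reuses the same test data as the answer.

```python
import numpy as np
import pandas as pd

# Same test data as in the answer
np.random.seed(365)
df = pd.DataFrame({'B': np.random.randint(5238, size=200000)})

# Alternative to value_counts + map: broadcast each group's size back to its rows
df['counts'] = df.groupby('B')['B'].transform('count')

# Bin the frequencies into ranges of 10, labelled by the upper edge of each range
df['bins'] = pd.cut(
    df.counts,
    bins=list(range(0, 201, 10)),
    right=True,
    labels=list(range(10, 201, 10)),
)

# observed=True skips categories that have no rows, so no .empty filter is needed
dfd = {g: dfg for g, dfg in df.groupby('bins', observed=True)}
```

Both versions should yield the same dict. The explicit `observed` argument also avoids depending on the default, which has varied across pandas releases.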

2024-01-27
