5 Answers

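User contributed 1856 experience points, received over 17 likes
Take the sorted order and apply a quadratic function to it whose root sits at half the array length (plus a small offset). This way the highest ranks go to the extreme values; the sign of the eps offset determines whether the highest value ranks above the lowest value or the other way round. I added a small group at the end to show that it handles repeated values and odd group sizes correctly.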
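import numpy as np
import pandas as pd

def extremal_rank(s):
    eps = 10**-4  # sign decides whether the max or the min ranks first
    # positions 1..n in sorted order, centred on (n+1)/2 and squared
    y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index)
         - (len(s)+1)/2 + eps)**2
    return y.reindex_like(s)

# group_keys=False keeps the original index so the result aligns with df
df['rnk'] = df.groupby('Group', group_keys=False)['Performance'].apply(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])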
    Group             Name  Performance     rnk
2       A     Chad Webster          142  6.2505
0       A     Sheldon Webb           33  6.2495
4       A   Elijah Mendoza          122  2.2503
1       A       Traci Dean           64  2.2497
3       A       Ora Harmon          116  0.2501
5       A  June Strickland           68  0.2499
8       B        Joel Gill          132  2.2503
9       B     Vernon Stone           80  2.2497
7       B     Betty Sutton          127  0.2501
6       B     Beth Vasquez           95  0.2499
11      C                b          110  9.0006
12      C                c           68  8.9994
10      C                a          110  4.0004
13      C                d           68  3.9996
15      C                f           70  1.0002
16      C                g           70  0.9998
14      C                e           70  0.0000
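To see why the squared, shifted rank produces the alternating order, here is a minimal plain-Python sketch of the same arithmetic applied to group A's values. The helper name `extremal_order` is hypothetical and not part of the answer above, and the sketch assumes the values are distinct.

```python
def extremal_order(values, eps=1e-4):
    """Order values as max, min, 2nd max, 2nd min, ... via a squared centred rank."""
    n = len(values)
    ranked = sorted(values)  # ascending: rank 1 is the smallest value
    # Centre the ranks on (n + 1) / 2 and square them, so both ends score highest.
    # A positive eps nudges each upper-half value just above its lower-half mirror.
    scores = {v: (rank - (n + 1) / 2 + eps) ** 2
              for rank, v in enumerate(ranked, start=1)}
    return sorted(values, key=lambda v: scores[v], reverse=True)


group_a = [33, 64, 142, 116, 122, 68]
print(extremal_order(group_a))             # [142, 33, 122, 64, 116, 68]
print(extremal_order(group_a, eps=-1e-4))  # [33, 142, 64, 122, 68, 116]
```

Flipping the sign of `eps` swaps which member of each max/min pair comes first, which is the behaviour the answer describes.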

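User contributed 1828 experience points, received over 3 likes
You can avoid groupby altogether: call sort_values on Performance once descending and once ascending, concat the two sorted DataFrames, then use sort_index and drop_duplicates to get the expected output: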
df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
                   .reset_index(),  # need the original index for the later drop_duplicates
                  df.sort_values(['Group', 'Performance'], ascending=[True, True])
                    .reset_index()
                    .set_index(np.arange(len(df)) + 0.5)],  # for the later sort_index
                 axis=0)
         .sort_index()
         .drop_duplicates('index', keep='first')
         .reset_index(drop=True)
         [['Group', 'Name', 'Performance']]
       )
print(df_)
   Group             Name  Performance
0      A     Chad Webster          142
1      A     Sheldon Webb           33
2      A   Elijah Mendoza          122
3      A       Traci Dean           64
4      A       Ora Harmon          116
5      A  June Strickland           68
6      B        Joel Gill          132
7      B     Vernon Stone           80
8      B     Betty Sutton          127
9      B     Beth Vasquez           95
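The concat / sort_index / drop_duplicates chain amounts to interleaving a descending and an ascending ordering and keeping each row the first time it appears. A rough plain-Python sketch of that idea for a single group follows; the function name `interleave_extremes` is hypothetical, and rows are identified by position, much as the answer relies on the original index.

```python
def interleave_extremes(values):
    """Alternate largest, smallest, 2nd largest, 2nd smallest, ... for one group."""
    positions = range(len(values))
    desc = sorted(positions, key=lambda i: values[i], reverse=True)
    asc = sorted(positions, key=lambda i: values[i])
    seen = set()
    result = []
    # Walk both orderings in lockstep: the descending pick first, then the ascending one.
    for hi, lo in zip(desc, asc):
        for i in (hi, lo):
            if i not in seen:  # plays the role of drop_duplicates(keep='first')
                seen.add(i)
                result.append(values[i])
    return result


print(interleave_extremes([95, 127, 132, 80]))  # [132, 80, 127, 95]
```

Past the midpoint every position has already been emitted, so the second half of both orderings is discarded, just as the duplicate rows are in the pandas version.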

User contributed 1770 experience points, received over 3 likes
For each group, apply a sorted concatenation of nlargest and nsmallest:
>>> (df.groupby('Group')[df.columns[1:]]
...    .apply(lambda x:
...           pd.concat([x.nlargest(x.shape[0]//2, 'Performance').reset_index(),
...                      x.nsmallest(x.shape[0]-x.shape[0]//2, 'Performance').reset_index()])
...             .sort_index(kind='mergesort')  # stable, so the nlargest row stays first
...             .drop(columns='index'))
...    .reset_index().drop(columns='level_1'))
   Group             Name  Performance
0      A     Chad Webster          142
1      A     Sheldon Webb           33
2      A   Elijah Mendoza          122
3      A       Traci Dean           64
4      A       Ora Harmon          116
5      A  June Strickland           68
6      B        Joel Gill          132
7      B     Vernon Stone           80
8      B     Betty Sutton          127
9      B     Beth Vasquez           95
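As a standard-library analogue of the nlargest / nsmallest split (not pandas itself), `heapq.nlargest` and `heapq.nsmallest` can build the same two halves, which are then interleaved. The helper name `alternate_halves` is made up for this sketch.

```python
import heapq
from itertools import chain, zip_longest


def alternate_halves(values):
    """Interleave the top half (descending) with the bottom half (ascending)."""
    n = len(values)
    top = heapq.nlargest(n // 2, values)          # largest first
    bottom = heapq.nsmallest(n - n // 2, values)  # smallest first; takes the extra item if n is odd
    missing = object()  # sentinel, so real values such as 0 or None are never dropped
    paired = chain.from_iterable(zip_longest(top, bottom, fillvalue=missing))
    return [v for v in paired if v is not missing]


print(alternate_halves([33, 64, 142, 116, 122, 68]))  # [142, 33, 122, 64, 116, 68]
```

With an odd number of values the bottom half is one longer, so the leftover middle value ends up last.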

User contributed 1818 experience points, received over 7 likes
Just another approach, using a custom function with np.empty:
def mysort(s):
    arr = s.to_numpy()
    c = np.empty(arr.shape, dtype=arr.dtype)
    idx = arr.shape[0]//2 if not arr.shape[0]%2 else arr.shape[0]//2+1
    c[0::2], c[1::2] = arr[:idx], arr[idx:][::-1]
    return pd.DataFrame(c, columns=s.columns)

print(df.sort_values("Performance", ascending=False).groupby("Group").apply(mysort))
         Group             Name  Performance
Group
A     0      A     Chad Webster          142
      1      A     Sheldon Webb           33
      2      A   Elijah Mendoza          122
      3      A       Traci Dean           64
      4      A       Ora Harmon          116
      5      A  June Strickland           68
B     0      B        Joel Gill          132
      1      B     Vernon Stone           80
      2      B     Betty Sutton          127
      3      B     Beth Vasquez           95
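The strided assignment `c[0::2], c[1::2] = ...` is the core of this answer. A minimal sketch of the same indexing on a plain Python list, assuming the input is already sorted in descending order (`weave_sorted` is a hypothetical name):

```python
def weave_sorted(desc_values):
    """Given values sorted descending, return max, min, 2nd max, 2nd min, ..."""
    n = len(desc_values)
    idx = (n + 1) // 2                   # size of the upper half; one larger when n is odd
    out = [None] * n
    out[0::2] = desc_values[:idx]        # even slots: largest values, descending
    out[1::2] = desc_values[idx:][::-1]  # odd slots: smallest values, ascending
    return out


print(weave_sorted([142, 122, 116, 68, 64, 33]))  # [142, 33, 122, 64, 116, 68]
```

Here `(n + 1) // 2` is equivalent to the answer's `n//2 if n is even else n//2 + 1`, so the even-indexed slots always have room for the extra element.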

User contributed 1877 experience points, received over 1 like
Let's try detecting the min and max rows with groupby().transform(), then sorting:
groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')
(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
   .sort_values(['Group', 'temp', 'Performance'],
                ascending=[True, False, False])
   .drop('temp', axis=1)
)
Output (each group's max and min rows come first; the remaining rows follow in descending order):
   Group             Name  Performance
2      A     Chad Webster          142
0      A     Sheldon Webb           33
4      A   Elijah Mendoza          122
3      A       Ora Harmon          116
5      A  June Strickland           68
1      A       Traci Dean           64
8      B        Joel Gill          132
9      B     Vernon Stone           80
7      B     Betty Sutton          127
6      B     Beth Vasquez           95
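For anyone who wants to reproduce this, here is a self-contained sketch of the min/max flag approach. The sample DataFrame is reconstructed from the outputs shown in this thread (the original question's data is assumed to match it), and the variable names `is_extreme` and `result` are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the outputs shown in this thread.
df = pd.DataFrame({
    "Group": ["A"] * 6 + ["B"] * 4,
    "Name": ["Sheldon Webb", "Traci Dean", "Chad Webster", "Ora Harmon",
             "Elijah Mendoza", "June Strickland", "Beth Vasquez",
             "Betty Sutton", "Joel Gill", "Vernon Stone"],
    "Performance": [33, 64, 142, 116, 122, 68, 95, 127, 132, 80],
})

perf = df.groupby("Group")["Performance"]
# True for rows that hold their group's minimum or maximum.
is_extreme = (df["Performance"].eq(perf.transform("min"))
              | df["Performance"].eq(perf.transform("max")))

result = (df.assign(temp=is_extreme)
            .sort_values(["Group", "temp", "Performance"],
                         ascending=[True, False, False])
            .drop(columns="temp"))
print(result["Performance"].tolist())
# [142, 33, 122, 116, 68, 64, 132, 80, 127, 95]
```

Only the single largest and smallest row of each group are pulled forward, so this ordering differs from the fully alternating results of the other answers once a group has more than four rows.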