2 Answers

Answerer: 1798 contributions, 7+ likes
First, split the codes inside each cell. Then you can extract the first code (the part before "-") and use groupby:
import pandas as pd  # df is the question's DataFrame with columns Alpha and AlphaComboCount

# split each cell into its individual codes
tmp = df.assign(FirstCode=df.Alpha.str.split(','))
# keep the part before '-' of every code, de-duplicated and sorted, as a tuple key
tmp['FirstCode'] = [tuple(sorted(set(x.split('-')[0] for x in cell)))
                    for cell in tmp.FirstCode]
# sum AlphaComboCount per key with groupby, broadcast back to every row
sum_per_code = tmp['AlphaComboCount'].groupby(tmp['FirstCode']).transform('sum')
# the percentage is then a simple division
tmp['Percent'] = tmp['AlphaComboCount'] / sum_per_code
# print the output
print(tmp.sort_values('FirstCode'))
Output:
                  Alpha  AlphaComboCount       FirstCode   Percent
0                 12-99             8039           (12,)  0.918743
11         12-581,12-99              711           (12,)  0.081257
2          12-99,138-99             1776       (12, 138)  0.528414
3          12-45,138-45             1585       (12, 138)  0.471586
6                121-99             1102          (121,)  1.000000
14  123-49,121-29,22-79              626  (121, 123, 22)  0.993651
15  121-99,123-99,22-99                4  (121, 123, 22)  0.006349
8          121-99,22-99              909       (121, 22)  1.000000
5                123-99             1145          (123,)  1.000000
13             2089-281              685         (2089,)  1.000000
4                 21-99             1225           (21,)  0.532609
7                21-581             1000           (21,)  0.434783
10               21-141               75           (21,)  0.032609
1                 22-99             1792           (22,)  1.000000
9                 32-99              814           (32,)  1.000000
12               347-99              685          (347,)  1.000000
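For anyone who wants to run this first approach end to end, here is a minimal, self-contained sketch. It assumes df holds exactly the 16 rows printed in this thread (same index 0 to 15); with real data you would use your own DataFrame instead.

```python
import pandas as pd

# Data reconstructed from the rows printed in this thread (index 0..15)
df = pd.DataFrame({
    "Alpha": ["12-99", "22-99", "12-99,138-99", "12-45,138-45", "21-99",
              "123-99", "121-99", "21-581", "121-99,22-99", "32-99",
              "21-141", "12-581,12-99", "347-99", "2089-281",
              "123-49,121-29,22-79", "121-99,123-99,22-99"],
    "AlphaComboCount": [8039, 1792, 1776, 1585, 1225, 1145, 1102, 1000,
                        909, 814, 75, 711, 685, 685, 626, 4],
})

# Split each cell into its codes, e.g. "12-99,138-99" -> ["12-99", "138-99"]
tmp = df.assign(FirstCode=df.Alpha.str.split(","))
# Prefixes before "-", de-duplicated and sorted, so the same set of
# prefixes always yields the same hashable tuple key
tmp["FirstCode"] = [tuple(sorted({code.split("-")[0] for code in cell}))
                    for cell in tmp.FirstCode]
# Group total per key, broadcast back to every row, then divide
sum_per_code = tmp["AlphaComboCount"].groupby(tmp["FirstCode"]).transform("sum")
tmp["Percent"] = tmp["AlphaComboCount"] / sum_per_code
print(tmp.sort_values("FirstCode"))
```

Sorting the prefixes is what makes "12-99,138-99" and "138-45,12-45" land in the same group regardless of the order the codes appear in.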

Answerer: 2011 contributions, 2+ likes
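If the Alpha column can contain several codes in varying order, one possible solution is to pick one of them (e.g. the smallest, in string order), take the part before "-", save it in a new column, and use that column in further processing: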
df['Alpha_1'] = df.Alpha.str.split(',')\
.apply(lambda lst: min(lst)).str.split('-', expand=True)[0]
The result is:
                  Alpha  AlphaComboCount  Alpha_1
0                 12-99             8039       12
1                 22-99             1792       22
2          12-99,138-99             1776       12
3          12-45,138-45             1585       12
4                 21-99             1225       21
5                123-99             1145      123
6                121-99             1102      121
7                21-581             1000       21
8          121-99,22-99              909      121
9                 32-99              814       32
10               21-141               75       21
11         12-581,12-99              711       12
12               347-99              685      347
13             2089-281              685     2089
14  123-49,121-29,22-79              626      121
15  121-99,123-99,22-99                4      121
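Note that min() compares the codes as strings, character by character, so "121-99" sorts before "22-99" and row 8 ends up in group 121. If the numerically smallest prefix is what you want, a key function is one option. The numeric variant below is an assumption for illustration, not part of the answer above:

```python
# The answer's min() compares whole strings, character by character
codes = "121-99,22-99".split(",")

string_min = min(codes)  # "121-99", because "1" < "2"
# Hypothetical variant: compare by the integer value of the part before "-"
numeric_min = min(codes, key=lambda code: int(code.split("-")[0]))  # "22-99"

print(string_min.split("-")[0])   # prints 121
print(numeric_min.split("-")[0])  # prints 22
```

Applied to the column, the variant would read df.Alpha.str.split(',').apply(lambda lst: min(lst, key=lambda c: int(c.split('-')[0]))), assuming every prefix is an integer.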
To compute the percentage of AlphaComboCount within each group (the rows sharing the same Alpha_1 value), define the following function:
def proc(grp):
    # each row's share of its group's AlphaComboCount total, formatted as e.g. "66.38%"
    return (grp.AlphaComboCount / grp.AlphaComboCount.sum()
            * 100).apply('{0:.2f}%'.format)
Group df by Alpha_1, apply this function, and save the result in the Grp_pct column:
df['Grp_pct'] = df.groupby('Alpha_1').apply(proc).reset_index(level=0, drop=True)
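On recent pandas versions, groupby(...).apply on a DataFrame may warn about operating on the grouping column, and the index of its result has depended on the pandas version (hence the reset_index call). A possibly more robust sketch computes the same percentages with transform, whose result is aligned to df's original index. The four-row df below is a hypothetical subset of the question's data, for illustration only:

```python
import pandas as pd

# Hypothetical subset of the question's rows
df = pd.DataFrame({
    "Alpha": ["21-99", "21-581", "21-141", "22-99"],
    "AlphaComboCount": [1225, 1000, 75, 1792],
})
df["Alpha_1"] = (df.Alpha.str.split(",").apply(lambda lst: min(lst))
                 .str.split("-", expand=True)[0])

# transform("sum") returns each row's group total on df's own index,
# so no reset_index step is needed to align the result
share = df["AlphaComboCount"] / df.groupby("Alpha_1")["AlphaComboCount"].transform("sum")
df["Grp_pct"] = (share * 100).map("{0:.2f}%".format)
print(df)
```

Because it divides by the same group totals, this yields the same Grp_pct strings as the apply-based version.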
To make the results easy to check, print df with the rows of each group kept together:
print(df.sort_values('Alpha_1'))
This gives:
                  Alpha  AlphaComboCount  Alpha_1  Grp_pct
0                 12-99             8039       12   66.38%
2          12-99,138-99             1776       12   14.66%
3          12-45,138-45             1585       12   13.09%
11         12-581,12-99              711       12    5.87%
6                121-99             1102      121   41.73%
8          121-99,22-99              909      121   34.42%
14  123-49,121-29,22-79              626      121   23.70%
15  121-99,123-99,22-99                4      121    0.15%
5                123-99             1145      123  100.00%
13             2089-281              685     2089  100.00%
4                 21-99             1225       21   53.26%
7                21-581             1000       21   43.48%
10               21-141               75       21    3.26%
1                 22-99             1792       22  100.00%
9                 32-99              814       32  100.00%
12               347-99              685      347  100.00%
Now compare, for example, the rows with Alpha_1 == 21 against your expected result for subcode 21.
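One detail when doing that check: Alpha_1 comes from str.split, so it holds strings, and filtering with the integer 21 matches nothing. A minimal sketch, using a hypothetical four-row subset where each cell has a single code (so the min() step is skipped):

```python
import pandas as pd

# Hypothetical subset of the thread's rows, one code per cell
df = pd.DataFrame({
    "Alpha": ["21-99", "21-581", "21-141", "22-99"],
    "AlphaComboCount": [1225, 1000, 75, 1792],
})
# Prefix before "-" as a string, like Alpha_1 in the answer
df["Alpha_1"] = df.Alpha.str.split("-", expand=True)[0]

print(len(df[df.Alpha_1 == 21]))  # 0: the integer 21 never equals the string "21"
group_21 = df[df.Alpha_1 == "21"]  # the three rows with prefix 21
shares = group_21.AlphaComboCount / group_21.AlphaComboCount.sum() * 100
print(shares.round(2).tolist())  # [53.26, 43.48, 3.26]
```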