Create new pandas columns based on conditions on existing columns

互換的青春 2023-12-26 15:47:10
I have a DataFrame like this:

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])

    name    count
0   a       1
1   b       1
2   c       0
3   a       1
4   c       1
5   a       0
6   b       1
7   c       1
8   a       0

For each element in the 'name' column, I am trying to find the ratio of the number of zeros to the total number of zeros and ones. First I summarized the counts, as follows:

for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO ratios for {j} = {zero_pb}")
    print(f"One ratios for {j} = {one_pb}")
    print("="*30)

The output is:

a
ZERO ratios for a = 0    0.5
dtype: float64
One ratios for a = 0    0.5
dtype: float64
==============================
b
ZERO ratios for b = 1    0.0
dtype: float64
One ratios for b = 1    1.0
dtype: float64
==============================
c
ZERO ratios for c = 2    0.333333
dtype: float64
One ratios for c = 2    0.666667
dtype: float64
==============================

My goal is to add 2 new columns to the DataFrame, "name_0" and "name_1", holding the ratio value for each element in the "name" column. I tried the following, but it does not give the expected result:

for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb}")
    print(f"One Probability for {j} = {one_pb}")
    print("="*30)

    condition1 = [ df2['name'].eq(j) & df2['count'].eq(0)]
    condition2 = [ df2['name'].eq(j) & df2['count'].eq(1)]
    choice1 = zero_pb.tolist()
    choice2 = one_pb.tolist()

The columns end up updated with the values for the name element 'c'. That is to be expected, because the last computed value is used to update every row. Is there another way to use np.select effectively here?
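The thread never shows how zero_one_frequencies was built. Judging from the printed output (one row per name, indexed 0 to 2, with columns 0 and 1), it may have been a per-name count table. The sketch below is only a guess at such a table, built with pd.crosstab; the construction and the helper names totals, zero_ratio and one_ratio are assumptions, not the asker's code.

```python
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Assumed reconstruction: one row per name, with columns 0 and 1 holding
# how often each count value occurs for that name.
zero_one_frequencies = pd.crosstab(df2['name'], df2['count']).reset_index()
zero_one_frequencies.columns.name = None

# Ratios for all names at once, without a Python loop.
totals = zero_one_frequencies[0] + zero_one_frequencies[1]
zero_ratio = zero_one_frequencies[0] / totals
one_ratio = zero_one_frequencies[1] / totals

print(zero_one_frequencies.assign(zero_ratio=zero_ratio, one_ratio=one_ratio))
```

With a table of this shape, the per-name ratios fall out of plain column arithmetic, so the loop in the question is only needed for printing.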

2 Answers

慕俠2389804

Contributed 1719 experience points · over 6 likes

I don't have access to the zero_one_frequencies DataFrame, so I took the liberty of solving the problem my own way.


import numpy as np
import pandas as pd

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])

# Start both ratio columns as floats so the assignments below keep their dtype.
df2["name_0"] = 0.0
df2["name_1"] = 0.0

for name in df2['name'].unique():
    df_name = df2[df2['name'] == name]
    # Share of rows with count == 1 for this name.
    prob_1 = df_name['count'].mean()
    for count in df2['count'].unique():
        # Rows that match both this name and this count value.
        mask = (df2['name'] == name) & (df2['count'] == count)
        # count == 0 gives 1 - prob_1; count == 1 gives prob_1.
        df2.loc[mask, "name_" + str(count)] = np.abs(((count + 1) % 2) - prob_1)

Output:

  name  count    name_0    name_1
0    a      1  0.000000  0.500000
1    b      1  0.000000  1.000000
2    c      0  0.333333  0.000000
3    a      1  0.000000  0.500000
4    c      1  0.000000  0.666667
5    a      0  0.500000  0.000000
6    b      1  0.000000  1.000000
7    c      1  0.000000  0.666667
8    a      0  0.500000  0.000000
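The nested loop above can probably be avoided. The following is a minimal sketch of a vectorized variant, assuming that 'count' only ever holds 0 or 1, so the group mean equals the share of ones. The name one_ratio is introduced here for illustration and does not appear in the thread.

```python
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Share of ones within each name group, broadcast back to every row.
one_ratio = df2.groupby('name')['count'].transform('mean')

# Keep a ratio only where the row's own count matches the column; 0.0 elsewhere.
df2['name_0'] = (1 - one_ratio).where(df2['count'].eq(0), 0.0)
df2['name_1'] = one_ratio.where(df2['count'].eq(1), 0.0)

print(df2)
```

Because transform returns a Series aligned with the original rows, no per-name filtering or index bookkeeping is needed.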

Answered 2023-12-26
慕容3067478

Contributed 1773 experience points · over 3 likes

The following code solves the problem. However, I could not find a way to get the same result with numpy.select.


df2["name_0"] = 0.0
df2["name_1"] = 0.0

for j in df2.name.unique():
    print(j)
    # zero_one_frequencies comes from the question; its construction is not shown in this thread.
    row = zero_one_frequencies[zero_one_frequencies['name'] == j]
    zero_ct = row[0]
    full_ct = row[0] + row[1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb.tolist()[0]}")
    print(f"One Probability for {j} = {one_pb.tolist()[0]}")
    print("="*30)

    # Fill only the column that matches each row's own count value.
    for idx in df2[df2['name'] == j].index:
        print("Index:::", idx)
        if df2.at[idx, 'count'] == 0:
            df2.at[idx, "name_0"] = zero_pb.tolist()[0]
            print(f'Count for {j} at index {idx} is 0')
            print('printing name_0: ', df2.at[idx, "name_0"])
            print("*"*30)
        elif df2.at[idx, 'count'] == 1:
            df2.at[idx, "name_1"] = one_pb.tolist()[0]
            print(f'Count for {j} at index {idx} is 1')
            print('printing name_1: ', df2.at[idx, "name_1"])
            print("*"*30)
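For the np.select part of the question, one possible approach is to compute a per-row ratio first and then call np.select once per new column, rather than once per name inside a loop. This is a sketch under the assumption that 'count' only holds 0 or 1; the names one_ratio, zero_ratio, is_zero and is_one are illustrative and not taken from the thread.

```python
import numpy as np
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Per-row ratios: every row carries the ratio of its own name group.
one_ratio = df2.groupby('name')['count'].transform('mean')
zero_ratio = 1 - one_ratio

# Row-wise conditions; rows matching no condition receive the default.
is_zero = df2['count'].eq(0)
is_one = df2['count'].eq(1)

df2['name_0'] = np.select([is_zero], [zero_ratio], default=0.0)
df2['name_1'] = np.select([is_one], [one_ratio], default=0.0)

print(df2)
```

Since the choices are already aligned with the rows, nothing is overwritten from one name to the next, which is the failure the question describes for its loop.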


Answered 2023-12-26