
Better way to perform calculations on multiple dataframes and return statements?


叮當貓咪 2023-06-27 17:34:06
My function looks at 3 dataframes, filters each one between different dates, and creates a statement. As you can see, the function repeats the same steps over and over, and I would like to reduce them. I believe a for-loop would help, but I'm not sure how to return the statements in one short block the way the function does now.

def stat_generator(df,date1,date2,df2,date3,date4,df4,date5,date6):
    ##First Date Filter for First Dataframe, and calculations for first dataframe
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    mask = ((df['Announcement Date'] >= date1) & (df['Announcement Date'] <= date2))
    df_new = df.loc[mask]
    total = len(df_new)
    better = df_new[(df_new['performance'] == 'better')]
    better_perc = round(((len(better)/total)*100),2)
    worse = df_new[(df_new['performance'] == 'worse')]
    worse_perc = round(((len(worse)/total)*100),2)
    statement1 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date1,date2,better_perc,worse_perc)

    ##Second Date Filter for Second Dataframe, and calculations for second dataframe
    df2['Announcement Date'] = pd.to_datetime(df2['Announcement Date'])
    mask2 = ((df2['Announcement Date'] >= date3) & (df2['Announcement Date'] <= date4))
    df_new2 = df2.loc[mask2]
    total2 = len(df_new2)
    better2 = df_new2[(df_new2['performance'] == 'better')]
    better_perc2 = round(((len(better2)/total2)*100),2)
    worse2 = df_new2[(df_new2['performance'] == 'worse')]
    worse_perc2 = round(((len(worse2)/total2)*100),2)
    statement2 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date3,date4,better_perc2,worse_perc2)

    ##Third Date Filter for Third Dataframe, and calculations for third dataframe

2 Answers

www說


I would just pass 3 parameters to your function, namely df, date1 and date2, and then call the function 3 times.


def stat_generator(df, date1, date2):
    "..."
    return statement

Then pass your data as a list of lists or something similar. For example:


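data = [[df, date1, date2], [df2, date3, date4], [df4, date5, date6]]

# collect the returned statements instead of discarding them
statements = []
for lists in data:
    statements.append(stat_generator(*lists))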
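A self-contained sketch of this approach, assuming made-up sample data (the dates and 'better'/'worse' values below are invented for illustration only). The single-dataframe body is the question's own first block written once, and the loop keeps each returned statement:

```python
import pandas as pd


def stat_generator(df, date1, date2):
    # Same per-dataframe logic as the question, written once for a single dataframe
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    mask = (df['Announcement Date'] >= date1) & (df['Announcement Date'] <= date2)
    df_new = df.loc[mask]
    total = len(df_new)
    better_perc = round(len(df_new[df_new['performance'] == 'better']) / total * 100, 2)
    worse_perc = round(len(df_new[df_new['performance'] == 'worse']) / total * 100, 2)
    return (f"During the time period between {date1} and {date2}, {better_perc}% of the "
            f"students performed better. {worse_perc}% of the students performed worse")


# Made-up sample data, for illustration only
df = pd.DataFrame({
    'Announcement Date': ['2023-01-05', '2023-01-20', '2023-02-10', '2023-03-01'],
    'performance': ['better', 'worse', 'better', 'better'],
})
df2 = df.copy()
df4 = df.copy()

data = [[df, '2023-01-01', '2023-01-31'],
        [df2, '2023-01-01', '2023-02-28'],
        [df4, '2023-02-01', '2023-03-31']]

# One call per [dataframe, start, end] entry; the results are kept in a list
statements = [stat_generator(*args) for args in data]
for s in statements:
    print(s)
```

The caller, not the function, decides how many dataframes to process, so adding a fourth one means adding one entry to data rather than three more parameters.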


Answered 2023-06-27
尚方寶劍之說


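Keeping the existing form

  • Change the df parameter of stat_generator to df1, so that df can be used in the for-loop.

  • Group each dataframe together with its dates.

  • Create a statements list to be returned.

  • Rename date1 and date2 to d1 and d2 inside the loop.

  • Update statement1 to use an easier-to-read f-string.

  • I think these updates require the fewest changes to the overall code.

  • Optional:

    • Change the mask to mask = df['Announcement Date'].between(d1, d2, inclusive="both")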
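A quick check that the optional between() mask selects the same rows as the two comparisons, using invented dates. Since pandas 1.3 the inclusive argument takes a string ("both", "neither", "left" or "right") and boolean values such as inclusive=True were deprecated, so "both" is used here to keep both endpoints:

```python
import pandas as pd

# Made-up dates, for illustration only
s = pd.to_datetime(pd.Series(['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01']))
d1, d2 = '2023-01-01', '2023-01-31'

manual = (s >= d1) & (s <= d2)                     # the original two-comparison mask
via_between = s.between(d1, d2, inclusive="both")  # "both" includes d1 and d2 themselves

print(via_between.tolist())
```

String bounds are compared against the datetime column after pandas parses them, so d1 and d2 can stay as strings.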

import pandas as pd


def stat_generator(df1, date1, date2, df2, date3, date4, df4, date5, date6):
    # create groups: one (dataframe, start date, end date) tuple per dataframe
    groups = [(df1, date1, date2), (df2, date3, date4), (df4, date5, date6)]

    # create a statements list for each statement
    statements = list()

    # iterate through each group
    for (df, d1, d2) in groups:
        df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
        mask = ((df['Announcement Date'] >= d1) & (df['Announcement Date'] <= d2))
        df_new = df.loc[mask]
        total = len(df_new)
        better = df_new[(df_new['performance'] == 'better')]
        better_perc = round(((len(better)/total)*100),2)
        worse = df_new[(df_new['performance'] == 'worse')]
        worse_perc = round(((len(worse)/total)*100),2)
        statement1 = f"During the time period between {d1} and {d2}, {better_perc}% of the students performed better. {worse_perc}% of the students performed worse"

        # append the statement of the dataframe
        statements.append(statement1)

    # return a list of all the statements
    return statements
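Neither the question's code nor this loop guards against an empty date range: if the mask matches no rows, total is 0 and len(better)/total raises ZeroDivisionError. A minimal sketch of one way to guard it, using a hypothetical helper percent_of (not part of either answer) and invented data:

```python
import pandas as pd


def percent_of(subset_len: int, total: int) -> float:
    # Hypothetical helper: returns 0.0 instead of raising ZeroDivisionError
    # when the date filter leaves no rows
    if total == 0:
        return 0.0
    return round(subset_len / total * 100, 2)


# Made-up sample data, for illustration only
df = pd.DataFrame({
    'Announcement Date': pd.to_datetime(['2023-01-05', '2023-01-20']),
    'performance': ['better', 'worse'],
})
mask = df['Announcement Date'].between('2024-01-01', '2024-12-31')  # no rows match
df_new = df.loc[mask]
total = len(df_new)
better_perc = percent_of(len(df_new[df_new['performance'] == 'better']), total)
print(better_perc)
```

Returning 0.0 is a choice; raising a clearer error or skipping the statement for that range would be equally valid, depending on what the report should say.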

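Complete rewrite

  • The function should ideally do only one thing: extract the data and return the result.

  • Passing the multiple dataframes to the function should be handled outside of it, with the results collected in a list or printed.

  • Creating new dataframes for better and worse is inefficient.

    • Use .value_counts() with normalize=True to get the percentages.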
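What value_counts(normalize=True) returns, shown on an invented performance column: the proportion of each distinct value, which the answer then scales by 100 and rounds to two decimals:

```python
import pandas as pd

# Made-up sample column, for illustration only
performance = pd.Series(['better', 'worse', 'better', 'better', 'same'])

# normalize=True returns proportions instead of counts; * 100 turns them into percentages
per = (performance.value_counts(normalize=True) * 100).round(2)
print(per.to_dict())
```

All percentages come from one pass over the column, instead of building a separate filtered dataframe for each category.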

import pandas as pd


def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])

    # create the mask (inclusive="both" keeps both endpoints, like >= and <=)
    mask = df['Announcement Date'].between(d1, d2, inclusive="both")

    # apply the mask
    df_new = df.loc[mask]

    # calculate the percentage of each 'performance' value
    per = (df_new.performance.value_counts(normalize=True) * 100).round(2)

    # .get() avoids a KeyError when a category does not occur in the date range
    return (f"During the time period between {d1} and {d2}, {per.get('better', 0.0)}% of the "
            f"students performed better. {per.get('worse', 0.0)}% of the students performed worse")



groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]

statements = list()
for group in groups:
    statements.append(stat_generator(*group))
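An end-to-end sketch of the rewrite on invented data. The second dataframe deliberately has no 'worse' rows in its range, which is where indexing per['worse'] would raise KeyError; this version uses Series.get with a 0.0 default instead (a change from the answer's code, assumed here to be the desired behaviour):

```python
import pandas as pd


def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    mask = df['Announcement Date'].between(d1, d2, inclusive="both")
    per = (df.loc[mask, 'performance'].value_counts(normalize=True) * 100).round(2)
    # .get(..., 0.0) covers a category that does not appear in the date range
    return (f"During the time period between {d1} and {d2}, {per.get('better', 0.0)}% of the "
            f"students performed better. {per.get('worse', 0.0)}% of the students performed worse")


# Made-up sample data, for illustration only
df1 = pd.DataFrame({'Announcement Date': ['2023-01-05', '2023-01-20'],
                    'performance': ['better', 'worse']})
df2 = pd.DataFrame({'Announcement Date': ['2023-02-03', '2023-02-14'],
                    'performance': ['better', 'better']})

groups = [(df1, '2023-01-01', '2023-01-31'), (df2, '2023-02-01', '2023-02-28')]
statements = [stat_generator(*group) for group in groups]
for s in statements:
    print(s)
```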


Answered 2023-06-27