2 Answers

Contributor with 1775 experience points · 8+ likes
I would simply pass 3 parameters to your function, namely df, date1 and date2, and then call the function 3 times.
def stat_generator(df, date1, date2):
    "..."
    return statement
Then pass your data in as a list of lists, or something similar. For example:
data = [[df, date1, date2], [df2, date3, date4], [df4, date5, date6]]
for lists in data:
    stat_generator(*lists)
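As a minimal sketch of this calling pattern, assuming a toy DataFrame with an 'Announcement Date' column: the sample data and the function body below are hypothetical stand-ins, not the code from the question. The point is only how `*` unpacks each inner list into the three parameters.

```python
import pandas as pd

def stat_generator(df, date1, date2):
    # Hypothetical body: count the rows whose date falls in [date1, date2]
    dates = pd.to_datetime(df["Announcement Date"])
    in_range = dates.between(pd.Timestamp(date1), pd.Timestamp(date2))
    return int(in_range.sum())

# Made-up sample data, for illustration only
df = pd.DataFrame({"Announcement Date": ["2020-01-05", "2020-02-10", "2020-03-15"]})

data = [
    [df, "2020-01-01", "2020-01-31"],
    [df, "2020-01-01", "2020-02-29"],
    [df, "2020-01-01", "2020-12-31"],
]

# *args unpacks each inner list into the three positional parameters
results = [stat_generator(*args) for args in data]
print(results)
```

Collecting the return values in a list, rather than discarding them inside the loop, keeps the results available for printing or further processing.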

Contributor with 1788 experience points · 4+ likes
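Keeping the existing form
- Change the df parameter in stat_generator to df1, so that df can be used in the for-loop.
- Group the data for each dataframe together.
- Create a statements list, to be returned.
- date1 and date2 become d1 and d2 in the loop.
- Update statement1 to use an f-string, which is easier to read.
I think these updates require the fewest changes to the overall code.
Optional:
- Change mask to mask = df['Announcement Date'].between(d1, d2) (both endpoints are included by default).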
import pandas as pd

def stat_generator(df1, date1, date2, df2, date3, date4, df3, date5, date6):
    # create groups: each dataframe paired with its own date range
    groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]

    # create a statements list for each statement
    statements = list()

    # iterate through each group
    for (df, d1, d2) in groups:

        df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
        mask = ((df['Announcement Date'] >= d1) & (df['Announcement Date'] <= d2))
        df_new = df.loc[mask]
        total = len(df_new)
        better = df_new[(df_new['performance'] == 'better')]
        better_perc = round(((len(better)/total)*100),2)
        worse = df_new[(df_new['performance'] == 'worse')]
        worse_perc = round(((len(worse)/total)*100),2)
        statement1 = f"During the time period between {d1} and {d2}, {better_perc}% of the students performed better. {worse_perc}% of the students performed worse"

        # append the statement of the dataframe
        statements.append(statement1)

    # return a list of all the statements
    return statements
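To illustrate the optional mask change, here is a small sketch on made-up dates (the Series and the bounds are assumptions for demonstration). It checks that `Series.between` with its default settings selects the same rows as the two explicit comparisons used in the loop above.

```python
import pandas as pd

# Made-up dates, for illustration only
s = pd.to_datetime(pd.Series(["2020-01-01", "2020-01-15", "2020-01-31", "2020-02-01"]))
d1, d2 = pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-31")

mask_cmp = (s >= d1) & (s <= d2)   # explicit comparisons
mask_between = s.between(d1, d2)   # both endpoints are included by default

print(mask_cmp.equals(mask_between))
print(mask_between.tolist())
```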
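Complete rewrite
- It is better for the function to do one thing only: extract and return the data.
- Handle passing multiple dataframes to the function outside of the function, and collect the results in a list or print them.
- Creating new dataframes for better and worse is inefficient. Use .value_counts() with normalize=True to get the percentages.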
import pandas as pd

def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:

    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    # create the mask (both endpoints are included by default)
    mask = df['Announcement Date'].between(d1, d2)
    # apply the mask
    df_new = df.loc[mask]
    # calculate the percentage of each performance label
    per = (df_new.performance.value_counts(normalize=True) * 100).round(2)
    # .get falls back to 0.0 when a label does not occur in the date range
    return f"During the time period between {d1} and {d2}, {per.get('better', 0.0)}% of the students performed better. {per.get('worse', 0.0)}% of the students performed worse"

groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]
statements = list()
for group in groups:
    statements.append(stat_generator(*group))
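For reference, a short sketch of what `.value_counts(normalize=True)` produces, using made-up performance labels (the data is an assumption for illustration). A label that never occurs in the filtered rows is simply absent from the result, so indexing with `per['better']` would raise a KeyError in that case; `Series.get` with a default is one way to guard against it.

```python
import pandas as pd

# Made-up performance labels, for illustration only
performance = pd.Series(["better", "better", "worse", "same"])

# normalize=True returns fractions of the total instead of raw counts
per = (performance.value_counts(normalize=True) * 100).round(2)

print(per["better"])
print(per["worse"])
# A label that never occurs is missing from the index; .get supplies a default
print(per.get("missing", 0.0))
```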