2 Answers

Contributor with 1863 experience points, 2+ upvotes
You can do this with groupby, as shown below:
import pandas as pd

df1_inputPredictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2000'], ['0', '16', '2200'], ['0', '23', '2200'], ['0', '30', '2200'], ['1', '0', '2100'], ['1', '5', '2100'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
df1_predictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2100'], ['0', '16', '2100'], ['0', '23', '2100'], ['0', '30', '2200'], ['1', '0', '2000'], ['1', '5', '2000'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
# 1 where both frames hold the same PredictedFeature on the same row, else 0
df1_inputPredictedFeature_column['new'] = (df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']).astype(int)
# Per class: how many rows occur in the input, and how many of them matched
result = df1_inputPredictedFeature_column.groupby("PredictedFeature").agg({"PredictedFeature": "count", "new": "sum"})
result.columns = ["inputCsvOccured", "outputcsvmatched"]
result.index.name = "predictedFeatureClass"
result.reset_index(inplace=True)
print(result)
Result:
  predictedFeatureClass  inputCsvOccured  outputcsvmatched
0                  2000                2                 1
1                  2100                3                 1
2                  2200                3                 1

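Contributor with 1877 experience points, 6+ upvotes
One idea is to convert the boolean comparison to integers with Series.view, store it in a column named new, and then aggregate that column with a list of (name, function) tuples, using size and sum, so that the output columns get the desired names: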
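# df1 and df2 are the input and output frames from the question
# Note: Series.view is deprecated since pandas 2.2; .astype('int8') gives the same 0/1 values
df1['new'] = (df1['PredictedFeature'] == df2['PredictedFeature']).view('i1')
df = (df1.groupby("PredictedFeature")['new']
         .agg([('inputCsvOccured', 'size'), ('outputcsvmatched', 'sum')])
         .reset_index())
print(df)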
  PredictedFeature  inputCsvOccured  outputcsvmatched
0             2000                2                 1
1             2100                3                 1
2             2200                3                 1
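The snippet above relies on df1 and df2 already existing. A minimal self-contained sketch follows, assuming df1 and df2 hold the sample rows from the first answer (that pairing is an assumption, since this answer does not define them) and using astype('int8') in place of Series.view for the boolean-to-integer step:

```python
import pandas as pd

cols = ["Document_ID", "OFFSET", "PredictedFeature"]

# Assumption: sample rows copied from the first answer.
df1 = pd.DataFrame(
    [["0", "0", "2000"], ["0", "8", "2000"], ["0", "16", "2200"],
     ["0", "23", "2200"], ["0", "30", "2200"], ["1", "0", "2100"],
     ["1", "5", "2100"], ["1", "7", "2100"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["0", "0", "2000"], ["0", "8", "2100"], ["0", "16", "2100"],
     ["0", "23", "2100"], ["0", "30", "2200"], ["1", "0", "2000"],
     ["1", "5", "2000"], ["1", "7", "2100"]],
    columns=cols,
)

# Row-by-row comparison (both frames share the default 0..7 index),
# stored as 0/1 so that "sum" counts the matching rows.
df1["new"] = (df1["PredictedFeature"] == df2["PredictedFeature"]).astype("int8")

# Each tuple is (output column name, aggregation function).
df = (
    df1.groupby("PredictedFeature")["new"]
    .agg([("inputCsvOccured", "size"), ("outputcsvmatched", "sum")])
    .reset_index()
)
print(df)
```

Note that the comparison is aligned on index labels, so this only works as intended when both frames have the same rows in the same order.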
Solution for pandas 0.25+:
# Note: Series.view is deprecated since pandas 2.2; .astype('int8') gives the same 0/1 values
df1['new'] = (df1['PredictedFeature'] == df2['PredictedFeature']).view('i1')
df = (df1.groupby("PredictedFeature")
         .agg(inputCsvOccured=pd.NamedAgg(column='new', aggfunc='size'),
              outputcsvmatched=pd.NamedAgg(column='new', aggfunc='sum'))
         .reset_index())
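Named aggregation also accepts plain (column, aggfunc) tuples as keyword values, which is shorter than spelling out pd.NamedAgg. A hedged sketch of that variant, assuming the PredictedFeature values from the first answer (only that column is reproduced here, because the other columns are not used in the computation):

```python
import pandas as pd

# Assumption: PredictedFeature values copied from the first answer's two frames.
df1 = pd.DataFrame({"PredictedFeature":
                    ["2000", "2000", "2200", "2200", "2200", "2100", "2100", "2100"]})
df2 = pd.DataFrame({"PredictedFeature":
                    ["2000", "2100", "2100", "2100", "2200", "2000", "2000", "2100"]})

# Boolean Series: True where the two frames agree on the same row.
matched = df1["PredictedFeature"].eq(df2["PredictedFeature"])

df = (
    df1.assign(new=matched.astype(int))  # 0/1 flag without mutating df1
    .groupby("PredictedFeature")
    .agg(
        inputCsvOccured=("new", "size"),   # rows per class in the input
        outputcsvmatched=("new", "sum"),   # rows per class that matched
    )
    .reset_index()
)
print(df)
```

Using assign keeps df1 unchanged, which can be convenient if the same frame is compared against several outputs.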