1 Answer

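The solution here returns the intersections on a per-player level. It also uses defaultdict, as that is very convenient for this case. I will explain the code inline.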
from itertools import combinations
import pandas
from collections import defaultdict
from pprint import pprint  # only needed for pretty printing of dictionary

# assuming the data frame is in a file df.csv (raw string avoids an invalid-escape warning)
df = pandas.read_csv('df.csv', sep=r'\s+')

# group by account_id to get subframes which only refer to one account.
# A scalar key keeps the group labels scalar, so get_group(el) works as expected.
data_agg2 = df.groupby('account_id')

# a defaultdict is a dictionary where, when a key is not present, the function given
# here is used to create the element. This removes the need to check whether a key is
# already present or to set up all combinations in advance.
games_played_2 = defaultdict(int)

# iterate over all accounts
for el in data_agg2.groups:
    # extract the sub-dataframe from the grouped object
    tmp = data_agg2.get_group(el)
    # print(tmp)  # you can uncomment this to see each account

    # This is in principle the same loop as suggested before. However, as not every
    # player has played all variants, one only has to create the number of combinations
    # necessary for that player
    for i in range(len(tmp.loc[:, 'no_of_games'])):
        # As game_variant is now a column and not the index, the first part of zip
        # is slightly adapted. This loops over all combinations of variants for the
        # current account.
        for comb, combsum in zip(combinations(tmp.loc[:, 'game_variant'], i+1),
                                 combinations(tmp.loc[:, 'no_of_games'].values, i+1)):
            # Here, each variant combination gets a unique key. comb is sorted, as the
            # variants might not be in alphabetical order. The games this player played
            # in each variant are added to the running total from all previous players.
            games_played_2['_'.join(sorted(comb))] += sum(combsum)

pprint(games_played_2)
# returns
>> defaultdict(<class 'int'>,
               {'a': 5,
                'a_b': 6,
                'a_b_c': 7,
                'a_c': 3,
                'b': 9,
                'b_c': 11,
                'c': 4})
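On the defaultdict point from the code comments: here is a minimal sketch of what it saves you, assuming some made-up (key, games) records rather than the data from the question. Both loops produce the same totals; the second one just does not need the key check.

```python
from collections import defaultdict

# Hypothetical (key, games) pairs for illustration; not the data from the question.
records = [("a", 2), ("a_b", 3), ("a", 1), ("b", 4), ("a_b", 5)]

# Plain dict: every update needs an explicit "is the key there yet?" check.
plain = {}
for key, games in records:
    if key not in plain:
        plain[key] = 0
    plain[key] += games

# defaultdict(int): a missing key is created with int() == 0 on first access.
totals = defaultdict(int)
for key, games in records:
    totals[key] += games

print(dict(totals))
```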
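Since you have already extracted the number of games played per variant, you can simply add them up. If you want to do this automatically, you can use itertools.combinations inside a loop that iterates over all possible combination lengths: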
from itertools import combinations
import pandas
import numpy as np
from pprint import pprint  # only needed for pretty printing of dictionary

# assuming the data frame is in a file df.csv (raw string avoids an invalid-escape warning)
df = pandas.read_csv('df.csv', sep=r'\s+')
# total number of games per variant
data_agg = df.groupby(['game_variant']).agg({'no_of_games': [np.sum]})

games_played = {}
# i+1 is the combination length: 1, 2, ..., number of variants
for i in range(len(data_agg.loc[:, 'no_of_games'])):
    for comb, combsum in zip(combinations(data_agg.index, i+1),
                             combinations(data_agg.loc[:, 'no_of_games'].values, i+1)):
        games_played['_'.join(comb)] = sum(combsum)

pprint(games_played)
This returns:
>> {'a': array([5], dtype=int64),
>> 'a_b': array([14], dtype=int64),
>> 'a_b_c': array([18], dtype=int64),
>> 'a_c': array([9], dtype=int64),
>> 'b': array([9], dtype=int64),
>> 'b_c': array([13], dtype=int64),
>> 'c': array([4], dtype=int64)}
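'combinations(sequence, number)' returns an iterator over all combinations of number elements from sequence. So, to get all possible combinations, you have to iterate over all numbers from 1 to len(sequence). That is what the first for loop does.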
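As a small illustration of that outer loop, here is a sketch using only the three variant names from the output (the variable names are mine, chosen for the example):

```python
from itertools import combinations

variants = ["a", "b", "c"]

# Collect the combinations of every length from 1 up to len(variants).
all_combos = []
for size in range(1, len(variants) + 1):
    all_combos.extend(combinations(variants, size))

print(all_combos)
# [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```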
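The next for loop is built from two iterators: one over the index of the aggregated data (combinations(data_agg.index, i+1)), and one over the number of games actually played in each variant (combinations(data_agg.loc[:, 'no_of_games'].values, i+1)). So comb is always a tuple of variants, and combsum is the tuple of the number of games played in each of those variants. Note that to get all the values you have to use .loc[:, 'no_of_games'] and not .loc['no_of_games'], because the latter looks for an index label named 'no_of_games', whereas it is a column name.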
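The pairing works because combinations emits its results in positional order, so two calls with the same length over equally long sequences stay aligned. A quick sketch with plain lists, taking the per-variant totals (a=5, b=9, c=4) from the output above:

```python
from itertools import combinations

# Per-variant totals as they appear in the output above.
variants = ["a", "b", "c"]
games = [5, 9, 4]

# Both iterators pick the same positions in the same order,
# so each variant tuple is paired with the matching game counts.
pairs = list(zip(combinations(variants, 2), combinations(games, 2)))
for comb, combsum in pairs:
    print(comb, combsum)
# ('a', 'b') (5, 9)
# ('a', 'c') (5, 4)
# ('b', 'c') (9, 4)
```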
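Then I set the key of the dictionary to the variants of the combination joined into one string, and the value to the sum of the games played in those variants.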
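If you would rather end up with plain integers instead of one-element arrays, the same key/value construction can be sketched without pandas. This is a variation, not the code above: combination_totals is a hypothetical helper name, and it assumes you already have the per-variant totals in a dict (here the values from the output).

```python
from itertools import combinations


def combination_totals(totals):
    """Sum the per-variant totals for every combination of variants.

    `totals` maps variant name -> number of games (hypothetical input shape).
    """
    names = list(totals)
    result = {}
    for size in range(1, len(names) + 1):
        for comb in combinations(names, size):
            # Key: variants joined by "_"; value: sum of their game counts.
            result["_".join(comb)] = sum(totals[name] for name in comb)
    return result


games_played = combination_totals({"a": 5, "b": 9, "c": 4})
print(games_played)
```

The numbers agree with the array output above (for example 'a_b' is 14 and 'a_b_c' is 18), only as ordinary ints.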