4 Answers

Contributor with 1862 experience points · 7+ upvotes
Here is one approach using collections.defaultdict.
Example:
from collections import defaultdict
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
result = defaultdict(int)
seen = set()
for k, v in a:
    key = "{}_{}".format(k, v)
    if key not in seen:
        result[v] += 1
        seen.add(key)
print(list(map(list, result.items())))
Output:
[['referral', 3], ['affiliate', 3], ['cpc', 4], ['orgainic', 2]]
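One caveat with the loop above: it builds its seen key with "{}_{}".format(k, v), so two different pairs can collapse into the same string if a user name or category itself contains an underscore. A minimal sketch of the same idea using a tuple as the key instead; the helper name count_unique_users and the tricky sample data are made up for illustration and are not part of the original answer:

```python
from collections import defaultdict


def count_unique_users(pairs):
    """Count distinct users per category (hypothetical helper name)."""
    counts = defaultdict(int)
    seen = set()
    for user, category in pairs:
        key = (user, category)  # tuple key: no separator that could collide
        if key not in seen:
            seen.add(key)
            counts[category] += 1
    return [[category, n] for category, n in counts.items()]


# Made-up data: with "{}_{}".format both rows would map to "a_b_c"
# and the second row would be skipped as already seen.
tricky = [["a_b", "c"], ["a", "b_c"]]
print(count_unique_users(tricky))  # [['c', 1], ['b_c', 1]]
```

For the data in this thread the two variants give the same result; the tuple key only matters when the strings can contain the separator.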

Contributor with 1155 experience points · 0+ upvotes
First, let's make the entries unique:
c = {tuple(sublist) for sublist in a}
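Now we have a set of unique (user, type) pairs.
For the count we don't need the users, so let's reduce it to a list containing only the second element of each pair: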
c = [elem[1] for elem in c]
Now we can count it easily:
from collections import Counter
c = Counter(c)
Result: Counter({'cpc': 4, 'affiliate': 3, 'referral': 3, 'orgainic': 2})
Now putting it all together:
from collections import Counter
c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})
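For reference, the steps above can be run end to end as follows. This is only a sketch: the list a is copied from the other answers in this thread, and the final conversion to a sorted list of [type, count] pairs is an assumption about the desired output shape, based on what the other answers print.

```python
from collections import Counter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Lists are not hashable, so convert each pair to a tuple before deduplicating.
unique_pairs = {tuple(pair) for pair in a}

# Count how many unique pairs fall into each type.
counts = Counter(category for _user, category in unique_pairs)

# Assumed output shape: list of [type, count], sorted by type name.
result = [[category, n] for category, n in sorted(counts.items())]
print(result)
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
```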

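Contributor with 1797 experience points · 4+ upvotes
defaultdict and loop-based solution
This can be done with defaultdict:
from collections import defaultdict
# Map each category to the set of distinct users seen in it
d = defaultdict(set)
for user, category in a:
    d[category].add(user)
res = [[category, len(users)] for category, users in d.items()]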
Output (the order of the categories may differ depending on the Python version):
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
groupby-based solution
Alternatively, this can be done using groupby from itertools:
from itertools import groupby
from operator import itemgetter
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ...]
# Sort the items according to the category so groupby will collect the pairs accordingly
res = {category: len({user for user, _ in pairs}) for category, pairs in
       groupby(sorted(a, key=itemgetter(1)), key=itemgetter(1))}
res = [list(pair) for pair in res.items()]
Output:
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
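The sorting step matters because groupby only merges consecutive items with the same key. A small sketch of the difference, using a three-row sample made up for illustration (the variable names are not from the original answer):

```python
from itertools import groupby
from operator import itemgetter

# Illustrative sample: 'referral' appears twice, but not consecutively.
sample = [['user1', 'referral'], ['user1', 'affiliate'], ['user2', 'referral']]
category = itemgetter(1)

# Without sorting, each run of equal keys becomes its own group.
unsorted_runs = [key for key, _ in groupby(sample, key=category)]
print(unsorted_runs)  # ['referral', 'affiliate', 'referral']

# After sorting by category, every category forms exactly one group.
sorted_runs = [key for key, _ in groupby(sorted(sample, key=category), key=category)]
print(sorted_runs)  # ['affiliate', 'referral']

# Counting distinct users per group on the sorted data.
counts = [[key, len({user for user, _ in pairs})]
          for key, pairs in groupby(sorted(sample, key=category), key=category)]
print(counts)  # [['affiliate', 1], ['referral', 2]]
```

On unsorted input, the dict comprehension in the answer would silently keep only the last run for each category, so the counts could come out too low.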

Contributor with 1934 experience points · 2+ upvotes
This sounds like a case for pandas; your list is already in the right shape:
import pandas as pd
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
df = pd.DataFrame(a)
df.columns=["user", "type"]
unique_per_type = df.groupby("type")["user"].unique()
Now unique_per_type is:
type
affiliate [user1, user7, user9]
cpc [user4, user14, user2, user8]
orgainic [user3, user2]
referral [user1, user2, user4]
Name: user, dtype: object
You can then do the following:
# access length by key
len(unique_per_type["affiliate"])
# or use it like a dict
for key, val in unique_per_type.items():
    print(key, len(val))
This solution adds pandas, which is a heavy dependency. However, once your data is in a DataFrame, you can do a lot with it:
df["user"].unique() # shows all unique users
df.query("user=='user1'") # shows all observations involving user1