4 Answers

Contributor with 1862 experience points, over 6 upvotes
get_dummies() works well, but sklearn's MultiLabelBinarizer performs better:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Encode each list of hobbies as one row of 0/1 indicators
a = mlb.fit_transform(df["Hobbies"])
df_expanded = pd.DataFrame(a, columns=mlb.classes_, index=df.index)
# Merge the indicator columns back onto the original frame by index
df_merged = df.merge(df_expanded, left_index=True, right_index=True)
print(df_merged)
index  Name  Hobbies                        Play_PS4  Play_hockey  Read  Sleep  Watch_NBA
0      Paul  [Watch_NBA, Play_PS4]          1         0            0     0      1
1      Jeff  [Play_hockey, Read, Play_PS4]  1         1            1     0      0
2      Kyle  [Sleep, Watch_NBA]             0         0            0     1      1
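
To try the MultiLabelBinarizer approach end to end, here is a minimal self-contained sketch. The sample DataFrame is an assumption reconstructed from the printed output above, and `df.join` is swapped in for the index-based `merge`; since both align on the index, the result should be the same.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [
        ["Watch_NBA", "Play_PS4"],
        ["Play_hockey", "Read", "Play_PS4"],
        ["Sleep", "Watch_NBA"],
    ],
})

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 array with one column per distinct hobby;
# mlb.classes_ holds the matching column labels in sorted order.
encoded = pd.DataFrame(
    mlb.fit_transform(df["Hobbies"]),
    columns=mlb.classes_,
    index=df.index,
)

# join aligns on the index, like merge(left_index=True, right_index=True).
df_joined = df.join(encoded)
print(df_joined)
```

If the original list column is no longer needed, `df_joined.drop(columns="Hobbies")` removes it.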

Contributor with 1852 experience points, over 7 upvotes
In [86]: df
Out[86]:
   Name              Hobbies
0  Paul           [NBA, PS4]
1  Jeff  [Hockey, Read, PS4]
2  Kyle         [Sleep, NBA]
In [87]: df['dummy'] = 1
In [88]: df.explode("Hobbies").pivot(index='Name', columns='Hobbies', values='dummy').fillna(value=0)
Out[88]:
Hobbies  Hockey  NBA  PS4  Read  Sleep
Name
Jeff        1.0  0.0  1.0   1.0    0.0
Kyle        0.0  1.0  0.0   0.0    1.0
Paul        0.0  1.0  1.0   0.0    0.0
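
The pivot above yields floats because the missing cells start out as NaN. As a possible variation (not part of the answer itself), `pd.crosstab` on the exploded frame gives integer counts directly and needs no helper `dummy` column. A minimal sketch, assuming the same sample data as the session above:

```python
import pandas as pd

# Same sample frame as in the IPython session above.
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [["NBA", "PS4"], ["Hockey", "Read", "PS4"], ["Sleep", "NBA"]],
})

# One row per (Name, hobby) pair.
exploded = df.explode("Hobbies")

# Count how often each hobby occurs per name; absent pairs become 0.
counts = pd.crosstab(exploded["Name"], exploded["Hobbies"])
print(counts)
```

Note that `crosstab` counts occurrences, so a hobby listed twice for one person would show 2; `counts.clip(upper=1)` turns the counts into plain indicators if that matters.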

Contributor with 1828 experience points, over 3 upvotes
You want the get_dummies() method.
For your example:
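names = df.Name
# Expand each list into columns, stack into one long Series, one-hot encode it,
# then sum per original row. sum(level=0) was removed in pandas 2.0, so group
# by the first index level instead.
df = pd.get_dummies(df.Hobbies.apply(pd.Series).stack()).groupby(level=0).sum()
df.insert(0, 'Name', names)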
# Output:
   Name  Play_PS4  Play_hockey  Read  Sleep  Watch_NBA
0  Paul         1            0     0      0          1
1  Jeff         1            1     1      0          0
2  Kyle         0            0     0      1          1
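
A shorter route to a similar result is the string accessor's own `get_dummies`. This is a sketch of an alternative, not the code from the answer: it joins each list into one delimited string and splits it back into indicator columns. It assumes no hobby name contains the separator `|`, and the sample frame is reconstructed from the output above.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [
        ["Watch_NBA", "Play_PS4"],
        ["Play_hockey", "Read", "Play_PS4"],
        ["Sleep", "Watch_NBA"],
    ],
})

# Join each list into "a|b|c", then let str.get_dummies split on "|"
# and build one 0/1 column per distinct value.
dummies = df["Hobbies"].str.join("|").str.get_dummies(sep="|")

# Put the names back in front of the indicator columns.
result = pd.concat([df["Name"], dummies], axis=1)
print(result)
```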

Contributor with 1808 experience points, over 4 upvotes
You can try this:
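n = df['Name']
# Build one Series of 1s per row, indexed by that row's hobbies; missing hobbies
# become NaN, which is filled with 0. fillna's downcast argument is deprecated
# in recent pandas, so cast to int explicitly instead.
df = df['Hobbies'].apply(lambda x: pd.Series([1] * len(x), index=x)).fillna(0).astype(int)
df.insert(0, 'Name', n)
print(df)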
Output:
   Name  Watch_NBA  Play_PS4  Play_hockey  Read  Sleep
0  Paul          1         1            0     0      0
1  Jeff          0         1            1     1      0
2  Kyle          1         0            0     0      1