For loop returning unique values in a DataFrame

Python

皈依舞 2022-01-18 16:45:47

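I'm working through a beginner ML example. To count the number of unique samples in a column, the author uses the following code:

def unique_vals(rows, col):
    """Find the unique values for a column in a dataset."""
    return set([row[col] for row in rows])

However, I'm using a DataFrame, and for me this code returns single letters: 'm', 'l', and so on. I tried changing it to:

set(row[row[col] for row in rows)

but then it raises:

KeyError: "None of [Index(['Apple', 'Banana', 'Grape' dtype='object', length=2318)] are in the [columns]"

Thanks for your time!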

2 Answers

收到一只叮咚

In general, you don't need to do this kind of thing yourself, because pandas already does it for you.

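In this case, what you need is the unique method. You can call it directly on a Series (pd.Series is, among other things, the abstraction that represents a column), and it returns a numpy array containing the unique values in that Series.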
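As a minimal sketch of the point above, the snippet below uses a small made-up DataFrame (the column names and values are assumptions, not the asker's data). It also shows one likely reason the original helper returned single letters: iterating over a DataFrame yields its column labels rather than its rows, so `row[col]` with an integer `col` picks a character out of each label.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "fruit": ["Apple", "Banana", "Grape", "Apple"],
    "color": ["Red", "Yellow", "Purple", "Green"],
})

# Iterating over a DataFrame yields column labels, not rows.
labels = [row for row in df]

# With an integer col, row[col] indexes into each label string,
# which is why a set of single characters comes back.
col = 0
letters = set(row[col] for row in df)

# Series.unique() returns the distinct values of one column
# as a numpy array, in order of first appearance.
unique_fruits = df["fruit"].unique()

print(labels)
print(letters)
print(unique_fruits)
```

If a Python set is needed, as in the original helper, `set(df["fruit"])` gives the same values without the ordering.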

If you want the unique values of several columns, you can do the following:

which_columns = ... # specify the columns whose unique values you want here

uniques = {col: df[col].unique() for col in which_columns}
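For reference, here is a runnable sketch of the same dictionary comprehension. The DataFrame contents and the choice of `which_columns` are assumptions made for illustration. Since the question is about counting unique values, the sketch also uses `Series.nunique()`, which returns the number of distinct values directly.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey"],
    "location": ["San_Diego", "New_York", "New_York", "San_Diego"],
})

# Assumed selection of columns; replace with your own.
which_columns = ["pets", "location"]

# Map each column name to a numpy array of its unique values.
uniques = {col: df[col].unique() for col in which_columns}

# Map each column name to the number of distinct values it holds.
unique_counts = {col: df[col].nunique() for col in which_columns}

print(uniques)
print(unique_counts)
```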

Answered 2022-01-18

一只甜甜圈

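If you are working with categorical columns, the following code is very useful. It prints not only the unique values but also the count of each unique value.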

categorical_columns = ['col1', 'col2', 'col3', ..., 'coln']  # placeholder: list your own column names

# Print the frequency of each category
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())  # df is your DataFrame

Example:

   pets    location   owner
0  cat     San_Diego  Champ
1  dog     New_York   Ron
2  cat     New_York   Brick
3  monkey  San_Diego  Champ
4  dog     San_Diego  Veronica
5  dog     New_York   Ron

categorical_columns = ['pets', 'owner', 'location']

# Print the frequency of each category
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())

Output:

# Frequency of Categories for variable pets
# dog       3
# cat       2
# monkey    1
# Name: pets, dtype: int64

# Frequency of Categories for variable owner
# Champ       2
# Ron         2
# Brick       1
# Veronica    1
# Name: owner, dtype: int64

# Frequency of Categories for variable location
# New_York     3
# San_Diego    3
# Name: location, dtype: int64
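The example above does not show how the DataFrame is built, so the following is a self-contained sketch that reconstructs it from the six rows in the table. The variable names `pet_counts`, `unique_pets`, and `n_unique` are chosen here for illustration. It also shows that the result of `value_counts()` already carries the unique values in its index.

```python
import pandas as pd

# The six rows from the example table.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
    "location": ["San_Diego", "New_York", "New_York",
                 "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
})

# value_counts() returns a Series: unique values in the index,
# their frequencies as the values, sorted by frequency (descending).
pet_counts = df["pets"].value_counts()

# The unique values and their number can be read off the result.
unique_pets = set(pet_counts.index)
n_unique = len(pet_counts)

print(pet_counts.to_dict())
print(unique_pets, n_unique)
```

Note that `value_counts()` skips missing values by default, so a column containing NaN would need `dropna=False` to include them in the tally.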

Answered 2022-01-18
