
How to compare data within the same column of a DataFrame

Python

喵喵時光機 2023-09-26 16:52:40

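I have a pandas DataFrame with country, year and PIB columns. I want to get the countries whose PIB in 2007 was lower than in 2002, but I can't work out how to do it using only built-in pandas methods, without Python iteration or anything similar. The closest I have got is this line:

df[df[df.year == 2007].PIB < df[df.year == 2002].PIB].country

But it raises the following error:

ValueError: Can only compare identically-labeled Series objects

So far I have only used pandas to filter on values in different columns. I don't know how to compare values within the same column, split here by year.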


4 Answers

隔江千里

Contributed 1906 experience points, received over 10 likes

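My strategy is to use a pivot table. This assumes that no two rows share the same (country, year) pair. Under that assumption, aggfunc=np.sum simply returns the single PIB value for each pair.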

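import numpy as np
import pandas as pd

# one row per country, one column per year, restricted to the two years of interest
table = pd.pivot_table(df, values='PIB', index=['country'],
                       columns=['year'], aggfunc=np.sum)[[2002, 2007]]
list(table[table[2002] > table[2007]].index)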

The pivot table looks like this:
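As a rough sketch of that layout, the snippet below builds the pivot table on a small made-up dataset and prints it. The country names and values are hypothetical, not the asker's data, and it passes aggfunc="sum" as a string, which should be equivalent to np.sum here.

```python
import pandas as pd

# Hypothetical sample data (not the asker's actual DataFrame).
df = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2002, 2002, 2007, 2007],
    "PIB": [200, 200, 100, 300],
})

# Rows become countries, columns become years, cells hold PIB.
table = pd.pivot_table(df, values="PIB", index="country",
                       columns="year", aggfunc="sum")[[2002, 2007]]
print(table)

# Compare the two year columns row by row and keep the matching countries.
lower_in_2007 = table.index[table[2002] > table[2007]].tolist()
print(lower_in_2007)
```

Because each country ends up on a single row, the comparison between the 2002 and 2007 columns is aligned automatically.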

Answered 2023-09-26

白豬掌柜的

Contributed 1893 experience points, received over 10 likes

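I suggest creating a Series indexed by the country column. Note that the 2007 and 2002 Series must contain the same countries, so that the two Series being compared have identical index values: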

import pandas as pd

df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                   'PIB': [200, 200, 100, 300],
                   'year': [2002, 2002, 2007, 2007]})
print(df)
      country  PIB  year
0  Afganistan  200  2002
1    Zimbabwe  200  2002
2  Afganistan  100  2007
3    Zimbabwe  300  2007

df = df.set_index('country')
print(df)
            PIB  year
country
Afganistan  200  2002
Zimbabwe    200  2002
Afganistan  100  2007
Zimbabwe    300  2007

s1 = df.loc[df.year == 2007, 'PIB']
s2 = df.loc[df.year == 2002, 'PIB']
print(s1)
country
Afganistan    100
Zimbabwe      300
Name: PIB, dtype: int64
print(s2)
country
Afganistan    200
Zimbabwe      200
Name: PIB, dtype: int64
countries = s1.index[s1 < s2]
print(countries)
Index(['Afganistan'], dtype='object', name='country')

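Another idea is to pivot by year first with DataFrame.pivot, then select the two year columns, compare them, and use the result to filter the index via boolean indexing: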

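# reset_index() restores 'country' as a column after the earlier set_index;
# recent pandas versions require keyword arguments for pivot()
df1 = df.reset_index().pivot(index='country', columns='year', values='PIB')
print(df1)
year        2002  2007
country
Afganistan   200   100
Zimbabwe     200   300
countries = df1.index[df1[2007] < df1[2002]]
print(countries)
Index(['Afganistan'], dtype='object', name='country')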
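The caveat about both years needing the same countries can be illustrated with a small sketch. It assumes a hypothetical country "C" that has a 2002 row but no 2007 row. After pivoting, the missing cell becomes NaN, and one possible choice (an assumption here, not something the answer prescribes) is to drop incomplete rows before comparing.

```python
import pandas as pd

# Hypothetical data: country "C" has no 2007 row.
df = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B"],
    "year": [2002, 2002, 2002, 2007, 2007],
    "PIB": [200, 200, 50, 100, 300],
})

# One row per country, one column per year; C's 2007 cell is NaN.
wide = df.pivot(index="country", columns="year", values="PIB")
print(wide)

# Keep only countries that have a value for both years, then compare.
complete = wide.dropna(subset=[2002, 2007])
countries = complete.index[complete[2007] < complete[2002]].tolist()
print(countries)
```

Dropping the incomplete rows makes the handling of missing years explicit instead of relying on how NaN behaves in comparisons.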

Answered 2023-09-26

尚方寶劍之說

Contributed 1788 experience points, received over 4 likes

Here is my DataFrame:

import pandas as pd

df = pd.DataFrame([
    {"country": "a", "PIB": 2, "year": 2002},
    {"country": "b", "PIB": 2, "year": 2002},
    {"country": "a", "PIB": 1, "year": 2007},
    {"country": "b", "PIB": 3, "year": 2007},
])

If I filter on the two years, 2002 and 2007, I get:

df_2002 = df[df["year"] == 2002]
out:
  country  PIB  year
0       a    2  2002
1       b    2  2002

df_2007 = df[df["year"] == 2007]
out:
  country  PIB  year
2       a    1  2007
3       b    3  2007

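You want to compare how PIB evolved for each country.

pandas doesn't know that: it tries to compare the values by matching index labels. That isn't what you want, and it isn't possible anyway, because the two sets of index labels are different.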
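To make the label mismatch concrete, here is a minimal sketch using the same four-row frame. The variable names pib_2002, pib_2007 and error_message are introduced only for illustration. The 2002 rows carry labels 0 and 1 while the 2007 rows carry labels 2 and 3, so the comparison is expected to raise the ValueError from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["a", "b", "a", "b"],
    "PIB": [2, 2, 1, 3],
    "year": [2002, 2002, 2007, 2007],
})

pib_2002 = df.loc[df.year == 2002, "PIB"]  # index labels 0, 1
pib_2007 = df.loc[df.year == 2007, "PIB"]  # index labels 2, 3

# The two Series share no labels, so pandas refuses to compare them.
try:
    pib_2007 < pib_2002
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(list(pib_2002.index), list(pib_2007.index))
print(error_message)
```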

So you just need to use set_index():

df.set_index("country", inplace=True)

df_2002 = df[df["year"] == 2002]
out:
         PIB  year
country
a          2  2002
b          2  2002

df_2007 = df[df["year"] == 2007]
out:
         PIB  year
country
a          1  2007
b          3  2007

Now you can do the comparison:

res = df_2002.PIB > df_2007.PIB
res
out:
country
a     True
b    False
Name: PIB, dtype: bool

# to get the list of countries, keep only the labels where res is True
res[res].index.tolist()
out:
['a']

Answered 2023-09-26

繁華開滿天機

Contributed 1816 experience points, received over 4 likes

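Try this (given that you only need the list of these countries):

# unique() keeps each country from being checked, and listed, once per row
[i for i in df.country.unique() if df[(df.country==i) & (df.year==2007)].PIB.iloc[0] < df[(df.country==i) & (df.year==2002)].PIB.iloc[0]]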
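Since the question asked to avoid Python-level iteration, a merge-based variant may be worth considering. This is only a sketch and not part of the original answer: it joins the 2002 rows to the 2007 rows on country, so each country ends up on one row with both PIB values. The suffixes and the names merged and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["a", "b", "a", "b"],
    "PIB": [2, 2, 1, 3],
    "year": [2002, 2002, 2007, 2007],
})

# Inner join: only countries present in both years survive.
merged = df[df.year == 2002].merge(
    df[df.year == 2007], on="country", suffixes=("_2002", "_2007")
)

# Both PIB values now sit on the same row, so the comparison is row-wise.
result = merged.loc[merged.PIB_2007 < merged.PIB_2002, "country"].tolist()
print(result)
```

The inner join also drops any country that lacks one of the two years, which the list comprehension above would instead hit as an IndexError on .iloc[0].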

Answered 2023-09-26
