
Subset a dataset on two conditions, save each data frame to a .csv file, then iterate over the files and plot graphs

Python

www說 2024-01-16 09:51:34

I am new to data science and need help with the following:

(I) Split my dataset by the unique groups in one column (Region in my case) and by a second column (country).
(II) Save each resulting data frame as a .csv file named regionname_country.csv, for example west_GER.csv, east_POL.csv.
(III) If possible, iterate over the .csv files in a for loop and draw a scatter plot of Education vs Age for each df.
(IV) Finally, save my plots/figures to a PDF file (4 figures per page).

'df'

        Region, country, Age, Education, Income, FICO, Target
    1   west,   GER,     43,  1,         47510,  710,  1
    2   east,   POL,     32,  2,         73640,  723,  1
    3   east,   POL,     22,  2,         88525,  610,  0
    4   west,   GER,     55,  0,         31008,  592,  0
    5   north,  USA,     19,  0,         18007,  599,  1
    6   south,  PER,     27,  2,         68850,  690,  0
    7   south,  BRZ,     56,  3,         71065,  592,  0
    8   north,  USA,     39,  1,         98004,  729,  1
    9   east,   JPN,     36,  2,         51361,  692,  0
    10  west,   ESP,     59,  1,         98643,  729,  1

Expected result:

    # df_to_csv: 'west_GER.csv'
    west, GER, 43, 1, 47510, 710, 1
    west, GER, 55, 0, 31008, 592, 0

    # west_ESP.csv
    west, ESP, 59, 1, 98643, 729, 1

    # east_POL.csv
    east, POL, 32, 2, 73640, 723, 1
    ...

    # north_USA.csv
    north, USA, 39, 1, 98004, 729, 1
    north, USA, 19, 0, 18007, 599, 1


2 answers

呼如林


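For Python:

(I) and (II):

    import pandas as pd

    # groupby yields one (key, sub-frame) pair per unique (Region, country) combination.
    for (region, country), group in df.groupby(["Region", "country"]):
        # index=False keeps the row index out of the file, matching the expected output.
        group.to_csv(f"{region}_{country}.csv", index=False)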
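To check the splitting step end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question in memory and writes the files to a temporary directory. The helper name split_to_csv and its out_dir parameter are illustrative assumptions, not part of the answer.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Sample rows copied from the question.
SAMPLE = """Region,country,Age,Education,Income,FICO,Target
west,GER,43,1,47510,710,1
east,POL,32,2,73640,723,1
east,POL,22,2,88525,610,0
west,GER,55,0,31008,592,0
north,USA,19,0,18007,599,1
south,PER,27,2,68850,690,0
south,BRZ,56,3,71065,592,0
north,USA,39,1,98004,729,1
east,JPN,36,2,51361,692,0
west,ESP,59,1,98643,729,1
"""


def split_to_csv(df, out_dir):
    """Write one CSV per (Region, country) pair and return the paths (hypothetical helper)."""
    out_dir = Path(out_dir)
    paths = []
    for (region, country), group in df.groupby(["Region", "country"]):
        path = out_dir / f"{region}_{country}.csv"
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


df = pd.read_csv(StringIO(SAMPLE))
with tempfile.TemporaryDirectory() as tmp:
    written = sorted(p.name for p in split_to_csv(df, tmp))
    # Read one file back to confirm it holds only the matching rows.
    west_ger = pd.read_csv(Path(tmp) / "west_GER.csv")

print(written)
print(west_ger)
```

The sample has seven distinct (Region, country) pairs, so seven files are written. That matters for the plotting step, since a single 2x2 grid only has room for four of them.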

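(III) and (IV):

    import glob

    import matplotlib.pyplot as plt
    import pandas as pd

    # One page with a 2x2 grid, i.e. four plots.
    fig, axs = plt.subplots(nrows=2, ncols=2)

    # zip() stops at the shorter input, so only the first four CSV files are plotted.
    for ax, file in zip(axs.flatten(), glob.glob("./*.csv")):
        df_temp = pd.read_csv(file)
        # Every row in a file shares the same region and country, so take the first.
        region_temp = df_temp["Region"].iloc[0]
        country_temp = df_temp["country"].iloc[0]
        ax.scatter(df_temp["Age"], df_temp["Education"])
        ax.set_title(f"Region:{region_temp}, Country:{country_temp}")
        ax.set_xlabel("Age")
        ax.set_ylabel("Education")

    plt.tight_layout()
    fig.savefig("scatter.pdf")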
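With more than four CSV files, a single 2x2 figure cannot hold every plot. One possible extension, sketched here rather than taken from the answer, is to write one figure per group of four files into a multi-page PDF with matplotlib's PdfPages. The helper name plot_csvs_to_pdf and the demo files with made-up values are assumptions for illustration.

```python
import glob
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

PER_PAGE = 4  # matches the 2x2 grid below


def plot_csvs_to_pdf(csv_files, pdf_path):
    """Scatter Age vs Education per CSV, four plots per PDF page (hypothetical helper)."""
    pages = 0
    with PdfPages(pdf_path) as pdf:
        for start in range(0, len(csv_files), PER_PAGE):
            chunk = csv_files[start:start + PER_PAGE]
            fig, axs = plt.subplots(nrows=2, ncols=2)
            for ax, file in zip(axs.flatten(), chunk):
                df_temp = pd.read_csv(file)
                region = df_temp["Region"].iloc[0]
                country = df_temp["country"].iloc[0]
                ax.scatter(df_temp["Age"], df_temp["Education"])
                ax.set_title(f"Region:{region}, Country:{country}")
                ax.set_xlabel("Age")
                ax.set_ylabel("Education")
            # Hide any axes left unused on the last page.
            for ax in axs.flatten()[len(chunk):]:
                ax.set_visible(False)
            fig.tight_layout()
            pdf.savefig(fig)  # each savefig call appends one page
            plt.close(fig)
            pages += 1
    return pages


with tempfile.TemporaryDirectory() as tmp:
    # Five tiny demo files (made-up values) so that a second page is needed.
    for k in range(5):
        pd.DataFrame(
            {"Region": ["west"] * 2, "country": [f"C{k}"] * 2,
             "Age": [20 + k, 40 + k], "Education": [1, 2]}
        ).to_csv(os.path.join(tmp, f"west_C{k}.csv"), index=False)
    files = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    pdf_path = os.path.join(tmp, "scatter.pdf")
    n_pages = plot_csvs_to_pdf(files, pdf_path)
    pdf_size = os.path.getsize(pdf_path)

print(n_pages, pdf_size > 0)
```

Closing each figure after saving keeps memory use flat when there are many groups.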

Answered 2024-01-16

慕俠2389804


In R, you can do it like this:

    library(tidyverse)

    # Get the data as a list of data frames, one per (Region, country) pair.
    df %>%
      select(Region, country, Education, Age) %>%
      group_split(Region, country) -> split_data

    # From the list of data frames, create a list of plots.
    list_plots <- map(split_data, ~ggplot(.) + aes(Education, Age) +
                        geom_point() +
                        ggtitle(sprintf('Plot for region %s and country %s',
                                        first(.$Region), first(.$country))))

    # Write the plots to a PDF and write the CSVs in the same loop.
    pdf("plots.pdf", onefile = TRUE)
    for (i in seq_along(list_plots)) {
      # Write only the i-th group (split_data[[i]]), not the whole list.
      write.csv(split_data[[i]], sprintf('%s_%s.csv',
                split_data[[i]]$Region[1], split_data[[i]]$country[1]), row.names = FALSE)
      print(list_plots[[i]])
    }
    dev.off()
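For comparison, the same single-pass idea (split in memory, then write each CSV and its plot in one loop, one plot per page) can be sketched in Python. This is a rough analogue under stated assumptions, not a translation endorsed by either answer: the dict split_data stands in for group_split(), only the first four sample rows from the question are used, and output goes to a temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# First four sample rows from the question (only the columns the R code selects).
df = pd.DataFrame({
    "Region": ["west", "east", "east", "west"],
    "country": ["GER", "POL", "POL", "GER"],
    "Education": [1, 2, 2, 0],
    "Age": [43, 32, 22, 55],
})

# Rough analogue of group_split(): a dict keyed by (Region, country).
split_data = {key: group for key, group in df.groupby(["Region", "country"])}

with tempfile.TemporaryDirectory() as tmp:
    with PdfPages(os.path.join(tmp, "plots.pdf")) as pdf:
        for (region, country), group in split_data.items():
            # Write the CSV and draw the plot for this group in the same pass.
            group.to_csv(os.path.join(tmp, f"{region}_{country}.csv"), index=False)
            fig, ax = plt.subplots()
            ax.scatter(group["Education"], group["Age"])
            ax.set_title(f"Plot for region {region} and country {country}")
            ax.set_xlabel("Education")
            ax.set_ylabel("Age")
            pdf.savefig(fig)  # one plot per page
            plt.close(fig)
    outputs = sorted(os.listdir(tmp))

print(outputs)
```

Because the groups never leave memory, there is no need to re-read the CSV files with glob before plotting.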

Answered 2024-01-16
