5 Answers

Contributor: 1831 experience points, over 10 upvotes
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be combined via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
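A minimal sketch of the cbind form, assuming a second numeric column. Metric2 is a made-up column used only for illustration (it is not part of the original data), and naming the cbind arguments is my choice so that the result columns keep readable names:

```r
# Example data; Metric2 is a hypothetical extra metric, not from the original post
x2 <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = 1:7
)

# Sum both metrics per Category in a single call
res_cbind <- aggregate(
  cbind(Frequency = x2$Frequency, Metric2 = x2$Metric2),
  by  = list(Category = x2$Category),
  FUN = sum
)
print(res_cbind)
```

The result should have one row per Category, with one column per aggregated metric.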
(incorporating @thelatemail's comment) aggregate also has a formula interface:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you can use the . notation (it also works for a single column):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
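As a quick sanity check, the three base R approaches in this answer can be run side by side on that data. This is only a sketch; the variable names by_form, formula_form and tap are mine:

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# 1. aggregate with by=: the summed column is named "x"
by_form <- aggregate(x$Frequency, by = list(Category = x$Category), FUN = sum)

# 2. aggregate with a formula: the summed column keeps the name "Frequency"
formula_form <- aggregate(Frequency ~ Category, data = x, FUN = sum)

# 3. tapply: returns a named array rather than a data.frame
tap <- tapply(x$Frequency, x$Category, FUN = sum)

# If a data.frame is wanted from the tapply result, it can be rebuilt by hand
tap_df <- data.frame(Category = names(tap), Frequency = as.vector(tap))

print(by_form)
print(formula_form)
print(tap_df)
```

The main practical difference is the shape of the result: aggregate gives a data.frame, tapply gives a named array.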

Contributor: 1780 experience points, over 1 upvote
More recently, you can also use the dplyr package for this purpose:
library(dplyr)
x %>%
group_by(Category) %>%
summarise(Frequency = sum(Frequency))
#Source: local data frame [3 x 2]
#
# Category Frequency
#1 First 30
#2 Second 5
#3 Third 34
Or, for multiple summary columns (this also works for a single column):
x %>%
group_by(Category) %>%
summarise_each(funs(sum))
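Update for dplyr >= 0.5: summarise_each has been replaced by the summarise_all, summarise_at and summarise_if family of functions in dplyr.
Or, if you have multiple columns to group by, you can list all of them in group_by, separated by commas: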
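For newer dplyr versions, here is a sketch of the same "sum every non-grouping column" idea using across(). It assumes dplyr 1.0.0 or later, where across() is available; Metric2 is a hypothetical extra column added only to show more than one summary column:

```r
library(dplyr)

# Example data; Metric2 is a made-up column for illustration
x3 <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = 1:7
)

# Sum all non-grouping columns within each Category
res_across <- x3 %>%
  group_by(Category) %>%
  summarise(across(everything(), sum))

print(res_across)
```

Inside summarise, everything() does not include the grouping column, so only Frequency and Metric2 are summed.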
mtcars %>%
group_by(cyl, gear) %>% # multiple group columns
summarise(max_hp = max(hp), mean_mpg = mean(mpg)) # multiple summary columns
For more information, including the %>% operator, see the introduction to dplyr.

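Contributor: 1887 experience points, over 5 upvotes
The answer provided by rcs works and is simple. However, if you are handling larger datasets and need better performance, there is a faster alternative: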
library(data.table)
data = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"),
Frequency=c(10,15,5,2,14,20,3))
data[, sum(Frequency), by = Category]
# Category V1
# 1: First 30
# 2: Second 5
# 3: Third 34
system.time(data[, sum(Frequency), by = Category] )
# user system elapsed
# 0.008 0.001 0.009
Let's compare that with the same operation using a data.frame and the aggregate approach above:
data = data.frame(Category=c("First","First","First","Second","Third", "Third", "Second"),
Frequency=c(10,15,5,2,14,20,3))
system.time(aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum))
# user system elapsed
# 0.008 0.000 0.015
And if you want to keep the column name (Frequency instead of V1), this is the syntax:
data[,list(Frequency=sum(Frequency)),by=Category]
# Category Frequency
# 1: First 30
# 2: Second 5
# 3: Third 34
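With larger datasets the difference becomes more apparent, as the code below shows:
# 300,000 rows; Frequency gets one value per row, so nothing relies on recycling
data = data.table(Category=rep(c("First", "Second", "Third"), 100000),
Frequency=rnorm(300000))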
system.time( data[,sum(Frequency),by=Category] )
# user system elapsed
# 0.055 0.004 0.059
# Same 300,000-row layout as a plain data.frame
data = data.frame(Category=rep(c("First", "Second", "Third"), 100000),
Frequency=rnorm(300000))
system.time( aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum) )
# user system elapsed
# 0.287 0.010 0.296
For multiple aggregations, you can combine lapply and .SD as follows:
data[, lapply(.SD, sum), by = Category]
# Category Frequency
# 1: First 30
# 2: Second 5
# 3: Third 34
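When the table also holds columns that cannot be summed, .SDcols can restrict which columns end up in .SD. A minimal sketch, assuming data.table is installed; Metric2 and Note are hypothetical columns added for illustration:

```r
library(data.table)

# Example data; Metric2 and Note are made-up columns, not from the original post
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = 1:7,
  Note      = letters[1:7]   # character column, sum() would fail on it
)

# Sum only the listed numeric columns per Category
res_sd <- dt[, lapply(.SD, sum), by = Category, .SDcols = c("Frequency", "Metric2")]
print(res_sd)

# Several named aggregations of the same column in one call
res_named <- dt[, .(Total = sum(Frequency), Mean = mean(Frequency)), by = Category]
print(res_named)
```

Without .SDcols, .SD would include Note as well and sum() would stop with an error on the character column.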

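Contributor: 1828 experience points, over 4 upvotes
A few years later, just to add another simple base R solution that for some reason is not present here: xtabs (df below is the data frame holding the Category and Frequency columns)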
xtabs(Frequency ~ Category, df)
# Category
# First Second Third
# 30 5 34
Or if you want a data.frame back:
as.data.frame(xtabs(Frequency ~ Category, df))
# Category Freq
# 1 First 30
# 2 Second 5
# 3 Third 34
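Note that the summed column comes back named Freq. If you would rather keep the original name, the responseName argument of the table method of as.data.frame can be used. A small sketch on the same example data, built inline here so it runs on its own:

```r
df <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# xtabs sums Frequency within each Category
tab <- xtabs(Frequency ~ Category, df)

# responseName renames the default "Freq" column
res_xtabs <- as.data.frame(tab, responseName = "Frequency")
print(res_xtabs)
```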