The data frame built from your `structure()` call looks slightly different from the one you posted:
> df
Subject Recipient Length Folder Message Date Edit
1 80 out NA 1/2/2020 1:00:01 AM TRUE
2 80 out NA 1/2/2020 1:00:05 AM TRUE
3 hey [email protected],[email protected] 80 out NA 1/2/2020 1:00:10 AM TRUE
4 hey [email protected],[email protected] 80 out NA 1/2/2020 1:00:15 AM TRUE
5 hey [email protected],[email protected] 80 out NA 1/2/2020 1:00:30 AM TRUE
6 NA NA NA
7 NA NA NA
8 hey [email protected],[email protected] 80 draft NA 1/2/2020 1:02:00 AM TRUE
9 hey [email protected],[email protected] 80 draft NA 1/2/2020 1:02:05 AM TRUE
10 NA NA NA
11 NA NA NA
12 hey [email protected],[email protected] 100 draft NA 1/2/2020 1:03:00 AM TRUE
13 hey [email protected],[email protected] 100 draft NA 1/2/2020 1:03:20 AM TRUE
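Also, your desired output suggests you want the groups split by `Folder` as well, but that is not what your description says, so I did not group by `Folder`. That is easy to change if you want it, though.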
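You can use run-length encoding (`rle`) to tell apart groups of identical consecutive values in sorted data, but turning the result into a data frame column is a bit tricky in R. I used this answer to do it.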
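library(lubridate)
library(dplyr)

df %>%
  # parse the character timestamps and build one key per Subject/Recipient/Length combination
  mutate(Date = mdy_hms(Date),
         Key = paste(Subject, Recipient, Length, sep = "_")) %>%
  arrange(Date) %>%
  # & binds tighter than |, so this keeps every "out" row plus the edited "draft" rows;
  # the blank/NA rows fail the condition and are dropped here
  filter(Folder == "out" | (Folder == "draft" & Edit == TRUE)) %>%
  # rle() collapses consecutive identical keys; rep() expands the run numbers back to one per row
  mutate(RLE = {
    r <- rle(Key)
    rep(seq_along(r$lengths), r$lengths)
  }) %>%
  group_by(RLE) %>%
  summarize(Start = first(Date),
            End = last(Date),
            Duration = as.numeric(End) - as.numeric(Start))  # in seconds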
This creates groups from rows 1:2, 3:5 + 8:9, and 12:13. These groups have the following durations (in seconds):
# A tibble: 3 x 4
RLE Start End Duration
<int> <dttm> <dttm> <dbl>
1 1 2020-01-02 01:00:01 2020-01-02 01:00:05 4
2 2 2020-01-02 01:00:10 2020-01-02 01:02:05 115
3 3 2020-01-02 01:03:00 2020-01-02 01:03:20 20
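To see what the `rle` step does on its own, here is a minimal base-R sketch (no dplyr or lubridate) that turns runs of identical keys into group ids and computes each group's duration. It assumes the blank/NA rows have already been filtered out and the rows are sorted by time; the helper name `run_id` is hypothetical, and the e-mail addresses are replaced by the placeholder `recipients` because they are not recoverable from the post.

```r
# Hypothetical helper: one id per run of consecutive identical values.
run_id <- function(x) {
  r <- rle(x)                           # lengths of each run, e.g. 2, 5, 2
  rep(seq_along(r$lengths), r$lengths)  # expand run numbers back to one per row
}

# Filtered, time-sorted rows 1:5, 8:9 and 12:13; "recipients" is a placeholder.
key  <- c("__80", "__80", rep("hey_recipients_80", 5),
          rep("hey_recipients_100", 2))
date <- as.POSIXct(c("2020-01-02 01:00:01", "2020-01-02 01:00:05",
                     "2020-01-02 01:00:10", "2020-01-02 01:00:15",
                     "2020-01-02 01:00:30", "2020-01-02 01:02:00",
                     "2020-01-02 01:02:05", "2020-01-02 01:03:00",
                     "2020-01-02 01:03:20"), tz = "UTC")

grp <- run_id(key)  # 1 1 2 2 2 2 2 3 3

# Duration in seconds per group (rows are sorted, so min/max equal first/last)
secs     <- as.numeric(date)
duration <- tapply(secs, grp, max) - tapply(secs, grp, min)
print(duration)
```

With dplyr 1.1.0 or later, `consecutive_id()` builds the same kind of run ids directly; `data.table::rleid()` does the same in data.table.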
If you want `Folder` included in the grouping, add it to the columns that go into `Key`. That gives the groups 1:2, 3:5, 8:9 and 12:13, with this result:
# A tibble: 4 x 4
RLE Start End Duration
<int> <dttm> <dttm> <dbl>
1 1 2020-01-02 01:00:01 2020-01-02 01:00:05 4
2 2 2020-01-02 01:00:10 2020-01-02 01:00:30 20
3 3 2020-01-02 01:02:00 2020-01-02 01:02:05 5
4 4 2020-01-02 01:03:00 2020-01-02 01:03:20 20
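The effect of adding `Folder` to the key can be checked the same way: the "out" and "draft" rows of the length-80 message no longer form one run. This is again a base-R sketch using the hypothetical helper `run_id`, with placeholder keys standing in for the unrecoverable e-mail addresses.

```r
# Hypothetical helper: one id per run of consecutive identical values.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Filtered, time-sorted rows 1:5, 8:9 and 12:13; "recipients" is a placeholder.
key    <- c("__80", "__80", rep("hey_recipients_80", 5),
            rep("hey_recipients_100", 2))
folder <- c(rep("out", 5), rep("draft", 4))
date   <- as.POSIXct(c("2020-01-02 01:00:01", "2020-01-02 01:00:05",
                       "2020-01-02 01:00:10", "2020-01-02 01:00:15",
                       "2020-01-02 01:00:30", "2020-01-02 01:02:00",
                       "2020-01-02 01:02:05", "2020-01-02 01:03:00",
                       "2020-01-02 01:03:20"), tz = "UTC")

grp_without <- run_id(key)                            # 1 1 2 2 2 2 2 3 3
grp_with    <- run_id(paste(key, folder, sep = "_"))  # 1 1 2 2 2 3 3 4 4

# Duration in seconds per group (rows are sorted, so min/max equal first/last)
secs     <- as.numeric(date)
duration <- tapply(secs, grp_with, max) - tapply(secs, grp_with, min)
print(duration)
```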