2 Answers

Contributor with 1856 experience points, over 17 upvotes
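Here is a base R answer.
First, match the Commenid column against Parentid. Then build a data set whose Author column holds the author of each comment and whose Reply column holds the author of the matching reply. Keep only the rows without NA values, and merge the result with the original data to bring in the remaining columns.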
i <- with(df1, match(Commenid, Parentid))
res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
res <- res[complete.cases(res), ]
merge(res, df1)
#   Author Reply Commenid Parentid Submissionid
# 1  User1 User2     333c     222b         111b
# 2  User3 User1     222b     555d         23er
# 3  User4 User3     555d     666f         111b
A dplyr solution could be:
library(dplyr)
df1 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(Author, Reply, everything(), -i)
Data
df1 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
")
df1[] <- lapply(df1, trimws)
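Edit
Given the new data and the question described in the comments, here is a dplyr solution. It starts out much like the one above, then joins the result back to the original data set to pick up the reply author's Score and Stance, and finally reorders the columns so the id columns come last.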
library(dplyr)
df2 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(-i) %>%
  select(Author, Score, Stance, Reply, everything()) %>%
  left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
  select(-matches("id$"), everything(), matches("id$"))
New data
df2 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid, Score, Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
")
names(df2) <- trimws(names(df2))
df2[] <- lapply(df2, trimws)

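Contributor with 1719 experience points, over 6 upvotes
You can compare every row against every other row: whenever one row's parentid equals another row's commentid, print the pair. Here is how you could do that in Python: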
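# dataset is assumed to be a list of dicts, one per comment
for u1 in dataset:
    for u2 in dataset:
        # u1's parent is u2's comment, so u1 replied to u2
        if u1['parentid'] == u2['commentid']:
            print(u1['Author'], 'replied to', u2['Author'])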
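The nested loop above compares every pair of rows, which gets slow as the data grows. A minimal sketch of the same idea using a dictionary lookup follows. It assumes dataset is a list of dicts with the keys Author, commentid and parentid, and that comment ids are unique. The sample rows mirror the data from the first answer, and find_replies is a hypothetical helper name, not part of any library.

```python
# Sample rows mirroring the data in the first answer. The lowercase key
# names follow the loop above and are an assumption about the dataset shape.
dataset = [
    {"Author": "User1", "commentid": "333c", "parentid": "222b"},
    {"Author": "User2", "commentid": "444c", "parentid": "333c"},
    {"Author": "User3", "commentid": "222b", "parentid": "555d"},
    {"Author": "User4", "commentid": "555d", "parentid": "666f"},
]


def find_replies(rows):
    """Return (comment_author, reply_author) pairs, ordered by replying row."""
    # Map each comment id to the author who wrote it (assumes unique ids).
    author_by_comment = {row["commentid"]: row["Author"] for row in rows}
    pairs = []
    for row in rows:
        # Look up who wrote the comment this row replies to, if it is known.
        parent_author = author_by_comment.get(row["parentid"])
        if parent_author is not None:
            pairs.append((parent_author, row["Author"]))
    return pairs


replies = find_replies(dataset)
for author, reply in replies:
    print(f"{reply} replied to {author}")
```

This makes one pass to build the lookup and one pass to resolve parents, instead of comparing all pairs. The resulting pairs are the same Author/Reply combinations as in the R output, only in a different row order.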