Is there a function to filter data like this in R or Python?

Python

胡子哥哥 2023-01-04 13:33:51

I am new to R and cannot figure out how to filter my data the way I need. The data has 326 rows and 6 columns. Here is a small example:

Author  Commenid  Parentid  Submissionid  Score  Stance
User1   333c      222b      111b          10     Positive
User2   444c      333c      5hdc          15     Neutral
User3   222b      555d      23er          20     Negative
User4   555d      666f      111b          11     Positive

Here User1 means he has replied to User2, User3 has replied to User1, and User4 has replied to User3.

I want to filter for users whose commentid and parentid are the same. For the example above, the filtered data would be:

Author  Score  Stance    Reply  Score  Stance
User2   15     Neutral   User1  10     Positive
User1   10     Positive  User3  20     Negative
User3   20     Negative  User4  11     Positive

I have tried a lot but cannot work it out. Can anyone help me with how exactly to do this (in either R or Python)?

2 answers

慕慕森

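Here is a base R answer.

First, `match` the column `Commenid` against `Parentid`. Create a data set with the `Author` column and a column holding the `Reply` author found by that match. Keep all rows with no `NA` values and join (`merge`) the result with the original data to get the other columns.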

# for each Commenid, the position of the row whose Parentid equals it (NA if none)
i <- with(df1, match(Commenid, Parentid))
res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
# drop the comments that nobody replied to
res <- res[complete.cases(res), ]
merge(res, df1)
#  Author Reply Commenid Parentid Submissionid
#1  User1 User2     333c     222b         111b
#2  User3 User1     222b     555d         23er
#3  User4 User3     555d     666f         111b
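
For readers more at home in Python, the lookup that `match` performs can be sketched as follows. This is only an illustration: `r_match` is a hypothetical helper written for this example, not a library function, and it returns 0-based positions (with `None` instead of `NA`), whereas R's `match` is 1-based.

```python
def r_match(x, table):
    """Rough stand-in for R's match(): for each element of x, return the
    0-based position of its first occurrence in table, or None if absent."""
    first_pos = {}
    for pos, value in enumerate(table):
        first_pos.setdefault(value, pos)  # keep only the first occurrence
    return [first_pos.get(value) for value in x]


# Columns of the small example, already stripped of whitespace.
authors = ["User1", "User2", "User3", "User4"]
commenid = ["333c", "444c", "222b", "555d"]
parentid = ["222b", "333c", "555d", "666f"]

# For each comment, the row whose Parentid points at it (None if there is none).
i = r_match(commenid, parentid)

# Same Author/Reply pairs as the R data frame `res` after dropping NA rows.
pairs = [(authors[row], authors[j]) for row, j in enumerate(i) if j is not None]
print(pairs)
```

Like `match`, this keeps only the first matching row, so a comment with several replies would be paired with just one of them.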

A dplyr solution could be

library(dplyr)

df1 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  # put Author and Reply first, keep the other columns, drop the helper index
  select(Author, Reply, everything(), -i)

Data

df1 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
")
# strip the padding around every value
df1[] <- lapply(df1, trimws)

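Edit

With the new data and problem described in the comments, here is a dplyr solution. After doing basically the same as above, it joins the result with the original data set and reorders the columns.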

library(dplyr)

df2 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(-i) %>%
  select(Author, Score, Stance, Reply, everything()) %>%
  left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
  select(-matches("id$"), everything(), matches("id$"))

New data

df2 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid, Score, Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
")
# strip the padding around the column names and every value
names(df2) <- trimws(names(df2))
df2[] <- lapply(df2, trimws)

Answered 2023-01-04

慕俠2389804

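You can compare every user with every other user and print the pair whenever one row's parent ID equals the other row's comment ID. Here is how you could do that in Python: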

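# dataset is assumed to be a list of dicts, one per row, keyed by column name
for u1 in dataset:
    for u2 in dataset:
        # u1's parent is u2's comment
        if u1['Parentid'] == u2['Commenid']:
            print(u1['Author'], 'had comment of', u2['Author'])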
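
The nested loop compares every pair of rows, which is fine for 326 rows. As a minimal sketch of the same idea that also carries Score and Stance along, one could index the rows by `Commenid` first. The names `RAW`, `load_rows` and `reply_pairs` are made up for this example, and the code assumes every `Commenid` is unique.

```python
import csv
import io

# The small example from the question, written as CSV text.
RAW = """Author,Commenid,Parentid,Submissionid,Score,Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, stripping padding around values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key.strip(): value.strip() for key, value in row.items()}
            for row in reader]


def reply_pairs(rows):
    """Pair each row with the row whose Commenid equals its Parentid."""
    by_comment = {row["Commenid"]: row for row in rows}  # assumes unique Commenid
    pairs = []
    for row in rows:
        parent = by_comment.get(row["Parentid"])
        if parent is not None:  # skip rows whose parent is not in the data
            pairs.append((row["Author"], row["Score"], row["Stance"],
                          parent["Author"], parent["Score"], parent["Stance"]))
    return pairs


rows = load_rows(RAW)
result = reply_pairs(rows)
for line in result:
    print(*line)
```

Each output line has the layout the question asks for (Author, Score, Stance, then the matched author with its Score and Stance). Rows come out in the order of the input data, and Score stays a string because nothing here converts it.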

Answered 2023-01-04
