Python (PySpark) nested lists with reduceByKey: building a nested list with Python list append

Python

慕哥9229398 2021-09-11 19:12:10

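I have an input RDD in the following format:

[('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]), ('2002', ['is', 1]), ('2002', ['database', 13])]

Here '2002' is the key, so each key-value pair has the form ('year', ['word', count]), where count is an integer. I want to use reduceByKey to get the following result:

[('2002', [['cougar', 1], ['the', 10], ['network', 4], ['is', 1], ['database', 13]])]

I am having trouble producing the nested list above; getting the nesting right is the main problem. For example, suppose I have three lists a, b and c:

a = ['cougar', 1]
b = ['the', 10]
c = ['network', 4]

a.append(b) leaves a as ['cougar', 1, ['the', 10]], whereas (with a and b as originally defined)

x = []
x.append(a)
x.append(b)

leaves x as [['cougar', 1], ['the', 10]]. However, if I then call c.append(x), c becomes ['network', 4, [['cougar', 1], ['the', 10]]].

None of these operations gives me the result I want, which is:

[('2002', [[word1, c1], [word2, c2], [word3, c3], ...]), ('2003', [[w1, count1], [w2, count2], [w3, count3], ...])]

That is, the nested list should be [a, b, c], where a, b and c are themselves two-element lists. I hope the question is clear. Any suggestions?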
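The list behaviour described in the question can be reproduced in plain Python without Spark. The sketch below is only an illustration; the variable names wrong, nested and literal are made up for this example and do not come from the question.

```python
# Plain-Python illustration of why append() on an existing pair nests in the wrong place.
a = ['cougar', 1]
b = ['the', 10]
c = ['network', 4]

# append() on a pair adds the other pair as a third element of that pair.
wrong = list(a)          # copy, so that a itself is left unchanged
wrong.append(b)          # wrong is now ['cougar', 1, ['the', 10]]

# Starting from an empty list and appending each pair gives the nested list.
nested = []
for pair in (a, b, c):
    nested.append(pair)

# A list literal builds the same structure directly.
literal = [a, b, c]

print(wrong)
print(nested)
print(nested == literal)
```

Note that append() modifies the list in place and returns None, so the result has to be read from the list itself, not from the call.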

2 Answers

牛魔王的故事

Contributed 1830 experience points, received over 3 upvotes

You do not need reduceByKey for this problem.

Define the RDD:

rdd = sc.parallelize([('2002', ['cougar', 1]),('2002', ['the', 10]),('2002', ['network', 4]),('2002', ['is', 1]),('2002', ['database', 13])])

Inspect the RDD's contents with rdd.collect():

[('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]), ('2002', ['is', 1]), ('2002', ['database', 13])]

Apply groupByKey and map the grouped values to a list, as described in the Apache Spark documentation:

rdd_nested = rdd.groupByKey().mapValues(list)

Inspect the grouped values with rdd_nested.collect():

[('2002', [['cougar', 1], ['the', 10], ['network', 4], ['is', 1], ['database', 13]])]
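To see why grouping keeps each [word, count] pair intact, here is a rough local sketch of what groupByKey().mapValues(list) does conceptually. It runs without Spark; group_values_by_key is a hypothetical helper written for this illustration, and the '2003' rows are made-up sample data. It is not how Spark implements grouping internally.

```python
from collections import defaultdict

def group_values_by_key(pairs):
    """Hypothetical local stand-in for rdd.groupByKey().mapValues(list)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Each value is appended whole, so every [word, count] pair stays nested.
        grouped[key].append(value)
    return list(grouped.items())

# Sample data; the '2003' entries are invented for illustration.
data = [('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]),
        ('2003', ['is', 1]), ('2003', ['database', 13])]

result = group_values_by_key(data)
print(result)
```

This local version preserves input order; on a real cluster the order of values within a group should not be relied upon.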

Answered 2021-09-11

POPMUISE

Contributed 1765 experience points, received over 5 upvotes

I came up with one solution:

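def wagg(a, b):
    # Merge two values for the same key into one list of [word, count] pairs.
    # A value is either a bare pair such as ['the', 10] or an already nested list.
    if isinstance(a[0], list):
        # a is already nested: merge b into it.
        if isinstance(b[0], list):
            a.extend(b)
        else:
            a.append(b)
        w = a
    elif isinstance(b[0], list):
        # Only b is nested (a is a bare pair here), so add a to b.
        b.append(a)
        w = b
    else:
        # Both are bare pairs: start a new nested list.
        w = [a, b]
    return w

rdd2 = rdd1.reduceByKey(wagg)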

Does anyone have a better solution?
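One possible simplification, offered as a sketch and not as a definitive answer: wrap every value in a one-element list first, so that the reduce function only ever has to concatenate lists and needs no type checks. The code below emulates this locally in plain Python; reduce_by_key is a hypothetical stand-in for RDD.reduceByKey, and the '2003' row is made-up sample data.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey, for illustration only."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort keeps per-key order
    return [(key, reduce(func, (value for _, value in group)))
            for key, group in groupby(ordered, key=itemgetter(0))]

# Sample data; the '2003' entry is invented for illustration.
data = [('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2003', ['network', 4])]

# Step 1: wrap every value in a one-element list (mapValues in PySpark).
wrapped = [(key, [value]) for key, value in data]

# Step 2: concatenate the lists per key; a + b builds a new list and mutates neither input.
nested = reduce_by_key(wrapped, lambda a, b: a + b)
print(nested)

# Assumed PySpark equivalent of the two steps above:
#   rdd.mapValues(lambda v: [v]).reduceByKey(lambda a, b: a + b)
```

Because the wrapping happens before the reduce step, a key with only a single value also ends up as a nested list, for example ('2003', [['network', 4]]). With a merge function applied directly to the bare pairs, such a key never reaches the function and keeps its flat ['word', count] value.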

Answered 2021-09-11
