1 Answer
You can create a user-defined function (UDF) that returns the index of the maximum value:
from pyspark.sql import functions as f
from pyspark.sql.types import IntegerType
max_index = f.udf(lambda x: x.index(max(x)), IntegerType())
df = df.withColumn("topicID", max_index("topicDistribution"))
Example:
>>> from pyspark.sql import functions as f
>>> from pyspark.sql.types import IntegerType
>>> df = spark.createDataFrame([{"topicDistribution": [0.2, 0.3, 0.5]}])
>>> df.show()
+-----------------+
|topicDistribution|
+-----------------+
| [0.2, 0.3, 0.5]|
+-----------------+
>>> max_index = f.udf(lambda x: x.index(max(x)), IntegerType())
>>> df.withColumn("topicID", max_index("topicDistribution")).show()
+-----------------+-------+
|topicDistribution|topicID|
+-----------------+-------+
| [0.2, 0.3, 0.5]| 2|
+-----------------+-------+
Edit:
Since you mentioned that the lists in topicDistribution are actually numpy arrays, you can update the max_index UDF as follows (numpy arrays have no .index method, so convert to a Python list first):
max_index = f.udf(lambda x: x.tolist().index(max(x)), IntegerType())
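As a side note, the core logic inside the UDF is plain Python, so you can test it without a Spark session before wrapping it with f.udf. A minimal sketch (the helper name is my own, not from the answer above):

```python
def index_of_max(x):
    """Return the index of the largest element in a sequence.

    Works on plain Python lists; for a numpy array, call
    x.tolist() first, since ndarrays have no .index method.
    """
    return x.index(max(x))

print(index_of_max([0.2, 0.3, 0.5]))  # 2
```

Note that list.index returns the position of the first occurrence, so ties resolve to the earliest maximum.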
