For example, here is my test data:

test = spark.createDataFrame([
    (0, 1, 5, "2018-06-03", "Region A"),
    (1, 1, 2, "2018-06-04", "Region B"),
    (2, 2, 1, "2018-06-03", "Region B"),
    (3, 3, 1, "2018-06-01", "Region A"),
    (3, 1, 3, "2018-06-05", "Region A"),
]).toDF("orderid", "customerid", "price", "transactiondate", "location")
test.show()

I can get aggregated data like this:

test.groupBy("customerid", "location").agg(sum("price")).show()

But I also want a percentage column, like this:

+----------+--------+----------+----------+
|customerid|location|sum(price)|percentage|
+----------+--------+----------+----------+
|         1|Region B|         2|       20%|
|         1|Region A|         8|       80%|
|         3|Region A|         1|      100%|
|         2|Region B|         1|      100%|
+----------+--------+----------+----------+

How can I do this? Perhaps with a window function? And can I also pivot the table so that it has both the sum and percentage columns?
2 Answers

天涯盡頭無女友
This answers the original version of the question.
In SQL, you can use window functions:
select customerid, location, sum(price),
       sum(price) / sum(sum(price)) over (partition by customerid) as ratio
from t
group by customerid, location;
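
A minimal sketch of running this query from PySpark, assuming the question's test DataFrame has been registered as a temporary view named t:

# Expose the DataFrame to Spark SQL under the name "t"
test.createOrReplaceTempView("t")

spark.sql("""
    select customerid, location, sum(price),
           sum(price) / sum(sum(price)) over (partition by customerid) as ratio
    from t
    group by customerid, location
""").show()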

慕雪6442864
Here is clean code that solves your problem:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

(test.groupby("customerid", "location")
 .agg(F.sum("price").alias("t_price"))
 # Divide each group's total by the customer's overall total via a window
 .withColumn("perc", F.col("t_price") / F.sum("t_price").over(Window.partitionBy("customerid")))
 .show())