
PySpark - Explode columns into rows and set values based on logic


Python

MMTTMM 2022-09-13 17:42:14

Given a DataFrame:

+---+-----------+---------+-------+------------+
| id|      score|tx_amount|isValid|    greeting|
+---+-----------+---------+-------+------------+
|  1|        0.2|    23.78|   true| hello_world|
|  2|        0.6|    12.41|  false|byebye_world|
+---+-----------+---------+-------+------------+

I want to explode these columns into rows, in a column named "col_value". That part works fine, but I also want to apply logic to each row so that I get a result like this:

+---+------------+--------+---------+----------+-------+
| id|   col_value|is_score|is_amount|is_boolean|is_text|
+---+------------+--------+---------+----------+-------+
|  1|         0.2|       Y|        N|         N|      N|
|  1|       23.78|       N|        Y|         N|      N|
|  1|        true|       N|        N|         Y|      N|
|  1| hello_world|       N|        N|         N|      Y|
|  2|         0.6|       Y|        N|         N|      N|
|  2|       12.41|       N|        Y|         N|      N|
|  2|       false|       N|        N|         Y|      N|
|  2|byebye_world|       N|        N|         N|      Y|
+---+------------+--------+---------+----------+-------+

What I have so far:

.withColumn("cols", F.explode(F.arrays_zip(F.array("score", "tx_amount", "isValid", "greeting")))) \
.select("id", F.col("cols.*")) \
.withColumnRenamed("0", "col_value") \
.withColumn("is_score", F.lit("Y") if col1_type == "score" else F.lit("N")) \
.withColumn("is_amount", F.lit("Y") if col2_type == "amount" else F.lit("N")) \
.withColumn("is_boolean", F.lit("Y") if col3_type == "boolean" else F.lit("N")) \
.withColumn("is_text", F.lit("Y") if col4_type == "text" else F.lit("N")) \
.show()

How can I do this after the explode so that I get the correct result?


1 Answer

偶然的你


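I think what you want can be achieved by applying a regex to your col_value to determine whether it is text, boolean, amount or score. The code below works as long as score never exceeds 1.0 and amount is always above 1.0. If that is not the case, let me know and I will update the logic.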
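To make the rule described above concrete, here is a minimal sketch in plain Python, with no Spark involved. The helper name `classify` and the exact patterns are assumptions for illustration, not part of the original answer. It treats `true`/`false` as boolean, fully numeric strings as score (up to 1.0) or amount (above 1.0), and anything else containing letters as text.

```python
import re

# Hypothetical helper restating the rule; not from the original answer.
# A value counts as numeric only if the whole string is a decimal number.
NUMERIC = re.compile(r"^[0-9]+(\.[0-9]+)?$")

def classify(col_value: str) -> dict:
    """Return Y/N flags for a single exploded value."""
    flags = {"is_score": "N", "is_amount": "N", "is_boolean": "N", "is_text": "N"}
    if col_value in ("true", "false"):
        flags["is_boolean"] = "Y"
    elif NUMERIC.match(col_value):
        # Assumption from the answer: scores are <= 1.0, amounts are > 1.0.
        key = "is_score" if float(col_value) <= 1.0 else "is_amount"
        flags[key] = "Y"
    elif re.search(r"[A-Za-z]", col_value):
        flags["is_text"] = "Y"
    return flags

for value in ["0.2", "23.78", "true", "hello_world"]:
    print(value, classify(value))
```

Because the decision is based only on the value, an amount of 0.5 would be flagged as a score. The rule is only as reliable as the stated assumption about the two ranges.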

from pyspark.sql import functions as F

value_cols = ["score", "tx_amount", "isValid", "greeting"]
# Pattern for a value that is entirely a decimal number, e.g. "0.2" or "23.78".
numeric_pattern = r"^[0-9]+(\.[0-9]+)?$"

(
    df
    # Cast to string so all array elements share one type, then emit one row per value.
    .withColumn("col_value", F.explode(F.array(*[F.col(c).cast("string") for c in value_cols])))
    # First run of letters in the value ("hello" for "hello_world"), or "" if there is none.
    .withColumn("text", F.regexp_extract(F.col("col_value"), "([A-Za-z]+)", 1))
    # "true" / "false" count as boolean, not as text.
    .withColumn("boolean", F.when(F.col("text").isin("true", "false"), F.col("text")).otherwise(F.lit("")))
    .withColumn("text", F.when(F.col("text") == F.col("boolean"), F.lit("")).otherwise(F.col("text")))
    # Numeric value as a double; null for rows that are not numbers.
    .withColumn("numeric", F.when(F.col("col_value").rlike(numeric_pattern), F.col("col_value").cast("double")))
    .withColumn("is_text", F.when(F.col("text") != "", F.lit("Y")).otherwise(F.lit("N")))
    .withColumn("is_score", F.when(F.col("numeric") <= 1, F.lit("Y")).otherwise(F.lit("N")))
    .withColumn("is_amount", F.when(F.col("numeric") > 1, F.lit("Y")).otherwise(F.lit("N")))
    .withColumn("is_boolean", F.when(F.col("boolean") != "", F.lit("Y")).otherwise(F.lit("N")))
    .select("id", "col_value", "is_score", "is_amount", "is_boolean", "is_text")
    .show()
)

+---+------------+--------+---------+----------+-------+
| id|   col_value|is_score|is_amount|is_boolean|is_text|
+---+------------+--------+---------+----------+-------+
|  1|         0.2|       Y|        N|         N|      N|
|  1|       23.78|       N|        Y|         N|      N|
|  1|        true|       N|        N|         Y|      N|
|  1| hello_world|       N|        N|         N|      Y|
|  2|         0.6|       Y|        N|         N|      N|
|  2|       12.41|       N|        Y|         N|      N|
|  2|       false|       N|        N|         Y|      N|
|  2|byebye_world|       N|        N|         N|      Y|
+---+------------+--------+---------+----------+-------+

Answered 2022-09-13
