1 Answer

You can use F.expr() to build a LIKE-style join condition. In your case, combine it with an inner join. Try this:
#%%
import pyspark.sql.functions as F
test1 = sqlContext.createDataFrame([
    ("Mike", "apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
    ("kate", "Whitewine,greenbeans,pineapple"),
    ("Ben", "Water,Spaghetti"),
], schema=["name", "groceries"])
test2 = sqlContext.createDataFrame([("001", "redwine"), ("002", "greenbeans"), ("003", "cd")], schema=["id", "item"])
#%%
# rlike treats item as a regular expression and matches it anywhere inside groceries
test_join = test1.join(test2, F.expr("""groceries rlike item"""), how='inner')
Result:
test_join.show(truncate=False)
+----+-------------------------------------------------------------------------------------------------+---+----------+
|name|groceries |id |item |
+----+-------------------------------------------------------------------------------------------------+---+----------+
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|001|redwine |
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|002|greenbeans|
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|cd |
|kate|Whitewine,greenbeans,pineapple |002|greenbeans|
+----+-------------------------------------------------------------------------------------------------+---+----------+
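The matching rule behind this result can be sketched in plain Python, with `re.search` standing in for Spark's `rlike` (Spark uses Java regular expressions, but both search for the pattern anywhere in the string). The helper name `regex_join` is hypothetical and only illustrates the join condition; it is not part of PySpark.

```python
import re

# Same sample rows as the first example above.
people = [
    ("Mike", "apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
    ("kate", "Whitewine,greenbeans,pineapple"),
    ("Ben", "Water,Spaghetti"),
]
items = [("001", "redwine"), ("002", "greenbeans"), ("003", "cd")]


def regex_join(left, right):
    """Hypothetical helper: keep every (person, item) pair where the item,
    used as a regex, is found anywhere in the person's groceries string."""
    return [
        (name, item_id, item)
        for name, groceries in left
        for item_id, item in right
        if re.search(item, groceries) is not None
    ]


matches = regex_join(people, items)
for row in matches:
    print(row)
```

Note that "cd" pairs with Mike only because the text "(book/cd/downloadable audio)" happens to contain those two letters, not because "cd" is a separate entry in his list. The match is also case-sensitive, which is why "redwine" does not pair with "Whitewine".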
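For your more complex dataset, the contains() function should work. Items such as "(book/cd/downloadable audio)" contain regex metacharacters that rlike would interpret as a pattern, whereas contains() compares the text literally: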
import pyspark.sql.functions as F
test1 = spark.createDataFrame([
    ("Mike", "apple, oranges, red wine,green beans"),
    ("Kate", "Whitewine, green beans waterrr, pineapple, red wine"),
    ("Leah", "red wine, juice, rice, grapes, green beans"),
    ("Ben", "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
], schema=["name", "groceries"])
test2 = spark.createDataFrame([("001", "red wine"), ("002", "green beans waterrr"), ("003", "the little prince 70th anniversary gift set (book/cd/downloadable audio)")], schema=["id", "item"])
#%%
test_join = test1.join(test2, F.col('groceries').contains(F.col('item')), how='inner')
Result:
test_join.show(truncate=False)
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|name|groceries |id |item |
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|Mike|apple, oranges, red wine,green beans |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |002|green beans waterrr |
|Leah|red wine, juice, rice, grapes, green beans |001|red wine |
|Ben |Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|the little prince 70th anniversary gift set (book/cd/downloadable audio)|
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
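Why a literal substring test is the safer choice for this dataset can be shown with a small plain-Python sketch. Here `re.search` stands in for a regex match such as `rlike`, and the `in` operator stands in for `contains()`; the variable names are illustrative only. Parentheses form a group in both Python and Java regular expressions, so the behaviour shown should carry over, but treat this as an analogy rather than Spark code.

```python
import re

groceries = "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)"
item = "the little prince 70th anniversary gift set (book/cd/downloadable audio)"

# As a regex, "(...)" is a capture group, so the pattern looks for the text
# "set book/cd/downloadable audio" without any literal parentheses.
as_regex = re.search(item, groceries) is not None

# A plain substring test compares the characters literally.
as_substring = item in groceries

# Escaping the metacharacters makes the regex behave like the substring test.
as_escaped_regex = re.search(re.escape(item), groceries) is not None

print(as_regex, as_substring, as_escaped_regex)
```

The unescaped regex misses Ben's row even though the item text appears verbatim in his groceries, which is the situation the contains() join avoids.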