
Comparing strings in two pandas columns with SequenceMatcher

Python

繁華開滿天機 2023-05-16 14:47:13

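I am trying to determine the similarity between two columns in a pandas DataFrame:

Row 0
Text1: Performance results achieved by the approaches submitted to this Challenge.
All: The six top approaches and three others outperform the strong baseline.

Row 1
Text1: Accuracy is one of the basic principles of perfectionist.
All: Where am I?

I want to compare 'Performance results ...' with 'The six...' and 'Accuracy is one...' with 'Where am I?'. The first row should have a higher similarity between the two columns, because they share some words; the second should be 0, because the two columns have no words in common.

To compare the two columns I am using SequenceMatcher as follows:

from difflib import SequenceMatcher
ratio = SequenceMatcher(None, df.Text1, df.All).ratio()

However, passing df.Text1, df.All seems to be wrong. Can you tell me why?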


1 Answer

30秒到達戰場


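SequenceMatcher is not designed for pandas Series: it compares the two sequences it is given as a whole and returns a single ratio.
To get one ratio per row, use .apply to call it on each row's pair of strings.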
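As a rough sketch of the same row-by-row idea without .apply, the pairs can also be walked with zip. The helper name pairwise_ratios is hypothetical (it is not part of difflib or pandas), and plain lists stand in for the two columns so the snippet runs with the standard library alone.

```python
from difflib import SequenceMatcher


def pairwise_ratios(left, right, isjunk=None):
    """Return one SequenceMatcher ratio per (left, right) pair of strings.

    Hypothetical helper for illustration only. `left` and `right` can be
    any two equal-length iterables of strings, e.g. lists or pandas Series.
    """
    return [SequenceMatcher(isjunk, a, b).ratio() for a, b in zip(left, right)]


# Plain lists standing in for df['Text1'] and df['All'].
text1 = [
    'Performance results achieved by the approaches submitted to this Challenge.',
    'Accuracy is one of the basic principles of perfectionist.',
]
all_ = [
    'The six top approaches and three others outperform the strong baseline.',
    'Where am I?',
]

ratios = pairwise_ratios(text1, all_)
for r in ratios:
    print(round(r, 6))
```

With a DataFrame, the equivalent call would be df['ratio'] = pairwise_ratios(df['Text1'], df['All']), since iterating a Series yields its cell values.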
SequenceMatcher example
- With isjunk=None, nothing is treated as junk, not even spaces.
- With isjunk=lambda y: y == " ", spaces are treated as junk.

from difflib import SequenceMatcher

import pandas as pd

data = {
    'Text1': ['Performance results achieved by the approaches submitted to this Challenge.',
              'Accuracy is one of the basic principles of perfectionist.'],
    'All': ['The six top approaches and three others outperform the strong baseline.',
            'Where am I?'],
}
df = pd.DataFrame(data)

# isjunk=lambda y: y == " " (spaces are treated as junk)
# Index each row by column label rather than by position (x[0], x[1]),
# which newer pandas versions deprecate for label-indexed rows.
df['ratio'] = df[['Text1', 'All']].apply(lambda x: SequenceMatcher(lambda y: y == " ", x['Text1'], x['All']).ratio(), axis=1)

# display(df)

                                                                         Text1                                                                      All     ratio
0  Performance results achieved by the approaches submitted to this Challenge.  The six top approaches and three others outperform the strong baseline.  0.356164
1                    Accuracy is one of the basic principles of perfectionist.                                                              Where am I?  0.088235

# isjunk=None (nothing is treated as junk)
df['ratio'] = df[['Text1', 'All']].apply(lambda x: SequenceMatcher(None, x['Text1'], x['All']).ratio(), axis=1)

# display(df)

                                                                         Text1                                                                      All     ratio
0  Performance results achieved by the approaches submitted to this Challenge.  The six top approaches and three others outperform the strong baseline.  0.410959
1                    Accuracy is one of the basic principles of perfectionist.                                                              Where am I?  0.117647
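Note that these ratios are above 0 even for the second row, because SequenceMatcher is comparing the strings character by character. If the word-level behaviour described in the question is what is wanted (0 when no words are shared), one option is to pass lists of words instead of raw strings. This is only a sketch under assumptions: the helper name word_ratio and the tokenisation (lowercasing, then taking runs of word characters) are illustrative choices, not part of the original answer.

```python
import re
from difflib import SequenceMatcher


def word_ratio(a, b):
    """Similarity of two strings measured over words instead of characters.

    Hypothetical helper: tokenisation is an assumption (lowercase, \\w+ runs).
    """
    tokens_a = re.findall(r"\w+", a.lower())
    tokens_b = re.findall(r"\w+", b.lower())
    # SequenceMatcher accepts any sequences of hashable items, so lists of
    # words are compared element by element.
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


row0 = word_ratio(
    'Performance results achieved by the approaches submitted to this Challenge.',
    'The six top approaches and three others outperform the strong baseline.',
)
row1 = word_ratio(
    'Accuracy is one of the basic principles of perfectionist.',
    'Where am I?',
)
print(row0, row1)
```

The second pair shares no words, so its word-level ratio is 0.0, while the first pair shares "the" and "approaches" and scores above 0. On the DataFrame this could be applied with df.apply(lambda x: word_ratio(x['Text1'], x['All']), axis=1).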

2023-05-16
