2 Answers

Use Series.str.extractall with the pattern "digits, whitespace, word" (\d+\s\w+). Then check which of the matches are in to_compare, and finally use GroupBy.sum to count the matches in each row:
# df has a text column 'Col'; to_compare is the list of quantities to look for
matches = df['Col'].str.extractall(r'(\d+\s\w+)')
df['matches'] = matches[0].isin(to_compare).groupby(level=0).sum()
Col matches
0 Halve the clementine and place into the cavity... 2.0
1 Add the stock, then bring to the boil and redu... 1.0
2 2 heaped teaspoons Chinese five-spice 0.0
3 100 ml Marsala 1.0
4 1 litre organic chicken stock 0.0
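A self-contained version of the steps above, as a sketch. It assumes the question's DataFrame has a text column named 'Col' holding the same five recipe lines used in the second answer, and it adds a reindex(..., fill_value=0) step (not part of the original answer) so that a row with no regex match at all gets 0 instead of NaN when the counts are assigned back to df.

```python
import pandas as pd

# Sample data: assumed to match the question's DataFrame (column name 'Col')
df = pd.DataFrame({'Col': [
    "Halve the clementine and place into the cavity along with the bay leaves. "
    "Transfer the duck to a medium roasting tray and roast for around 1 hour 20 minutes.",
    "Add the stock, then bring to the boil and reduce to a simmer for around 15 minutes.",
    "2 heaped teaspoons Chinese five-spice",
    "100 ml Marsala",
    "1 litre organic chicken stock",
]})
to_compare = ["1 hour", "20 litres", "100 ml", "2", "15 minutes", "20 minutes"]

# One row per match; the index is (original row, match number)
matches = df['Col'].str.extractall(r'(\d+\s\w+)')

# True/False per match, summed per original row (level 0 of the MultiIndex)
counts = matches[0].isin(to_compare).groupby(level=0).sum()

# Added step: rows with no regex match at all get 0 instead of NaN
df['matches'] = counts.reindex(df.index, fill_value=0)
print(df['matches'].tolist())  # [2, 1, 0, 1, 0]
```

Note that "2" in to_compare never matches, because every extracted item is a number followed by a word.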
For reference, matches contains:
                0
  match
0 0        1 hour
  1    20 minutes
1 0    15 minutes
2 0      2 heaped
3 0        100 ml
4 0       1 litre
To collect the matches for each row into a list, use:
matches.groupby(level=0).agg(list)
0
0 [1 hour, 20 minutes]
1 [15 minutes]
2 [2 heaped]
3 [100 ml]
4 [1 litre]
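If you only want the quantities that appear in to_compare rather than every match, the same groupby(level=0).agg(list) idea can be applied after filtering. A minimal sketch on a small hypothetical DataFrame; the reindex and the empty-list fallback are additions so that rows with no hit still get an entry.

```python
import pandas as pd

# Hypothetical sample; 'Col' and to_compare mirror the names used in this answer
df = pd.DataFrame({'Col': ["roast for 1 hour 20 minutes", "100 ml Marsala", "1 litre stock"]})
to_compare = ["1 hour", "100 ml"]

matches = df['Col'].str.extractall(r'(\d+\s\w+)')

# Keep only the matches that appear in to_compare, then gather them per original row
hits = matches[0][matches[0].isin(to_compare)]
per_row = hits.groupby(level=0).agg(list).reindex(df.index)

# Rows with no hit come back as NaN after reindex; turn those into empty lists
per_row = per_row.apply(lambda v: v if isinstance(v, list) else [])
print(per_row.tolist())  # [['1 hour'], ['100 ml'], []]
```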

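You can write a function that uses a regular expression to extract each number together with the word that follows it, and then apply that function to the whole DataFrame column: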
import pandas as pd
import re
df = pd.DataFrame({'text':["Halve the clementine and place into the cavity along with the bay leaves. Transfer the duck to a medium roasting tray and roast for around 1 hour 20 minutes.",
"Add the stock, then bring to the boil and reduce to a simmer for around 15 minutes.",
"2 heaped teaspoons Chinese five-spice",
"100 ml Marsala",
"1 litre organic chicken stock"]})
def extract_qty(txt):
    # every run of digits followed by a space and a word, e.g. "100 ml"
    return re.findall(r'\d+ \w+', txt)
df['extracted_qty'] = df['text'].apply(extract_qty)
df
# text extracted_qty
#0 Halve the clementine and place into the cavity... [1 hour, 20 minutes]
#1 Add the stock, then bring to the boil and redu... [15 minutes]
#2 2 heaped teaspoons Chinese five-spice [2 heaped]
#3 100 ml Marsala [100 ml]
#4 1 litre organic chicken stock [1 litre]
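pandas also provides Series.str.findall, which runs re.findall on every element of the column, so the helper function plus apply can be replaced by a single call. A sketch using shortened versions of three of the sample strings:

```python
import pandas as pd

df = pd.DataFrame({'text': [
    "roast for around 1 hour 20 minutes.",
    "2 heaped teaspoons Chinese five-spice",
    "100 ml Marsala",
]})

# Series.str.findall applies re.findall to each row and returns a list per row
df['extracted_qty'] = df['text'].str.findall(r'\d+ \w+')
print(df['extracted_qty'].tolist())
# [['1 hour', '20 minutes'], ['2 heaped'], ['100 ml']]
```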
Then use a list comprehension to keep the extracted values that also appear in to_compare:
to_compare= ["1 hour", "20 litres", "100 ml", "2", "15 minutes", "20 minutes"]
df['common'] = df['extracted_qty'].apply(lambda x: [el for el in x if el in to_compare])
# text extracted_qty common
#0 Halve the clementine ... [1 hour, 20 minutes] [1 hour, 20 minutes]
#1 Add the stock, then ... [15 minutes] [15 minutes]
#2 2 heaped teaspoons ... [2 heaped] []
#3 100 ml Marsala [100 ml] [100 ml]
#4 1 litre organic chicken... [1 litre] []
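A variation on the list comprehension above, as a sketch. Turning to_compare into a set (lookup is a name introduced here) makes each membership test constant time, and taking the length of each common list gives a per-row count comparable to the first answer's matches column. The extracted_qty lists are typed in directly here rather than extracted from text.

```python
import pandas as pd

to_compare = ["1 hour", "20 litres", "100 ml", "2", "15 minutes", "20 minutes"]
# A set gives constant-time membership checks; the list version gives the same result
lookup = set(to_compare)

df = pd.DataFrame({'extracted_qty': [["1 hour", "20 minutes"], ["2 heaped"], ["1 litre"]]})
df['common'] = df['extracted_qty'].apply(lambda qs: [q for q in qs if q in lookup])

# Number of common values per row
df['n_common'] = df['common'].apply(len)
print(df['common'].tolist(), df['n_common'].tolist())
# [['1 hour', '20 minutes'], [], []] [2, 0, 0]
```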