How to get, per group, the average of sentences in which the same word appears X or more times?

Here, however, I want the average for each group (group = name) of sentences in which the same word appears 4 or more times in a row, i.e. the fraction of the group's sentences that contain such a run. Example:

id | name | sentences
---------------------
1  | aa   | david hi david david david
2  | aa   | david david is at home
3  | bb   | I'm king
4  | cc   | where r u going
5  | dd   | lol lol lol lol lol lol
6  | ee   | abc abc cc abc abc abc abc cc
7  | ee   | dd dd dd ee dd dd dd

I want to get the following result:

name | avg
----------
aa   | 0.0   (0 sentences contain the word 'david' 4 times in a row; the 'aa' group has 2 rows in total)
bb   | 0.0   (0 sentences contain the same word 4 times in a row)
cc   | 0.0   (0 sentences contain the same word 4 times in a row)
dd   | 1.0   (1 sentence contains the word 'lol' 4 times in a row; the 'dd' group has 1 row in total)
ee   | 0.5   (1 sentence contains the word 'abc' 4 times in a row; the 'ee' group has 2 rows in total)

I'm using Python 3.6.8.
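The condition can also be checked without regular expressions: split each sentence on whitespace and let itertools.groupby collapse consecutive identical tokens into runs. This is a minimal plain-Python sketch, assuming words are whitespace-separated tokens (punctuation is not stripped); has_run and group_averages are hypothetical helper names, and the rows are the sample data above.

```python
from collections import defaultdict
from itertools import groupby


def has_run(sentence, k=4):
    """Return True if some whitespace-separated word appears at least k times in a row."""
    # groupby yields one (token, iterator) pair per run of consecutive equal tokens
    return any(sum(1 for _ in run) >= k for _, run in groupby(sentence.split()))


def group_averages(rows, k=4):
    """rows: iterable of (name, sentence); returns {name: fraction of qualifying sentences}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for name, sentence in rows:
        totals[name] += 1
        hits[name] += has_run(sentence, k)  # True counts as 1, False as 0
    return {name: hits[name] / totals[name] for name in totals}


rows = [
    ("aa", "david hi david david david"),
    ("aa", "david david is at home"),
    ("bb", "I'm king"),
    ("cc", "where r u going"),
    ("dd", "lol lol lol lol lol lol"),
    ("ee", "abc abc cc abc abc abc abc cc"),
    ("ee", "dd dd dd ee dd dd dd"),
]
print(group_averages(rows))
# {'aa': 0.0, 'bb': 0.0, 'cc': 0.0, 'dd': 1.0, 'ee': 0.5}
```

Dividing the number of qualifying sentences by the group size gives exactly the avg column requested, and each sentence counts at most once.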
1 Answer

汪汪一只貓
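You can use Series.str.count with the following pattern to count runs of the same word repeated 4 or more times in a row, then use Series.groupby to group the resulting series cnt by name and aggregate with mean to get the per-group average.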
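# count runs of one whole word repeated 4+ times in a row (\b stops "abc" matching inside "abcd")
cnt = df['sentences'].str.count(r'\b(\w+)(\s+\1\b){3,}')
# group the per-sentence counts by name and average them
avg = cnt.groupby(df['name']).mean().reset_index(name='avg')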
Details:
print(cnt)
0    0
1    0
2    0
3    0
4    1
5    1
6    0
Name: sentences, dtype: int64
print(avg)
  name  avg
0   aa  0.0
1   bb  0.0
2   cc  0.0
3   dd  1.0
4   ee  0.5
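One caveat: str.count counts every qualifying run, so a sentence such as "a a a a b b b b" contributes 2, and a group average could then exceed the "fraction of sentences" the question asks for. A self-contained sketch that caps each sentence at 1 by testing count > 0 before averaging (the DataFrame is rebuilt from the question's sample data; the \b word-boundary anchors are an assumption about what "same word" should mean):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["aa", "aa", "bb", "cc", "dd", "ee", "ee"],
    "sentences": [
        "david hi david david david",
        "david david is at home",
        "I'm king",
        "where r u going",
        "lol lol lol lol lol lol",
        "abc abc cc abc abc abc abc cc",
        "dd dd dd ee dd dd dd",
    ],
})

# One whole word followed by 3+ repeats of itself, i.e. 4+ in a row
pattern = r"\b(\w+)(\s+\1\b){3,}"

# 1 if the sentence has at least one qualifying run, else 0
has_run = df["sentences"].str.count(pattern).gt(0).astype(int)

# Fraction of each group's sentences that contain such a run
avg = has_run.groupby(df["name"]).mean().reset_index(name="avg")
print(avg)
```

Using str.count(...).gt(0) rather than str.contains avoids the pandas warning that str.contains emits for patterns with capture groups; the back-reference \1 needs one.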