
Counting hashtag frequency in a DataFrame

Python

SMILET 2023-04-25 16:41:28

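I am trying to count the frequency of hashtag words in the "text" column of my DataFrame.

index  text
1      ello ello ello ello #hello #ello
2      red green blue black #colours
3      Season greetings #hello #goodbye
4      morning #goodMorning #hello
5      my favourite animal #dog

word_freq = df.text.str.split(expand=True).stack().value_counts()

The code above counts the frequency of every string in the text column, but I only want the hashtag frequencies. For example, after running the code on the DataFrame above, it should return:

#hello          3
#goodbye        1
#goodMorning    1
#ello           1
#colours        1
#dog            1

Is there a way to tweak my word_freq code slightly so that it counts only the hashtag words and returns them as shown above? Thanks in advance.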


3 Answers

慕妹3146593

Contributed 1820 experience points, received 9+ likes

Use Series.str.findall on the text column to find all hashtag words, then Series.explode + Series.value_counts:

counts = df['text'].str.findall(r'(#\w+)').explode().value_counts()

Another idea, using Series.str.split + DataFrame.stack:

s = df['text'].str.split(expand=True).stack()
counts = s[lambda x: x.str.startswith('#')].value_counts()

Result:

print(counts)
#hello          3
#dog            1
#colours        1
#ello           1
#goodMorning    1
#goodbye        1
Name: text, dtype: int64
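To make the two approaches above easy to try, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample rows in the question, and the variable names `counts_findall`, `tokens`, and `counts_split` are only illustrative. The `na=False` argument is an added precaution so that any missing cells produced by `stack` do not break the boolean mask.

```python
import pandas as pd

# Sample data rebuilt from the question (index 1..5, one "text" column).
df = pd.DataFrame(
    {
        "text": [
            "ello ello ello ello #hello #ello",
            "red green blue black #colours",
            "Season greetings #hello #goodbye",
            "morning #goodMorning #hello",
            "my favourite animal #dog",
        ]
    },
    index=[1, 2, 3, 4, 5],
)

# Approach 1: pull every "#word" out of each row with a regex,
# flatten the per-row lists into one Series, then count.
counts_findall = df["text"].str.findall(r"#\w+").explode().value_counts()

# Approach 2: split on whitespace, stack the tokens into one Series,
# keep only tokens that start with "#", then count.
tokens = df["text"].str.split(expand=True).stack()
counts_split = tokens[tokens.str.startswith("#", na=False)].value_counts()

print(counts_findall)
print(counts_split)
```

On this sample both approaches give the same counts. They can differ on messier text: the regex version trims trailing punctuation (`#hello,` is counted as `#hello`) and also matches a `#` in the middle of a token, while the split version only keeps whole whitespace-separated tokens that begin with `#`.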

2023-04-25

aluckdog

Contributed 1847 experience points, received 7+ likes

One way is to use str.extractall, which drops the # from the result, followed by value_counts:

s = df['text'].str.extractall(r'(?<=#)(\w+)')[0].value_counts()

print(s)
hello          3
colours        1
goodbye        1
ello           1
goodMorning    1
dog            1
Name: 0, dtype: int64
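As a rough illustration of what str.extractall is doing here, the sketch below rebuilds the question's sample data and keeps the intermediate result in a variable. The names `matches`, `counts`, and `counts_lookbehind` are illustrative only. It also uses a plain capture group after a literal `#`, which on this data is assumed to behave the same as the lookbehind pattern in the answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "text": [
            "ello ello ello ello #hello #ello",
            "red green blue black #colours",
            "Season greetings #hello #goodbye",
            "morning #goodMorning #hello",
            "my favourite animal #dog",
        ]
    }
)

# extractall returns a DataFrame with one row per regex match and one
# column per capture group; column 0 holds the first (unnamed) group.
# Because "#" sits outside the group, it is not part of the captured text.
matches = df["text"].str.extractall(r"#(\w+)")
counts = matches[0].value_counts()

# The lookbehind form from the answer, for comparison.
counts_lookbehind = df["text"].str.extractall(r"(?<=#)(\w+)")[0].value_counts()

print(counts)
```

If the leading `#` should be kept in the labels, moving it inside the capture group, as in `r"(#\w+)"`, is one option.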

2023-04-25

守候你守候我

Contributed 1802 experience points, received 10+ likes

A slightly more verbose solution, but it does the job.

# data_100 and TicketDescription are this answer's own DataFrame and column
dictionary_count = data_100.TicketDescription.str.split(expand=True).stack().value_counts().to_dict()

# Example contents of dictionary_count
dictionary_count = {'accessgtgtjust': 1,
                    'sent': 1,
                    'investigate': 1,
                    'edit': 1,
                    '#prd': 1,
                    'getting': 1}

# Collect the keys that contain '#'
ert = [i for i in dictionary_count if '#' in i]
ert
Out[238]: ['#prd']

# Delete every key that is not a hashtag
unwanted = set(dictionary_count) - set(ert)
for unwanted_key in unwanted:
    del dictionary_count[unwanted_key]

dictionary_count
Out[241]: {'#prd': 1}
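The same filtering can be written more compactly with a dict comprehension. This is only a sketch: `descriptions` is a hypothetical stand-in for the answer's `data_100.TicketDescription` column, filled with the words from the example dictionary above, and it uses `startswith('#')` rather than `'#' in i` on the assumption that only words beginning with `#` should count as hashtags.

```python
import pandas as pd

# Hypothetical stand-in for data_100.TicketDescription.
descriptions = pd.Series(
    ["sent edit #prd", "getting accessgtgtjust investigate"]
)

# Count every whitespace-separated word, as in the answer above.
word_counts = descriptions.str.split(expand=True).stack().value_counts().to_dict()

# Build a new dict with only the hashtag words instead of deleting
# the unwanted keys one by one.
hashtag_counts = {word: n for word, n in word_counts.items() if word.startswith("#")}

print(hashtag_counts)
```

Building a new dictionary leaves `word_counts` untouched, which may be convenient if the full word frequencies are still needed afterwards.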

2023-04-25
