
How do I format tweets collected through the Twitter API using Python?


Python

幕布斯6054654 2021-03-19 14:15:10

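I collected some tweets through the Twitter API, then counted the words by splitting with split(' ') in Python. However, some of the words come out like this: correct! correct.,correctblah"... So how can I format the tweets so that they carry no punctuation? Or should I try a different way to split the tweets? Thanks.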


3 Answers

斯蒂芬大帝

Contributed 1827 experience points, received over 8 likes

You can use re.split to split on punctuation and whitespace in one pass:

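import re
from string import punctuation

# Match any ASCII punctuation mark or whitespace character
puncrx = re.compile(r'[{}\s]'.format(re.escape(punctuation)))

# Split on those characters and drop the empty strings left between delimiters
print(list(filter(None, puncrx.split(your_tweet))))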
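As a rough illustration of the re.split approach, the sketch below runs it on a short made-up string. The sample text and the helper name split_words are assumptions for demonstration only, not part of the original answer.

```python
import re
from string import punctuation

# Character class matching any ASCII punctuation mark or whitespace character.
puncrx = re.compile(r'[{}\s]'.format(re.escape(punctuation)))

def split_words(text):
    """Split text on punctuation/whitespace and drop the empty strings."""
    return [piece for piece in puncrx.split(text) if piece]

# Hypothetical sample text, loosely based on the tokens in the question.
sample = 'correct! correct., correct "blah"'
words = split_words(sample)
print(words)
```

Note that # and @ are part of string.punctuation, so this variant also strips the leading character from hashtags and mentions.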

Alternatively, look only for words made up of runs of certain characters:

print(re.findall(r'[\w#@]+', your_tweet))

For example:

print(re.findall(r'[\w@#]+', 'talking about #python with @someone is so much fun! Is there a 140 char limit? So not cool!'))

# ['talking', 'about', '#python', 'with', '@someone', 'is', 'so', 'much', 'fun', 'Is', 'there', 'a', '140', 'char', 'limit', 'So', 'not', 'cool']

I originally had a smiley in the example, but of course those end up being filtered out by this approach, so be wary of that.
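Since the question is about counting words, here is a minimal sketch that feeds the findall tokens into collections.Counter. The sample text, the helper name count_words, and the choice to lowercase tokens are assumptions for illustration. It also shows the caveat above: the ":)" emoticon contains no matching characters and is dropped.

```python
import re
from collections import Counter

# Runs of word characters, '@' and '#', so hashtags and mentions stay intact.
TOKEN_RX = re.compile(r'[\w@#]+')

def count_words(text):
    """Count tokens case-insensitively (lowercasing is an assumption here)."""
    return Counter(token.lower() for token in TOKEN_RX.findall(text))

# Hypothetical sample text containing an emoticon.
sample = 'So much fun! :) So not cool, so not #python...'
counts = count_words(sample)
print(counts.most_common(3))
```

If emoticons matter for the analysis, they would need to be extracted separately before this step.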

2021-03-23

江戶川亂折騰

Contributed 1851 experience points, received over 5 likes

I suggest using the following code to strip special symbols from the text:

tweet_object["text"] = re.sub(u'[!?@#$.,#:\u2026]', '', tweet_object["text"])

You need to import re before you can use the sub function:

import re
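A small sketch of how the re.sub approach might be combined with splitting, under the assumption that tweet_object is a plain dict with a "text" key. The sample tweet is made up for illustration.

```python
import re

# Hypothetical tweet dictionary; only the "text" key is assumed here.
tweet_object = {"text": "correct! correct., #python is fun\u2026"}

# Remove the listed symbols, including the Unicode ellipsis (\u2026).
cleaned = re.sub('[!?@#$.,:\u2026]', '', tweet_object["text"])

# split() with no argument splits on any run of whitespace, so it does not
# produce the empty strings that split(' ') yields for consecutive spaces.
words = cleaned.split()
print(words)
```

Only the characters listed in the class are removed, so other punctuation such as quotation marks would remain and would need to be added to the pattern.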

2021-03-23
