What is the best way to remove accents in a Python Unicode string?

Python

九州編程 2019-06-06 14:56:32

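I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found on the Web an elegant way to do this in Java:

1. convert the Unicode string to its long normalized form (with a separate character for letters and diacritics);
2. remove all the characters whose Unicode type is "diacritic".

Do I need to install a library such as pyICU, or is this possible with just the Python standard library? And what about Python 3?

Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterparts.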
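For reference, the standard library's unicodedata module already provides both ingredients of that Java approach. The minimal sketch below (the helper decompose is hypothetical, for illustration only) shows what the "long normalized form" (NFD) does to a precomposed é: the letter and its accent become two separate code points, and the accent falls in general category Mn (nonspacing mark), which is the kind of character the answers below remove.

```python
import unicodedata

def decompose(text):
    """List (character, code point, general category) for each character
    of the NFD form of text, i.e. with letters and marks separated."""
    return [(c, f"U+{ord(c):04X}", unicodedata.category(c))
            for c in unicodedata.normalize("NFD", text)]

for char, code_point, category in decompose("\u00e9"):  # precomposed "é"
    print(code_point, category, unicodedata.name(char))
# U+0065 Ll LATIN SMALL LETTER E
# U+0301 Mn COMBINING ACUTE ACCENT
```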

3 answers

Helenr

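unidecode (a third-party package, installed with pip install Unidecode) is the correct answer for this. It transliterates any Unicode string into the closest possible representation in ASCII text.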

Example:

import unidecode
accented_string = u'Málaga'  # accented_string is of type 'unicode'
unaccented_string = unidecode.unidecode(accented_string)  # contains 'Malaga' and is of type 'str'
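Note that unidecode transliterates rather than only removing accents. A stdlib-only sketch (the helper strip_marks is hypothetical) shows the difference: letters such as Ø, ß and Ł have no Unicode decomposition, so dropping combining marks after NFD leaves them untouched and the result is not guaranteed to be ASCII.

```python
import unicodedata

def strip_marks(text):
    """Drop combining marks after NFD decomposition (stdlib only)."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))

# Letters whose accent decomposes lose the mark...
print(strip_marks("Málaga"))              # Malaga
# ...but letters with no decomposition pass through unchanged.
print(strip_marks("Ørsted Straße Łódź"))  # Ørsted Straße Łodz
```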

Answered 2019-06-06

米琪卡哇伊

How about this:

import unicodedata

def strip_accents(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

This also works for Greek letters:

>>> strip_accents(u"A \u00c0 \u0394 \u038E")
u'A A \u0394 \u03a5'

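The character category "Mn" stands for Nonspacing_Mark, which is similar to unicodedata.combining in MiniQuark's answer (I didn't think of unicodedata.combining, but it is probably the better solution, because it is more explicit).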
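The two tests are close but not identical. A brute-force sketch (the function name disagreements is hypothetical) scans every code point and lists those where category == "Mn" and a nonzero unicodedata.combining(c) give different answers; the exact count depends on the Unicode version bundled with your Python (see unicodedata.unidata_version). One case worth knowing is U+034F COMBINING GRAPHEME JOINER, which is Mn but has combining class 0, so strip_accents removes it while the combining() version keeps it.

```python
import sys
import unicodedata

def disagreements():
    """Return every code point where the two 'is this an accent?' tests
    used in this thread give different answers."""
    found = []
    for cp in range(sys.maxunicode + 1):
        c = chr(cp)
        is_mn = unicodedata.category(c) == "Mn"       # test used by strip_accents
        is_combining = unicodedata.combining(c) != 0  # test used by the combining() answer
        if is_mn != is_combining:
            found.append(c)
    return found

diff = disagreements()
print(len(diff), "code points disagree")

cgj = "\u034f"  # COMBINING GRAPHEME JOINER
print(unicodedata.category(cgj), unicodedata.combining(cgj))  # Mn 0
```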

And keep in mind that these manipulations may significantly alter the meaning of the text. Accents, umlauts, etc. are not "decoration".
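One practical way to check whether stripping loses information for your own data is to look for collisions, i.e. distinct words that become identical. A sketch under that assumption (the helper collisions is hypothetical; strip_accents repeats the answer's logic so the block runs on its own), using Spanish año/ano and German schön/schon, which are different words:

```python
import unicodedata
from collections import defaultdict

def strip_accents(s):
    # Same logic as the strip_accents answer: NFD, then drop category Mn
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.category(c) != "Mn")

def collisions(words):
    """Map each stripped form to the distinct words that produce it,
    keeping only forms produced by more than one word."""
    groups = defaultdict(set)
    for word in words:
        groups[strip_accents(word)].add(word)
    return {key: sorted(ws) for key, ws in groups.items() if len(ws) > 1}

print(collisions(["año", "ano", "schön", "schon", "Málaga"]))
# {'ano': ['ano', 'año'], 'schon': ['schon', 'schön']}
```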

Answered 2019-06-06

慕仙森

I just found this answer on the Web:

import unicodedata

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii.decode('ASCII')  # decode so Python 3 returns str, not bytes

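It works fine (for French, for example), but I think the second step (removing the accents) could be handled better than by dropping all non-ASCII characters, because this will fail for some languages (Greek, for example). The best solution would probably be to explicitly remove the Unicode characters that are tagged as being diacritics.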
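The Greek failure is easy to reproduce. In the sketch below (function names via_ascii and via_combining are hypothetical), both approaches agree on French, but the ASCII-filtering one deletes a Greek word entirely, while filtering only combining marks keeps the letters and removes just the accent:

```python
import unicodedata

def via_ascii(s):
    # Approach 1: NFKD, then throw away every non-ASCII character
    return unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")

def via_combining(s):
    # Approach 2: NFKD, then throw away only combining marks
    return "".join(c for c in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(c))

for word in ["café", "Αθήνα"]:  # French, then Greek ("Athens")
    print(repr(via_ascii(word)), repr(via_combining(word)))
# 'cafe' 'cafe'
# '' 'Αθηνα'
```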

Edit: this does the trick:

import unicodedata

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
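This answer normalizes with NFKD, whereas strip_accents uses NFD. The difference only shows up for compatibility characters: NFKD additionally folds ligatures and superscripts into their plain equivalents. A sketch for comparing the two (the form parameter is a hypothetical addition, not part of the answer):

```python
import unicodedata

def remove_accents(input_str, form="NFKD"):
    # form is a hypothetical parameter added only to compare NFD and NFKD
    decomposed = unicodedata.normalize(form, input_str)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

sample = "\ufb01anc\u00e9 x\u00b2"  # "ﬁ" ligature, precomposed "é", superscript two
print(remove_accents(sample, "NFD"))   # ﬁance x²
print(remove_accents(sample, "NFKD"))  # fiance x2
```

Which form is right depends on whether you want "ﬁ" and "²" preserved or flattened.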

unicodedata.combining(c) returns a nonzero (truthy) value if the character c can be combined with the preceding character, which mainly means it is a diacritic.
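The nonzero value is the character's canonical combining class, an int rather than a bool; the `not unicodedata.combining(c)` test relies on class 0 ("not combining") being falsy. A quick check on a decomposed é:

```python
import unicodedata

# Decompose a precomposed "é" (U+00E9) and inspect each piece.
for c in unicodedata.normalize("NFKD", "\u00e9"):
    print(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
# U+0065 LATIN SMALL LETTER E 0
# U+0301 COMBINING ACUTE ACCENT 230
```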

Edit 2: remove_accents expects a Unicode string, not a byte string. If you have a byte string, you must first decode it into a Unicode string, like this:

encoding = "utf-8"  # or iso-8859-15, or cp1252, or whatever encoding you use
byte_string = b"caf\xc3\xa9"  # "café" encoded as UTF-8; or simply "café" before Python 3
unicode_string = byte_string.decode(encoding)
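If inputs may arrive as either str or bytes, a small wrapper can do the decoding step before stripping. This is a sketch under the assumption that you know the input's encoding; the encoding parameter and its "utf-8" default are illustrative, not part of the answer, and decoding with the wrong encoding will silently produce the wrong characters.

```python
import unicodedata

def remove_accents(text, encoding="utf-8"):
    """Accept str or bytes; bytes are decoded first with the given
    encoding (an assumption the caller must get right)."""
    if isinstance(text, bytes):
        text = text.decode(encoding)
    nfkd_form = unicodedata.normalize("NFKD", text)
    return "".join(c for c in nfkd_form if not unicodedata.combining(c))

print(remove_accents(b"caf\xc3\xa9"))                     # cafe
print(remove_accents("café".encode("cp1252"), "cp1252"))  # cafe
```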

Answered 2019-06-06
