For example, when I use

    unicode_string = u"Austro\u002dHungarian_gulden"
    unicode_string.encode("ascii", "ignore")

it gives the following output:

    'Austro-Hungarian_gulden'

However, I am working with a txt file that contains a set of data like this:

    Austria\u002dHungary Austro\u002dHungarian_gulden
    Cocos_\u0028Keeling\u0029_Islands Australian_dollar
    El_Salvador Col\u00f3n_\u0028currency\u0029
    Faroe_Islands Faroese_kr\u00f3na
    Georgia_\u0028country\u0029 Georgian_lari

I have to process this data with regular expressions in Python, so I wrote the script below, but it fails to replace the Unicode escape values in the strings with the appropriate characters. For instance:

    '\u002d' has appropriate character '-'
    '\u0028' has appropriate character '('
    '\u0029' has appropriate character ')'

The script used to process the text file:

    import re
    import collections

    def extract():
        filename = raw_input("Enter file Name:")
        in_file = file(filename,"r")
        out_file = file("Attribute.txt","w+")
        for line in in_file:
            values = line.split("\t")
            if values[1]:
                str1 = ""
                for list in values[1]:
                    list = re.sub("[^\Da-z0-9A-Z()]","",list)
                    list = list.replace('_',' ')
                    out_file.write(list)
                    str1 += list
            out_file.write(" ")
            if values[2]:
                str2 = ""
                for list in values[2]:
                    list = re.sub("[^\Da-z0-9A-Z\n]"," ",list)
                    list = list.replace('"','')
                    list = list.replace('_',' ')
                    out_file.write(list)
                    str2 += list
            s1 = str1.lstrip()
            s1 = str1.rstrip()
            s2 = str2.lstrip()
            s2 = str2.rstrip()
            print s1+s2

The expected output for the given data is:

    Austria-Hungary Austro-Hungarian gulden
    Cocos (Keeling) Islands Australian dollar
    El Salvador Coln (currency)
    FaroeIslands Faroese krna
    Georgia (country) Georgian lari

How can I do this?
1 Answer

有只小跳蛙
Use decode("unicode_escape") to convert the input to Unicode, then encode() to convert the output to the encoding of your choice.
>>> r"Austro\u002dHungarian_gulden".decode("unicode_escape")
u'Austro-Hungarian_gulden'
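Applied to the lines of the data file, the same idea might look like the sketch below. It is written for Python 3, where str has no decode() method, so the text is first encoded to bytes; that port is an assumption, as the question and the session above use Python 2. The helper names clean and convert_line are hypothetical, and the sketch assumes the file is ASCII with tab-separated columns, as the question's script suggests.

```python
def clean(field):
    """Decode literal \\uXXXX escapes, then normalise the text."""
    # Assumption: the field is pure ASCII containing backslash escapes.
    decoded = field.encode("ascii").decode("unicode_escape")
    decoded = decoded.replace("_", " ")
    # Drop non-ASCII characters, matching the question's expected output
    # (for example "Colón" becomes "Coln").
    return decoded.encode("ascii", "ignore").decode("ascii")


def convert_line(line):
    """Clean every tab-separated column and join the results with spaces."""
    columns = line.rstrip("\n").split("\t")
    return " ".join(clean(col) for col in columns)


# Sample rows taken from the question, with tabs between the columns.
sample = [
    "Austria\\u002dHungary\tAustro\\u002dHungarian_gulden",
    "El_Salvador\tCol\\u00f3n_\\u0028currency\\u0029",
]
for row in sample:
    print(convert_line(row))
```

If the accented characters should be kept rather than dropped, the final encode("ascii", "ignore") step can be replaced with whatever output encoding is needed, for example by writing the decoded text to a file opened with encoding="utf-8".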