Problem definition: Split each line into sentences. Assume the following characters delimit sentences: period ('.'), question mark ('?') and exclamation point ('!'). These delimiters should be omitted from the returned sentences. Remove any leading or trailing whitespace from each sentence. If, after the above, a sentence is blank (the empty string, ''), that sentence should be omitted. Return the list of sentences. The sentences must appear in the same order as in the file.

This is my current code:

    import re

    def get_sentences(doc):
        assert isinstance(doc, list)
        result = []
        for line in doc:
            result.extend(
                [sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence]
            )
        return result

    # Demo:
    get_sentences(demo_input)

Input:

    demo_input = [" This is a phrase; this, too, is a phrase. But this is another sentence.",
                  "Hark!",
                  " ",
                  "Come what may <-- save those spaces, but not these --> ",
                  "What did you say?Split into 3 (even without a space)? Okie dokie."]

Expected output:

    ["This is a phrase; this, too, is a phrase",
     "But this is another sentence",
     "Hark",
     "Come what may <-- save those spaces, but not these -->",
     "What did you say",
     "Split into 3 (even without a space)",
     "Okie dokie"]

However, my code produces this:

    ['This is a phrase; this, too, is a phrase', 'But this is another sentence', 'Hark', '', 'Come what may <-- save those spaces, but not these -->', 'What did you say', 'Split into 3 (even without a space)', 'Okie dokie']

Question: Why do I get that empty sentence '' in the result even though my code is supposed to filter it out?

I can fix the problem with the following code, but then I have to go through the list a second time, which I'd rather not do. I want to do it in the same pass.

    import re

    def get_sentences(doc):
        assert isinstance(doc, list)
        result = []
        for line in doc:
            result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line)])
        result = [s for s in result if s]
        return result

    # Demo:
    get_sentences(demo_input)
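To see where the empty string comes from, here is a minimal check of what happens to the third input line, which is a single space. The variable names `line` and `pieces` are just illustrative; the pattern is the one from the question.

```python
import re

# The third element of demo_input: one space, no '.', '?' or '!' in it.
line = " "

# With no delimiter present, re.split returns the whole line as a single piece.
pieces = re.split(r'\.|\?|\!', line)
print(pieces)                   # [' ']

# `if sentence` tests the piece *before* stripping: ' ' is non-empty, so it is truthy.
print(bool(pieces[0]))          # True

# strip() runs afterwards in the output expression, turning the kept piece into ''.
print(repr(pieces[0].strip()))  # ''
```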
1 Answer

HUX布斯
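Your filter `if sentence` is applied to each piece before it is stripped. The third input line, " ", contains no delimiter, so `re.split` returns it unchanged as ' '; a string containing a space is truthy, so it passes the filter and only becomes '' when `strip()` is applied in the output expression. Try testing the stripped value instead, i.e. `if sentence.strip()`:

    for line in doc:
        result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence.strip()])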
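A complete single-pass version of that fix, as a sketch. It assumes Python 3.8+ for the assignment expression (`:=`), which strips each piece only once instead of calling `strip()` twice as in the line above. It also swaps the alternation `\.|\?|\!` for the equivalent character class `[.?!]`, where the delimiters need no escaping.

```python
import re

def get_sentences(doc):
    assert isinstance(doc, list)
    result = []
    for line in doc:
        # Split on '.', '?' or '!', strip each piece once, and keep it only if
        # something remains after stripping (requires Python 3.8+ for ':=').
        result.extend([s for piece in re.split(r'[.?!]', line) if (s := piece.strip())])
    return result

demo_input = [" This is a phrase; this, too, is a phrase. But this is another sentence.",
              "Hark!",
              " ",
              "Come what may <-- save those spaces, but not these --> ",
              "What did you say?Split into 3 (even without a space)? Okie dokie."]

print(get_sentences(demo_input))
# ['This is a phrase; this, too, is a phrase', 'But this is another sentence', 'Hark',
#  'Come what may <-- save those spaces, but not these -->', 'What did you say',
#  'Split into 3 (even without a space)', 'Okie dokie']
```

On older Python versions, the `if sentence.strip()` condition shown above gives the same result in a single pass, at the cost of stripping each kept piece twice.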