
Extracting strings after a given text using regular expressions in Python

Python

海綿寶寶撒 2023-06-13 15:37:07

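I have a document file with the following structure:

This is a fairy tale written by
John Doe and Mary Smith
Auckland,somewhere
This story is awesome

I want to extract these two lines of text:

John Doe and Mary Smith
Auckland,somewhere

and append the values to a list using a regular expression. The two lines I want always sit between the line This is a fairy tale written by and the line This story is awesome. How can I do this? I have tried a few combinations of before_keyword, keyword, after_keyword = text.partition(regex), but with no luck at all.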
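For reference, str.partition() splits on a literal separator string, not on a regular expression, which may be why passing a pattern to it did not work. A minimal sketch that calls partition() twice with the literal delimiter lines, assuming the file contents are already in a string named text (the sample below is hypothetical):

```python
# Hypothetical sample text mirroring the structure in the question
text = """This is a fairy tale written by
John Doe and Mary Smith
Auckland,somewhere
This story is awesome
"""

# partition() treats its argument as a plain string, not a regex;
# if the separator is missing, the middle element is an empty string
_, found_start, rest = text.partition("This is a fairy tale written by")
between, found_end, _ = rest.partition("This story is awesome")

lines = []
if found_start and found_end:
    # Keep only non-blank lines, stripped of surrounding whitespace
    lines = [line.strip() for line in between.splitlines() if line.strip()]

print(lines)  # ['John Doe and Mary Smith', 'Auckland,somewhere']
```

This only finds the first block; for repeated blocks a regex with re.findall() is more convenient.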


4 Answers

慕斯709654


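You can use a regular expression with re.DOTALL so that . matches any character, including newlines. Once you have the text between the two delimiters, you can use a second regular expression, this time without re.DOTALL, to extract the lines that contain at least one non-whitespace character (\S).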

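import re

lst = []

with open('input.txt') as f:
    text = f.read()

# re.DOTALL lets .*? span newlines between the two delimiters
match = re.search(r'This is a fairy tale written by(.*?)This story is awesome',
                  text, re.DOTALL)
if match:
    # Without DOTALL, '.' stops at newlines, so each match is one non-blank line
    lst.extend(re.findall(r'.*\S.*', match.group(1)))

print(lst)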

Output:

[' John Doe and Mary Smith', ' Auckland,somewhere']
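re.search() stops at the first match. If the file may contain several such blocks, the same two-pass idea works with re.findall(). A sketch under that assumption; the sample text and the helper name extract_blocks are hypothetical:

```python
import re

# Hypothetical sample containing two delimited blocks
text = """This is a fairy tale written by
John Doe and Mary Smith
Auckland,somewhere
This story is awesome
Some other things
This is a fairy tale written by
Kem Cho?
Majama?
This story is awesome
"""

def extract_blocks(text):
    """Return one list of non-blank lines per delimited block."""
    # re.DOTALL lets .*? span newlines; the lazy quantifier stops at the nearest end marker
    blocks = re.findall(r'This is a fairy tale written by(.*?)This story is awesome',
                        text, re.DOTALL)
    # Second pass without DOTALL: '.' stops at newlines, so each match is one line
    return [[line.strip() for line in re.findall(r'.*\S.*', block)] for block in blocks]

print(extract_blocks(text))
# [['John Doe and Mary Smith', 'Auckland,somewhere'], ['Kem Cho?', 'Majama?']]
```

Because the pattern has exactly one capturing group, re.findall() returns the group contents rather than the whole match.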

2023-06-13

炎炎設計


You can start with this:

re.search(r'(?<=This is a fairy tale written by\n).*?(?=\n\s*This story is awesome)', s, re.MULTILINE|re.DOTALL).group(0)

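and fine-tune the regex from there. re.MULTILINE can probably be omitted, since you have no ^ or $ anyway, but re.DOTALL is needed so that . matches newlines. The regex above uses a lookbehind (?<=...) and a lookahead (?=...). If you don't like that, you can use parentheses to capture the text instead.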
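A sketch of the capturing-group alternative mentioned above, assuming the delimiters sit on their own lines; the sample string s is hypothetical:

```python
import re

# Hypothetical sample text; s mirrors the variable name used in the answer above
s = """This is a fairy tale written by
John Doe and Mary Smith
Auckland,somewhere
This story is awesome
"""

# A capturing group replaces the lookbehind/lookahead; only DOTALL is needed
m = re.search(r'This is a fairy tale written by\n(.*?)\n\s*This story is awesome', s, re.DOTALL)

# group(1) holds only the text between the two delimiter lines
lines = m.group(1).splitlines() if m else []
print(lines)  # ['John Doe and Mary Smith', 'Auckland,somewhere']
```

A capturing group also sidesteps the rule that Python lookbehind patterns must be fixed-width, so variable-width pieces such as \s* could follow the opening delimiter if needed.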

2023-06-13

函數式編程


If you can build a list of strings from the document file, you don't need regular expressions at all. Just run this simple program:

fileContent = ['This is a fairy tale written by', 'John Doe and Mary Smith', 'Auckland,somewhere', 'This story is awesome',
               'Some other things', 'story texts', 'Not Important data',
               'This is a fairy tale written by', 'Kem Cho?', 'Majama?', 'This story is awesome', 'Not important data']

authorsList = []

for i in range(len(fileContent) - 3):
    # The start and end markers must be exactly three lines apart
    if fileContent[i] == 'This is a fairy tale written by' and fileContent[i + 3] == 'This story is awesome':
        authorsList.append([fileContent[i + 1], fileContent[i + 2]])

print(authorsList)

Here I simply look for 'This is a fairy tale written by' followed three lines later by 'This story is awesome'; if both are found, the two lines between them are added to the list.

Output:

[['John Doe and Mary Smith', 'Auckland,somewhere'], ['Kem Cho?', 'Majama?']]
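The answer assumes fileContent already exists. One way to build it, assuming one entry per line in the file, is to iterate over the file object and strip the trailing newlines. In this sketch io.StringIO stands in for open('input.txt') so it runs on its own; the sample content is hypothetical:

```python
import io

START = 'This is a fairy tale written by'
END = 'This story is awesome'

# Hypothetical file contents; replace io.StringIO(...) with open('input.txt') for a real file
sample = f"{START}\nJohn Doe and Mary Smith\nAuckland,somewhere\n{END}\nNot important data\n"

with io.StringIO(sample) as f:
    # One list entry per line, with the trailing newline and spaces removed
    fileContent = [line.rstrip() for line in f]

authorsList = []
for i in range(len(fileContent) - 3):
    # Same fixed-offset check as above: the markers must be exactly three lines apart
    if fileContent[i] == START and fileContent[i + 3] == END:
        authorsList.append([fileContent[i + 1], fileContent[i + 2]])

print(authorsList)  # [['John Doe and Mary Smith', 'Auckland,somewhere']]
```

The fixed offset of three lines is the main limitation of this approach; the regex answers handle any number of lines between the markers.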

2023-06-13

繁星淼淼


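Try this instead. It should match anything between these two strings; re.DOTALL is needed when the text spans several lines, because . does not match newlines by default.

re.search(r'(?<=This is a fairy tale).*?(?=This story is awesome)', text, re.DOTALL)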
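A quick check of how this lookaround pattern behaves on multi-line input, with and without re.DOTALL. The sample text is hypothetical:

```python
import re

# Hypothetical multi-line sample
text = "This is a fairy tale written by\nJohn Doe and Mary Smith\nAuckland,somewhere\nThis story is awesome\n"
pattern = r'(?<=This is a fairy tale).*?(?=This story is awesome)'

# Without DOTALL, '.' cannot cross the newlines, so nothing matches
without_dotall = re.search(pattern, text)
# With DOTALL the lazy .*? spans lines; the match still begins with ' written by'
with_dotall = re.search(pattern, text, re.DOTALL)

print(without_dotall)              # None
print(repr(with_dotall.group(0)))  # ' written by\nJohn Doe and Mary Smith\nAuckland,somewhere\n'
```

Because the lookbehind covers only "This is a fairy tale", the rest of the first line is part of the match; extending the lookbehind to the full delimiter line removes it.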

2023-06-13
