
Python Regular Expressions - Find all sentences that start with a hyphen and put them in a list

Python

米琪卡哇伊 2023-03-01 16:48:59

I have a text file that I want to parse, putting the questions and the options into a question list and an options list.

Sample text: [sample text updated to include all the variations in question types and options]

- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment\
- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM
A. Insulin “ ketonuria “\
B. pioglitazone
C. Thiazolidinediones
D. fourth i forgot (not Metformin nor sulfonylurea)
- Day to day variation of this not suitable for patients under warfarin therapy: IM
A. retinols
B. Fresh fruits and vegitables
C. Meet and paultry\
D. Old cheese

I am new to Python, and especially to regular expressions. I am trying to find a regular expression that finds the sentences starting with "-" and, where the following line starts with "A.", slices the text just before the "A." and puts the question in a list. Note: some questions are two lines long. I also need a regular expression that extracts each set of options into a list. So the final result would be:

question list = ['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM','- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt:IM ','etc','and so on']

options list = [['A. observe','B. HBV DNA study\','C. Interferon','D. take liver biopsy'],['A. H2o irrigation\','B. Bicarb. Irrigation','C. Surgical debridment',[['A. Something Else','B. Something Else',......,'D. ']],[etc]]

I am guessing this will be a bit complicated, but any help with the regular expression part, or even a starting point, would be great. I have a text file with 1000 questions and options like these, and ideally I want to extract all of them.

import re

with open("julysmalltext.txt") as file:
    content = file.read()
    question_list = re.findall(r'', content)
    options_list = re.findall(r'', content)

3 Answers

MMMHUHU

Contributed 1834 experience points, received 8+ likes

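This will do it:

import re

question_list = []
options_list = []

with open("data.txt") as fp:
    for line in fp:
        # A question line starts with a hyphen.
        question = re.match(r'-.*', line)
        if question:
            question_list.append(question.group(0))
            continue
        # An option line starts with a letter A-D followed by a period.
        answer = re.match(r'[ABCD]\..*', line)
        if answer is None:
            continue  # blank or unrecognised line: skip it instead of crashing
        if answer.group(0)[0] == 'A':
            # 'A' opens a new group of options.
            options_list.append([answer.group(0)])
        elif options_list:
            options_list[-1].append(answer.group(0))

print(question_list)
print(options_list)

Output: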

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

[['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']]
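The question mentions two variations that a line-by-line match on single lines does not cover: questions that wrap onto a second line, and lines that end with a stray backslash. The following is only a minimal sketch of one way to handle both; the `SAMPLE` text, the `parse_questions` helper and the choice to join wrapped lines with a space are assumptions made for illustration, not part of the original answer.

```python
import re

# Hypothetical sample: the first question is wrapped onto two lines and
# some lines end with a stray backslash, as described in the question.
SAMPLE = r"""- 26 yrs Man Hbsag +ve ,hbeag +ve on routine
screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment\
"""

# Assumed option marker: a capital letter A-E followed by a period.
OPTION_RE = re.compile(r'[A-E]\.')


def parse_questions(text):
    """Return (question_list, options_list), one options sub-list per question."""
    question_list, options_list = [], []
    for raw in text.splitlines():
        # Strip whitespace and any trailing backslash left in the source file.
        line = raw.rstrip().rstrip('\\').rstrip()
        if not line:
            continue
        if line.startswith('-'):
            question_list.append(line)
            options_list.append([])
        elif OPTION_RE.match(line) and options_list:
            options_list[-1].append(line)
        elif question_list and not options_list[-1]:
            # No option seen yet, so treat this as the wrapped part of the question.
            question_list[-1] += ' ' + line
        # Any other line (text after the options started) is ignored here.
    return question_list, options_list


questions, options = parse_questions(SAMPLE)
print(questions)
print(options)
```

Because every question appends its own empty sub-list, `questions[i]` and `options[i]` stay aligned even when a question has no options at all.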

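Another option, if you don't need the options nested per question:

import re

with open("data.txt") as file:
    content = file.read()

# '^' with re.MULTILINE anchors each pattern to the start of a line, so a hyphen
# or a "B." in the middle of a line is not picked up as a new match.
question_list = re.findall(r'^-.*', content, re.MULTILINE)
options_list = re.findall(r'^[ABCD]\..*', content, re.MULTILINE)

print(question_list)
print(options_list)

Output: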

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
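One thing worth checking when running `re.findall` over the whole file is whether the patterns are anchored to the start of a line. The short comparison below is a sketch on a made-up text (the `content` string is hypothetical, chosen so that a hyphen and a "B." appear in the middle of a line); it shows how the unanchored patterns pick up extra fragments.

```python
import re

# Hypothetical text: "Hep B." sits mid-line in the question and
# "Anti-HBs" contains a hyphen inside an option.
content = "- Non-smoker with Hep B. what is next\nA. Anti-HBs titre\nB. observe\n"

# Unanchored: a match may begin anywhere in a line.
unanchored_q = re.findall(r'-.*', content)
unanchored_o = re.findall(r'[ABCD]\..*', content)

# Anchored: with re.MULTILINE, '^' matches at the start of every line.
anchored_q = re.findall(r'^-.*', content, re.MULTILINE)
anchored_o = re.findall(r'^[ABCD]\..*', content, re.MULTILINE)

print(unanchored_q)  # ['- Non-smoker with Hep B. what is next', '-HBs titre']
print(unanchored_o)  # ['B. what is next', 'A. Anti-HBs titre', 'B. observe']
print(anchored_q)    # ['- Non-smoker with Hep B. what is next']
print(anchored_o)    # ['A. Anti-HBs titre', 'B. observe']
```

With 1000 free-text questions, fragments like these are likely to occur somewhere, so the anchored form is the safer default.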

Answered 2023-03-01

斯蒂芬大帝

Contributed 1827 experience points, received 8+ likes

Try this example.

import json

# Raw string: the trailing backslash after "irrigation" is kept literally
# instead of acting as a line continuation that glues two options together.
text = r'''
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation\
B. Bicarb. Irrigation
C. Surgical debridment
'''

questions = {}
letters = ['A', 'B', 'C', 'D', 'E']

# Strip whitespace and stray trailing backslashes, then drop empty lines.
lines = [x.rstrip('\\').strip() for x in text.split('\n')]
lines = [x for x in lines if x]

question = ''
for line in lines:
    if line[0] == '-':
        # Question line: drop the leading "- " and start a new entry.
        question = line[2:]
        questions[question] = {}
    elif line[0] in letters and line[1:2] == '.':
        # Option line: split "A. text" into the letter and the option text.
        letter, option = (part.strip() for part in line.split('.', 1))
        questions[question][letter] = option

print(json.dumps(questions, indent=2, ensure_ascii=False))

The output will be nicely organized:

{
  "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
    "A": "observe",
    "B": "HBV DNA study",
    "C": "Interferon",
    "D": "take liver biopsy"
  },
  "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
    "A": "H2o irrigation",
    "B": "Bicarb. Irrigation",
    "C": "Surgical debridment"
  }
}
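If the two lists from the question are still needed, a dictionary of this shape can be converted back into them. This is a small sketch under stated assumptions: the `questions` literal below is a hand-written stand-in for the dictionary printed above, and re-adding the `"- "` prefix and the `"A. "` style labels is an illustrative choice.

```python
# Hypothetical stand-in for the dictionary built by the answer above.
questions = {
    "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
        "A": "H2o irrigation",
        "B": "Bicarb. Irrigation",
        "C": "Surgical debridment",
    },
    "Day to day variation of this not suitable for patients under warfarin therapy: IM": {
        "A": "retinols",
        "D": "Old cheese",
    },
}

# Dictionaries keep insertion order (Python 3.7+), so both lists stay aligned.
question_list = ['- ' + q for q in questions]
options_list = [
    [f'{letter}. {text}' for letter, text in opts.items()]
    for opts in questions.values()
]

print(question_list)
print(options_list)
```

One caveat of the dictionary approach: two questions with identical text share one key, so the later one overwrites the earlier one. With 1000 questions that may be worth checking before relying on it.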

Answered 2023-03-01

HUWWW

Contributed 1874 experience points, received 12+ likes

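Simple:

import re

with open("julysmalltext.txt") as file:
    content = file.read()

# A question starts with '-' at the beginning of a line and runs, possibly over
# more than one line (re.DOTALL), up to the line break before 'A.'.
questions = re.findall(r'^-.*?(?=\nA\.)', content, re.MULTILINE | re.DOTALL)
# An option is a whole line that starts with a capital letter A-E and a period;
# '$' also matches at the end of the file, so the last option is not lost.
options = re.findall(r'^[A-E]\..*$', content, re.MULTILINE)

print(questions)
print(options)

Output:

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation\\', 'B. Bicarb. Irrigation', 'C. Surgical debridment']

Breaking it down:

This part: '^-' (with re.MULTILINE) means the match must start with '-' at the beginning of a line.

This part: '.*?' means capture everything in between, but non-greedily; with re.DOTALL it can also cross a line break, which covers two-line questions.

This part: '(?=\nA\.)' is a lookahead: the match must be followed by a line break and then 'A.', and neither of them is included in the result.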
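To see what the non-greedy match and the lookahead do with a question that is two lines long, here is a minimal sketch on a made-up string (the `text` value is hypothetical). It runs the same pattern with and without `re.DOTALL`, the flag that lets `.` match a line break.

```python
import re

# Hypothetical text: the first question wraps onto a second line.
text = (
    "- first line of a question\n"
    "second line ttt:\n"
    "A. one\n"
    "B. two\n"
    "- short question\n"
    "A. three\n"
)

pattern = r'-.*?(?=\nA)'

# Without re.DOTALL, '.' stops at the line break, so the wrapped question
# never reaches a position that is followed by "\nA" and is skipped.
single_line = re.findall(pattern, text)

# With re.DOTALL, '.*?' may cross the line break and stops at the first
# position followed by "\nA"; the lookahead itself is not captured.
multi_line = re.findall(pattern, text, re.DOTALL)

print(single_line)  # ['- short question']
print(multi_line)   # ['- first line of a question\nsecond line ttt:', '- short question']
```

The captured two-line question still contains the line break; replacing `'\n'` with a space afterwards would give a single-line string if that is preferred.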

Answered 2023-03-01
