Python Regular Expressions - find all sentences that start with a hyphen and put them in a list

米琪卡哇伊 2023-03-01 16:48:59
I have a text file that I want to parse, putting the questions into a question list and the options into an options list. Sample text [updated to include every kind of variation in the question types and options]:

- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment\
- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM
A. Insulin “ ketonuria “\
B. pioglitazone
C. Thiazolidinediones
D. fourth i forgot (not Metformin nor sulfonylurea)
- Day to day variation of this not suitable for patients under warfarin therapy: IM
A. retinols
B. Fresh fruits and vegitables
C. Meet and paultry\
D. Old cheese

I'm new to Python, and especially to regular expressions. I'm trying to find a regex that matches a sentence starting with "-" where a following line starts with "A.", cuts the text just before "A.", and puts the question into a list. Note: some questions are two lines long. I also need a regex that extracts each set of options into its own list. So the end result would be:

question list = ['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM', '- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt:IM ', 'etc', 'and so on']
options list = [['A. observe', 'B. HBV DNA study\', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation\', 'B. Bicarb. Irrigation', 'C. Surgical debridment'], ['A. Something Else', 'B. Something Else', ......, 'D.  '], [etc]]

I expect this will be a bit complicated, but any help with the regex part, or even a starting point, would be great. I have a text file with 1000 questions and options like this, and ideally I'd like to extract all of them.

import re

with open("julysmalltext.txt") as file:
    content = file.read()
    question_list = re.findall(r'', content)
    options_list = re.findall(r'', content)

3 Answers

MMMHUHU


This will do it:


import re

with open("data.txt") as fp:
    question_list = list()
    options_list = list()

    for line in fp:
        question = re.match(r'-.*', line)
        if question:
            question_list.append(question.group(0))
        else:
            answer = re.match(r'[ABCD]\..*', line)
            if answer is None:
                # Blank line, or a line that is neither a question nor an option
                continue
            if answer.group(0)[0] == 'A':
                # "A." starts a new group of options
                options_list.append([answer.group(0)])
            else:
                # "B." to "D." belong to the most recent group
                options_list[-1].append(answer.group(0))

print(question_list)
print(options_list)

Output:


['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

[['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']]
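The loop above treats every question as a single line, so the second line of a two-line question (which the asker says can occur) is neither collected nor attached to its question. A minimal sketch of one way to handle that, assuming any line that is neither a "-" question nor an "A." to "E." option is a continuation of the previous question. `parse_lines` and `OPTION_RE` are hypothetical names, and the sample splits the "Old female" question across two lines for illustration only; where the real file breaks it is not known.

```python
import re

# Hypothetical pattern: an option line starts with a capital letter A to E and a dot
OPTION_RE = re.compile(r'[A-E]\.')


def parse_lines(lines):
    """Return (question_list, options_list) from an iterable of text lines."""
    question_list = []
    options_list = []
    for raw in lines:
        line = raw.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if line.startswith('-'):
            question_list.append(line)  # a new question
        elif OPTION_RE.match(line):
            if line.startswith('A.'):
                options_list.append([line])  # "A." opens a new group of options
            else:
                options_list[-1].append(line)  # B. to E. join the current group
        elif question_list:
            # Assumption: any other line continues the previous question
            question_list[-1] += ' ' + line.strip()
    return question_list, options_list


sample = """- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all
(random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM
A. Insulin
B. pioglitazone
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
"""

questions, options = parse_lines(sample.splitlines())
print(questions)
print(options)
```

With a real file, `parse_lines(fp)` works the same way, since iterating over a file yields lines with their trailing newline, which the function strips. A continuation line that itself happens to start with something like "A." would still be misread as an option.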

Another option, if you don't need the options grouped into nested lists:


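import re

with open("data.txt") as file:
    content = file.read()
    # (?m) makes ^ match at the start of every line, so a "-" or "A." in the
    # middle of a line is not picked up
    question_list = re.findall(r'(?m)^-.*', content)
    options_list = re.findall(r'(?m)^[ABCD]\..*', content)

print(question_list)
print(options_list)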

Output:


['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
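The flat list returned by `findall` loses the per-question grouping, but because every group starts with an "A." option, the nested structure the question asks for can be rebuilt afterwards. A minimal sketch, assuming each group of options starts with "A."; `group_options` is a hypothetical helper name.

```python
def group_options(flat_options):
    """Split a flat list of options into one sub-list per question.

    Assumes every group starts with an option beginning with "A.".
    """
    grouped = []
    for option in flat_options:
        if option.startswith('A.') or not grouped:
            grouped.append([option])  # start a new group
        else:
            grouped[-1].append(option)  # continue the current group
    return grouped


flat = ['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy',
        'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
print(group_options(flat))
```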


Answered 2023-03-01
斯蒂芬大帝


Try this example.


import json

# Raw string, so the trailing backslash in "A. H2o irrigation\" is kept as a
# literal character instead of acting as a line continuation
text = r'''
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation\
B. Bicarb. Irrigation
C. Surgical debridment
'''

questions = {}
letters = ['A', 'B', 'C', 'D', 'E']

# Split into lines and drop the empty ones
lines = [x for x in text.split('\n') if x]

question = ''
for line in lines:
    if line[0] == '-':
        # A question line: drop the leading "- " and start a new entry
        question = line[2:]
        questions[question] = {}
    elif line[0] in letters:
        # An option line, e.g. "B. HBV DNA study" -> key "B", value "HBV DNA study"
        letter, option = line.split('.', 1)
        questions[question][letter.strip()] = option.strip()

print(json.dumps(questions, indent=2, ensure_ascii=False))

The output is nicely structured:

{
  "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
    "A": "observe",
    "B": "HBV DNA study",
    "C": "Interferon",
    "D": "take liver biopsy"
  },
  "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
    "A": "H2o irrigation\\",
    "B": "Bicarb. Irrigation",
    "C": "Surgical debridment"
  }
}
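The question asks for two parallel lists rather than a dict. Since dicts keep insertion order (Python 3.7+), a dict of the shape built by this answer (question text mapped to {letter: option text}) can be turned into that form. A sketch, with a small hypothetical sample dict written out so the snippet runs on its own:

```python
# Hypothetical sample in the shape built above: question text -> {letter: option text}
questions = {
    "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
        "A": "observe",
        "B": "HBV DNA study",
        "C": "Interferon",
        "D": "take liver biopsy",
    },
    "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
        "A": "H2o irrigation",
        "B": "Bicarb. Irrigation",
        "C": "Surgical debridment",
    },
}

# Dicts keep insertion order, so both lists follow the order of the file.
question_list = ['- ' + q for q in questions]  # put back the "- " prefix dropped by line[2:]
options_list = [[f'{letter}. {text}' for letter, text in opts.items()]
                for opts in questions.values()]

print(question_list)
print(options_list)
```

One consequence of keying by question text: if the file contains the same question twice, the dict keeps only one entry for it.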


Answered 2023-03-01
HUWWW


Simple:


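import re

with open("julysmalltext.txt") as file:
    content = file.read()

# Raw strings, so sequences like \. and \n reach the regex engine unchanged
questions = re.findall(r'(?m)^-.*?(?=\nA\.)', content)
options = re.findall(r'(?m)^[A-E]\..*', content)

print(questions)
print(options)

Output:

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']
['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation\\', 'B. Bicarb. Irrigation', 'C. Surgical debridment']

Breaking it down:

'^-' (with the (?m) flag): the match must start with '-' at the beginning of a line.
'.*?': take everything that follows on that line, but non-greedily (as little as possible).
'(?=\nA\.)': a lookahead; the match must be followed by a newline and 'A.', which are not included in the result.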
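The lookahead pattern only works when a question fits on one line, because `.` does not match a newline. A sketch of a single-pass alternative that also allows multi-line questions and returns the options already nested, assuming options always start at the beginning of a line with a capital letter A to E and a dot, and that every question has at least one option. `BLOCK_RE` and `parse_questions` are hypothetical names; `re.MULTILINE`, `re.finditer`, named groups and `\Z` are standard `re` features. The sample breaks the "Old female" question across two lines for illustration only.

```python
import re

# A question starts with "-" at the beginning of a line and may span several
# lines; it ends right before the first following line that starts with a
# capital letter A to E and a dot. The options are that run of option lines.
BLOCK_RE = re.compile(
    r'^(?P<question>-(?:.*\n)+?)(?P<options>(?:[A-E]\..*(?:\n|\Z))+)',
    re.MULTILINE,
)


def parse_questions(content):
    """Return (question_list, options_list) with the options nested per question."""
    question_list, options_list = [], []
    for m in BLOCK_RE.finditer(content):
        # Join a multi-line question into a single line
        question = ' '.join(part.strip() for part in m.group('question').splitlines())
        options = [opt.strip() for opt in m.group('options').splitlines()]
        question_list.append(question)
        options_list.append(options)
    return question_list, options_list


sample = (
    "- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM\n"
    "A. observe\n"
    "B. HBV DNA study\n"
    "C. Interferon\n"
    "D. take liver biopsy\n"
    "- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all\n"
    "(random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM\n"
    "A. Insulin\n"
    "B. pioglitazone\n"
    "C. Thiazolidinediones"
)

qs, opts = parse_questions(sample)
print(qs)
print(opts)
```

For the real file, read it first (for example `content = file.read()` inside `with open("julysmalltext.txt", encoding="utf-8") as file:`) and pass `content` to `parse_questions`. A question with no options at all would be merged into the next one, so that case needs separate handling if it occurs.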


Answered 2023-03-01