
How to use a regex to parse substrings from text when some sub-parts are missing


蕭十郎 2022-07-12 09:36:11
I want to parse substrings that follow a specific format out of a string.

Survey responses have the format <survey_name>_<category_name>_<question_type>.<response_type>

For example, the input strings:

    y_survey_category1_1st.no
    x_survey_category2_2nd
    survey_z_category_3_3rd.yes_more_7
    x_survey_category_4_4th.excluded
    survey_z_category5.yes_more_7
    survey_z_category_6.yes_more_7

The pattern further down is what I have so far. It works for most cases, but fails when the optional question_type is absent (e.g. inputs 5 and 6 above).

These are the constraints on each part:

1. survey_name can only be one of the 3 values
2. category_name will always be present and can have underscores
3. question_type may be present and may have underscore in it
4. response_type may be present and may have underscore in it
5. Either question_type or response_type or both will always be present

    (x_survey|y_survey|survey_z)_([\w_]+)_(1st|2nd|3rd|4th)[.]?(.*)

https://regex101.com/r/bGc0gM/1

Any help on how to modify the regex so that it works in all cases?
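As an illustration that is not part of the original post, the following minimal sketch runs the pattern from the question over the six sample strings to show which ones it misses. The names `QUESTION_PATTERN`, `SAMPLES` and `unmatched` are made up for this example.

```python
import re

# The pattern from the question: question_type is mandatory here.
QUESTION_PATTERN = re.compile(
    r"(x_survey|y_survey|survey_z)_([\w_]+)_(1st|2nd|3rd|4th)[.]?(.*)"
)

SAMPLES = [
    "y_survey_category1_1st.no",
    "x_survey_category2_2nd",
    "survey_z_category_3_3rd.yes_more_7",
    "x_survey_category_4_4th.excluded",
    "survey_z_category5.yes_more_7",   # no question_type
    "survey_z_category_6.yes_more_7",  # no question_type
]

# Collect the inputs the pattern cannot match at all.
unmatched = [s for s in SAMPLES if QUESTION_PATTERN.search(s) is None]

for s in SAMPLES:
    m = QUESTION_PATTERN.search(s)
    print(s, "->", m.groups() if m else None)
```

Only the last two samples end up in `unmatched`, because the pattern requires a literal `_1st`, `_2nd`, `_3rd` or `_4th` before the optional dot.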

2 answers

GCT1015


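It is hard to find a regex that handles question_type when it is optional, so I forced the category to consist of letters/underscores and end with a digit.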

category : [a-z_]+\d
all : (x_survey|y_survey|survey_z)_([a-z_]+\d)(?>_(1st|2nd|3rd|4th))?(?>\.(.*))?

Regex demo
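A minimal sketch of how this pattern might be used from Python. One assumption is labeled in the code: the atomic groups `(?>...)` are swapped for plain non-capturing groups `(?:...)`, because Python's `re` module only accepts atomic groups from version 3.11 onward. The helper name `parse` is hypothetical.

```python
import re

# Assumption: atomic groups (?>...) replaced by non-capturing groups (?:...)
# so the pattern also compiles on Python versions before 3.11.
SURVEY_RE = re.compile(
    r"(x_survey|y_survey|survey_z)_([a-z_]+\d)(?:_(1st|2nd|3rd|4th))?(?:\.(.*))?"
)


def parse(line):
    """Return (survey, category, question_type, response_type) or None."""
    m = SURVEY_RE.fullmatch(line)
    return m.groups() if m else None


print(parse("survey_z_category_3_3rd.yes_more_7"))
print(parse("survey_z_category5.yes_more_7"))
```

Note that this approach only works while every category name ends in a digit; a line such as `survey_z_category` yields `None`.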


Answered 2022-07-12
慕森卡


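The short version, with a piece of code, shows a regex that works. The pattern has extra whitespace added, so you need to set the re.X flag. Also set re.I to ignore case.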


    # Capture:
    # <survey_name>_<category_name>_<question_type>.<response_type>
    #     (0)           (1)            (2) or (4)     (3) or (6)
    # The numbers are 0-based indexes into match.groups(); slot (5)
    # belongs to the lookbehind and is not used.
    pat = r"""^(x_survey|y_survey|survey_z)    # <sn>  (0)
               _                               # _
               ([^.]+)                         # <cn> (1)
       (?:                                     # One of
            _(1st|2nd|3rd|4th)  [.]([\w]+)$ |  # qt (2) & rt (3)
            _(1st|2nd|3rd|4th)            $ |  # qt (4)
        (?<!_(1st|2nd|3rd|4th)) [.]([\w]+)$    # rt (6)
       )
           """
    matcher = re.compile(pat, re.I | re.X)


The longer version, with two variants of the solution and test cases:


"""

Format: <survey_name>_<category_name>_<question_type>.<response_type>

 1. survey_name can only be one of the 3 values
 2. category_name will always be present and can have underscores
 3. question_type may be present and may have underscore in it
 4. response_type may be present and may have underscore in it
 5. Either question_type or response_type or both will always be present

A) <survey_name> always there
    easy to find, one of three: (x_survey|y_survey|survey_z)
B) <category_name> always there
    has 0 or more internal underscores
C) <question_type> optional
    zero or more internal underscores
    ends before a dot or at end of line
    one of 4 values: (1st|2nd|3rd|4th)
D) <response_type> optional
    starts after a .
    ends at end of line

Both category_name and question_type can have zero or more internal
underscores.  This results in an ambiguity, since we have no way of knowing
when category_name ends and question_type starts.

Assume that question_type is one of the 4 values (1st|2nd|3rd|4th).  This
results in 3 valid cases and one that should not match:

Format: <survey_name>_<category_name>_<question_type>.<response_type>

0) Both question_type and response_type present
   (x_survey|y_survey|survey_z)_<category_name>_(1st|2nd|3rd|4th).<response_type>
   -->
   p1 = r"^(x_survey|y_survey|survey_z)_([^.]+)_(1st|2nd|3rd|4th)[.]([\w]+)$"  # noqa:

1) Only question_type and no response_type present
   (x_survey|y_survey|survey_z)_<category_name>_(1st|2nd|3rd|4th)
   -->
   p2 = r"^(x_survey|y_survey|survey_z)_([^.]+)_(1st|2nd|3rd|4th)$"

2) No question_type and only response_type present
   (x_survey|y_survey|survey_z)_<category_name>.<response_type>
   -->
   p3 = r"^(x_survey|y_survey|survey_z)_([^.]+)(?<!_(1st|2nd|3rd|4th))[.]([\w]+)$"  # noqa:

3) Neither question_type nor response_type present
   (x_survey|y_survey|survey_z)_<category_name>

   None of p1, p2 or p3 will match.

Since the patterns are mutually exclusive we can try them one after the
other, or combine them into one large pattern.


"""

from collections import namedtuple
import re

Response = namedtuple('Response', ['survey_name',
                                   'category_name',
                                   'question_type',
                                   'response_type'])

cases = ["survey_z__CATEGORY",
         "y_survey_category1_1st.no",
         "x_survey_category2_2nd",
         "survey_z_category_3_3rd.yes_more_7",
         "x_survey_category_4_4th.excluded",
         "X_SURVEY_CATEGORY_4_4TH.excluded",
         "survey_z_category5.yes_more_7",
         "survey_z_category_6.yes_more_7",
         "survey_z_category_7._yes_more_77",
         "survey_z_category_8_._yes_more_88",
         "survey_z_category_8888__foo._yes_more_77_",
         "survey_z_category_22_22_1st_2nd._yes_more_77_",
         "survey_z__CATEGORY_2222_1ST__2ND._yes_odd_77_",
         "survey_z__CATEGORY_3333_1ST__2ND",
         "survey_z__CATEGORY_foo_1ST__2ND._yes_odd_77_",
         ]



def parse_survey_response_1(line):
    """Parse a line with a survey response, setting optional values that
    are not present to None.  Return a Response, or None when no match.

    Use a list of mutually exclusive patterns for line format:
    <survey_name>_<category_name>_<question_type>.<response_type>
    """
    # Format <sn>_<cn>_(1st|2nd|3rd|4th).<rt>
    # Format <sn>_<cn>_(1st|2nd|3rd|4th)
    # Format: <sn>_<cn>.<rt>
    prfx = r"^(x_survey|y_survey|survey_z)_([^.]+)"
    regexs = [
        prfx + r"_(1st|2nd|3rd|4th)[.]([\w]+)$",       # 4 captures
        prfx + r"_(1st|2nd|3rd|4th)($)",               # 3+1 captures
        prfx + r"(?<!_(1st|2nd|3rd|4th))[.]([\w]+)$",  # 4 captures
        ]
    matchers = [re.compile(r, re.I | re.X) for r in regexs]

    # Try each pattern in turn; the first one that matches wins.
    for m in matchers:
        parsed_line = m.search(line)
        if parsed_line:
            # Empty or unset groups become None.
            map_empty2none = (g if g else None for g in parsed_line.groups())
            return Response._make(map_empty2none)
    return None



def parse_survey_response_2(line):
    """Parse a line with a survey response, setting optional values that
    are not present to None.  Return a Response, or None when no match.

    Use one large pattern for line format:
    <survey_name>_<category_name>_<question_type>.<response_type>
    """
    # Capture:
    # <survey_name>_<category_name>_<question_type>.<response_type>
    #     (0)           (1)            (2) or (4)     (3) or (6)
    pat = r"""^(x_survey|y_survey|survey_z)    # <sn>  (0)
               _                               # _
               ([^.]+)                         # <cn> (1)
       (?:                                     # One of
            _(1st|2nd|3rd|4th)  [.]([\w]+)$ |  # qt (2) & rt (3)
            _(1st|2nd|3rd|4th)            $ |  # qt (4)
        (?<!_(1st|2nd|3rd|4th)) [.]([\w]+)$    # rt (6)
       )
           """

    matcher = re.compile(pat, re.I | re.X)
    parsed_line = matcher.search(line)
    if parsed_line:
        pg = list(parsed_line.groups())
        pg[2] = pg[2] if pg[2] else pg[4]  # capture 2 or 4
        pg[3] = pg[3] if pg[3] else pg[6]  # capture 3 or 6
        return Response._make(pg[:4])
    return None



def unparse_survey(response):
    if response.response_type:
        head = '_'.join(e for e in response[:-1] if e)
        unparsed = '.'.join([head, response.response_type])
    else:
        unparsed = '_'.join(e for e in response if e)
    return unparsed



for c in cases:
    p1 = parse_survey_response_1(c)
    p2 = parse_survey_response_2(c)
    print(c)
    print(p1)
    print(p2)
    print(20*'=')
    if p1 or p2:
        # Round trip: rebuilding the line must give back the input.
        assert c == unparse_survey(p1)
        assert c == unparse_survey(p2)


Running it gives:


run reex02.py
survey_z__CATEGORY
None
None
====================
y_survey_category1_1st.no
Response(survey_name='y_survey', category_name='category1', question_type='1st', response_type='no')
Response(survey_name='y_survey', category_name='category1', question_type='1st', response_type='no')
====================
x_survey_category2_2nd
Response(survey_name='x_survey', category_name='category2', question_type='2nd', response_type=None)
Response(survey_name='x_survey', category_name='category2', question_type='2nd', response_type=None)
====================
survey_z_category_3_3rd.yes_more_7
Response(survey_name='survey_z', category_name='category_3', question_type='3rd', response_type='yes_more_7')
Response(survey_name='survey_z', category_name='category_3', question_type='3rd', response_type='yes_more_7')
====================
x_survey_category_4_4th.excluded
Response(survey_name='x_survey', category_name='category_4', question_type='4th', response_type='excluded')
Response(survey_name='x_survey', category_name='category_4', question_type='4th', response_type='excluded')
====================
X_SURVEY_CATEGORY_4_4TH.excluded
Response(survey_name='X_SURVEY', category_name='CATEGORY_4', question_type='4TH', response_type='excluded')
Response(survey_name='X_SURVEY', category_name='CATEGORY_4', question_type='4TH', response_type='excluded')
====================
survey_z_category5.yes_more_7
Response(survey_name='survey_z', category_name='category5', question_type=None, response_type='yes_more_7')
Response(survey_name='survey_z', category_name='category5', question_type=None, response_type='yes_more_7')
====================
survey_z_category_6.yes_more_7
Response(survey_name='survey_z', category_name='category_6', question_type=None, response_type='yes_more_7')
Response(survey_name='survey_z', category_name='category_6', question_type=None, response_type='yes_more_7')
====================
survey_z_category_7._yes_more_77
Response(survey_name='survey_z', category_name='category_7', question_type=None, response_type='_yes_more_77')
Response(survey_name='survey_z', category_name='category_7', question_type=None, response_type='_yes_more_77')
====================
survey_z_category_8_._yes_more_88
Response(survey_name='survey_z', category_name='category_8_', question_type=None, response_type='_yes_more_88')
Response(survey_name='survey_z', category_name='category_8_', question_type=None, response_type='_yes_more_88')
====================
survey_z_category_8888__foo._yes_more_77_
Response(survey_name='survey_z', category_name='category_8888__foo', question_type=None, response_type='_yes_more_77_')
Response(survey_name='survey_z', category_name='category_8888__foo', question_type=None, response_type='_yes_more_77_')
====================
survey_z_category_22_22_1st_2nd._yes_more_77_
Response(survey_name='survey_z', category_name='category_22_22_1st', question_type='2nd', response_type='_yes_more_77_')
Response(survey_name='survey_z', category_name='category_22_22_1st', question_type='2nd', response_type='_yes_more_77_')
====================
survey_z__CATEGORY_2222_1ST__2ND._yes_odd_77_
Response(survey_name='survey_z', category_name='_CATEGORY_2222_1ST_', question_type='2ND', response_type='_yes_odd_77_')
Response(survey_name='survey_z', category_name='_CATEGORY_2222_1ST_', question_type='2ND', response_type='_yes_odd_77_')
====================
survey_z__CATEGORY_3333_1ST__2ND
Response(survey_name='survey_z', category_name='_CATEGORY_3333_1ST_', question_type='2ND', response_type=None)
Response(survey_name='survey_z', category_name='_CATEGORY_3333_1ST_', question_type='2ND', response_type=None)
====================
survey_z__CATEGORY_foo_1ST__2ND._yes_odd_77_
Response(survey_name='survey_z', category_name='_CATEGORY_foo_1ST_', question_type='2ND', response_type='_yes_odd_77_')
Response(survey_name='survey_z', category_name='_CATEGORY_foo_1ST_', question_type='2ND', response_type='_yes_odd_77_')
====================
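As a possible third variant that is not part of the original answer, the three alternatives could perhaps be folded into a lazy category followed by two optional tails, with rule 5 checked in code. This is only a sketch under that assumption: the names `SURVEY_RE` and `parse_survey_response_3` are hypothetical, and the `Response` type is redefined here so the block is self-contained. It should agree with the results shown above on the listed test cases, but it may differ on other inputs.

```python
import re
from typing import NamedTuple, Optional


class Response(NamedTuple):
    survey_name: str
    category_name: str
    question_type: Optional[str]
    response_type: Optional[str]


# Assumption: a lazy category plus two optional tails stands in for the
# three explicit alternatives used in the answer.
SURVEY_RE = re.compile(
    r"""^(?P<survey_name>x_survey|y_survey|survey_z)
        _
        (?P<category_name>[^.]+?)                   # as short as possible
        (?:_(?P<question_type>1st|2nd|3rd|4th))?    # optional question_type
        (?:[.](?P<response_type>\w+))?              # optional response_type
        $""",
    re.I | re.X,
)


def parse_survey_response_3(line: str) -> Optional[Response]:
    """Return a Response, or None when the line does not fit the format."""
    m = SURVEY_RE.search(line)
    if m is None:
        return None
    # Rule 5: at least one of question_type / response_type must be present.
    if m["question_type"] is None and m["response_type"] is None:
        return None
    return Response(**m.groupdict())


print(parse_survey_response_3("survey_z_category_6.yes_more_7"))
print(parse_survey_response_3("survey_z__CATEGORY"))
```

Named groups remove the need to merge capture slots 2/4 and 3/6 by hand, at the cost of an explicit check for the case where both optional parts are missing.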



Answered 2022-07-12