
How do I extract the month and year from a string in Python?


郎朗坤 2022-12-14 21:05:49
Input text:

text = "Wipro Limited | Hyderabad, IN                Dec 2017 – PresentProject Analyst Infosys | Delhi, IN                Apr 2017 – Nov 2017 Software Developer HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017 Software Engineer  "

I wrote some code for this, but it returns a list in which every extracted word is a separate item, and I cannot do anything further with it.

regex = re.compile('(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s+\–\s+(?P<month1>[a-zA-Z]+)\s+(?P<year1>\d{4})')
mat = re.findall(regex, text)
mat

See the code: https://regex101.com/r/mMlgYp/1

I would like output like the following, so that I can preview the dates, take the difference between them, and then calculate the total experience. Here "Present" or "Till date" should be treated as the current month and year.

import time
Present = time.strftime("%m-%Y")
Present # output: '05-2020'

#Desired output
Extracted dates: [('Dec 2017 - Present'), ('Apr 2017 - Nov 2017'), ('Jun 2016 - Mar 2017')]
# and so on ... should display all the search results
First experience: 1.9 years
second experience: 8 months
third experience: 7 months
# and so on ... should display all the search results
Total experience: 3.4 years

Please help. I am new to programming, NLP, and regular expressions.

2 Answers

ABOUTYOU

Contributed 1812 experience points, received more than 5 upvotes

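You will probably want this in a DataFrame eventually, since you tagged the question pandas (see Andrej's answer). Either way, you can parse the dates out of the string with an interpolated pattern: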


fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"

where {months} is an alternation group of all possible month names and abbreviations.
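As a quick check of what this pattern returns, the sketch below runs it on a shortened copy of the sample text and compares it with the pattern from the question. The shortened text and the names answer_pattern and question_pattern are illustrative assumptions, and the question's pattern is reproduced without the backslash before the en dash.

```python
import calendar
import re

# Shortened copy of the sample text (an assumption made for illustration).
text = (
    "Wipro Limited | Hyderabad, IN    Dec 2017 – Present\n"
    "Project Analyst\n"
    "Infosys | Delhi, IN    Apr 2017 – Nov 2017\n"
    "Software Developer\n"
    "HCL Technologies | Hyderabad, IN    Jun 2016 – Mar 2017\n"
    "Software Engineer\n"
)

# Alternation of abbreviated and full month names: "Jan|Feb|...|December".
months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])

# Group 1 is the start date; group 2 is either "present" or an end date.
answer_pattern = (
    rf"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"
)
ranges = re.findall(answer_pattern, text)
print(ranges)

# The question's pattern needs a month and a year on BOTH sides of the dash.
question_pattern = (
    r"(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s+–\s+"
    r"(?P<month1>[a-zA-Z]+)\s+(?P<year1>\d{4})"
)
question_matches = re.findall(question_pattern, text)
print(question_matches)
```

With two capture groups, re.findall returns one (start, end) tuple per date range, which is easier to work with than the four-item tuples from the question's pattern. The question's pattern also skips the "Dec 2017 – Present" entry, because "Present" is not followed by a four-digit year.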


import calendar
import re
from datetime import datetime

from dateutil.relativedelta import relativedelta

text = """Wipro Limited | Hyderabad, IN                Dec 2017 – Present
Project Analyst 

Infosys | Delhi, IN                Apr 2017 – Nov 2017 
Software Developer 

HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017 
Software Engineer  
"""


def parse_date(x, fmts=("%b %Y", "%B %Y")):
    # Try the abbreviated month name first, then the full month name.
    for fmt in fmts:
        try:
            return datetime.strptime(x, fmt)
        except ValueError:
            pass
    # Fail loudly instead of silently returning None.
    raise ValueError(f"could not parse date: {x!r}")


# Alternation of all month abbreviations and names: "Jan|Feb|...|December".
months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])
pattern = fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"
total_experience = None

for start, end in re.findall(pattern, text):
    # Treat "Present" as the current month and year.
    if end.lower() == "present":
        today = datetime.today()
        end = f"{calendar.month_abbr[today.month]} {today.year}"

    duration = relativedelta(parse_date(end), parse_date(start))

    # relativedelta objects add up, carrying months over into years.
    if total_experience:
        total_experience += duration
    else:
        total_experience = duration

    print(f"{start}-{end} ({duration.years} years, {duration.months} months)")

if total_experience:
    print(f"total experience:  {total_experience.years} years, {total_experience.months} months")
else:
    print("couldn't parse text")

Output:

Dec 2017-May 2020 (2 years, 5 months)
Apr 2017-Nov 2017 (0 years, 7 months)
Jun 2016-Mar 2017 (0 years, 9 months)
total experience:  3 years, 9 months
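If installing dateutil is not an option, whole-month differences can also be computed with the standard library alone, and the total can be reported in decimal years as the question asked. This is a minimal sketch rather than part of the answer: the helper names parse_month, months_between and format_years are hypothetical, and today is pinned to May 2020 only so that the numbers line up with the output shown above.

```python
from datetime import datetime


def parse_month(label, today):
    """Parse 'Dec 2017' or 'December 2017'; 'Present' maps to `today`."""
    if label.strip().lower() == "present":
        return today
    for fmt in ("%b %Y", "%B %Y"):
        try:
            return datetime.strptime(label, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised month label: {label!r}")


def months_between(start, end):
    """Count whole calendar months from start to end (days are ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def format_years(months):
    """Render a month count as decimal years with one decimal place."""
    return f"{months / 12:.1f} years"


# Assumption: a fixed "today" keeps the example reproducible.
today = datetime(2020, 5, 1)
ranges = [("Dec 2017", "Present"), ("Apr 2017", "Nov 2017"), ("Jun 2016", "Mar 2017")]

durations = [
    months_between(parse_month(start, today), parse_month(end, today))
    for start, end in ranges
]
total = sum(durations)

print(durations)            # months per position
print(format_years(total))  # total experience in decimal years
```

In real use, datetime.today() would replace the pinned date. Counting calendar months this way ignores the day of the month, which suits input that only carries a month and a year.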


Answered 2022-12-14
回首憶惘然

Contributed 1847 experience points, received more than 11 upvotes

import re

import numpy as np
import pandas as pd

text = '''Wipro Limited | Hyderabad, IN                Dec 2017 – Present
Project Analyst

Infosys | Delhi, IN                Apr 2017 – Nov 2017
Software Developer

HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017
Software Engineer
'''

# Average Gregorian month length (365.2425 / 12, about 30.44 days).
# The original divided by np.timedelta64(1, 'M'), which newer pandas
# versions may reject as an ambiguous unit.
ONE_MONTH = pd.Timedelta(days=365.2425 / 12)


def pretty_format(months):
    # Show a year or more in years, anything shorter in months.
    return f'{months/12:.1f} years' if months > 11 else f'{months:.1f} months'


data = []
# Groups: employer name, start date, end date (empty string for "Present").
for employer, d1, d2 in re.findall(r'(.*?)\s*\|.*([A-Z][a-z]{2} [12]\d{3}) – (?:([A-Z][a-z]{2} [12]\d{3})|Present)', text):
    data.append({'Employer': employer, 'Begin': d1, 'End': d2 or np.nan})

df = pd.DataFrame(data)
df['Begin'] = pd.to_datetime(df['Begin'])
df['End'] = pd.to_datetime(df['End'])

# A missing end date (NaT) means the position is ongoing, so use the current time.
elapsed = df['End'].fillna(pd.to_datetime('now')) - df['Begin']
df['Experience'] = (elapsed / ONE_MONTH).apply(pretty_format)
print(df)

total = elapsed.sum() / ONE_MONTH
print()
print(f'Total experience = {pretty_format(total)}')

Prints:

           Employer      Begin        End  Experience
0     Wipro Limited 2017-12-01        NaT   2.5 years
1           Infosys 2017-04-01 2017-11-01  7.0 months
2  HCL Technologies 2016-06-01 2017-03-01  9.0 months

Total experience = 3.8 years


Answered 2022-12-14