Extract information from a string and convert it to a list

Python

臨摹微笑 2023-09-05 21:10:46

I have a string that looks like this:

[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)

I want to extract the value of "X" and the associated text from each line and convert them to a list. See the expected output below.

Expected output:

['X=250.44','DECEMBER 31,']
['X=307.5','respectively. The net decrease in the revenue']
['X=49.5','(US$ in millions)']

How can we solve this in Python?

My approach:

mylist = []
for line in data.split("\n"):
    if line.strip():
        x_coord = re.findall('^(X=.*)\,$', line)
        text = re.findall('^(]\w +)', line)
        mylist.append([x_coord, text])

My approach does not find any values for x_coord or text.


3 Answers

郎朗坤

Contributed 1921 experience points, received over 9 likes

A solution using re:

import re

# Renamed from `input` so the built-in function is not shadowed.
lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

def extract(s):
    # Raw string, so the backslashes reach the regex engine unchanged.
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()

output = [extract(item) for item in lines]
print(output)

Output (line breaks added for readability):

[
    ('X=250.44', 'DECEMBER 31,'),
    ('X=307.5', 'respectively. The net decrease in the revenue'),
    ('X=49.5', '(US$ in millions)'),
]

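Explanation:

\d ... a digit
\d+ ... one or more digits
(?:...) ... non-capturing ("plain") parentheses
\.\d* ... a dot followed by zero or more digits
(?:\.\d*)? ... an optional (zero or one) "fractional part"
(X=\d+(?:\.\d*)?) ... first group, X=number
.*? ... zero or more of any character (non-greedy)
\] ... the ] symbol
$ ... end of the string
\](.*?)$ ... second group, anything between the ] and the end of the string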
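The breakdown above can also be written into the pattern itself. The following is a minimal sketch, not part of the original answer: it uses `re.VERBOSE` so each piece carries its own comment, returns a list (as the question asked) instead of a tuple, and returns `None` when a line does not match. The names `PATTERN` and `extract_or_none` are made up for illustration.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so that
# whitespace is ignored and each part can be commented.
PATTERN = re.compile(
    r"""
    (X=\d+(?:\.\d*)?)   # group 1: X= followed by an integer or decimal number
    .*?\]               # lazily skip up to and including the next closing bracket
    (.*?)$              # group 2: everything from there to the end of the string
    """,
    re.VERBOSE,
)

def extract_or_none(line):
    """Return [x, text] for a matching line, or None if nothing matches."""
    match = PATTERN.search(line)
    return list(match.groups()) if match else None

sample = (
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
    "[(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
)
print(extract_or_none(sample))
print(extract_or_none("a line without coordinates"))
```

Checking for `None` matters because `re.search` returns `None` on a failed match, and calling `.groups()` on it would raise `AttributeError`.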

Answered 2023-09-05

斯蒂芬大帝

Contributed 1827 experience points, received over 8 likes

Try this:

(X=[^,]*)(?:.*])(.*)

import re

# One record per line; no blank lines, so every line matches the pattern.
source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')

pattern = r"(X=[^,]*)(?:.*])(.*)"

for line in source:
    print(re.search(pattern, line).groups())

Output:

('X=250.44', 'DECEMBER 31,')
('X=307.5', 'respectively. The net decrease in the revenue')
('X=49.5', '(US$ in millions)')

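Your expected output has X= in front of every captured value, so I simply included it in the capture group. If that matters to you, feel free to move it into a non-capturing group instead.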
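If only the number is wanted, one possible variant keeps `X=` outside the capture group and converts the value to `float`. This is a sketch under that assumption, not the answerer's code; `NUMBER_ONLY` and `parse_line` are hypothetical names.

```python
import re

# Variant of the pattern above: X= is matched but not captured,
# so group 1 holds just the number.
NUMBER_ONLY = re.compile(r"X=([^,]*)(?:.*])(.*)")

def parse_line(line):
    """Return [x, text] with x as a float, or None if the line does not match."""
    match = NUMBER_ONLY.search(line)
    if match is None:
        return None
    x, text = match.groups()
    # float() raises ValueError if the captured value is not numeric.
    return [float(x), text]

line = (
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] "
    "[(X=307.5,Y=240.48499) height=3.876 width=2.9970093]"
    "respectively. The net decrease in the revenue"
)
print(parse_line(line))
```

Note that `.*]` is greedy, so the text group starts after the last `]` on the line. Text that itself contains a `]` would be cut off at that point.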

Answered 2023-09-05

MYYA

Contributed 1868 experience points, received over 4 likes

Use a regular expression with named groups to capture the relevant bits:

>>> import re
>>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
>>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
>>> m.groups()
('250.44', 'DECEMBER 31,')
>>> m['x_coord']
'250.44'
>>> m['text']
'DECEMBER 31,'
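To get the list-of-lists shape the question asked for, the named-group pattern can be applied to every line. The following is a rough sketch, assuming the input is one record per line; `NAMED`, `data`, and `rows` are illustrative names, and the non-capturing wrappers from the pattern above are dropped because they do not change what is matched.

```python
import re

# Named groups as in the answer above.
NAMED = re.compile(r"\(X=(?P<x_coord>.*?),.*\](?P<text>.*)$")

data = "\n".join([
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
])

rows = []
for line in data.splitlines():
    m = NAMED.search(line)
    if m:  # skip blank or non-matching lines
        # Re-attach the X= prefix to match the expected output in the question.
        rows.append([f"X={m['x_coord']}", m["text"]])

for row in rows:
    print(row)
```

Indexing a match object by group name (`m['x_coord']`) requires Python 3.6 or later; `m.group('x_coord')` works on older versions.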

Answered 2023-09-05
