3 Answers

Contributor with 1921 experience points · 9+ upvotes
A solution using re:
import re
input = [
"[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
"[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
"[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]
def extract(s):
    # group 1: the X=number field; group 2: the text after the closing ]
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()
output = [extract(item) for item in input]
print(output)
Output:
[
('X=250.44', 'DECEMBER 31,'),
('X=307.5', 'respectively. The net decrease in the revenue'),
('X=49.5', '(US$ in millions)'),
]
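Explanation:
\d ... a digit
\d+ ... one or more digits
(?:...) ... non-capturing ("normal") parentheses
\.\d* ... a dot followed by zero or more digits
(?:\.\d*)? ... an optional (zero or one) "decimal part"
(X=\d+(?:\.\d*)?) ... first group, X=number
.*? ... zero or more of any character (non-greedy)
\] ... the ] symbol
$ ... end of string
\](.*?)$ ... second group, anything between the ] and the end of the string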
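
The piece-by-piece explanation above can also live inside the pattern itself. The following is a minimal sketch, assuming the same line format as the question; the name PATTERN and the None return for non-matching lines are additions for illustration, not part of the original answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries its own comment.
PATTERN = re.compile(r"""
    (X=\d+(?:\.\d*)?)   # group 1: "X=" followed by an integer or decimal number
    .*?\]               # lazily skip ahead to the next closing bracket
    (.*?)$              # group 2: everything from that bracket to the end of the string
""", re.VERBOSE)

def extract(s):
    """Return (x_field, text), or None when the line does not match."""
    match = PATTERN.search(s)
    return match.groups() if match else None

sample = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
print(extract(sample))                 # ('X=250.44', 'DECEMBER 31,')
print(extract("no coordinates here"))  # None
```

Returning None instead of calling .groups() directly avoids an AttributeError on lines that do not contain an X= field.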

Contributor with 1827 experience points · 8+ upvotes
Try this:
(X=[^,]*)(?:.*])(.*)
import re
source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')
pattern = r"(X=[^,]*)(?:.*])(.*)"
for line in source:
    print(re.search(pattern, line).groups())
Output:
('X=250.44', 'DECEMBER 31,')
('X=307.5', 'respectively. The net decrease in the revenue')
('X=49.5', '(US$ in millions)')
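You had X= in front of every capture, so I simply included it in the capture group. If that matters, feel free to move it into a non-capturing group.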
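
If only the number is wanted, one possible variant keeps X= outside the capture group and converts the result to a float. This is a sketch under the assumption that the X value is always a plain decimal number; parse_line is a hypothetical helper name, not something from the answer.

```python
import re

# Variant of the pattern above: "X=" sits outside the capture group,
# so group 1 holds only the number.
pattern = re.compile(r"X=([^,]*),.*\](.*)")

def parse_line(line):
    """Return (x as float, text), or None if the line has no X= field."""
    m = pattern.search(line)
    if m is None:
        return None
    x_text, text = m.groups()
    return float(x_text), text

line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)"
print(parse_line(line))  # (49.5, '(US$ in millions)')
```

Note that .*\] is greedy here, so it skips to the last ] on the line; text that itself contains a ] would be cut short.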

Contributor with 1868 experience points · 4+ upvotes
Use a regular expression with named groups to capture the relevant bits:
>>> import re
>>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
>>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
>>> m.groups()
('250.44', 'DECEMBER 31,')
>>> m['x_coord']
'250.44'
>>> m['text']
'DECEMBER 31,'
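
The named groups also make it easy to turn several lines into dictionaries in one pass. A minimal sketch, assuming the same line format; LINE_RE, lines, and rows are illustrative names, and lines that do not match are silently skipped.

```python
import re

# Same idea as the session above, with the literal parts left ungrouped.
LINE_RE = re.compile(r'\(X=(?P<x_coord>.*?),.*\](?P<text>.*)$')

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "a line without coordinates",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

# groupdict() maps each group name to the text it captured.
rows = [m.groupdict() for m in map(LINE_RE.search, lines) if m]
print(rows)
# [{'x_coord': '250.44', 'text': 'DECEMBER 31,'}, {'x_coord': '49.5', 'text': '(US$ in millions)'}]
```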