
Parsing multi-line headers in a space-formatted report

Python

波斯汪 2022-08-25 13:44:17

I am trying to parse a file whose table has a multi-line header:

                        Categ_1   Categ_2   Categ_3    Categ_4
data1 Group             Data      Data      Data       Data      ( %)     Options
--------------------------------------------------------------------------------
param_group1            6.366e-03 6.644e-03 6.943e-05  0.0131    (57.42%) i
param_group2            1.251e-05 7.253e-06 4.256e-04  4.454e-04 ( 1.96%)
param_group3            2.205e-05 6.421e-05 2.352e-03  2.438e-03 (10.70%)
param_group4            1.579e-07 0.0000    1.479e-05  1.495e-05 ( 0.07%)
param_group5            3.985e-03 2.270e-07 2.789e-03  6.775e-03 (29.74%)
param_group6            0.0000    0.0000    0.0000     0.0000    ( 0.00%)
param_group7           -8.121e-09 0.0000    1.896e-08  1.084e-08 ( 0.00%)

I have successfully used pyparsing to parse tables like this in the past, but the header was on a single line, and none of the header fields had embedded whitespace the way ( %) does here. I did it like this:

from pyparsing import Optional, col

def mustMatchCols(startloc, endloc):
    return lambda s, l, t: startloc <= col(l, s) <= endloc + 1

def tableValue(expr, colstart, colend):
    return Optional(expr.copy().addCondition(mustMatchCols(colstart, colend),
                                             message="text not in expected columns"))

if header:
    column_lengths = determine_header_column_widths(header_line)
    # Then run the tableValue function for each start,end pair.

Are there any built-in constructs or examples, in pyparsing or in any other approach, for parsing this kind of space-formatted table?


1 Answer

達令說


If you can determine the column widths in advance, the code below stitches the pieces of each multi-line column header together:

headers = """\
                        Categ_1   Categ_2   Categ_3    Categ_4
data1 Group             Data      Data      Data       Data      ( %)     Options
"""

col_widths = [24, 10, 10, 11, 9, 10, 10]

# convert widths to slices
col_slices = []
prev = 0
for cw in col_widths:
    col_slices.append(slice(prev, prev + cw))
    prev += cw

# verify slices
# for line in headers.splitlines():
#     for slc in col_slices:
#         print(line[slc])

def extract_line_parts(slices, line_string):
    return [line_string[slc].strip() for slc in slices]

# extract the different column header parts
parts = [extract_line_parts(col_slices, line) for line in headers.splitlines()]
for p in parts:
    print(p)

# use zip(*parts) to transpose list of row parts into list of column parts
header_cols = list(zip(*parts))
print(header_cols)

for header in header_cols:
    print(' '.join(filter(None, header)))

Prints:

['', 'Categ_1', 'Categ_2', 'Categ_3', 'Categ_4', '', '']
['data1 Group', 'Data', 'Data', 'Data', 'Data', '( %)', 'Options']
[('', 'data1 Group'), ('Categ_1', 'Data'), ('Categ_2', 'Data'), ('Categ_3', 'Data'), ('Categ_4', 'Data'), ('', '( %)'), ('', 'Options')]
data1 Group
Categ_1 Data
Categ_2 Data
Categ_3 Data
Categ_4 Data
( %)
Options
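
As a possible follow-on (a sketch, not part of the original answer), the same slices could be applied to the data rows so that each row becomes a dict keyed by the merged header names. The helper names `widths_to_slices`, `merged_headers`, `parse_rows` and `make_line` are hypothetical, and the sample lines are padded programmatically to the widths assumed above, because the real report's exact spacing may differ and should be checked against the actual file.

```python
# Sketch: reuse fixed-width slices for both header lines and data rows.
# The widths are the ones assumed in the answer; verify them against the real file.
col_widths = [24, 10, 10, 11, 9, 10, 10]


def widths_to_slices(widths):
    """Convert a list of column widths into a list of slice objects."""
    slices, prev = [], 0
    for w in widths:
        slices.append(slice(prev, prev + w))
        prev += w
    return slices


def extract_line_parts(slices, line_string):
    """Cut one line into stripped cells, one per slice."""
    return [line_string[slc].strip() for slc in slices]


def merged_headers(slices, header_lines):
    """Join the non-empty pieces of each column across all header lines."""
    parts = [extract_line_parts(slices, line) for line in header_lines]
    return [' '.join(filter(None, column)) for column in zip(*parts)]


def parse_rows(slices, names, data_lines):
    """Yield one dict per non-blank data line, keyed by merged header name."""
    for line in data_lines:
        if line.strip():
            yield dict(zip(names, extract_line_parts(slices, line)))


def make_line(cells):
    """Hypothetical helper: pad cells to col_widths to build sample input."""
    return ''.join(cell.ljust(w) for cell, w in zip(cells, col_widths))


# Sample input rebuilt from the question's values at the assumed widths.
header_lines = [
    make_line(['', 'Categ_1', 'Categ_2', 'Categ_3', 'Categ_4', '', '']),
    make_line(['data1 Group', 'Data', 'Data', 'Data', 'Data', '( %)', 'Options']),
]
data_lines = [
    make_line(['param_group1', '6.366e-03', '6.644e-03', '6.943e-05',
               '0.0131', '(57.42%)', 'i']),
    make_line(['param_group2', '1.251e-05', '7.253e-06', '4.256e-04',
               '4.454e-04', '( 1.96%)', '']),
]

col_slices = widths_to_slices(col_widths)
names = merged_headers(col_slices, header_lines)
rows = list(parse_rows(col_slices, names, data_lines))

for row in rows:
    # Values stay as strings; convert only the fields that are needed.
    print(row['data1 Group'], float(row['Categ_4 Data']), row['( %)'])
```

Slicing by position rather than splitting on whitespace keeps cells such as `( 1.96%)` intact and lets an empty Options cell come back as an empty string instead of shifting the columns.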

Answered 2022-08-25
