
Filtering a CSV file using function parameters

Python

慕運維8079593 2023-09-26 14:10:14

So I'm writing a function that filters a CSV file based on the function's parameters and then finds the average of one column after filtering. I'm only allowed to use import csv (no pandas), and I can't use lambda or any other Python "advanced" shortcuts. I think I can get the averaging part easily, but I'm having trouble filtering based on the parameters and constraints I mentioned. I would normally use pandas for this, which makes the process much easier, but I can't. Here is my code:

def calc_avg(self, specific, filter, logic, threshold):
    with open(self.load_data, 'r') as avg_file:
        for row in csv.DictReader(avg_file, delimiter= ','):
            specific = row[specific]
            filter = int(row[filter])
            logic = logic
            threshold = 0
            if logic == 'lt':
                filter < threshold
            elif logic == 'gt':
                filter > threshold
            elif logic == 'lte':
                filter <= threshold
            elif logic == 'gte':
                filter >= threshold

It should work with this command:

print(csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15"))

This is the format of the code and the column headers. Sample data:

RecordID  SAPS-I  SOFA  Length_of_stay
132539    6       1     5
132540    16      8     8
132541    21      11    19
132545    17      2     4
132547    14      11    6
132548    14      4     9
132551    19      8     6
132554    11      0     17


2 Answers

狐的傳說


Update

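This option evaluates logic only once and picks a compare function that is then used while iterating over the rows. This is faster when the data has many rows.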

import csv
import io

# written as a function because you don't share the definition of load_data,
# but the main idea can be translated to a class
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    def lt(a, b): return a < b
    def gt(a, b): return a > b
    def lte(a, b): return a <= b
    def gte(a, b): return a >= b

    # choose the comparison once, before looping over the rows
    if logic == 'lt': compare = lt
    elif logic == 'gt': compare = gt
    elif logic == 'lte': compare = lte
    elif logic == 'gte': compare = gte
    else:
        raise ValueError('unknown logic: ' + repr(logic))

    with io.StringIO(self) as avg_file:  # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            if compare(int(row[filter]), threshold):
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    if running_count == 0:
        # not even one row passed the filter
        return 0
    else:
        return running_sum / running_count

# data is the CSV string defined in the initial answer below
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '0'))

Output

9.25

11.0

0
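Since the point of the update is to select the comparison once, a dictionary that maps each logic string to one of the named helper functions is another way to make that selection without lambda. This is a sketch, not the answerer's code; COMPARATORS and pick_compare are hypothetical names.

```python
# Named comparison helpers (no lambda, per the assignment's constraint)
def lt(a, b):
    return a < b

def gt(a, b):
    return a > b

def lte(a, b):
    return a <= b

def gte(a, b):
    return a >= b

# Hypothetical lookup table: logic string -> comparison function
COMPARATORS = {'lt': lt, 'gt': gt, 'lte': lte, 'gte': gte}

def pick_compare(logic):
    """Return the comparison function for `logic`, or raise ValueError."""
    if logic not in COMPARATORS:
        raise ValueError('unknown logic: ' + repr(logic))
    return COMPARATORS[logic]

compare = pick_compare('lt')
print(compare(1, 15.0))   # True
print(compare(11, 2.0))   # False
```

The lookup replaces the if/elif chain, and an unknown logic value fails loudly instead of leaving compare undefined.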

Initial answer

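To filter the rows, once you have determined which kind of inequality to use, you have to actually perform the comparison and act on its result. The code here stores that result in the boolean include.

Then you can keep two variables, running_sum and running_count, and divide one by the other at the end to return the average.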

import io
import csv

# written as a function because you don't share the definition of load_data,
# but the main idea can be translated to a class
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    with io.StringIO(self) as avg_file:  # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            # your code has: filter = int(row[filter])
            value = int(row[filter])  # avoid overwriting parameters

            if logic == 'lt' and value < threshold:
                include = True
            elif logic == 'gt' and value > threshold:
                include = True
            elif logic == 'lte' and value <= threshold:  # should it be 'le'?
                include = True
            elif logic == 'gte' and value >= threshold:  # should it be 'ge'?
                include = True
            # note: ast.literal_eval cannot evaluate comparisons such as '1<15'
            # (and logic is 'lt', not '<'), so it cannot replace this chain
            else:
                include = False

            if include:
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    if running_count == 0:
        return 0  # not even one row passed the filter
    return running_sum / running_count

data = """RecordID,SAPS-I,SOFA,Length_of_stay
132539,6,1,5
132540,16,8,8
132541,21,11,19
132545,17,2,4
132547,14,11,6
132548,14,4,9
132551,19,8,6
132554,11,0,17"""

print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))

Output

9.25

11.0
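Both answers mention that the function can be moved into a class. A minimal sketch of that, assuming (as in the question's code) that load_data holds the path of the CSV file; the class name CSVData and the temporary file used in the example are hypothetical. The parameter name filter is kept only because the question's command passes filter= as a keyword.

```python
import csv
import os
import tempfile


class CSVData:
    """Hypothetical wrapper: load_data is assumed to be a path to a CSV file."""

    def __init__(self, load_data):
        self.load_data = load_data

    def calc_avg(self, specific, filter, logic, threshold):
        threshold = float(threshold)  # the command passes "15" as a string
        running_sum = 0
        running_count = 0
        with open(self.load_data, 'r', newline='') as avg_file:
            for row in csv.DictReader(avg_file, delimiter=','):
                value = int(row[filter])  # new name: don't overwrite the parameter
                if logic == 'lt':
                    include = value < threshold
                elif logic == 'gt':
                    include = value > threshold
                elif logic == 'lte':
                    include = value <= threshold
                elif logic == 'gte':
                    include = value >= threshold
                else:
                    raise ValueError('unknown logic: ' + repr(logic))
                if include:
                    running_sum += int(row[specific])
                    running_count += 1
        if running_count == 0:
            return 0  # no row passed the filter
        return running_sum / running_count


# Usage: write the sample data to a temporary file so the class can open it by path
sample = (
    "RecordID,SAPS-I,SOFA,Length_of_stay\n"
    "132539,6,1,5\n"
    "132540,16,8,8\n"
    "132541,21,11,19\n"
    "132545,17,2,4\n"
    "132547,14,11,6\n"
    "132548,14,4,9\n"
    "132551,19,8,6\n"
    "132554,11,0,17\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)

csv_data = CSVData(tmp.name)
results = []
for t in ("15", "2", "0"):
    results.append(csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold=t))
print(results)  # [9.25, 11.0, 0]
os.remove(tmp.name)
```

With this shape, the call from the question, csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15"), works unchanged.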

2023-09-26

陪伴而非守候


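You aren't doing anything with the results of your comparisons. You need to use them in if statements so that the specific value is included in the average calculation.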

import csv

def calc_avg(self, specific, filter, logic, threshold):
    threshold = float(threshold)  # the command passes threshold as the string "15"
    values = []
    with open(self.load_data, 'r') as avg_file:
        for row in csv.DictReader(avg_file, delimiter=','):
            # use new names so the parameters are not overwritten on the first row
            value = int(row[specific])
            filter_value = int(row[filter])
            if logic == 'lt' and filter_value < threshold:
                values.append(value)
            elif logic == 'gt' and filter_value > threshold:
                values.append(value)
            elif logic == 'lte' and filter_value <= threshold:
                values.append(value)
            elif logic == 'gte' and filter_value >= threshold:
                values.append(value)

    if len(values) > 0:
        return sum(values) / len(values)
    else:
        return 0
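The question's loop also reassigns its own parameters (specific = row[specific], filter = int(row[filter]), threshold = 0). After the first row, specific holds a cell value such as '5' instead of the column name, so the lookup on the next row fails. A small sketch reproducing that, using a hypothetical helper collect_column and a two-row excerpt of the sample data:

```python
import csv
import io

# two-row excerpt of the sample data from the question
data = "SOFA,Length_of_stay\n1,5\n8,8\n"

def collect_column(text, specific):
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        # BUG (same pattern as the question): the parameter is overwritten,
        # so on the second row this looks up row['5'] instead of row['Length_of_stay']
        specific = row[specific]
        values.append(specific)
    return values

try:
    collect_column(data, 'Length_of_stay')
    error = None
except KeyError as exc:
    error = exc
print('KeyError:', error)   # KeyError: '5'
```

Storing each cell in a differently named local variable keeps the column name intact for every row.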

2023-09-26
