3 回答

TA貢獻1871條經驗 獲得超8個贊
這是我能想到的最 pythonic 的方式。我的做法是先對列表的列表進行排序,按sublist[3],這意味著當我們遍歷列表時,我們最終會在遇到重復項之前遇到具有最大評論數的子列表。這個技巧將用于構建最終列表。
meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
# Sort the list by review count and review name - make sure the highest review is first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)
# This is the list we'll use to store the final data in
final_list = []
# Go through all the items in the meta_list
for meta in meta_list:
if not meta[0] in [item[0] for item in final_list]:
'''
If another meta with the same name (0th index)
doesn't already exist in final_list, add it
'''
final_list.append(meta)
輸出-
[['Instagram',
'SOCIAL',
'4.5',
66577446,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Teen',
'Social',
'July 31, 2018',
'Varies with device',
'Varies with device'],
['Gmail',
'COMMUNICATION',
'4.3',
4604483,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Everyone',
'Communication',
'August 2, 2018',
'Varies with device',
'Varies with device'],
['Coloring book moana',
'FAMILY',
'3.9',
974,
'14M',
'500,000+',
'Free',
'0',
'Everyone',
'Art & Design;Pretend Play',
'January 15, 2018',
'2.0.0',
'4.0.3 and up']]
基本上它將所有不存在的元數據添加到final_list. 為什么這行得通?因為您在循環時遇到的第一個元數據是評論數最高的元數據。所以一旦那個被添加,它的復制品就不能被添加,我們就完成了。
注意:這不會保留評論本身的順序。它只會確保只保留評論數最高的評論,以防出現同名的重復評論。

TA貢獻1869條經驗 獲得超4個贊
這個問題可能有更優雅/pythonic 的解決方案,但這是一個可能的途徑:
my_list = [...] # Nested list here
def compare_duplicates(nested_list, name_index=0, compare_index=3):
max_values = dict() # Used two dictionaries for readability
final_indexes = dict()
for i, item in enumerate(nested_list):
name, value = item[name_index], item[compare_index]
if value > max_values.get(name, 0):
max_values[name] = value
final_indexes[name] = i
return [nested_list[i] for i in final_indexes.values()]
print(compare_duplicates(my_list))

TA貢獻1806條經驗 獲得超5個贊
是這樣的:
_DATA = [
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
]
def print_highest(data):
list_map = {}
for d in data:
key = str(d[0:3] + d[4:])
if key not in list_map:
list_map[key] = d
continue
if d[3] > list_map[key][3]:
list_map[key] = d
for l in list_map.values():
print(l)
print_highest(_DATA)
輸出:
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
添加回答
舉報