
How to json_normalize a pandas column that contains empty lists, without losing records

Python

HUX布斯 2023-07-27 09:35:04

I am using pd.json_normalize to flatten the "sections" field in this data into rows. It works fine except for rows where "sections" is an empty list.

Such an ID is ignored completely and is missing from the final flattened dataframe. I need to make sure that I have at least one row per unique ID in the data (some IDs may have many rows, up to one row per unique ID, per unique section_id, question_id, and answer_id as I unnest more fields in my data):

{'_id': '5f48f708fe22ca4d15fb3b55', 'created_at': '2020-08-28T12:22:32Z', 'sections': []}]

Sample data:

sample = [{'_id': '5f48bee4c54cf6b5e8048274', 'created_at': '2020-08-28T08:23:00Z', 'sections': [{'comment': '', 'type_fail': None, 'answers': [{'comment': 'stuff', 'feedback': [], 'value': 10.0, 'answer_type': 'default', 'question_id': '5e59599c68369c24069630fd', 'answer_id': '5e595a7c3fbb70448b6ff935'}, {'comment': 'stuff', 'feedback': [], 'value': 10.0, 'answer_type': 'default', 'question_id': '5e598939cedcaf5b865ef99a', 'answer_id': '5e598939cedcaf5b865ef998'}], 'score': 20.0, 'passed': True, '_id': '5e59599c68369c24069630fe', 'custom_fields': []}, {'comment': '', 'type_fail': None, 'answers': [{'comment': '', 'feedback': [], 'value': None, 'answer_type': 'not_applicable', 'question_id': '5e59894f68369c2398eb68a8', 'answer_id': '5eaad4e5b513aed9a3c996a5'},

Test:

df = pd.json_normalize(sample)
df2 = pd.json_normalize(df.to_dict(orient="records"), meta=["_id", "created_at"], record_path="sections", record_prefix="section_")

At this point I am already missing the row for ID "5f48f708fe22ca4d15fb3b55", which I still need.

df3 = pd.json_normalize(df2.to_dict(orient="records"), meta=["_id", "created_at", "section__id", "section_score", "section_passed", "section_type_fail", "section_comment"], record_path="section_answers", record_prefix="")

Can I change this somehow to make sure I get at least one row per ID? I am dealing with millions of records and don't want to realize later that some IDs are missing from my final data. The only solution I can think of is to normalize each dataframe and then join it back to the original dataframe.
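The join-back idea mentioned at the end of the question could look roughly like the following. This is only a minimal sketch: the `sample` here is a shortened, hypothetical stand-in with the same shape as the data above, and the names `base`, `sections`, and `result` are made up for illustration. The `how="left"` merge is what keeps IDs whose "sections" list is empty.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the sample data in the question
sample = [
    {"_id": "a1", "created_at": "2020-08-28T08:23:00Z",
     "sections": [{"_id": "s1", "score": 20.0}, {"_id": "s2", "score": 0.0}]},
    {"_id": "a2", "created_at": "2020-08-28T12:22:32Z", "sections": []},
]

# One row per top-level record, so every _id is present here
base = pd.DataFrame(sample)[["_id", "created_at"]]

# Flatten the sections; records with an empty list contribute no rows
sections = pd.json_normalize(
    sample, record_path="sections", meta=["_id"], record_prefix="section_"
)

# Left merge: IDs without sections are kept, with NaN in the section columns
result = base.merge(sections, on="_id", how="left")
print(result)
```

The same pattern would have to be repeated for each further level that gets unnested (for example the answers), merging on the key of the level above.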


2 Answers

吃雞游戲

Contributed 1829 experience points, received more than 7 likes

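The best way to solve this is to fix the dicts before normalizing.

If sections is an empty list, fill it with [{'answers': [{}]}]

import pandas as pd

# Give records with an empty 'sections' list a placeholder so they are not dropped
for i, d in enumerate(sample):
    if not d['sections']:
        sample[i]['sections'] = [{'answers': [{}]}]

df = pd.json_normalize(sample)
df2 = pd.json_normalize(df.to_dict(orient="records"), meta=["_id", "created_at"], record_path="sections", record_prefix="section_")

# display(df2)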

| | section_comment | section_type_fail | section_answers | section_score | section_passed | section__id | section_custom_fields | _id | created_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | NaN | [{'comment': 'stuff', 'feedback': [], 'value': 10.0, 'answer_type': 'default', 'question_id': '5e59599c68369c24069630fd', 'answer_id': '5e595a7c3fbb70448b6ff935'}, {'comment': 'stuff', 'feedback': [], 'value': 10.0, 'answer_type': 'default', 'question_id': '5e598939cedcaf5b865ef99a', 'answer_id': '5e598939cedcaf5b865ef998'}] | 20.0 | True | 5e59599c68369c24069630fe | [] | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z |
| 1 | | NaN | [{'comment': '', 'feedback': [], 'value': None, 'answer_type': 'not_applicable', 'question_id': '5e59894f68369c2398eb68a8', 'answer_id': '5eaad4e5b513aed9a3c996a5'}, {'comment': '', 'feedback': [], 'value': None, 'answer_type': 'not_applicable', 'question_id': '5e598967cedcaf5b865efe3e', 'answer_id': '5eaad4ece3f1e0794372f8b2'}, {'comment': 'stuff', 'feedback': [], 'value': 0.0, 'answer_type': 'default', 'question_id': '5e598976cedcaf5b865effd1', 'answer_id': '5e598976cedcaf5b865effd3'}] | 0.0 | True | 5e59894f68369c2398eb68a9 | [] | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z |
| 2 | NaN | NaN | [{}] | NaN | NaN | NaN | NaN | 5f48f708fe22ca4d15fb3b55 | 2020-08-28T12:22:32Z |

df3 = pd.json_normalize(df2.to_dict(orient="records"), meta=["_id", "created_at", "section__id", "section_score", "section_passed", "section_type_fail", "section_comment"], record_path="section_answers", record_prefix="")

# display(df3)

| | comment | feedback | value | answer_type | question_id | answer_id | _id | created_at | section__id | section_score | section_passed | section_type_fail | section_comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | stuff | [] | 10.0 | default | 5e59599c68369c24069630fd | 5e595a7c3fbb70448b6ff935 | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z | 5e59599c68369c24069630fe | 20 | True | NaN | |
| 1 | stuff | [] | 10.0 | default | 5e598939cedcaf5b865ef99a | 5e598939cedcaf5b865ef998 | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z | 5e59599c68369c24069630fe | 20 | True | NaN | |
| 2 | | [] | NaN | not_applicable | 5e59894f68369c2398eb68a8 | 5eaad4e5b513aed9a3c996a5 | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z | 5e59894f68369c2398eb68a9 | 0 | True | NaN | |
| 3 | | [] | NaN | not_applicable | 5e598967cedcaf5b865efe3e | 5eaad4ece3f1e0794372f8b2 | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z | 5e59894f68369c2398eb68a9 | 0 | True | NaN | |
| 4 | stuff | [] | 0.0 | default | 5e598976cedcaf5b865effd1 | 5e598976cedcaf5b865effd3 | 5f48bee4c54cf6b5e8048274 | 2020-08-28T08:23:00Z | 5e59894f68369c2398eb68a9 | 0 | True | NaN | |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | 5f48f708fe22ca4d15fb3b55 | 2020-08-28T12:22:32Z | NaN | NaN | NaN | NaN | NaN |
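Since the concern is discovering missing IDs late, it may help to check ID coverage right after flattening. The sketch below is not part of the answer above: `flatten_answers` and `missing_ids` are hypothetical helper names, the records are a shortened stand-in for the real sample, and the placeholder is applied to a deep copy only so that both cases can be compared side by side.

```python
import copy
import pandas as pd

# Hypothetical, shortened records shaped like the sample in the question
sample = [
    {"_id": "a1", "created_at": "2020-08-28T08:23:00Z",
     "sections": [{"_id": "s1", "answers": [{"answer_id": "x1", "value": 10.0},
                                            {"answer_id": "x2", "value": 0.0}]}]},
    {"_id": "a2", "created_at": "2020-08-28T12:22:32Z", "sections": []},
]

def flatten_answers(records):
    """Two-step json_normalize: sections first, then the answers inside them."""
    df2 = pd.json_normalize(records, record_path="sections",
                            meta=["_id", "created_at"], record_prefix="section_")
    return pd.json_normalize(df2.to_dict(orient="records"),
                             record_path="section_answers",
                             meta=["_id", "created_at", "section__id"])

def missing_ids(records, flat, key="_id"):
    """Return IDs present in the raw records but absent from the flattened frame."""
    return {rec[key] for rec in records} - set(flat[key])

# Without the placeholder, the record with empty sections is dropped
print(missing_ids(sample, flatten_answers(sample)))  # {'a2'}

# Apply the placeholder from the answer above to a copy of the data
patched = copy.deepcopy(sample)
for rec in patched:
    if not rec["sections"]:
        rec["sections"] = [{"answers": [{}]}]

flat = flatten_answers(patched)
print(missing_ids(patched, flat))  # set()
```

With millions of records, the deep copy is probably not worth its cost; patching in place as the answer does and keeping only the `missing_ids` check would be the lighter option.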

2023-07-27

繁花不似錦

Contributed 1851 experience points, received more than 4 likes

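This is a known issue with json_normalize. I haven't found a way to do this using json_normalize itself. You can try flatten_json instead, as follows:

import flatten_json as fj
import pandas as pd

# Flatten each record into a single-level dict, then build the frame from them
dic = (fj.flatten(d) for d in sample)
df = pd.DataFrame(dic)
print(df)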

                        _id            created_at sections_0_comment  ...            sections_1__id sections_1_custom_fields sections
0  5f48bee4c54cf6b5e8048274  2020-08-28T08:23:00Z                     ...  5e59894f68369c2398eb68a9                       []      NaN
1  5f48f708fe22ca4d15fb3b55  2020-08-28T12:22:32Z                NaN  ...                       NaN                      NaN       []
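Note that the frame above is wide: one row per ID, with one group of columns per section index. If one row per section is wanted and an extra dependency is not an option, a different technique that uses only pandas is DataFrame.explode, which turns an empty list into a single NaN row instead of dropping the record. The following is a rough sketch under that assumption, again with shortened, hypothetical data; it is not what flatten_json does.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the sample data in the question
sample = [
    {"_id": "a1", "created_at": "2020-08-28T08:23:00Z",
     "sections": [{"_id": "s1", "score": 20.0}, {"_id": "s2", "score": 0.0}]},
    {"_id": "a2", "created_at": "2020-08-28T12:22:32Z", "sections": []},
]

df = pd.DataFrame(sample)

# explode() emits one row per list element; an empty list becomes one NaN row
exploded = df.explode("sections").reset_index(drop=True)

# Swap the NaN placeholders for empty dicts so json_normalize yields an all-NaN row
section_dicts = [s if isinstance(s, dict) else {} for s in exploded["sections"]]
sections = pd.json_normalize(section_dicts).add_prefix("section_")

# Both frames share the same 0..n-1 index, so they can be placed side by side
result = pd.concat([exploded.drop(columns="sections"), sections], axis=1)
print(result)
```

The same explode-then-normalize step could be repeated on a `section_answers` column to reach the answer level.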


2023-07-27
