
I think the best approach is to first pivot the DataFrame so that each sensor gets its own time-series column:
df.pivot(columns="location", values="temperature")
location             Garage  bedroom  outside1  outside2
timestamp
2019-08-22 21:28:56   23.54      NaN       NaN       NaN
2019-08-22 21:29:44     NaN    23.33       NaN       NaN
2019-08-22 21:29:53   23.40      NaN       NaN       NaN
2019-08-23 22:21:06     NaN      NaN      25.0       NaN
2019-08-23 22:21:33     NaN      NaN       NaN     24.12
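A minimal sketch of the pivot step, for anyone who wants to reproduce it. The long-format input is an assumption reconstructed from the table above: `df` is taken to have a `DatetimeIndex` named `timestamp` plus the columns `location` and `temperature`; the name `wide` is only illustrative.

```python
import pandas as pd

# Hypothetical long-format readings, reconstructed from the pivoted table:
# one row per reading, indexed by timestamp.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2019-08-22 21:28:56",
                "2019-08-22 21:29:44",
                "2019-08-22 21:29:53",
                "2019-08-23 22:21:06",
                "2019-08-23 22:21:33",
            ]
        ),
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    }
).set_index("timestamp")

# With index left unset, pivot keeps the existing timestamp index as rows and
# turns each distinct location into its own column (NaN where a sensor has no
# reading at that timestamp).
wide = df.pivot(columns="location", values="temperature")
print(wide)
```

Note that `pivot` raises a `ValueError` when the same timestamp and location occur more than once; real logs with duplicate readings would need to be aggregated first, for example with `pivot_table` and `aggfunc="mean"`.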
Then you can fill in the missing data by interpolation:
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both")
location                Garage  bedroom  outside1  outside2
timestamp
2019-08-22 21:28:56  23.540000    23.33      25.0     24.12
2019-08-22 21:29:44  23.422105    23.33      25.0     24.12
2019-08-22 21:29:53  23.400000    23.33      25.0     24.12
2019-08-23 22:21:06  23.400000    23.33      25.0     24.12
2019-08-23 22:21:33  23.400000    23.33      25.0     24.12
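To see what `method="time"` does, here is a minimal sketch that rebuilds the pivoted table above (hypothetical data, illustrative variable names) and recomputes one value by hand. The interpolation weights each gap by elapsed time rather than by row position, and with `limit_direction="both"` the leading and trailing NaNs are filled with the first and last valid value of each column instead of being extrapolated.

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted frame matching the table above (one column per sensor).
idx = pd.DatetimeIndex(
    [
        "2019-08-22 21:28:56",
        "2019-08-22 21:29:44",
        "2019-08-22 21:29:53",
        "2019-08-23 22:21:06",
        "2019-08-23 22:21:33",
    ],
    name="timestamp",
)
nan = np.nan
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=idx,
)

filled = wide.interpolate(method="time", limit_direction="both")

# Manual check: Garage at 21:29:44 lies 48 s into the 57 s gap between
# 21:28:56 (23.54) and 21:29:53 (23.40), so it is weighted by elapsed time.
t0, t, t1 = idx[0], idx[1], idx[2]
frac = (t - t0) / (t1 - t0)  # Timedelta / Timedelta -> float
manual = 23.54 + frac * (23.40 - 23.54)

print(filled)
print(round(manual, 6))
```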
Now all data points should be aligned in time, and you can resample to a constant sampling rate, say "1 min":
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both").resample("1 min").mean()
location                Garage  bedroom  outside1  outside2
timestamp
2019-08-22 21:28:00  23.540000    23.33      25.0     24.12
2019-08-22 21:29:00  23.411053    23.33      25.0     24.12
2019-08-22 21:30:00        NaN      NaN       NaN       NaN
2019-08-22 21:31:00        NaN      NaN       NaN       NaN
2019-08-22 21:32:00        NaN      NaN       NaN       NaN
...                        ...      ...       ...       ...
2019-08-23 22:17:00        NaN      NaN       NaN       NaN
2019-08-23 22:18:00        NaN      NaN       NaN       NaN
2019-08-23 22:19:00        NaN      NaN       NaN       NaN
2019-08-23 22:20:00        NaN      NaN       NaN       NaN
2019-08-23 22:21:00  23.400000    23.33      25.0     24.12
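The full chain can be checked end to end on the same hypothetical readings. This sketch uses the alias `"1min"` and relies on pandas' default left-closed, left-labelled bins for minute frequencies, so the 21:29 label averages the interpolated values at 21:29:44 and 21:29:53, and every minute without a reading comes out as NaN.

```python
import pandas as pd

# Hypothetical long-format readings reconstructed from the first table.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2019-08-22 21:28:56",
                "2019-08-22 21:29:44",
                "2019-08-22 21:29:53",
                "2019-08-23 22:21:06",
                "2019-08-23 22:21:33",
            ]
        ),
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    }
).set_index("timestamp")

resampled = (
    df.pivot(columns="location", values="temperature")
    .interpolate(method="time", limit_direction="both")
    .resample("1min")  # one-minute bins, left-closed and left-labelled by default
    .mean()            # empty bins become NaN
)

# Only the bins that actually contain a reading survive dropna.
non_empty = resampled.dropna(how="all")
print(non_empty)
print(len(resampled), "bins in total")
```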
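Obviously there is a lot of missing data here, because the sampling interval is small and the data points are sparse. I assume your actual dataset contains many more points (ideally, you want at least one data point in every resampling interval).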
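One way to check how far a dataset is from that ideal is to count the raw readings per bin before interpolating. In this sketch `bin_coverage` is a hypothetical helper, not part of pandas, and the data are again the readings reconstructed from the first table.

```python
import pandas as pd


def bin_coverage(wide: pd.DataFrame, rule: str) -> pd.Series:
    """Hypothetical helper: per column, the fraction of resampling bins
    that contain at least one non-NaN reading."""
    counts = wide.resample(rule).count()  # non-NaN readings per bin and column
    return (counts > 0).mean()


# Hypothetical readings reconstructed from the first table.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2019-08-22 21:28:56",
                "2019-08-22 21:29:44",
                "2019-08-22 21:29:53",
                "2019-08-23 22:21:06",
                "2019-08-23 22:21:33",
            ]
        ),
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    }
).set_index("timestamp")
wide = df.pivot(columns="location", values="temperature")

print(bin_coverage(wide, "1min"))  # tiny fractions: almost every minute is empty
print(bin_coverage(wide, "1D"))
```

On this sample only 2 of the 1494 one-minute bins contain a Garage reading, while with daily bins each sensor covers one of the two days. Comparing a few candidate rules this way can help pick a resampling interval that matches the real sampling rate.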
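From here it is up to you and your actual data how to proceed. Instead of .mean(), you could use .nearest() to fill in the missing data. If only a few entries are missing, you could fill them with a rolling mean.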
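Both options can be sketched on the same hypothetical readings. `.nearest()` gives each one-minute label the value from the closest original timestamp, so no NaNs remain; the rolling-mean fill averages neighbouring bins and therefore only closes gaps that sit next to real data, which is why it suits a few scattered missing entries. The centred window of 3 bins and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical readings reconstructed from the first table.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2019-08-22 21:28:56",
                "2019-08-22 21:29:44",
                "2019-08-22 21:29:53",
                "2019-08-23 22:21:06",
                "2019-08-23 22:21:33",
            ]
        ),
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    }
).set_index("timestamp")

interp = df.pivot(columns="location", values="temperature").interpolate(
    method="time", limit_direction="both"
)

# Option 1: each bin label takes the value at the nearest original timestamp.
nearest = interp.resample("1min").nearest()

# Option 2: average per bin, then fill gaps with a centred rolling mean of the
# neighbouring bins (NaNs are skipped; min_periods=1 needs one valid value).
averaged = interp.resample("1min").mean()
rolling = averaged.rolling(window=3, center=True, min_periods=1).mean()
rolled_fill = averaged.fillna(rolling)

print(int(nearest.isna().sum().sum()), "NaNs left with .nearest()")
print(int(rolled_fill.isna().sum().sum()), "NaNs left with the rolling fill")
```

On this sparse sample the rolling fill only closes the bins right next to 21:29 and 22:21 and leaves the long overnight gap as NaN, whereas `.nearest()` fills everything, including hours that are far from any reading.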