2 Answers

Answerer: contributed 1853 experience points, received 18+ likes
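Use cumsum on the non-null mask of column C to label the blocks, then group by those labels:
# Every non-null C starts a new block, so the running count gives each block its own label
blocks = df['C'].notna().cumsum()
# Join the B fragments with spaces; keep the first non-null value of every other column
agg_dict = {col: ' '.join if col == 'B' else 'first' for col in df}
df.groupby(blocks).agg(agg_dict).reset_index(drop=True)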
Output:
A B C D
0 Train Superfast Convernient Newest model Year 2002/099 10.0 20.0
1 Car Fastest Can be more fast Year/2020/AYD 20.0 30.0
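Because the snippet above relies on the df from the question, here is a self-contained sketch of the same groupby approach. The input frame is an assumption reconstructed from the sample output: one text fragment per row, with A, C and D filled only on the first row of each block.

```python
import numpy as np
import pandas as pd

# Assumed input layout (reconstructed from the output, not taken from the question)
df = pd.DataFrame({
    'A': ['Train', np.nan, np.nan, np.nan, 'Car', np.nan, np.nan],
    'B': ['Superfast', 'Convernient', 'Newest model', 'Year 2002/099',
          'Fastest', 'Can be more fast', 'Year/2020/AYD'],
    'C': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan],
    'D': [20, np.nan, np.nan, np.nan, 30, np.nan, np.nan],
})

# Block labels: 1 for the Train rows, 2 for the Car rows
blocks = df['C'].notna().cumsum()

agg_dict = {col: ' '.join if col == 'B' else 'first' for col in df}
result = df.groupby(blocks).agg(agg_dict).reset_index(drop=True)
print(result)
```

C and D come back as floats (10.0, 20.0) because the NaN continuation rows force those columns to float64 before grouping.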

Answerer: contributed 1829 experience points, received 6+ likes
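A somewhat more involved solution whose core logic uses only NumPy, but which runs very fast on large data: the Python loop runs only as many times as the longest block has rows, and each pass appends a fragment to all blocks at once.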
import numpy as np
import pandas as pd
df = pd.DataFrame([
['Train', 'Superfast', 10, 20],
[np.nan, 'Convernient', np.nan, np.nan],
[np.nan, 'Newest model', np.nan, np.nan],
[np.nan, 'Year 2002/099', np.nan, np.nan],
['Car', 'Fastest', 20, 30],
[np.nan, 'Can be more fast', np.nan, np.nan],
[np.nan, 'Year/2020/AYD', np.nan, np.nan],
], columns = ['A', 'B', 'C', 'D'])
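a = df.values
# x != x is True only for NaN, so ~(x != x) keeps the rows where A holds a label (block starts);
# the row count is appended as a sentinel end index for the last block
i = np.append(np.flatnonzero(~(a[:, 0] != a[:, 0])), a.shape[0])
# Start from the first row of each block
b = a[i[:-1], :]
diffs = np.diff(i)      # length of each block
maxs = np.amax(diffs)   # length of the longest block
begs, ends = i[:-1], i[1:]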
# For offset j, append row begs + j to every block that is still longer than j
for j in range(1, maxs):
    chosen = begs + j < ends
    b[chosen, 1] += ' ' + a[begs[chosen] + j, 1]
df = pd.DataFrame(b, columns = df.columns.values.tolist())
print(df)
Output of the code:
A B C D
0 Train Superfast Convernient Newest model Year 2002/099 10 20
1 Car Fastest Can be more fast Year/2020/AYD 20 30
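To see why the loop is cheap, here is a small trace of the index arrays on a toy A column (hypothetical data). pd.notna stands in for the ~(x != x) NaN test; both select the rows that start a block.

```python
import numpy as np
import pandas as pd

# Hypothetical A column: a label starts each block, NaN marks continuation rows
col_a = np.array(['Train', np.nan, np.nan, np.nan, 'Car', np.nan, np.nan], dtype=object)

# Block starts plus a sentinel end index, as in the answer
i = np.append(np.flatnonzero(pd.notna(col_a)), len(col_a))
begs, ends = i[:-1], i[1:]
maxs = np.diff(i).max()   # longest block decides how many passes are needed

# Rows appended in each pass; blocks shorter than j + 1 drop out of the mask
rows_per_step = [(begs[begs + j < ends] + j).tolist() for j in range(1, maxs)]
print(i.tolist(), maxs, rows_per_step)
```

Here seven rows need only three passes, and the number of passes stays equal to the longest block length no matter how many blocks the frame holds.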