I implemented a generator function to produce one-hot encoded vectors, but the generator is throwing an error

I am using a generator function to produce one-hot encoded vectors, which will be used as input to a deep learning LSTM model. I am doing this to avoid excessive load and memory failures when one-hot encoding a very large dataset. However, I am getting an error from the generator function, and I need help figuring out where I went wrong.

The previous code:

    X = np.zeros((len(sequences), seq_length, vocab_size), dtype=np.bool)
    y = np.zeros((len(sequences), vocab_size), dtype=np.bool)
    for i, sentence in enumerate(sequences):
        for t, word in enumerate(sentence):
            X[i, t, vocab[word]] = 1
        y[i, vocab[next_words[i]]] = 1

Here,

    sequences = sentences generated from data set
    seq_length = length of each sentence (this is constant)
    vocab_size = number of unique words in dictionary

My program, when run on the large data set, produces:

    sequences = 44073315
    seq_length = 30
    vocab_size = 124958

So when the code above is used directly to build the model input, it gives the error below.

    Traceback (most recent call last):
      File "1.py", line 206, in <module>
        X = np.zeros((len(sequences), seq_length, vocab_size), dtype=np.bool)
    MemoryError

So I tried to create a generator function (for testing), as follows:

    def gen(batch_size, no_of_sequences, seq_length, vocab_size):
        bs = batch_size
        ns = no_of_sequences
        X = np.zeros((batch_size, seq_length, vocab_size), dtype=np.bool)
        y = np.zeros((batch_size, vocab_size), dtype=np.bool)
        while(ns > bs):
            for i, sentence in enumerate(sequences):
                for t, word in enumerate(sentence):
                    X[i, t, vocab[word]] = 1
                y[i, vocab[next_words[i]]] = 1
            print(X.shape())
            print(y.shape())
            yield(X, y)
            ns = ns - bs

    for item in gen(1000, 44073315, 30, 124958):
        print(item)

But I get the following error:

    File "path_of_file", line 247, in gen
      X[i, t, vocab[word]] = 1
    IndexError: index 1000 is out of bounds for axis 0 with size 1000

What mistake have I made in the generator function?
1 Answer

森欄
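Make the following changes in your generator:

    batch_i = 0
    while(ns > bs):
        s = batch_i*batch_size
        e = (batch_i+1)*batch_size
        for i, sentence in enumerate(sequences[s:e]):

Basically, you want a running window of size batch_size, so you take a running slice of sequences, which appears to be your entire dataset. Because the original loop enumerates all of sequences, i eventually reaches batch_size and overruns X, which has only batch_size rows.

You also have to increment batch_i. Put the increment after the yield, so add batch_i += 1 there.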
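
Putting these changes together, a minimal sketch of a complete generator might look like the following. It is not the asker's exact code: the function name `one_hot_batches` and the toy data are hypothetical, the data is passed in as arguments rather than read from globals, and the window is advanced with a stepped `range` instead of a `batch_i` counter (the slicing is the same idea). It also makes three assumptions the snippet above does not state: `next_words` is indexed over the whole dataset and therefore needs the offset `s + i`, the arrays should be allocated fresh for each batch so that ones set in an earlier batch do not carry over, and a final partial batch should be yielded rather than dropped. The built-in `bool` is used as the dtype because the `np.bool` alias is not available in every NumPy release.

```python
import numpy as np


def one_hot_batches(sequences, next_words, vocab, batch_size, seq_length, vocab_size):
    """Yield (X, y) one-hot batches; the last batch may be smaller than batch_size."""
    for s in range(0, len(sequences), batch_size):
        batch = sequences[s:s + batch_size]
        # Fresh arrays per batch, so earlier batches do not leak into later ones.
        X = np.zeros((len(batch), seq_length, vocab_size), dtype=bool)
        y = np.zeros((len(batch), vocab_size), dtype=bool)
        for i, sentence in enumerate(batch):
            for t, word in enumerate(sentence):
                X[i, t, vocab[word]] = True
            # Assumption: next_words covers the whole dataset, hence the s + i offset.
            y[i, vocab[next_words[s + i]]] = True
        yield X, y


# Hypothetical toy data, only to exercise the generator.
vocab = {"a": 0, "b": 1, "c": 2}
sequences = [["a", "b"], ["b", "c"], ["c", "a"]]
next_words = ["c", "a", "b"]

for X, y in one_hot_batches(sequences, next_words, vocab,
                            batch_size=2, seq_length=2, vocab_size=3):
    print(X.shape, y.shape)
```

With three toy sequences and a batch size of 2, this yields one full batch and one batch holding the single remaining sequence. If fixed-size batches are required, the partial batch could be skipped instead.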