2 Answers

You need to convert these strings to vectors of integers and pad them to equal length. Here is an example using partial_x_train_actors_array:
import tensorflow as tf

# Sample actor names (byte strings) to encode.
partial_x_train_actors_array = [
    b'victor mclaglen', b'jon hall', b'frances farmer',
    b'olympe bradna', b'gene lockhart', b'douglass dumbrille',
    b'francis ford', b'ben welden', b'abner biberman',
    b'pedro de cordoba', b'rudy robles', b'bobby stone',
    b'nellie duran', b'james flavin', b'nina campana']

# Character-level tokenizer: each distinct character gets an integer index.
tok = tf.keras.preprocessing.text.Tokenizer(char_level=True)
tok.fit_on_texts(partial_x_train_actors_array)
# Convert each name into a list of character indices.
seq = tok.texts_to_sequences(partial_x_train_actors_array)
The resulting seq looks like this:
[[20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
[21, 7, 3, 5, 22, 1, 6, 6],
[14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
[7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
[17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
[9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
[14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
[8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
[1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
[19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
[4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
[8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
[3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
[21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
[3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]]
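To make the mapping less opaque, here is a minimal plain-Python sketch of character-level indexing: count every character, then number the characters from 1 in order of decreasing frequency, leaving 0 free for padding. The helper names fit_char_index and encode are hypothetical, the sketch works on plain str rather than byte strings, and it only illustrates the idea; it is not meant to reproduce the Keras Tokenizer exactly.

```python
from collections import Counter


def fit_char_index(texts):
    """Map each character to an index, most frequent character first."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower())  # count individual characters
    # Indices start at 1 so that 0 stays available as the padding value.
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}


def encode(text, char_index):
    """Turn a string into a list of character indices, skipping unknown characters."""
    return [char_index[ch] for ch in text.lower() if ch in char_index]


names = ["aab", "ab c"]  # made-up sample data
char_index = fit_char_index(names)
encoded = [encode(n, char_index) for n in names]
print(char_index)
print(encoded)
```

The encoded lists have different lengths, which is why the padding step is needed before they can be stacked into one array.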
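Then pad the sequences to equal length. By default, pad_sequences adds zeros at the front, up to the length of the longest sequence: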
padded = tf.keras.preprocessing.sequence.pad_sequences(seq)
array([[ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 7, 3, 5, 22, 1, 6, 6],
[ 0, 0, 0, 0, 14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
[ 0, 0, 0, 0, 0, 7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
[ 0, 0, 0, 0, 0, 17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
[ 9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
[ 0, 0, 0, 0, 0, 0, 14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
[ 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
[ 0, 0, 0, 0, 1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
[ 0, 0, 19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
[ 0, 0, 0, 0, 0, 0, 0, 4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
[ 0, 0, 0, 0, 0, 0, 0, 8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
[ 0, 0, 0, 0, 0, 0, 3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
[ 0, 0, 0, 0, 0, 0, 21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
[ 0, 0, 0, 0, 0, 0, 3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]])
Finally, build a dataset from the padded array:
ds = tf.data.Dataset.from_tensor_slices(padded)
next(iter(ds))
<tf.Tensor: shape=(18,), dtype=int32, numpy=
array([ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2,
3])>
If for any reason you need all of your inputs (not just partial_x_train_actors_array) to have the same padded shape, you can pass the maxlen argument to pad_sequences.
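As a rough illustration of what a fixed maxlen does, here is a plain-Python sketch that mimics the default behaviour of pad_sequences (zeros added at the front, sequences that are too long cut from the front). The function name pad_pre and the sample data are hypothetical; with TensorFlow you would simply call pad_sequences(seq, maxlen=...) with the same length for every input.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and, if needed, left-truncate) sequences to a common length."""
    if maxlen is None:
        # No fixed length given: fall back to the longest sequence.
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s)
        # Keep only the last `maxlen` items when the sequence is too long.
        trimmed = s[max(len(s) - maxlen, 0):]
        # Fill the front with the padding value.
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded


actors = [[1, 2, 3], [4]]          # made-up encoded inputs
directors = [[5, 6, 7, 8, 9, 10]]  # a second input with a longer sequence

# Using the same maxlen for both inputs gives rows of identical width.
actors_padded = pad_pre(actors, maxlen=5)
directors_padded = pad_pre(directors, maxlen=5)
print(actors_padded)
print(directors_padded)
```

Choosing one maxlen up front means every input ends up with the same second dimension, regardless of which array happens to contain the longest string.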