
Can't convert a list of "strings" with tf.data.Dataset.from_tensor_slices()

Python

萬千封印 2023-03-22 16:38:11

I have the following data:

partial_x_train_features = [
    [b'south pago pago victor mclaglen jon hall frances farmer olympe bradna gene lockhart douglass dumbrille francis ford ben welden abner biberman pedro cordoba rudy robles bobby stone nellie duran james flavin nina campana alfred e green treasure hunt adventure adventure'],
    [b'easy virtue jessica biel ben barnes kristin scott thomas colin firth kimberley nixon katherine parkinson kris marshall christian brassington charlotte riley jim mcmanus pip torrens jeremy hooton joanna bacon maggie hickey georgie glen stephan elliott young englishman marry glamorous american brings home meet parent arrive like blast future blow entrenched british stuffiness window comedy romance'],
    [b'fragments antonin gregori derangere anouk grinberg aurelien recoing niels arestrup yann collette laure duthilleul david assaraf pascal demolon jean baptiste iera richard sammel vincent crouzet fred epaud pascal elso nicolas giraud michael abiteboul gabriel le bomin psychiatrist probe mind traumatized soldier attempt unlock secret drove gentle deeply disturbed world war veteran edge insanity drama war'],
    [b'milka film taboos milka elokuva tabuista irma huntus leena suomu matti turunen eikka lehtonen esa niemela sirkka metsasaari tauno lehtihalmes ulla tapaninen toivo tuomainen hellin auvinen salmi rauni mollberg small finnish lapland community milka innocent year old girl live mother miss dead father prays god love haymaking employ drama'],
    [b'sleeping car david naughton judie aronson kevin mccarthy jeff conaway dani minnick ernestine mercer john carl buechler gary brockette steve lundquist billy stevenson michael scott bicknell david coburn nicole hansen tiffany million robert ruth douglas curtis jason david naughton move abandon train car resurrect vicious ghost landlady dead husband mister near fatal encounter comedy horror']
]

I know the actor arrays are not all the same size, but searching through several similar questions (i.e., question1, question2) did not solve my problem. If you want to reproduce the issue, please also take a look at my Colab notebook. If I have missed any duplicate questions, please point them out in the comments.
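A minimal reproduction, assuming TensorFlow 2.x. The partial_x_train_features data in the question is rectangular (every inner list holds exactly one string), so from_tensor_slices can turn it into a single string tensor. A list with a different number of names per movie cannot be. The strings below are shortened and the actors lists are invented for illustration only.

```python
import tensorflow as tf

# Rectangular input (strings shortened): every inner list holds exactly one
# string, so TensorFlow can build a (2, 1) string tensor and slice it by row.
features = [[b'south pago pago victor mclaglen'],
            [b'easy virtue jessica biel']]
ds_ok = tf.data.Dataset.from_tensor_slices(features)
print(ds_ok.element_spec)  # each element is a string tensor of shape (1,)

# Hypothetical actor lists: a different number of names per movie,
# i.e. a non-rectangular nested list.
actors = [[b'victor mclaglen', b'jon hall', b'frances farmer'],
          [b'jessica biel', b'ben barnes']]
ragged_failed = False
try:
    tf.data.Dataset.from_tensor_slices(actors)
except ValueError as err:
    # The message is typically about a non-rectangular Python sequence.
    ragged_failed = True
    print("ValueError:", err)
```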


2 Answers

長風秋雁


You need to convert these strings into integer vectors and pad them to the same length. Here is an example using partial_x_train_actors_array:

import tensorflow as tf

partial_x_train_actors_array = [b'victor mclaglen', b'jon hall', b'frances farmer',
                                b'olympe bradna', b'gene lockhart', b'douglass dumbrille',
                                b'francis ford', b'ben welden', b'abner biberman',
                                b'pedro de cordoba', b'rudy robles', b'bobby stone',
                                b'nellie duran', b'james flavin', b'nina campana']

# char_level=True: every character (including spaces) becomes its own token
tok = tf.keras.preprocessing.text.Tokenizer(char_level=True)
tok.fit_on_texts(partial_x_train_actors_array)
# Turn each name into a list of integer character ids
seq = tok.texts_to_sequences(partial_x_train_actors_array)

The resulting seq looks like this:

[[20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
 [21, 7, 3, 5, 22, 1, 6, 6],
 [14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
 [7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
 [17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
 [9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
 [14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
 [8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
 [1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
 [19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
 [4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
 [8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
 [3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
 [21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
 [3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]]
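The ids in seq are not arbitrary. As a rough illustration of the idea (a sketch, not the Keras source), a character-level tokenizer can rank characters by how often they occur across all texts, give the most frequent one id 1, and leave 0 unused so it can serve as padding. The helpers char_index and to_sequences are hypothetical, and the sketch works on str rather than bytes.

```python
from collections import Counter


def char_index(texts):
    """Map each character to an id by descending frequency (1 = most frequent)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower())
    # most_common() sorts by count; equal counts keep first-seen order.
    # Ids start at 1 so that 0 stays free for padding.
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(texts, index):
    """Replace every known character with its integer id."""
    return [[index[ch] for ch in text.lower() if ch in index] for text in texts]


texts = ["abba", "cab"]
index = char_index(texts)
print(index)                       # {'a': 1, 'b': 2, 'c': 3}
print(to_sequences(texts, index))  # [[1, 2, 2, 1], [3, 1, 2]]
```

Keras' Tokenizer has more options (num_words, oov_token, filters), so treat this only as a model of where the numbers in seq come from.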

Next, pad the sequences to equal length:

# By default, shorter sequences are left-padded with 0 up to the longest one (18 here)
padded = tf.keras.preprocessing.sequence.pad_sequences(seq)

array([[ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 7, 3, 5, 22, 1, 6, 6],
       [ 0, 0, 0, 0, 14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
       [ 0, 0, 0, 0, 0, 7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
       [ 0, 0, 0, 0, 0, 17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
       [ 9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
       [ 0, 0, 0, 0, 0, 0, 14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
       [ 0, 0, 0, 0, 1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
       [ 0, 0, 19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
       [ 0, 0, 0, 0, 0, 0, 0, 4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
       [ 0, 0, 0, 0, 0, 0, 0, 8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
       [ 0, 0, 0, 0, 0, 0, 3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
       [ 0, 0, 0, 0, 0, 0, 21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
       [ 0, 0, 0, 0, 0, 0, 3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]])
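With its default arguments, pad_sequences pads with 0 at the front, and when maxlen is given it also truncates from the front. The NumPy sketch below mimics only those defaults; pad_pre is a hypothetical helper, not part of any library.

```python
import numpy as np


def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) integer sequences to a common length."""
    if maxlen is None:
        maxlen = max((len(s) for s in sequences), default=0)
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        # Keep only the LAST maxlen items (front truncation).
        kept = list(seq)[max(len(seq) - maxlen, 0):]
        if kept:
            # Right-align the kept items; the front stays filled with `value`.
            out[i, maxlen - len(kept):] = kept
    return out


seqs = [[20, 10, 11], [21, 7], [9, 7, 15, 17]]
print(pad_pre(seqs))            # rows: [0 20 10 11], [0 0 21 7], [9 7 15 17]
print(pad_pre(seqs, maxlen=2))  # rows: [10 11], [21 7], [15 17]
```

Left padding keeps the real characters at the end of each row, which is the usual choice when the sequence is later fed to a recurrent layer that reads left to right.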

Finally:

ds = tf.data.Dataset.from_tensor_slices(padded)
next(iter(ds))

<tf.Tensor: shape=(18,), dtype=int32, numpy=
array([ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2,
        3])>
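For training, each padded row is usually paired with a target. The following is a minimal sketch, assuming one integer label per row. Here padded is a small stand-in array and labels is invented. When you pass a tuple to from_tensor_slices, each component is sliced along its first axis, so every element becomes a (features, label) pair.

```python
import numpy as np
import tensorflow as tf

# Small stand-in for the padded array above (3 rows of length 4).
padded = np.array([[0, 20, 10, 11],
                   [0, 0, 21, 7],
                   [9, 7, 15, 17]], dtype=np.int32)
# Hypothetical labels: one integer per row.
labels = np.array([0, 1, 1], dtype=np.int32)

# Both arrays are sliced along their first axis, in lockstep.
ds = tf.data.Dataset.from_tensor_slices((padded, labels)).batch(2)

for x, y in ds:
    print(x.shape, y.shape)  # (2, 4) (2,) then (1, 4) (1,)
```

In practice you would typically also call .shuffle() before .batch().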

If, for whatever reason, you need all inputs (not just partial_x_train_actors_array) to share the same padded shape, you can use the maxlen parameter of pad_sequences.
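For example, suppose there is a second, hypothetical input (directors_seq) and a shared length of 4, chosen only for illustration. Passing the same maxlen gives both arrays the same width. Rows longer than maxlen are cut from the front by default (truncating='pre').

```python
import tensorflow as tf

# Hypothetical character-id sequences for two different inputs.
actors_seq = [[20, 10, 11], [21, 7]]
directors_seq = [[3, 1, 4, 1, 5], [9, 2]]

MAXLEN = 4  # hypothetical shared length
pad = tf.keras.preprocessing.sequence.pad_sequences

actors_padded = pad(actors_seq, maxlen=MAXLEN)
directors_padded = pad(directors_seq, maxlen=MAXLEN)

print(actors_padded)     # [[ 0 20 10 11] [ 0  0 21  7]]
print(directors_padded)  # [[1 4 1 5] [0 0 9 2]], the leading 3 was truncated
```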

Answered 2023-03-22

精慕HU


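One of your data arrays (namely partial_x_train_actors_array) has elements of different lengths along the second dimension, which is why the error complains that the sequence is not rectangular. So you should either make them the same size (for example, by padding or truncating), or use the RaggedTensor structure (doc, guide) to store and process it: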

partial_x_train_actors_array = tf.ragged.constant(...)

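The latter approach is particularly useful and efficient when you want to keep the data as-is and perform custom or complex processing on it with the tf.data.Dataset API (for example, inside the map method).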
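Here is a minimal sketch of the ragged route, assuming TensorFlow 2.x and invented actor lists. The RaggedTensor is sliced row by row, so each dataset element is one movie's variable-length list of names. The map call then applies per-element processing. In this case it is just tf.strings.length, standing in for whatever real preprocessing you need.

```python
import tensorflow as tf

# Hypothetical actor lists: a different number of names per movie.
actors = [[b'victor mclaglen', b'jon hall', b'frances farmer'],
          [b'jessica biel', b'ben barnes'],
          [b'antonin gregori']]

rt = tf.ragged.constant(actors)  # RaggedTensor with shape (3, None)
ds = tf.data.Dataset.from_tensor_slices(rt)

# Each element is one row of the ragged tensor, so lengths differ.
names_per_movie = [int(tf.size(x)) for x in ds]
print(names_per_movie)  # [3, 2, 1]

# Custom per-element processing inside map: the byte length of every name.
name_lengths = ds.map(lambda names: tf.strings.length(names))
for lengths in name_lengths:
    print(lengths.numpy())  # [15  8 14], then [12 10], then [15]
```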

Answered 2023-03-22
