
BERT encoding layer produces the same output for all inputs during evaluation (PyTorch)

Python

阿晨1998 2022-11-24 15:21:24

I don't understand why my BERT model returns the same output during evaluation. The output during training looks correct, since the values differ from one input to the next, but during evaluation the output is exactly the same for every input.

Here is my BERT model class:

import torch
from torch import nn
from transformers import BertModel

class BERTBaseUncased(nn.Module):
    def __init__(self):
        super(BERTBaseUncased, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bert_drop = nn.Dropout(0.3)
        self.out = nn.Linear(768, 4)

    def forward(self, ids, mask, token_type_ids):
        _, o2 = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)  # Use one of the outputs
        bo = self.bert_drop(o2)
        return self.out(bo)

My dataset class:

class BERTDataset:
    def __init__(self, review, target, tokenizer, classes=4):
        self.review = review
        self.target = target
        self.tokenizer = tokenizer
        self.max_len = max_len
        self.classes = classes

    def __len__(self):
        return len(self.review)

    def __getitem__(self, item):
        review = str(self.review)
        review = " ".join(review.split())

        inputs = self.tokenizer.encode_plus(
            review,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            pad_to_max_length=True,
            return_token_type_ids=True,
            return_attention_masks=True,
        )

        ids = inputs["input_ids"]
        mask = inputs["attention_mask"]
        token_type_ids = inputs["token_type_ids"]

        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'targets': torch.tensor(to_categorical(self.target[item], self.classes), dtype=torch.float),
        }


3 Answers

白衣染霜花

Contributed 1796 experience points, received more than 10 likes

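In case anyone else runs into this problem: you may have forgotten to use one of the learning rates recommended in the official paper, namely 5e-5, 3e-5, or 2e-5.

If the learning rate is too high (for example 0.01), the gradients seem to polarize, and the model ends up producing the same logits over and over on the validation set.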
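As a minimal sketch of the suggestion above, the optimizer could be set up as follows. This assumes PyTorch's `torch.optim.AdamW`; the `make_optimizer` helper, its `1e-3` cutoff, and the `nn.Linear` stand-in for the real BERT model are all hypothetical and only there so the snippet runs without downloading weights.

```python
import torch
from torch import nn

# Learning rates named in the answer above.
RECOMMENDED_LRS = (5e-5, 3e-5, 2e-5)


def make_optimizer(model: nn.Module, lr: float = 2e-5) -> torch.optim.Optimizer:
    """Hypothetical helper: build AdamW and reject suspiciously high learning rates."""
    # The 1e-3 cutoff is an arbitrary assumption, not a value from the paper.
    if lr > 1e-3:
        raise ValueError(
            f"lr={lr} is very high for fine-tuning BERT; try one of {RECOMMENDED_LRS}"
        )
    return torch.optim.AdamW(model.parameters(), lr=lr)


# Stand-in for BERTBaseUncased (assumption: any nn.Module works the same way here).
model = nn.Linear(768, 4)
optimizer = make_optimizer(model, lr=3e-5)
```

Calling `make_optimizer(model, lr=0.01)` raises a `ValueError` instead of silently training with a rate that is likely to collapse the logits.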

2022-11-24

慕后森

Contributed 1802 experience points, received more than 5 likes

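In your training code, you don't return the trained model.

It looks like you are training your model inside a function without returning it. The weights are lost once the function ends, so in the evaluation part the model outputs random values.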

import torch
from tqdm import tqdm

def train_fn(data_loader, model, optimizer, device, scheduler):
    model.train()
    total_loss = 0.0

    for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
        ids = d['ids']
        token_type_ids = d['token_type_ids']
        mask = d['mask']
        targets = d['targets']

        # Move the batch to the target device
        ids = ids.to(device, dtype=torch.long)
        token_type_ids = token_type_ids.to(device, dtype=torch.long)
        mask = mask.to(device, dtype=torch.long)
        targets = targets.to(device, dtype=torch.float)

        optimizer.zero_grad()
        outputs = model(
            ids=ids,
            mask=mask,
            token_type_ids=token_type_ids
        )

        # loss_fn is defined elsewhere in the asker's code
        loss = loss_fn(outputs, targets)
        total_loss += loss.item()
        loss.backward()
        optimizer.step()
        scheduler.step()

    return model, total_loss/len(data_loader)  # this will help
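For the evaluation side, a rough sketch might look like the following. It assumes the same batch keys as `train_fn`; `eval_fn` and `TinyModel` are hypothetical names that do not appear in the original post, and `TinyModel` only mimics the forward signature of `BERTBaseUncased` so the snippet runs without BERT weights. The function switches the model to eval mode, which disables dropout, and turns off gradient tracking.

```python
import torch
from torch import nn


def eval_fn(data_loader, model, device):
    """Hypothetical evaluation loop: collect logits for every batch."""
    model.eval()  # disable dropout for deterministic outputs
    all_outputs = []
    with torch.no_grad():  # no gradients are needed during evaluation
        for d in data_loader:
            ids = d["ids"].to(device, dtype=torch.long)
            mask = d["mask"].to(device, dtype=torch.long)
            token_type_ids = d["token_type_ids"].to(device, dtype=torch.long)
            outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)
            all_outputs.append(outputs.cpu())
    return torch.cat(all_outputs, dim=0)


class TinyModel(nn.Module):
    """Stand-in with the same forward signature as BERTBaseUncased (assumption)."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 8)
        self.drop = nn.Dropout(0.3)
        self.out = nn.Linear(8, 4)

    def forward(self, ids, mask, token_type_ids):
        pooled = self.emb(ids).mean(dim=1)
        return self.out(self.drop(pooled))


# One batch holding two different inputs.
batches = [{
    "ids": torch.tensor([[1, 2, 3], [4, 5, 6]]),
    "mask": torch.ones(2, 3, dtype=torch.long),
    "token_type_ids": torch.zeros(2, 3, dtype=torch.long),
}]
logits = eval_fn(batches, TinyModel(), torch.device("cpu"))

# If every row of logits is identical, the inputs or the weights are suspect.
rows_identical = bool(torch.allclose(logits[0], logits[1]))
```

Comparing rows of the returned logits, as in the last line, is a quick way to confirm whether the model really produces the same output for different inputs.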

2022-11-24

紅糖糍粑

Contributed 1815 experience points, received more than 6 likes

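The problem was in my dataset class. In __getitem__ I was passing the entire dataset to the tokenizer instead of just one row, so every item produced the same input. Indexing self.review with item fixes it: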

def __getitem__(self, item):
        review = str(self.review[item])
        review = " ".join(review.split())

That fixed it. Thanks to Zabir Al Nazi for the help.
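To see why the missing index gives identical outputs, here is a small tokenizer-free sketch. `ReviewDataset` and its `fixed` flag are hypothetical and exist only for illustration; the two branches in `__getitem__` mirror the buggy and corrected lines from the thread.

```python
class ReviewDataset:
    """Hypothetical, tokenizer-free stand-in for BERTDataset."""

    def __init__(self, review, fixed):
        self.review = review
        self.fixed = fixed

    def __len__(self):
        return len(self.review)

    def __getitem__(self, item):
        if self.fixed:
            review = str(self.review[item])  # corrected: one row per item
        else:
            review = str(self.review)  # buggy: stringifies the whole list
        return " ".join(review.split())


reviews = ["great  movie", "terrible plot", "just fine"]
buggy = ReviewDataset(reviews, fixed=False)
fixed = ReviewDataset(reviews, fixed=True)

buggy_items = [buggy[i] for i in range(len(buggy))]
fixed_items = [fixed[i] for i in range(len(fixed))]

print(len(set(buggy_items)))  # number of distinct texts the buggy version yields
print(fixed_items)
```

With the buggy version every index returns the same text, so the tokenizer, and therefore the model, sees one identical input for all samples. Presumably this is also why evaluation showed the same logits for every row.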

2022-11-24
