
Data generation in Python

Python

阿晨1998 2023-03-08 11:19:11

I am trying to generate a dataset from an existing one. I have a method that randomly changes the content of each file, but I cannot write the result back to the file. I also need to write the number of changed words to the file, because I want to use this dataset to train a neural network. Can you help me?

Input: each file has 2 lines of text.

Output: a file with 3 (possibly) lines: the first line unchanged, the second line changed by the method, and the third line showing the number of changed words (if this is not a good idea for a deep learning task, I would welcome suggestions, as I am a beginner).

from random import randrange
import os

Path = "D:\corrected data\\"
filelist = os.listdir(Path)

if __name__ == "__main__":
    new_words = ['consultable', 'partie ', 'celle ', 'également ', 'forte ', 'statistiques ', 'langue ', 'cadeaux', 'publications ', 'notre', 'nous', 'pour', 'suivr', 'les', 'vos', 'visitez ', 'thème ', 'thème ', 'thème ', 'produits', 'coulisses ', 'un ', 'atelier ', 'concevoir ', 'personnalisés ', 'consultable', 'découvrir ', 'fournit ', 'trace ', 'dire ', 'tableau', 'décrire', 'grande ', 'feuille ', 'noter ', 'correspondant', 'propre',]
    nb_words_to_replace = randrange(10)
    #with open("1.txt") as file:
    for i in filelist:
        # if i.endswith(".txt"):
        with open(Path + i,"r",encoding="utf-8") as file:
            # for line in file:
            data = file.readlines()
            first_line = data[0]
            second_line = data[1]
            print(f"Original: {second_line}")
            # print(f"FIle: {file}")
            second_line_array = second_line.split(" ")
            for j in range(nb_words_to_replace):
                replacement_position = randrange(len(second_line_array))
                old_word = second_line_array[replacement_position]
                new_word = new_words[randrange(len(new_words))]
                print(f"Position {replacement_position} : {old_word} -> {new_word}")
                second_line_array[replacement_position] = new_word
            res = " ".join(second_line_array)
            print(f"Result: {res}")
        with open(Path + i,"w") as f:
            for line in file:
                if line == second_line:
                    f.write(res)


1 answer

鳳凰求蠱

Contributed 1825 experience points, received more than 4 likes

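In short, you have two questions:

How to correctly replace the 2nd (and 3rd) line of a file.
How to keep track of the number of changed words.

How to correctly replace the 2nd (and 3rd) line of a file.

Your code: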

with open(Path + i,"w") as f:
    for line in file:
        if line == second_line:
            f.write(res)

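The file is opened in "w" mode, which is write-only and empties the file as soon as it is opened, so there is nothing left to read. for line in file does not work either: f is the handle that was just opened, but the loop uses file, the handle from the earlier read, which has either been closed or already been read to the end. To fix this, do the following instead: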
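To see the failure in isolation, here is a minimal sketch on a throwaway temp file. It assumes the second with block runs after the first one has exited (the indentation of the posted code is ambiguous), so file is already closed. The temp file and the variable names file_is_closed, error_message and size_after are only for illustration.

```python
import os
import tempfile

# Create a throwaway two-line file to experiment on.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")

with open(path, "r", encoding="utf-8") as file:
    data = file.readlines()
# Leaving the with block closed the read handle.
file_is_closed = file.closed

error_message = None
with open(path, "w", encoding="utf-8") as f:  # "w" empties the file right away
    try:
        for line in file:  # iterates the old, closed handle, not f
            f.write(line)
    except ValueError as exc:
        error_message = str(exc)

size_after = os.path.getsize(path)  # nothing was written back
os.remove(path)
print(file_is_closed, error_message, size_after)
```

The original content is gone after this runs, which is why it is worth testing on copies of the dataset rather than on the only copy.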

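with open(Path + i, "r+", encoding="utf-8") as file:
    lines = file.read().splitlines()  # splitlines() removes the \n characters
    lines[1] = res.rstrip("\n")  # the modified second line
    file.seek(0)  # read() left the position at the end of the file
    file.truncate()  # discard the old content
    file.writelines(line + "\n" for line in lines)  # writelines() adds no newlines itself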
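When a file is rewritten through an "r+" handle, two details matter: after read() the position is at the end of the file, so writing without seek(0) appends instead of replacing, and writelines() does not add line breaks. A self-contained sketch of the pattern follows; replace_second_line is a hypothetical helper name, and the temp file stands in for one of the dataset files.

```python
import os
import tempfile

def replace_second_line(path, new_text):
    """Overwrite line 2 of the file at `path` with `new_text` (hypothetical helper)."""
    with open(path, "r+", encoding="utf-8") as file:
        lines = file.read().splitlines()
        lines[1] = new_text
        file.seek(0)      # read() left the position at the end of the file
        file.truncate()   # drop the old content
        file.write("\n".join(lines) + "\n")

# Try it on a throwaway two-line file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")

replace_second_line(path, "changed line")
with open(path, encoding="utf-8") as check:
    content = check.read()
os.remove(path)
print(repr(content))
```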

However, you also want to add more lines to the file. I suggest structuring the logic differently.

How to keep track of the number of changed words.

Add a variable changed_words_count and increment it whenever old_word != new_word.
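One caveat with a plain counter: if the same position is drawn twice, it is counted twice although only one word in the line differs. A possible variant, sketched here under my own assumptions, draws distinct positions with random.sample and strips the trailing spaces that some entries of new_words carry. The helper name replace_random_words, the rng parameter and the sample sentence are hypothetical, not part of the original code.

```python
import random

def replace_random_words(words, new_words, n_replacements, rng):
    """Return (new word list, number of words that actually changed)."""
    result = list(words)
    # Never request more positions than there are words.
    n = min(n_replacements, len(result))
    changed = 0
    for position in rng.sample(range(len(result)), n):  # distinct positions
        new_word = rng.choice(new_words).strip()  # drop stray trailing spaces
        if result[position] != new_word:
            changed += 1
        result[position] = new_word
    return result, changed

rng = random.Random(0)  # seeded only to make the demo repeatable
original = "le chat dort sur le canapé".split()
modified, changed_count = replace_random_words(original, ["pour", "nous", "les "], 3, rng)
print(" ".join(modified), changed_count)
```

Because each position is touched at most once, changed_count equals the number of positions where the old and new lines differ.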

Resulting code:

for i in filelist:
    filepath = Path + i

    with open(filepath, "r", encoding="utf-8") as file:
        data = file.read().splitlines()  # splitlines() removes the \n characters

    first_line = data[0]
    second_line = data[1]
    second_line_array = second_line.split(" ")

    changed_words_count = 0
    for j in range(nb_words_to_replace):
        replacement_position = randrange(len(second_line_array))
        old_word = second_line_array[replacement_position]
        new_word = new_words[randrange(len(new_words))]
        # A word replaced does not mean the word has changed.
        # It could be replacing itself.
        # Check if the replacing word is different
        if old_word != new_word:
            changed_words_count += 1
        second_line_array[replacement_position] = new_word

    # The lines that will be replacing the file
    new_lines = [first_line, " ".join(second_line_array), str(changed_words_count)]
    print(f"Result: {new_lines[1]}")

    # writelines() would not add line breaks, so join the lines explicitly
    with open(filepath, "w", encoding="utf-8") as file:
        file.write("\n".join(new_lines) + "\n")

Note: the code is untested.
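As a possible variation, the generated files can go into a separate directory so the source dataset is never overwritten. The sketch below is only that, a sketch: generate_dataset, src_dir, dst_dir, max_replacements and rng are hypothetical names, it draws the number of replacements per file rather than once for the whole run, and it counts changes by comparing the line before and after instead of incrementing a counter. max_replacements must be at least 1.

```python
import random
import tempfile
from pathlib import Path

def generate_dataset(src_dir, dst_dir, new_words, max_replacements, rng):
    """Write a 3-line copy of every 2-line .txt file in src_dir into dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.txt")):
        lines = src.read_text(encoding="utf-8").splitlines()
        if len(lines) < 2:
            continue  # skip files that do not have the expected two lines
        original_words = lines[1].split(" ")
        words = list(original_words)
        for _ in range(rng.randrange(max_replacements)):
            words[rng.randrange(len(words))] = rng.choice(new_words)
        # Count positions whose word differs from the original line.
        changed = sum(a != b for a, b in zip(original_words, words))
        out = [lines[0], " ".join(words), str(changed)]
        (dst_dir / src.name).write_text("\n".join(out) + "\n", encoding="utf-8")

# Demo on a temporary directory standing in for the real dataset folder.
source_text = "ligne une\nle chat dort ici\n"
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "corrected"
    dst_dir = Path(tmp) / "generated"
    src_dir.mkdir()
    (src_dir / "1.txt").write_text(source_text, encoding="utf-8")
    generate_dataset(src_dir, dst_dir, ["pour", "nous"], 4, random.Random(1))
    output_lines = (dst_dir / "1.txt").read_text(encoding="utf-8").splitlines()
    source_unchanged = (src_dir / "1.txt").read_text(encoding="utf-8") == source_text
print(output_lines)
```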

2023-03-08
