NLP Basics + Hands-On: Making a Machine “Write a Novel” (Notes)

雷紫薇 06:10
```
from tensorflow.examples.tutorials.mnist import input_data  # TensorFlow 1.x tutorial module
mnist=input_data.read_data_sets('Mnist_dataset',one_hot=True)
```
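
The `one_hot=True` flag asks the loader to return each digit label as a 10-element indicator vector rather than a plain integer. A minimal plain-Python sketch of that encoding follows; `to_one_hot` is a hypothetical helper written for illustration and is not part of TensorFlow.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# Encode the digit 3 as a 10-element indicator vector.
encoded = to_one_hot(3)
print(encoded)
```

With this representation the label can be compared directly against a 10-way softmax output.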

Source: Hands-on code for logistic regression with a hidden layer
2024-05-06
雷紫薇 03:21

# Load the dataset
mnist=input_data.read_data_sets('

Source: Hands-on code for logistic regression with a hidden layer
2024-05-06
2020064680 01:58

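What is NLP?
NLP is the abbreviation of natural language processing. It is an interdisciplinary field that draws on:
1. Computer science
2. Artificial intelligence
3. Linguistics

What is natural language?
· Language is the tool of human communication and the carrier of human thought.
· Artificial languages: programming languages such as C++ and BASIC.
· Natural languages:
    · Forms: spoken, written, and sign language
    · Languages: Chinese, English, Japanese, French, ...
Linguistics is the science that studies the regularities of language.

The structure of language:
· Language: vocabulary and grammar
    · Vocabulary: words and set phrases
    · Words: built from morphemes
· Grammar: morphology and syntax
    · Morphology: inflection and word formation
    · Syntax: phrase construction and sentence construction

Characteristics of natural language:
· Natural language is full of ambiguity, which is hard to resolve completely.
    · Syntactic (structural) ambiguity
        · 咬死了獵人的狗: either "the dog that bit the hunter to death" or "bit the hunter's dog to death"
        · 三個大學老師: either "three university teachers" or "teachers from three universities"
    · Lexical (word-sense) ambiguity
        · The word 意思 changes meaning in every clause of this passage:
          他說：“她這個人真有意思”。她說：“他這個人怪有意思的”。于是人們以為他們有了那種意思，并讓他向她意思意思。他火了：“我根本沒有那個意思”！她也生氣了：“你們這么說是什么意思”？事后有人說，“真有意思”。也有人說：“真沒意思”。
          Roughly: He said, "She is really interesting." She said, "He is rather fun." So people assumed the two were interested in each other and urged him to make a gesture toward her. He got angry: "I never had that intention!" She was annoyed too: "What do you mean by saying that?" Afterwards some said, "How amusing," and others said, "How boring."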

Source: Introduction to NLP
2022-04-18
666小刀666 07:55

import pickle

# Define the punctuation-to-token lookup table.
# Ordered lists (not sets) keep every symbol paired with its intended token.
def token_lookup():
    symbols = ["。", "，", "“", "”", "；", "！", "？", "（", "）", "-", "\n"]
    tokens = ["P", "C", "Q", "T", "S", "E", "M", "I", "O", "D", "R"]
    return dict(zip(symbols, tokens))

# Save the preprocessed data to a binary file
def save_data(token, vocab_to_int, int_to_vocab):
    with open("data/preprocess.p", "wb") as f:
        pickle.dump((token, vocab_to_int, int_to_vocab), f)

# Load the saved preprocessed data back into memory
def load_data():
    with open("data/preprocess.p", mode="rb") as f:
        return pickle.load(f)

# Save the model parameters to a binary file
def save_parameter(params):
    with open("data/params.p", "wb") as f:
        pickle.dump(params, f)

# Load the model parameters into memory
def load_parameter():
    with open("data/params.p", mode="rb") as f:
        return pickle.load(f)
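
The note does not show how the lookup table is applied. One plausible use, sketched here as an assumption, is to swap each punctuation mark for a word-like token before splitting the text, so that punctuation becomes part of the vocabulary. The `||X||` wrapping and the `replace_punctuation` helper are hypothetical, and the table is shortened to four symbols.

```python
def token_lookup():
    # Shortened table for illustration; pairs are kept in order.
    symbols = ["。", "，", "！", "？"]
    tokens = ["P", "C", "E", "M"]
    return dict(zip(symbols, tokens))

def replace_punctuation(text, table):
    """Swap each known punctuation mark for a space-separated token, then split."""
    for symbol, token in table.items():
        text = text.replace(symbol, " ||{}|| ".format(token))
    return text.split()

words = replace_punctuation("你好，世界！", token_lookup())
print(words)
```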

Source: Character preprocessing and saving model-related data
2021-07-31
666小刀666 11:57

import os

import jieba  # third-party Chinese word segmentation library

# 1. Read the stop words and the training text from files
def read_data():
    # Read the stop words
    stop_words = []
    with open("data/stop_words.txt", "r", encoding="utf-8") as fStopWords:
        line = fStopWords.readline()
        while line:
            stop_words.append(line.rstrip("\n"))  # strip the trailing newline
            line = fStopWords.readline()
    stop_words = set(stop_words)
    print("Stop words loaded, {n} words in total".format(n=len(stop_words)))

    # Read the text, preprocess it, segment it into words, and drop stop words
    s_folder_path = "data/materials"
    ls_files = []
    for root, dirs, files in os.walk(s_folder_path):
        for file in files:
            if file.endswith(".txt"):
                ls_files.append(os.path.join(root, file))

    raw_word_list = []
    for item in ls_files:
        with open(item, "r", encoding="utf-8") as f:
            line = f.readline()
            while line:
                # Remove newlines and spaces
                line = line.replace("\n", "").replace(" ", "")

                # If the sentence is not empty
                if len(line) > 0:
                    raw_words = list(jieba.cut(line, cut_all=False))
                    for _item in raw_words:
                        # Drop stop words
                        if _item not in stop_words:
                            raw_word_list.append(_item)
                line = f.readline()
    return raw_word_list

words = read_data()
print("Data size:", len(words))
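
The filtering step can be tried in isolation without jieba or any data files. The sketch below assumes the text has already been segmented into tokens; `remove_stop_words` is a hypothetical helper, and the token list and stop words are made up for illustration.

```python
def remove_stop_words(tokens, stop_words):
    """Keep the tokens that are not stop words, preserving their order."""
    stop_words = set(stop_words)  # set membership tests are O(1)
    return [t for t in tokens if t not in stop_words]

tokens = ["我", "愛", "的", "祖國", "了"]
filtered = remove_stop_words(tokens, {"的", "了"})
print(filtered)
```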

Source: Reading stop words and preprocessing the training text
2021-07-31

666小刀666 12:09

# 5. Train the model

# Number of training steps
num_steps = 100001

with tf.compat.v1.Session(graph=graph) as session:
    init.run()

    average_loss = 0
    for step in range(num_steps):
        batch_inputs, batch_labels = generate_batch(batch_size, num_skips, skip_window)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

        _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
        average_loss += loss_val

        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            print("average loss at step:", step, ":", average_loss)
            average_loss = 0

        if step % 10000 == 0:
            sim = similary.eval()
            # Report the nearest neighbours of every validation word
            for i in range(valid_size):
                valid_word_i = reverse_dictionary[valid_examples[i]]
                top_k = 8
                # Sort by descending similarity (the word itself normally ranks first)
                nearest = (-sim[i, :]).argsort()[:top_k]
                log_str = "Nearest to %s:" % valid_word_i
                for k in range(top_k):
                    close_word = reverse_dictionary[nearest[k]]
                    log_str = "%s %s," % (log_str, close_word)
                print(log_str)
    final_embeddings = normnalized_embeddings.eval()

# 6. Write out the vectors
with open('output/word2vect.text', "w", encoding="utf-8") as fw2v:
    fw2v.write(str(vocabulary_size) + " " + str(embedding_size) + "\n")
    for i in range(final_embeddings.shape[0]):
        sword = reverse_dictionary[i]
        svector = ""

        for j in range(final_embeddings.shape[1]):
            svector = svector + " " + str(final_embeddings[i, j])
        fw2v.write(sword + svector + "\n")
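
The output step writes a header line with two integers followed by one `word v1 v2 ...` line per word. A small reader for that layout might look like the sketch below; `load_vectors` is a hypothetical helper, and the sample data is invented and read from an in-memory string rather than from `output/word2vect.text`.

```python
import io

def load_vectors(handle):
    """Parse a 'count dim' header followed by 'word v1 v2 ...' lines."""
    count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return count, dim, vectors

sample = io.StringIO("2 3\nUNK 0.1 0.2 0.3\n說 0.4 0.5 0.6\n")
count, dim, vectors = load_vectors(sample)
print(count, dim, vectors["說"])
```

Because the header records `vocabulary_size` rather than the number of rows actually written, a reader should not rely on the header count matching the number of parsed vectors.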

Source: Training the skip-gram model

2021-07-31

666小刀666 17:31
import math

import tensorflow as tf

# 4. Build the model
batch_size = 128
embedding_size = 100
skip_window = 1
num_skips = 2
valid_size = 4      # must equal len(valid_word), otherwise an error is raised
valid_window = 100
num_sampled = 64

# Validation set
valid_word = ["說","實力","害怕","少林寺"]
valid_examples = [dictionary[li] for li in valid_word]

graph = tf.Graph()
with graph.as_default():
    # Input data
    train_inputs = tf.compat.v1.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.compat.v1.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    # Weight matrix (the word embeddings)
    embeddings = tf.Variable(tf.random.uniform([vocabulary_size, embedding_size], -1.0, 1.0))

    # Look up the rows of embeddings indexed by train_inputs
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Weights and biases for the NCE loss
    nce_weights = tf.Variable(tf.random.truncated_normal([vocabulary_size, embedding_size], stddev=1.0 / math.sqrt(embedding_size)))

    nce_biases = tf.Variable(tf.zeros([vocabulary_size]), dtype=tf.float32)

    # Loss function
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weights, biases=nce_biases, labels=train_labels, inputs=embed, num_sampled=num_sampled, num_classes=vocabulary_size))

    # Optimizer
    optimizer = tf.compat.v1.train.GradientDescentOptimizer(1.0).minimize(loss)

    # Use the learned word vectors to compute the similarity between the validation words and all words
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keepdims=True))  # L2 norm of each row
    normnalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normnalized_embeddings, valid_dataset)
    similary = tf.matmul(valid_embeddings, normnalized_embeddings, transpose_b=True)

    init = tf.compat.v1.global_variables_initializer()
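
The last few graph operations amount to cosine similarity: once every embedding row is scaled to unit length, the dot product of two rows is the cosine of the angle between them. A plain-Python sketch of the same idea, using a made-up four-word embedding table and the hypothetical helpers `normalize` and `nearest`:

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def nearest(word, embeddings, top_k=2):
    """Rank the other words by cosine similarity to `word`."""
    unit = {w: normalize(v) for w, v in embeddings.items()}
    query = unit[word]
    scores = {
        w: sum(a * b for a, b in zip(query, v))
        for w, v in unit.items() if w != word
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy 2-dimensional embeddings, invented for illustration.
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(nearest("a", toy))
```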

Source: Building the skip-gram model
2021-07-31
666小刀666 01:19

import collections
import random

import numpy as np

data, count, dictionary, reverse_dictionary = build_dataset(arg_words=words)
# Delete words to save memory
del words

data_index = 0

# 3. Generate training batches for the skip-gram model
def generate_batch(arg_batch_size, arg_num_skips, arg_ski_windows):
    global data_index

    l_batch = np.ndarray(shape=arg_batch_size, dtype=np.int32)        # shape (arg_batch_size,)
    l_labels = np.ndarray(shape=(arg_batch_size, 1), dtype=np.int32)  # shape (arg_batch_size, 1)
    span = 2 * arg_ski_windows + 1  # window: [left context, center word, right context], e.g. [我 愛 祖國]
    buffer = collections.deque(maxlen=span)

    # Fill the sliding window
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(arg_batch_size // arg_num_skips):
        target = arg_ski_windows            # position of the center word in the window
        targets_to_avoid = [arg_ski_windows]

        for j in range(arg_num_skips):
            # Pick a context position that has not been used yet
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            l_batch[i * arg_num_skips + j] = buffer[arg_ski_windows]
            l_labels[i * arg_num_skips + j, 0] = buffer[target]
        # Slide the window forward by one word
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)

    return l_batch, l_labels

# Show an example
batch, labels = generate_batch(arg_batch_size=8, arg_num_skips=2, arg_ski_windows=1)
for i in range(8):
    print(batch[i], reverse_dictionary[batch[i]], "->", labels[i, 0], reverse_dictionary[labels[i, 0]])
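
To see what kind of pairs the batch generator produces, it can help to enumerate every (center word, context word) pair for a short sentence. The sketch below is a simplified, deterministic variant: unlike `generate_batch`, it does not sample randomly and it also emits pairs for words at the edges of the sentence. `skip_gram_pairs` is a hypothetical helper.

```python
def skip_gram_pairs(tokens, window=1):
    """List every (center, context) pair within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skip_gram_pairs(["我", "愛", "祖國"], window=1)
print(pairs)
```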

Source: Generating the training parameters for skip-gram
2021-07-31
666小刀666 07:37

import collections

# 2. Build the dictionary; rare words are replaced with UNK
vocabulary_size = 100000

def build_dataset(arg_words):
    # Word encoding: UNK first, then the most common words
    l_count = [["UNK", -1]]
    l_count.extend(collections.Counter(arg_words).most_common(vocabulary_size - 1))
    print("l_count:", len(l_count))

    l_dictionary = dict()
    for word, _ in l_count:
        l_dictionary[word] = len(l_dictionary)

    # Use the word encoding to turn the string list [arg_words] into the number list [data]
    l_data = []
    unk_count = 0
    for word in arg_words:
        if word in l_dictionary:
            index = l_dictionary[word]
        else:
            index = 0  # id of UNK
            unk_count += 1
        l_data.append(index)
    l_count[0][1] = unk_count

    # Reversed dictionary: keys are word ids, values are the words themselves
    l_reverse_dictionary = dict(zip(l_dictionary.values(), l_dictionary.keys()))
    return l_data, l_count, l_dictionary, l_reverse_dictionary
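
A tiny run makes the UNK handling concrete. The sketch below is a condensed variant of the same idea with a hypothetical name, `build_vocab`, and a vocabulary deliberately limited to three entries so that the rarest word falls back to id 0.

```python
import collections

def build_vocab(words, vocabulary_size):
    """Map the most common words to ids 1..n and everything else to 0 (UNK)."""
    common = collections.Counter(words).most_common(vocabulary_size - 1)
    dictionary = {"UNK": 0}
    for word, _ in common:
        dictionary[word] = len(dictionary)
    data = [dictionary.get(word, 0) for word in words]
    return data, dictionary

# "c" is the rarest word and does not fit in a 3-entry vocabulary.
data, dictionary = build_vocab(["a", "b", "a", "c", "a", "b"], vocabulary_size=3)
print(data, dictionary)
```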

Source: Building the dictionary
2021-07-31
慕妹2432231 00:39

The concept of memory

Source: Introduction to handling NLP with RNNs
2020-11-04
蛋卷夾夾夾夾心 03:01

Development setup (part 2)

Source: Preparing the machine learning development environment
2020-03-13
蛋卷夾夾夾夾心 01:27

Development setup (part 1)

Source: Preparing the machine learning development environment
2020-03-13
蛋卷夾夾夾夾心 06:52

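reduction_indices=0: reduce along axis 0 (collapse the rows, giving one result per column)
reduction_indices=1: reduce along axis 1 (collapse the columns, giving one result per row)
(reduction_indices is the older name for the axis argument of TensorFlow's reduce functions.)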
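
The difference between the two axes is easiest to see on a small matrix. The sketch below uses plain Python lists as a stand-in for a tensor, so it shows the idea rather than the TensorFlow call itself.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Axis 0: collapse the rows, leaving one sum per column.
sum_axis0 = [sum(col) for col in zip(*matrix)]
# Axis 1: collapse the columns, leaving one sum per row.
sum_axis1 = [sum(row) for row in matrix]

print(sum_axis0)
print(sum_axis1)
```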

Source: A brief introduction to machine learning
2020-03-12
蛋卷夾夾夾夾心 04:04

The illustration is very vivid.

Source: A brief introduction to machine learning
2020-03-12
板栗的小栗子 04:23

What is this notes feature for?

Source: A brief introduction to machine learning
2020-03-06
Coder_zheng 06:52

It suddenly all makes sense.

Source: A brief introduction to machine learning
2020-03-04
幽幽樹 08:29

Import seq2seq and use it to compute the loss function.

Source: Building the RNN computation graph
2020-03-01
幽幽樹 08:12

Create an embedding layer for the word vectors to improve efficiency.

Source: Building the three-layer RNN network
2020-02-29
幽幽樹 02:59

Some parameters used when building the LSTM model.

Source: Building the LSTM cell
2020-02-29
幽幽樹 01:08

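RNN vs. LSTM
An RNN has memory of earlier data, but it cannot hold those memories over the long term; trying to do so would cause problems for analyzing and storing the data.
LSTM is an extension of the RNN with selective memory: its gates decide what to keep and what to forget. Dropout is used so that what most needs remembering is learned and retained.

Source: Introduction to handling NLP with RNNs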
2020-02-29
幽幽樹 00:39

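Why do we still need RNNs when we already have BP networks and CNNs?
In traditional networks, each input and output is treated independently of the others.
An RNN introduces "memory".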
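
The "memory" can be illustrated with a single scalar recurrent step, in which the new state depends on both the current input and the previous state. This is a minimal sketch rather than the course's model: the weights `w_x` and `w_h` are arbitrary values chosen for illustration.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One scalar recurrent step: mix the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev)

def run(inputs):
    """Feed a sequence through the recurrent step and return the final state."""
    h = 0.0
    for x in inputs:
        h = rnn_step(x, h)
    return h

# Both sequences end with the same input, but their histories differ.
with_history = run([1.0, 0.0])
without_history = run([0.0, 0.0])
print(with_history, without_history)
```

A network without the recurrent term would map the final input 0.0 to the same output in both cases; here the earlier input still influences the final state.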

Source: Introduction to handling NLP with RNNs
2020-02-29

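Course prerequisites: 1. Learners should have basic Python development skills. 2. Learners should understand the basic concepts of matrices and linear algebra.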

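What the instructor says you will learn: 1. The state of machine learning's development. 2. Current application scenarios for NLP. 3. Introducing a hidden layer to improve the performance of traditional machine learning algorithms. 4. Using a Chinese word segmentation framework and becoming familiar with how material is preprocessed in Chinese NLP. 5. Using the TF framework, including basic API operations and the model-building process. 6. A first “AI writes a novel” result through hands-on RNN code.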
