2 Answers

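Contributor with 1868 experience points · more than 4 likes
I would process the text with array_filter to drop the words that are in the stopword list, count the occurrences of each remaining word with array_count_values, and then use array_filter again to remove the words that occur only once. You can then write the remaining words (they are the keys of the output array) to the database. For example: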
$content = "How technology is helping to change the way people think about the food on their plate and the food impact for them. Technology could have a role to play in raising awareness of the impact our diets have on the planet.";
$stopwords = array('how', 'is', 'to', 'the', 'way', 'on', 'and', 'for', 'a', 'in', 'of', 'our', 'have');
// count all words in $content not in the stopwords list
$counts = array_count_values(array_filter(explode(' ', strtolower($content)), function ($w) use ($stopwords) {
    return !in_array($w, $stopwords);
}));
// filter out words only seen once
$counts = array_filter($counts, function ($v) { return $v > 1; });
// write those words to the database
foreach ($counts as $key => $value) {
    $this->db->query("INSERT INTO news (news_id, news_content) VALUES ('$id', '$key')");
}
For your sample data, the final contents of $counts will be:
Array
(
[technology] => 2
[food] => 2
[impact] => 2
)
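
Splitting on spaces leaves punctuation attached to words, so "them." and "planet." are counted with their trailing period. That does not change the result for this sample, but it would for a sentence such as "Food, glorious food." A minimal sketch of a variant that strips punctuation first; the function name repeatedWords is hypothetical, and the pattern assumes plain ASCII text:

```php
<?php
// Hypothetical helper (not part of the answer above): returns the words that
// occur more than once, ignoring case, punctuation and stopwords.
function repeatedWords(string $content, array $stopwords): array
{
    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($content), -1, PREG_SPLIT_NO_EMPTY);
    // Drop the stopwords, then count what is left.
    $counts = array_count_values(array_diff($words, $stopwords));
    // Keep only the words seen more than once.
    return array_filter($counts, function ($count) { return $count > 1; });
}

$content = "How technology is helping to change the way people think about the food on their plate and the food impact for them. Technology could have a role to play in raising awareness of the impact our diets have on the planet.";
$stopwords = array('how', 'is', 'to', 'the', 'way', 'on', 'and', 'for', 'a', 'in', 'of', 'our', 'have');

print_r(repeatedWords($content, $stopwords));
```

For the sample text this yields the same three words as above (technology, food, impact), each with a count of 2.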

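Contributor with 1830 experience points · more than 9 likes
I believe there are many options here.
Here is my solution: you can use array_search() for this. array_search() returns false if the needle is not found in the array. If the word is found, it returns the key of the first match.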
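
To make that return value concrete, here is a small sketch with an illustrative word list (the list is made up for the example, not taken from the question). It also shows why the result has to be compared with !== false: key 0 is a valid match but is falsy.

```php
<?php
// Illustrative word list (an assumption for this example).
$words = array('the', 'food', 'the', 'Food');

// array_search returns the key of the FIRST match, so 0 rather than 2.
$firstThe = array_search('the', $words, true);

// It returns false when the needle is absent; the comparison is case-sensitive.
$missing = array_search('FOOD', $words, true);

var_dump($firstThe);           // int(0)
var_dump($missing);            // bool(false)
var_dump($firstThe !== false); // bool(true), even though key 0 is falsy
```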
Depending on your needs, you can use one of the following options.
// $exp is the array of words from the question.
// Option 1
// Words that actually appear more than once...
$new_arr = array();
foreach ($exp as $key => $e) {
    // Must be exactly this word (hence the strict flag, true)
    $search = array_search($e, $exp, true);
    // array_search returns the first matching key, so any later
    // occurrence of the same word has a different key
    if ($search !== false && $search != $key) {
        $new_arr[] = $e;
    }
}
// Option 2
// Your question was not totally clear, so I add this code as well.
// Words with asterisks before and after that appear more than once
$new_arr = array();
foreach ($exp as $key => $e) {
    // Two asterisks at the beginning of the string and two at the end
    // strtolower makes **Technology** count as a duplicate of **technology**
    if (substr($e, 0, 2) == "**" && substr($e, -2, 2) == "**") {
        $search = array_search(strtolower($e), $exp);
        if ($search !== false && $search != $key) {
            $new_arr[] = $e;
        }
    }
}
// The word is a string value, so it must be quoted in the SQL
foreach ($new_arr as $word) {
    $this->db->query("INSERT INTO news (news_id, news_content)
        VALUES ('$id', '$word')");
}
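As someone mentioned in the comments, you should guard against SQL injection when you build the INSERT statement this way (and you really should). The question, however, is mainly about finding duplicates in a string in order to do something with them, so I won't go any further into that comment.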
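
For readers who do want to close that hole, one common approach is a prepared statement with bound parameters. The sketch below is an assumption-heavy illustration: it uses PDO with an in-memory SQLite database as a stand-in, whereas the answer's $this->db object belongs to whatever framework the question uses, and that framework's own binding mechanism would be the natural choice there. The $id value and the word list are made up.

```php
<?php
// Assumption: PDO with the SQLite driver is available; the in-memory
// database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE news (news_id INTEGER, news_content TEXT)');

$id = 1;                                            // hypothetical news id
$new_arr = array('**food**', "it's", '**impact**'); // sample words, one containing a quote

// The statement is prepared once; the values are bound on each execute(),
// so they are never spliced into the SQL text.
$stmt = $pdo->prepare('INSERT INTO news (news_id, news_content) VALUES (?, ?)');
foreach ($new_arr as $word) {
    $stmt->execute(array($id, $word));
}

// Read the rows back in insertion order to confirm what was stored.
$rows = $pdo->query('SELECT news_content FROM news ORDER BY rowid')->fetchAll(PDO::FETCH_COLUMN);
print_r($rows);
```

The word containing an apostrophe is stored unchanged, whereas interpolating it directly into the query string would break the statement.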
The resulting array $new_arr looks like this (Option 1):
array (size=9)
0 => string 'the' (length=3)
1 => string 'the' (length=3)
2 => string '**food**' (length=8)
3 => string 'to' (length=2)
4 => string 'the' (length=3)
5 => string '**impact**' (length=10)
6 => string 'have' (length=4)
7 => string 'on' (length=2)
8 => string 'the' (length=3)
Technology and technology are not treated as the same word here, because one of them has an uppercase T.
The resulting array $new_arr looks like this (Option 2):
array (size=3)
0 => string '**food**' (length=8)
1 => string '**Technology**' (length=14)
2 => string '**impact**' (length=10)
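
If you would rather have Option 1 ignore case and list each repeated word only once, one possible variant is to lowercase the words and count them. This is a sketch under assumptions: the $exp contents below are illustrative, since in the answer $exp comes from the question's own code.

```php
<?php
// Illustrative word list (an assumption for this example).
$exp = array('**Technology**', 'the', '**food**', 'the', '**food**', '**technology**', 'the');

// Count the words case-insensitively by lowercasing them first.
$counts = array_count_values(array_map('strtolower', $exp));

// Keep each word that occurs more than once, listed a single time.
$new_arr = array_keys(array_filter($counts, function ($n) { return $n > 1; }));

print_r($new_arr);
```

With this list, $new_arr contains '**technology**', 'the' and '**food**', in order of first appearance and in lowercase.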