Check whether every element in an array matches a condition

MongoDB

嚕嚕噠 2019-11-19 10:36:50

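I have some documents:

date: Date
users: [
    { user: 1, group: 1 },
    { user: 5, group: 2 }
]

date: Date
users: [
    { user: 1, group: 1 },
    { user: 3, group: 2 }
]

I want to query this collection for all documents in which every user ID in the users array is in another array, [1, 5, 7]. In this example only the first document matches.

The best solution I have found so far is:

$where: function() {
    var ids = [1, 5, 7];
    return this.users.every(function(u) {
        return ids.indexOf(u.user) !== -1;
    });
}

Unfortunately this seems to hurt performance. The $where documentation states that $where evaluates JavaScript and cannot take advantage of indexes.

How can I improve this query?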


3 Answers

阿波羅的戰車


The query you want is this:

db.collection.find({"users":{"$not":{"$elemMatch":{"user":{$nin:[1,5,7]}}}}})

In other words: find all documents that have no element whose user falls outside the list 1, 5, 7.
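The double negation can be checked outside the database. The sketch below is plain JavaScript rather than MongoDB code; `allInList` and `noneOutsideList` are hypothetical helper names, and the two documents are the samples from the question. It only illustrates that "every user is in the list" and "no user is outside the list" select the same documents.

```javascript
// Plain-JavaScript illustration of the logic, not a MongoDB API.
var ids = [1, 5, 7];
var docs = [
  { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] },
  { users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] }
];

// "Every user is in the list" (what the $where function in the question tests).
function allInList(doc) {
  return doc.users.every(function (u) { return ids.indexOf(u.user) !== -1; });
}

// "No user is outside the list" (what $not + $elemMatch + $nin expresses).
function noneOutsideList(doc) {
  return !doc.users.some(function (u) { return ids.indexOf(u.user) === -1; });
}

var direct = docs.map(allInList);
var negated = docs.map(noneOutsideList);
console.log(direct, negated);
```

One caveat worth checking against your own data: a $not condition also matches documents in which the users field is missing, so an extra existence check may be needed if such documents occur.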

2019-11-19

森欄


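I don't know about better, but there are a few different ways to approach this, depending on the version of MongoDB you have available.

I am not entirely sure this is what you intend, but the query as shown matches the first sample document because, as your logic is written, every element in the document's array must be contained in the sample array.

So if you actually wanted the document to contain all of those elements, the $all operator would be the obvious choice: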

db.collection.find({ "users.user": { "$all": [ 1, 5, 7 ] } })

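But assuming your logic is actually what you intend, you can at least "filter" the results by combining with the $in operator, so that fewer documents are subject to the $where condition evaluated in JavaScript: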

db.collection.find({
    "users.user": { "$in": [ 1, 5, 7 ] },
    "$where": function() {
        var ids = [1, 5, 7];
        return this.users.every(function(u) {
            return ids.indexOf(u.user) !== -1;
        });
    }
})

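This way an index is used, although the number of entries actually scanned is multiplied by the number of array elements in the matched documents. It is still better than running without the additional filter.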
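The effect of the pre-filter can be modelled in plain JavaScript. This is a sketch, not MongoDB code: the first two documents are the samples from the question, and the third is a hypothetical document added to show one being discarded before the JavaScript test runs.

```javascript
// Plain-JavaScript model of the two-step filtering; not MongoDB code.
var ids = [1, 5, 7];
var docs = [
  { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] },
  { users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] },
  { users: [{ user: 2, group: 1 }, { user: 4, group: 2 }] }  // hypothetical extra document
];

// Step 1, like { "users.user": { "$in": ids } }: at least one user is in the list.
var candidates = docs.filter(function (doc) {
  return doc.users.some(function (u) { return ids.indexOf(u.user) !== -1; });
});

// Step 2, like $where: every user is in the list. Runs only on the candidates.
var matches = candidates.filter(function (doc) {
  return doc.users.every(function (u) { return ids.indexOf(u.user) !== -1; });
});

console.log(candidates.length, matches.length);
```

The pre-filter never removes a document with a non-empty users array that would pass the second step, because a non-empty array in which every user is in the list necessarily has at least one user in the list.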

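Or you might even consider a logical abstraction that combines $or with the implicit $and of each condition, and possibly the $size operator, depending on your actual array conditions: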

db.collection.find({
    "$or": [
        { "users.user": { "$all": [ 1, 5, 7 ] } },
        { "users.user": { "$all": [ 1, 5 ] } },
        { "users.user": { "$all": [ 1, 7 ] } },
        { "users": { "$size": 1 }, "users.user": 1 },
        { "users": { "$size": 1 }, "users.user": 5 },
        { "users": { "$size": 1 }, "users.user": 7 }
    ]
})

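So this generates all the possible permutations of your matching condition, but again performance will likely vary with the installed version.

Note: this actually fails completely in this case, because it does something entirely different and in effect amounts to a logical $in.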
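A small counterexample shows why the $all branches are not equivalent to the requirement. The sketch is plain JavaScript; `containsAll` and `everyInList` are hypothetical helpers that model the two conditions, and the document with user 9 is invented for illustration.

```javascript
// Plain-JavaScript illustration; containsAll and everyInList are hypothetical helpers.
var ids = [1, 5, 7];
var doc = { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }, { user: 9, group: 3 }] };

var present = doc.users.map(function (u) { return u.user; });

// Models { "users.user": { "$all": wanted } }: every wanted value is present.
function containsAll(values, wanted) {
  return wanted.every(function (w) { return values.indexOf(w) !== -1; });
}

// Models what the question asks for: every present value is allowed.
function everyInList(values, allowed) {
  return values.every(function (v) { return allowed.indexOf(v) !== -1; });
}

var matchedByAllBranch = containsAll(present, [1, 5]);
var matchesRequirement = everyInList(present, ids);
console.log(matchedByAllBranch, matchesRequirement);
```

The document is selected by the `[ 1, 5 ]` branch even though user 9 is not in the list, which is the failure the note describes.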

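The alternatives use the aggregation framework. Which one is most efficient may vary with the number of documents in your collection. One approach for MongoDB 2.6 and later: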

db.problem.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},

    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Just keeping the "user" element value
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},

    // Compare to see if all elements are a member of the desired match
    { "$project": {
        "match": { "$setEquals": [
            { "$setIntersection": [ "$users", [ 1, 5, 7 ] ] },
            "$users"
        ]}
    }},

    // Filter out any documents that did not match
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

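So that approach uses some newly introduced set operators to compare the contents, though of course you need to restructure the array in order to make the comparison.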
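The comparison made by the set operators can be modelled in plain JavaScript. In this sketch `setIntersection` and `setEquals` are hypothetical local functions named after the operators, not the MongoDB implementations, and the input arrays are the user values of the two sample documents.

```javascript
// Plain-JavaScript model of the set comparison; helper names are hypothetical.
function setIntersection(a, b) {
  var inB = new Set(b);
  return Array.from(new Set(a)).filter(function (x) { return inB.has(x); });
}

function setEquals(a, b) {
  var sa = new Set(a), sb = new Set(b);
  if (sa.size !== sb.size) { return false; }
  return Array.from(sa).every(function (x) { return sb.has(x); });
}

var allowed = [1, 5, 7];
var first = [1, 5];   // user values of the first sample document
var second = [1, 3];  // user values of the second sample document

// A document matches when intersecting with the allowed list loses nothing.
var firstMatch = setEquals(setIntersection(first, allowed), first);
var secondMatch = setEquals(setIntersection(second, allowed), second);
console.log(firstMatch, secondMatch);
```

Intersecting with the allowed list and comparing against the original is the same as asking whether the document's user values form a subset of the allowed list.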

As has been pointed out, there is a direct operator for this, $setIsSubset, which does the equivalent of the combined operators above in a single operator:

db.collection.aggregate([
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},
    { "$unwind": "$users" },
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},
    { "$project": {
        "match": { "$setIsSubset": [ "$users", [ 1, 5, 7 ] ] }
    }},
    { "$match": { "match": true } },
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

Or with a different approach, while still taking advantage of the $size operator from MongoDB 2.6:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},

    // Keep your original document and a copy of the array
    // and a note of its current size
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
        "size": { "$size": "$users" }
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Filter array contents that do not match
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},

    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},

    // Compare the original size to the matched count
    { "$project": {
        "match": { "$eq": [ "$size", "$count" ] }
    }},

    // Filter out documents that were not the same
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])
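The idea behind the counting pipeline is to compare the array's original length with the number of elements that survive the filter. A minimal plain-JavaScript sketch of that comparison follows; `matchesBySize` is a hypothetical helper, and the comments map each step to the pipeline stage it models.

```javascript
// Plain-JavaScript model of the counting approach; matchesBySize is a hypothetical helper.
function matchesBySize(doc, allowed) {
  // Original array length, like { "$size": "$users" }.
  var size = doc.users.length;
  // Number of elements that pass the filter, like $unwind + $match + { "$sum": 1 }.
  var count = doc.users.filter(function (u) {
    return allowed.indexOf(u.user) !== -1;
  }).length;
  // Same comparison as { "$eq": [ "$size", "$count" ] }.
  return size === count;
}

var firstDoc = { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] };
var secondDoc = { users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] };
var firstResult = matchesBySize(firstDoc, [1, 5, 7]);
var secondResult = matchesBySize(secondDoc, [1, 5, 7]);
console.log(firstResult, secondResult);
```

Because both numbers count array elements rather than distinct values, repeated users in the array do not affect the outcome.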

This can of course still be done in versions before 2.6, though it is a little more long-winded:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},

    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Group it back to get its original size
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users" },
        "size": { "$sum": 1 }
    }},

    // Unwind the array copy again
    { "$unwind": "$users" },

    // Filter array contents that do not match
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},

    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},

    // Compare the original size to the matched count
    { "$project": {
        "match": { "$eq": [ "$size", "$count" ] }
    }},

    // Filter out documents that were not the same
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

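That generally rounds out the different ways; try them out and see which works best for you. In all likelihood the simple combination of $in with your existing form is going to be the best one. But in all cases, make sure you have an index that can be selected: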

db.collection.ensureIndex({ "users.user": 1 })

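This will give you the best performance so long as you access that field in some way, as all the examples here do. (ensureIndex is the older name for this call; on current MongoDB versions the same index is created with createIndex.)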

Verdict

I was intrigued by this, so I ultimately contrived a test case to see which approach performed best. First, some test data generation:

var batch = [];
for ( var n = 1; n <= 10000; n++ ) {
    // Random array length between 1 and 10
    var elements = Math.floor(Math.random()*10)+1;

    var obj = { date: new Date(), users: [] };
    for ( var x = 0; x < elements; x++ ) {
        // Random user and group values between 1 and 10
        var user = Math.floor(Math.random()*10)+1,
            group = Math.floor(Math.random()*10)+1;

        obj.users.push({ user: user, group: group });
    }

    batch.push( obj );

    // Insert in batches of 500 documents
    if ( n % 500 == 0 ) {
        db.problem.insert( batch );
        batch = [];
    }
}

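With 10000 documents in the collection, each holding a random array of length 1..10 with random values of 1..10, I got a match count of 430 documents (reduced from 7749 by the $in match), with the following results (average):

JavaScript with $in clause: 420ms

Aggregate with $size: 395ms

Aggregate with group array count: 650ms

Aggregate with two set operators: 275ms

Aggregate with $setIsSubset: 250ms

Across the sample runs, all but the last two had a peak variance of about 100ms faster, and the last two both showed a 220ms response. The largest variation was in the JavaScript query, which also showed results 100ms slower.

The point here is relative to the hardware, which on my laptop under a VM is not particularly great, but it gives an idea.

So the aggregate, and specifically the MongoDB 2.6.1 version with set operators, clearly wins on performance, with a slight additional gain from $setIsSubset as a single operator.

This is particularly interesting given that (as the 2.4-compatible method indicates) the largest cost in this process is the $unwind statement (over 100ms on average). With the $in selection averaging around 32ms, the rest of the pipeline stages execute in less than 100ms on average. That gives a relative idea of aggregation versus JavaScript performance.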

2019-11-19

小唯快跑啊


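I just spent a substantial portion of my day trying to implement Asya's solution above (the $not / $elemMatch query) with object comparisons rather than strict equality, so I figured I'd share it here.

Let's say you expanded your question from user IDs to full users. You want to find all documents in which every item in the users array is present in another array of users: [{user: 1, group: 3}, {user: 2, group: 5},...]

This won't work: db.collection.find({"users":{"$not":{"$elemMatch":{"$nin":[{user: 1, group: 3},{user: 2, group: 5},...]}}}}) because $nin only works for strict equality. So we need a different way of expressing "not in array" for arrays of objects. And using $where would slow the query down too much.

Solution: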

db.collection.find({
    "users": {
        "$not": {
            "$elemMatch": {
                // if all of the OR-blocks are true, the element is not in the array
                "$and": [{
                    // each OR-block is true if the element != that user
                    "$or": [
                        { "user": { "$ne": 1 } },
                        { "group": { "$ne": 3 } }
                    ]
                }, {
                    "$or": [
                        { "user": { "$ne": 2 } },
                        { "group": { "$ne": 5 } }
                    ]
                }
                // more users...
                ]
            }
        }
    }
})

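Rounding out the logic: $elemMatch matches all documents that have a user not in the array, so $not matches all documents in which every user is in the array.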
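Writing one $or block per allowed user by hand gets tedious, so the filter could be generated instead. The following is a minimal sketch: `buildEveryUserInFilter` is a hypothetical helper, it assumes a non-empty list of allowed { user, group } pairs, and it only builds a plain object using `$ne` for each field comparison.

```javascript
// Hypothetical helper: builds the $not / $elemMatch filter from a list of allowed
// { user, group } pairs. Assumes the list is non-empty. It only constructs a plain
// object and does not talk to MongoDB.
function buildEveryUserInFilter(allowed) {
  var notThisUser = allowed.map(function (a) {
    // True when the array element differs from this allowed user in some field.
    return { "$or": [ { "user": { "$ne": a.user } }, { "group": { "$ne": a.group } } ] };
  });
  // An element matching every block is outside the list; $not rejects such documents.
  return { "users": { "$not": { "$elemMatch": { "$and": notThisUser } } } };
}

var filter = buildEveryUserInFilter([{ user: 1, group: 3 }, { user: 2, group: 5 }]);
console.log(JSON.stringify(filter, null, 2));
```

The resulting object would then be passed to the shell as db.collection.find(filter).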

2019-11-19
