JMS Serializer performance problem with more than 10000 entries

PHP

海綿寶寶撒 2021-10-15 15:18:20

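I am currently building a PHP command that updates my ElasticSearch indexes. One big thing I noticed is that serializing entities takes far too long once my array holds more than 10000 of them. I expected the cost to be linear, but 6k or 9k entities both take about a minute (there is little difference between the two), whereas above 10k it slows down to the point of taking up to 10 minutes.

    ...
    // we iterate on the documents previously requested to the sql database
    foreach ($entities as $index_name => $entity_array) {
        $underscoreClassName = $this->toUnderscore($index_name); // elasticsearch understands underscored names
        $camelcaseClassName = $this->toCamelCase($index_name); // sql understands camelcase names
        // we get the serialization groups for each index from the config file
        $groups = $indexesInfos[$underscoreClassName]['types'][$underscoreClassName]['serializer']['groups'];
        foreach ($entity_array as $entity) {
            // each entity is serialized as a json array
            $data = $this->serializer->serialize($entity, 'json', SerializationContext::create()->setGroups($groups));
            // each serialized entity as json is converted as an Elastica document
            $documents[$index_name][] = new \Elastica\Document($entityToFind[$index_name][$entity->getId()], $data);
        }
    }
    ...

There is a whole class around this, but this is the part that takes most of the time. I can understand that serialization is a heavy operation and takes time, but why is there almost no difference between 6, 7, 8 or 9k, while it takes so much longer once there are more than 10k entities?

PS: for reference, I opened an issue on GitHub.

Edit: to explain more precisely what I am trying to do: we have an SQL database on a Symfony project, with Doctrine linking the two, and we use ElasticSearch (with the FOSElastica bundle and Elastica) to index our data into ElasticSearch. The problem is that while FOSElastica takes care of updating the data that changed in the SQL database, it does not update every index that contains this data. (For example, if you have an author and two books he wrote, in ES you will have the two books with the author embedded in them, plus the author. FOSElastica only updates the author, not the information about the author inside the two books.) To fix this, I am writing a script that listens to every update made through Doctrine, fetches every ElasticSearch document related to the update, and updates those as well. This works, but it takes too long in my stress test, where more than 10000 large documents need to be updated.

Edit: to add more about what I tried: I have the same problem with FOSElastica's "populate" command. At 9k everything is fine and smooth; at 10k it takes a really long time. Right now I am running tests that reduce the size of the array in my script and reset it, with no luck so far.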

2 answers

守著星空守著你

Contributed 1799 experience points, received more than 8 likes

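I changed the way my algorithm works: I first get all the ids that need to be updated, then fetch the entities from the database in batches of 500-1000 (I am still running tests).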

    /*
     * to avoid creating arrays with too many objects, we loop on the ids and split them by DEFAULT_BATCH_SIZE
     * this way we get them by packs of DEFAULT_BATCH_SIZE and add them by the same amount
     */
    $currentSetOfIds = [];
    $documents = [];
    $total = count($idsToRequest);
    for ($i = 0; $i < $total; $i++) {
        $currentSetOfIds[] = $idsToRequest[$i];
        // every time we have DEFAULT_BATCH_SIZE ids or if it's the end of the loop we update the documents
        // ($i + 1 is the number of ids seen so far, so the first batch is full rather than a single id)
        if (($i + 1) % self::DEFAULT_BATCH_SIZE === 0 || $i === $total - 1) {
            if ($currentSetOfIds) {
                // retrieves from the database a batch of entities
                $entities = $thatRepo->findBy(array('id' => $currentSetOfIds));
                // serialize and create documents with the entities we got earlier
                foreach ($entities as $entity) {
                    $data = $this->serializer->serialize($entity, 'json', SerializationContext::create()->setGroups($groups));
                    $documents[] = new \Elastica\Document($entityToFind[$indexName][$entity->getId()], $data);
                }
                // update all the documents serialized
                $elasticaType->updateDocuments($documents);
                // reset of arrays
                $currentSetOfIds = [];
                $documents = [];
            }
        }
    }

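I update the documents in batches of the same size, but this still did not improve the performance of the serialize call. I really don't understand why it makes any difference to the serializer whether I have 9k or 10k entities, since it never knows how many there are...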
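The batching idea in this answer can also be written with `array_chunk`, which removes the modulo bookkeeping. The following is only a minimal sketch under stated assumptions: `processInBatches`, `$fetchBatch` and `$pushBatch` are hypothetical names, the two callbacks stand in for the repository lookup and the Elastica bulk update, and `json_encode` stands in for the JMS serializer call.

```php
<?php
declare(strict_types=1);

/**
 * Process ids in fixed-size batches.
 * $fetchBatch and $pushBatch are hypothetical callbacks standing in for
 * the repository lookup and the bulk update of the original code.
 */
function processInBatches(array $ids, int $batchSize, callable $fetchBatch, callable $pushBatch): int
{
    $batches = 0;
    foreach (array_chunk($ids, $batchSize) as $chunk) {
        $documents = [];
        foreach ($fetchBatch($chunk) as $id => $entity) {
            // Stand-in for the real serializer call (assumption).
            $documents[$id] = json_encode($entity, JSON_THROW_ON_ERROR);
        }
        $pushBatch($documents);
        $batches++;
        // $documents is rebuilt from scratch on the next iteration,
        // so at most one batch of serialized data is held at a time.
    }
    return $batches;
}

// Usage with fake data: 2500 ids, batches of 1000.
$sizes = [];
$batches = processInBatches(
    range(1, 2500),
    1000,
    fn (array $chunk): array => array_map(
        fn (int $id): array => ['id' => $id],
        array_combine($chunk, $chunk)
    ),
    function (array $documents) use (&$sizes): void {
        $sizes[] = count($documents);
    }
);
echo $batches, ' batches: ', implode(', ', $sizes), PHP_EOL; // 3 batches: 1000, 1000, 500
```

If the entities come from Doctrine, the ORM documentation on batch processing also suggests calling `clear()` on the EntityManager between batches so that already processed entities can be released. Whether that is the cause of the slowdown described here is not established by this thread.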

2021-10-15

阿波羅的戰車

Contributed 1862 experience points, received more than 6 likes

In my opinion, you should check memory consumption: you are building one big array that holds a lot of objects.
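One way to check this is to record the elapsed time and memory after each batch and see whether either jumps around the 10k mark. This is a rough sketch, not a profiler: `profileBatches` is a hypothetical helper, and the workload below is fake data that deliberately keeps every result in memory.

```php
<?php
declare(strict_types=1);

/**
 * Run $work once per batch and record elapsed time and memory after each one.
 */
function profileBatches(iterable $batches, callable $work): array
{
    $stats = [];
    foreach ($batches as $index => $batch) {
        $start = hrtime(true); // nanoseconds
        $work($batch);
        $stats[] = [
            'batch' => $index,
            'ms' => (hrtime(true) - $start) / 1e6,
            'memory_mb' => memory_get_usage(true) / 1048576,
        ];
    }
    return $stats;
}

// Fake workload (assumption): encode small arrays and keep every result,
// which mimics an ever-growing $documents array.
$kept = [];
$stats = profileBatches(
    array_chunk(range(1, 3000), 1000),
    function (array $batch) use (&$kept): void {
        foreach ($batch as $id) {
            $kept[] = json_encode(['id' => $id, 'payload' => str_repeat('x', 200)]);
        }
    }
);

foreach ($stats as $row) {
    printf("batch %d: %.2f ms, %.1f MB\n", $row['batch'], $row['ms'], $row['memory_mb']);
}
```

If memory climbs steadily while the time per batch stays flat and then suddenly degrades, that points toward memory pressure rather than the serializer itself.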

You have two solutions: use generators to avoid building that array, or push your documents every "x" iterations and reset your array.
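The two suggestions can be combined: a generator serializes entities lazily, and a second generator groups them into batches that are pushed and then discarded. This is a sketch under assumptions: `serializeLazily` and `inBatches` are hypothetical helpers, `json_encode` stands in for the JMS serializer, and the entity source is fake data.

```php
<?php
declare(strict_types=1);

/**
 * Lazily yield one serialized document at a time.
 * json_encode stands in for the real serializer call (assumption).
 */
function serializeLazily(iterable $entities): Generator
{
    foreach ($entities as $id => $entity) {
        yield $id => json_encode($entity, JSON_THROW_ON_ERROR);
    }
}

/**
 * Group any iterable into arrays of at most $size items.
 */
function inBatches(iterable $items, int $size): Generator
{
    $buffer = [];
    foreach ($items as $key => $item) {
        $buffer[$key] = $item;
        if (count($buffer) === $size) {
            yield $buffer;
            $buffer = []; // reset the array after every push
        }
    }
    if ($buffer !== []) {
        yield $buffer; // final, possibly smaller batch
    }
}

// Fake entity source (assumption): 2500 small arrays keyed by id.
$entities = (function (): Generator {
    for ($id = 1; $id <= 2500; $id++) {
        yield $id => ['id' => $id, 'title' => "Book $id"];
    }
})();

$pushed = [];
foreach (inBatches(serializeLazily($entities), 1000) as $documents) {
    // The bulk update of the original code would go here.
    $pushed[] = count($documents);
}
echo implode(', ', $pushed), PHP_EOL; // 1000, 1000, 500
```

Because both helpers are generators, only one batch of serialized documents exists at any moment, regardless of how many entities the source produces.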

I hope this gives you an idea of how to handle this kind of migration.

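By the way, I almost forgot to tell you to avoid using ORM/ODM repositories to retrieve data (in a migration script). The problem is that they build objects and hydrate them, and honestly, in a huge migration script you will just wait forever. If possible, simply use the Database object, which is probably enough for what you need.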
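To illustrate reading rows without object hydration, here is a self-contained sketch that uses plain PDO with an in-memory SQLite database. It assumes the `pdo_sqlite` extension is enabled; the `book` table, its columns and the `rowsAsJson` helper are hypothetical and only loosely modelled on the author/books example from the question.

```php
<?php
declare(strict_types=1);

// In-memory SQLite database so the sketch runs on its own (assumption: pdo_sqlite is enabled).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_name TEXT)');

$insert = $pdo->prepare('INSERT INTO book (id, title, author_name) VALUES (?, ?, ?)');
for ($id = 1; $id <= 5; $id++) {
    $insert->execute([$id, "Book $id", 'Jane Doe']);
}

/**
 * Stream rows as plain associative arrays and encode each one directly,
 * without building or hydrating entity objects.
 */
function rowsAsJson(PDO $pdo, string $sql): Generator
{
    $statement = $pdo->query($sql);
    while (($row = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
        yield (int) $row['id'] => json_encode($row, JSON_THROW_ON_ERROR);
    }
}

$documents = iterator_to_array(
    rowsAsJson($pdo, 'SELECT id, title, author_name FROM book ORDER BY id')
);
echo count($documents), PHP_EOL; // 5
echo $documents[1], PHP_EOL;
```

In a Symfony project the Doctrine DBAL connection, or a query fetched with `getArrayResult()`, would probably play the role that PDO plays in this sketch. The trade-off is that serialization groups defined on the entities no longer apply, so the selected columns have to match the target document by hand.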

2021-10-15
