Enterprise Solr getSize and Attribute Indexing Issues

If you use Magento EE with Solr enabled for search, you may run into a few issues depending on how you customize your template. The first is that the Enterprise Search engine for Solr does not reset the result count when 0 results are found. The other two concern indexing: an attribute is not added to the index if its only enabled setting is “Used for Sorting in Product Listing”, and datetime attributes on non-simple products are indexed in a way that breaks sorting.

The first one is relatively easy to reproduce, although you probably will not run into it very often. On a website I was developing, the client wanted to break out the search results into individual categories and display the top few results of each category on the page. The page would then link out to each individual category to view all the results inside that category. The problem we found was that, if the previous category returned results and the current one had none, the collection would still report the same number of results as the previous category.

This made a mess of the display: categories were listed as having results, yet nothing appeared under them! The bug was eventually traced to the _prepareQueryResponse() method in Enterprise_Search_Model_Adapter_Abstract. See if you can spot the issue:

protected function _prepareQueryResponse($response)
{
    $realResponse = $response->response;
    $_docs = $realResponse->docs;
    if (!$_docs) {
        return array();
    }
    $this->_lastNumFound = (int)$realResponse->numFound;
    $result = array();
    foreach ($_docs as $doc) {
        $result[] = $this->_objectToArray($doc);
    }

    return $result;
}

If you thought the issue is at line 6 of the snippet (the early return array();), you would be correct. When no results are found, the method returns an empty array without touching $this->_lastNumFound, so the count from the previous query survives. Only when results are found does it set $this->_lastNumFound before returning. The fix may seem to be simply extending this class in a module and overriding the method, but it is not that easy.

The classes the Enterprise Search module actually uses extend this abstract class by its full class name, so rewriting the abstract class has no effect; we need to extend one or both of the concrete classes instead. Extend Enterprise_Search_Model_Adapter_HttpStream if you do not have the Solr PHP extension installed, Enterprise_Search_Model_Adapter_PhpExtension if you do, or both to cover either setup. Change the method so it sets _lastNumFound before returning, like so:

protected function _prepareQueryResponse($response)
{
    $realResponse = $response->response;
    $_docs = $realResponse->docs;
    if (!$_docs) {
        // Added this line to set results to 0 before returning
        $this->_lastNumFound = 0;
        return array();
    }
    $this->_lastNumFound = (int)$realResponse->numFound;
    $result = array();
    foreach ($_docs as $doc) {
        $result[] = $this->_objectToArray($doc);
    }

    return $result;
}

Now getSize() should correctly return 0 when a query finds no results, even if the previous query had some.
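To see the stale count in isolation, here is a minimal sketch that runs outside Magento. StaleCountAdapter and makeResponse() are hypothetical stand-ins written for this illustration, not Magento classes; only the count handling from _prepareQueryResponse() is modelled, and the $resetOnEmpty flag exists purely to switch between the stock and patched behaviour.

```php
<?php
// Hypothetical stand-in for the adapter; not a Magento class.
class StaleCountAdapter
{
    protected $_lastNumFound = 0;

    // $resetOnEmpty = false mimics the stock method, true mimics the patched one.
    public function prepare($response, $resetOnEmpty)
    {
        $realResponse = $response->response;
        if (!$realResponse->docs) {
            if ($resetOnEmpty) {
                $this->_lastNumFound = 0;
            }
            return array();
        }
        $this->_lastNumFound = (int)$realResponse->numFound;
        return $realResponse->docs;
    }

    public function getLastNumFound()
    {
        return $this->_lastNumFound;
    }
}

// Builds a fake response object shaped like the one the method reads.
function makeResponse(array $docs)
{
    $response = new stdClass();
    $response->response = new stdClass();
    $response->response->docs = $docs;
    $response->response->numFound = count($docs);
    return $response;
}

$adapter = new StaleCountAdapter();

// Stock behaviour: a query with results followed by an empty one.
$adapter->prepare(makeResponse(array('a', 'b', 'c')), false);
$adapter->prepare(makeResponse(array()), false);
$staleCount = $adapter->getLastNumFound(); // count left over from the first query

// Patched behaviour: the same two queries with the reset in place.
$adapter->prepare(makeResponse(array('a', 'b', 'c')), true);
$adapter->prepare(makeResponse(array()), true);
$fixedCount = $adapter->getLastNumFound(); // reset by the empty query
```

In the stock path the second, empty query leaves the count at 3, which is exactly what produced the "category has results but lists nothing" display.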

The next issue is how attributes are written to the Solr index, based on the settings chosen in the Manage Attributes section. It involves the same files as above, but this time the problem is inside the _prepareIndexProductData() method. The first adjustment has to do with how it handles attributes used solely for sorting:

// Preparing data for solr fields
if ($attribute->getIsSearchable() || $attribute->getIsVisibleInAdvancedSearch()
   || $attribute->getIsFilterable() || $attribute->getIsFilterableInSearch()
) {

The issue here is that the if() condition has no check for attributes that are used only for sorting in the product listing. This is pretty simple to fix; just adjust the if() condition as follows:

// Preparing data for solr fields
if ($attribute->getIsSearchable() || $attribute->getIsVisibleInAdvancedSearch()
   || $attribute->getIsFilterable() || $attribute->getIsFilterableInSearch()
   || $attribute->getUsedForSortBy()
) {

This makes sure that attributes used solely for sorting are properly added to the Solr index. Again, you will need to extend the same models as above (HttpStream or PhpExtension or both) in order to fix it. The other issue with this method has to do with the following:

if ($backendType == 'datetime') {
    if (is_array($value)) {
        $preparedValue = array();
        foreach ($value as &$val) {
            $val = $this->_getSolrDate($storeId, $val);
            if (!empty($val)) {
                $preparedValue[] = $val;
            }
        }
        unset($val); //clear link to value
        $preparedValue = array_unique($preparedValue);
    } else {
        $preparedValue = $this->_getSolrDate($storeId, $value);
    }
}
...
// Preparing data for sorting field
if ($attribute->getUsedForSortBy()) {
    if (is_array($preparedValue)) {
        if (isset($preparedValue[$productId])) {
            $sortValue = $preparedValue[$productId];
        } else {
            $sortValue = null;
        }
    }

    if (!empty($sortValue)) {
        $fieldName = $this->getSearchEngineFieldName($attribute, 'sort');

        if ($fieldName) {
            $productIndexData[$fieldName] = $sortValue;
        }
    }
}

For most products, this would not be an issue since the datetime value would just be a string and get parsed normally. The problem starts to appear when you use Grouped Products or other product types that give child products to parents. All of a sudden the attribute value is an array and everything starts to go wrong.

You can see that the second block (the sorting code) looks up the value in the $preparedValue array using the key $productId. However, the loop that builds $preparedValue in the first block appends each value with [], which discards all the original keys. The sort lookup therefore cannot find the value under the product's ID, and products become unsortable by this datetime attribute. The fix for this involves editing the method in 3 places:

if ($backendType == 'datetime') {
    if (is_array($value)) {
        $preparedValue = array();
        // Adjusted this to maintain the product ID key since we need it for sorting
        foreach ($value as $valProductId => &$val) {
            $val = $this->_getSolrDate($storeId, $val);
            if (!empty($val)) {
                // Have the new array maintain the productId keys
                $preparedValue[$valProductId] = $val;
            }
        }
        unset($val); //clear link to value
        // Removed the array_unique here since sorting uses direct key and we don't want the
        // expected key to disappear
    } else {
        $preparedValue = $this->_getSolrDate($storeId, $value);
    }
}
...
if ($fieldName && !empty($preparedValue)) {
    $productIndexData[$fieldName] = in_array($backendType, $this->_textFieldTypes)
        // Added array_unique to filter duplicate text before imploding since we removed it above
        ? implode(' ', array_unique((array)$preparedValue))
        : $preparedValue;
}
...
if ($searchWeight) {
    $fulltextData[$searchWeight][] = is_array($preparedValue)
        // Added array_unique to filter duplicate text before imploding since we removed it above
        ? implode(' ', array_unique($preparedValue))
        : $preparedValue;
}

Use the comments in the above snippet to see where and what was changed inside the code. This will fix the problem and allow you to once again use datetime attributes for sorting the product listing with the Solr engine enabled when using non-simple products. Again, same as the previous two fixes, it needs to be done in the extended HttpStream and/or PhpExtension models in order to work.
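The key loss is easy to reproduce in plain PHP. The following is a minimal sketch, not Magento code: the product IDs and dates are made up, and toSolrDate() is a hypothetical stand-in for _getSolrDate(), which in the real adapter also takes the store ID.

```php
<?php
// Hypothetical input: product ID => raw date, as a grouped parent might collect it.
$value = array(101 => '2013-01-05', 102 => '2013-01-05', 103 => '2013-02-10');

// Hypothetical stand-in for _getSolrDate(); the output format is illustrative only.
function toSolrDate($raw)
{
    return $raw . 'T00:00:00Z';
}

// Stock behaviour: appending with [] renumbers the keys from 0.
$rebuilt = array();
foreach ($value as $val) {
    $rebuilt[] = toSolrDate($val);
}
$rebuilt = array_unique($rebuilt);

// Patched behaviour: carry the product ID across as the key.
$keyed = array();
foreach ($value as $productId => $val) {
    $keyed[$productId] = toSolrDate($val);
}

$stockHasKey   = isset($rebuilt[102]); // the lookup the sorting block performs
$patchedHasKey = isset($keyed[102]);

// Why array_unique had to go: it keeps only the first key of each duplicate value,
// so 102 (same date as 101) would vanish even from the keyed array.
$uniqueKeyed  = array_unique($keyed);
$uniqueHasKey = isset($uniqueKeyed[102]);
```

Only the keyed array without array_unique still holds an entry for product 102, which is why the patch moves the de-duplication to the two implode() calls.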

Make sure to run a full reindex of the catalogsearch_fulltext index after applying the indexing fixes so that the corrected data is written to the Solr index.

Update: I created an extension containing the above fixes; you can download it here.
