This page gives an overview of the Search microservice, including core features, for the Infosys Equinox platform.
Overview
Search is vital to delivering quality commerce experiences, and it drives business metrics such as conversion rate and cart size. Shoppers often prefer indexed, keyword-based search to browsing by product category. Simply put, if products cannot be found, they cannot be bought.
The Infosys Equinox Search microservice (“Search”) produces high-converting, contextual, type-ahead results for users who look up items defined in Catalog from a consumer-facing storefront. Built on the open-source Apache SOLR, Search gives site administrators and third-party systems integrators the ability to optimize the product search experience with techniques such as real-time spelling correction, synonym-based associations, faceted searching, and weighted results.
Search is coupled with the Catalog microservice. Product lookups are performed against indexes that have been harvested from source catalogs through a queue-based feed process.
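The queue-based feed can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the message shape (`action`, `id`, `fields`), the `drain_feed` helper, and the in-memory dictionary standing in for the index are hypothetical and are not part of the Infosys Equinox API.

```python
import queue


def drain_feed(feed, index):
    """Apply every pending catalog message to the in-memory index.

    Both the message shape and the dict-based index are hypothetical
    stand-ins for the real feed and the real SOLR index.
    """
    applied = 0
    while True:
        try:
            message = feed.get_nowait()
        except queue.Empty:
            break  # nothing left to harvest
        if message["action"] == "upsert":
            index[message["id"]] = message["fields"]
        elif message["action"] == "delete":
            index.pop(message["id"], None)
        applied += 1
    return applied


# Catalog publishes changes to the queue; Search consumes them later.
feed = queue.Queue()
feed.put({"action": "upsert", "id": "sku-1", "fields": {"name": "Oxford shirt"}})
feed.put({"action": "upsert", "id": "sku-2", "fields": {"name": "Chino"}})
feed.put({"action": "delete", "id": "sku-2"})

index = {}
applied = drain_feed(feed, index)
```

Because lookups run against the harvested index rather than the source catalog, a catalog change becomes searchable only after its message has been consumed.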
Core Features
- Industry-standard search is already integrated with other services
- All setup work is completely out of the box
- Type-ahead, similar items, all the basic search functions you would expect
- Automatic facet normalization (for example, size)
- Search suggestions
- Contextual type-ahead (for example, Shirts for Men)
- Boost and Bury search terms
- Synonym mapping
- Filtered search results
- Redirect to non-product search
- Semantic ID support for products and product lists (New!)
- NLP-based search that takes the search context into account. NLP search also lets users find products by describing features that may appear in images. AI-enabled apps support semantic search and image search. Search also supports search-based recommendations and personalized results based on previous shopping patterns. (New!)
- Indexes data for search on the go. Catalog, pricing, and merchandizing updates are applied by automated batch jobs.
SOLR Terminology
S.No | Term | Description |
---|---|---|
1 | Collection | A complete logical index in a SolrCloud cluster. It is associated with a config set and is made up of one or more shards. If the number of shards is more than one, it is a distributed index, but SolrCloud lets you refer to it by the collection name and not worry about the shards parameter that is normally required for DistributedSearch. |
2 | Config Set | A set of config files necessary for a core to function properly. Each config set has a name. At minimum this will consist of solrconfig.xml (SolrConfigXml) and schema.xml (SchemaXml), but depending on the contents of those two files, may include other files. This is stored in Zookeeper. Config sets can be uploaded or updated using the upconfig command in the command-line utility or the bootstrap_confdir SOLR startup parameter. |
3 | Core | A running instance of a Lucene index along with all the SOLR configuration (SolrConfigXml, SchemaXml, etc.) required to use it. A single SOLR application can contain zero or more cores, which run largely in isolation but can communicate with each other if necessary via the CoreContainer. With SolrCloud, the config a core uses is stored in Zookeeper; with traditional SOLR, the core’s config is in the conf directory on the disk. From a historical perspective: SOLR initially only supported one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the “core” of SOLR. When support was added for creating and managing multiple Cores on the fly, the class was refactored to no longer be a Singleton, but the name stuck. |
4 | Leader | The shard replica that has won the leader election. Elections can happen at any time, but normally they are only triggered by events like a SOLR instance going down. When documents are indexed, SolrCloud will forward them to the leader of the shard, and the leader will distribute them to all the shard replicas. |
5 | Replica | One copy of a shard. Each replica exists within SOLR as a core. A collection named “test” created with numShards=1 and replicationFactor set to two will have exactly two replicas, so there will be two cores, each on a different machine (or SOLR instance). One will be named test_shard1_replica1 and the other will be named test_shard1_replica2. One of them will be elected to be the leader. |
6 | Shard | A logical piece (or slice) of a collection. A distributed index is partitioned into “shards”, and each shard contains a disjoint subset of the documents in the index. In SolrCloud, a shard is a logical division made up of one or more replicas, and an election is held to determine which replica is the leader. Outside SolrCloud, the term refers to the SOLR core that holds one such partition. |
7 | Auto-warming | What SOLR does when it opens a new cache, and seeds it with key/value pairs based on the “top” keys from the old instance of the cache. |
8 | Constraint | A viable method of limiting a set of objects. |
9 | DisMax | Typically a reference to the DisMaxQParserPlugin, but in older contexts may refer to the DisMaxRequestHandler. |
10 | Facet | A distinct feature or aspect of a set of objects; “a way in which a resource can be classified”. |
11 | Field Collapsing | A specific use case of Result Grouping where the groups are dictated by the value of a field. |
12 | Filter | Depending on context, may be: (1) another word for “Constraint”; (2) the “fq” param, which constrains the results from a query without influencing the scores; (3) the Lucene “Filter” class specifically. |
13 | NRT | Near Real Time: This refers to the general concept of wanting document updates to be “immediately” visible to search clients. |
14 | Request Handler | A SOLR component that processes requests. For example, the DisMaxRequestHandler processes search queries by calling the DisMax Query Parser. Request Handlers can perform other functions as well. |
15 | QTime | The elapsed time (in milliseconds) between the arrival of the request (when the SolrQueryRequest object is created) and the completion of the request handler. It does not include time spent in the response writer formatting/streaming the response to the client. |
16 | Query Parser | A SOLR component that parses the parameters and search terms submitted in a search query. |
17 | Searcher | In SOLR parlance, the term “Searcher” tends to refer to an instance of the SolrIndexSearcher class. This class is responsible for executing all searches done against the index, and manages several caches. There is typically one Searcher per SolrCore at any given time, and that searcher is used to execute all queries against that SolrCore, but there may be additional Searchers open at a time during cache warming (in which an “old Searcher” is still serving live requests while a “new Searcher” is being warmed up). |
18 | Slop | As in “phrase slop”: the number of positions two tokens need to be moved in order to match a phrase in a query. |
19 | Static warming | What users can do with newSearcher and firstSearcher event listeners to force explicit warming actions when one of these events happens. It frequently involves seeding one or more caches with values from “static” queries hard-coded in solrconfig.xml. |
20 | SOLR Home Dir | Also referred to as the “SOLR Home Directory” or just “SOLR Home”, this is the main directory where SOLR will look for configuration files, data, and plugins. Knowing which directory to use as the SOLR Home is the one piece of information that SOLR must either assume (the default is “./solr”) or be configured using some mechanism beyond SOLR’s normal configuration files. An example SOLR Home is included in SOLR releases and contains a README.txt explaining the directory structure. |
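
Two of the terms above, Filter (the “fq” param) and Slop, show up directly in query parameters. The sketch below builds a select URL to illustrate them. The base URL, the collection name `product`, the helper `build_select_url`, and the field names `index_string_name` and `facet_color` are assumptions for illustration only; `locale` is taken from the schema on this page. How the Search microservice itself composes its SOLR queries is not documented here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, phrase, slop, filters):
    """Build a SOLR select URL with a sloppy phrase query and fq filters."""
    # "..."~N is the standard query syntax for a phrase with slop N.
    params = [("q", f'index_string_name:"{phrase}"~{slop}')]
    # Each fq narrows the result set without influencing the scores.
    params += [("fq", constraint) for constraint in filters]
    params.append(("wt", "json"))
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/product",  # hypothetical host and collection
    phrase="blue shirt",
    slop=2,
    filters=["locale:en_US", "facet_color:blue"],
)

# Decode the query string again to inspect what would be sent.
parsed = parse_qs(urlsplit(url).query)
```

In a JSON response to such a request, QTime is reported inside the `responseHeader` object.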
SOLR Schema
<schema name="product" version="1.5">
  <fields>
    <!-- Mandatory field required for SOLR validation -->
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <!-- search fields -->
    <field name="collectionid" type="long" indexed="true" stored="false"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <dynamicField name="index_number_*" type="float" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="index_string_*" type="text" indexed="true" stored="false" multiValued="true"/>
    <field name="groupid" type="string" indexed="true" stored="false" multiValued="false"/>
    <field name="locale" type="string" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="index_key_*" type="string" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="suggestion_*" type="string" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="key_*" type="string" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="facet_*" type="string" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="range_facet_*" type="float" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="sort_number_*" type="float" indexed="true" stored="false"/>
    <dynamicField name="sort_string_*" type="string" indexed="true" stored="false"/>
    <field name="starttime" type="timestamp" indexed="true" stored="true"/>
    <field name="historicalsale" type="long" indexed="true" stored="false"/>
    <field name="endtime" type="timestamp" indexed="true" stored="true"/>
    <field name="suggestion" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
    <field name="response" type="string" indexed="false" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <!-- common search fields types -->
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="timestamp" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigramsIfNoShingles="true" maxShingleSize="8"
                outputUnigrams="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigramsIfNoShingles="true" maxShingleSize="8"
                outputUnigrams="true"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
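
The dynamic fields in this schema match documents by name prefix, so a product attribute reaches the index under a name such as `index_string_name` or `facet_color`. The sketch below shows one way a document could be shaped to fit the schema. The input layout (`searchable_text`, `facets`, `sortable_numbers`) and the helper `build_solr_document` are hypothetical; this page does not describe how the feed actually populates these fields.

```python
def build_solr_document(product):
    """Shape a product dict into a document matching the schema's field names.

    The input layout is an assumption made for this sketch.
    """
    doc = {
        "id": product["id"],                      # uniqueKey, required
        "collectionid": product["collection_id"],
        "locale": list(product["locales"]),       # multiValued string field
    }
    # index_string_*: analyzed text, multiValued
    for name, value in product.get("searchable_text", {}).items():
        doc[f"index_string_{name}"] = [value]
    # facet_*: exact string values used for faceting, multiValued
    for name, values in product.get("facets", {}).items():
        doc[f"facet_{name}"] = list(values)
    # sort_number_*: single float value per document
    for name, value in product.get("sortable_numbers", {}).items():
        doc[f"sort_number_{name}"] = float(value)
    return doc


doc = build_solr_document({
    "id": "sku-1",
    "collection_id": 42,
    "locales": ["en_US"],
    "searchable_text": {"name": "Oxford shirt"},
    "facets": {"color": ["blue", "white"]},
    "sortable_numbers": {"price": 49},
})
```

Most of these fields are declared with stored="false", so they can be searched, faceted, or sorted on but are not returned in results; the `response` field is the opposite, stored but not indexed.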
Revision History
2024-11-18 | JP – Verified and updated content until release 8.18.8.
2023-01-12 | AN – Updated content for 8.13 release.
2020-09-28 | AN – Updated the Core Features section.
2019-07-12 | AM – Added Core Features section for July 2019 release.
2019-01-23 | PLK – Page created and content uploaded.