CloudSearch is a fully-managed, full-featured search service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution.
You can quickly add rich search capabilities to your website or application.
CloudSearch
Automatically provisions the required resources.
Deploys a highly tuned search index.
Is easy to configure and can be up and running in less than one hour.
Provides search and the ability to upload searchable data.
Automatically scales for data and traffic.
Provides self-healing clusters.
Offers high availability with Multi-AZ.
CloudSearch uses Apache Solr as the underlying text search engine.
Apache Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Can be used to index and search both structured and unstructured data.
Content can come from multiple sources and can include database fields along with files in a variety of formats, web pages, and so on.
Supports indexing features such as algorithmic stemming, dictionary stemming, and stopword dictionaries.
Stemming is the process of reducing a word to its root form, which ensures that variants of a word match during a search, e.g. am, are, is ⇒ be.
Algorithmic stemmers, which stem words based on a set of rules.
Dictionary stemmers, which stem words by looking them up in a dictionary.
Stop words are filler words that help sentences flow better but provide very little context on their own, such as “to”, “I”, “has”, “the”, “be”, and “or”.
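The stemming and stop word ideas above can be sketched in a few lines of Python. This is a toy illustration, not how CloudSearch or Solr implement analysis: the suffix rules, the dictionary entries, and the function names are all assumptions made for the example.

```python
# Toy illustration only: real analyzers use far richer rules and dictionaries.

STOP_WORDS = {"to", "i", "has", "the", "be", "or"}  # sample words from the text

# Dictionary stemmer: look each word up in a table (hypothetical entries).
STEM_DICTIONARY = {"am": "be", "are": "be", "is": "be"}

# Algorithmic stemmer: strip suffixes by rule (a deliberately naive rule set).
SUFFIX_RULES = ("ing", "ed", "s")


def dictionary_stem(word: str) -> str:
    """Return the dictionary root if one is listed, else the word unchanged."""
    return STEM_DICTIONARY.get(word, word)


def algorithmic_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, dictionary-stem, drop stop words, then apply suffix rules."""
    tokens = [dictionary_stem(t) for t in text.lower().split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [algorithmic_stem(t) for t in tokens]


print(analyze("The dogs are jumping"))
```

With this rule set, "are" is first mapped to "be" by the dictionary and then dropped as a stop word, while "dogs" and "jumping" are reduced by the suffix rules.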
Supports customizable result ranking, i.e. relevancy.
Supports search features such as text search, different query types (range, boolean, etc.), sorting, facets for filtering, and grouping.
Supports enhanced features such as auto suggestions, highlighting, spatial search, and fuzzy search.
CloudSearch supports a Multi-AZ option, which deploys additional instances in a second AZ in the same region.
CloudSearch can offer significantly lower total cost of ownership compared to operating and managing your own search environment.
Search domain is a data container and a set of services that make the data searchable.
Document service that allows uploading data to the domain for indexing.
Search service that enables search requests against the indexed data.
Configuration service for controlling the domain’s behavior (including relevance ranking).
A search domain can’t be automatically migrated from one region to another. A new domain must be created and configured in the target region, the data uploaded to it, and then the original domain deleted.
Indexed data to be made searchable
Can be submitted through a REST-based web service URL.
Has to be in JSON or XML format.
Is represented as a document with a unique document ID and multiple fields that are either searched on or simply returned in results.
CloudSearch generates a search index from the document data according to the index fields configured for the domain.
Data updates can be submitted to add, update, and delete documents.
Data can be uploaded over a secure, encrypted HTTPS (SSL) connection.
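As a rough sketch of what such an upload can look like, the snippet below builds a JSON document batch in the shape used by the CloudSearch 2013-01-01 document API: a list of operations, each with a "type" and an "id", plus "fields" for adds. The document IDs, field names, and helper functions are hypothetical, and the fields would need to match the index fields configured for the domain.

```python
import json


def add_op(doc_id: str, fields: dict) -> dict:
    """One 'add' operation; adding an existing ID replaces that document."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id: str) -> dict:
    """One 'delete' operation; only the document ID is needed."""
    return {"type": "delete", "id": doc_id}


def build_batch(operations: list) -> str:
    """Serialize the operations into a single JSON batch body."""
    return json.dumps(operations)


# Hypothetical documents and fields, for illustration only.
batch = build_batch([
    add_op("movie_1", {"title": "Example Title", "year": 2013}),
    delete_op("movie_2"),
])
print(batch)
```

The resulting string would then be sent over HTTPS to the domain's document service endpoint with a JSON content type, for example through the `upload_documents` call of the boto3 `cloudsearchdomain` client. The sending step is left out here because it needs a real domain endpoint and credentials.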
Search domains scale in two dimensions: data and traffic
A search instance is a single search engine in the cloud that indexes documents and responds to search requests with a finite amount of RAM and CPU resources for indexing data and processing requests.
A search domain can have one or more search partitions, each a portion of the data that fits on a single search instance, and the number of search partitions can change as documents are indexed.
CloudSearch can determine the size and number of search instances required to deliver low latency, high throughput search performance.
When a search domain is created, a single instance is deployed.
CloudSearch automatically scales the domain by adding instances as the volume of data or traffic increases.
Scaling for data
CloudSearch handles scaling for data by
Vertical scaling by increasing the size of the instance, when the amount of data exceeds the capacity of a single search instance.
Horizontal scaling using search partitions, when the amount of data exceeds the capacity of the largest search instance type.
The number of search instances required to hold the index partitions is sometimes referred to as the domain’s width.
CloudSearch reduces the number of partitions and the size of search instances if the amount of data decreases.
Scaling for traffic
CloudSearch handles scaling for traffic by
Vertical scaling by increasing the size of the instance, when the amount of traffic exceeds the capacity of a single search instance.
Horizontal scaling by deploying a duplicate search instance to provide additional processing power, i.e. the complete set of partitions is duplicated.
CloudSearch reduces the number of duplicate instances and the size of search instances if the traffic decreases.
The number of duplicate search instances is sometimes referred to as the domain’s depth.
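Because each duplicate carries the complete set of partitions, width and depth multiply. The arithmetic can be sketched as below. The capacity figures are hypothetical, CloudSearch makes these sizing decisions itself, and "depth" is assumed here to count every copy of a partition, including the original.

```python
import math


def partitions_needed(index_size_gb: float, largest_instance_gb: float) -> int:
    """Width: partitions required once the index outgrows the largest instance."""
    return max(1, math.ceil(index_size_gb / largest_instance_gb))


def total_instances(width: int, depth: int) -> int:
    """Each copy holds the complete set of partitions, so the counts multiply."""
    return width * depth


# Hypothetical numbers: a 250 GB index, 100 GB per largest instance, 2 copies.
width = partitions_needed(250, 100)
print(width, total_instances(width, depth=2))
```

With these made-up numbers the domain would have a width of 3 and run 6 search instances in total.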
CloudSearch provides features to index and search both structured data and plain text, as well as unstructured data such as PDF and Word documents.
CloudSearch provides near real-time indexing for document updates.
Indexing features include
tokenization
stopwords
stemming
synonyms
Search features include
faceted search: a way to explore large amounts of data by displaying summaries of various partitions of the data and then letting the user narrow the navigation to a specific partition, e.g. analyzing, organizing, and filtering a large product inventory by size, color, price, and brand
free text search, Boolean search expressions
customizable relevance ranking, query-time rank expressions
grouping
field weighting, searching and sorting
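To make the faceted search idea concrete, here is a minimal in-memory sketch: it summarizes a small product list by one field (the facet counts) and then narrows the list to a chosen value. The product data and function names are invented for the example, and a search service computes facets over its index rather than over a Python list.

```python
from collections import Counter

# Hypothetical product inventory, for illustration only.
PRODUCTS = [
    {"name": "T-shirt", "color": "red", "size": "M", "brand": "Acme"},
    {"name": "Hoodie", "color": "red", "size": "L", "brand": "Acme"},
    {"name": "Cap", "color": "blue", "size": "M", "brand": "Globex"},
]


def facet_counts(items: list, field: str) -> Counter:
    """Summarize the items by one field: how many fall under each value."""
    return Counter(item[field] for item in items)


def narrow(items: list, field: str, value: str) -> list:
    """Keep only the items in the chosen facet bucket."""
    return [item for item in items if item[field] == value]


print(facet_counts(PRODUCTS, "color"))
red_items = narrow(PRODUCTS, "color", "red")
print(facet_counts(red_items, "size"))
```

After narrowing to one color, the counts for the remaining fields are recomputed over the smaller set, which is what lets a user drill down step by step.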
Other features like
Autocomplete suggestions
Highlighting
Geospatial search
New data types: date, double, 64-bit signed int, LatLon
Dynamic fields
Index field statistics
Sloppy phrase search
Term boosting
Enhanced range searching for all field types
Search filters that don’t affect relevance
Support for multiple query parsers: simple, structured, lucene, dismax
Query parser configuration options
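As a sketch of how a query parser is selected per request, the snippet below assembles a search URL using the `q` and `q.parser` parameters of the 2013-01-01 search API, with a filter (`fq`) that restricts results without affecting relevance. The endpoint host, field names, and query strings are hypothetical, and the helper function is not part of any SDK.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; a real one is shown in the domain's dashboard.
SEARCH_ENDPOINT = "https://search-example-domain.us-east-1.cloudsearch.amazonaws.com"


def build_search_url(query: str, parser: str = "simple", extra: dict = None) -> str:
    """Build a search request URL for the chosen query parser."""
    params = {"q": query, "q.parser": parser}
    params.update(extra or {})
    return f"{SEARCH_ENDPOINT}/2013-01-01/search?{urlencode(params)}"


# Simple parser: plain free-text search.
print(build_search_url("star wars"))

# Structured parser: a Boolean expression over (hypothetical) fields, plus a
# filter query and a result size limit.
print(build_search_url(
    "(and title:'star' actors:'Harrison Ford')",
    parser="structured",
    extra={"fq": "year:[2000,}", "size": 10},
))
```

Only the URL is built here. Sending it with an HTTP client would require a real domain and an access policy that allows search requests.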