 
Elasticsearch is a search engine based on Lucene, a free and open source information retrieval software library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It can be used as a replacement for document stores like MongoDB. Elasticsearch uses denormalization to improve search performance and is one of the most popular enterprise search engines, used by many large organizations such as Wikipedia, The Guardian, Stack Overflow, and GitHub.
Elasticsearch is developed in Java and is released under the terms of the Apache License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking (https://en.wikipedia.org/wiki/DB-Engines_ranking), Elasticsearch is the most popular enterprise search engine, followed by Apache Solr.
The Elastic Stack, formerly the “ELK stack”, is built around Elasticsearch, Logstash, and Kibana, the three products behind the ELK acronym.
These products are designed to work as an integrated solution.
Elasticsearch is a near real-time search platform: there is a slight latency between the time a document is indexed and the time it becomes searchable.
Data is stored in nodes. A node can be thought of as a single server. A cluster consists of one or more nodes that together hold your entire data set and provide indexing and search capabilities across all nodes.
Documents with similar characteristics can be grouped into an index. Any number of indexes can exist in a single cluster.
One or more types can be defined on an index. A type is a logical category or partition of the index whose semantics are defined by the developer. The basic unit of information in Elasticsearch is a document, which is expressed in JSON.
Within an index/type, you can store as many documents as you want. Although a document physically resides in an index, it must be assigned to a type inside that index.
To use hardware efficiently, Elasticsearch lets you subdivide an index into multiple pieces called shards. Sharding is important for two main reasons:
→ It allows you to horizontally split or scale your content volume.
→ It allows you to distribute and parallelize operations across shards.
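To make the idea of splitting content across shards concrete, here is a minimal Python sketch. It assumes documents are routed by hashing their ID and taking the result modulo the number of primary shards. The function names are hypothetical, and CRC32 is only a stand-in hash chosen because it is in the standard library; it is not the hash Elasticsearch uses.

```python
import zlib


def route_to_shard(doc_id, num_primary_shards):
    """Pick a shard number for a document by hashing its ID.

    CRC32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they are routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[route_to_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards
```

Calling `distribute(["doc-1", "doc-2", "doc-3"], 3)` returns a dictionary keyed by shard number. Because the same ID always hashes to the same shard, a document can be found again without consulting every shard, while a search can still be run on all shards in parallel.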
In a network or cloud environment, failures can occur at any time, so it is highly recommended to have a failover mechanism in case a shard or node goes offline or fails. For this purpose, Elasticsearch allows you to make one or more copies of your shards, called replica shards or replicas.
Replication is important mainly because:
→ It provides high availability in case a shard or node fails. Note that a replica shard is never allocated to the same node as the original shard.
→ It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
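The rule that a replica never shares a node with its original shard can be illustrated with a toy allocator. This is only a sketch under simplifying assumptions: the `allocate` function and the round-robin placement are hypothetical and do not reflect how Elasticsearch actually balances shards. The sketch also raises an error when there are too few nodes, which is a choice made here for clarity.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each shard's primary and replicas on distinct nodes.

    Returns {shard_number: [primary_node, replica_node, ...]}.
    Placement is plain round-robin, used for illustration only.
    """
    if num_replicas >= len(nodes):
        raise ValueError(
            "need more nodes than replicas so that no two copies share a node"
        )
    layout = {}
    for shard in range(num_shards):
        # Copy 0 is the primary; copies 1..num_replicas are the replicas.
        layout[shard] = [
            nodes[(shard + copy) % len(nodes)]
            for copy in range(num_replicas + 1)
        ]
    return layout
```

With three nodes and one replica, `allocate(3, 1, ["node-a", "node-b", "node-c"])` puts every primary and its replica on different nodes, so losing any single node still leaves one copy of each shard available.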
In Elasticsearch, an index is a collection of types, just as a database is a collection of tables in an RDBMS (Relational Database Management System). Every table is a collection of rows, just as every mapping is a collection of JSON objects in Elasticsearch.
| Elasticsearch | RDBMS |
| --- | --- |
| Index | Database |
| Shard | Shard |
| Mapping | Table |
| Field | Field |
| JSON Object | Tuple |
Analysis is the process of converting text, like the body of any email, into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer which can be either a built-in analyzer or a custom analyzer defined per index.
Let’s consider an example in which the built-in ‘english’ analyzer processes the following sentence at index time.
"The QUICK brown foxes jumped over the lazy dog!"
The ‘english’ analyzer first splits this sentence into distinct tokens. It then lowercases each token, removes frequent stop words (“the”), and reduces the terms to their word stems (foxes → fox, jumped → jump, lazy → lazi). In the end, the following terms are added to the inverted index:
[ quick, brown, fox, jump, over, lazi, dog ]
This same analysis process is applied to the query string at search time in full text queries like the match query to convert the text in the query string into terms of the same form as those that are stored in the inverted index.
For instance, a user might search for:
"a quick fox"
which would be analyzed by the same ‘english’ analyzer into the following terms:
[ quick, fox ]
As we have applied the same analyzer to both the text and the query string, the terms from the query string exactly match the terms from the text in the inverted index, which means that this query would match our example document.
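The index-time and search-time symmetry described above can be sketched in a few lines of Python. This is a toy model, not the real ‘english’ analyzer: the stop list and the `STEMS` table are hand-written stand-ins that only cover the two stemming examples named in the text, and all function names are hypothetical. Other words may therefore come out differently than with a real stemmer.

```python
import re

STOPWORDS = {"a", "the"}                      # tiny illustrative stop list
STEMS = {"foxes": "fox", "jumped": "jump"}    # stand-in for a real stemmer


def analyze(text):
    """Lowercase, split on non-letters, drop stop words, apply the stem table."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [STEMS.get(t, t) for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Build an inverted index: term -> set of IDs of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain at least one analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits
```

Indexing the example sentence as document 1 and then calling `search(index, "a quick fox")` finds that document, because `analyze` reduces both the stored text and the query string to terms of the same form before they are compared.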
Usually the same analyzer should be used at both index time and search time. Full text queries like the match query use the mapping to look up the analyzer for each field.
An analyzer — whether built-in or custom — is just a package which contains three lower-level building blocks: character filters, tokenizers, and token filters.
A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to strip HTML elements like <b> from the stream.
An analyzer may have zero or more character filters, which are applied in order.
A tokenizer receives a stream of characters, breaks it up into individual tokens, and outputs a stream of tokens. For example, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
The tokenizer is also responsible for recording the order or position of each term and the start and end character offsets of the original word which the term represents.
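As a rough illustration of what a tokenizer records, the following Python sketch splits text on whitespace and keeps each token's position and character offsets. The `Token` type and `whitespace_tokenize` function are hypothetical names for this example and are not part of any Elasticsearch API.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    term: str
    position: int       # order of the token in the stream, starting at 0
    start_offset: int   # index of the token's first character in the text
    end_offset: int     # index just past the token's last character


def whitespace_tokenize(text):
    """Split on whitespace, recording each token's position and offsets."""
    return [
        Token(m.group(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]
```

For the text `"Quick brown fox!"` this yields `Quick` at position 0 (offsets 0 to 5), `brown` at position 1 (offsets 6 to 11), and `fox!` at position 2 (offsets 12 to 16). Positions are what make phrase and proximity queries possible, while offsets allow the original words to be highlighted in results.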
An analyzer must have exactly one tokenizer.
A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase.
An analyzer may have zero or more token filters, which are applied in order.
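The three building blocks can be composed as plain functions. The sketch below assumes a simplified model in which character filters map text to text, the tokenizer maps text to a list of tokens, and token filters map a token list to a token list. All names are hypothetical, and the HTML-stripping regular expression is deliberately naive.

```python
import re


def strip_html(text):
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: break the text into tokens on whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: convert every token to lowercase."""
    return [t.lower() for t in tokens]


def make_analyzer(char_filters, tokenizer, token_filters):
    """Build an analyzer from zero or more character filters, exactly one
    tokenizer, and zero or more token filters, applied in that order."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze
```

For example, `make_analyzer([strip_html], whitespace_tokenizer, [lowercase])("<b>Quick</b> brown fox!")` returns `["quick", "brown", "fox!"]`. Passing empty lists for the filters leaves only the tokenizer, which mirrors the rule that filters are optional but the tokenizer is not.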
Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration:
The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
The stop analyzer is like the simple analyzer, but also supports removal of stop words.
The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
The pattern analyzer uses a regular expression to split the text into terms. It supports lower-casing and stop words.
Elasticsearch provides many language-specific analyzers like english or french.
The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
If you do not find an analyzer suitable for your needs, you can create a custom analyzer which combines the appropriate character filters, tokenizer, and token filters.
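To compare how some of the built-in analyzers described above treat the same input, here are rough Python approximations of the simple, whitespace, keyword, and stop analyzers. They follow only the one-line descriptions given in the text; the function names and the three-word stop list are assumptions made for this sketch, and the real analyzers handle many more cases.

```python
import re


def simple_analyzer(text):
    """Split wherever a non-letter character occurs, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def keyword_analyzer(text):
    """Return the whole input unchanged as a single term."""
    return [text]


def stop_analyzer(text, stopwords=frozenset({"the", "a", "an"})):
    """Like the simple analyzer, but also drop stop words."""
    return [t for t in simple_analyzer(text) if t not in stopwords]
```

Given `"The 2 QUICK Brown-Foxes!"`, the simple version produces `["the", "quick", "brown", "foxes"]`, the whitespace version produces `["The", "2", "QUICK", "Brown-Foxes!"]`, the keyword version returns the full string as one term, and the stop version produces `["quick", "brown", "foxes"]`.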
The high-level full text queries are usually used for running full text queries on full text fields like the body of an email. They understand how the field being queried is analyzed and will apply each field’s analyzer to the query string before executing.
The queries in this group are:
match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
match_phrase query
Like the match query but used for matching exact phrases or word proximity matches.
match_phrase_prefix query
The poor man’s search-as-you-type. Like the match_phrase query, but does a wildcard search on the final word.
multi_match query
The multi-field version of the match query.
common terms query
A more specialized query which gives more preference to uncommon words.
query_string query
Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
simple_query_string query
A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
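The difference between the match, match_phrase, and match_phrase_prefix queries can be sketched on a single document. This toy Python version ignores scoring, fuzziness, and analyzers beyond lowercasing and whitespace splitting. The function names mirror the query names for readability only and are not client API calls.

```python
def tokenize(text):
    """Minimal stand-in for analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match(doc, query):
    """True if the document shares at least one term with the query."""
    return bool(set(tokenize(doc)) & set(tokenize(query)))


def match_phrase(doc, query):
    """True if the query terms occur in the document consecutively, in order."""
    doc_terms, query_terms = tokenize(doc), tokenize(query)
    n = len(query_terms)
    return n > 0 and any(
        doc_terms[i:i + n] == query_terms
        for i in range(len(doc_terms) - n + 1)
    )


def match_phrase_prefix(doc, query):
    """Like match_phrase, but the last query term only has to be a prefix."""
    doc_terms, query_terms = tokenize(doc), tokenize(query)
    n = len(query_terms)
    if n == 0:
        return False
    for i in range(len(doc_terms) - n + 1):
        window = doc_terms[i:i + n]
        if window[:-1] == query_terms[:-1] and window[-1].startswith(query_terms[-1]):
            return True
    return False
```

For the document `"quick brown fox"`, `match` accepts `"brown dog"` because one term overlaps, `match_phrase` accepts `"brown fox"` but rejects `"fox brown"`, and `match_phrase_prefix` accepts `"quick bro"`, which is the behaviour that makes it usable for simple search-as-you-type.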
Configuring an index with the ‘english’ analyzer:
PUT /my_index
{
    "mappings": {
        "blog": {
            "properties": {
                "title": {
                    "type": "string",
                    "analyzer": "english"
                }
            }
        }
    }
}
Then we can add some sample documents to this index.
PUT /my_index/blog/1
{
    "title": "I'm happy for this fox"
}
PUT /my_index/blog/2
{
    "title": "I'm not happy about my fox problem"
}
Now, search the documents on the title field with the ‘english’ analyzer.
GET /_search
{
    "query": {
        "match_phrase" : {
            "title" : {
                "query" : "happy",
                "analyzer" : "english"
            }
        }
    }
}
Both documents are returned in the search result.
References and courtesy:
http://www.elasticsearchtutorial.com
https://www.tutorialspoint.com