A guide for .NET Developers on using Elasticsearch

Is Elasticsearch a good choice for .NET developers despite being built on Java? Absolutely! Elasticsearch offers compelling reasons to consider it for full-text searching in any project.

In recent years, Elasticsearch has significantly advanced, simplifying full-text search and providing advanced features like text autocompletion and aggregation pipelines.

Don’t worry if incorporating a Java-based service into your .NET environment seems daunting. Once Elasticsearch is installed and configured, you’ll primarily work with a fantastic .NET package: NEST.

This article guides you on utilizing the powerful Elasticsearch search engine in your .NET projects.

Installation and Configuration

Setting up Elasticsearch in your development environment involves downloading Elasticsearch and, if desired, Kibana.

Once unzipped, you can utilize a batch file similar to this:

cd "D:\elastic\elasticsearch-5.2.2\bin"
start elasticsearch.bat

cd "D:\elastic\kibana-5.0.0-windows-x86\bin"
start kibana.bat

exit

After initiating both services, access the local Kibana server (typically at http://localhost:5601) to experiment with indexes, types, and searches using pure JSON, as detailed here.
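For example, a minimal search request can be issued straight from the Kibana console; the articles index and title field below are placeholders, not part of any sample project:

```json
GET articles/_search
{
  "query": {
    "match": { "title": "wooden spoon" }
  },
  "size": 10
}
```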

Getting Started

As a diligent developer with management support, begin by creating a unit test project and writing a SearchService with at least 90% code coverage.

The initial step involves configuring the app.config file to provide a connection string for the Elasticsearch server.

While Elasticsearch is free, consider Elastic.co’s Elastic Cloud service. This hosted service streamlines maintenance and configuration, and offers a two-week free trial for testing.

For local execution, a configuration key like this suffices:

<add key="Search-Uri" value="http://localhost:9200" />

Elasticsearch uses port 9200 by default, but you can customize it.

ElasticClient and the NEST Package

ElasticClient, provided by the NEST package, handles most tasks for us.

Install the package first.

Configure the client like so:

var node = new Uri(ConfigurationManager.AppSettings["Search-Uri"]);
var settings = new ConnectionSettings(node);
settings.ThrowExceptions(alwaysThrow: true); // I like exceptions
settings.PrettyJson(); // Good for DEBUG
var client = new ElasticClient(settings);

Indexing and Mapping

To enable searching, data needs to be stored in Elasticsearch, a process called “indexing.”

“Mapping” connects your database data to objects that are serialized and stored in Elasticsearch. This tutorial utilizes Entity Framework (EF).

Typically, Elasticsearch is used for site-wide searching. You’ll employ feeds, digests, or Google-like searches encompassing results from various entities like users, blog posts, products, categories, events, etc.

Instead of a single database table or entity, you’ll aggregate diverse data, potentially extracting common properties such as title, description, date, author/owner, image, and more. This usually involves multiple queries, especially when using an ORM for retrieving data from different sources like blog posts, users, products, etc.

Structure your projects by creating an index for each major type, such as “blog post” or “product.” Add specific Elasticsearch types within an index for finer distinctions. For instance, an “article” index could contain types like “story,” “video article,” and “podcast,” all retrieved using a similar database query.

Remember, each index needs at least one type, often named after the index itself.

Create additional classes for mapping entities. A common approach is using a [DocumentSearchItemBase](https://github.com/yohney/elastic-net-example/blob/master/src/Elastic.Example/Elastic.Example.Services/Mappings/SearchItemDocumentBase.cs) class, extended by specialized classes like BlogPostSearchItem, ProductSearchItem, and so on.

Define mapper expressions within these classes for easy modification later.

Early on, a large SearchService class with switch-case statements might be tempting for handling mappings and indexing for different entity types. However, a more elegant solution involves a smart IndexDefinition class and index-specific definition classes.

A base IndexDefinition class can hold a list of available indexes and helpers such as the required analyzers and status reports. Derived index-specific classes handle database querying and data mapping for each index. This approach simplifies adding new entities to Elasticsearch later on: create a new SomeIndexDefinition class inheriting from IndexDefinition and implement the necessary data retrieval methods.

Communicating with Elasticsearch

Elasticsearch’s query language is central to its functionality. Constructing query objects is key to interacting with it.

Underneath, Elasticsearch exposes functionalities as a JSON-based API over HTTP.

While the API and query object structure are intuitive, real-world scenarios can be complex.

A typical Elasticsearch search request needs:

  • Targeted index and types

  • Pagination details (items to skip and return)

  • Specific type selection (for aggregations)

  • The query itself

  • Highlight definition (optional hit highlighting)

For example, you might need a search feature limiting premium content to specific users or restricting content visibility to authors’ “friends.”

Query construction is crucial for addressing such scenarios, particularly the query segment, which is our focus here.

Queries are recursive structures combining BoolQuery and other queries like MatchPhraseQuery, TermsQuery, DateRangeQuery, and ExistsQuery. These cover basic requirements and provide a good starting point.

[MultiMatch](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html) queries are vital for specifying search fields and fine-tuning results, which we’ll revisit later.

MatchPhraseQuery filters by foreign keys (in SQL terms) or static values like enums, such as finding articles by a specific author (AuthorId) or retrieving all public articles (ContentPrivacy=Public).
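Over the wire, such a filter looks roughly like this (field names and values are illustrative):

```json
{
  "query": {
    "bool": {
      "filter": [
        { "match_phrase": { "authorId": "42" } },
        { "match_phrase": { "contentPrivacy": "Public" } }
      ]
    }
  }
}
```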

TermsQuery resembles SQL’s “in” operator, returning articles by a user’s friends or products from specific merchants. Avoid using large arrays (e.g., 10,000 members) for performance reasons.
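In JSON form, a terms query is just a field name plus an array of allowed values (names here are illustrative):

```json
{
  "query": {
    "terms": {
      "authorId": [ "17", "23", "42" ]
    }
  }
}
```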

DateRangeQuery is self-explanatory.

ExistsQuery includes or excludes documents based on the presence of a specific field.

These, combined with BoolQuery, enable complex filtering logic.

Imagine a blog site with an AvailableFrom field indicating post visibility.

A filter like AvailableFrom <= Now wouldn’t include documents lacking this field (common in aggregated data). To address this, combine ExistsQuery with DateRangeQuery within a BoolQuery requiring at least one condition to be met. Example:

BoolQuery
    Should (at least one of the following conditions should be fulfilled)
        DateRangeQuery with AvailableFrom condition
        Negated ExistsQuery for field AvailableFrom
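In raw query DSL, that structure corresponds roughly to the following (the availableFrom field name is illustrative):

```json
{
  "query": {
    "bool": {
      "should": [
        { "range": { "availableFrom": { "lte": "now" } } },
        { "bool": { "must_not": { "exists": { "field": "availableFrom" } } } }
      ],
      "minimum_should_match": 1
    }
  }
}
```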

Negating queries isn’t straightforward but achievable with BoolQuery:

BoolQuery
    MustNot
        ExistsQuery

Automation and Testing

Writing tests alongside development is highly recommended.

This facilitates experimentation and ensures that new changes (e.g., complex filters) don’t break existing functionality. Instead of “unit tests,” consider “integration tests” as mocking Elasticsearch’s behavior might not be accurate.

Practical Examples

With indexing, mapping, and filtering set up, let’s explore tweaking search parameters for better results.

In a recent project, Elasticsearch powered a user feed aggregating content, ordered by creation date, and searchable with various options. While the feed is straightforward (using a date field for ordering), search requires optimization.

Elasticsearch needs guidance on data relevance. Suppose we have data with Title, Tags (array), and Body fields, with the Body field potentially containing HTML.

Handling Spelling Mistakes

Requirement: Search should return results despite spelling errors or different word endings. For example, searching for “thing” or “wood” should match an article titled “Magnificent Things You Can Do with a Wooden Spoon.”

We’ll utilize analyzers, tokenizers, char filters, and token filters applied during indexing.

  • Define analyzers, potentially per index.

  • Apply analyzers to specific document fields using attributes or fluent API. Attributes are used in this example.

  • Analyzers combine filters, char filters, and tokenizers.

For partial word matching, create the “autocomplete” analyzer:

  • English stopwords filter: Removes common English words like “and” or “the.”

  • Trim filter: Removes surrounding whitespace from tokens.

  • Lowercase filter: Converts characters to lowercase for case-insensitive searching.

  • Edge-n-gram tokenizer: Enables partial matching by storing word fragments. For instance, “My granny has a wooden chair” would be stored as “woo,” “wood,” “woode,” and “wooden,” allowing matches for “wood” with at least three letters. MinGram and MaxGram define the minimum and maximum character lengths (3 and 15 in our case).

Combine these elements:

analysis
	.Analyzers(a => a
		.Custom("autocomplete", cc => cc
			.Filters("eng_stopwords", "trim", "lowercase")
			.Tokenizer("autocomplete")
		)
	)
	.Tokenizers(t => t
		.EdgeNGram("autocomplete", e => e
			.MinGram(3)
			.MaxGram(15)
			.TokenChars(TokenChar.Letter, TokenChar.Digit)
		)
	)
	.TokenFilters(f => f
		.Stop("eng_stopwords", s => s
			.StopWords("_english_")
		)
	);
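Once the index exists, you can sanity-check an analyzer with the _analyze API (the index name is a placeholder):

```json
POST my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "My granny has a wooden chair"
}
```

The response lists the generated tokens, so you can verify that fragments like "woo," "wood," and "woode" are actually being stored.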

Apply the analyzer to desired fields:

public class SearchItemDocumentBase
{
	...

	[Text(Analyzer = "autocomplete", Name = nameof(Title))]
	public string Title { get; set; }
	
	...
}

Now, let’s address common requirements in content-rich applications.

Removing HTML

Requirement: Some fields might contain HTML.

A search for terms like "section" or "body" shouldn't match the HTML tags themselves. Strip the HTML during indexing, leaving only the content.

Elasticsearch provides a helpful char filter:

analysis.Analyzers(a => a
	.Custom("html_stripper", cc => cc
		.Filters("eng_stopwords", "trim", "lowercase")
		.CharFilters("html_strip")
		.Tokenizer("autocomplete")
	)
);

Apply it like this:

[Text(Analyzer = "html_stripper", Name = nameof(HtmlText))]
public string HtmlText { get; set; }

Field Importance

Requirement: Title matches should be prioritized over content matches.

Elasticsearch allows boosting results based on the matched field using the boost option in the search query:

const int titleBoost = 15;

.Query(qx => qx.MultiMatch(m => m
	.Query(searchRequest.Query.ToLower())
	.Fields(ff => ff
		.Field(f => f.Title, boost: titleBoost)
		.Field(f => f.Summary)
		...
	)
	.Type(TextQueryType.BestFields)
) && filteringQuery)

The MultiMatch query proves valuable in such scenarios, which are common when certain fields are more relevant than others.

Fine-tuning boost values requires experimentation.
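In raw query DSL, per-field boosts are written with a caret suffix; this is roughly the JSON the NEST code above produces (query text and field names are illustrative):

```json
{
  "query": {
    "multi_match": {
      "query": "wooden spoon",
      "fields": [ "title^15", "summary" ],
      "type": "best_fields"
    }
  }
}
```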

Content Prioritization

Requirement: Rank certain articles higher based on author importance or engagement metrics (likes, shares, upvotes).

Implement a custom scoring function in Elasticsearch. We'll define an importance field (a double value greater than 1) representing how prominent an article should be; in the code below it is called Rating. You can define your own importance factor, and multiple boost and scoring modes are possible. Here's one approach:

.Query(q => q
	.FunctionScore(fsc => fsc
		.BoostMode(FunctionBoostMode.Multiply)
		.ScoreMode(FunctionScoreMode.Sum)
		.Functions(f => f
			.FieldValueFactor(b => b
				.Field(nameof(SearchItemDocumentBase.Rating))
				.Missing(0.7)
				.Modifier(FieldValueFactorModifier.None)
			)
		)
		.Query(qx => qx.MultiMatch(m => m
			.Query(searchRequest.Query.ToLower())
			.Fields(ff => ff
				...
			)
			.Type(TextQueryType.BestFields)
		) && filteringQuery)
	)
)
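The JSON this produces is approximately the following (the filtering part is omitted, and the query text and field names are illustrative):

```json
{
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "score_mode": "sum",
      "functions": [
        {
          "field_value_factor": {
            "field": "rating",
            "missing": 0.7
          }
        }
      ],
      "query": {
        "multi_match": {
          "query": "wooden spoon",
          "fields": [ "title^15", "summary" ]
        }
      }
    }
  }
}
```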

Prioritizing Full-Word Matches

Requirement: Prioritize full-word matches over partial matches.

To prioritize exact matches, add a “Keywords” field to your documents. Instead of the autocomplete analyzer, use a keyword tokenizer and a boost factor to prioritize exact matches.

This field will only match complete words, unlike the autocomplete analyzer.
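A sketch of what such an analyzer could look like in index settings; the full_word analyzer name is an assumption for illustration, not part of the sample project:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "full_word": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "trim", "lowercase" ]
        }
      }
    }
  }
}
```

Map the Keywords field to this analyzer and give it a high boost in the MultiMatch fields list (e.g., keywords^10) so exact matches outrank edge-n-gram fragments.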

Conclusion

This article provided an overview of setting up Elasticsearch in your .NET project and creating a powerful search feature.

While there’s a learning curve, the effort is worthwhile, especially when achieving excellent search results.

Always include comprehensive test cases with expected outcomes to avoid regressions when experimenting with parameters.

The complete code for this article is available on GitHub; it uses data from the TMDB database to demonstrate the search result improvements.

Licensed under CC BY-NC-SA 4.0