Introduction
A customer asked us some questions about full-text search in SQL Server 2019 and I decided to summarize some useful information in this blog entry.
Sometimes you want to perform a search using one or more keywords over one or multiple character columns in a table. Clustered, nonclustered or column stored indexes (organized in a B-Tree structure) will help you with such a task. You can of course use the LIKE operator and do wildcard text searches, but this is still inefficient. Full-text search in SQL Server and Azure SQL lets you perform full-text queries against character based-data in your tables.
Full-text search uses an inverted index which stores keywords from the underlying columns.
For this article I used SQL Server 2019. I provisioned a Windows Server 2019 and installed a fresh SQL Server including the feature “Full-Text search and semantic extraction” at installation time. I you already have an instance of SQL server and want to know if the feature is already installed, you can verify it with the following query:

If the feature is not installed, you need to run the installation wizard again or contact your database administrator. For additional information you can visit this article: Step by Step Process to Add Full Text Search on SQL Server 2019.
Alternatives to SQL Server
This article focuses on how to take advantage of SQL Server functionalities to do full-text search on your queries. However, there are other sophisticated and more scalable approaches you should consider, for example:
- Azure Cognitive Search: part of Azure Cognitive Services, it allows you to index structured and unstructured data for search applications. It also offers semantic search capabilities to understand user intent and contextually rank the most relevant search results.
- Azure Data Explorer (ADX): Azure data explorer provides a set of operators, like search or scan, which at the same time provide several parameters to perform queries on character-based columns with cloud scalability.
Sample Database in this article
To create full-text indexes and test them, I used the AdventureWorksDW2019 sample database, which can be downloaded from here.
I selected the table dbo.DimProduct, which contains nvarchar(400) columns to hold the description of a product in different languages (one column per language).
Full-Text Search
In order to start using full text search once you verified this feature is installed, you need to perform the following steps:
Create a full text catalog
- Open your SQL client (in my case SSMS)
- Expand your database –> Storage –> Full Text Catalog
- Right click on “Full Text Catalog” and select “New Full-Text Catalog…”

- This will open a dialog box:

- Give the catalog a name
- Optionally provide an owner (by default is dbo)
- Set as default catalog: Specifies that the catalog is the default catalog. When full-text indexes are created without a full-text catalog explicitly specified, the default catalog is used.
- Accent sensitivity: establish whether an accent is taken into consideration when comparing two strings.
If you want to create the catalog programmatically (with T-SQL) and provide additional options, please visit: CREATE FULLTEXT CATALOG (Transact-SQL).
Now you should be able to see the newly created catalog:

Create a full text index
- Select the table where you want to perform full text search, in our example dbo.DimProduct
- Right click on the table, go to “Full-Text index” and select “Define Full-Text Index…

- This opens the SQL Server Full-Text Indexing Wizard. Select “Next”:

- Select the unique index for the table, in this case, its primary key:

- Select the column to be added to the index (EnglishDescription in this example), then click on “Next”:

- Leave “Automatically” checked and click on “Next”:

- Choose a file group: Populate a full-text index is very I/O intensive. As best practice: “locate a full-text index in the database filegroup that is best for maximizing I/O performance or locate the full-text indexes in a different filegroup on another volume.”
- Select the catalog we’ve created previously or create a new want if it’s needed.
- Associate a stoplist: to prevent the full-text index including commonly occurring strings that are not relevant for the search a list of stopwords is used, and it’s called stoplist. As a result, full-text queries will not search on stopwords.
- Leave “Select index filegroup” value as default, that is, the PRIMARY file group will be used. As full-text stoplist leave “<system>”, which contains the most commonly used stopwords for each supported language. You can also create your own stoplist.

- The last window of the Wizard is a summary of the entered values. Finally, click on “Finish”:

If everything went well, you will get a confirmation windows like the following:

Warmup – Example using LIKE
To warmup let’s do a word search using LIKE in the WHERE clause, which is not consuming the created full-text index.
I want to get the list price, English name and size range of products including the word “downhill” in their description:

If we have a look at the execution plan a clustered index scan gets a 100% cost and finished in 0.004s, so no surprises:

Let’s continue with predicates consuming the index.
Example using CONTAINS
CONTAINS is a predicate used in the WHERE clause that can search for:
- A single word or phrase.
- A prefix of a single word or phrase.
- A word within a certain distance to another word.
- A word inflexionally generated from another word.
- A synonym of a word using a thesaurus.
It is important to notice that CONTAINS’s search for characters in the word or phrase is not case-sensitive.
Let’s again search products with the word “downhill” in the EnglishDescription column:

Using contain our query returns the same results, no surprises again. But his time the full-text index was used by the engine, that’s the FulltextMatch operation. This time the engine was so fast that the cost is not even visible in milliseconds.

Example using FREETEXT
FREETEXT is also a predicate to be used in WHERE clauses, and it searches for values that match the meaning and not just the exact wording of the words in the search condition, so the engine performs the following actions where FREETEXT is used:
- Word-breaking: it breaks a string into individual words.
- Stemming: it generates inflectional forms of the words.
- It uses the thesaurus to generate a list of synonyms for the search.
Let’s use our “downhill” example again, this time with FULLTEXT:

Again, the query result is the same. If we have a look at the execution plan, the engine also uses the full-text index. Some estimated costs vary from the CONTAINS example, but the performance is still quite similar:

Full-text queries using FREETEXT are less precise than those full-text queries using CONTAINS. No special meaning is given to any of the reserved keywords or wildcard characters.
Example using CONTAINSTABLE
CONTAINSTABLE is similar to CONTAINS predicate but returns a rank and a full-text key for each row. It is used in the FROM clause instead and is referenced as if it were a regular table.
To obtain the rows you want from the original table, a join can be specified with the CONTAINSTABLE rows. This is a code snippet you can use as guidance:
SELECT select_list
FROM table AS FT_TBL INNER JOIN
CONTAINSTABLE(table, column, contains_search_condition) AS KEY_TBL
ON FT_TBL.unique_key_column = KEY_TBL.[KEY];
For example, let’s search the product names where the product description in English contains the word “aluminum” near the word “frame” or the word “alloy”:

Note that the CONTAINSTABLE return to extra columns as explained above:
- Rank: an integer number from 0 to 1000. The higher the rank, the more relevant the returned row. The calculation of the rank is quite complex, for additional information please visit this link.
- Key: The unique key used in the full-text index. In the example above the ProductKey.
Example using FREETEXTTABLE
The FREETEXTTABLE is used in the FROM clause of the query and returns a table containing the rows that match the meaning and not just the exact word or string. FREETEXTTABLE is referenced as if it were a regular table name.
In the next example, we want to search in the GermanDescription column the bikes with lightweight frames (in German “leichter Rahmen”). The Language is explicitly specified. We are also using the top_n_by_rank parameter to indicate that we want the top 100 ranked results to be returned:

Multilanguage search
A single column can support a single full-text index in a single language. This is because the parsing function for a full-text index needs to know about how to break up words, and what token are considered as noise in a particular language. Further, when you query a full-text index, you would get less meaningful answers if the full-text data was in multiple languages.
If separated columns are available for different languages, you need to set the correspondent language for every column. If you are not sure which language is stored in the column or you can’t avoid having more than language in a single column, a workaround is to use the word breaker for the most complex language that might be stored in the column.
Example: in our sample table dbo.DimProduct, there are product description available in different language. Let’s add German descriptions column to our full-text index:
This time let’s use this approach using SSMS:
- Navigate to the database, then expand Storage and Full Text Catalogs. Right-click on your catalog name and select “Properties”:

- In the properties window select “Table/Views”. The dbo.DimProduct shows up in the tables assigned to the catalog. Navigate the eligible columns and check “GermanDescription”, select “German” as the language for the Word Breaker and click “OK”:

We can now verify which columns and languages are part of our full-text index:

Add synonyms to your full-text index search using a thesaurus file
You can add synonyms for your search using a full-text thesaurus, which is basically in this context a XML file containing a group of synonyms to expand your search. There are other configurations not covered by this article. If you want additional information please visit Configure and Manage Thesaurus Files for Full-Text Search.
There are already empty thesaurus files for each supported language in the server in this location: <SQL_Server_data_files_path>\MSSQL13.MSSQLSERVER\MSSQL\FTDATA\

In our example, lets add an entry in the English thesaurus file for aluminum and aluminium. Aluminum is preferred in North America, whereas Aluminium is the spelling preferred in the United Kingdom and most other English-speaking nations.
For that, open the file tseng.xml with notepad or other text editor:

There are already some sample expansions in the existing file for English (tsenu.xml). You can delete or leave them; the important thing is to add a new expansion for our example using the syntax as shown above and remove the XML comments.
Before that, if you look for “aluminium” you will get no results, when using i.e. CONTAINS:

To activate our changes, apart from saving the file you need to call the stored procedure sys.sp_fulltext_load_thesaurus_file:
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
Remark: Only members of the sysadmin fixed server role or the system administrator can execute the sp_fulltext_load_thesaurus_file stored procedure. Only system administrators can update, modify, or delete thesaurus files.
The input parameter 1033 corresponds to the Integer mapping the locale identifier (LCID) of the language for which you want to load the thesaurus XML definition.
Now let’s try again the example above a see if the thesaurus helps us:

As you can see, adding a synonym for aluminum increased the quality of the full-text search. One tricky thing is the syntax for the CONTAINS function, which contains the “FORMSOF” in the generation term:
<generation_term> ::=
FORMSOF ( { INFLECTIONAL | THESAURUS } , <simple_term> [ ,...n ] )
Conclusion
SQL Server is a powerful tool that can be used for scenarios where you need to search words or strings in text data. It not only provides simple search capabilities but also sophisticated options for semantic searches. It also provides multilanguage capabilities to enrich the search experience. We have not explored additional features to include our own stoplist or perform complex query scenarios. These topics are quite interesting, and I encourage you to have a look at them.
References:
- https://www.mssqltips.com/sqlservertip/6841/add-full-text-search-sql-server/
- https://www.mssqltips.com/sqlservertip/6840/sql-server-full-text-search/
- https://www.red-gate.com/simple-talk/databases/sql-server/learn/understanding-full-text-indexing-in-sql-server/
- https://www.mssqltips.com/sqlservertip/1491/sql-server-full-text-search-noise-words-and-thesaurus-configurations/
- https://www.sqlshack.com/hands-full-text-search-sql-server/
- https://learn.microsoft.com/en-us/sql/relational-databases/search/get-started-with-full-text-search?view=sql-server-ver16
- https://learn.microsoft.com/en-us/sql/relational-databases/search/query-with-full-text-search?view=sql-server-ver15
- https://learn.microsoft.com/en-us/sql/relational-databases/search/choose-a-language-when-creating-a-full-text-index?redirectedfrom=MSDN&view=sql-server-ver16
- Expert performance indexing SQL Server 2012: by Jason Strate, Ted Krueger. Released July 2012. Publisher(s): Apress. ISBN: 9781430237426.
- https://www.red-gate.com/simple-talk/databases/sql-server/learn/sql-server-full-text-search-language-features/
Leave a Reply