Is it necessary to have a Thesaurus file for SQL Server?

Version 1

    Details

    The HEAT application can run complex searches that match inflectional forms of the search words. This article explains how that works and why it is not necessary to provide a thesaurus file to SQL Server.


    Resolution

    When Microsoft SQL Server is installed, it provides its own stopwords and a thesaurus file for each supported language.

    Stopwords prevent pointless searches on words such as "a", "the" and "with". SQL Server also allows the administrator to create stoplists with custom stopwords that should not be searched in the full-text catalogs.

    For the stopwords, in SQL Server:

      1. It is possible to check which word breaker, and which file, is in use for a language. In this case, 1033 is the language ID for English (US). For English (UK) it would be 2057, but the result should be the same:
             EXEC sp_help_fulltext_system_components @component_type = 'wordbreaker', @param = 1033;
             GO
      2. It is also possible to see the system stopwords for that language (http://technet.microsoft.com/en-us/library/cc280523(v=sql.110).aspx):
             SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1033;
      3. Finally, it is possible to see the custom stoplists created on the system:
             SELECT * FROM sys.fulltext_stoplists;
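
      A custom stoplist of the kind described above could be created along these lines. This is only a minimal sketch: the stoplist name HeatCustomStoplist and the stopword 'ticket' are hypothetical and chosen purely for illustration.

```sql
-- Hypothetical stoplist name; it starts as a copy of the system stoplist.
CREATE FULLTEXT STOPLIST HeatCustomStoplist FROM SYSTEM STOPLIST;
GO
-- Add a custom stopword for English (US), language ID 1033.
-- 'ticket' is an illustrative word, not a recommendation.
ALTER FULLTEXT STOPLIST HeatCustomStoplist ADD 'ticket' LANGUAGE 1033;
GO
-- List the stopwords stored in the custom stoplist for that language.
SELECT sl.name, sw.stopword, sw.language_id
FROM sys.fulltext_stoplists AS sl
JOIN sys.fulltext_stopwords AS sw ON sw.stoplist_id = sl.stoplist_id
WHERE sl.name = 'HeatCustomStoplist' AND sw.language_id = 1033;
```

      A stoplist only affects searches once it has been associated with a full-text index.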

      The thesaurus file allows the system to search a full-text catalog for words similar to the one given as a parameter. A query can use THESAURUS, which matches words that have the same meaning, or INFLECTIONAL, which matches alternative inflected forms of the search word. INFLECTIONAL relies on the stemmer for the language and not on the thesaurus file. HEAT Service Management uses the INFLECTIONAL functionality in some queries. By default, SQL Server provides a thesaurus XML file that is basically empty, to which the administrator can add specific words (http://msdn.microsoft.com/en-GB/library/ms142491.aspx).

      1. To check the system, run the following query. Some queries in HEAT work this way, which improves the results:
             SELECT * FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL,"live")',1033,0,0)
             It should return the following words:
             lived
             lives
             living
             live
      2. Free thesaurus files are available, but they are unlikely to be useful for the HEAT system, because its queries are based on the INFLECTIONAL form and not on THESAURUS. Please read this article: http://thinknook.com/sql-server-english-thesaurus-for-fulltext-search-2012-02-07/. The file can be downloaded from this location: http://thinknook.com/wp-content/uploads/2012/02/tsglobal-sql-thesaurus-xml.zip. These are the steps to add it to the SQL Server installation:
             1.- Extract the content of the file. Rename it to tseng.xml for English (UK) or to tsenu.xml for English (USA), or keep the original name to use it as the default thesaurus. Copy it to the FTData folder of the SQL Server instance, for example C:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\FTData (the volume and installation path depend on the installation).
             2.- As it is the global file, execute the following instruction from SQL Server Management Studio, using the language ID that matches the file name chosen in the previous step:
             For Default
             EXEC sp_fulltext_load_thesaurus_file 0
             For English (USA)
             EXEC sp_fulltext_load_thesaurus_file 1033
             For English (UK)
             EXEC sp_fulltext_load_thesaurus_file 2057
             3.- It is possible to test that it works with a query similar to this one. Note the difference depending on the language used (0, 1033 or 2057):
             SELECT * FROM sys.dm_fts_parser('FORMSOF(THESAURUS,"always")',2057,0,0)
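
      The comparison across languages suggested in the last step can also be run as a single query. The sketch below only reuses sys.dm_fts_parser with the three language IDs mentioned above; the lcid column is an alias added here for readability. In the output, an expansion_type of 4 marks terms that come from the thesaurus, so a language whose thesaurus file is still empty would be expected to return only the original word.

```sql
-- Compare the thesaurus expansion of "always" for the default (0),
-- English (US, 1033) and English (UK, 2057) languages.
SELECT 0 AS lcid, display_term, expansion_type
FROM sys.dm_fts_parser('FORMSOF(THESAURUS,"always")', 0, 0, 0)
UNION ALL
SELECT 1033, display_term, expansion_type
FROM sys.dm_fts_parser('FORMSOF(THESAURUS,"always")', 1033, 0, 0)
UNION ALL
SELECT 2057, display_term, expansion_type
FROM sys.dm_fts_parser('FORMSOF(THESAURUS,"always")', 2057, 0, 0);
```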

      In summary, the queries in HEAT use the FORMSOF INFLECTIONAL functionality of SQL Server, which matches records containing inflected forms of the words used in the search, so a thesaurus file is not necessary.
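
      As an illustration of this kind of search, a full-text query with FORMSOF INFLECTIONAL might look like the following. This is a sketch under stated assumptions and not the actual query that HEAT runs: the table dbo.Incident and the columns RecId and Symptom are hypothetical, and the Symptom column is assumed to have a full-text index.

```sql
-- Hypothetical table and columns, for illustration only.
-- Matches rows whose Symptom contains "live", "lives", "lived" or "living".
SELECT RecId, Symptom
FROM dbo.Incident
WHERE CONTAINS(Symptom, 'FORMSOF(INFLECTIONAL, "live")', LANGUAGE 1033);
```

      No thesaurus file is involved in this query; replacing INFLECTIONAL with THESAURUS is what would make the thesaurus file relevant.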