SQL Server 2008 for Developers
UTS Short Course

Course Website
• Course Timetable & Materials
• Resources
http://sharepoint.ssw.com.au/Training/UTSSQL/

Course Overview
Session  Date                Time           Topic
1        Tuesday 06-03-2012  18:00 - 21:00  SQL Server 2008 Management Studio
2        Tuesday 13-03-2012  18:00 - 21:00  T-SQL Enhancements
3        Tuesday 20-03-2012  18:00 - 21:00  High Availability
4        Tuesday 27-03-2012  18:00 - 21:00  CLR Integration
5        Tuesday 03-04-2012  18:00 - 21:00  Full-Text Search

What we did last week
CLR Integration
• .NET
• .NET FX
• CLR
• Stored Proc
• Functions
• Triggers
Bottom Line
• Use T-SQL for all data operations
• Use CLR assemblies for any complex calculations and transformations

Homework?
• Find all products that have a productnumber starting with BK
• Find all products with "Road" in the name that are Silver
• Find a list of products that have no review
• Find the list price ([listprice]) of all products in our shop
• What is the sum of the list price of all our products
• Find the product with the maximum and minimum listprice
• Find a list of products with their discount sale (hint: see Sales.SalesOrderDetail)
• Find the sum of prices of the products in each subcategory

Session 5
SQL Server Full-Text Search
Using Full-Text Search in SQL Server 2008

Agenda
• What is Full text search
• The old way 2005
• The new way 2008
• How to
• Querying

What is Fulltext search
    SELECT * FROM [Northwind].[dbo].[Employees] WHERE Notes LIKE '%grad%'

What is REAL Fulltext search
• Allows searching for text/words in columns
• Based on special index
• Similar words
• Plural of words
• Full-text index (Full text catalog)
    SELECT * FROM [Northwind].[dbo].[Employees] WHERE FREETEXT(*, 'grad')

Theory

Full-Text Search Terminology 1/3
• Full-text index
    Information about words and their location in columns
    Used in full text queries
• Full-text catalog
    Group of full text indexes (Container)
• Word breaker
    Tokenizes text based on language

Full-Text Search Terminology 2/3
• Token
    Word identified by word breaker
• Stemmer
    Generate inflectional forms of a word (language specific)
• Filter
    Extract text from files stored in a varbinary(max) or image column
• Population or Crawl
    Creating and maintaining a full-text index

Full-Text Search Terminology 3/3
• Stopwords/Stoplists
    Words not relevant to a search, e.g. 'and', 'a', 'is' and 'the' in English
• Accent insensitivity
    café = cafe

Fulltext search – Under the hood
The old way! SQL 2005
The new way! SQL 2008

How to: Administration

Administering Full-Text Search
Full-text administration can be separated into three main tasks:
• Creating/altering/dropping full-text catalogs
• Creating/altering/dropping full-text indexes
• Scheduling and maintaining index population

Administering Full-Text Search
• sp_fulltext_catalog
• sp_fulltext_column
• sp_fulltext_database
• sp_fulltext_service
• sp_fulltext_table
• sp_help_fulltext_catalogs
• sp_help_fulltext_catalogs_cursor
• sp_help_fulltext_columns
• sp_help_fulltext_columns_cursor
• sp_help_fulltext_tables
• sp_help_fulltext_tables_cursor

Index vs. Full-text index
Full-text indexes
• Stored in the file system, but administered through the database
• Only 1 full-text index allowed per table
• Addition of data to full-text indexes, called population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data
Regular SQL Server indexes
• Stored under the control of the database in which they are defined
• Several regular indexes allowed per table
• Updated automatically when the data upon which they are based is inserted, updated, or deleted

Administering Full-Text Search
Automatic update of index
• Slows down database performance
Manually repopulate full text index
• Time consuming
• Asynchronous process in the background
• Periods of low activity
• Index not up to date

How to: Creating a Full Text Catalog
SQL 2005 only (SQL 2008 is smart)

SQL 2005

Creating a Full-Text Catalog (SQL 2005)
Syntax
    CREATE FULLTEXT CATALOG catalog_name
        [ON FILEGROUP filegroup]
        [IN PATH 'rootpath']
        [WITH <catalog_option>]
        [AS DEFAULT]
        [AUTHORIZATION owner_name]
    <catalog_option> ::= ACCENT_SENSITIVITY = {ON|OFF}
Example
    USE AdventureWorks_FulllText;
    CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
        ON FILEGROUP FullTextCatalog_FG
        WITH ACCENT_SENSITIVITY = ON
        AS DEFAULT
        AUTHORIZATION dbo;

Creating a Full-Text Catalog: Step by step
1. Create a directory on the operating system named C:\test
2. Launch SSMS, connect to your instance, and open a new query window
3. Add a new filegroup to the AdventureWorks_FulllText database, then add a file to it:
    USE Master
    GO
    ALTER DATABASE AdventureWorks_FulllText ADD FILEGROUP FTFG1
    GO
    ALTER DATABASE AdventureWorks_FulllText
    ADD FILE (
        NAME = N'AdventureWorks_FulllText_data',
        FILENAME = N'C:\TEST\AdventureWorks_FulllText_data.ndf',
        SIZE = 2048KB,
        FILEGROWTH = 1024KB
    ) TO FILEGROUP [FTFG1]
    GO
4. Create a full-text catalog on the FTFG1 filegroup by executing the following command:
    USE AdventureWorks_FulllText
    GO
    CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
    GO

SQL 2008

How to: Creating Full Text Indexes
• Property of column
• Full-text Index property window

How to: Index and Catalog Population

Populating a Full-Text Index
Because of the external structure for storing full-text indexes, changes to underlying data columns are not immediately reflected in the full-text index. Instead, a background process enlists the word breakers, filters and noise word filters to build the tokens for each column, which are then merged back into the main index either automatically or manually. This update process is called population or a crawl. To keep your full-text indexes up to date, you must periodically populate them.

Populating a Full-Text Index
You can choose from three modes for full-text population:
• Full
• Incremental
• Update

Populating a Full-Text Index
Full
• Reads and processes all rows
• Very resource-intensive
Incremental
• Automatically populates the index for rows that were modified since the last population
• Requires a timestamp column
Update
• Uses change tracking from SQL Server (inserts, updates, and deletes)
• Specify how you want to propagate the changes to the index
    • AUTO: automatic processing
    • MANUAL: implement a manual method for processing changes

Populating a Full-Text Index
Example
    ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION;
    ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;

Populating a Full-Text Catalog
Syntax
    ALTER FULLTEXT CATALOG catalog_name
    { REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ]
    | REORGANIZE
    | AS DEFAULT }
REBUILD
• Deletes and rebuilds the catalog
• Needed for an ACCENT_SENSITIVITY change
REORGANIZE
• Merges all changes
• Improves performance
• Frees up disk and memory

Populating a Full-Text Catalog
Example
    USE AdventureWorks_FulllText;
    ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog REBUILD WITH
    ACCENT_SENSITIVITY = OFF;
    -- Check accent sensitivity
    SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'accentsensitivity');

Managing Population Schedules
• In SQL 2000, full text catalogs could only be populated on specified schedules
• SQL 2005/2008 can track database changes and keep the catalog up to date, with a minor performance hit

How to: Search

Querying SQL Server Using Full-Text Search
Full-text query keywords
• FREETEXT
• FREETEXTTABLE
• CONTAINS
• CONTAINSTABLE

FREETEXT
• Fuzzy search (less precise)
• Inflectional forms (Stemming)
• Related words (Thesaurus)
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE [Description] LIKE N'%bike%';

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE FREETEXT(Description, N'bike');

FREETEXTTABLE
• Adds a RANK column
• Value between 0 and 1,000
• Relative number: how well the row matches the search criteria
    SELECT PD.ProductDescriptionID, PD.Description, KEYTBL.[KEY], KEYTBL.RANK
    FROM Production.ProductDescription AS PD
    INNER JOIN FREETEXTTABLE(Production.ProductDescription, Description, N'bike') AS KEYTBL
        ON PD.ProductDescriptionID = KEYTBL.[KEY];

CONTAINS
• Lets you specify which fuzzy matching algorithm to use
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'bike');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'"bike*"');

FORMSOF
• Lets you specify which fuzzy matching algorithm to use
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'FORMSOF (INFLECTIONAL, ride)');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'FORMSOF (THESAURUS, ride)');
• INFLECTIONAL: considers word stems in the search, e.g. "ride" → "riding", "ridden", ...
• THESAURUS: returns synonyms, e.g. "metal" → "gold", "aluminium", "steel", ...

Word proximity
NEAR ( ~ )
• How near words are in the text/document
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'mountain NEAR bike');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'mountain ~ bike');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'ISABOUT (mountain weight(.8), bikes weight (.2) )');

Querying SQL Server Using Full-Text Search
Full-text search is much more powerful than LIKE
• More specific, relevant results
• Better performance
    • LIKE for small amounts of text
    • Full-text search scales to huge documents
• Provides ranking of results
Common uses
• Search through the content in a text-intensive, database driven website, e.g. a knowledge base
• Search the contents of documents stored in BLOB fields
• Perform advanced searches
    • e.g. with exact phrases - "to be or not to be" (however needs care!)
    • e.g. Boolean operators - AND, OR, NOT, NEAR

Writing FTS terms
The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function
Several different types of terms:
• Simple terms
• Prefix terms
• Generation terms
• Proximity terms
• Weighted terms

Simple terms
• Either words or phrases
• Quotes are optional for a single word, but recommended; a phrase must be quoted
• Matches columns which contain the exact words or phrases specified
• Case insensitive
• Punctuation is ignored
e.g.
    CONTAINS(Column, 'SQL')
    CONTAINS(Column, ' "SQL" ')
    CONTAINS(Column, ' "Microsoft SQL Server" ')
    CONTAINS(Column, 'Microsoft SQL Server')   (unquoted phrase: raises a syntax error)

Prefix terms
• Matches words beginning with the specified text
e.g.
    CONTAINS(Column, ' "local*" ')
    • matches local, locally, locality
    CONTAINS(Column, ' "local wine*" ')
    • matches "local winery", "locally wined"

Generation terms
Inflectional
    FORMSOF(INFLECTIONAL, "expression")
    "drive" → "drove", "driven", ...
    (share the same stem)
    When vague words such as "best" are used, a match on "good" is OK
Thesaurus
    FORMSOF(THESAURUS, "expression")
    "metal" → "gold", "aluminium", "steel", ...
Both return variants of the specified word, but the variants are determined differently

Thesaurus
• Supposed to match synonyms of search terms, but the thesaurus seems to be very limited
• Does not match plurals
• Not particularly useful
http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231

Proximity terms
Syntax
    CONTAINS(Column, 'local NEAR winery')
    CONTAINS(Column, ' "local" NEAR "winery" ')
• Important for ranking
• Both words must be in the column, like AND
• Terms on either side of NEAR must be either simple or proximity terms

Weighted terms
• Each word can be given a weight, which influences the rank
• Can be combined with simple, prefix, generation and proximity terms
e.g.
    CONTAINS(Column, 'ISABOUT( performance weight(.8), comfortable weight(.4) )')
    CONTAINS(Column, 'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4) )')

Pro / Contra
Pros? Cons?
Disadvantages
Full text catalogs
• Disk space
• Up-to-date
• Continuous updating performance hit
Queries
• Complicated to generate
• Generated as a string
• Generated on the client

Advantages
Backing up full text catalogs
• SQL 2005
    • Included in SQL backups by default
    • Retained on detach and re-attach
    • Option in detach dialog to keep the full text catalog
• In SQL 2008 you don't have to worry about this

Advantages
Much more powerful than LIKE
• Specific
• Ranking
Performance
• Pre-computed ranking (FREETEXTTABLE)
Configurable Population Schedule
• Continuously track changes, or index when the CPU is idle

Quick tips - Podcasts
• Pluralcast - SQL Server Under the Covers
  http://shrinkster.com/1ff4
• Dotnetrocks - Search for SQL Server
  http://www.dotnetrocks.com/archives.aspx
• RunAsRadio - Search for SQL Server
  http://www.runasradio.com/archives.aspx

Session 5 Lab
Full text search
Download from Course Materials Site (to copy/paste scripts) or type manually:
http://sharepoint.ssw.com.au/Training/UTSSQL/

3 things…
EricPhan@ssw.com.au
http://ericphan.info
twitter.com/ericphan

3 things…
mehmet@ssw.com.au
http://blog.ozdemir.id.au
twitter.com/mozdemir_au

Thank You!
Gateway Court Suite 10
81 - 91 Military Road
Neutral Bay, Sydney NSW 2089
AUSTRALIA
ABN: 21 069 371 900
Phone: + 61 2 9953 3000
Fax: + 61 2 9953 3105
info@ssw.com.au
www.ssw.com.au
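
Lab starter sketch (T-SQL)
The slides show the SQL 2008 catalog and index steps only through the SSMS dialogs. As a rough T-SQL equivalent for the lab, the sketch below creates a catalog, creates a full-text index with automatic change tracking, and runs a ranked query that combines a generation term and a prefix term. It is a minimal sketch, not the course's official lab script, and it rests on assumptions that are not in the slides: the catalog name LabCatalog is hypothetical, the key index name PK_ProductDescription_ProductDescriptionID is assumed to be the primary key index on Production.ProductDescription, LANGUAGE 1033 (English) is assumed, and the table is assumed not to have a full-text index already.

```sql
USE AdventureWorks_FulllText;
GO

-- Hypothetical catalog name; SQL 2008 needs no filegroup or path here.
CREATE FULLTEXT CATALOG LabCatalog;
GO

-- Assumption: this is the name of the table's unique, single-column key index.
-- CHANGE_TRACKING AUTO keeps the index up to date as rows change.
CREATE FULLTEXT INDEX ON Production.ProductDescription
    (Description LANGUAGE 1033)
    KEY INDEX PK_ProductDescription_ProductDescriptionID
    ON LabCatalog
    WITH CHANGE_TRACKING AUTO;
GO

-- The initial population runs in the background.
-- A PopulateStatus of 0 means the catalog is idle (population finished).
SELECT FULLTEXTCATALOGPROPERTY('LabCatalog', 'PopulateStatus') AS PopulateStatus;
GO

-- Ranked search: inflectional forms of "ride" weighted higher than
-- words starting with "mountain"; best matches first.
SELECT TOP (10)
       PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(
               Production.ProductDescription,
               Description,
               N'ISABOUT (FORMSOF(INFLECTIONAL, "ride") WEIGHT (.8), "mountain*" WEIGHT (.2))'
           ) AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;
GO
```

If the ranked query returns no rows straight after the index is created, the population is probably still running; re-check PopulateStatus and run the query again.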