SQL Server: “New” String Functions

String manipulation in T-SQL code is not a new concept. I intentionally used “new” to indicate that some of the string functions discussed in this article were added after SQL Server 2016, which was the version I was using until a recent upgrade. Microsoft is constantly adding new features to SQL Server, many times it is to add features already available in other platforms. Here are a few string functions that were added after SQL Server 2016 that I’ve used recently that simplified a lot of my code. All examples will reference the sample HR data from my recent post on SQL Server Greatest and Least Functions.

  1. CONCAT_WS
    1. Sample Use Case
  2. STRING_AGG
    1. Sample Use Case
  3. STRING_SPLIT
    1. Sample Use Case
  4. Wrapping Up

CONCAT_WS

CONCAT_WS is useful for concatenating, or joining, strings with a separator (WS). Use cases could include building a delimited list across several columns or from multiple variables, building a fully qualified database object name from the parts, or any number of things where you need to build a list. CONCAT_WS input values are the separator in the first parameter, followed by N-values that will be concatenated.

Using the sample HR data, we can generate output of employee names and departments in LAST NAME,FIRST NAME,DEPARTMENT NAME format.

SELECT
	e.emp_id,
	CONCAT_WS(',',e.lname,e.fname,d.dept_name) AS emp_info	
FROM
	#tbl_employees AS e
JOIN
	#tbl_emp_dept AS ed
ON
	e.emp_id = ed.emp_id
JOIN
	#tbl_departments AS d
ON
	d.dept_id = ed.dept_id
CONCAT_WS output

Sample Use Case

A recent use case for CONCAT_WS in a project was to combine a set of columns into a single string, and then use HASHBYTES to generate a hashed value for the data as a quick comparison method for data changes. Before CONCAT_WS, I was using CROSS APPLY with FOR XML RAW to accomplish this. Below are samples of both methods – CONCAT_WS makes things much easier to read. The output from HASHBYTES is different due to the different inputs; however, the actual hash value doesn’t matter if using the same hash method and inputs on source and target as in my case we don’t store the hash value, it’s generated at run-time.

SELECT
	e.emp_id,
	h.x AS hash_input,
	HASHBYTES('MD5',h.x) AS emp_name_hash
FROM 
	#tbl_employees AS e
CROSS APPLY(
	SELECT
		e.lname,
		e.fname
	FOR XML RAW) AS h(x)


/*USING CONCAT_WS*/
SELECT
	e.emp_id,
	CONCAT_WS('%',e.lname,e.fname) AS hash_input,
	HASHBYTES('MD5',CONCAT_WS('%',e.lname,e.fname)) AS emp_name_hash
FROM
	#tbl_employees AS e
SQL Output

STRING_AGG

STRING_AGG does for rows of data what CONCAT_WS does for columns. STRING_AGG combines the values from a single column of data and generates a delimited list of the values. Before STRING_AGG, I used methods such as += with a variable or different variations of the STUFF function with CROSS APPLY and FOR XML. STRING_AGG certainly simplified these methods. Notice in the code sample below that the output is the same, but the STRING_AGG code is one line versus several lines with nested functions and conversions.

/*CROSS APPLY WITH FOR XML PATH AND STUFF*/
DECLARE @emp_id_list VARCHAR(100)
SELECT
    @emp_id_list =
(
    SELECT
        STUFF((
                  SELECT '|' + CONVERT(VARCHAR,emp_id) FROM #tbl_employees FOR XML PATH('')
              )
             ,1
             ,1
             ,''
             )
);
SELECT @emp_id_list AS employee_ids

/*STRING AGG*/
SELECT STRING_AGG(emp_id,'|') AS employee_ids FROM #tbl_employees;
Pipe-delimited list of employee_ids

Sample Use Case

In a recent project, I used STRING_AGG to generate a delimited list of all possible ID values in a table as part of an application stored procedure. One of the stored procedure input parameters is a pipe-delimited list of ID values to return; however, it could also be set to NULL by default to return all values. By combining STRING_AGG with STRING_SPLIT I was able to accomplish applying this “dynamic filter” easily.

/*VARIABLE DECLARATION*/
DECLARE
	@input_param VARCHAR(50) = '1|6' /*INPUT VALUE SPECIFIED*/
	,@all_emp_ids VARCHAR(50) = NULL

/*GET LIST OF ALL EMPLOYEE IDS*/
SELECT @all_emp_ids = STRING_AGG(emp_id,'|') FROM #tbl_employees

/*IF INPUT IS NULL, SET INPUT TO ALL*/
SELECT @input_param = COALESCE(@input_param,@all_emp_ids)

/*GET EMPLOYEE DATA*/
SELECT e.* FROM #tbl_employees AS e
JOIN STRING_SPLIT(@input_param,'|') AS l
ON l.value = e.emp_id
Employee Data Filtered
/*VARIABLE DECLARATION*/
DECLARE
	@input_param VARCHAR(50) = NULL /*INPUT IS DEFAULT/NULL*/
	,@all_emp_ids VARCHAR(50) = NULL

/*GET LIST OF ALL EMPLOYEE IDS*/
SELECT @all_emp_ids = STRING_AGG(emp_id,'|') FROM #tbl_employees

/*IF INPUT IS NULL, SET INPUT TO ALL*/
SELECT @input_param = COALESCE(@input_param,@all_emp_ids)

/*GET EMPLOYEE DATA*/
SELECT e.* FROM #tbl_employees AS e
JOIN STRING_SPLIT(@input_param,'|') AS l
ON l.value = e.emp_id
All Employee Data

STRING_SPLIT

STRING_SPLIT was first introduced in SQL Server 2016 and provided functionality that many developers had written custom functions for. I have written several variants of a split function in my career. STRING_SPLIT does exactly what it sounds like – an input string is split using a delimiter character into its parts and returned as a table. In SQL Server 2022, a new optional parameter was added that includes the ordinal value on the output which indicates the value’s relative location in the original string.

Sample Use Case

Sometimes I need to split email addresses into parts (username/domain) for storage or other analysis. String split makes that relatively simple to do. Below are some randomly generated email addresses in a table and using STRING_SPLIT with a CROSS APPLY and the new SQL Server 2022 ordinal option, I can easily retrieve the full email address, username, and domain in a single table.

CREATE TABLE #tbl_email (email_address VARCHAR(250));

INSERT INTO #tbl_email (email_address)
VALUES
('cope-dereck99@hotmail.com      '),
('miner_christopher44@hotmail.com'),
('yang_barrett88@mail.com        '),
('babcock_ingrid5@outlook.com    '),
('vance-fletcher58@mail.com      '),
('alberto_bautista56@yahoo.com   ');

SELECT
    e.*
   ,u.n AS user_name
   ,d.n AS domain_name
FROM
    #tbl_email AS e
CROSS APPLY
(
    SELECT
        value
    FROM
        STRING_SPLIT(e.email_address, '@', 1)
    WHERE
        ordinal = 1
) AS u(n)
CROSS APPLY
(
    SELECT
        value
    FROM
        STRING_SPLIT(e.email_address, '@', 1)
    WHERE
        ordinal = 2
) AS d(n);	
Email Address, Username and Domain output

Wrapping Up

String manipulation in T-SQL is fairly common in practice and SQL Server does a nice job of providing functions to help with common manipulations that many developers wrote custom functions for. The functions discussed in this article significantly simplified many of my stored procedures recently and made the code much more readable. “Future me” will thank “current me” for swapping out all the custom work for simple, easy-to-read functions.

Leave a comment