T-SQL Tuesday #92: Lessons Learned the Hard Way

This month’s T-SQL Tuesday is hosted by Raul Gonzalez, and he’s asked everyone to share things we might be a bit embarrassed about:

For this month, I want you, my peers, to write about those important lessons that you learned the hard way; for instance, something you did that took your systems down, or maybe something you didn’t do that took your systems down. It can also be a bad decision you or someone else made back in the day that you’re still paying for…

  • In the stress/performance testing portion of an upgrade of a critical system, we were short on disk space. So, rather than having a separate set of VMs for the performance testing (as we needed to be able to get back to functional testing quickly), we decided to just take VM snapshots of all the servers. Testing was delayed a day or two – but we didn’t remove the snapshots. Then we started testing and performance was terrific…for about five minutes. Then everything came to a screeching halt. Panicked, we thought we were going to need a pile of new hardware until the VMware admin realized that our disks were getting hammered and we still had those active snapshots.
    Lesson learned: If you take VM-level snapshots of your database server and let them “soak” for an extended period, you’re gonna have a bad time. Unless you need to capture the host OS or instance configuration itself, use a database snapshot instead of a VM-level snapshot (there’s a sketch of the syntax after this list).

  • A couple of times, I’ve had under-performing VMs running SQL Server. As I hadn’t been involved in the configuration, I thought everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles, all set up for suboptimal performance.
    Lesson learned: Ask your VMware admin whether they’ve read the best practices guide, and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren’t configured well, you’ll spin your wheels for a long time.

  • In doing a configuration review of a rather large (production) instance, I noted that at least one configuration option was still set to its default value – Cost Threshold for Parallelism was stuck at 5. Running sp_BlitzCache, I found quite a few simple queries going parallel, along with huge CXPACKET waits. CXPACKET isn’t bad per se, but if you’ve got a low-cost query that’s going parallel and waiting on threads when it could run faster single-threaded overall (I verified this was the case for several of the top offenders), increasing the cost threshold can help. I did some checking, verified that it was a configuration change I could make on the fly, and set the value to 50.
    And then everything. Slowed. Down.
    When I made this configuration change on the test instance, it wasn’t much of a problem. But that was a much smaller instance, with much less traffic. What I failed to fully appreciate was the impact of the change: altering this setting (like a number of others I wasn’t aware of) blows out the plan cache. In the case of this instance, about 26GB of plan cache. Not only was performance impacted while the plan cache refilled, we also took a hit while all the old plans were being evicted from cache.
    Lesson learned: Even if it seemed OK in test, those “low impact” changes can have a much larger impact on production unless you can make test mirror production in every way. So plan these changes accordingly (the sketch below shows the sp_configure call that bit me).
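
For reference, here are rough sketches of both techniques mentioned above – creating a database snapshot rather than a VM-level one, and the cost threshold change that caused me grief. All database, file, and path names here are hypothetical; check your own logical file names in sys.database_files before trying anything like this.

-- A database snapshot is created with CREATE DATABASE...AS SNAPSHOT OF.
-- NAME must match the source database's logical data file name.
CREATE DATABASE CriticalDB_Snapshot
ON (NAME = CriticalDB_Data, FILENAME = 'D:\Snapshots\CriticalDB_Data.ss')
AS SNAPSHOT OF CriticalDB;

-- Reverting is a RESTORE from the snapshot:
-- RESTORE DATABASE CriticalDB FROM DATABASE_SNAPSHOT = 'CriticalDB_Snapshot';

-- The configuration change, more or less. RECONFIGURE applies it immediately
-- and also flushes the plan cache, which is what caused the slowdown.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;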

We learn the most from our mistakes. We can learn almost as much from the mistakes of others. Learn from mine.

Don’t Forget the Network

A few weeks ago I was looking at a query and got tripped up by the network and my own forgetfulness. It was a pretty simple query with a simple-looking execution plan. It didn’t even do that much work: about 20K logical reads, returning 200K records. For a server as large as the one I was working with, this should have been nothing. Instead, I was waiting three minutes to get my results. My first thought was “aha, this query must need tuning, maybe a new index.”

I started experimenting in a non-production copy of the database on the same instance. I added a couple of indexes, adjusted the query itself, and even tried a few query/index hints just to see what worked. Each change definitely had an impact on the query: the execution plan shape changed a bit, and the costs shifted around between a few operators. But I was still waiting minutes for the complete results.

The Plot Thickens

I tried my query against the development server. Production has 24X as much RAM, 3X as many CPU cores, and faster disks than the dev box, so I expected even worse performance there. Nope. Near-instant results, even for the original query with no additional indexes, hints, or any other trickery.

OK, something’s definitely up now. I hopped back to production and, while running my query, ran sp_BlitzFirst to peek at what was going on. Lo and behold, the top wait stat was ASYNC_NETWORK_IO. I’ve seen lots of this in the past, attributed to poorly-written applications that just couldn’t consume the data as fast as SQL Server could churn it out. But this was SQL Server Management Studio; it absorbs the data as fast as it can receive it.
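
(If you don’t have sp_BlitzFirst at hand, a quick look at sys.dm_exec_requests while the query runs surfaces the same wait; the session id below is hypothetical.)

SELECT session_id, status, wait_type, wait_time, last_wait_type
FROM sys.dm_exec_requests
WHERE session_id = 53; -- substitute the session id of the slow query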

And that’s when it hit me. SSMS was consuming the data as quickly as it received it. The development server is in the office. The production server is in a datacenter 1/3 of a continent away and the connection between that datacenter and the office sometimes gets bogged down. The problem wasn’t that SSMS couldn’t drink from the firehose – the firehose had a clamp around it!

To test this idea, I made one last change to the query – I selected the data into a temp table instead of sending the results to SSMS. Round-trip time: two seconds. Conclusion: the network itself was causing the slowdown and this query needs no tuning!
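
The test looked roughly like this (the table name is a stand-in for the real query). Because the rows land in a temp table on the server, nothing has to cross the wire:

SELECT *
INTO #NetworkTest -- results stay on the server instead of streaming to SSMS
FROM dbo.SomeLargeTable;

DROP TABLE #NetworkTest;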

As it turns out, SSMS has a way to discard query results so that you don’t have to resort to dumping results to a temp table – Discard query results after execution. It’s similar to a /dev/null for result sets. If you only need an actual execution plan but no query results (like I did here), check it out.

I Finally Get Cross Apply!

For years I’ve looked at various queries online in sample code, diagnostic queries using DMVs, and the like and seen CROSS APPLY in the FROM clauses. But I’ve never really managed to comprehend what it was for or how it worked because I never saw a case where it was directly applied to something I was doing.

Finally, this week I had a breakthrough. I was working on updating a bunch of data, but the update was breaking on a small subset of it. I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past without encountering the problem I was having (so I knew it was something I was doing “wrong”), but given that it was only a handful of broken updates, she was OK with doing those manually (we were scripting it because we were updating potentially tens of thousands of records).

I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.

Setting the Stage

Let’s set up three simple tables to keep track of airports and what state each airport is in. But our developer doesn’t totally get database design and in his state-to-airport mapping table, he allows for a comma-separated list of airports associated with each state.

    CREATE TABLE #States
    (
        [Id]      INT IDENTITY(1, 1),
        StateName NVARCHAR(30) NOT NULL
    );

    CREATE TABLE #Airports
    (
        [Id]     INT IDENTITY(1, 1),
        IATACode CHAR(3) NOT NULL
    );

    CREATE TABLE #StateAirports
    (
        StateId  INT PRIMARY KEY NOT NULL,
        Airports NVARCHAR(50)
    );
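
The tables as posted are empty, so here’s some hypothetical sample data to make the queries below return rows. Note the comma-separated airport IDs in #StateAirports – that’s our developer’s handiwork.

    INSERT INTO #States (StateName) VALUES (N'New York'), (N'Pennsylvania');
    INSERT INTO #Airports (IATACode) VALUES ('ALB'), ('BUF'), ('PHL'), ('PIT');
    INSERT INTO #StateAirports (StateId, Airports) VALUES (1, N'1,2'), (2, N'3,4');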

This makes getting a list of airports and their associated state names tricky at best if we don’t know about CROSS APPLY. With CROSS APPLY, it’s pretty straightforward.

Solution

Here’s the finished query.

    SELECT S.StateName,
           A.IATACode
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.Airports, ',') AS SA2
        JOIN #Airports A ON A.Id = SA2.value
        JOIN #States S ON S.Id = SA1.StateId;

string_split() is a table-valued function which we finally got in SQL Server 2016, after far too many years of having to write (or, let’s face it, copy from someone’s blog post) inefficient string-splitting functions. Important note: even if your database engine is SQL Server 2016, the database you’re operating in must be at compatibility level 130 or the function won’t be available.
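
Checking and changing that is quick (the database name here is hypothetical):

    SELECT name, compatibility_level FROM sys.databases WHERE name = N'MyDatabase';
    ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 130;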

Breaking it down

If we take CROSS APPLY and break it down into its parts, it finally starts to make sense.
* APPLY the string_split() function to the Airports field of the #StateAirports table
* Append each row of string_split()'s output to the corresponding row of #StateAirports (similar to a CROSS JOIN, but not exactly)

So now I have N rows for each StateId in #StateAirports, where N is the number of values in the comma-separated field. And JOINed to each row is one of the rows from the output of string_split().

    SELECT *
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.airports, ',') AS SA2

From there, the rest of the query is pretty normal, JOINing to the other two tables to translate the state & airport ID numbers into their text values.

Hopefully this helps others get a handle on CROSS APPLY and find useful places for it. This had been a head-scratcher for me for years, but only because I didn’t have an example that clearly broke down how to use it and what was going on. In hindsight, I probably could have used it in some analysis I did at a previous job, but instead I resorted to parsing & processing comma-separated data in a PowerShell script.

Adding Application Name to Invoke-SqlCmd2

In a previous post, I expressed some frustration over Invoke-SqlCmd not setting an Application Name for its ODBC connection, leaving us with the generic .NET SqlClient Library when looking at active sessions in sp_who2 and sp_whoisactive (and any other monitoring tool). Unfortunately, I can’t really do anything about Invoke-SqlCmd aside from posting a suggestion on Connect or the Client Tools Trello board, but Invoke-SqlCmd2 has the same issue and that’s on GitHub. So, here we go!

In its current form, if a SqlConnection object isn’t passed into Invoke-SqlCmd2, the cmdlet does the following:

if ($Credential)
{
    $ConnectionString = "Server={0};Database={1};User ID={2};Password=`"{3}`";Trusted_Connection=False;Connect Timeout={4};Encrypt={5}" -f $SQLInstance,$Database,$Credential.UserName,$Credential.GetNetworkCredential().Password,$ConnectionTimeout,$Encrypt
}
else
{
    $ConnectionString = "Server={0};Database={1};Integrated Security=True;Connect Timeout={2};Encrypt={3}" -f $SQLInstance,$Database,$ConnectionTimeout,$Encrypt
}
$conn = New-Object System.Data.SqlClient.SQLConnection
$conn.ConnectionString = $ConnectionString
Write-Debug "ConnectionString $ConnectionString"

I decided to change this around so that it no longer uses string formatting, but instead a SqlConnectionStringBuilder. I had a couple of reasons for this:

  • It eliminates redundant code. There are several common elements in each of the ConnectionStrings above, and if more complex logic is needed, there are potentially even more copies of this ConnectionString kicking around.
  • The string approach is prone to copy/paste and other editing errors. If a change affects both versions of the ConnectionString and the developer just copies the line from one branch of the if statement to the other, code will be lost or invalid values will be substituted because of positioning.

With this in mind, I factored the common elements out to build the base of the ConnectionString, then added the remaining elements conditionally based on the cmdlet inputs.

$CSBuilder = New-Object -TypeName System.Data.SqlClient.SqlConnectionStringBuilder
$CSBuilder["Server"] = $SQLInstance
$CSBuilder["Database"] = $Database
$CSBuilder["Connection Timeout"] = $ConnectionTimeout
if ($Encrypt) {
    $CSBuilder["Encrypt"] = $true
}
if ($Credential) {
    $CSBuilder["Trusted_Connection"] = $false
    $CSBuilder["User ID"] = $Credential.UserName
    $CSBuilder["Password"] = $Credential.GetNetworkCredential().Password
} else {
    $CSBuilder["Integrated Security"] = $true
}

Before going any further in adding support for inserting Application Name into the ConnectionString, I had to add a parameter to the cmdlet itself.

[Parameter( Position=11, Mandatory=$false )]
[Alias( 'Application', 'AppName' )]
[String]
$ApplicationName

With that complete, I can now add it into the SqlConnectionStringBuilder.

if ($ApplicationName) {
    $CSBuilder["Application Name"] = $ApplicationName
} else {
    # Fall back to the name of the top-level script that called Invoke-SqlCmd2
    $ScriptName = (Get-PSCallStack)[-1].Command.ToString()
    if ($ScriptName -ne "") {
        $CSBuilder["Application Name"] = $ScriptName
    }
}

Because ApplicationName is an optional parameter, I had to account for cases where it’s not specified by the caller. The code above looks at the entire call stack and walks back up to the very top to get the name of the script that was run and ultimately called Invoke-SqlCmd2. Finally, I extract the ConnectionString from the SqlConnectionStringBuilder and assign it to the SqlConnection‘s ConnectionString property.

$ConnectionString = $CSBuilder.ToString()
$conn.ConnectionString = $ConnectionString
Write-Debug "ConnectionString $ConnectionString"

My code changes complete, I reviewed my work and sent my first real pull request off to Warren (B|T) as PR #7 for the module.

Make Your Application’s Name Heard

Odds are, you’ve got more than one application or script accessing your database or SQL Server instance at any given time. You’re probably stacking them on a small number of servers in an attempt to cut down on licensing costs and resource usage. All those PowerShell scripts running on the central job server run under a single service account, and you’ve got a lazy vendor who set up both the website and the back-end application server to run under the same account, maybe even on a single app/web server.

Then the call comes in. “The database is slow.” Before firing up your monitoring suite, you take a quick pass with sp_who2 or sp_whoisactive and you’re greeted with a dozen sessions, all sporting a program_name of .Net SqlClient Data Provider. Terrific! Which session is coming from which application and causing all the trouble? Not so easy to figure out.

Fortunately, the .NET SqlClient (like many other client libraries) has a built-in solution. Your application’s connection string has quite a few parameters available to provide configuration and information, and one that often gets overlooked is Application Name. This one does exactly what it says on the tin – it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including sp_whoisactive. Anyplace you have the ability to write a connection string, you can use this. It costs you nothing!

  • If you’re using System.Data.SqlClient.SqlConnectionStringBuilder, it’s just another item in the properties collection.
  • If you’re constructing your connection string as a regular string, just add Application Name=Andy's Awesome App; to the end of your current connection string.

$DBConnection = New-Object System.Data.SqlClient.SqlConnection;
$DBCSBuilder = New-Object System.Data.SqlClient.SqlConnectionStringBuilder;
$DBCSBuilder['Data Source'] = ".\sql2016";
$DBCSBuilder['Initial Catalog'] = "WideWorldImporters";
$DBCSBuilder['Application Name'] = "Andy's Awesome Application";
$DBCSBuilder['Integrated Security'] = "true";
$DBConnection.ConnectionString = $DBCSBuilder.ToString();

#alternative method:
#$DBConnection.ConnectionString = "Data Source=.\sql2016;Initial Catalog=WideWorldImporters;Integrated Security=true;Application name=Andy's Awesome Application;"

$DBConnection.Open();
$QueryCmd = $DBConnection.CreateCommand();
$QueryCmd.CommandText = "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';";
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$QueryCmd.Connection = $DBConnection;
$SqlAdapter.SelectCommand = $QueryCmd;
$DataSet = New-Object System.Data.DataSet;
$SqlAdapter.Fill($DataSet);
$DataSet.Tables[0];


import-module sqlserver;
Invoke-Sqlcmd -ServerInstance .\sql2016 -Database wideworldimporters -Query "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';"
remove-module sqlserver;

Here’s the difference between using SqlClient and Invoke-Sqlcmd in PowerShell, as seen from the SQL Server side:

[Screenshot: sp_whoisactive output – the SqlClient session shows the Application Name set above, while Invoke-Sqlcmd shows the generic .Net SqlClient Data Provider]
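
You don’t even need sp_whoisactive to see this – program_name is sitting right in sys.dm_exec_sessions:

SELECT session_id, login_name, host_name, program_name
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
ORDER BY program_name;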

While the .NET Framework lets you specify this fairly easily, as we move up in abstraction layers these options become buried or entirely unavailable. This is the case with the PowerShell Invoke-Sqlcmd cmdlet. One of the many benefits of using Invoke-SqlCmd2 is that it constructs its own connection string (if you don’t pass in a SqlConnection object), but it doesn’t expose Application Name as an available parameter.

In an upcoming post, I’ll show how I added Application Name support to Invoke-SqlCmd2.

Name Your Defaults So SQL Server Doesn’t

Something in SQL Server that isn’t always obvious to beginners is that when you create a default value for a column on a table, SQL Server creates a constraint (much like a primary or foreign key). All constraints must have a name, and if one isn’t specified SQL Server will generate one for you. For example:

CREATE TABLE [dbo].[point_types] (
[typeid] [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
,[typename] [nvarchar](30) NOT NULL DEFAULT 'Unspecified'
,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
)
GO

We’ve got a simple table here and both fields have a default value set (the primary key’s value is generated from a sequence object, pointtypeid). Time goes on, and a change in policy comes up which requires that I change the default value of typename to Unknown. To do this, I have to drop the constraint and re-create it. To find the name of the constraint, I can either ask sp_help, or run this query:

SELECT all_columns.NAME
,default_constraints.NAME
,default_constraints.DEFINITION
FROM sys.all_columns
INNER JOIN sys.tables
ON all_columns.object_id = tables.object_id
INNER JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
INNER JOIN sys.default_constraints
ON all_columns.default_object_id = default_constraints.object_id
WHERE schemas.NAME = 'dbo'
AND tables.NAME = 'point_types';

I’ve got my constraint name now, so I can drop it and re-create it.

[Screenshot: query results showing the system-generated constraint name]

ALTER TABLE [dbo].[point_types]
DROP CONSTRAINT DF__point_typ__typen__21B6055D;
GO

ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown')
FOR [typename];
GO

And if I re-run the above query, I can see that the constraint’s name is different.

[Screenshot: the same query results, now with a different generated constraint name]

This means that everywhere I need to change this constraint (development, test, and production), I’ll need to figure out the constraint name in that particular database and drop it before re-creating it. This makes a deployment script a bit messier, as it needs more code to find those constraint names:

DECLARE @constraintname VARCHAR(50);

SELECT @constraintname = default_constraints.NAME
FROM sys.all_columns
INNER JOIN sys.tables
ON all_columns.object_id = tables.object_id
INNER JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
INNER JOIN sys.default_constraints
ON all_columns.default_object_id = default_constraints.object_id
WHERE schemas.NAME = 'dbo'
AND tables.NAME = 'point_types'

DECLARE @sql NVARCHAR(200) = N'alter table [dbo].[point_types] drop constraint ' + @constraintname;

PRINT @sql;

[Screenshot: the PRINT output showing the generated DROP CONSTRAINT statement]

EXECUTE sp_executesql @sql;

ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown')
FOR [typename];
GO

But this doesn’t really solve my problem, it just works around it. It’s still messy and fragile. If I need to do other operations on the default constraint, I need to go through the same exercise to find its name.

Fortunately, SQL Server lets us name default constraints just like any other constraint, and by doing so we avoid this trouble. By setting my own name for the constraint, I know what it’ll be in every database, without having to query system tables. The name can be set in both the CREATE TABLE statement and an independent ALTER TABLE.

CREATE TABLE [dbo].[point_types] (
[typeid] [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
,[typename] [nvarchar](30) NOT NULL CONSTRAINT [DF_PT_TypeName] DEFAULT 'Unspecified'
,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
);
GO

ALTER TABLE [dbo].[point_types]

DROP CONSTRAINT [DF_PT_TypeName];
GO

ALTER TABLE [dbo].[point_types] ADD CONSTRAINT [DF_PT_TypeName] DEFAULT('Unknown')
FOR [typename];
GO

I can also combine these in the next deployment that requires a change to the default constraint, dropping the system-generated name and establishing my own static name to make things simpler in the future.

Is explicitly naming default (or any other) constraints necessary? No, but doing so helps your database document itself, and it makes future deployment/promotion scripts simpler and less prone to breakage. SQL Server needs a name for the constraint regardless; it’s worth specifying it yourself.

Don’t Trust the Wizard

[Image: The one wizard you can trust]

If you need to move data from one table into a new table, or even tables in a database into another database, the Import/Export Wizard in SQL Server Management Studio looks pretty tempting. Set up a source & destination, click a few buttons, kick back with a cup of tea and watch the progress bars, right?

It turns out that the wizard just isn’t as smart as it may seem. If you’re not careful, you won’t get what you’re expecting. Let’s check it out.

We’ll start by creating a real simple table in a database, containing a primary key and a computed column.

Create table sample.dbo.SourceTable (
RowId int identity(1,1) not null primary key,
Num1 int not null,
Num2 int not null,
Total as (Num1+Num2)
);

Let’s populate it with a few rows of data, then update some of that data to make sure the computed column is working. Remember, this is just to demonstrate the idea.

insert into sample.dbo.SourceTable (Num1, Num2) values (1,2);
go 100
select top 5 * from sample.dbo.SourceTable order by RowId;
update sample.dbo.SourceTable set Num1 = Num1 * RowId where RowId <= 3;
select top 5 * from sample.dbo.SourceTable order by RowId;

[Screenshot 1: the first five rows before and after the update, with the computed Total column correct]

Great! We’ve got data and the computed column is working, so let’s copy it over to a new table in another database. We’re just going to click Next, Next, Next through the wizard this time around.

[Screenshots 2–5: clicking through the Import/Export Wizard, accepting all the defaults]
Success! Our table has been copied and the data’s all there.

select top 5 * from Sample2.dbo.SourceTable order by RowId;

[Screenshot 6: the first five rows of the copied table]

Let’s do some work on our new table and check out the results.

select top 5 * from Sample2.dbo.SourceTable order by RowId;
update Sample2.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample2.dbo.SourceTable order by RowId;

[Screenshot 7: after the update, Total no longer equals Num1 + Num2]

Whoa! That’s not right. That Total column is supposed to be Num1 + Num2, and last time I checked, 2 + 4 was not 4. Let’s keep going and try adding a new record the same way it was done earlier.

insert into Sample2.dbo.SourceTable (Num1, Num2) values (100,200);

Cannot insert the value NULL into column 'RowId', table 'Sample2.dbo.SourceTable'; column does not allow nulls. INSERT fails.

Huh. Now that’s really odd, isn’t it? RowId is supposed to be an identity – we shouldn’t have to populate it. What is going on here? Let’s script out the table.

USE [Sample2]
GO

/****** Object: Table [dbo].[SourceTable] Script Date: 2015-11-10 22:36:23 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[SourceTable](
[RowId] [int] NOT NULL,
[Num1] [int] NOT NULL,
[Num2] [int] NOT NULL,
[Total] [int] NULL
) ON [PRIMARY]

GO

This is all kinds of wrong! What happened to the primary key? Or the computed column? Well, it turns out that the wizard isn’t that smart, and if you just take all the default values, you’re going to get burned. Let’s go back to the wizard and click that Edit Mappings button in the Select Source Tables and Views screen.
[Screenshot 8: the Edit Mappings dialog]

Well…that looks like what we got above, and it’s not what we wanted. If we click Edit SQL, this is confirmed – the table being created is not defined the same way the source table is.
[Screenshot 9: the wizard’s generated CREATE TABLE statement]
Fortunately, we can edit the SQL here and make it match the source table definition, then finish the wizard.
[Screenshot 10: the corrected CREATE TABLE statement]
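
The edited statement ends up matching the original table definition instead of the wizard’s flattened version – roughly this (reconstructed from the source table, since the wizard won’t write it for you):

Create table Sample3.dbo.SourceTable (
RowId int identity(1,1) not null primary key,
Num1 int not null,
Num2 int not null,
Total as (Num1+Num2)
);

You may also need to adjust the column mappings – enable identity insert so the RowId values copy over, and set Total to ignore, since you can’t insert into a computed column.
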
OK, data’s copied – what do we have?

select top 5 * from Sample3.dbo.SourceTable order by RowId;
update Sample3.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample3.dbo.SourceTable order by RowId;
insert into Sample3.dbo.SourceTable (Num1, Num2) values (100,200);
select * from sample3.dbo.SourceTable where rowid >= 100 order by RowId;

[Screenshot 11: the identity and computed column both working in Sample3]

Everything’s there, and it’s working the way it’s supposed to. Lesson learned: don’t blindly trust the defaults, especially the ones in a wizard. Double-check everything, and then verify that your schema works the way you expect it to before doing any business with it.

Padding Fields for Fixed-Position Data Formats

Fixed-position data formats will seemingly be with us forever. Despite the relative ease of parsing CSV (or other delimited formats), or even XML, many data exchanges require a fixed-position input. Characters 1-10 are X, characters 11-15 are Y and if the source data is fewer than 5 characters, we have to left-pad with a filler character, etc. When you’re accustomed to working with data that says what it means and means what it says, having to add “extra fluff” like left-padding your integers with a half-dozen zeroes can be a hassle.

I received a draft of a stored procedure recently which had to do exactly this. The intent is for the procedure to output the data almost entirely formatted as required, one record per line in the output file, and dump the result set to a file on disk. As it was given to me, the procedure was peppered with CASE statements like this (only more complex) in the SELECT clause:

-- Method 1
select case len(cast(logid as varchar))
when 9 then '0' + cast(logid as varchar)
when 8 then '00' + cast(logid as varchar)
when 7 then '000' + cast(logid as varchar)
when 6 then '0000' + cast(logid as varchar)
when 5 then '00000' + cast(logid as varchar)
when 4 then '000000' + cast(logid as varchar)
when 3 then '0000000' + cast(logid as varchar)
when 2 then '00000000' + cast(logid as varchar)
when 1 then '000000000' + cast(logid as varchar)
when 0 then '0000000000' + cast(logid as varchar)
end as logid
,logtext from cachedb.dbo.logs;

It’s perfectly valid, it works, and there’s nothing inherently wrong with it. But I find it a bit tough to read, and it could cause trouble if the format changes later, as additional (or fewer) cases will have to be accounted for. Fortunately, the day I received this procedure was right around the day I learned about the REPLICATE() T-SQL function. Maybe we can make this simpler:

select replicate('0',10-len(cast(logid as varchar))) + cast(logid as varchar) as logid,logtext from cachedb.dbo.logs;

Not bad. But it leaves us with a magic number, and as in the previous example, if the file format changes we have to seek out these magic numbers and fix them. This is easily remedied by defining the field lengths at the beginning of the procedure, so that they’re all in one place if anything needs to change.

-- Method 2
declare @paddedlength int = 10;
select replicate('0',@paddedlength-len(cast(logid as varchar))) + cast(logid as varchar) as logid,logtext from cachedb.dbo.logs;

Yet another approach would be to pad out the value beyond what we need, then trim the resulting string back to the required length. Again, we have to be careful to not leave ourselves with magic numbers; the solution is the same as when using REPLICATE():

-- Method 3
select right('0000000000' + cast(logid as varchar), 10) as logid,logtext from cachedb.dbo.logs;
-- Or, with more flexibility/fewer magic numbers
-- Method 4
declare @paddedlength int = 10;
select right(replicate('0',@paddedlength) + cast(logid as varchar), @paddedlength) as logid,logtext from cachedb.dbo.logs;

All four methods yield the same results, as far as the data itself is concerned. But what about performance? For a table with 523,732 records, execution times were:

  1. 2,000ms CPU time, 261,785ms elapsed
  2. 2,265ms CPU time, 294,399ms elapsed
  3. 2,000ms CPU time, 297,593ms elapsed
  4. 2,078ms CPU time, 302,045ms elapsed

Each method had an identical execution plan, so I’m probably going to opt for the code that’s more readable and maintainable – method 2 or 4.

As with any tuning, be sure to test with your own data & queries.

Connecting SQLite to SQL Server with PowerShell

This post is part of Ed Leighton-Dick’s SQL New Blogger Challenge. Please follow and support these new (or reborn) bloggers.

I’m working with a number of SQLite databases as extra data sources in addition to the SQL Server database I’m primarily using for a project. Brian Davis (b|t) wrote a blog post a few years ago that covers setting up the connection quite well. In my case, I’ve got nine SQLite databases to connect to, and that gets tedious. PowerShell to the rescue!

I started by installing the SQLite ODBC Drivers and creating one ODBC connection for reference. Brian’s post linked above covers that well. But I don’t want to do it eight more times, so I’ll use the first DSN as a template so I can script the creation of the rest.

I named my DSN GSAKMyFinds. To inspect the DSN, I can use the Get-OdbcDsn cmdlet.

Get-OdbcDsn -Name GSAKMyFinds;

Name       : GSAKMyFinds
DsnType    : System
Platform   : 64-bit
DriverName : SQLite3 ODBC Driver
Attribute  : {Database, Description, NoTXN, LongNames...}

This looks pretty simple, but there’s a collection of Attributes I need to look at too. I’ll do this by expanding that property with Select-Object.

Get-OdbcDsn -Name GSAKMyFinds | Select-Object -ExpandProperty Attribute | Format-Table -AutoSize;

Name        Value                                               
----        -----                                               
Database    C:\Users\andy\Dropbox\GSAK8\data\My Finds\sqlite.db3
Description                                                     
NoTXN       0                                                   
LongNames   0                                                   
Timeout                                                         
FKSupport   0                                                   
JDConv      0                                                   
StepAPI     0                                                   
BigInt      0                                                   
NoWCHAR     0                                                   
SyncPragma                                                      
LoadExt                                                         
OEMCP       0                                                   
NoCreat     0                                                   
ShortNames  0                                                   

Now I have everything I need to create a new DSN with Add-OdbcDsn. All of my SQLite databases are stored in a directory structure under C:\Users\andy\Dropbox\GSAK8\data\, with each one in a different subdirectory. For now, I’ll just create one to make sure that I’m doing it right, then use Get-OdbcDsn to see if it matches with my GUI-created DSN.

Add-OdbcDsn -Name GSAKPuzzles -DriverName "SQLite3 ODBC Driver" -Platform 64-bit -DsnType System -SetPropertyValue "Database=C:\Users\andy\Dropbox\GSAK8\data\Far-off puzzles\sqlite.db3";
Get-OdbcDsn -Name GSAKPuzzles;
Get-OdbcDsn -Name GSAKPuzzles | Select-Object -ExpandProperty Attribute |Format-Table -AutoSize;

Results:

Name       : GSAKPuzzles
DsnType    : System
Platform   : 64-bit
DriverName : SQLite3 ODBC Driver
Attribute  : {Database}

Name     Value                                                      
----     -----                                                      
Database C:\Users\andy\Dropbox\GSAK8\data\Far-off puzzles\sqlite.db3

Looks pretty good! Note that not all of the Attributes seen above are here; those are default values that are set when creating the DSN through the GUI. After deleting my first two test DSNs, I can move on to looping through all of my SQLite databases and creating DSNs for all of them. SQLite databases are just files on your filesystem, so by iterating over all of the db3 files in the parent directory I can build the list of files to point my DSNs at.

Get-ChildItem -Path C:\users\andy\Dropbox\gsak8\data -Recurse -Filter sqlite.db3 | Select-Object -ExpandProperty FullName | ForEach-Object {
    # The subdirectory name (the 7th path segment, index 6) becomes the DSN name; adjust the index for your own directory depth
    $DSNName = $_.split("\")[6];
    Add-OdbcDsn -Name $DSNName -DriverName "SQLite3 ODBC Driver" -Platform 64-bit -DsnType System -SetPropertyValue "Database=$_";
};
Get-OdbcDsn -DriverName "SQLite3 ODBC Driver";

In a few seconds, the DSNs are created.

Name               DsnType Platform DriverName          Attribute                                   
----               ------- -------- ----------          ---------                                   
GSAKMain           System  64-bit   SQLite3 ODBC Driver {Database, Description, NoTXN, LongNames...}
Far-off puzzles    System  64-bit   SQLite3 ODBC Driver {Database}                                  
Home200            System  64-bit   SQLite3 ODBC Driver {Database}                                  
My Finds           System  64-bit   SQLite3 ODBC Driver {Database}                                  
My Hides           System  64-bit   SQLite3 ODBC Driver {Database}                                  
New England        System  64-bit   SQLite3 ODBC Driver {Database}                                  
Niagara Falls      System  64-bit   SQLite3 ODBC Driver {Database}                                  
NJ                 System  64-bit   SQLite3 ODBC Driver {Database}                                  
Seattle            System  64-bit   SQLite3 ODBC Driver {Database}

Next up is creating the linked servers in SQL Server. I created one with Management Studio using all the defaults, then scripted it to see what I need to do. The only parts I really need are sp_addlinkedserver and sp_addlinkedsrvlogin; the defaults for the other options are good enough for what I’m doing here (this may not be true for you, so be sure to check!).

EXEC master.dbo.sp_addlinkedserver @server = N'GSAKMAIN', @srvproduct=N'GSAKMain', @provider=N'MSDASQL', @datasrc=N'GSAKMain'
 /* For security reasons the linked server remote logins password is changed with ######## */
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'GSAKMAIN',@useself=N'False',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL

Now I can put this into a PowerShell loop and run it for all of my other DSNs.

$AllDSNs = Get-OdbcDsn -DriverName "SQLite3 ODBC Driver";
foreach ($DSN in $AllDSNs) {
    $CreateLinkedServerSP =@"
EXEC master.dbo.sp_addlinkedserver @server = N'$($DSN.Name)', @srvproduct=N'$($DSN.Name)', @provider=N'MSDASQL', @datasrc=N'$($DSN.Name)';
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'$($DSN.Name)',@useself=N'False',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL;
"@;
    invoke-sqlcmd -query $CreateLinkedServerSP -serverinstance sql2014 -database master;
}

I let this run and when it’s finished, all my DSNs are linked servers ready to be queried.

[Screenshot: the new linked servers listed in SSMS]

Because I’m going to be querying all of these linked servers together, I wrote some additional code to give me a skeleton query performing a UNION across all of my linked servers which I can use as a starting point.

Here’s the final script:

#requires -version 3.0
#requires -module sqlps
set-strictmode -Version latest;
set-location c:;

Get-ChildItem -Path C:\users\andy\Dropbox\gsak8\data -Recurse -Filter sqlite.db3 | Select-Object -ExpandProperty FullName | ForEach-Object {
    # The subdirectory name (index 6 when split on "\") becomes the DSN name
    $DSNName = $_.split("\")[6];
    Add-OdbcDsn -Name $DSNName -DriverName "SQLite3 ODBC Driver" -Platform 64-bit -DsnType System -SetPropertyValue "Database=$_";
};

$AllDSNs = Get-OdbcDsn -DriverName "SQLite3 ODBC Driver";
foreach ($DSN in $AllDSNs) {
    $CreateLinkedServerSP =@"
EXEC master.dbo.sp_addlinkedserver @server = N'$($DSN.Name)', @srvproduct=N'$($DSN.Name)', @provider=N'MSDASQL', @datasrc=N'$($DSN.Name)';
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'$($DSN.Name)',@useself=N'False',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL;
"@;
    invoke-sqlcmd -query $CreateLinkedServerSP -serverinstance sql2014 -database master;
}

$AllDSNs = Get-OdbcDsn -DriverName "SQLite3 ODBC Driver";
$AllDatabasesUnion = "";
foreach ($DSN in $AllDSNs) {
    $AllDatabasesUnion += "SELECT * FROM OPENQUERY([$($DSN.Name)], 'select * from caches') UNION ALL`n";
}
$AllDatabasesUnion = $AllDatabasesUnion.Substring(0,$AllDatabasesUnion.Length - 10);
Write-Output $AllDatabasesUnion;

And the query that it generated for me:

SELECT * FROM OPENQUERY([GSAKMain], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([Far-off puzzles], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([Home200], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([My Finds], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([My Hides], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([New England], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([Niagara Falls], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([NJ], 'select * from caches') UNION ALL
SELECT * FROM OPENQUERY([Seattle], 'select * from caches')

With a little exploration of the PowerShell OdbcDsn cmdlets, I’ve eliminated a tedious process and prevented any accidental mouse clicks in a GUI.

PASS Summit: Things to Do, People to See

PASS Summit is nearly upon us. I’m excited to be attending my second Summit in Seattle and cannot wait to get there to see everyone. With one Summit and a few SQL Saturdays under my belt I’ve got a laundry list of things and people I can’t miss, and very little time to pack it all into.

Let’s Meet!

The greatest part of Summit (and SQL Saturday) for me is meeting people and exchanging ideas. If you haven’t experienced it, #SQLFamily is amazing. When I reached the convention center two years ago, the first feeling that hit me was “I finally found my people!” We’re all friendly, I swear. Just say “hi, I’m <your name here>.” I guarantee you will find people who are into the same stuff you’re into, and I’m not just talking about SQL Server. Music, dance, outdoor activities, all kinds of stuff. We have a common thing that brought us together, but that’s not what keeps us together. It is an amazing community and it just keeps getting better. On Sunday, as you’re decompressing from the event and travel, you will miss these people you didn’t even know a week before.

You can even connect strangers with common interests. In 2012, I met someone over a power outlet who asked if I’d done anything with a particular piece of hardware and what I thought of it. It turned out that I hadn’t, but I knew a former co-worker who was also in attendance and had used the hardware, so I gave them each other’s contact information.

Ping me on Twitter, find me at one of the places/events listed below, or catch me at breakfast or lunch in the dining hall. If you think you see me passing in the hall (there’s a picture on my Twitter profile), say something – and if it’s not me, you’ll meet someone else, which is still awesome. Maybe we’ll even share dinner at the airport on Friday evening.

Get on Twitter

So many things happen at Summit that are announced and/or organized via Twitter. The main hashtag to follow is (I think) #summit14, but once you hit the ground you’ll start figuring out who and what to follow to get all the dirt.

Schedule

Tuesday

I’m arriving in Seattle late Tuesday morning and doing some sightseeing before checking into the hotel and Summit late in the afternoon. Then it’s off to the welcome reception. The first of several visits to Tap House Grill may be in order too.

Wednesday

Wednesday starts dark & early with #SQLRun at 6 AM. I had a great time getting a 5K in before dawn at my first Summit and I’m sure this one will be great too. Don’t forget to bring the right gear; it’s pre-dawn and right now the forecast is for 50°F and rain (in Seattle. Go figure).

Aside from the sessions throughout the day, I’ll probably be found in the Community Zone. I’ll also be serving as an Ambassador, helping direct people to the dining hall for lunch; I’ll be posted outside room 4C, so stop by and say hi.

Wednesday evening, I’m hosting a dinner for geocachers at the Daily Grill at 6:15 PM. If you’re a cacher, or just curious about it, stop by!

Once we’ve wrapped up there, I’ll go wherever the wind may take me; probably back to the Tap House.

Thursday

Thursday is my light day at Summit. I don’t have any sessions double-booked and the only event I really need to catch is the Argenis Without Borders folks in their fuzzy rainbow leggings.

Thursday evening I’ll be at the Ray Gun Lounge for Table Top Game Night. I’m looking forward to getting to know folks there and learn some new games. We don’t play a lot of table top games at home and I’d like to change that.

Friday

Lots more sessions on Friday, plus winding everything down. By the afternoon, I’ll probably be beat and just trying to rest at the Community Zone.

I fly out late Friday night, so I’ll be trying to find dinner somewhere between the convention center and airport. I’ll probably kill a lot of time in the terminal by wandering around, playing Ingress.

Packing List

At my first Summit, I learned a few lessons about what to take and what not to take. The most important thing to bring: empty space for all the stuff you’ll bring home. SWAG from the exhibitors, souvenirs, books and more. Next most important: power! Electrical outlets are few and far between, and there will be 5,000 people vying for them to top off their phones and tablets. Here’s a quick rundown of some of the stuff I’m packing that might not be obvious (or is easily forgotten):

  • Small (1 pint) widemouth water bottle. I’m partial to this Nalgene bottle I got at a 5K earlier this year.
  • NUUN electrolyte tabs. Water gets boring after a while. These will help you stave off SQLPlague (don’t forget your vitamins too!).
  • Comfortable shoes. You’ll be on your feet a lot and walking even more; the convention center is big. Not to mention the evening activities.
  • A small notepad for taking non-session notes – phone numbers, names, etc. I love my Field Notes notebook.
  • A larger notepad for taking notes in sessions. Oh, and don’t forget a pen or three. I’ve tried doing notes on a tablet and on a computer, and it just doesn’t work as well as paper & pen for me. Bonus: no batteries!
  • Hand sanitizer. Because when you have 5000 people in one place, germs get around in a hurry no matter how careful you are.
  • A good wall charger for your devices. I found myself short on chargers last time and had to buy one at Radio Shack, and it didn’t cut it. This one has two USB ports that charge at 2.1A, which will give you a good boost when you get near a plug, and you can share with a friend. It’ll also recharge pretty much anything while you sleep. Best of all, it’s really compact.
  • A good external battery pack. Matt Slocum (b | t) got me hooked on the Anker E5 15,000mAh battery. Two ports so you can share with a friend, and it’ll recharge most phones 4-5 times from a completely empty battery.
  • Plenty of USB cords to go with both of the above.
  • Business cards! I ordered mine at the office too late last time and had to get some made at Staples in a pinch.
  • A small, light backpack to carry all of this in (well, not the shoes). Session rooms get cramped, so carrying a big pack can be a pain.
  • A lock code on my phone and tablet. I normally don’t use one but at any large gathering like this, it’s better to be safe.
  • A list of the people I need to see/find/meet/reconnect with.

This Summit is going to be a blast. I cannot wait. There are only two things I don’t look forward to:

  1. Having to sleep (I’ll miss stuff!)
  2. It’ll eventually end

Next Tuesday cannot come soon enough.