My First Migration with dbatools

I’ve been a proponent of dbatools for close to a year now and have even contributed to the project, but surprisingly I haven’t been a heavy user of it, mostly due to a lack of opportunity. I’m aware of many of the functions by virtue of working on the built-in documentation and following the project and presentations about it.

So when the need arose to move a development/test instance of SQL Server from a VM onto a physical server, I knew exactly what I wanted to do. I was warned that the contents of this instance had been moved once before and it resulted in over a week of work and a bunch of trouble. I can’t speculate on why this was as I wasn’t there to see it, but I wasn’t going to let that happen on my watch. So, with equal parts hubris and stubbornness (and a dash of naïveté), I dove in. We have the technology. We will migrate this thing.

The advertising for Start-DbaMigration makes it look so easy. Source, destination, your method of moving the data, and you’re done. Right? Well, sure – in a small, controlled sandbox. This one was neither. About 150 databases. Two dozen Agent jobs. User account cleanup. Different drive letters and sizes. And when it was all over, the server name, instance name, and IP of the new box had to match the old one so that we didn’t disrupt production or the developers.
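
For reference, the advertised one-liner looks roughly like this. This is a hedged sketch with placeholder server names, not my actual script, and the parameter for the backup share has changed names between dbatools releases (-NetworkShare in older versions, -SharedPath in newer ones):

# Migrate databases, logins, Agent jobs, linked servers, etc. from one instance to another,
# using backup/restore through a share that both instances can reach.
Start-DbaMigration -Source OLDSQL01 -Destination NEWSQL01 -BackupRestore -SharedPath '\\fileserver\SQLMigration'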

Of course we’re going to rehearse this. But with the destination being a physical machine, I didn’t have the luxury of rolling back a snapshot each time, or restarting from a golden image. And I couldn’t do everything because it wasn’t an isolated environment – I couldn’t test all the Agent jobs (don’t want emails going out in error) and couldn’t reconfigure the IP or server name. Which meant that my script had to clean up any artifacts from previous runs before doing the migration. Each time.
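
That pre-migration cleanup boiled down to a handful of dbatools calls at the top of the script. Here’s a hedged sketch of the idea, with placeholder names and current-ish parameter names (-Confirm:$false because the rehearsals ran unattended):

# Clear out artifacts left behind by the previous rehearsal so the next run starts clean
$dest = 'NEWSQL01'

# Drop the user databases restored during the last run
Get-DbaDatabase -SqlInstance $dest -ExcludeSystem | Remove-DbaDatabase -Confirm:$false

# Remove the Agent jobs copied during the last run so they can be copied fresh
Get-DbaAgentJob -SqlInstance $dest | Remove-DbaAgentJob -Confirm:$false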

I wanted to bring the new instance up in a controlled fashion, as opposed to just moving everything and letting it go, so that I could check things out before anything had a chance to break. I also had to work in checkpoints so the network/server admin could do his pieces. That meant that after the migration, everything on the old server had to be stopped, and the Agent jobs on the new one disabled (but with a record of what was enabled/disabled on the source, so I could replicate it).
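
Recording and replaying the Agent job state was one of those checkpoint pieces. A hedged sketch of how that can look (paths and instance names are placeholders):

# Capture which jobs are enabled on the source so the state can be replayed later
Get-DbaAgentJob -SqlInstance OLDSQL01 |
    Select-Object Name, IsEnabled |
    Export-Csv -Path 'C:\temp\source-job-state.csv' -NoTypeInformation

# Disable everything on the destination until we're ready to bring it up deliberately
Get-DbaAgentJob -SqlInstance NEWSQL01 | Set-DbaAgentJob -Disabled

Re-enabling later is just the reverse: read the CSV back in and pipe the jobs that were enabled on the source into Set-DbaAgentJob -Enabled.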

I rehearsed what I could about a half-dozen times. Each time through took about 4 hours (having multiple tests helps build confidence in your elapsed time estimates), primarily because of the amount of data that had to be moved (about 700GB). Each time, I found another tweak needed. Maybe not entirely necessary, but I was out to prove something. I didn’t want this migration to be “good enough, a little rough around the edges” – this had to work right, right away.

This is truly standing on the shoulders of giants. Without the thousands of person-hours put in by Chrissy and the rest of the team, a short script like this to do a mountain of work simply is not possible. It’s not just having the huge amount of code to build on – it’s the suite of tests they run with every pull request that tells me that I can trust it’ll work right.

Looking back on it, there are definitely a few things I’d change in this script, and more dbatools functions I could have used. But after successfully testing a couple of times, I didn’t want to break what was working.

When the migration was complete, I did a brief checkout and then gave my server admin the green light. He flipped names & IPs around, and then I ran Repair-DbaServerName, which I had discovered only a few days earlier (there’s a quick sketch of it after the list below). I had expected to do that step manually, but I trust the dbatools crew and their test suite more than I trust myself here, as I’d never done it before. When that was complete, I had a grand total of three issues (that I could find):

  • Database owners weren’t set appropriately. I was able to resolve this easily enough via Set-DbaDatabaseOwner (see the sketch after this list).
  • Outgoing dbmail didn’t work. Turns out the SMTP relay on the new server wasn’t started. Easy fix.
  • I had a Linked Server on my production instance which was unable to communicate with the new test server. This took me the longest to figure out. We checked everything – SQL Server Configuration Manager, the network itself – and then my colleague suggested testing something outside SQL Server: mapping a drive from production to test. That test succeeded, which pointed us at the SQL Server connection specifically. The root cause: I had two firewall rules on the new server that blocked connections from all but servers on the local subnet. The production server isn’t on the local subnet.
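
For the curious, the two dbatools calls mentioned above (Repair-DbaServerName and Set-DbaDatabaseOwner) are about as simple as it gets. A hedged sketch with placeholder names; these were the function names at the time, and newer dbatools releases have since renamed some of them, so check Get-Command if they don’t resolve:

# Make SQL Server's own idea of its name (@@SERVERNAME) match the new server name after the rename
Repair-DbaServerName -SqlInstance NEWSQL01

# Set the owner of every database back to sa (or whichever login you prefer)
Set-DbaDatabaseOwner -SqlInstance NEWSQL01 -TargetLogin sa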

None of these was a total showstopper. I had workarounds (or quick solutions) for them, and as this is a test instance, we could live with a minor inconvenience for a day or two. After one or two final tests, I was satisfied that everything was working properly, so I went ahead and enabled my Agent jobs. Some of them still have incorrect owners, but I can fix that later – they were wrong on the source instance too.

I consider this migration a huge success. We had 95% functionality by 9am. By 3pm, the last real problems were resolved (and only that late due to a series of meetings keeping me away from my desk). Most importantly, it was achieved with minimal downtime for the development and QA teams. I’m now one week post-migration and everything is still running smoothly on the new instance.


dbatools Badge Ribbons at PASS Summit

One of the (many) fun things to do at PASS Summit is to check out the ribbons people have attached to their badges. Some are witty or goofy, others informational, others technical, and still more that let you express how you identify with a community within the community.

To celebrate dbatools and the awesome team & community around it, two limited-edition badge ribbons will be available from (and distributed by) me and a handful of other folks all week at Summit. Check ’em out:

Be on the lookout for these badges and talk to us about dbatools! What you like, what you’d like to see changed, new feature ideas, questions about how to use functions, anything at all. Even if you’ve never used dbatools, we love talking about it and showing people the awesome things they can do with it so please, introduce yourself!

T-SQL Tuesday #92: Lessons Learned the Hard Way

This month’s T-SQL Tuesday is hosted by Raul Gonzalez, and he’s asked everyone to share things we might be a bit embarrassed about:

For this month, I want you peers to write about those important lessons that you learned the hard way, for instance something you did and put your systems down or maybe something you didn’t do and took your systems down. It can be also a bad decision you or someone else took back in the day and you’re still paying for it…

  • In the stress/performance testing portion of an upgrade of a critical system, we were short on disk space. So, rather than having a separate set of VMs for the performance testing (as we needed to be able to get back to functional testing quickly), we decided to just take VM snapshots of all the servers. Testing was delayed a day or two – but we didn’t remove the snapshots. Then we started testing and performance was terrific…for about five minutes. Then everything came to a screeching halt. Panicked, we thought we were going to need a pile of new hardware, until the VMware admin realized that our disks were getting hammered and we still had those active snapshots.
    Lesson learned: If you take VM-level snapshots of your database server and let them “soak” for an extended period, you’re gonna have a bad time. Unless you need to take a snapshot of the host OS or instance configuration itself, use a database snapshot instead of a VM-level snapshot.

  • A couple of times, I’ve had under-performing VMs running SQL Server. As I hadn’t been involved in the configuration, I thought everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles, all set up for suboptimal performance.
    Lesson learned: Ask your VMware admin if they’ve perused the best practices guide, and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren’t configured well, you’ll spin your wheels for a long time.

  • In doing a configuration review of a rather large (production) instance, I noted that at least one configuration option was still set to the default value – Cost Threshold for Parallelism was stuck at 5. Running sp_BlitzCache, I found quite a few simple queries going parallel, along with huge CXPACKET waits. CXPACKET isn’t bad per se, but if you’ve got a low-cost query that’s going parallel and waiting on threads when it could run faster overall single-threaded (I verified this was the case for several of the top offenders), increasing the cost threshold can help. I did some checking, verified that it was a configuration change I could make on the fly, and set the value to 50.
    And then everything. Slowed. Down.
    When I made this configuration change on the test instance, it wasn’t much of a problem. But that was a much smaller instance with much less traffic, and what I failed to fully comprehend was the impact of this operation. I had overlooked that changing this setting (like a number of other settings I wasn’t aware of) flushes the plan cache – in the case of this instance, about 26 GB of plan cache. Not only was performance impacted while the plan cache refilled, we also took a hit while all the old plans were being evicted.
    Lesson learned: Even if a change seemed OK in test, those “low impact” changes can have a much larger impact on production unless you can make test mirror production in every way. Plan these changes accordingly (a quick sanity check is sketched just below).
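
For a change like that cost threshold bump, the sanity check I’d run today is to see how much plan cache is on the line before flipping the switch. A hedged sketch using Invoke-Sqlcmd from the sqlserver module (the instance name is a placeholder):

# How much plan cache would a setting change like this throw away?
Invoke-Sqlcmd -ServerInstance PRODSQL01 -Query "SELECT CAST(SUM(size_in_bytes) / 1024.0 / 1024.0 AS decimal(10, 2)) AS plan_cache_mb FROM sys.dm_exec_cached_plans;"

# The change itself, once a quiet window is picked ('cost threshold for parallelism' is an advanced option)
Invoke-Sqlcmd -ServerInstance PRODSQL01 -Query "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'cost threshold for parallelism', 50; RECONFIGURE;"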

We learn the most from our mistakes. We can learn almost as much from the mistakes of others. Learn from mine.

Don’t Forget the Network

A few weeks ago I was looking at a query and got tripped up by the network and my own forgetfulness. It was a pretty simple query with a simple-looking execution plan. It didn’t even do that much work – about 20K logical reads, returning 200K records. For a server as large as the one I was working with, this should have been nothing. Instead, I was waiting three minutes to get my results. My first thought was “aha, this query must need tuning, maybe a new index.”

I started experimenting in a non-production copy of the database on the same instance. I added a couple of indexes, adjusted the query itself, even tried a few query/index hints just to see what worked. Each change definitely had an impact on the query: the execution plan shape changed a bit, and the costs shifted around between a few operators. But I was still waiting minutes for the complete results.

The Plot Thickens

I tried my query against the development server. Production has 24X as much RAM, 3X as many CPU cores, and faster disks when compared to the dev box, so I expected even worse performance. Nope. Near-instant results, even for the original query with no additional indexes, hints or any other trickery.

OK, something’s definitely up now. I hopped back to production and, while running my query, ran sp_BlitzFirst to peek at what was going on. Lo and behold, the top wait was ASYNC_NETWORK_IO. I’ve seen lots of this in the past, usually attributed to poorly-written applications that just can’t consume the data as fast as SQL Server can churn it out. But this was SQL Server Management Studio – it absorbs the data as fast as it can receive it.

And that’s when it hit me. SSMS was consuming the data as quickly as it received it. The development server is in the office. The production server is in a datacenter 1/3 of a continent away and the connection between that datacenter and the office sometimes gets bogged down. The problem wasn’t that SSMS couldn’t drink from the firehose – the firehose had a clamp around it!

To test this idea, I made one last change to the query – I selected the data into a temp table instead of sending the results to SSMS. Round-trip time: two seconds. Conclusion: the network itself was causing the slowdown and this query needs no tuning!
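
If you want to run that kind of test outside of SSMS, the same idea works from PowerShell. Here’s a hedged sketch with Measure-Command and Invoke-Sqlcmd; the instance, database, and query names are placeholders, not the real ones:

# Time the query when the rows have to cross the network...
Measure-Command { Invoke-Sqlcmd -ServerInstance PRODSQL01 -Database MyDatabase -Query 'SELECT * FROM dbo.BigTable;' }

# ...versus when the results stay on the server. A big gap points at the pipe, not the plan.
Measure-Command { Invoke-Sqlcmd -ServerInstance PRODSQL01 -Database MyDatabase -Query 'SELECT * INTO #discard FROM dbo.BigTable;' }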

As it turns out, SSMS has a way to discard query results so that you don’t have to resort to dumping results to a temp table – Discard query results after execution. It’s similar to a /dev/null for result sets. If you only need an actual execution plan but no query results (like I did here), check it out.

I Finally Get Cross Apply!

For years I’ve seen CROSS APPLY in the FROM clause of queries online – in sample code, diagnostic queries using DMVs, and the like – but I never really managed to comprehend what it was for or how it worked, because I never saw a case where it applied directly to something I was doing.

Finally, this week I had a breakthrough. I was working on updating a bunch of data, but the update was breaking on a small subset of that data. I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past without encountering the problem I was having (so I knew it was something I was doing “wrong”), but given that it was only a handful of broken updates, she was OK with doing those manually (we were scripting it because we were updating potentially tens of thousands of records).

I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.

Setting the Stage

Let’s set up three simple tables to keep track of airports and which state each airport is in. But our developer doesn’t totally get database design, and in his state-to-airport mapping table he allows a comma-separated list of airports for each state.

    CREATE TABLE #States (
        [Id]      INT IDENTITY(1, 1),
        StateName NVARCHAR(30) NOT NULL
    );

    CREATE TABLE #Airports (
        [Id]      INT IDENTITY(1, 1),
        IATACode  CHAR(3) NOT NULL
    );

    CREATE TABLE #StateAirports (
        StateId   INT PRIMARY KEY NOT NULL,
        Airports  NVARCHAR(50)
    );

This makes getting a list of airports and their associated state names tricky at best if we don’t know about CROSS APPLY. With CROSS APPLY, it’s pretty straightforward.

Solution

Here’s the finished query.

    SELECT S.StateName,
           A.IATACode
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.Airports, ',') AS SA2
        JOIN #Airports A ON A.Id = SA2.value
        JOIN #States S ON S.Id = SA1.StateId;

string_split() is a table-valued function which we finally got in SQL Server 2016, after far too many years of having to write (or, let’s face it, copy from someone’s blog post) inefficient string-splitting functions. Important note: even if your database engine is SQL Server 2016, the database you’re operating in must be at compatibility level 130 for string_split() to be available.
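
If you’re not sure where a given database stands, it’s a quick check (and fix) from wherever you happen to be scripting. A hedged example using Invoke-Sqlcmd from the sqlserver module, with placeholder instance and database names:

    # Check the compatibility level; string_split() needs 130 or higher
    Invoke-Sqlcmd -ServerInstance '.\sql2016' -Query "SELECT name, compatibility_level FROM sys.databases WHERE name = N'MyDatabase';"

    # Raise it if needed (test first - this can change optimizer behavior)
    Invoke-Sqlcmd -ServerInstance '.\sql2016' -Query "ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 130;"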

Breaking it down

If we take CROSS APPLY and break it down into its parts, it finally starts to make sense.
* APPLY the string_split() function to the Airports field of the #StateAirports table
* Append each row of string_split()’s output to the corresponding row of #StateAirports (similar to a CROSS JOIN, but not exactly)

So now I have N rows for each StateId in #StateAirports, where N is the number of values in the comma-separated field. And JOINed to each row is one of the rows from the output of string_split().

    SELECT *
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.airports, ',') AS SA2

From there, the rest of the query is pretty normal, JOINing to the other two tables to translate the state & airport ID numbers to their text values.

Hopefully this helps others get a handle on CROSS APPLY and find useful places for it. This had been a head-scratcher for me for years, but only because I didn’t have an example that clearly broke down how to use it and what was going on. In hindsight, I probably could have used it in some analysis I did at a previous job, but instead I resorted to parsing & processing comma-separated data in a PowerShell script.

Adding Application Name to Invoke-SqlCmd2

In a previous post, I expressed some frustration over Invoke-SqlCmd not setting an Application Name for its connections, leaving us with the generic .Net SqlClient Data Provider when looking at active sessions in sp_who2 and sp_whoisactive (and any other monitoring tool). Unfortunately, I can’t really do anything about Invoke-SqlCmd aside from posting a suggestion on Connect or the Client Tools Trello board, but Invoke-SqlCmd2 has the same issue and that’s on GitHub. So, here we go!

In its current form, if a SqlConnection object isn’t passed into Invoke-SqlCmd2, the cmdlet does the following:

if ($Credential)
{
    $ConnectionString = "Server={0};Database={1};User ID={2};Password=`"{3}`";Trusted_Connection=False;Connect Timeout={4};Encrypt={5}" -f $SQLInstance,$Database,$Credential.UserName,$Credential.GetNetworkCredential().Password,$ConnectionTimeout,$Encrypt
}
else
{
    $ConnectionString = "Server={0};Database={1};Integrated Security=True;Connect Timeout={2};Encrypt={3}" -f $SQLInstance,$Database,$ConnectionTimeout,$Encrypt
}
$conn = New-Object System.Data.SqlClient.SQLConnection
$conn.ConnectionString = $ConnectionString
Write-Debug "ConnectionString $ConnectionString"

I decided to change this around so that it no longer uses string formatting, but instead a SqlConnectionStringBuilder. I had a couple of reasons for this:

  • It eliminates redundant code. There are several common elements in each of the ConnectionStrings above, and if more complex logic is ever needed, there could be even more copies of this ConnectionString kicking around.
  • String formatting is prone to copy/paste and other editing errors. If a change affects both versions of the ConnectionString and the developer just copies the line from one branch of the if statement to the other, code will be lost or invalid values will be substituted because of positioning.

With this in mind, I factored the common elements out to build the base of the ConnectionString, then added the remaining elements conditionally based on the cmdlet inputs.

$CSBuilder = New-Object -TypeName System.Data.SqlClient.SqlConnectionStringBuilder
$CSBuilder["Server"] = $SQLInstance
$CSBuilder["Database"] = $Database
$CSBuilder["Connection Timeout"] = $ConnectionTimeout
if ($Encrypt) {
    $CSBuilder["Encrypt"] = $true
}
if ($Credential) {
    $CSBuilder["Trusted_Connection"] = $false
    $CSBuilder["User ID"] = $Credential.UserName
    $CSBuilder["Password"] = $Credential.GetNetworkCredential().Password
} else {
    $CSBuilder["Integrated Security"] = $true
}

Before going any further in adding support for inserting Application Name into the ConnectionString, I had to add a parameter to the cmdlet itself.

[Parameter( Position=11, Mandatory=$false )]
[Alias( 'Application', 'AppName' )]
[String]
$ApplicationName

With that complete, I can now add it into the SqlConnectionStringBuilder.

if ($ApplicationName) {
    $CSBuilder["Application Name"] = $ApplicationName
} else {
    $ScriptName = (Get-PSCallStack)[-1].Command.ToString()
    if ($ScriptName -ne "") {
        $CSBuilder["Application Name"] = $ScriptName
    }
}

Because ApplicationName is an optional parameter, I had to account for cases where it’s not specified by the caller. The code above looks at the entire call stack and walks back up to the very top to get the name of the script that was run and ultimately called Invoke-SqlCmd2, using that as the fallback value. Finally, I extract the ConnectionString from the SqlConnectionStringBuilder and assign it to the SqlConnection‘s ConnectionString property.

$ConnectionString = $CSBuilder.ToString()
$conn.ConnectionString = $ConnectionString
Write-Debug "ConnectionString $ConnectionString"
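
Once that’s in place, using it looks something like this – a hedged example with placeholder instance, database, and application names:

# Explicit name: this session shows up as 'NightlyETL' in sp_who2/sp_whoisactive
Invoke-SqlCmd2 -ServerInstance '.\sql2016' -Database WideWorldImporters -Query 'SELECT @@SPID AS session_id;' -ApplicationName 'NightlyETL'

# With no -ApplicationName, the connection is tagged with the calling script's name instead,
# courtesy of the Get-PSCallStack fallback above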

My code changes complete, I reviewed my work and sent my first real pull request off to Warren (B|T) as PR #7 for the module.

Make Your Application’s Name Heard

Odds are, you’ve got more than one application or script accessing your database or SQL Server instance at any given time. You’re probably stacking them on a small number of servers in an attempt to cut down on licensing costs and resource usage. All those PowerShell scripts running on the central job server are running under a single service account, and you’ve got a lazy vendor who set up both the website and the back-end application server to run under the same account, maybe even on a single app/web server.

Then the call comes in. “The database is slow.” Before firing up your monitoring suite, you take a quick pass with sp_who2 or sp_whoisactive and you’re greeted with a dozen sessions, all sporting a program_name of .Net SqlClient Data Provider. Terrific! Which session is coming from which application and causing all the trouble? Not so easy to figure out.

Fortunately, the .NET SqlClient provider (and most other drivers as well) has a built-in solution. Your application’s connection string has quite a few parameters available for providing configuration and information, and one that seems to get overlooked is Application Name. This one does exactly what it says on the tin – it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including sp_whoisactive. Anyplace you have the ability to write a connection string, you can use this. It costs you nothing!

  • If you’re using System.Data.SqlClient.ConnectionStringBuilder, it’s just another item in the properties collection.
  • If you’re constructing your connection string as a regular string, just add Application Name=Andy's Awesome App; to the end of your current connection string.

$DBConnection = New-Object System.Data.SqlClient.SqlConnection;
$DBCSBuilder = New-Object System.Data.SqlClient.SqlConnectionStringBuilder;
$DBCSBuilder['Data Source'] = ".\sql2016";
$DBCSBuilder['Initial Catalog'] = "WideWorldImporters";
$DBCSBuilder['Application Name'] = "Andy's Awesome Application";
$DBCSBuilder['Integrated Security'] = "true";
$DBConnection.ConnectionString = $DBCSBuilder.ToString();

#alternative method:
#$DBConnection.ConnectionString = "Data Source=.\sql2016;Initial Catalog=WideWorldImporters;Integrated Security=true;Application name=Andy's Awesome Application;"

$DBConnection.Open();
$QueryCmd = $DBConnection.CreateCommand();
$QueryCmd.CommandText = "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';";
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$QueryCmd.Connection = $DBConnection;
$SqlAdapter.SelectCommand = $QueryCmd;
$DataSet = New-Object System.Data.DataSet;
$SqlAdapter.Fill($DataSet);
$DataSet.Tables[0];


import-module sqlserver;
Invoke-Sqlcmd -ServerInstance .\sql2016 -Database wideworldimporters -Query "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';"
remove-module sqlserver;

Here’s the difference between using SqlClient and Invoke-Sqlcmd in PowerShell, as seen from the SQL Server side:

[Screenshot: the program_name reported for each connection]
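
If you’d rather check with a query than a screenshot, program_name is sitting right there in the DMVs. A hedged example (instance name matches the demo above):

import-module sqlserver;
Invoke-Sqlcmd -ServerInstance .\sql2016 -Query "SELECT session_id, login_name, host_name, program_name FROM sys.dm_exec_sessions WHERE is_user_process = 1;"
remove-module sqlserver;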

While the .NET Framework lets you specify this fairly easily, as we move up in abstraction layers these options become buried or entirely unavailable. This is the case with the PowerShell Invoke-SqlCmd cmdlet. Invoke-SqlCmd2 constructs its own connection string (if you don’t pass in a SqlConnection object), which is one of its many benefits, but it doesn’t expose Application Name as an available parameter either.

In an upcoming post, I’ll show how I added Application Name support to Invoke-SqlCmd2.

Name Your Defaults So SQL Server Doesn’t

Something in SQL Server that isn’t always obvious to beginners is that when you create a default value for a column on a table, SQL Server creates a constraint (much like a primary or foreign key). All constraints must have a name, and if one isn’t specified SQL Server will generate one for you. For example:

-- The typeid default draws from a sequence object, so that needs to exist first
CREATE SEQUENCE [pointtypeid] AS INT START WITH 1;
GO

CREATE TABLE [dbo].[point_types] (
    [typeid]    [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
    ,[typename] [nvarchar](30) NOT NULL DEFAULT 'Unspecified'
    ,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
);
GO

We’ve got a simple table here and both fields have a default value set (the primary key’s value is generated from a sequence object, pointtypeid). Time goes on, and a change in policy comes up which requires that I change the default value of typename to Unknown. To do this, I have to drop the constraint and re-create it. To find the name of the constraint, I can either ask sp_help, or run this query:

SELECT all_columns.NAME
,default_constraints.NAME
,default_constraints.DEFINITION
FROM sys.all_columns
INNER JOIN sys.tables
ON all_columns.object_id = tables.object_id
INNER JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
INNER JOIN sys.default_constraints
ON all_columns.default_object_id = default_constraints.object_id
WHERE schemas.NAME = 'dbo'
AND tables.NAME = 'point_types';

I’ve got my constraint name now, so I can drop it and re-create it.

[Screenshot: query results showing the system-generated constraint name]

ALTER TABLE [dbo].[point_types]
DROP CONSTRAINT DF__point_typ__typen__21B6055D;
GO

ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown')
FOR [typename];
GO

And if I re-run the above query, I can see that the constraint’s name is different.

[Screenshot: the re-created constraint with a new system-generated name]

This means that everywhere I need to change this constraint (development, test, and production), I’ll need to figure out the constraint name in that particular database and drop it before re-creating it. This makes a deployment script a bit messier, as it needs more code to find those constraint names:

DECLARE @constraintname VARCHAR(50);

SELECT @constraintname = default_constraints.NAME
FROM sys.all_columns
INNER JOIN sys.tables
ON all_columns.object_id = tables.object_id
INNER JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
INNER JOIN sys.default_constraints
ON all_columns.default_object_id = default_constraints.object_id
WHERE schemas.NAME = 'dbo'
AND tables.NAME = 'point_types'

DECLARE @sql NVARCHAR(200) = N'alter table [dbo].[point_types] drop constraint ' + @constraintname;

PRINT @sql;

[Screenshot: the generated DROP CONSTRAINT statement printed by the script]

EXECUTE sp_executesql @sql;

ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown')
FOR [typename];
GO

But this doesn’t really solve my problem; it just works around it. It’s still messy and fragile, and if I need to do other operations on the default constraint, I have to go through the same exercise to find its name.

Fortunately, SQL Server lets us name default constraints just like any other constraint, and by doing so we avoid this trouble. By setting my own name for the constraint, I know what it’ll be in every database, without having to query system tables. The name can be set in both the CREATE TABLE statement and an independent ALTER TABLE.

CREATE TABLE [dbo].[point_types] (
    [typeid]    [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
    ,[typename] [nvarchar](30) NOT NULL CONSTRAINT [DF_PT_TypeName] DEFAULT 'Unspecified'
    ,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
);
GO

ALTER TABLE [dbo].[point_types]
DROP CONSTRAINT [DF_PT_TypeName];
GO

ALTER TABLE [dbo].[point_types] ADD CONSTRAINT [DF_PT_TypeName] DEFAULT('Unknown')
FOR [typename];
GO

I can also combine these in the next deployment that requires a change to the default constraint, dropping the system-generated name and establishing my own static name to make things simpler in the future.

Is explicitly naming default (or any other) constraints necessary? No, but doing so helps your database document itself, and it makes future deployment/promotion scripts simpler and less prone to breakage. SQL Server needs a name for the constraint regardless; it’s worth specifying it yourself.

Don’t Trust the Wizard

The one wizard you can trust

If you need to move data from one table into a new table, or even tables in a database into another database, the Import/Export Wizard in SQL Server Management Studio looks pretty tempting. Set up a source & destination, click a few buttons, kick back with a cup of tea and watch the progress bars, right?

It turns out that the wizard just isn’t as smart as it may seem. If you’re not careful, you won’t get what you’re expecting. Let’s check it out.

We’ll start by creating a real simple table in a database, containing a primary key and a computed column.

Create table sample.dbo.SourceTable (
    RowId int identity(1,1) not null primary key,
    Num1  int not null,
    Num2  int not null,
    Total as (Num1 + Num2)
);

Let’s populate it with a few rows of data, then update some of that data to make sure the computed column is working. Remember, this is just to demonstrate the idea.

insert into sample.dbo.SourceTable (Num1, Num2) values (1,2);
go 100
select top 5 * from sample.dbo.SourceTable order by RowId;
update sample.dbo.SourceTable set Num1 = Num1 * RowId where RowId <= 3;
select top 5 * from sample.dbo.SourceTable order by RowId;

[Screenshot: query results showing the computed Total column updating as expected]

Great! We’ve got data and the computed column is working, so let’s copy it over to a new table in another database. We’re just going to click Next, Next, Next through the wizard this time around.

[Screenshots: stepping through the Import/Export Wizard with the default settings]
Success! Our table has been copied and the data’s all there.

select top 5 * from Sample2.dbo.SourceTable order by RowId;

[Screenshot: the copied table’s data in Sample2]

Let’s do some work on our new table and check out the results.

select top 5 * from Sample2.dbo.SourceTable order by RowId;
update Sample2.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample2.dbo.SourceTable order by RowId;

[Screenshot: results after the update – the Total column no longer matches Num1 + Num2]

Woah! That’s not right. That Total column is supposed to be Num1 + Num2, and last time I checked 2 + 4 was not 4. Let’s keep going and try adding a new record the same way it was done earlier.

insert into Sample2.dbo.SourceTable (Num1, Num2) values (100,200);

Cannot insert the value NULL into column 'RowId', table 'Sample2.dbo.SourceTable'; column does not allow nulls. INSERT fails.

Huh. Now that’s really odd, isn’t it? RowId is supposed to be an identity – we shouldn’t have to populate it. What is going on here? Let’s script out the table.

USE [Sample2]
GO

/****** Object: Table [dbo].[SourceTable] Script Date: 2015-11-10 22:36:23 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[SourceTable](
[RowId] [int] NOT NULL,
[Num1] [int] NOT NULL,
[Num2] [int] NOT NULL,
[Total] [int] NULL
) ON [PRIMARY]

GO

This is all kinds of wrong! What happened to the primary key? Or the computed column? Well, it turns out that the wizard isn’t that smart, and if you just take all the default values, you’re going to get burned. Let’s go back to the wizard and click that Edit Mappings button in the Select Source Tables and Views screen.
[Screenshot: the wizard’s column mappings]

Well… that looks like what we got above, and it’s not what we wanted. If we click Edit SQL, this is confirmed – the table being created is not defined the same way the source table is.

[Screenshot: the CREATE TABLE statement generated by the wizard]

Fortunately, we can edit the SQL here to make it match the source table definition, then finish the wizard.

[Screenshot: the edited CREATE TABLE statement]
OK, data’s copied – what do we have?

select top 5 * from Sample3.dbo.SourceTable order by RowId;
update Sample3.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample3.dbo.SourceTable order by RowId;
insert into Sample3.dbo.SourceTable (Num1, Num2) values (100,200);
select * from sample3.dbo.SourceTable where rowid >= 100 order by RowId;

[Screenshot: results from Sample3 – the identity and computed column behave correctly]

Everything’s there, and it’s working the way it’s supposed to. Lesson learned: don’t blindly trust the defaults, especially the ones in a wizard. Double-check everything, and then verify that your schema works the way you expect it to before doing any business with it.

Padding Fields for Fixed-Position Data Formats

Fixed-position data formats will seemingly be with us forever. Despite the relative ease of parsing CSV (or other delimited formats), or even XML, many data exchanges require a fixed-position input. Characters 1-10 are X, characters 11-15 are Y and if the source data is fewer than 5 characters, we have to left-pad with a filler character, etc. When you’re accustomed to working with data that says what it means and means what it says, having to add “extra fluff” like left-padding your integers with a half-dozen zeroes can be a hassle.

I received a draft of a stored procedure recently which had to do exactly this. The intent is for the procedure to output the data almost entirely formatted as required, one record per line in the output file, and dump the result set to a file on disk. As it was given to me, the procedure was peppered with CASE statements like this (only more complex) in the SELECT clause:

-- Method 1
select case len(cast(logid as varchar))
when 9 then '0' + cast(logid as varchar)
when 8 then '00' + cast(logid as varchar)
when 7 then '000' + cast(logid as varchar)
when 6 then '0000' + cast(logid as varchar)
when 5 then '00000' + cast(logid as varchar)
when 4 then '000000' + cast(logid as varchar)
when 3 then '0000000' + cast(logid as varchar)
when 2 then '00000000' + cast(logid as varchar)
when 1 then '000000000' + cast(logid as varchar)
when 0 then '0000000000' + cast(logid as varchar)
end as logid
,logtext from cachedb.dbo.logs;

It’s perfectly valid, it works, and there’s nothing inherently wrong with it. But I find it a bit tough to read, and it could become trouble if the format changes later, as additional (or fewer) cases will have to be accounted for. Fortunately, the day I received this procedure was right around the day I learned about the REPLICATE() T-SQL function. Maybe we can make this simpler:

select replicate('0',10-len(cast(logid as varchar))) + cast(logid as varchar) as logid,logtext from cachedb.dbo.logs;

Not bad. But it leaves us with a magic number, and as with the previous example, if the file format changes we have to seek out these magic numbers and fix them. This is easily remedied by defining these field lengths at the beginning of the procedure, so that they’re all in one place if anything needs to change.

-- Method 2
declare @paddedlength int = 10;
select replicate('0',@paddedlength-len(cast(logid as varchar))) + cast(logid as varchar) as logid,logtext from cachedb.dbo.logs;

Yet another approach would be to pad out the value beyond what we need, then trim the resulting string back to the required length. Again, we have to be careful to not leave ourselves with magic numbers; the solution is the same as when using REPLICATE():

-- Method 3
select right('0000000000' + cast(logid as varchar), 10) as logid,logtext from cachedb.dbo.logs;
-- Or, with more flexibility/fewer magic numbers
-- Method 4
declare @paddedlength int = 10;
select right(replicate('0',@paddedlength) + cast(logid as varchar), @paddedlength) as logid,logtext from cachedb.dbo.logs;

All four methods yield the same results, as far as the data itself is concerned. But what about performance? For a table with 523,732 records, execution times were:

  1. 2,000ms CPU time, 261,785ms elapsed
  2. 2,265ms CPU time, 294,399ms elapsed
  3. 2,000ms CPU time, 297,593ms elapsed
  4. 2,078ms CPU time, 302,045ms elapsed

Each method had an identical execution plan, so I’m probably going to opt for the code that’s more readable and maintainable – method 2 or 4.

As with any tuning, be sure to test with your own data & queries.