dbatools Badge Ribbons at PASS Summit

One of the (many) fun things to do at PASS Summit is to check out the ribbons people have attached to their badges. Some are witty or goofy, others informational, others technical, and still others let you express how you identify with a community within the community.

To celebrate dbatools and the awesome team & community around it, two limited edition badges will be available from/distributed by me and a handful of other folks all week at Summit. Check ’em out:

Be on the lookout for these badges and talk to us about dbatools! What you like, what you’d like to see changed, new feature ideas, questions about how to use functions, anything at all. Even if you’ve never used dbatools, we love talking about it and showing people the awesome things they can do with it so please, introduce yourself!


T-SQL Tuesday #94 – Automating Configuration Comparison

This month’s T-SQL Tuesday is hosted by Rob Sewell and he’s posed the following question:

What are you going to automate today with PowerShell?

I’m cheating a little bit in that this is something I did a couple of weeks ago, but it was immensely helpful. I’d been working on building out a new instance to migrate our test databases onto, but the developers had an urgent need to do some testing in isolation so they “borrowed” that new instance. But we had an additional requirement – the configuration needed to match production as closely as possible, more than our current test instance. Of course, I reached for PowerShell and dbatools.

I started with Get-DbaSpConfigure to retrieve the settings available from sp_configure as these were the most important to my comparison. I ran this against production as well as each of my test instances and saved the results of each to a variable. Because accessing my production instance requires either jumping through hoops or using SQL Authentication, I passed -SqlCredential (Get-Credential -Message "Prod" -UserName MySQLLogin) so I’d be prompted for that password instead of using Windows Authentication.

My configurations saved for reference, I can now look at one of the objects returned to see which properties need to be compared:

ServerName            : TEST1
ConfigName            : AdHocDistributedQueriesEnabled
DisplayName           : Ad Hoc Distributed Queries
Description           : Enable or disable Ad Hoc Distributed Queries
IsAdvanced            : True
IsDynamic             : True
MinValue              : 0
MaxValue              : 1
ConfiguredValue       : 0
RunningValue          : 0
DefaultValue          : 0
IsRunningDefaultValue : True

Looks like I want to be checking out ConfigName and RunningValue. ConfigName is the same name that you’d pass to sp_configure. PowerShell comes with a handy cmdlet, Compare-Object, which (you guessed it!) lets you compare two objects and reports the differences.

Hmm…that’s no good. I know there are differences between test and production – for one, production has about 24 times the amount of RAM test has. I took to the SQL Community Slack for help, and was reminded that Compare-Object by default doesn’t do a “deep” comparison on PSCustomObjects, so you have to specify which property(ies) you want compared. In this case, RunningValue. So, passing both ConfigName and RunningValue into Compare-Object (the former so that I’d know what was being compared), then sorting the output, I was able to readily see the differences.
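To make that concrete, here is a minimal sketch of the property-based comparison. The two arrays are hand-built stand-ins for what Get-DbaSpConfigure returns, and every value in them is made up for illustration; only the ConfigName and RunningValue property names come from the output shown earlier.

```powershell
# Hypothetical stand-ins for the saved Get-DbaSpConfigure results.
# The setting values are invented for this sketch.
$prod = @(
    [pscustomobject]@{ ConfigName = 'MaxDegreeOfParallelism';      RunningValue = 8 }
    [pscustomobject]@{ ConfigName = 'MaxServerMemory';             RunningValue = 196608 }
    [pscustomobject]@{ ConfigName = 'CostThresholdForParallelism'; RunningValue = 50 }
)
$test = @(
    [pscustomobject]@{ ConfigName = 'MaxDegreeOfParallelism';      RunningValue = 4 }
    [pscustomobject]@{ ConfigName = 'MaxServerMemory';             RunningValue = 8192 }
    [pscustomobject]@{ ConfigName = 'CostThresholdForParallelism'; RunningValue = 50 }
)

# Name the properties to compare; ConfigName is included so each
# reported difference says which setting it belongs to.
$diff = Compare-Object -ReferenceObject $prod -DifferenceObject $test -Property ConfigName, RunningValue |
    Sort-Object -Property ConfigName, SideIndicator

$diff | Format-Table -AutoSize
```

Settings whose ConfigName and RunningValue match on both sides drop out of the result, so only the pairs that differ are listed.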

The value corresponding to the left-pointing arrow is what came from the reference object, and the right-pointing arrow is the value from the difference object (which instance is the “reference” in this case isn’t terribly important, as long as you remember which is which). So MaxDOP and MaxServerMemory are both higher in production – which is expected.

If we really want to get crazy, we can even make this a one-liner. But I don’t recommend it.

Running this against my second test instance as well let me quickly deliver the news to the developers that the instances were configured as closely as possible. Any remaining differences were limited to the hardware/environments they were in, which is not something we were in a position to address.

Stashing Data for dbatools

While working on an enhancement to dbatools, I had a need to stash a local copy of a file downloaded from the internet, somewhere that I could reasonably expect it to be safe from accidental deletion.

  • User’s home directory? Maybe, but it’ll be clutter, and the user might see it appear and fear that they’ve got malware. It’s also likely to be deleted in a “cleanup” effort.

  • Create my own directory somewhere on the file system? See above.

  • A temp directory fetched from env:temp, env:tmp, or [System.IO.Path]::GetTempPath()? Well, it wouldn’t be hidden, but by definition it’ll be prone to getting purged. Not great for potential medium-term storage.

  • Let the user specify a location at runtime? I don’t know about you, but I’ll forget about 5 minutes later and I want the parameters for this to be simple.

No good solutions there. Fortunately, the dbatools team has it covered. The module has a system for storing its own configuration settings and data/files and has a few settings pre-set for you. You can see the full list with Get-DbaConfig:

In this case, the setting I’m looking for is called Path.DbatoolsData. Accessing it is easy. Get-DbaConfigValue -Name "Path.DbatoolsData" gets me the value of that setting – C:\Users\andy\AppData\Roaming\PowerShell\dbatools in this case.

Combine this with ‘Join-Path’ and I’ve got quick access to that file I tucked away for later. Join-Path -Path (Get-DbaConfigValue -Name "Path.DbatoolsData") -ChildPath "MyFile.zip" returns C:\Users\andy\AppData\Roaming\PowerShell\dbatools\MyFile.zip

You can create your own configuration settings & values via Set-DbaConfig but be warned: these do not persist across sessions. If you want to persist configuration values across sessions, you’ll need to write them out to a file, then read them in from that file in the new session.
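As a rough sketch of that write-out-and-read-back idea, the snippet below round-trips a couple of settings through a JSON file. It deliberately does not call dbatools: the setting names, the values, and the file name are all hypothetical, and the temp directory is used only so the sketch cleans up after itself (a real file would belong somewhere more durable, such as the Path.DbatoolsData location above).

```powershell
# Hypothetical settings; names and values are invented for illustration.
$settings = [ordered]@{
    'MyModule.DownloadFile' = 'MyFile.zip'
    'MyModule.MaxAgeDays'   = 30
}

# Hypothetical file location, in temp only to keep the sketch tidy.
$file = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'my-settings.json'

# End of the first session: write the settings out.
$settings | ConvertTo-Json | Set-Content -Path $file -Encoding UTF8

# Start of a later session: read them back in.
$restored = Get-Content -Path $file -Raw | ConvertFrom-Json
$restored.'MyModule.MaxAgeDays'

Remove-Item -Path $file
```

Each restored value could then be handed back to whatever holds your configuration for the session.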

Getting Started with GitHub for dbatools

I’ve recently started contributing to the dbatools project and it’s all done through GitHub. Prior to this, I’d never used git and GitHub for anything more than an offsite home for my own small repositories (I’ve used Subversion for over a decade) and I never totally understood how it worked in a large collaborative project until this came along.

I’m putting this together here for my own reference and to hopefully write it up in a way that helps things “click” for some people who need that extra nudge to get into “aha!” territory. A number of the examples I’ve seen elsewhere have mixed the command-line and GUI clients, but the more I use git GUIs, the less I like them for the basic workflow. You only need to know a handful of commands to be productive and for that, the command line beats the GUI in my opinion.

So, here we go. My GitHub workflow for working on dbatools, with as much command-line work as possible. This walk-through assumes basic familiarity with source control concepts.

  1. If you don’t already have one, get yourself a GitHub account. While you’re at it, please set up two-factor authentication.
  2. Install a git client. If you install GitHub Desktop, it’ll come with the command-line client. I think GitKraken does as well. If you use macOS or Linux, you should already have the command-line client.
  3. Go to the dbatools main repository and click the Fork button on the upper-right corner.
  4. Now it’s time to get a copy of the repository onto your computer. Hop over to your profile on GitHub and get into your fork of the dbatools repository. Click the Clone button and copy the URL.

    Now open up your command line interface of choice and point it at the directory where your local copy is going to reside and run the following (using the URL you just copied):
    git clone https://github.com/YOURNAME/dbatools.git
    This will create a directory named dbatools in the current directory and pull the entire repository down into it.

    Congratulations! You’re ready to start coding. Almost.
  5. In order to keep up with the very rapid pace of the main project, you’re going to need a way to keep pulling in the changes that happen upstream from your fork. When I started working in GitHub, this was one of the most confusing things to me, so here’s the secret: git remote. I found this page that explains in a generic way what needs to be done. In English, you configure your local copy of the repository so that it knows about the next repository beyond what you cloned from, so that you can pull updates from there. For dbatools, run the following commands:
    git remote add upstream https://github.com/sqlcollaborative/dbatools.git
    git fetch origin
    git fetch upstream
    git merge upstream/master
    git push origin
    What’s this doing?

    • Set up an alias in your local repository called upstream that points at the main dbatools repository.
    • Fetch all changes from origin (your fork on GitHub)
    • Fetch all changes from upstream
    • Merge all changes from the master branch of upstream into your local repository
    • Push everything back up to your fork on GitHub (but at this point, there’s nothing to push)
      Keep those handy; you’ll use them a lot (see the “Maintaining your repository/fork” section below). Now you can check what remotes you have set up for your repository with git remote -v and verify that you have an upstream that points to the main repository.
  6. Git projects (including dbatools) make very heavy use of branches and merging. In this context, branches are a lightweight way of keeping your changes separate from one another. You can code against one branch, commit your changes, then switch to another branch to work on another set of changes altogether without disrupting the first set. In the dbatools main repository, the master branch is considered the release version. All development work is done using the development branch as a starting point. So, it makes sense to set up your fork and local repository the same way. We’ll create our own development branch with git branch development.
  7. Creating a branch doesn’t mean that you’re automatically working in it. Switching to a branch is done with git checkout (if you’re accustomed to Subversion, this new usage of checkout may seem odd). Running git checkout development switches into the new branch. Ready to code? Just about.
  8. You’re working in development now but it’s strongly recommended that you create a new branch for each new logical set of changes as it’ll make issuing Pull Requests easier and more manageable (PRs are merged into the main development branch). You want to create this branch from development, so now that you’re in that branch, you’re going to branch again. This time we’ll shortcut with git checkout -b Fix-Updates. This both creates the branch and checks it out with a single command.
  9. OK, now you can get your code on. The dbatools maintainers prefer that you make each change set only one file, or a small number of files (if they’re all related to one change) to make merging into the main project easier. What are you waiting for? Get in there and code!
  10. You’ve got some great code written and you’re ready to commit. First, let’s look at what’s changed with git status

    git shows that one file has been changed, but it’s not able to be committed yet. For that, you first have to add it (another difference from Subversion; this file is tracked, but you have to add or “stage” it for this commit) and git even tells you how – git add functions/Update-dbatools.ps1. Once that file is added, re-check your status and you’ll see that the file is taken care of.
  11. Now that everything is staged and ready to go, it’s time to commit. Do not be afraid to make lots of small commits to your repository as you work so that you can fall back to an earlier version if something goes wrong. Make sure you’re including a useful message along with your commit so that people (yourself included) know what’s going on six months from now. You commit with (conveniently enough), git commit.
    git commit -m "This is my awesome commit message"
  12. Great! You’ve committed your changes to the local repository, now how do you get them back up to GitHub? By pushing them to the origin (your GitHub fork). Run git push and you’ll be informed that you can’t do that quite yet.

    Copy & paste the command that git suggests (git push --set-upstream origin Fix-Updates), and you’ll get your changes pushed up to GitHub.

    Note that the second attempt was only needed because origin was unaware of the branch. Subsequent pushes to this branch can be done with just git push.
  13. We’re almost there. Jump back to your web browser and refresh your repository. You’ll see that your new branch is front and center. To get your changes in front of the dbatools maintainers, you need to issue a Pull Request via that green button on the far right.

    By default, the master branch of the upstream repository is used as the basis for comparison; you need to change this by selecting development from the drop-down.

    Then fill out the form as completely as possible and click Create Pull Request.

Congratulations! You’ve just submitted your first change to the dbatools project for review. You’ll probably get some comments on your first PR. And your tenth. And your hundredth. And that’s okay! They’re constructive comments meant to help you and make your code better – it’s not an indictment of your programming skills or DBA knowledge or experience. Your contribution is definitely appreciated. The dbatools team wants to put out the best code possible and collaboration is the best way to do that. Everyone is working toward the same goal and it’s a learning experience through and through.

Anyway…there may be some conversation on your PR about suggested changes, things to remove, things to add into it, style, etc. Please don’t give up & walk away, but don’t just blindly do whatever is suggested either. If you have good reasons behind your decisions, present them. The team is there to guide you and shepherd the project, keeping the quality high, so it may take a couple resubmissions before your code is ready for prime time. What’s really cool with GitHub is that if you make further changes to your branch, the Pull Request is updated automatically when you push that updated branch back up (this is why it’s important to create a new branch for each change that will become a PR).

And then, when that’s finished and Chrissy accepts your PR and you get that “Merged” email with an emoji (I think Chrissy always puts an emoji in them), you can sit back and smile.

Workflow Snapshot

That’s a lot of steps. Here’s the short-short version:

  1. Fork sqlcollaborative/dbatools
  2. Clone to your computer
  3. Set upstream
  4. Create a local development branch
  5. Merge upstream/development into local development
  6. Create & check out feature branch Fix-Updates
  7. Code
  8. Commit
  9. Push
  10. Issue Pull Request

Steps 1-4 you’ll only do once; everything else is the work cycle that you’ll get accustomed to quickly.

Where am I working?

If you’re working on multiple changes over time, or even if you’ve worked on a series of changes (completing one before moving on to the next), you’ll find yourself with a number of local branches and it’s easy to lose track of where you are. git branch will tell you what branches exist, and highlight in green the one that you’re working in.

Remember that you always want to check out development before creating a new branch.
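The branch-and-commit cycle can be tried without touching GitHub at all. The sketch below is only a stand-in: it builds a throwaway repository in the temp directory, and the identity, file names, and commit messages are placeholders, not anything from dbatools. It assumes git is on the PATH.

```powershell
# Throwaway repository in the temp directory (hypothetical name and location).
$repo = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ("branch-demo-" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $repo | Out-Null
Push-Location -Path $repo

git init --quiet
# Placeholder identity, set for this repository only, so commits work anywhere.
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# A first commit so that there is something to branch from.
Set-Content -Path 'README.md' -Value 'demo'
git add README.md
git commit --quiet -m 'Initial commit'

# Create development, switch to it, then branch again for the change itself.
git branch development
git checkout --quiet development
git checkout --quiet -b Fix-Updates

# Make a change, look at it, stage it, commit it.
Set-Content -Path 'Update-Demo.ps1' -Value '# placeholder change'
git status --short
git add Update-Demo.ps1
git commit --quiet -m 'Describe the change here'

# Where am I working? Capture the current branch name, then list all branches.
$current = git rev-parse --abbrev-ref HEAD
git branch

Pop-Location
Remove-Item -Path $repo -Recurse -Force
```

Pushing and pull requests are left out because they need a real fork to talk to.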

Maintaining your repository/fork

As you work on dbatools more, you’re going to have to manage your branches and keep up with the Joneses…I mean upstream. The good news is that thanks to the work you did earlier in setting up an upstream repository, the latter is pretty easy.

Keeping up with development

To keep up with upstream’s development branch, switch into your development branch, then pull things down into it and merge. You should do this often, and at least every time you start a new branch (remember, you’re branching off development every time you start new work, so you want the freshest code possible).

git fetch upstream
git merge upstream/development

This will pull the latest changes from upstream into your development branch. Then you’ll want to push that back up to GitHub the same way you pushed your Fix-Updates branch up to GitHub.

You should also merge in from upstream/master occasionally. Switch to your master branch with git checkout master and do the same as above:

git fetch upstream
git merge upstream/master

You’ll also want to maintain your origin/master branch the same way; just use origin instead of upstream in the example above.


I hope that this has been easy to follow and gets you started down the road of contributing to dbatools or another Open Source project on GitHub. git looks intimidating from the sheer number of commands it has and the crazy things you can do with it, but for a normal, simple workflow there’s only a handful of commands you need and in many cases if you get a command slightly wrong or miss a step, it’ll help you out. The most important thing is to read the contribution guidelines before jumping into the deep end, and if you have any questions please don’t hesitate to ask in the #dbatools channel on the SQL Community Slack.

T-SQL Tuesday #92: Lessons Learned the Hard Way

This month’s T-SQL Tuesday is hosted by Raul Gonzalez and he’s asked everyone to share things we might be a bit embarrassed about:

For this month, I want you peers to write about those important lessons that you learned the hard way, for instance something you did and put your systems down or maybe something you didn’t do and took your systems down. It can be also a bad decision you or someone else took back in the day and you’re still paying for it…

  • In the stress/performance testing portion of an upgrade of a critical system, we were short on disk space. So, rather than having a separate set of VMs for the performance testing (as we needed to be able to get back to functional testing quickly), we decided to just take VM snapshots of all the servers. Testing was delayed a day or two – but we didn’t switch off the snapshots. Then we started testing and performance was terrific…for about five minutes. Then everything came to a screeching halt. Panicked, we thought we were going to need a pile of new hardware until the VMware admin realized that our disks were getting hammered and we still had those active snapshots.
    Lesson learned: If you take VM-level snapshots of your database server and let them “soak” for an extended period, you’re gonna have a bad time. Unless you need to take a snapshot of the host OS or instance configuration itself, use a database snapshot instead of a VM-level snapshot.

  • A couple of times, I’ve had under-performing VMs running SQL Server. As I hadn’t been involved in the configuration, I thought everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles, all set up for suboptimal performance.
    Lesson learned: Ask your VMware admin if they’ve perused the best practices guide and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren’t configured well, you’ll spin your wheels for a long time.

  • In doing a configuration review of a rather large (production) instance, I noted that at least one configuration option was still set to the default value – Cost Threshold for Parallelism was stuck at 5. Running sp_BlitzCache, I found that I had quite a few simple queries going parallel and huge CXPACKET waits. CXPACKET isn’t bad per se, but if you’ve got a low-cost query that’s going parallel and waiting on threads where it could be running faster overall single-threaded (verified this was the case for several of the top offenders), increasing the cost threshold can help. I did some checking, verified that it was a configuration change I could make on the fly, and set the value to 50.
    And then everything. Slowed. Down.
    When I made this configuration change on the test instance, it wasn’t much of a problem. But that was a much smaller instance, with much less traffic. What I failed to fully comprehend was the impact of this operation. I overlooked that changing this setting (and a number of others I wasn’t aware of) blows out the plan cache. In the case of this instance, about 26GB of plan cache. Not only was performance impacted while the plan cache was re-filled, we took a hit while all the old plans were being evicted from cache.
    Lesson learned: Even if it seemed OK in test, those “low impact” changes can have a much larger impact on production unless you can make test mirror production in every way. So plan when you make these changes accordingly.

We learn the most from our mistakes. We can learn almost as much from the mistakes of others. Learn from mine.

Spell-checking dbatools with Visual Studio Code

Earlier this week I was working on adding a new feature to Update-DbaTools and while looking at another cmdlet to check syntax/conventions, I noticed an ugly typo in some of the help for it. 100% perfect prose isn’t necessary in the comment-based help for PowerShell cmdlets, but seeing misspellings and such kind of bugs me. Fortunately this is something I can help fix since the module is on GitHub.

First I needed to find a spell-checker that works with Visual Studio Code to help me spot misspellings. This was slightly trickier than expected, as I use macOS at home and at least one of the first plugins I found was Windows-only. I finally settled on Code Spell Checker.

But as you can see from the marketplace page, this plugin doesn’t know PowerShell by default. In my user settings file settings.json, I added PowerShell to the cSpell.enabledLanguageIds section so it’s always recognized:

"cSpell.enabledLanguageIds": [
    // ...the language IDs already in the list stay here...
    "powershell"
]

And with that, VSCode was giving me green squiggles under lots of words – both misspelled and not. Code Spell Checker doesn’t understand PowerShell in its default setup because it doesn’t have a dictionary for it. Just to get things started, I added a cSpell.userWords section to my settings.json and the squiggles started disappearing. The list I’m working with so far is posted as a gist on GitHub:

I’ll keep this updated as I encounter more strings that need to be recognized, whether they’re PowerShell tokens or specific to the dbatools project. In addition to actual PowerShell syntax in there, I’m dropping in strings that are commonly found throughout the module. Eventually I suppose I should get a proper dictionary file or two together, but this works well for a quick & dirty way to get going with a spellcheck & language cleanup for the module.

Don’t Forget the Network

A few weeks ago I was looking at a query and got tripped up by the network and my own forgetfulness. It was a pretty simple query with a simple-looking execution plan. It didn’t even do that much work: about 20K logical reads, returning 200K records. For a server as large as the one I was working with, this should have been nothing. Instead, I was waiting three minutes to get my results. My first thought was “aha, this query must need tuning, maybe a new index.”

I started experimenting in a non-production copy of the database on the instance. Added a couple indexes, adjusted the query itself, even a few query/index hints just to see what worked. Each change definitely had an impact on the query. The execution plan shape changed a bit, and the costs shifted around between a few operators. But still I was waiting minutes for the complete results.

The Plot Thickens

I tried my query against the development server. Production has 24X as much RAM, 3X as many CPU cores, and faster disks when compared to the dev box, so I expected even worse performance. Nope. Near-instant results, even for the original query with no additional indexes, hints or any other trickery.

OK, something’s definitely up now. I hopped back to production and while running my query, I ran sp_BlitzFirst to peek at what was going on. Lo and behold, the top wait stat was ASYNC_NETWORK_IO. I’ve seen lots of this in the past, attributed to poorly-written applications that just couldn’t consume the data as fast as SQL Server could churn it out. But this is SQL Server Management Studio; it’s absorbing the data as fast as it can receive it.

And that’s when it hit me. SSMS was consuming the data as quickly as it received it. The development server is in the office. The production server is in a datacenter 1/3 of a continent away and the connection between that datacenter and the office sometimes gets bogged down. The problem wasn’t that SSMS couldn’t drink from the firehose – the firehose had a clamp around it!

To test this idea, I made one last change to the query – I selected the data into a temp table instead of sending the results to SSMS. Round-trip time: two seconds. Conclusion: the network itself was causing the slowdown and this query needs no tuning!

As it turns out, SSMS has a way to discard query results so that you don’t have to resort to dumping results to a temp table – Discard query results after execution. It’s similar to a /dev/null for result sets. If you only need an actual execution plan but no query results (like I did here), check it out.

An Unexpected Side-Effect of Invoke-WebRequest

Recently I was working on a bit of PowerShell to download the awesome First Responder Kit from Brent Ozar Unlimited. The canonical URL for the FRK is http://firstresponderkit.org/ but that’s a redirect to the GitHub repository where all the magic happens. I thought to myself:

Self! Rather than take a chance on that GitHub URL changing, use the “main” URL and Invoke-WebRequest will take care of the redirect for you.

So off to the PowerShell prompt I went and ran Invoke-WebRequest -Uri http://firstresponderkit.org/ to start looking at the object returned so I could see what I needed to parse out to find my way to the true download URL.

Then Firefox (my default browser) opened, and I was staring at https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master.

I was expecting an HTTP 30X redirect status code which, based upon previous experience, Invoke-WebRequest would honor. Instead, I got a 200 OK which is the web server saying “yep, here’s your stuff, HAND!”

Invoke-WebRequest -Uri http://firstresponderkit.org | Select-Object -ExpandProperty Headers

Key              Value
---              -----
x-amz-id-2       {QtTLMVw5QobGd/xlueEIY44Ech2va1ZKALhaMrY9f/yI0fBHvAoA6KwGUa5jTQxPF5fF85tuYws=}
x-amz-request-id {86A4E2A10548CA53}
Date             {Sat, 03 Jun 2017 16:14:47 GMT}
ETag             {"4ff7c8b410c399d5b18e2ab05bbfce22"}
Server           {AmazonS3}

Hmmm…nope, nothing there. OK, in a past life I did some non-redirect redirects through page contents. Let’s look at the content of the page itself (if any):

    Invoke-WebRequest -Uri http://firstresponderkit.org | Select-Object -ExpandProperty Content
    <html lang="en-US">
        <meta charset="UTF-8">
        <meta http-equiv="refresh" content="1;url=https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master">
        <script type="text/javascript">
            window.location.href = "https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master"
        <title>Page Redirection</title>
        If you are not redirected automatically, <a href="https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master">head over here.</a>

Now we’ve got something. The web page itself has both a meta tag-based refresh/redirect and a JavaScript redirect, and that JavaScript redirect is being executed! How do we prevent the browser from opening and send the script to the right place?

Answer: the -UseBasicParsing switch for Invoke-WebRequest. From the docs:

Indicates that the cmdlet uses the response object for HTML content without Document Object Model (DOM) parsing.

This parameter is required when Internet Explorer is not installed on the computers, such as on a Server Core installation of a Windows Server operating system.

Note that this doesn’t eliminate all parsing of the content, and it’s not required to get parsing done on systems without Internet Explorer – everything I’ve written in this post was run in PowerShell on macOS, where Internet Explorer definitely doesn’t exist. But what it will do is prevent the parsing/execution of the JavaScript that’s embedded in the web page, which is what caused the browser to open in this case.

Looking closer at the output of Invoke-WebRequest, there’s a Links collection that looks pretty good.

    (Invoke-WebRequest -UseBasicParsing -Uri http://firstresponderkit.org).Links | Format-List

    outerHTML : <a href="https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master">head over here.</a>
    tagName   : A
    href      : https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master

So now I can dig a little deeper and send my script to the URL that Brent & Co. want me to go to, and continue my search for the one true First Responder Kit download link by crawling subsequent pages.

Invoke-WebRequest -UseBasicParsing -Uri $((Invoke-WebRequest -UseBasicParsing -Uri http://firstresponderkit.org).Links[0].href)
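If the Links collection ever came back empty, the meta refresh tag seen in the page content is another place to find the target. The sketch below is an alternative to the Links approach, not what this post used: it runs a regular expression over a trimmed copy of the markup shown earlier, so it works offline. The pattern is an assumption about how the tag is written and would need adjusting for pages that format it differently.

```powershell
# Trimmed copy of the page content shown earlier, used here as offline sample input.
$html = @'
<html lang="en-US">
    <meta charset="UTF-8">
    <meta http-equiv="refresh" content="1;url=https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master">
    <title>Page Redirection</title>
</html>
'@

# Capture whatever follows "url=" inside the refresh tag's content attribute.
$pattern = '<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=(?<target>[^"]+)"'

if ($html -match $pattern) {
    # -match fills the automatic $Matches table, including named groups.
    $redirectTarget = $Matches['target']
}
else {
    $redirectTarget = $null
}

$redirectTarget
```

With live data, the $html value would come from the Content property of a request made with -UseBasicParsing.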

How I Became a…SQL Server DBA

Kevin Hill mentioned this idea/series on a SQL Community Slack channel back in April and I thought it would be a good way to get back to blogging. The timing worked out well as I had just started a new job, my first with the official title of “SQL Server DBA.” So how’d I get here?

In college, I took a single database course. I’d messed around with Microsoft Access a bit, but wanted to get a better handle on what I was doing. The course was not at all what I was expecting. I passed and did OK, but I didn’t completely grasp the material. The class was mostly deep RDBMS theory including “how do we store this on disk” – I wrote minimal amounts of SQL in this course because it wasn’t required.

I graduated and took my shiny new Computer Science diploma to my first job, and within a few months I had a solid handle on Classic ASP, building apps with it and handling some of the server admin stuff on the NT4 boxes that hosted them. I spent a little over 5 years there and got minimal exposure to databases as that wasn’t what my job function demanded – I’d write some queries against DB2 on the mainframe or a SQL Server instance, but that was about it. The DBAs took care of everything else.

After a few years, I moved on from that position as I wanted to relocate for personal reasons. I found a job doing some Java work on an in-house application and system customization/integration for a purchased application that was used as the hub for the company’s core business. In the course of working on those systems, I started doing a lot more SQL work, but at the time I only knew enough to be dangerous.

During a project to upgrade that system, I got a crash course in writing good SQL from Allen White (b|t), and learned much more about how SQL Server works from both him and Kendal Van Dyke (b|t). Allen and Kendal also introduced me to the SQL Server community and my eyes were opened. This was huge.

Over the next several years, I discovered that I was a developer who had DBA tendencies that I just hadn’t realized yet. I started to get involved with the SQL Server community. Talked to so many people. Subscribed to dozens of blogs. Attended SQL Saturdays and PASS Summits.

Then, one evening after we finished unpacking equipment and supplies from one of our Rochester SQL Saturdays, Matt Slocum (b|t) just asked me, point-blank. “So do you wanna be a DBA or what?” Ding! The lightbulb flicked on. I’m already doing a whole bunch of this stuff, and enjoying it – why not go for it?

I refocused my efforts on really understanding how SQL Server works. Looked for ways to leverage my programming experience with a slant toward managing databases. Did a lot more non-production DBA type work (I didn’t have a lot of access to production, which was probably a good thing). After searching for a while, I landed a job as a full-time production DBA with a company operating a SaaS platform. It was a bit of a leap but one that I had to take as it was the right thing that came along at the right time. I’m nearly 2 months in now and I’ve learned a ton already. Made a few slip-ups, but that’s to be expected – just have to learn from them and move forward.

I Finally Get Cross Apply!

For years I’ve looked at various queries online in sample code, diagnostic queries using DMVs, and the like and seen CROSS APPLY in the FROM clauses. But I’ve never really managed to comprehend what it was for or how it worked because I never saw a case where it was directly applied to something I was doing.

Finally, this week I had a breakthrough. I was working on updating a bunch of data but it was breaking on a small subset of that data. In this case, I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past and didn’t encounter the problem I was having, so I knew that it was something I was doing “wrong”. Given that it was only a handful of broken updates, she was OK with doing those manually (we were scripting the rest because we were updating potentially tens of thousands of records).

I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.

Setting the Stage

Let’s set up three simple tables to keep track of airports and what state each airport is in. But our developer doesn’t totally get database design and in his state-to-airport mapping table, he allows for a comma-separated list of airports associated with each state.

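    CREATE TABLE #States
    ([Id]      INT IDENTITY(1, 1),
     StateName NVARCHAR(30) NOT NULL
    );
    CREATE TABLE #Airports
    ([Id]     INT IDENTITY(1, 1),
     -- The original column definition was lost; this name and type are assumed
     AirportName NVARCHAR(50) NOT NULL
    );
    CREATE TABLE #StateAirports
    (StateId  INT NOT NULL,  -- referenced as SA1.stateid in the query; type is assumed
     Airports NVARCHAR(50)   -- comma-separated list of #Airports Ids
    );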

This makes getting a list of airports and their associated state names tricky at best if we don’t know about CROSS APPLY. With CROSS APPLY, it’s pretty straightforward.
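
To have something to query, here are a few illustrative rows. This is only a sketch: the data is made up for the example, and it assumes #Airports has a single name column after its Id and #StateAirports has a state Id followed by the comma-separated list. The INSERTs are positional so they don’t depend on exact column names.

    -- IDENTITY columns are skipped in a positional INSERT, so on freshly
    -- created tables these rows get Ids 1, 2, 3 in the order listed.
    INSERT INTO #States VALUES (N'New York'), (N'Ohio');

    INSERT INTO #Airports VALUES
        (N'Greater Rochester International'),  -- Id 1
        (N'John F. Kennedy International'),    -- Id 2
        (N'Cleveland Hopkins International');  -- Id 3

    -- State 1 maps to two airports in a single comma-separated string
    INSERT INTO #StateAirports VALUES
        (1, N'1,2'),
        (2, N'3');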


Here’s the finished query.

    SELECT S.StateName,
           A.AirportName  -- assumed column name; the original select list was truncated
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.airports, ',') AS SA2
        JOIN #Airports A ON A.Id = SA2.value
        JOIN #States S ON S.Id = SA1.stateid;

string_split() is a table-valued function which we finally got in SQL Server 2016 after far too many years of having to write (or, let’s face it, copy from someone’s blog post) inefficient string splitting functions. It returns one row per element, in a single column named value. Important note: even if your database engine is SQL Server 2016, the database you’re operating in must be at compatibility level 130 or higher.
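
If you want to poke at string_split() by itself before mixing it into a bigger query, something like this minimal sketch should do. The literal string is just an example value, and the second query is one way to check the compatibility level of whatever database you’re currently connected to.

    -- Each element of the delimited string comes back as its own row,
    -- in a column named "value"
    SELECT value
    FROM STRING_SPLIT(N'1,2,3', ',');

    -- STRING_SPLIT needs compatibility level 130 or higher in the current database
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE name = DB_NAME();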

Breaking it down

If we take CROSS APPLY and break it down into its parts, it finally starts to make sense.
* APPLY the string_split() function to the Airports field of the #StateAirports table
* Append each row of string_split()‘s output to the corresponding row on #StateAirports (similar to a CROSS JOIN but not exactly)

So now I have N rows for each StateId in #StateAirports, where N is the number of values in the comma-separated field. And JOINed to each row is one of the rows from the output of string_split().

    SELECT *
    FROM #StateAirports SA1
        CROSS APPLY string_split(SA1.airports, ',') AS SA2
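
The same expansion can be tried without any of the temp tables. This is a sketch with made-up inline rows (the VALUES list and its column aliases are only for illustration), and it also shows one way CROSS APPLY differs from a CROSS JOIN: the function runs once per outer row, using that row’s own value.

    SELECT SA1.StateId, SA1.Airports, SA2.value
    FROM (VALUES (1, N'1,2'),
                 (2, N'3'),
                 (3, NULL)) AS SA1 (StateId, Airports)
        CROSS APPLY STRING_SPLIT(SA1.Airports, ',') AS SA2;

    -- Three rows come back (in no guaranteed order):
    --   StateId 1 twice, with value '1' and value '2'
    --   StateId 2 once, with value '3'
    -- StateId 3 is missing: STRING_SPLIT returns an empty table for NULL input,
    -- and CROSS APPLY drops outer rows that produce no output.
    -- OUTER APPLY would keep that row, with a NULL value.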

From there, the rest of the query is pretty normal: JOIN to the other two tables to translate the state & airport ID numbers to their text values.

Hopefully this helps others get a handle on CROSS APPLY and find useful places for it. This had been a head-scratcher for me for years, but only because I didn’t have an example that clearly broke down how to use it and what was going on. In hindsight, I probably could have used it in some analysis I did at a previous job but instead resorted to parsing & processing comma-separated data in a PowerShell script.