Its the little things – Service Master Keys

In advance of the yearly licensing reconciliation with Microsoft at work, I have led the charge to ensure all of our Development and QA servers are running the proper editions, namely Developer edition. In doing so I have done a lot of downgrades for SQL Server. Part of this process ensures that all of servers have our best practices and standards applied consistently across all installs.

Earlier today I got an email indicating that the linked servers on the CompXQA server I had recently worked on were failing.  Considering I had applied the same process to the CompXQA server as I had on the other hundred plus that I had recently downgraded, I was perplex.   I wrote a quick query and sure enough, errors.

Just like a developer, I read one error message and ran with it.

no remote user ‘user_name’ mapped to local user ‘(null)’ from the remote server

Based off of that error message, my first thought was to check the error log on the remote server to check for any login failures, that would point an issue that likely existed before my work. But there wasn’t. Not knowing if this account is being used by other servers I couldn’t simply update the account password on the target server, that could cause havoc elsewhere, which I try to keep to a minimum.

I used my favorite search engine to see what the blogosphere had in mind where I ran across this site which recommended simply regenerating the key. Whoa!  One does not simply force a new key.  Why would you need a new key for a login error?   Let me find my way back to the real error message and I see this obvious gem staring at me from the top line of the screen.

An error occurred during decryption

Now that I know I have an issue  I need to figure out who I managed to mangle the service master key.  To start with I wanted to take a backup of the current Service Master Key before I mess around with it, but I get the same error.

Digging into MSDN for the various commands against Service Master Key this hint hits me “The service master key is encrypted using the local machine key or the Windows Data Protection API. This API uses a key that is derived from the Windows credentials of the SQL Server service account.” 

That is interesting, and telling at the same time.  One of our standards concerns the service account.  If we manage the server, it gets our groups domain service account, if not, it goes to the teams service account, or if none are applied we default to Local System.   When I started my downgrade, this server was using our old service account.  When I got done I changed it to ‘Local System’   account.

MSDN tells us: “To change the SQL Server service account, use SQL Server Configuration Manager. To manage a change of the service account, SQL Server stores a redundant copy of the service master key protected by the machine account that has the necessary permissions granted to the SQL Server service group. If the computer is rebuilt, the same domain user that was previously used by the service account can recover the service master key. This does not work with local accounts or the Local System, Local Service, or Network Service accounts. When you are moving SQL Server to another computer, migrate the service master key by using backup and restore.”

Now I’m perplexed again, per Microsoft, I should be able to decrypt the Service Master Key with the machine account since that hasn’t changed, but I was getting the same error.    Now I do see that it says that “this does not work for local accounts or the Local System..”    I believe the ‘this’ which is referring to was using the same domain user on a rebuilt computer so it shouldn’t apply, but I’m looking for resolution, not grammatical clarification.

Since this is a QA system I was able to plug-in the old service account and restart the service once I notified the users.  Once I restarted the instance I was able to a backup of the Service Master Key.   Since our standards say that I can’t leave this account in place, I have to revert the Service Master Key back to the ‘Local System’ account.  Now I would expect that I shouldn’t be able to take a backup at this point because it should still be invalid, but alas, I was able to take a separate backup which indicates the proper key is in place.

I’m still perplexed as to why after a backup works, so I take to my search engine and find someone reporting the same behavior to Microsoft   The response from Raul Garcia at Microsoft points out a few items that likely point to the source my problem.

  • When changing the service account, was the SQL Server control manager used?
    No – I uninstalled and reinstalled the product completely, copying off master and putting master back in place.
  • Is the entropy key present in the registry?
    Not likely since I did a full UnInstall/ReInstall which likely wiped the registry key with the Entropy key which is needed with the Machine Account

In the end I suppose that when I loaded the old Domain Service Account, it was able to load the Service Master Key properly, at which point it provided access to the current machine account.  Once the service account was reverted to ‘Local System’ the machine account was then able to decrypt and subsequently use the original Service Master Key.   So while I didn’t have to REGENERATE as the article above suggested, nor RESTORE from the backed up key, I was able to get my key back to a stable state.