Wednesday, May 15, 2019

Issue with sys.servers and the distribution database in SQL 2016


First post in six months, I'd like to think I have been busy but maybe I just haven't had anything new to add to the SQL server conversation. Hopefully this helps someone else as I have been working with MS support for 2 weeks now on this issue and I just figured it out today. (Do I get my support dollars back?)

We have been running an agent job for years now to pull data from our replication distributors about the subscribers. It runs the following code against the distribution database

SELECT DISTINCT ms.subscriber_id,
        s.name AS subscriber,
        ms.subscriber_db,
        ms.subscription_type
FROM distribution.dbo.MSsubscriptions AS ms WITH(NOLOCK)
JOIN master.sys.servers AS s WITH(NOLOCK)
        ON (ms.subscriber_id=s.server_id)

It returns a list of subscribers and the local databases. Something like what I have below, there are two different subscribers who have different numbers of subscriptions and local subscription databases.

subscriber_id
subscriber
subscriber_db
subscription_type
9
SubscriberA
Database1
1
9
SubscriberA
Database2
1
9
SubscriberA
Database3
1
30
SubscriberB
Database1
1
30
SubscriberB
Database2
1

We implemented a SQL 2016 distributor about a year ago and everything worked fine until we added a new subscriber last month, lets call it SubscriberC. After the subscriber was added the job started failing with an error message like this:

Msg 20032, Level 16, State 1, Procedure sp_replmonitorsubscriptionpendingcmds, Line 52 [Batch Start Line 11]
'LinkedServer-PRD' is not defined as a Subscriber for 'PublisherA'.

Linkedserver-PRD is a linked server i setup on the distributor using a CNAME and the data source for that linked server is not the same server as SubscriberC. I excluded the 2016 distributor from the collection so the job would run without issue and began troubleshooting. Now when I run my query from above I get this output

subscriber_id
subscriber
subscriber_db
subscription_type
9
SubscriberA
Database1
1
9
SubscriberA
Database2
1
9
SubscriberA
Database3
1
30
SubscriberB
Database1
1
30
SubscriberB
Database2
1
31
LinkedServer-PRD
Database1
1

I checked sys.servers and sure enough server_id 31 is LinkedServer-PRD and my new subscriber, SubscriberC is server_id 32. I must point out that replication was working with no error and replication monitor showed all the correct server names.

At this point I opened a ticket with Microsoft and we have been going back and forth for a few weeks now. The support engineer sent me a new query yesterday but it still used sys.servers so the output was still showing LinkedServer-PRD. I asked him if there was another table with the subscriber name in it and he mentioned MSsubscriber_info but that it didn't have an id columns in it, just names. I thought to myself that there must be another table in the distribution database with the data in it so I started querying random tables until I found this one


SELECT *
FROM distribution.dbo.MSreplservers AS ms

BAM! There were the server names with the proper IDs
srvid
srvname
8
PublisherA
9
SubscriberA
30
SubscriberB
32
SubscriberC

Now I can update my original query to this and get the data I need:

SELECT DISTINCT ms.subscriber_id,
        s.srvname AS subscriber,
        ms.subscriber_db,
        ms.subscription_type
FROM distribution.dbo.MSsubscriptions AS ms WITH(NOLOCK)
JOIN distribution.dbo.msreplservers AS s WITH(NOLOCK)
        ON (ms.subscriber_id=s.srvid)

I was about the update the code that our job uses but decided to check our SQL 2014 distributors to make sure the code works there and unfortunately I get this error:
Msg 208, Level 16, State 1, Line 1
Invalid object name 'dbo.msreplservers'.

It seems that the table was added in SQL 2016. I don't have any SQL 2017 servers to test it on but I bet it is there as well. Now I need to go back and refactor my query to use the new table for 2016 and sys.servers for 2014 and lower but at least the mystery is solved.

That's it for now. I plan to start posting more and hopefully do better than the six posts I did last year. Maybe I can get back up to the glory days of 2011 when I had 10, although I should probably go back and review all those for accuracy. I have another MS ticket open right now dealing with Kerberos issues (see my post about Windows 10 and Credential Guard for a primer). Hopefully that one will get resolved soon and I can make a post out of it. Other than that I hope to clean up some of the older pages. My cut and paste jobs from confluence really leave something to be desired.


UPDATE: Since I knew the table name I was able to better search the internet and found that this is because starting with SQL 2016 SP2 -CU3 there is now support for using a distribution database in an availability group. It makes sense that they had to add the table because there is not an easy way to synchronize sys.servers in the master database between all the nodes.
See the Microsoft Announcement Here