Upgrading Content Databases with Duplicate Site Collection URL’s
Anyone who has ever tried to do an in-place upgrade from SharePoint 2007 to 2010 knows what a complete nightmare that process could be; without much effort you could easily destroy a farm and spend endless hours trying to rebuild from scratch. The process was so fragile and prone to failure that in the 2013 release Microsoft removed the option altogether, allowing only content database detach and re-attach migrations. This greatly improved the chances of success and is, for the most part, a pretty seamless operation for the majority of deployments. But every now and then you come across a strange configuration that throws a spanner into the works and turns a simple migration into a huge headache.
Case in point. I recently received a distress call from a user who was trying to migrate a single content database from 2010 to 2013 and getting all sorts of strange results. Although the migration appeared to succeed, all the content was over a year old – documents, sites, list items, customizations – everything was out of date. Initially, it was thought that a second database had been mounted at some point during the preceding year but that wasn’t the case – trolling through the SQL server backups it was obvious that there was only ever one database for that web application. Since database corruption is always a possibility, they tried restoring and mounting successive backups going back several months but nothing changed – the old data kept showing up. Running out of time in the assigned maintenance window, the administrator then tried restoring the original database back to 2010 and the same thing happened – old content showed up and the new stuff had simply vanished. Now they really had a problem as both environments, the original 2010 farm and the new 2013 farm, were completely unusable.
Curious as to what might have gone sideways, I had them check the AllDocs and AllLists tables in a copy of the current content database for the latest TimeLastModified value to see if it matched up with the results in the user interface (NOTE: This was an offline copy and not a production database – you should never go poking around in production SharePoint databases). That’s when we got a real shock. The database actually contained current lists, documents and list items! Although we couldn’t see them when viewing Site Contents, there they were plain as day in the database, with the proper metadata values and template associations. And they weren’t hidden, either – attempts to get at them programmatically also failed. Something really weird was going on.
After replicating the steps they took a few times on my own, and getting the same results, I decided to have a look at the upgrade log files. Turns out that even though no errors were displayed during the mount operation there were actually a small number of errors encountered during the upgrade process. The first one I came across made it seem as if whole site collections were missing:
07/20/2014 15:49:45.68 powershell (0x0EAC) 0x20C8 SharePoint Foundation Upgrade SPContentDatabaseSequence ajxkz ERROR Database [WSS_Content] contains a site (Id = [aa3fd23b-5c67-4996-a7c6-773d450945d8], Url = [/sites/abc]) that is not found in the site map. Consider detach and reattach the database.
Well that was obviously nonsense as I knew for a fact that the site collection did exist – I could navigate to it after the upgrade completed. It was followed by a warning indicating that "orphaned sites could cause upgrade failures". Um, yeah, I suppose they would, except there were no orphans in this case – the site really did exist. And then a few lines later I found this little gem:
07/20/2014 15:49:45.68 powershell (0x0EAC) 0x20C8 SharePoint Foundation Upgrade SPContentDatabaseSequence ajxk3 ERROR Database [WSS_Content] contains a site (Id = [aa3fd23b-5c67-4996-a7c6-773d450945d8], Url = [/sites/abc]) whose url is already used by a different site, in database (Id = [973995d8-d187-42c3-890e-04031a48811e], name = [WSS_Content]), in the same web application. Consider deleting one of the sites which have conflicting urls.
Huh? How could two site collections have the same URL? Surely that was nonsense also but just to be sure I ran a quick query against the AllSites table – and stared in disbelief at the results. There were five rows in the table when there should only have been three (that’s how many site collections showed up in Central Administration for that web application both before and after the upgrade process). Where did the extra rows come from and what sites did they refer to? Well, as it turned out, they referred to the exact same site collections but with an earlier TimeCreated value. Somehow, site collections had been created, then later re-created with the same URL’s, without the old ones being removed from the database. The upgrade operation was obviously using the earlier values when it updated the schema and object associations, which explained why objects existed in the database tables that weren’t exposed in the UI – it was just ignoring references to site GUID’s that didn’t match the two it selected. A quick look back at the AllLists table confirmed it, as tp_SiteId did in fact refer to the earlier site collections.
Now that I knew what was happening, the fix was easy. First, I removed the existing site collections which were based on the old instances, using the Remove-SPSite command. I then dismounted the database and mounted it again. This time, with the old site references gone from the AllSites table, the proper site associations were made and the correct content, including all the lists, libraries, documents and items, was restored. Problem solved. The only remaining mystery was how additional site collections were created with the same URL as ones that were already in the database – this shouldn’t be allowed to happen. I still don’t have an answer for that but at least the customer was able to carry on with their migration without suffering any data loss. I’ll put that one in the "win" column.