Hi all,
I've spent the last week trying to get to the bottom of this and am not making much progress. Here is what we have:
2 sites (call them A and B) and 4 DC/DFS servers, two in each site, one virtual and one physical. DC01 (physical) and DC02 (virtual) are in site A; DC03 (virtual) and DC04 (physical) are in site B. The sites are connected by a 100 Mbps WAN link. There are no sites defined or organized in AD; the two locations are just geographically separate.
The original setup was on 2003 servers; a few months ago we upgraded the domain and DFS to 2008 R2. All 4 servers were fresh installs, and the old servers were retired. All servers are patched by WSUS on a regular basis and rebooted in the middle of the night, and we haven't had any issues with that. I've read about DFSR patches and hotfixes; if these are not part of the WSUS updates, then they have not been applied.
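In case the hotfix question matters, I can check which DFSR binary version each DC is actually running with this (run locally on each server; the path assumes a default Windows install):

    # Report the version of the DFSR service binary on this server
    (Get-Item "C:\Windows\System32\dfsrs.exe").VersionInfo.FileVersion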
We have 7 namespaces with 19 folders, and each folder is in its own replication group. All replication is set up as full mesh (except for one folder, as described below), and each folder has only one active referral target.
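If the exact layout is needed, I believe it can be listed from one of the members with something like this (BigFolderRG below is a placeholder group name, not one of our real ones):

    # List all replication groups known from AD
    dfsradmin rg list
    # List the replicated folders in one group (placeholder name)
    dfsradmin rf list /rgname:BigFolderRG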
About a week ago, we discovered that permissions on a couple of critical folders were not as they should be and decided to remedy that. On 3 of the 4 servers they were so badly broken that I couldn't even gain access as a full domain admin, so fixing them meant first replacing ownership (with the domain admins group), which also resets all permissions to that group, before we could set them to what they really need to be. Since this particular folder contains just under 2 TB of data (mostly PDF files, 1 MB to 4 MB each), we decided to replace the permissions after hours. At the time, DC03 was the active referral target for this folder, but (for some reason that escapes me now) I decided to apply the permissions on DC01 and let them replicate to the other servers. So this was done; it took about 90 minutes to apply the permissions, and since we didn't know how long it would take to replicate to DC03, I switched the referral to DC01, which became the only referral target for that folder. We did a quick test and everything seemed OK. The plan was to wait for the changes to replicate and then switch the referral target back to DC03.
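For clarity, the permission replacement on DC01 amounted to something like this, whether done from the GUI or the command line (the path and the second group name below are placeholders, not our real ones):

    # Take ownership of the tree recursively, answering Yes to any prompts
    takeown /F "D:\Shares\BigFolder" /R /D Y
    # Re-apply permissions down the tree (this was the roughly 90 minute step),
    # continuing past any files that error out
    icacls "D:\Shares\BigFolder" /grant "DOMAIN\Domain Admins:(OI)(CI)F" /T /C
    icacls "D:\Shares\BigFolder" /grant "DOMAIN\FileUsers:(OI)(CI)M" /T /C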
In the morning we got calls about users not being able to access some files. After investigating, we found that files saved to DC03 the day before had not replicated to DC01, and they were now inaccessible because they were still only on DC03 while DC01 was the only referral target. We used XCOPY to manually copy the previous day's files; however, during the investigation we also found a handful of files in some subfolders that had not replicated for a couple of months. That was the first time we realized replication might not be working at 100%, and we started digging deeper.
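The manual copy was essentially an XCOPY of this sort (the UNC paths and the date are placeholders; /D limits it to files changed on or after that date):

    # Copy only files modified on/after the given date, including subfolders,
    # keeping attributes and ACLs and continuing on errors
    xcopy "\\DC03\BigFolder" "\\DC01\BigFolder" /D:mm-dd-yyyy /E /C /H /K /O /Y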
At some point over the weekend I rebooted all 4 DCs one by one, without any positive impact. I have also changed the full mesh replication to a chain: DC01 > DC02 > (WAN link) > DC03 > DC04; the topology tested OK. I haven't noticed any improvement. The staging area for this folder is now set to 128 GB, after staging-area-too-small events appeared in the event log. Prior to this there was plenty of disk activity, which has since gone down to only a few MB/s and is easily handled by the server (4 CPUs, 8 GB memory, 4x3TB disks in RAID 5). Since I changed the staging area on Friday we've had only one high watermark error, on the same day. At the moment the logs show occasional sharing violations for different files (a normal usage pattern from what I can tell) and plenty of informational events about files being changed on multiple servers. DFSRS.exe is using around 650 MB of memory with low CPU usage and about 2-3 MB/s of disk traffic.
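If the exact staging figures would help, I can pull what the DFSR service sees on each member with something like this (run in an elevated PowerShell prompt; 131072 MB = 128 GB):

    # Show the staging settings per replicated folder as DFSR sees them
    Get-WmiObject -Namespace "root\MicrosoftDFS" -Class DfsrReplicatedFolderConfig |
        Select-Object ReplicatedFolderName, StagingPath, StagingSizeInMb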
Right now we have some folders (not all) with backlogs to or from DC01, while the other servers are current for the most part, except for the 2 TB folder we replaced permissions on. That folder currently has a backlog of 1.440 million files (presumably the permission changes) from DC02 > DC01, and 1.442 million from DC01 > DC02. Interestingly, dfsrdiag backlog still shows a backlog between DC01 and DC03/DC04 even though they shouldn't be replicating directly under the current topology, and those numbers are a bit higher than the ones above; it's almost as if the backlog isn't draining but standing still. I expected any backlog from DC03 > DC01 to become DC03 > DC02 and DC02 > DC01 under the current topology.
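The backlog figures above come from running dfsrdiag per direction, along these lines (the group and folder names here are placeholders):

    # Backlog from sending member DC02 to receiving member DC01 for one folder
    dfsrdiag backlog /rgname:BigFolderRG /rfname:BigFolder /smem:DC02 /rmem:DC01
    # Same folder, opposite direction
    dfsrdiag backlog /rgname:BigFolderRG /rfname:BigFolder /smem:DC01 /rmem:DC02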
While running these dfsrdiag backlog commands I found some cases where the command would execute but with a warning:
[WARNING] Found 2 <DfsrReplicatedFolderConfig> objects with same ReplicationGroupGuid=878ED61A-A737-4C88-8D16-D65CABE68175 and ReplicatedFolderName=uploads; using first object.
I am not sure whether this is related or whether the problem existed before we did the work a week ago.
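If it helps, the duplicate objects the warning complains about should be visible with this query on the member that reports it (the GUID is the one from the warning above):

    # List the DfsrReplicatedFolderConfig objects for that replication group GUID
    Get-WmiObject -Namespace "root\MicrosoftDFS" -Class DfsrReplicatedFolderConfig |
        Where-Object { $_.ReplicationGroupGuid -eq "878ED61A-A737-4C88-8D16-D65CABE68175" } |
        Select-Object ReplicatedFolderName, ReplicatedFolderGuid, ReplicationGroupGuid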
I have followed instructions to rename the .XML files to .OLD and observed that new XML files were created after the DFSR service was restarted. It doesn't seem to have made any difference.
Please let me know what information I can provide to hopefully resolve this.
Thanks very much