Hi All, I've been getting a little frustrated trying to fix this, and I was hoping someone could please help.
I have 7 servers in 2 namespaces and 2 replica groups (4 servers each, 1 server overlaps), so replica group1=[server1A, server1B, server1C, server12] and replica group2=[server2A, server2B, server2C, server12]. (Each replica group on server12 is on a separate volume.)
This past weekend server12 had a NIC flake out. I had to uninstall and reinstall the drivers.
Ever since then I have found that server1A and server2A (the masters in a full mesh) are not sending or receiving replica information to or from the other 3 servers in their groups. However, the other 3 servers in each group are replicating fine with one another (including server12). server12 is the only common thread between the 2 replica groups, and I cannot figure out why the primaries in both groups have just stopped replicating.
My staging space is 32GB and is on a separate drive with over 600GB free.
I have:
- Rebooted all servers (not at the same time)
- Ran DFSRDIAG POLLAD
- Ran the Diagnostic reports - no errors
- Checked event viewer, system, application, and dfs logs - no errors
- Ran dfsrcheck.exe - no errors
- Cleaned out the ConflictAndDeleted folder and restarted
- Bumped up my ConflictAndDeleted quota from 660MB to 1024MB and unchecked the option to save deleted files there
- Right-clicked each replica and selected replicate
- Ran the reports for backlog (server1A had a lot)
- Set the debug level up to 4, but I think that's more for MS because it was mostly numbers. No noticeable errors.
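For anyone who wants to check the backlog per connection instead of through the report, here is a minimal sketch in Python that prints one `dfsrdiag backlog` command for every sender/receiver pair in a full mesh. The replication group name `group1` and the replicated folder name `folder1` are placeholders made up for illustration, and the script only builds the command lines; it does not run them.

```python
from itertools import permutations


def backlog_commands(group, folder, members):
    """Build one 'dfsrdiag backlog' command line per ordered (sender, receiver) pair."""
    return [
        f'dfsrdiag backlog /rgname:"{group}" /rfname:"{folder}" /smem:{sender} /rmem:{receiver}'
        for sender, receiver in permutations(members, 2)
    ]


# Members of the first replica group as described in this post.
# "group1" and "folder1" are hypothetical names, not the real ones.
group1_members = ["server1A", "server1B", "server1C", "server12"]

for command in backlog_commands("group1", "folder1", group1_members):
    print(command)
```

With 4 members in a full mesh that gives 12 commands, one for each direction of each connection, which should make it easier to see whether only the pairs involving server1A are backed up.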
As far as I can find on the net, I've done everything besides the brute-force method of deleting the server from the replica group and re-adding it, but I didn't want to do that.
Any help is much appreciated.
Thanks