We have a physical server running Windows Server 2003 R2 x64 SP2. It has 24GB of RAM and Xeon E5620 CPUs totaling 16 cores. We're using one of the two embedded GigE interfaces. The server is located in a datacenter connected to our office via 1G MetroE. It has a 300GB C: drive (likely 2x or 4x 300GB SAS DAS in RAID1 or RAID10) and two FC NetApp LUNs (S: and V:, each 2TB). The data in question is on the S: drive, and the NetApp controllers are FAS3240s.
We have users complaining that the server intermittently bogs down and shows them the hourglass icon for 10-60 seconds. This morning my supervisor was browsing the folder structure of the S: drive, and it would randomly show the hourglass for 10 seconds or more before responding. The folders that lagged varied in size: some contained very few subfolders and files, and others contained hundreds of subfolders.
I already had Perfmon logging at 15-second intervals for a few weeks when users complained about the problem yesterday. I don't see a way to attach a .csv to this message, but here is what Excel showed for the reported trouble times. Are the queue lengths what we need to be concerned about? Should I add more Perfmon objects?
Between 11:31-11:40:
Max CPU %: ranges between 1-6%
Min RAM: 18900MB (19GB)
Network max bytes received: around 1.3MB/s
Network max bytes sent: around 2.25MB/s
Max pagefile %: around 30%
Max Avg Disk Read Queue length: 8.5
Max Avg Disk Write Queue length: 3.7
Max Disk Read MB/s: 46.1
Max Disk Write MB/s: 3.15

Between 16:30-16:40:
Max CPU %: ranges between 1-8%
Min RAM: 17841MB (17GB)
Network max bytes received: around 1MB/s
Network max bytes sent: around 3.8MB/s
Max pagefile %: around 30%
Max Avg Disk Read Queue length: 4.5
Max Avg Disk Write Queue length: 11
Max Disk Read MB/s: 145
Max Disk Write MB/s: 5
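
For anyone who wants to pull the same per-window figures straight from the raw log rather than through Excel, here is a minimal Python sketch. It assumes the usual Perfmon CSV layout: a header row, a timestamp in the first column formatted like "03/14/2012 11:31:15.000", and one column per counter. The function name `window_extremes` and the timestamp format constant are my own, not anything Perfmon provides, so adjust them to match the actual export.

```python
import csv
from datetime import datetime

# Assumed Perfmon CSV timestamp layout, e.g. "03/14/2012 11:31:15.000".
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def window_extremes(lines, start, end):
    """Return {counter_name: (min, max)} for samples with start <= time <= end.

    `lines` is any iterable of CSV text lines (an open file works).
    `start` and `end` are datetime.time values on the same day.
    """
    reader = csv.reader(lines)
    header = next(reader)  # first column is the timestamp, the rest are counters
    extremes = {}
    for row in reader:
        if not row:
            continue
        stamp = datetime.strptime(row[0], TIMESTAMP_FORMAT).time()
        if not (start <= stamp <= end):
            continue
        for name, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric cell, e.g. a missed sample
            low, high = extremes.get(name, (value, value))
            extremes[name] = (min(low, value), max(high, value))
    return extremes
```

Calling `window_extremes(open("perfmon.csv", newline=""), time(11, 31), time(11, 40))` (with `time` imported from `datetime`, and `perfmon.csv` being a placeholder filename) would give the min and max of every logged counter for the first complaint window, which makes it easy to compare trouble windows against the rest of the day.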
Overall for the entire day: