An extended power outage in the West Eugene area took down the server environments for several clients. The servers were either shut down or lost power abruptly. The environment discussed here contains two Server 2003 machines: an SBS and a terminal server. The SBS is connected to two iSCSI targets, one for company data and one for backups.
Problem Being Addressed:
When the power returned, the company data was unavailable. Network users got an error when trying to connect to their mapped shares. On the server, the mapped share was also unavailable; it referred to a drive that was not mounted and not visible in Explorer.
The Approach Taken:
This problem needed to be approached from the outside in. The first step was verifying connectivity for the workstations. They had no issues accessing other resources on the server, so workstation/server connectivity was easy to rule out. The examination then moved to the server, where it was determined that the data location was missing. Closer inspection revealed that two iSCSI targets were attached to the system, but only one was showing up. The missing target was pingable and its web UI was accessible, meaning the device was online. Its iSCSI services were started and running properly.
The problem here was twofold. First, the Windows server had hung while connecting to the second target. In the iSCSI initiator, the target in question had a status of “reconnecting”, but the process never finished. Clicking “connect” had no effect. The hung session had to be logged off (from an advanced window in the initiator) to free up the target and initiator for another attempt. Once this was done, the connection could be re-established, and the drive quickly became available to the server.
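The same log-off and reconnect can in principle be done from the command line with iscsicli, the utility that ships with the Microsoft iSCSI initiator. The following is a minimal sketch rather than the procedure used during this incident: the session ID and target name are hypothetical placeholders (real values would come from the output of iscsicli SessionList), and it assumes the target needs no CHAP credentials.

```python
import subprocess


def build_reset_commands(session_id, target_name):
    """Return the iscsicli calls that log off a hung session and log in again.

    Both arguments are placeholders supplied by the caller; the session ID
    would normally be copied from the output of `iscsicli SessionList`.
    """
    return [
        ["iscsicli", "LogoutTarget", session_id],
        ["iscsicli", "QLoginTarget", target_name],
    ]


def reset_session(session_id, target_name, run=subprocess.run):
    """Run the log-off and quick-login commands in order, stopping on failure."""
    for command in build_reset_commands(session_id, target_name):
        run(command, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for command in build_reset_commands("SESSION-ID-HERE", "iqn.example:data"):
        print(" ".join(command))
```

Building the command list separately from running it makes it possible to review exactly what will be executed before touching a production server.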
The second issue was a limitation of Server 2003. When Server 2003 boots, it checks all of its shared folders. If the location for any of them is inaccessible, it removes that network share. This meant that the sharing settings had to be reconfigured by hand before users could access the data again. We granted “Authenticated Users” access to all folders, and followed up with management to ask whether any folders needed restricted access.
Things We Would Do Differently:
In network topologies where data storage is spread across multiple devices, there is a higher risk of reconnection issues when the devices are rebooted for any reason. This is especially true when all devices go down at once. It’s very possible that the server tried to establish a connection with the target before the target was fully online, thereby creating the hung connection. Ensuring that the storage devices are fully online and accessible, with all services started (verified through the web UI), before powering on the Windows server would reduce the likelihood of this situation in the future.
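The "storage first, server second" check could be partly automated from any machine on the same network. The sketch below is an assumption-laden illustration, not something that was in place at this site: it polls each storage device on TCP port 3260 (the default iSCSI port) and reports which ones never answered. The function names and the address in the example are hypothetical, and an open port only shows that the target is listening, so it complements the web UI check rather than replacing it.

```python
import socket
import time

ISCSI_PORT = 3260  # default iSCSI target port


def port_open(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_targets(hosts, port=ISCSI_PORT, attempts=30, delay=10.0,
                     check=port_open, sleep=time.sleep):
    """Poll until every host accepts connections on the port.

    Returns the list of hosts still unreachable after the last attempt;
    an empty list means all targets answered.
    """
    pending = list(hosts)
    for _ in range(attempts):
        pending = [host for host in pending if not check(host, port)]
        if not pending:
            return []
        sleep(delay)
    return pending


if __name__ == "__main__":
    # Hypothetical address; replace with the real storage devices.
    missing = wait_for_targets(["192.0.2.10"], attempts=3, delay=5.0)
    print("Safe to power on the server" if not missing else f"Still offline: {missing}")
```

Returning the list of unreachable hosts, instead of a plain true/false, tells whoever is on site which device to look at before the Windows server is started.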