Assume that you have a multi-threaded Windows service that performs lots of different operations, each of which takes a fair share of time, e.g. extracting data from different data stores, parsing said data, posting it to an external server etc. Operations may be performed in different layers, e.g. application layer, repository layer or service layer.
At some point in the lifespan of this Windows service you may wish to shut it down or restart it by way of services.msc. However, if you can't stop all operations and terminate all threads in the Windows service within the time that services.msc allows for the stop procedure, the service will hang and you will have to kill it from Task Manager.
Because of the issue mentioned above, my question is as follows: How would you implement a fail-safe way of handling shutdown of your Windows service? I have a volatile boolean that acts as a shutdown signal. It is set by OnStop() in my service base class and should gracefully stop my main loop, but that isn't worth anything if an operation in some other layer is taking its time doing whatever it is up to.
How should this be handled? I'm currently at a loss and need some creative input.
After more research and some brainstorming I came to realise that the problems I've been experiencing were being caused by a very common design flaw regarding threads in Windows services.
The design flaw
Imagine you have a thread which does all your work. Your work consists of tasks that should be run again and again indefinitely. This is quite often implemented as follows:
    volatile bool keepRunning = true;
    Thread workerThread;

    protected override void OnStart(string[] args)
    {
        workerThread = new Thread(() =>
        {
            while (keepRunning)
            {
                DoWork();
                Thread.Sleep(10 * 60 * 1000); // Sleep for ten minutes
            }
        });
        workerThread.Start();
    }

    protected override void OnStop()
    {
        keepRunning = false;
        workerThread.Join();
        // Ended gracefully
    }
This is the very common design flaw I mentioned. The code compiles and runs as expected, but eventually you will find that your Windows service won't respond to commands from the service console in Windows. The reason is that the call to Thread.Sleep() blocks the worker thread, so the thread cannot notice that keepRunning has changed until the sleep is over, and in the meantime OnStop() is stuck in workerThread.Join(). You will only experience this error if the thread blocks for longer than the timeout configured by Windows in HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout. Because of this registry value, the implementation may still work for you if your thread sleeps for a very short period of time and does its work in an acceptable period of time.
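To see the effect outside a service, here is a minimal console sketch of the same loop. The class and method names are made up for illustration, and the ten-minute sleep is shortened to two seconds. It only measures how long the "stop" step (set the flag, then Join) takes.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SleepBlocksStopDemo
{
    static volatile bool keepRunning;

    // Runs the flawed loop and returns how long the stop step took, in milliseconds.
    public static long MeasureStopMilliseconds(int sleepMilliseconds)
    {
        keepRunning = true;
        var workerThread = new Thread(() =>
        {
            while (keepRunning)
            {
                // Stand-in for DoWork() followed by the long sleep.
                Thread.Sleep(sleepMilliseconds);
            }
        });
        workerThread.Start();

        Thread.Sleep(100); // Give the worker time to enter its sleep.

        var stopwatch = Stopwatch.StartNew();
        keepRunning = false;  // What OnStop() does first...
        workerThread.Join();  // ...and then it has to wait for the sleep to end.
        return stopwatch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine($"Stop took {MeasureStopMilliseconds(2000)} ms");
    }
}
```

The reported time should be close to the remainder of the sleep, not close to zero. With a ten-minute sleep, the same wait would be long enough to exceed any reasonable stop timeout.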
The alternative
Instead of using Thread.Sleep() I decided to go for ManualResetEvent and System.Threading.Timer. The implementation looks something like this:
OnStart:
    this._workerTimer = new Timer(new TimerCallback(this._worker.DoWork));
    // This tells the timer to perform the callback right now
    this._workerTimer.Change(0, Timeout.Infinite);
Callback:
    if (MyServiceBase.ShutdownEvent.WaitOne(0)) // My static ManualResetEvent
        return; // Exit callback
    // Perform lots of work here
    ThisMethodDoesAnEnormousAmountOfWork();
    // This tells the timer to execute the callback after a specified period of time.
    // This is the amount of time that was previously passed to Thread.Sleep()
    (stateInfo as Timer).Change(_waitForSeconds * 1000, Timeout.Infinite);
OnStop:
    // This signals the callback to never ever perform any work again
    MyServiceBase.ShutdownEvent.Set();
    // Dispose of the timer so that the callback is never ever called again
    this._workerTimer.Dispose();
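Put together, the fragments above could look roughly like the following console sketch. The TimerWorker class, its Runs counter and the short intervals are assumptions made for illustration and are not part of the original service. One detail the fragments do not show: if OnStop disposes the timer while a callback is still running, the callback's later call to Change() can throw ObjectDisposedException, so the sketch catches that case.

```csharp
using System;
using System.Threading;

sealed class TimerWorker
{
    // Signalled once when a stop has been requested.
    private readonly ManualResetEvent _shutdownEvent = new ManualResetEvent(false);
    private readonly int _waitMilliseconds;
    private Timer _timer;
    private int _runs;

    public TimerWorker(int waitMilliseconds)
    {
        _waitMilliseconds = waitMilliseconds;
    }

    public int Runs => Volatile.Read(ref _runs);

    public void Start()
    {
        // The single-argument constructor passes the timer itself as the callback state.
        _timer = new Timer(DoWork);
        _timer.Change(0, Timeout.Infinite); // Fire once, right away.
    }

    public void Stop()
    {
        _shutdownEvent.Set(); // No new round of work may start.
        _timer.Dispose();     // No further callbacks are scheduled.
    }

    private void DoWork(object stateInfo)
    {
        if (_shutdownEvent.WaitOne(0))
            return; // Stop was requested before this round began.

        Interlocked.Increment(ref _runs); // Stand-in for the real work.

        try
        {
            // Re-arm the one-shot timer for the next round.
            ((Timer)stateInfo).Change(_waitMilliseconds, Timeout.Infinite);
        }
        catch (ObjectDisposedException)
        {
            // Stop() disposed the timer while this callback was running.
        }
    }
}

static class TimerWorkerDemo
{
    static void Main()
    {
        var worker = new TimerWorker(200);
        worker.Start();
        Thread.Sleep(1000);
        worker.Stop(); // Returns immediately; nothing is sleeping.
        Console.WriteLine($"Rounds completed: {worker.Runs}");
    }
}
```

Because the waiting is done by the timer and not by a sleeping thread, Stop() has nothing to wait for between rounds.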
The conclusion
By using System.Threading.Timer and ManualResetEvent you avoid your service becoming unresponsive to service console commands as a result of Thread.Sleep() blocking.
PS! You may not be out of the woods just yet!
However, I believe there are cases in which the programmer gives a callback so much work that the service may become unresponsive to service console commands while that work is executing. If that happens you may wish to look at alternative solutions, like checking your ManualResetEvent deeper in your code, or perhaps using a CancellationTokenSource.
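As a rough illustration of the CancellationTokenSource idea, the sketch below passes a token down into a long-running operation and checks it between units of work. CancellableWorker, ProcessBatch and the timings are hypothetical names and values chosen for the example. It is one possible shape under those assumptions, not a definitive implementation.

```csharp
using System;
using System.Threading;

sealed class CancellableWorker
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private Thread _thread;
    private int _itemsProcessed;

    public int ItemsProcessed => Volatile.Read(ref _itemsProcessed);

    public void Start()
    {
        _thread = new Thread(() => Run(_cts.Token));
        _thread.Start();
    }

    // Requests cancellation and reports whether the worker ended within the timeout.
    public bool Stop(TimeSpan timeout)
    {
        _cts.Cancel();
        return _thread.Join(timeout);
    }

    private void Run(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            ProcessBatch(token);
            // Interruptible wait: returns early as soon as the token is cancelled.
            token.WaitHandle.WaitOne(TimeSpan.FromMinutes(10));
        }
    }

    // Hypothetical long-running operation in a lower layer.
    private void ProcessBatch(CancellationToken token)
    {
        for (int i = 0; i < 1000; i++)
        {
            if (token.IsCancellationRequested)
                return; // Give up between items, not only between batches.

            Thread.Sleep(10); // Stand-in for one unit of real work.
            Interlocked.Increment(ref _itemsProcessed);
        }
    }
}

static class CancellableWorkerDemo
{
    static void Main()
    {
        var worker = new CancellableWorker();
        worker.Start();
        Thread.Sleep(100);
        bool stopped = worker.Stop(TimeSpan.FromSeconds(2));
        Console.WriteLine($"Stopped in time: {stopped}, items processed: {worker.ItemsProcessed}");
    }
}
```

The finer the granularity at which the lower layers check the token, the sooner a stop request takes effect. Joining with a timeout also lets the stop path find out that the worker did not finish in time, so it is not left waiting indefinitely.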