Sunday, September 22, 2013

Potential Pitfalls with PLINQ (from Microsoft)

Highlights from Microsoft (thank you Microsoft!):

Potential Pitfalls with PLINQ

.NET Framework 4.5
In many cases, PLINQ can provide significant performance improvements over sequential LINQ to Objects queries. However, the work of parallelizing the query execution introduces complexity that can lead to problems that, in sequential code, are not as common or are not encountered at all. 
Do Not Assume That Parallel Is Always Faster


Parallelization sometimes causes a PLINQ query to run slower than its LINQ to Objects equivalent. The basic rule of thumb is that queries that have few source elements and fast user delegates are unlikely to speedup much.
Avoid Writing to Shared Memory Locations


Whenever multiple threads are accessing such variables concurrently, there is a big potential for race conditions. Even though you can use locks to synchronize access to the variable, the cost of synchronization can hurt performance. Therefore, we recommend that you avoid, or at least limit, access to shared state in a PLINQ query as much as possible.
Avoid Over-Parallelization


By using the AsParallel operator, you incur the overhead costs of partitioning the source collection and synchronizing the worker threads. The benefits of parallelization are further limited by the number of processors on the computer. There is no speedup to be gained by running multiple compute-bound threads on just one processor. Therefore, you must be careful not to over-parallelize a query.
The most common scenario in which over-parallelization can occur is in nested queries, as shown in the following snippet.
var q = from cust in customers.AsParallel()
        from order in cust.Orders.AsParallel()
        where order.OrderDate > date
        select new { cust, order };
In this case, it is best to parallelize only the outer data source (customers) unless one or more of the following conditions apply:
  • The inner data source (cust.Orders) is known to be very long.
  • You are performing an expensive computation on each order. (The operation shown in the example is not expensive.)
  • The target system is known to have enough processors to handle the number of threads that will be produced by parallelizing the query on cust.Orders.

Avoid Calls to Non-Thread-Safe Methods
Writing to non-thread-safe instance methods from a PLINQ query can lead to data corruption which may or may not go undetected in your program.
FileStream fs = File.OpenWrite(...);
a.Where(...).OrderBy(...).Select(...).ForAll(x => fs.Write(x));
Limit Calls to Thread-Safe Methods


Most static methods in the .NET Framework are thread-safe and can be called from multiple threads concurrently. However, even in these cases, the synchronization involved can lead to significant slowdown in the query.
Note Note
You can test for this yourself by inserting some calls to WriteLine in your queries. Although this method is used in the documentation examples for demonstration purposes, do not use it in PLINQ queries.
Avoid Unnecessary Ordering Operations


When PLINQ executes a query in parallel, it divides the source sequence into partitions that can be operated on concurrently on multiple threads. By default, the order in which the partitions are processed and the results are delivered is not predictable (except for operators such as OrderBy). You can instruct PLINQ to preserve the ordering of any source sequence, but this has a negative impact on performance. The best practice, whenever possible, is to structure queries so that they do not rely on order preservation. For more information, see Order Preservation in PLINQ.
Prefer ForAll to ForEach When It Is Possible


Although PLINQ executes a query on multiple threads, if you consume the results in a foreach loop (For Each in Visual Basic), then the query results must be merged back into one thread and accessed serially by the enumerator. In some cases, this is unavoidable; however, whenever possible, use the ForAll method to enable each thread to output its own results, for example, by writing to a thread-safe collection such asSystem.Collections.Concurrent.ConcurrentBag<T>.
The same issue applies to Parallel.ForEach In other words, source.AsParallel().Where().ForAll(...) should be strongly preferred to
Parallel.ForEach(source.AsParallel().Where(), ...).
Be Aware of Thread Affinity Issues


Some technologies, for example, COM interoperability for Single-Threaded Apartment (STA) components, Windows Forms, and Windows Presentation Foundation (WPF), impose thread affinity restrictions that require code to run on a specific thread. For example, in both Windows Forms and WPF, a control can only be accessed on the thread on which it was created. If you try to access the shared state of a Windows Forms control in a PLINQ query, an exception is raised if you are running in the debugger. (This setting can be turned off.) However, if your query is consumed on the UI thread, then you can access the control from the foreach loop that enumerates the query results because that code executes on just one thread.
Do Not Assume that Iterations of ForEach, For and ForAll Always Execute in Parallel


It is important to keep in mind that individual iterations in a Parallel.ForParallel.ForEach or ForAll<TSource> loop may but do not have to execute in parallel. Therefore, you should avoid writing any code that depends for correctness on parallel execution of iterations or on the execution of iterations in any particular order.
For example, this code is likely to deadlock:
ManualResetEventSlim mre = new ManualResetEventSlim();
            Enumerable.Range(0, ProcessorCount * 100).AsParallel().ForAll((j) =>
            {
                if (j == Environment.ProcessorCount)
                {
                    Console.WriteLine("Set on {0} with value of {1}", Thread.CurrentThread.ManagedThreadId, j);
                    mre.Set();
                }
                else
                {
                    Console.WriteLine("Waiting on {0} with value of {1}", Thread.CurrentThread.ManagedThreadId, j);
                    mre.Wait();
                }
            }); //deadlocks
In this example, one iteration sets an event, and all other iterations wait on the event. None of the waiting iterations can complete until the event-setting iteration has completed. However, it is possible that the waiting iterations block all threads that are used to execute the parallel loop, before the event-setting iteration has had a chance to execute. This results in a deadlock – the event-setting iteration will never execute, and the waiting iterations will never wake up.
In particular, one iteration of a parallel loop should never wait on another iteration of the loop to make progress. If the parallel loop decides to schedule the iterations sequentially but in the opposite order, a deadlock will occur.

No comments: