From: Jonathan Kinsey
Subject: Re: [Bug-gnubg] Re: Getting gnubg to use all available cores
Date: Sun, 16 Aug 2009 12:09:51 +0000
Hi All,

I'm just back from a holiday. The limit is just an arbitrary number (so it
can simply be increased if someone has more than 16 cores - there are some
new 64-core boxes coming out, but they're not really consumer level). I
think the limit is just there to avoid some small fixed memory overheads
in the code.

Jon

Christian Anthon wrote:
> Hi Michael,
>
> Thanks for investigating this. My answer would have been along the lines
> of "beyond our control". Jon coded most of the threading code, and I
> believe that MAX_NUMTHREADS is indeed somewhat arbitrary. However, I
> believe there is a bit of memory consumption, and possibly also extra
> CPU time, involved in setting it higher. Hopefully Jon will pitch in.
>
> Christian.
>
> On Fri, Aug 7, 2009 at 6:55 AM, Michael Petch wrote:
>> Howdy Louis,
>>
>> I think that MAX_NUMTHREADS was an artificial limit set by the hardware
>> of the day. Christian can likely tell you why it is 16 specifically,
>> but I am assuming it was a somewhat arbitrary (and reasonable) value
>> based on the cores available on most systems.
>>
>> On to your OS X issue. I did a bit of research, and my original view on
>> waiting for Snow Leopard may actually be all that is required.
>>
>> Nehalem processors diverge from the previous generation of Intel
>> processors because they are no longer based on SMP (Symmetric
>> MultiProcessing) designs. In an SMP system, all processors generally
>> have access to main memory (RAM) via a single data bus. The problem, of
>> course, is that the more cores you have, the more contention there is
>> for the memory reads/writes that must occur on that one bus.
>>
>> Intel decided that SMP designs likely will not scale properly in the
>> future when dealing with large core counts (32, 64, 128 cores, etc.),
>> so they moved their Nehalem design to NUMA-type systems instead of SMP.
>> NUMA stands for Non-Uniform Memory Access. In this type of design,
>> cores may not necessarily be able to share memory with other processors
>> without some help.
>> I'm not going to get into the gory details, but the bus system Intel is
>> pushing is the QPI (QuickPath Interconnect) bus. This literally
>> replaces the good old FSB (Front Side Bus).
>>
>> NUMA architectures do allow for the concept of "remote" and "local"
>> data. Shared data may not be directly available to a processor, but it
>> can be retrieved (remotely), although this will be slower. Operating
>> system kernels need NUMA support in order for shared data access across
>> different buses to work properly.
>>
>> So you're asking: why tell me all this? Well, the answer is simple.
>> Apple, in their infinite wisdom, started using new QPI/NUMA hardware
>> without actually fully implementing NUMA in its current kernel! This
>> hasn't been well documented by Apple, but it was discovered when
>> companies started running Xserve on the new QPI/Nehalem systems.
>>
>> Without proper NUMA support, processors can't arbitrarily share memory
>> with all other processors, which seems to be the case here with gnubg.
>> Gnubg launches in a single process and then asks OS X to create threads
>> (with shared memory requirements). It appears that by default each
>> processor is considered a separate entity without sharing (on OS X
>> Leopard). The exception is that each core appears as 2 virtual cores.
>> Virtual cores are on the same processor, thus the same bus, so memory
>> can be shared across them.
>>
>> It seems that when gnubg launches, all the threads are created on one
>> processor (the processor is originally chosen by OS X) and are
>> accessible by 2 virtual cores (using Hyper-Threading). It seems Apple
>> did this so they could put out new equipment before the next OS (Snow
>> Leopard) was released.
>>
>> So what does Snow Leopard have that Leopard doesn't? NUMA support.
>>
>> My guess is that if you got your hands on Snow Leopard, you might find
>> that what you are seeing changes. Apparently this very problem exists
>> for people using CS4 (Adobe's Creative Suite 4).
>> Linux supports NUMA; if you are adventuresome, you might try installing
>> Linux on your Apple hardware and see what happens.
>>
>> Your chess program may work because of the way it splits up tasks (it
>> may even use a combination of POSIX threads and separate process
>> spaces). I haven't seen the source code, so it's very hard to say.
>>
>> Michael Petch
>>
>> On 06/08/09 10:29 AM, "Louis Zulli" wrote:
>>
>>> Hi,
>>>
>>> I put
>>>
>>> #define MAX_NUMTHREADS 64
>>>
>>> in multithread.h and rebuilt.
>>>
>>> In Settings-->Options-->Other, I set Eval Threads to 64.
>>>
>>> I then let gnubg analyze a game using 4-ply analysis.
>>>
>>> According to my Unix top command, gnubg had 69 threads and was using
>>> 188% CPU. So apparently all the threads were running (into each
>>> other!) in one physical core.
>>>
>>> In any case, increasing the max number of threads above 16 seems
>>> trivial to do, unless I'm missing something.
>>>
>>> Louis
>>>
>>> On Aug 6, 2009, at 11:34 AM, Ingo Macherius wrote:
>>>
>>>> Do you use the calibrate command or a batch analysis of match files?
>>>> The former was shown to be of no value for benchmarks; see here:
>>>> http://lists.gnu.org/archive/html/bug-gnubg/2009-08/msg00006.html
>>>>
>>>> With calibrate I had the very same effect of high idle times during
>>>> benchmarks, unless I used at least 8 threads per physical core.
>>>>
>>>> I am running a benchmark on a 4-core machine which iterates over
>>>> #threads (1..6) and cache size (2^1 .. 2^27).
>>>> It should be posted in, say, 3 hours; it literally is still
>>>> running :)
>>>>
>>>> Ingo
>>>>
>>>>> -----Original Message-----
>>>>> From: address@hidden
>>>>> [mailto:address@hidden] On Behalf Of Louis Zulli
>>>>> Sent: Thursday, August 06, 2009 3:21 PM
>>>>> To: Michael Petch
>>>>> Cc: address@hidden
>>>>> Subject: [Bug-gnubg] Re: Getting gnubg to use all available cores
>>>>>
>>>>> On Aug 5, 2009, at 4:02 PM, Michael Petch wrote:
>>>>>
>>>>>> I'm unsure how the architecture is deployed and how OS X handles
>>>>>> the physical cores, but it almost sounds like one physical core is
>>>>>> being used (using Hyper-Threading to run 2 threads simultaneously).
>>>>>> I wonder if the memory is shared across all the cores? A friend of
>>>>>> mine was suggesting that people may have to wait for Snow Leopard
>>>>>> to come out before OS X properly utilizes the Nehalem architecture
>>>>>> (whether that is true or not, I don't know). Anyway, as an
>>>>>> experiment: if you run 2 copies of gnubg at the same time (using
>>>>>> multiple threads), do you get 400% CPU usage?
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> Sorry for the delay. I just had two copies of gnubg analyze the same
>>>>> game, using 3-ply analysis. Each instance of gnubg used 200% CPU.
>>>>> Each copy was set to use 4 evaluation threads.
>>>>>
>>>>> So what's the verdict here? Is Leopard simply not directing threads
>>>>> correctly?
>>>>> Louis
>>>>>
>>>>> _______________________________________________
>>>>> Bug-gnubg mailing list
>>>>> address@hidden
>>>>> http://lists.gnu.org/mailman/listinfo/bug-gnubg