On 13/1/2012 20:54, smplx wrote:
>
> On Fri, 13 Jan 2012, Isaac Marino Bavaresco wrote:
>
>> On 13/1/2012 14:13, Herbert Graf wrote:
>>> On Fri, 2012-01-13 at 15:36 +0000, smplx wrote:
>>>> I believe HT makes use of "dead time" during physical core instruction
>>>> execution to execute virtual core instructions, i.e. say there is a stall
>>>> in the physical core instruction execution pipeline (maybe the core needs
>>>> to get the results of a previous instruction but they aren't ready yet),
>>>> then the core can use the "stall" to execute instructions that have
>>>> nothing to do with the thread that is causing the stall, hence guaranteeing
>>>> (probably) that there is something that can be executed.
>>> Actually, HT is much more than that.
>>>
>>> HT is basically making one core look like 2 cores, the OS has no idea
>>> that there aren't 2 physical cores (top would show 2 cores).
>>>
>>> To do this, there is some duplication in the core allowing for 2 threads
>>> to run. Large parts of the core are shared by the 2 threads. The result
>>> is certain parts of certain types of instructions can be run in parallel
>>> (effectively 2 cores), while everything else causes a stall in the other
>>> thread, effectively making it single core.
>>>
>>> The performance of HT is very application/thread dependent. Most of the
>>> time it has some benefit, but it's pretty minimal, and NOWHERE near the
>>> benefit you get from 2 actual cores.
>>>
>>> Note that while this SOUNDS very similar to what AMD is doing, it's not.
>>> AMD has 4 physical cores, but duplicates pretty much EVERYTHING integer
>>> related, so that if you're running integer only code (the vast majority
>>> of "regular" OS stuff and apps is mostly integer only) you have 2 cores.
>>> You are only really impacted if you run FPU rich code. This used to be a
>>> much bigger deal, but with the advent of very excellent GPUs, a lot of
>>> FPU work traditionally done by the CPU is now handled by the GPU
>>> (obviously anything graphics related, and more and more general compute
>>> stuff is being offloaded to the GPU).
>>>
>>> HT is nice, having 2 real(ish) cores is better.
> Isn't there a diminishing return, as in, the more cores you have the more
> they actually interfere with each other's need to access main memory? I can
> see where you would get to a point where cores are just starved and it
> would be better to get them to do something else.
>
>> The gain from using the stall time in one thread for another thread is
>> not very large, but you must remember that the time wasted on context
>> switching may be zero if running only two threads, or be cut in half if
>> running more threads. That may add a few more percent to the speed.
>>
>> Isaac
>>
> I haven't been keeping up with this stuff so I'm not sure BUT isn't
> main memory access a huge bottleneck? I thought modern CPUs got long
> stalls whenever something had to be fetched from RAM?
>
> Regards
> Sergio Masci


Don't forget the cache memory, it relieves a lot of the pressure on the RAM.
Also, context switching requires a lot of updating of the MMU registers.
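
To put rough numbers on the cache point, here is a minimal sketch using the
usual average-memory-access-time estimate (hit time + miss rate * miss
penalty). The function name and all the figures (1 ns cache hit, 100 ns RAM
access, 5% miss rate) are assumptions chosen for illustration, not
measurements of any particular CPU.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Estimate average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative, assumed figures only.
RAM_LATENCY_NS = 100.0   # assumed cost of going all the way to main memory
CACHE_HIT_NS = 1.0       # assumed cost of a cache hit
MISS_RATE = 0.05         # assumed fraction of accesses that miss the cache

# Without a cache, every access pays the full RAM latency.
no_cache = average_access_time_ns(0.0, 1.0, RAM_LATENCY_NS)

# With a cache, only the misses pay the RAM latency.
with_cache = average_access_time_ns(CACHE_HIT_NS, MISS_RATE, RAM_LATENCY_NS)

print(f"no cache:   {no_cache:.1f} ns per access")
print(f"with cache: {with_cache:.1f} ns per access")
```

With these assumed figures only 1 access in 20 reaches the RAM, so the
stalls Sergio mentions still happen, just far less often.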

Isaac
