On Fri, 13 Jan 2012, Isaac Marino Bavaresco wrote:

> On 13/1/2012 14:13, Herbert Graf wrote:
>> On Fri, 2012-01-13 at 15:36 +0000, smplx wrote:
>>> I believe HT makes use of "dead time" during physical core instruction
>>> execution to execute virtual core instructions, i.e. say there is a stall
>>> in the physical core instruction execution pipeline (maybe the core needs
>>> to get the results of a previous instruction but they aren't ready yet),
>>> then the core can use the "stall" to execute instructions that have
>>> nothing to do with the thread that is causing the stall, hence guaranteeing
>>> (probably) that there is something that can be executed.
>> Actually, HT is much more than that.
>>
>> HT is basically making one core look like 2 cores, the OS has no idea
>> that there aren't 2 physical cores (top would show 2 cores).
>>
>> To do this, there is some duplication in the core allowing for 2 threads
>> to run. Large parts of the core are shared by the 2 threads. The result
>> is certain parts of certain types of instructions can be run in parallel
>> (effectively 2 cores), while everything else causes a stall in the other
>> thread, effectively making it single core.
>>
>> The performance of HT is very application/thread dependent. Most of the
>> time it has some benefit, but it's pretty minimal, and NOWHERE near the
>> benefit you get from 2 actual cores.
>>
>> Note that while this SOUNDS very similar to what AMD is doing it's not.
>> AMD has 4 physical cores, but duplicates pretty much EVERYTHING integer
>> related, so that if you're running integer only code (the vast majority
>> of "regular" OS stuff and apps is mostly integer only) you have 2 cores.
>> You are only really impacted if you run FPU rich code. This used to be a
>> much bigger deal, but with the advent of excellent GPUs, a lot of
>> FPU work traditionally done by the CPU is now handled by the GPU
>> (obviously anything graphics related, and more and more general compute
>> stuff is being offloaded to the GPU).
>>
>> HT is nice, having 2 real(ish) cores is better.

Isn't there a diminishing return, as in, the more cores you have the more
they actually interfere with each other's need to access main memory? I can
see where you would get to a point where cores are just starved and it
would be better to get them to do something else.
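
To put rough numbers on that, here is a toy Python sketch. It is my own
simplification, not a measurement: it assumes every core needs the shared
memory bus for a fixed fraction of its work (the hypothetical `mem_fraction`)
and that the bus serves one core at a time, so total throughput is capped at
1/mem_fraction no matter how many cores you add.

```python
def effective_speedup(cores, mem_fraction):
    """Toy model of cores competing for one shared memory bus.

    cores        -- number of cores doing identical work (assumed)
    mem_fraction -- fraction of each unit of work that occupies the
                    shared bus, between 0.0 and 1.0 (assumed, made up)

    Returns throughput relative to one unconstrained core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0.0 and 1.0")
    if mem_fraction == 0.0:
        # No memory traffic at all: scaling is perfect in this model.
        return float(cores)
    # The bus can sustain at most 1/mem_fraction units of work per unit
    # time; the cores can sustain at most `cores` units. The smaller wins.
    return min(float(cores), 1.0 / mem_fraction)
```

With mem_fraction = 0.25 this model gives 4x for 4 cores and still only 4x
for 8 cores, which is the starvation point I mean. Real chips have caches
and multiple memory channels, so treat it as an illustration only.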

>
> The gain from using the stall time in one thread for another thread is
> not very large, but you must remember that the time wasted on context
> switching may be zero if running only two threads or be cut in half if
> running more threads. That may add a few more percent to the speed.
>
> Isaac
>

I haven't been keeping up with this stuff so I'm not sure, BUT isn't
main memory access a huge bottleneck? I thought modern CPUs got long
stalls whenever something had to be fetched from RAM?
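
For what it's worth, the usual back-of-the-envelope estimate is: average
access time = hit time + miss rate x miss penalty. A small Python sketch of
that formula follows. The function and parameter names are my own, and any
cycle counts you feed it are assumptions, not figures for a real CPU.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per memory access for a single cache level.

    hit_cycles          -- cycles for an access that hits the cache
    miss_rate           -- fraction of accesses that miss (0.0 to 1.0)
    miss_penalty_cycles -- extra cycles to fetch from RAM on a miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0.0 and 1.0")
    # Every access pays the hit time; misses pay the penalty on top.
    return hit_cycles + miss_rate * miss_penalty_cycles
```

With a 4-cycle hit, a 2% miss rate and a 200-cycle trip to RAM (all made-up
values), the average comes out at about 8 cycles, so the rare misses cost as
much as all the hits put together.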

Regards
Sergio Masci
-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist