
A Primitive Thought on Heterogeneous and Many Core Architecture

November 23rd, 2010

Several years ago, when Intel tried to push CPU frequency beyond 4 GHz, it ran into obstacles. The main problem was that power consumption and temperature rise rapidly with frequency, and ordinary air/fan cooling was no longer enough. It is not feasible to force desktop users to cool their CPUs with air conditioning or water cooling. To get around this, Intel integrated more cores on a single die. When two or more cores share the work, each core carries a smaller load and can run at a lower frequency.
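Why do more, slower cores help? A common first-order model of CMOS dynamic power is P = C · V² · f. If we additionally assume that supply voltage V has to scale roughly linearly with frequency f, power grows roughly as f³. The sketch below uses that simplified model with normalized, made-up constants (the function names are mine, not from any real tool) to compare one core at full speed with two cores at half speed.

```python
# Toy model of dynamic CPU power, assuming P_dyn = C * V^2 * f and that
# supply voltage V scales roughly linearly with clock frequency f.
# All constants are normalized and illustrative, not measurements of a real chip.


def dynamic_power(freq: float, capacitance: float = 1.0, volts_per_freq: float = 1.0) -> float:
    """Dynamic power of one core, in normalized units."""
    voltage = volts_per_freq * freq  # assumption: V grows linearly with f
    return capacitance * voltage ** 2 * freq  # P = C * V^2 * f, so it grows as f^3


def chip_power(cores: int, freq: float) -> float:
    """Total dynamic power of `cores` identical cores all running at `freq`."""
    return cores * dynamic_power(freq)


def chip_throughput(cores: int, freq: float) -> float:
    """Ideal throughput, assuming the workload splits perfectly across cores."""
    return cores * freq


if __name__ == "__main__":
    # One core at full (normalized) frequency vs. two cores at half frequency.
    for cores, f in ((1, 1.0), (2, 0.5)):
        print(f"{cores} core(s) @ {f:.1f}: "
              f"throughput={chip_throughput(cores, f):.2f}, "
              f"power={chip_power(cores, f):.2f}")
```

Under these assumptions both configurations deliver the same ideal throughput, but the dual-core one uses a quarter of the dynamic power. Real chips do worse than this: voltage cannot drop below a threshold, and few workloads parallelize perfectly.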

Figure 1: Performance and Application range

This approach works well for now, but it is not the final solution, because today's commodity computers have to handle increasingly diverse workloads. Even if we integrate many cores of the same type, applications far from the center of the distribution (atypical workloads) still perform poorly (Figure 1, red curve). This is the weakness of homogeneous architectures. Different processors suit different tasks: CPUs for general-purpose computation, GPUs for graphics processing, and so on. What if we combine different types of processors? The result is the yellow curve in Figure 1. Each processor delivers high throughput in its own area (blue curves in Figure 1); when they are combined, the range of applications covered is wider and performance is higher. For multi-core design, a heterogeneous architecture is undoubtedly more competitive than a homogeneous one.
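The argument of Figure 1 can be sketched as a toy model. Here the x-axis is an abstract "application type", each specialized processor has a bell-shaped throughput curve around the work it is good at, and the heterogeneous system simply runs every application on its best processor. All peaks, widths, heights and the 0.5 threshold are hypothetical values chosen only to reproduce the shape of the figure.

```python
import math
from typing import Callable

# Toy model of Figure 1. Application "types" live on an abstract axis [0, 10].
# Every curve parameter below is hypothetical and chosen only for illustration.


def bump(x: float, center: float, width: float, height: float) -> float:
    """Throughput of one processor type on application type x (bell-shaped)."""
    return height * math.exp(-((x - center) / width) ** 2)


# Hypothetical specialists (center, width, height), e.g. CPU-like, GPU-like, other.
SPECIALISTS = [(2.0, 1.0, 1.0), (5.0, 1.0, 1.0), (8.0, 1.0, 1.0)]


def homogeneous(x: float) -> float:
    """Many copies of one general-purpose core: one wide but lower curve."""
    return bump(x, center=5.0, width=2.0, height=0.6)


def heterogeneous(x: float) -> float:
    """Run each application on whichever processor type suits it best."""
    return max(bump(x, c, w, h) for c, w, h in SPECIALISTS)


def covered_range(perf: Callable[[float], float], threshold: float = 0.5,
                  step: float = 0.01) -> float:
    """Length of the application range where performance exceeds `threshold`."""
    xs = [i * step for i in range(int(10 / step) + 1)]
    return step * sum(1 for x in xs if perf(x) > threshold)


if __name__ == "__main__":
    print("homogeneous range:  ", covered_range(homogeneous))
    print("heterogeneous range:", covered_range(heterogeneous))
```

With these particular numbers the heterogeneous system covers an application range of about 5.0 against about 1.7 for the homogeneous one. The conclusion depends entirely on the chosen curves, so treat it as an illustration of the figure, not as evidence.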

Last Monday at Tsinghua University, a professor from the Technion shared some preliminary thoughts on heterogeneous architecture with us. The main points were cache/shared-memory design, the choice between multi-core and multi-thread, and scheduling in the operating system.

Figure 2: City of Nahalal

Figure 3: Circular cache/shared memory

Traditionally, cores and shared memory are placed in separate regions of the chip. In a multi-core design, however, simply clustering the cores together is not a good layout: if a core needs data that is far away (that is, the request has to pass through many intermediate nodes between cores), the access takes a long time. A clever idea comes from Nahalal, a village in Israel (Figure 2). In Nahalal, the shared facilities, such as factories and goods, are in the middle of the village. Residents live in a ring around the center, each with their own farmland next to their house. A computer's shared memory plays the role of those central facilities: every core needs it. Each core also has its own private data, which corresponds to the farm next to each house. Based on this, a new model was proposed, as shown in Figure 3.

With this layout, every core can reach the shared memory easily, while its private data area gives it more flexibility.
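To see why the central placement helps, here is a toy hop-count comparison. Both topologies are hypothetical simplifications of my own: in layout A, shared data is spread over per-core slices on a bidirectional ring, so a request may travel through other cores' nodes; in the Nahalal-style layout B, a shared bank in the middle has a direct link to every core. In both layouts, private accesses are assumed to be served locally at zero hops.

```python
# Toy hop-count comparison for N cores (hypothetical topologies, not a real chip).
# Layout A: shared data spread over per-core slices on a bidirectional ring.
# Layout B (Nahalal-style): one shared bank in the center, 1 hop from every core.
# In both layouts, private data sits next to its core (0 hops).


def ring_distance(a: int, b: int, n: int) -> int:
    """Hops between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def avg_hops_ring(n: int) -> float:
    """Average hops from any core to shared data in any slice (uniform access)."""
    total = sum(ring_distance(a, b, n) for a in range(n) for b in range(n))
    return total / (n * n)


def avg_access_hops(n: int, shared_fraction: float, nahalal: bool) -> float:
    """Average hops per access when `shared_fraction` of accesses touch shared data."""
    shared_cost = 1.0 if nahalal else avg_hops_ring(n)
    return shared_fraction * shared_cost


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        ring = avg_access_hops(n, shared_fraction=0.3, nahalal=False)
        center = avg_access_hops(n, shared_fraction=0.3, nahalal=True)
        print(f"{n:2d} cores: ring={ring:.2f} hops, nahalal={center:.2f} hops")
```

On the ring, the average distance to shared data grows as n/4, while in the central layout it stays at one hop no matter how many cores there are. The model ignores contention at the central bank, which a real design would have to handle.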

Figure 4: Performance and core number

The second point concerns the choice between multi-core and multi-thread designs. In short, the difference is that a multi-core design relies on a shared cache, while a multi-thread design does not: each thread keeps its data and state only in private storage and its own status registers. A multi-thread design can thus be seen as another form of many-core architecture. As shown in Figure 4, as the number of cores increases, performance first rises, then falls, and finally rises again, giving the curve three distinct stages. In the first stage, performance grows with the core count, as expected. At some point, however, adding more cores means cache locality can no longer be guaranteed; the cache becomes ineffective and performance declines. This is the second stage. In the last stage, the number of cores is large enough to hide memory-request latency, so performance rises again. A better way to avoid the performance loss in the second stage is to convert the cache into shared memory at the point where that stage begins.
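The three stages of Figure 4 can be reproduced with a very rough toy model. This is my own sketch, not the model presented in the talk, and every parameter is hypothetical: threads split a fixed cache evenly, the hit rate falls once their working sets no longer fit, each miss stalls a thread for a fixed latency, and with enough threads the stalls overlap until the chip's execution units are saturated. Memory bandwidth is assumed to be unlimited.

```python
# Toy throughput model for the "valley" in Figure 4. All parameters are hypothetical.
# Threads share CACHE_SIZE evenly; each wants WORKING_SET units of cache. Misses
# stall a thread for MISS_LATENCY cycles, but other threads keep issuing, up to
# EXEC_UNITS instructions per cycle. Memory bandwidth is assumed to be unlimited.

CACHE_SIZE = 64.0      # cache capacity, in working-set units (hypothetical)
WORKING_SET = 1.0      # per-thread working set (hypothetical)
MEM_FRACTION = 0.2     # fraction of instructions that access memory (hypothetical)
MISS_LATENCY = 200.0   # stall cycles per cache miss (hypothetical)
EXEC_UNITS = 256       # max instructions issued per cycle by the chip (hypothetical)


def hit_rate(threads: int) -> float:
    """Fraction of memory accesses that hit when the cache is split evenly."""
    return min(1.0, CACHE_SIZE / (threads * WORKING_SET))


def throughput(threads: int) -> float:
    """Whole-chip instructions per cycle for a given number of threads."""
    miss_rate = 1.0 - hit_rate(threads)
    cpi_per_thread = 1.0 + MEM_FRACTION * miss_rate * MISS_LATENCY
    return min(EXEC_UNITS, threads / cpi_per_thread)


if __name__ == "__main__":
    for n in (16, 64, 128, 512, 4096, 16384):
        print(f"{n:6d} threads -> {throughput(n):7.2f} IPC")
```

With these numbers, throughput grows linearly up to 64 threads (everything fits in cache), collapses to roughly 6 IPC around 128 threads (the valley), and then climbs back until it hits the 256-IPC execution limit. Changing the cache size, miss latency or memory fraction moves where the valley sits and how deep it is.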

The last point was about process scheduling at the operating-system level. I didn't catch the main idea, so I have nothing to say about it. :)