Anatomy of an ARM Server Chip

Published on September 2nd, 2014

This post was added by Dr Simmons

The server market has been a battlefield where ARM has been bloodied in recent years. The collapse of Calxeda at the end of 2013 was a body blow, but the ecosystem is bouncing back, not least because of the popularity of the ARMv8 64-bit architecture.

It turns out four out of the top five companies that provide chips for enterprise networking and servers are working on ARM-based devices, and with key players like Cavium and AMD in the mix for the first time, there is plenty of optimism.

ARM has now signed its 50th licensing agreement for the 64-bit-capable Cortex-A50 processor family and the ARMv8 architecture since November 2011. The licensees include key infrastructure deployments that have to cope with more complex applications within strict power budgets.

The details of the AMD Opteron A1100 show eight 64-bit cores with L3 cache on chip, with a lowly Cortex-A5 core in the corner as an "SoC in an SoC" to handle the legacy peripherals. Meanwhile, the heavy lifting for communications across devices and cards is handled by eight lanes of PCI-Express and two 10GBase-KR Ethernet ports for a direct connection to the copper of the backplane. This focus on memory and peripherals makes it very different from the massively multicore networking devices.

Cavium has long been a MIPS house, with its Octeon III CN7xxx family based on the MIPS64 architecture, but after the collapse of Calxeda it took on Calxeda co-founder Larry Wikelius and his former colleague Gopal Hegde. As a result we are seeing a family of processors with 24 to 48 64-bit ARMv8 cores on a 28 nm process, due at the end of this year. Cavium says the ThunderX CN88XX family is the first ARM-based SoC that scales up to 48 cores, with a core frequency of up to 2.5 GHz, 78 KB of instruction cache and 32 KB of data cache, along with 16 MB of L2 cache. It also lays claim to being the first ARM-based SoC to be fully cache coherent across dual sockets, using the Cavium Coherent Processor Interconnect (CCPI), while the four 72-bit DDR3/4 memory controllers are capable of supporting 2,400 MHz memories, with 1 TB of memory in a dual-socket configuration.
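To put those memory figures in perspective, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not in Cavium's statement: that each 72-bit controller carries 64 data bits plus 8 ECC bits, that "2,400 MHz" means 2,400 mega-transfers per second, and that memory is spread evenly across the controllers. The function names are invented for illustration, and the result is a theoretical peak, not a measured figure.

```python
def peak_bandwidth_gb_per_s(controllers: int, bus_bits: int,
                            ecc_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth per socket in GB/s (10^9 bytes/s).

    Assumption: ECC bits carry no payload, so only (bus_bits - ecc_bits)
    contribute to usable bandwidth.
    """
    data_bytes_per_transfer = (bus_bits - ecc_bits) // 8
    megabytes_per_s = controllers * data_bytes_per_transfer * mega_transfers_per_s
    return megabytes_per_s / 1000


def memory_per_controller_gb(total_gb: float, sockets: int,
                             controllers_per_socket: int) -> float:
    """Capacity behind each controller, assuming an even split."""
    return total_gb / (sockets * controllers_per_socket)


if __name__ == "__main__":
    # Figures quoted for ThunderX: four 72-bit controllers, 2,400 MHz memory,
    # 1 TB (taken here as 1024 GB) across two sockets.
    print(peak_bandwidth_gb_per_s(4, 72, 8, 2400), "GB/s peak per socket")
    print(memory_per_controller_gb(1024, 2, 4), "GB per controller")
```

Under these assumptions each controller would have to address 128 GB to reach the quoted 1 TB in a dual-socket system.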

The Seattle chip is AMD's first 64-bit ARM-based processor and combines eight ARM Cortex-A57 cores running at 2 GHz and above. Seattle is a dense server processor for datacenter applications where the key driver is performance/dollar/watt, says AMD. One way to tackle this is to increase the instructions per clock (IPC) and reduce the high cache miss rates, so that Seattle, with smaller cores and caches, can deliver performance equivalent to that of traditional server processors with large cores and caches while using much less power and area.
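The link between cache miss rates and delivered IPC can be illustrated with the simple textbook stall model, in which every miss adds a fixed penalty to the cycles per instruction. This is only a sketch of the general reasoning, not AMD's methodology, and every number passed to it below is a hypothetical input rather than a Seattle measurement.

```python
def effective_ipc(base_ipc: float, misses_per_kilo_instr: float,
                  miss_penalty_cycles: float) -> float:
    """Effective instructions per clock once cache-miss stalls are included.

    Simplified model: CPI = 1 / base_ipc + (misses per instruction) * penalty.
    It ignores overlap between misses and out-of-order execution, so it
    overstates the cost of each miss on a real core.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = (misses_per_kilo_instr / 1000.0) * miss_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


if __name__ == "__main__":
    # Hypothetical workload: same core, same miss penalty, two miss rates.
    for mpki in (20.0, 5.0):
        ipc = effective_ipc(base_ipc=1.5, misses_per_kilo_instr=mpki,
                            miss_penalty_cycles=150.0)
        print(f"{mpki:>5} misses per 1000 instructions -> IPC {ipc:.2f}")
```

In this model a core spends most of its time waiting once misses become frequent, which is why cutting the miss rate can matter as much as widening the core.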


Cache optimization has been a focus for processor designers for decades, and there is no real surprise in the design of Seattle. Each of the A57 cores has a 48 kB, 3-way set-associative, parity-protected instruction cache to feed the processing engine, along with a 32 kB, 2-way set-associative data cache. Each core then shares a 2 MB, 16-way L2 cache with a neighbor.
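The odd-looking 3-way figure makes more sense once the caches are broken down into sets. The Python sketch below does that arithmetic under two assumptions that are not stated in the article: a 64-byte cache line, and "kB"/"MB" meaning 1024-based units. The class and constant names are made up for illustration.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB
LINE_BYTES = 64  # assumption: 64-byte cache lines


@dataclass(frozen=True)
class Cache:
    name: str
    size_bytes: int
    ways: int
    line_bytes: int = LINE_BYTES

    @property
    def sets(self) -> int:
        """Number of sets = size / (ways * line size)."""
        sets, remainder = divmod(self.size_bytes, self.ways * self.line_bytes)
        if remainder:
            raise ValueError(f"{self.name}: size is not a whole number of sets")
        return sets


# Cache sizes and associativities as quoted for Seattle's A57 cores.
SEATTLE_CACHES = [
    Cache("L1 instruction", 48 * KIB, 3),
    Cache("L1 data", 32 * KIB, 2),
    Cache("L2 (shared by a core pair)", 2 * MIB, 16),
]

if __name__ == "__main__":
    for cache in SEATTLE_CACHES:
        print(f"{cache.name}: {cache.sets} sets x {cache.ways} ways")
```

With these assumptions both L1 caches come out at the same power-of-two number of sets, so a 48 kB capacity is reached by adding a third way rather than by changing the indexing.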

The bonus from the latest 28 nm process technology is being able to put a massive 8 MB, 16-way L3 cache in the center of the chip, with a snoop filter to keep the L2 caches coherent and minimize the hit from a cache miss or refresh. The wider memory subsystem then also becomes important: two 64-bit DDR3/4 channels with ECC, with each device supporting up to 128 Gbytes of DRAM, keep the flow of data through the device as stable as possible and make the most of all those cycles.

AMD has taped out samples, and there is a demonstration board running an ARM-optimized version of Linux based on RHEL, Apache 2.4.6, MySQL 5.5.35, and PHP 5.4.16. The server was then used to host a WordPress blog that included streamable video. General availability is expected in the fourth quarter.
