Features of AMD Piledriver processors

A few days ago AMD posted a new technical document, titled Software Optimization Guide for AMD Family 15h Processors (PDF file). Although the guide is intended for developers, it includes some useful information about AMD Family 15h, i.e. Bulldozer. The document not only describes features of current Bulldozer processors, which have model numbers 00h - 0fh (0xh), but also refers to two future generations with model numbers 10h - 1fh (1xh) and 20h - 2fh (2xh). We believe that microprocessors with model numbers 10h and higher will be based on Piledriver cores.
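For readers who want to see how a model number such as 10h is derived in practice, here is a minimal sketch that decodes the family, model and stepping fields from the EAX value returned by CPUID function 1, following AMD's convention that the extended fields are added only when the base family is 0Fh. The function names and the two sample signature values are our own illustration and are not taken from the optimization guide.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID function 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # On AMD parts the extended fields are used only when the base family is 0Fh.
    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return family, model, stepping


def model_group(family, model):
    """Return the article's range label ('0xh', '1xh', ...) for Family 15h, else None."""
    if family != 0x15:
        return None
    high_digit = model >> 4
    return f"{high_digit:x}xh"


# Sample signature values chosen for illustration only.
for eax in (0x00600F12, 0x00610F01):
    family, model, stepping = decode_cpuid_signature(eax)
    group = model_group(family, model)
    print(f"{eax:#010x}: family {family:02x}h, model {model:02x}h, "
          f"stepping {stepping:x}h, group {group}")
```

The upper hexadecimal digit of the model number is what separates the 0xh, 1xh and 2xh ranges discussed in this article.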

The optimization guide is quite large, and references to different features are scattered across the document, so we did our best to find all relevant information. Some features apply to both 1xh and 2xh CPUs, such as support for 16-bit floating-point numbers and the addition of the VCVTPH2PS and VCVTPS2PH instructions, which convert from and to the new 16-bit floating-point type. The processors will also incorporate FMA3, Bit Manipulation Instructions (BMI) and Trailing Bit Manipulation (TBM) instructions. In addition, there will be other improvements, such as an increased depth of the FP load queue and a larger level 1 data TLB. Latencies of some instructions were also reduced.
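To give a feel for what the 16-bit floating-point type stores, here is a small sketch that rounds a value to IEEE 754 half precision and expands it back, using the `e` format of Python's standard `struct` module. It only emulates the storage format in software: it does not execute VCVTPS2PH or VCVTPH2PS, it starts from a double rather than a single-precision value, and it assumes round-to-nearest-even, whereas the hardware instruction lets the programmer select the rounding mode. The helper names are ours.

```python
import struct


def float_to_half_bits(value):
    """Round a Python float to IEEE 754 binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Expand a 16-bit binary16 pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# 65504.0 is the largest finite half-precision value.
for x in (1.0, 0.1, 65504.0):
    bits = float_to_half_bits(x)
    print(f"{x!r} -> {bits:#06x} -> {half_bits_to_float(bits)!r}")
```

The round trip of 0.1 shows the precision cost of the format: with only 10 fraction bits, the value comes back as roughly 0.09998.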

Microprocessors with model 1xh will have up to 2 modules, or 4 cores, and will lack L3 cache. These characteristics match the upcoming Trinity core, which will be used in mobile AMD chips. Model 1xh CPUs will also have an enhanced IOMMU, or IOMMU v2, which will improve access of I/O devices to system memory and add such features as direct access to user I/O space, and interrupt remapping and filtering.

Processors with model number 2xh will have up to 5 modules, or up to 10 cores, and support quad-channel DDR3 memory. Like Bulldozer CPUs, which are aimed at both desktop and server markets, 2xh may also target both markets. We suspect that 2xh parts will be released as "Vishera" CPUs on the desktop, and as "Terramar" and "Sepang" processors for servers. It is possible that server Terramar Opterons, which will integrate two dies on a chip and have up to 20 cores, will support 8 memory channels.

The optimization guide also mentions model 30h - 3fh and 40h - 4fh processors; however, it doesn't contain any details on these chips. A summary of all features referenced by the optimization guide is provided below:

| Feature | Model 0xh | Model 1xh | Model 2xh |
|---|---|---|---|
| Core | Zambezi / Interlagos / Valencia | Trinity | Vishera / Terramar / Sepang |
| L3 cache | Present | None | Present |
| 16-bit floating point type | Not supported | Supported | Supported |
| F16C (VCVTPH2PS and VCVTPS2PH) | Not supported | Supported | Supported |
| FMA3 | Not supported | Supported | Supported |
| BMI instructions | Not supported | Supported | Supported |
| TBM instructions | Not supported | Supported | Supported |
| Size of L1 data TLB | 32 entries | 64 entries | 64 entries |
| Max number of cores | 8 | 4 | 10 |
| DDR3 channels | 2 | 2 | 4 |
| Depth of FPU load queue | 40 | 44 | 44 |
| HyperTransport Assist feature | Supported | Not supported | Supported |
| IOMMU | v1 | v2 | not specified |
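For anyone who would rather compare the generations programmatically, the summary table above can be sketched as a small lookup structure. The dictionary keys and the `changes` helper are hypothetical names of our own; the values are copied from the table, with `None` standing in for "not specified".

```python
# Values transcribed from the summary table; key names are our own.
FEATURES = {
    "0xh": {"l3_cache": True, "fp16_type": False, "f16c": False, "fma3": False,
            "bmi": False, "tbm": False, "l1_dtlb_entries": 32, "max_cores": 8,
            "ddr3_channels": 2, "fp_load_queue_depth": 40, "ht_assist": True,
            "iommu": "v1"},
    "1xh": {"l3_cache": False, "fp16_type": True, "f16c": True, "fma3": True,
            "bmi": True, "tbm": True, "l1_dtlb_entries": 64, "max_cores": 4,
            "ddr3_channels": 2, "fp_load_queue_depth": 44, "ht_assist": False,
            "iommu": "v2"},
    "2xh": {"l3_cache": True, "fp16_type": True, "f16c": True, "fma3": True,
            "bmi": True, "tbm": True, "l1_dtlb_entries": 64, "max_cores": 10,
            "ddr3_channels": 4, "fp_load_queue_depth": 44, "ht_assist": True,
            "iommu": None},  # None: not specified in the guide
}


def changes(old, new):
    """Return {feature: (old_value, new_value)} for features that differ."""
    return {name: (FEATURES[old][name], FEATURES[new][name])
            for name in FEATURES[old]
            if FEATURES[old][name] != FEATURES[new][name]}


for name, (before, after) in changes("0xh", "1xh").items():
    print(f"{name}: {before} -> {after}")
```

Calling `changes("1xh", "2xh")` in the same way lists only the rows where the two future model ranges differ.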

Comments: 11

FMA

2012-01-11 10:57:59
Posted by: gallier2

It would be good to add that the FMA referenced here is the FMA3 instruction, as BD already has FMA4.

 

2012-01-11 14:13:56
Posted by: gshv

Thank you for the correction! I changed FMA to FMA3 in the summary table. The article itself already had the correct instruction name.

I?

2012-01-11 11:08:18
Posted by: Anton Markov

This is native Next Generation Bulldozer based on the new FM2 socket/chipset platform, with big changes in how it works with RAM; new instructions included; new faster cache memories at every level and a peak of up to 1.5+ TFLOPs for oldest model when turbo core 3.0 is enabled... After that, in 2014, AMD will have Excavator with X128 support and quad-channel DDR4...

IOMMUv2

2012-01-12 21:12:39
Posted by: seronx

Is on 20h-2Fh

 

2012-01-12 23:41:26
Posted by: gshv

You may be right. Optimization guide doesn't say anything about model 20h - 2fh CPUs, it only says: "Some AMD Family 15h processors (example: model 10h-1Fh) include an enhanced IOMMU ...". I changed the table to say that the IOMMU version for 2xh processors is not specified.

 

2012-01-20 23:21:32
Posted by: Seronx

Okay

HT Assist doesn't exist in 10h-2Fh

IOMMU is only 20h-2Fh due to SCH

Same virtual memory space

2012-09-22 20:12:34
Posted by: mmarq

IOMMU v2 already exists on Trinity because it's the feature that allows the same memory coherency (not cache) between the CPU and the GPU.

To remember on this systems the all system DRAM is on the side of the APU/CPU. This accounting the same feature between APU/CPU and Sea Islands discrete adapters.

Its not yet "full" HSA because this architecture foresees integrated hardware scheduling, and in future AMD even the same interrupt ASIC engines for preemption.... among other things. To reference it will be called HSA MMU this feature, means the GPGPUs will be fully virtualizable for VMM environments (better than Nvidia will do with Big Kepler i suspect).
(http://)hsafoundation.com/hsa-developer-tools/

All IOMMU so far according to the reference manual, deals with Hypertransport packets not PCIe, meaning the integrated Northbridge in Trinity is Hypertransport based.
(http://)developer.amd.com/Resources/documentation/articles/Pages/12212005112.aspx
(http://)support.amd.com/us/Embedded_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf

Vishera, as with earlier chipsets, 790/890FX, will have the IOMMU on the mobo NorthBridge hub.

Piledriver should have better single thread performance

2012-01-13 18:00:06
Posted by: Vincent O.

The L1 data cache should increase from 16 KB to 32 KB per core in the Piledriver. This would increase single thread performance. The prefetch has to be bigger, otherwise the CPU is like a beast with very tiny claws. If there was a way to partner the cores in pairs so that one of them 'turns off' and boosts the other by allocating its 64 KB (not 32 KB), this would result in 128 KB (64/64 data/instruction). Having new Turbo technology on top of that, with reduced latency and quad-channel RAM support with options to downgrade to dual channel via BIOS, would make the Piledriver a winner. Let's hope AMD listens to consumer feedback because I am ready to jump ship.

FM2 PCI-E 3.0 slots dual 16X or 8X 8X?

2012-01-22 19:22:24
Posted by: Kilobit

My question is: since all of the current CPUs with integrated graphics are limited to a single PCI-E slot running at 16X, or dual at 8X 8X, on the motherboard, is it possible that the Vishera cores without integrated graphics will have a different CPU socket/motherboard? I kind of hope they would, because this has kept me from purchasing any CPUs with integrated graphics. I'm a PC gamer and there is a big difference between two cards both @16x and two cards @8x.

 

2012-01-30 03:37:17
Posted by: Gen

If Vishera does indeed have quad channel memory support, a new socket will be almost guaranteed.

Vishera to feature "4" DDR3 channels...

2012-02-03 09:40:43
Posted by: nt300

Not really, 4-8 core Piledriver Cores are based on Socket AM3+ up until late 2013.
Then we will get Excavator Cores based on Socket FM2 in 2014.
Also they can easily do 4 DDR3 channels on Socket AM3+ due to it being an IMC.

Anyhow, it has been said these new Piledriver Cores should be approx. 25% faster than Phenom IIs clock for clock. Looks like AMD's been tweaking the design even before Bulldozer's release.

(c) Copyright 2003 - 2010 Gennadiy Shvets