The inevitable release of Nvidia’s next generation 16nm Pascal graphics cards with stacked high bandwidth memory is inching closer. Only three weeks ago we discovered four different Nvidia Pascal graphics cards being shipped across Nvidia's testing facilities. Today we're reporting on two more graphics boards: one that's entirely new and another that's an updated design that may soon be progressing towards full functionality.
[UPDATED 03/09/2016 04:17 PM EST] Two additional Nvidia Pascal graphics cards were spotted, with per unit values of $1100 and $700. Additional details have been added after the sixth paragraph.
We reported two weeks ago that Nvidia is rumored to demo its next generation Pascal graphics cards at GTC in April with a product launch at Computex in June. Whispers have reached us of Nvidia planning to showcase a Pascal graphics card at the show for the very first time.
A source claimed that this will take place on April 5th during the keynote by Nvidia’s CEO, Jen-Hsun Huang. A Pascal graphics board will allegedly be showcased on stage. We were told that it’s not just going to be a prototype to visually demonstrate the form factor like last year, but an actual working Pascal graphics card.
Two New Nvidia Pascal Graphics Cards Spotted, Valued At $900 & $600 - Potential GTC Demo Units
Our two new Nvidia graphics boards are listed as "COMPUTER GRAPHICS CARDS" in Nvidia's shipping description. Both carry hefty per unit values: the first is valued at $600 and the second at $900. So we're potentially looking at high-end graphics cards here. However, it's important to note that these values don't always reflect actual product pricing and can differ vastly from the product's actual cost and selling price.
Both boards start with the same 699 serial number. We've pointed out in a previous article that the earliest record of a board carrying that serial number appears in December. So we know that we're looking at Nvidia graphics boards that are new and did not exist at any point before December. This could potentially explain Pascal's absence from CES in January: if no Pascal graphics cards were ready at the time, that would have led to Nvidia's decision to showcase the Pascal Drive PX2 module with Maxwell GPUs instead.
The two new boards are as follows:
Serial Number | Value Per Unit |
---|---|
699-1G610-0000-000 | $600 |
699-12914-0076-100 | $900 |
The second entry is one that we've seen before, albeit with a slightly different serial number. The card we had seen earlier was shipped in February with the serial number 699-12914-0071-100 and carried a significantly lower value of $500, versus the new iteration which is listed at $900. This indicates that in all likelihood more components have been added to the board and that it's inching closer to full operational capacity, hopefully in time for the upcoming GPU Technology Conference in April.
________________________________
[UPDATED 03/09/2016 04:17 PM EST]
After digging a little bit deeper we've spotted two additional boards. These graphics cards were shipped late last month and carry the following serial numbers.
699-1G411-0000-000
699-12914-0000-100
The first graphics card has actually been spotted once before, when it carried a lower per unit value of $600. The second graphics card is the third that we've seen with the 12914 serial number. However, all three had unique four digit strings and are likely variations of the same unit. These boards have also appeared under three completely different per unit values, from the initial listing at $600 to the second appearance at $1100 and the latest quote at $700. Again, we would like to remind everyone that these value figures have little relevance to actual product pricing and are quoted for insurance purposes. Their wide variability makes it incredibly difficult to draw any solid conclusions about actual cost.
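To make the grouping argument concrete, here is a minimal Python sketch that splits the serial numbers quoted in this article into dash-separated fields and groups them by their second field. The field names (`prefix`, `board`, `variant`, `suffix`) are our own assumptions for illustration; Nvidia does not document what each part of the string means.

```python
from collections import defaultdict

# Serial numbers quoted in this article. The field names used below are
# hypothetical labels, not an official Nvidia format description.
SERIALS = [
    "699-1G610-0000-000",
    "699-12914-0076-100",
    "699-12914-0071-100",
    "699-1G411-0000-000",
    "699-12914-0000-100",
]


def parse_serial(serial):
    """Split a serial into its four dash-separated fields."""
    prefix, board, variant, suffix = serial.split("-")
    return {"prefix": prefix, "board": board, "variant": variant, "suffix": suffix}


def group_by_board(serials):
    """Map each board code to the sorted list of variant strings seen for it."""
    groups = defaultdict(list)
    for serial in serials:
        fields = parse_serial(serial)
        groups[fields["board"]].append(fields["variant"])
    return {board: sorted(variants) for board, variants in groups.items()}


if __name__ == "__main__":
    for board, variants in group_by_board(SERIALS).items():
        print(board, variants)
```

Grouped this way, the three 12914 entries fall into a single bucket with three distinct four digit strings, which is the pattern that suggests they are revisions of one board.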
________________________________
Could These Nvidia Pascal Graphics Cards Be GTC Demo Units?
Interestingly, all of Nvidia’s scheduled talks at GTC start with one or two letters and the digit six. That is, they all follow the formula X6###, where X is one or two letters, six is constant and each # is a digit that varies. Keeping this in mind, the opening keynote of Nvidia’s CEO is given the number 699.
As it happens, all six Nvidia graphics cards that have appeared in shipping records carried this very same serial number, matching that of Jen-Hsun’s keynote. These digits could be a code name for Pascal inside Nvidia, which would explain why we're seeing them on both these graphics cards and Jen-Hsun's keynote. Whatever they actually stand for, we've seen them enough times to believe that it's not a coincidence.
There's no way of knowing for certain whether these are GP100 or GP104 boards as of yet. Interestingly, GP100, or “Big Pascal” as we like to call it, was spotted a few months back. Back then Nvidia only had GPUs and there was no evidence of any actual boards. So it looks like Pascal has come a long way since then.
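For readers who want to check session IDs against the X6### formula themselves, a small Python sketch follows. The regular expression is simply our reading of the formula described above, and the sample IDs are made up for illustration rather than taken from the GTC schedule.

```python
import re

# One or two letters, a literal 6, then exactly three digits (X6###).
SESSION_ID = re.compile(r"[A-Za-z]{1,2}6\d{3}")


def matches_session_format(session_id):
    """Return True if the whole string follows the X6### formula."""
    return SESSION_ID.fullmatch(session_id) is not None


def variable_digits(session_id):
    """Return the three digits after the constant 6, or None if no match."""
    if not matches_session_format(session_id):
        return None
    return session_id[-3:]


if __name__ == "__main__":
    # Hypothetical IDs, used only to exercise the pattern.
    for candidate in ["S6123", "HL6042", "S7123", "ABC6123"]:
        print(candidate, matches_session_format(candidate), variable_digits(candidate))
```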
What we know so far about Nvidia's flagship Pascal GP100 GPU :
- Pascal graphics architecture.
- 2x performance per watt estimated improvement over Maxwell.
- To launch in 2016, purportedly the second half of the year.
- DirectX 12 feature level 12_1 or higher.
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
- Built on the 16nm FinFET manufacturing process from TSMC.
- Allegedly has a total of 17 billion transistors, more than twice that of GM200.
- Will feature four 4-Hi HBM2 stacks, for a total of 16GB of VRAM and 8-Hi stacks for up to 32GB for the professional compute SKUs.
- Features a 4096-bit memory bus interface, same as AMD's Fiji GPU powering the Fury series.
- Features NVLink (on the CPU side, only compatible with next generation IBM POWER server processors)
- Supports half precision FP16 compute at twice the rate of full precision FP32.
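The memory figures in the list above can be sanity checked with some quick arithmetic. The Python sketch below assumes the publicly documented HBM2 parameters (a 1024-bit interface per stack, 8 Gb or 1 GB per DRAM die, and up to 2 Gbps per pin); those numbers are our assumptions and are not confirmed by Nvidia for GP100.

```python
# Assumed HBM2 parameters (not from Nvidia):
# each stack exposes a 1024-bit interface and each DRAM die holds 8 Gb (1 GB).
BITS_PER_STACK = 1024
GB_PER_DIE = 1


def bus_width_bits(stacks):
    """Total memory bus width across all stacks."""
    return stacks * BITS_PER_STACK


def capacity_gb(stacks, dies_per_stack):
    """Total capacity for a given stack count and stack height."""
    return stacks * dies_per_stack * GB_PER_DIE


def bandwidth_gb_per_s(stacks, gbps_per_pin):
    """Peak bandwidth: bus width times per-pin rate, converted to bytes."""
    return bus_width_bits(stacks) * gbps_per_pin / 8


if __name__ == "__main__":
    print(bus_width_bits(4))           # 4096
    print(capacity_gb(4, 4))           # 16
    print(capacity_gb(4, 8))           # 32
    print(bandwidth_gb_per_s(4, 2.0))  # 1024.0
```

Under these assumptions, four stacks give the 4096-bit bus, 4-Hi and 8-Hi stacks give 16GB and 32GB, and a 2 Gbps pin rate lands at roughly 1 TB/s.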
GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
---|---|---|---|---|
GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
Flagship Chip | GF110 | GK210 | GM200 | GP100 |
GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
Maximum Die Size | 520mm2 | 561mm2 | 601mm2 | 610mm2 |
Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | ~12 TFLOPs (Tesla) |
FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | ~6 TFLOPs (Tesla) |
Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2 |
Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 720 GB/s - 1 TB/s |
Maximum TDP | 244W | 250W | 250W | 300W |
Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |
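The core counts and compute figures in the table can be cross-checked with a little arithmetic. The sketch below assumes the usual convention that each CUDA core retires one fused multiply-add (two floating point operations) per clock; the clock speed it derives for Pascal is only what the rumored ~12 TFLOPs figure would imply, not a confirmed specification.

```python
# (compute units, stream processors per unit) as listed in the table above.
CONFIGS = {
    "Fermi": (16, 32),
    "Kepler": (15, 192),
    "Maxwell": (24, 128),
    "Pascal": (60, 64),
}


def cuda_cores(units, sps_per_unit):
    """Total CUDA cores for a chip."""
    return units * sps_per_unit


def peak_tflops(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Peak throughput, counting a fused multiply-add as 2 FLOPs per cycle."""
    return cores * flops_per_core_per_cycle * clock_ghz / 1000


def implied_clock_ghz(cores, tflops, flops_per_core_per_cycle=2):
    """Clock speed needed to reach a given peak throughput."""
    return tflops * 1000 / (cores * flops_per_core_per_cycle)


if __name__ == "__main__":
    for name, (units, sps) in CONFIGS.items():
        print(name, cuda_cores(units, sps))
    # Clock implied by 3840 cores at ~12 TFLOPs FP32.
    print(implied_clock_ghz(3840, 12))  # 1.5625
```

In other words, 3840 cores would need to run at roughly 1.56 GHz to hit 12 TFLOPs of FP32, and FP16 at twice the FP32 rate would put half precision at around 24 TFLOPs.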
We learned last year that Nvidia’s flagship Pascal GPU, code named GP100, may have taped out on TSMC’s 16nm FinFET manufacturing process in June. Interestingly, shortly afterwards AMD announced that it had taped out two FinFET chips. It’s absolutely not a coincidence that both companies completed their FinFET designs at the same time. Both are pushing a very aggressive time to market schedule to debut their next generation FinFET based GPUs this year.
Word On The Street Is That We Might See The First Pascal Graphics Cards Launch In June - Mobility Versions To Come First
This one comes directly from sweclockers.com, which has claimed on two occasions over the past few weeks that Nvidia is planning to launch its very first lineup of Pascal graphics cards around Computex in June. This launch will specifically be for the mobility lineup going into gaming notebooks. Sweclockers makes no mention of when we should expect desktop Pascal graphics cards. However, the site goes on to claim that Nvidia is facing challenges bringing Pascal up to speed on TSMC's 16nm FinFET process, which it says may throw a wrench in the plans and result in a postponement.
The plan to introduce the mobility lineup in mid June has reportedly been set in motion but could face delays owing to the uncertainty around Pascal's readiness. As such, the site reports that the probability of a paper launch at Computex, or of the launch being postponed entirely to a later date, is "great".
Our take is that the reports of Nvidia wanting to launch its chips on the mobile side first are likely grounded in reality. The company will want to deliver mobile Pascal products on time for the OEMs' product refresh cycle before they roll out new products for the back to school season which spans July to September.
To a great extent, a similar limitation does not exist for desktop PCs, for a variety of reasons. For one, the AIB market commands the lion's share of the desktop graphics market. Additionally, OEMs have much greater flexibility in switching out graphics cards in their desktop products. This means that we might be looking at market availability of desktop Pascal graphics cards around Q3 to Q4 of this year.
Nvidia's Pascal : Everything We Know Right Now
Nvidia Pascal - 2X Perf/Watt, Stacked Memory, NV-Link And Mixed Precision Compute
TSMC’s new 16nm FinFET process promises to be significantly more power efficient than planar 28nm. It also promises a considerable improvement in transistor density, which would enable Nvidia to build faster, significantly more complex and more power efficient GPUs.
TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
Apart from HBM2 and 16nm, there is one big compute-centric feature that Nvidia will debut with Pascal: NVLink. Pascal will be the first GPU from the company to support this new proprietary server interconnect.
The technology targets GPU accelerated servers where cross-chip communication is extremely bandwidth limited and a major system bottleneck. Nvidia states that NVLink will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs. NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018.
NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.
Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.
Slide: Volta GPU featuring NVLink and stacked memory. NVLink GPU high speed interconnect: 80-200 GB/s. 3D stacked memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit.
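The "5-12 times PCIe Gen3 x16" claim and the 80-200 GB/s figure on the slide line up reasonably well, as the back-of-the-envelope Python sketch below shows. It assumes the standard PCIe 3.0 parameters of 8 GT/s per lane with 128b/130b encoding and counts bandwidth in one direction only; the multipliers are the ones Nvidia quotes, and the result is an estimate rather than a measured number.

```python
def pcie3_gb_per_s(lanes=16):
    """Usable PCIe 3.0 bandwidth per direction, in GB/s."""
    transfers_per_s = 8.0   # GT/s per lane (PCIe 3.0)
    encoding = 128 / 130    # 128b/130b line encoding overhead
    return transfers_per_s * encoding * lanes / 8


def nvlink_estimate_gb_per_s(low_multiplier=5, high_multiplier=12):
    """Range implied by Nvidia's '5 to 12 times PCIe Gen3 x16' claim."""
    base = pcie3_gb_per_s(16)
    return low_multiplier * base, high_multiplier * base


if __name__ == "__main__":
    base = pcie3_gb_per_s()
    low, high = nvlink_estimate_gb_per_s()
    print(f"PCIe 3.0 x16: {base:.2f} GB/s")
    print(f"NVLink estimate: {low:.0f} to {high:.0f} GB/s")
```

The estimate comes out at just under 80 GB/s on the low end and just under 190 GB/s on the high end, close to the 80-200 GB/s range shown on the slide.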
NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance. via NVIDIA News
Unlike with Maxwell, Nvidia has placed a major focus on compute and GPGPU acceleration with Pascal. The slew of new features and technologies that Nvidia will debut with Pascal emphasizes this focus, including the use of next generation stacked High Bandwidth Memory, the high-speed NVLink GPU interconnect and the addition of mixed precision compute at double the rate of full precision compute to push perf/watt. We can’t wait to see Pascal in action later this year, but until then stay tuned for the latest.
GPU Family | Vega | NVIDIA Pascal |
---|---|---|
Flagship GPU | Vega 10 | GP102 |
GPU Process | 14nm FinFET | 16nm FinFET |
GPU Transistors | Up To 18 Billion | 12 Billion |
Memory | Up to 16 GB HBM2 | 12GB GDDR5X |
Bandwidth | 512 GB/s | 480 GB/s |
Graphics Architecture | Vega (NCU) | Pascal |
Predecessor | Fiji (Fury Series) | GM200 (900 Series) |