Nvidia Memory Testing Guide

So, your card has all voltages and you have verified that the BIOS circuit is working as it should, but you still have no output from the card. Or there is output, but you get artifacts, crashes under load, or other abnormal behavior. In that case you probably have a faulty memory chip, and you're in the right place.

Replacing memory chips is a difficult procedure. If you do not have the tools or the experience, you should let an expert do it for you.

Nvidia Modular Diagnostic Software (aka Nvidia MODS)
MODS is a very powerful tool that tests Nvidia cards for different kinds of faults. It includes a standalone tool called MATS that tests memory specifically. If you have access to it, this guide will show you how to use MATS to identify faulty memory chips.

Memory Channel Labeling
As shown in Figure 1, each channel consists of 2 memory chips, numbered 0 and 1. For a card with N GB of VRAM built from 1 GB chips, as in this example, there are N/2 channels, so the 8GB GTX 1080 has 4 channels.

Memory chips are counted counterclockwise, starting from the corner OPPOSITE the golden arrow on the core. The order is A1, A0, B1, B0, ... up to X1, X0 (X being the last channel).
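The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the counting order described in this guide, not something MODS or MATS provides: the function name `chip_labels` is made up for the example, and it assumes the N/2 channel rule stated above (two chips per channel).

```python
from string import ascii_uppercase


def chip_labels(vram_gb: int) -> list[str]:
    """Return chip labels in counting order: A1, A0, B1, B0, ...

    Assumes the guide's rule of thumb: N GB of VRAM means N/2 channels,
    each channel holding two chips numbered 1 and 0.
    """
    if vram_gb <= 0 or vram_gb % 2 != 0:
        raise ValueError("expected a positive, even VRAM size in GB")
    channels = vram_gb // 2
    if channels > len(ascii_uppercase):
        raise ValueError("more channels than single-letter labels")
    labels = []
    for letter in ascii_uppercase[:channels]:
        # Within a channel, chip 1 is counted before chip 0.
        labels.extend([f"{letter}1", f"{letter}0"])
    return labels


print(chip_labels(8))
# ['A1', 'A0', 'B1', 'B0', 'C1', 'C0', 'D1', 'D0']
```

Walking the board counterclockwise from the corner opposite the golden arrow, the first chip you meet is the first label in the list, the second chip is the second label, and so on.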

Using MATS with a card that has no output
You'll need either a CPU with an integrated GPU (APU) or a secondary card to get the screen output. After booting into MODS, type the following command to start testing the memory:

and then:

The index should be 1 in either case: whether the screen output comes from integrated graphics or, on a CPU without integrated graphics, from a secondary dedicated GPU.

The memory size to test should be at least 5; 50 is recommended. Higher numbers will take longer to finish.

After the test finishes, you will get a report.txt file containing the results of the test. Alternatively, you can add  to the end of the 2nd command to show the results on the screen right away.

Using MATS with a card that has output
This is a bit easier since you don't have to enter the first command or an index, just enter:   and the test will run. You can still add  to the end to show the report on the screen.

Identifying the faulty memory bank(s)
In the example report in Figure 2, MATS found errors on D1 and C0, which correspond to the chips marked in Figure 3.

Usually, only one chip fails, and the card either outputs no picture or displays artifacts. In this case, however, there was a problem with 2 chips, which normally points to a fault in the IMC (Integrated Memory Controller) inside the core. Luckily, this particular card had been dropped by the user, and taking the memory chips off, cleaning the pads, and resoldering the chips fixed it.

If you get errors on all channels, though, it's either the IMC or a power-related issue that has either killed all the memory chips or is not supplying enough power to them.

The failing bits can sometimes tell you whether the issue is the memory itself or the IMC, but replacing the memory is the way to make sure.
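The rules of thumb in this section can be written down as a small Python sketch. It is an illustration only: the function name `triage` is hypothetical, the failing chip labels have to be read off the MATS report by hand (this sketch does not parse report.txt), and the suggestions are starting points rather than a diagnosis.

```python
def triage(failing_chips: set[str], channel_count: int) -> str:
    """Suggest a first suspect from the chips MATS reported errors on.

    failing_chips: labels such as {"D1", "C0"}, read manually from the report.
    channel_count: number of memory channels on the card (4 on the 8GB GTX 1080).
    """
    if not failing_chips:
        return "no errors reported"
    # The channel is the letter part of the label, e.g. "D" in "D1".
    failing_channels = {chip[0] for chip in failing_chips}
    if len(failing_chips) == 1:
        return "single chip: suspect that memory chip"
    if len(failing_channels) == channel_count:
        return "all channels: suspect the IMC or the memory power supply"
    return "several chips: suspect the IMC, but check solder joints first"


print(triage({"D1", "C0"}, channel_count=4))
# several chips: suspect the IMC, but check solder joints first
```

The example call uses the chips from Figure 2. The "check solder joints" hint reflects the dropped card described above, where reseating the chips fixed what looked like an IMC fault.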