<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v5.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-08-18T21:00:46Z</updated>
<entry>
<title>drm/amdgpu: fix NULL pointer access issue when unloading driver</title>
<updated>2020-08-18T21:00:46Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-08-13T06:35:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a68d96f81b8e7eb2a121fbf9abf9e5974e58832'/>
<id>urn:sha1:1a68d96f81b8e7eb2a121fbf9abf9e5974e58832</id>
<content type='text'>
When unloading the driver with "modprobe -r amdgpu", a NULL pointer
dereference occurs while releasing the RAS debugfs entries. The cause
is a duplicated debugfs_remove: the drm debugfs_root dir has already
been cleaned up by drm_minor_unregister.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G           OE     5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 &lt;f0&gt; 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS:  00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 simple_recursive_removal+0x63/0x370
 ? debugfs_remove+0x60/0x60
 debugfs_remove+0x40/0x60
 amdgpu_ras_fini+0x82/0x230 [amdgpu]
 ? __kernfs_remove.part.17+0x101/0x1f0
 ? kernfs_name_hash+0x12/0x80
 amdgpu_device_fini+0x1c0/0x580 [amdgpu]
 amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
 amdgpu_pci_remove+0x36/0x60 [amdgpu]
 pci_device_remove+0x3b/0xb0
 device_release_driver_internal+0xe5/0x1c0
 driver_detach+0x46/0x90
 bus_remove_driver+0x58/0xd0
 pci_unregister_driver+0x29/0x90
 amdgpu_exit+0x11/0x25 [amdgpu]
 __x64_sys_delete_module+0x13d/0x210
 do_syscall_64+0x5f/0x250
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add printing after executing page reservation to eeprom</title>
<updated>2020-08-06T20:24:11Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-07-20T03:11:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee10e06eb00c3b371fa17a13ee2adaee4254dd54'/>
<id>urn:sha1:ee10e06eb00c3b371fa17a13ee2adaee4254dd54</id>
<content type='text'>
This tells users, via the dmesg log, whether the faulty page has
been written to the external EEPROM device.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: RAS emergency restart logic refine</title>
<updated>2020-07-15T16:41:47Z</updated>
<author>
<name>Wenhui Sheng</name>
<email>Wenhui.Sheng@amd.com</email>
</author>
<published>2020-07-13T07:14:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bb5c7235eaafb4e2f957e9f0f71a187db5cf525a'/>
<id>urn:sha1:bb5c7235eaafb4e2f957e9f0f71a187db5cf525a</id>
<content type='text'>
If we are in a RAS-triggered situation and
BACO isn't supported, an emergency restart is needed.
This code is only needed for some specific
cases (vega20 with a given smu fw version).

After we add smu mode1 reset for sienna cichlid, we
need to share AMD_RESET_METHOD_MODE1 with psp mode1 reset,
so in amdgpu_device_gpu_recover we need to differentiate
which mode1 reset we are using, then decide if it's
a full reset, and then decide if an emergency restart is needed.
The logic would become much more complex.

After discussion with Hawking, move the emergency restart logic
to an independent function.

Signed-off-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Signed-off-by: Wenhui Sheng &lt;Wenhui.Sheng@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: label internally used symbols as static</title>
<updated>2020-07-01T05:59:23Z</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-06-18T14:09:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f3167919f689c9b547aca82ec1eeb455e4eb120b'/>
<id>urn:sha1:f3167919f689c9b547aca82ec1eeb455e4eb120b</id>
<content type='text'>
Used sparse (make C=1) to find these loose ends.

v2:
removed unwanted extra line

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove useless code in RAS</title>
<updated>2020-06-02T20:47:43Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-06-02T05:53:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9e69b1ee1d9e1d58244279e39f032658df8cead6'/>
<id>urn:sha1:9e69b1ee1d9e1d58244279e39f032658df8cead6</id>
<content type='text'>
The module parameter amdgpu_ras_mask is already taken into account
when calculating the RAS support capability, so drop this
redundant code.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix RAS memory leak in error case</title>
<updated>2020-06-02T20:47:20Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-06-02T05:46:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e91160ac0b5cfbbaeb62cbff8b069262095f744'/>
<id>urn:sha1:5e91160ac0b5cfbbaeb62cbff8b069262095f744</id>
<content type='text'>
RAS context memory needs to be freed in the failure case.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: print warning when input address is invalid</title>
<updated>2020-05-28T18:00:49Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-05-22T07:50:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0d4783a382297048cb00e3e94078aa301296841'/>
<id>urn:sha1:b0d4783a382297048cb00e3e94078aa301296841</id>
<content type='text'>
This assists debugging in the error injection case.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update RAS XGMI error inject sequence</title>
<updated>2020-05-14T21:42:35Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-05-13T12:23:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5c23e9e05e42b5ea56a87a17f1da9ccf9b100465'/>
<id>urn:sha1:5c23e9e05e42b5ea56a87a17f1da9ccf9b100465</id>
<content type='text'>
Disable XGMI link power down prior to issuing an XGMI RAS error.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: allocate large structures dynamically</title>
<updated>2020-05-05T17:12:55Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2020-05-05T14:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7fcffecf79ba2963f1f7cf967f451818a7913482'/>
<id>urn:sha1:7fcffecf79ba2963f1f7cf967f451818a7913482</id>
<content type='text'>
After the structure was padded to 1024 bytes, it is no longer
suitable for being a local variable, as the function surpasses
the warning limit for 32-bit architectures:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:587:5: error: stack frame size of 1072 bytes in function 'amdgpu_ras_feature_enable' [-Werror,-Wframe-larger-than=]
int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
    ^

Use kzalloc() instead to get it from the heap.

Fixes: a0d254820f43 ("drm/amdgpu: update RAS TA to Host interface")
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: update RAS error handling</title>
<updated>2020-04-30T20:48:20Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-04-30T09:11:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a200034b664820da243ff4cd9595b8b5116332af'/>
<id>urn:sha1:a200034b664820da243ff4cd9595b8b5116332af</id>
<content type='text'>
Parse return status from TA to determine error severity

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
