<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-05-14T21:42:35Z</updated>
<entry>
<title>drm/amdgpu: Update RAS XGMI error inject sequence</title>
<updated>2020-05-14T21:42:35Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-05-13T12:23:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5c23e9e05e42b5ea56a87a17f1da9ccf9b100465'/>
<id>urn:sha1:5c23e9e05e42b5ea56a87a17f1da9ccf9b100465</id>
<content type='text'>
Disable XGMI link power down prior to injecting an XGMI RAS error.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: allocate large structures dynamically</title>
<updated>2020-05-05T17:12:55Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2020-05-05T14:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7fcffecf79ba2963f1f7cf967f451818a7913482'/>
<id>urn:sha1:7fcffecf79ba2963f1f7cf967f451818a7913482</id>
<content type='text'>
After the structure was padded to 1024 bytes, it is no longer
suitable for being a local variable, as the function surpasses
the warning limit for 32-bit architectures:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:587:5: error: stack frame size of 1072 bytes in function 'amdgpu_ras_feature_enable' [-Werror,-Wframe-larger-than=]
int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
    ^

Use kzalloc() instead to get it from the heap.

Fixes: a0d254820f43 ("drm/amdgpu: update RAS TA to Host interface")
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: update RAS error handling</title>
<updated>2020-04-30T20:48:20Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-04-30T09:11:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a200034b664820da243ff4cd9595b8b5116332af'/>
<id>urn:sha1:a200034b664820da243ff4cd9595b8b5116332af</id>
<content type='text'>
Parse return status from TA to determine error severity

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: set error query ready after all IPs late init</title>
<updated>2020-04-22T22:11:49Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-04-22T04:22:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a891d239f9e036031f9f1c62fe584232662cb7f1'/>
<id>urn:sha1:a891d239f9e036031f9f1c62fe584232662cb7f1</id>
<content type='text'>
If error query ready is set in amdgpu_ras_late_init, some IP blocks
are marked as ready for error queries before they have been
initialized.

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU</title>
<updated>2020-04-22T22:11:46Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-04-16T15:41:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=12c17b9d62663c14a5343d6742682b3e67280754'/>
<id>urn:sha1:12c17b9d62663c14a5343d6742682b3e67280754</id>
<content type='text'>
When running RAS uncorrectable error injection and triggering a GPU
reset on sGPU, the issue below is observed. It is caused by accessing
a list that has not been initialized.

[   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[   80.047300] #PF: supervisor write access in kernel mode
[   80.047351] #PF: error_code(0x0003) - permissions violation
[   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[   80.047477] Oops: 0003 [#1] SMP PTI
[   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
[   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: refine ras related message print</title>
<updated>2020-04-13T16:01:50Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-04-10T07:51:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6952e99cfd52d32098540fe8d9e592828b9e774c'/>
<id>urn:sha1:6952e99cfd52d32098540fe8d9e592828b9e774c</id>
<content type='text'>
Prefix RAS-related kernel log messages with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This tells the user clearly which
GPU device a RAS message refers to. Also add some
other RAS messages to make the output clearer
and friendlier.

Suggested-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: resolve mGPU RAS query instability</title>
<updated>2020-04-09T14:43:15Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-04-07T07:08:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3dbd6d3ec495057db425a09516a922e1dacec33'/>
<id>urn:sha1:b3dbd6d3ec495057db425a09516a922e1dacec33</id>
<content type='text'>
Upon receiving an uncorrectable error, query every GPU node for RAS errors.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix non-pointer dereference for non-RAS supported</title>
<updated>2020-04-01T18:44:44Z</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2020-03-27T07:39:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a9d82d2f91297679cfafd7e61c4bccdca6cd550d'/>
<id>urn:sha1:a9d82d2f91297679cfafd7e61c4bccdca6cd550d</id>
<content type='text'>
Backtrace from a GPU recovery test on Navi10:

[ 1324.516681] RIP: 0010:amdgpu_ras_set_error_query_ready+0x15/0x20 [amdgpu]
[ 1324.523778] Code: 4c 89 f7 e8 cd a2 a0 d8 e9 99 fe ff ff 45 31 ff e9 91 fe ff ff 0f 1f 44 00 00 55 48 85 ff 48 89 e5 74 0e 48 8b 87 d8 2b 01 00 &lt;40&gt; 88 b0 38 01 00 00 5d c3 66 90 0f 1f 44 00 00 55 31 c0 48 85 ff
[ 1324.543452] RSP: 0018:ffffaa1040e4bd28 EFLAGS: 00010286
[ 1324.549025] RAX: 0000000000000000 RBX: ffff911198b20000 RCX: 0000000000000000
[ 1324.556217] RDX: 00000000000c0a01 RSI: 0000000000000000 RDI: ffff911198b20000
[ 1324.563514] RBP: ffffaa1040e4bd28 R08: 0000000000001000 R09: ffff91119d0028c0
[ 1324.570804] R10: ffffffff9a606b40 R11: 0000000000000000 R12: 0000000000000000
[ 1324.578413] R13: ffffaa1040e4bd70 R14: ffff911198b20000 R15: 0000000000000000
[ 1324.586464] FS:  00007f4441cbf540(0000) GS:ffff91119ed80000(0000) knlGS:0000000000000000
[ 1324.595434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1324.601345] CR2: 0000000000000138 CR3: 00000003fcdf8004 CR4: 00000000003606e0
[ 1324.608694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1324.616303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1324.623678] Call Trace:
[ 1324.626270]  amdgpu_device_gpu_recover+0x6e7/0xc50 [amdgpu]
[ 1324.632018]  ? seq_printf+0x4e/0x70
[ 1324.636652]  amdgpu_debugfs_gpu_recover+0x50/0x80 [amdgpu]
[ 1324.643371]  seq_read+0xda/0x420
[ 1324.647601]  full_proxy_read+0x5c/0x90
[ 1324.652426]  __vfs_read+0x1b/0x40
[ 1324.656734]  vfs_read+0x8e/0x130
[ 1324.660981]  ksys_read+0xa7/0xe0
[ 1324.665201]  __x64_sys_read+0x1a/0x20
[ 1324.669907]  do_syscall_64+0x57/0x1c0
[ 1324.674517]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1324.680654] RIP: 0033:0x7f44417cf081

Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: disable ras query and inject during gpu reset</title>
<updated>2020-04-01T18:44:42Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-03-25T08:01:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61380faa4b4cc577df8a7ff5db5859bac6b351f7'/>
<id>urn:sha1:61380faa4b4cc577df8a7ff5db5859bac6b351f7</id>
<content type='text'>
Add a flag to the RAS context to indicate whether RAS query functionality is ready.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: protect RAS sysfs during GPU reset</title>
<updated>2020-03-20T14:45:00Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-03-19T06:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43c4d57618bef018eecd769c2805ce6f4e849a0d'/>
<id>urn:sha1:43c4d57618bef018eecd769c2805ce6f4e849a0d</id>
<content type='text'>
MMHub EDC becomes dirty after BACO reset

EDC registers should be cleared early in the reset phase.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
