<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v6.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-12-06T21:05:32Z</updated>
<entry>
<title>drm/amdgpu: optimize the printing order of error data</title>
<updated>2023-12-06T21:05:32Z</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2023-12-04T02:44:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dbf3850d12baf3ba8a80c302f538d1b01940aef7'/>
<id>urn:sha1:dbf3850d12baf3ba8a80c302f538d1b01940aef7</id>
<content type='text'>
sort error data list to optimize the printing order.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.c</title>
<updated>2023-11-17T05:51:58Z</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2023-11-13T06:34:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07ee43faeb7eb088e49a7549fcabcae94c443d3b'/>
<id>urn:sha1:07ee43faeb7eb088e49a7549fcabcae94c443d3b</id>
<content type='text'>
fix ras err_data null pointer issue in amdgpu_ras.c

Fixes: 8cc0f5669eb6 ("drm/amdgpu: Support multiple error query modes")
Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix software pci_unplug on some chips</title>
<updated>2023-11-09T22:02:49Z</updated>
<author>
<name>Vitaly Prosyak</name>
<email>vitaly.prosyak@amd.com</email>
</author>
<published>2023-10-11T23:31:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4638e0c29a3f2294d5de0d052a4b8c9f33ccb957'/>
<id>urn:sha1:4638e0c29a3f2294d5de0d052a4b8c9f33ccb957</id>
<content type='text'>
When software 'pci unplug' using IGT is executed we got a sysfs directory
entry is NULL for differant ras blocks like hdp, umc, etc.
Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group'
check that 'sd' is  not NULL.

[  +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90
[  +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff &lt;0f&gt; 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[  +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246
[  +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000
[  +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000
[  +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000
[  +0.000001] FS:  00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0
[  +0.000001] Call Trace:
[  +0.000001]  &lt;TASK&gt;
[  +0.000001]  ? show_regs+0x72/0x90
[  +0.000002]  ? sysfs_remove_group+0x83/0x90
[  +0.000002]  ? __warn+0x8d/0x160
[  +0.000001]  ? sysfs_remove_group+0x83/0x90
[  +0.000001]  ? report_bug+0x1bb/0x1d0
[  +0.000003]  ? handle_bug+0x46/0x90
[  +0.000001]  ? exc_invalid_op+0x19/0x80
[  +0.000002]  ? asm_exc_invalid_op+0x1b/0x20
[  +0.000003]  ? sysfs_remove_group+0x83/0x90
[  +0.000001]  dpm_sysfs_remove+0x61/0x70
[  +0.000002]  device_del+0xa3/0x3d0
[  +0.000002]  ? ktime_get_mono_fast_ns+0x46/0xb0
[  +0.000002]  device_unregister+0x18/0x70
[  +0.000001]  i2c_del_adapter+0x26d/0x330
[  +0.000002]  arcturus_i2c_control_fini+0x25/0x50 [amdgpu]
[  +0.000236]  smu_sw_fini+0x38/0x260 [amdgpu]
[  +0.000241]  amdgpu_device_fini_sw+0x116/0x670 [amdgpu]
[  +0.000186]  ? mutex_lock+0x13/0x50
[  +0.000003]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[  +0.000192]  drm_minor_release+0x4f/0x80 [drm]
[  +0.000025]  drm_release+0xfe/0x150 [drm]
[  +0.000027]  __fput+0x9f/0x290
[  +0.000002]  ____fput+0xe/0x20
[  +0.000002]  task_work_run+0x61/0xa0
[  +0.000002]  exit_to_user_mode_prepare+0x150/0x170
[  +0.000002]  syscall_exit_to_user_mode+0x2a/0x50

Cc: Hawking Zhang &lt;hawking.zhang@amd.com&gt;
Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Support multiple error query modes</title>
<updated>2023-11-09T22:01:58Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-11-08T08:07:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8cc0f5669eb6d4f156c721956da67560c9319317'/>
<id>urn:sha1:8cc0f5669eb6d4f156c721956da67560c9319317</id>
<content type='text'>
Direct error query mode and firmware error query mode
are supported for now.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count</title>
<updated>2023-11-03T16:18:32Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-10-30T12:44:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=18eae367cb74d05b5e37ce77ef4025b735df012e'/>
<id>urn:sha1:18eae367cb74d05b5e37ce77ef4025b735df012e</id>
<content type='text'>
Handle xgmi hive case.

Suggested-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: check RAS supported first in ras_reset_error_count</title>
<updated>2023-10-31T20:40:15Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-10-25T03:03:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d1d4c0b7b65b7fab2bc6f97af9e823b1c42ccdb0'/>
<id>urn:sha1:d1d4c0b7b65b7fab2bc6f97af9e823b1c42ccdb0</id>
<content type='text'>
Not all platforms support RAS.

Fixes: 73582be11ac8 ("drm/amdgpu: bypass RAS error reset in some conditions")
Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: bypass RAS error reset in some conditions</title>
<updated>2023-10-26T22:41:22Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-10-12T06:33:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=73582be11ac8f6d6765e185bf48f22efb9d28c3b'/>
<id>urn:sha1:73582be11ac8f6d6765e185bf48f22efb9d28c3b</id>
<content type='text'>
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.

v2: add check for ras-&gt;in_recovery, it's set earlier than
amdgpu_in_reset.

v3: fix error in gpu reset check.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: enable RAS poison mode for APU</title>
<updated>2023-10-26T22:41:22Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-10-20T09:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f3a3bbf1566c7b6b0f9ac36e8e597c73dc0afdf8'/>
<id>urn:sha1:f3a3bbf1566c7b6b0f9ac36e8e597c73dc0afdf8</id>
<content type='text'>
Enable it by default on APU platform.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: refine ras error kernel log print</title>
<updated>2023-10-26T22:41:21Z</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2023-10-19T10:16:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec3e0a9167e2cc97a9b12d9f2a619afd78b77223'/>
<id>urn:sha1:ec3e0a9167e2cc97a9b12d9f2a619afd78b77223</id>
<content type='text'>
refine ras error kernel log to avoid user-ridden ambiguity.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix find ras error node error</title>
<updated>2023-10-26T22:41:21Z</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2023-10-20T02:12:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53d4d7792757d195979a630a6402f272d3fd2a47'/>
<id>urn:sha1:53d4d7792757d195979a630a6402f272d3fd2a47</id>
<content type='text'>
the origin function might return the wrong node.

Fixes: 5b1270beb380 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
