<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-02-23T22:35:59Z</updated>
<entry>
<title>drm/amdgpu: change default behavior of bad_page_threshold parameter</title>
<updated>2023-02-23T22:35:59Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-21T07:25:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f3cbe70e215a87dcfdf028582a2fa94b24a08efe'/>
<id>urn:sha1:f3cbe70e215a87dcfdf028582a2fa94b24a08efe</id>
<content type='text'>
Ignore the RAS UMC bad page threshold by default; GPU initialization
won't be stopped in this mode.

v2: refine the description of bad_page_threshold.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: exclude duplicate pages from UMC RAS UE count</title>
<updated>2023-02-23T22:35:59Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-10T08:33:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba'/>
<id>urn:sha1:4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba</id>
<content type='text'>
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable errors repeatedly by accessing the
page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate the number of newly added bad pages.
v4: reuse save_bad_pages to calculate the number of newly added bad pages.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Adjust ras support check condition for special asic</title>
<updated>2023-01-17T21:11:51Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-06T12:16:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f453c51cfae92fded6e232985f6943c51b7829c'/>
<id>urn:sha1:8f453c51cfae92fded6e232985f6943c51b7829c</id>
<content type='text'>
[Why]:
  Amdgpu ras uses amdgpu_ras_is_supported to check whether
  the ras block supports the ras function. amdgpu_ras_is_supported
  uses .ras_enabled to determine whether the ras function of the
  block is enabled.
  But for a special asic with mem ecc enabled but sram ecc not
  enabled, some ras blocks support poison mode while their ras function
  is not enabled in .ras_enabled, so these ras blocks run abnormally.

[How]:
  If the ras block is not marked as supported in .ras_enabled but the
  asic supports poison mode and the ras block has a ras configuration,
  the ras block can be considered to support the ras function.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove unnecessary ras block support check</title>
<updated>2023-01-17T21:11:51Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-06T12:54:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8c305a3fdf9b10e3ad773d843306eae2f7b76473'/>
<id>urn:sha1:8c305a3fdf9b10e3ad773d843306eae2f7b76473</id>
<content type='text'>
[Why]:
   For a special asic with mem ecc enabled but sram ecc
not enabled, some ras blocks can register their ras
configuration to the ras list, but these ras blocks are not
enabled in .ras_enabled, so the ras block object cannot
be obtained using amdgpu_ras_get_ras_block.

[How]:
   Remove the ras block support check. Even if the ras block
being checked is not in the ras list, the lookup will return
a null pointer and will have no effect.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3</title>
<updated>2023-01-17T21:11:51Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-04T05:13:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ac7b25d92c6f967217c92a401734bf041187996f'/>
<id>urn:sha1:ac7b25d92c6f967217c92a401734bf041187996f</id>
<content type='text'>
Perform gpu reset after gfx finishes processing
ras poison consumption on gfx_v11_0_3.

V2:
 Move gfx poison consumption handler from hw_ops to ip
 function level.

V3:
 Adjust the calling position of amdgpu_gfx_poison_consumation_handler.

V4:
 Since gfx v11_0_3 does not have a .hw_ops instance, the .hw_ops null
 pointer check in amdgpu_ras_interrupt_poison_consumption_handler
 needs to be adjusted.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: allow query error counters for specific IP block</title>
<updated>2023-01-05T16:42:14Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-01-03T15:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a1c9a444b5e0f276f43f77e1723088bbedb1687'/>
<id>urn:sha1:4a1c9a444b5e0f276f43f77e1723088bbedb1687</id>
<content type='text'>
amdgpu_ras_block_late_init will be invoked in IP
specific ras_late_init call as a common helper for
all the IP blocks.

However, when amdgpu_ras_block_late_init calls
amdgpu_ras_query_error_count to query ras error
counters, amdgpu_ras_query_error_count queries
all the IP blocks that support the ras query interface.

This results in wrong error counters being cached in
the software copies when ras errors are detected
at time zero or during a warm reset. For example, in
the sdma_ras_late_init phase, it counts sdma/mmhub
errors, while in the mmhub_ras_late_init phase, it
counts sdma/mmhub errors again.

The change updates the amdgpu_ras_query_error_count
interface to allow querying the error counter of a specific
IP block. It introduces a new input parameter: query_info.
If query_info is NULL, all the IP blocks are queried;
otherwise, only the IP block specified by query_info
is queried.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove enable ras cmd call trace</title>
<updated>2023-01-03T21:57:58Z</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2022-12-21T10:17:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c26cd999180dcb6d0a5705884485d66cd4bb4afd'/>
<id>urn:sha1:c26cd999180dcb6d0a5705884485d66cd4bb4afd</id>
<content type='text'>
[Why]
    [   41.285804] RIP: 0010:amdgpu_ras_feature_enable+0x15c/0x310 [amdgpu]
    [   41.285945] Code: 48 89 c1 48 c7 c2 b9 f2 88 c1 48 c7 c0 c0 f2 88 c1 49 8b 3c 24 48 0f 44 d0 48 c7 c6 98 33 80 c1 e8 5f 52 75 d9 e9 fa fe ff ff &lt;0f&gt; 0b e9 66 ff ff ff 48 8b 3d 86 8c 0f da ba 00 04 00 00 be c0 0d
    [   41.285946] RSP: 0018:ffffbccdc72efc90 EFLAGS: 00010246
    [   41.285948] RAX: 0000000000000004 RBX: ffff931897406980 RCX: 0000000000000002
    [   41.285949] RDX: 0000000000000dc0 RSI: 0000000000000002 RDI: ffff931500042b00
    [   41.285950] RBP: ffffbccdc72efcc0 R08: 0000000000000002 R09: ffff931885b87000
    [   41.285951] R10: 0000000000ffff10 R11: 0000000000000001 R12: ffff931893e20000
    [   41.285952] R13: 0000000000000001 R14: ffff931885b87000 R15: 0000000000000000
    [   41.285953] FS:  0000000000000000(0000) GS:ffff931c6f200000(0000) knlGS:0000000000000000
    [   41.285954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   41.285955] CR2: 000055dd6f532008 CR3: 000000061b010006 CR4: 00000000003706e0
    [   41.285956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [   41.285957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [   41.285958] Call Trace:
    [   41.285959]  &lt;TASK&gt;
    [   41.285963]  ? gfx_v11_0_early_init+0x250/0x250 [amdgpu]
    [   41.286117]  gfx_v11_0_late_init+0x8c/0xb0 [amdgpu]
    [   41.286271]  amdgpu_device_ip_late_init+0x8d/0x3c0 [amdgpu]
    [   41.286401]  amdgpu_device_init.cold+0x1677/0x1fda [amdgpu]
    [   41.286616]  ? pci_bus_read_config_word+0x4a/0x70
    [   41.286621]  ? do_pci_enable_device+0xdb/0x110
    [   41.286625]  amdgpu_driver_load_kms+0x1a/0x160 [amdgpu]
    [   41.286762]  amdgpu_pci_probe+0x18d/0x3a0 [amdgpu]
    [   41.286898]  local_pci_probe+0x4b/0x90
    [   41.286901]  work_for_cpu_fn+0x1a/0x30
    [   41.286903]  process_one_work+0x22b/0x3d0
    [   41.286905]  worker_thread+0x223/0x420
    [   41.286907]  ? process_one_work+0x3d0/0x3d0
    [   41.286908]  kthread+0x12a/0x150
    [   41.286911]  ? set_kthread_struct+0x50/0x50
    [   41.286913]  ret_from_fork+0x22/0x30

[How]
    For a specific asic, only mem ecc is enabled and sram ecc is not,
    but the ras enable cmd still needs to be sent to the gfx block to
    support poison mode, so add a poison mode check.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: define RAS query poison mode function</title>
<updated>2022-12-15T17:18:19Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2022-12-06T02:46:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2dd9032beb699016f8c3076c98a1d457a13abb10'/>
<id>urn:sha1:2dd9032beb699016f8c3076c98a1d457a13abb10</id>
<content type='text'>
1. no need to query poison mode on SRIOV guest side, host can handle it.
2. define the function to simplify code.

v2: rename amdgpu_ras_poison_mode_query to amdgpu_ras_query_poison_mode.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: update VCN/JPEG RAS setting</title>
<updated>2022-12-15T17:18:19Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2022-12-05T08:23:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3189501e6f024931079936a592d677128826ef14'/>
<id>urn:sha1:3189501e6f024931079936a592d677128826ef14</id>
<content type='text'>
Support VCN/JPEG RAS in both bare metal and SRIOV environment.

v2: update commit description.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip RAS error injection in SRIOV</title>
<updated>2022-12-15T17:18:19Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2022-12-08T03:51:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=248c9635b8bd9d0c1649031da531d80e850fbdbe'/>
<id>urn:sha1:248c9635b8bd9d0c1649031da531d80e850fbdbe</id>
<content type='text'>
Injection on guest is not allowed.

v2: return directly in SRIOV environment.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
