<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-14T17:47:49Z</updated>
<entry>
<title>Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"</title>
<updated>2023-04-14T17:47:49Z</updated>
<author>
<name>Jane Jian</name>
<email>Jane.Jian@amd.com</email>
</author>
<published>2023-04-14T03:33:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b805d8d785e49cb3ee9279dad1402d5dcf902166'/>
<id>urn:sha1:b805d8d785e49cb3ee9279dad1402d5dcf902166</id>
<content type='text'>
This reverts commit fe120b9f5ce873516a2604e4ff0c19084be94e8c.
This patch impacts sriov multi-vf stability

Signed-off-by: Jane Jian &lt;Jane.Jian@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: correct ras enabled flag</title>
<updated>2023-04-11T22:03:45Z</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2023-04-10T11:43:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=58bc2a9cbfdd4abdbfaafd835a0cd78bdad11423'/>
<id>urn:sha1:58bc2a9cbfdd4abdbfaafd835a0cd78bdad11423</id>
<content type='text'>
XGMI RAS should be according to the gmc xgmi physical nodes number,
XGMI RAS should not be enabled if xgmi num_physical_nodes is zero.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add fatal error handling in nbio v4_3</title>
<updated>2023-03-31T15:18:32Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-23T02:21:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9af357bc3e05400eb632f3975986e1eac196f159'/>
<id>urn:sha1:9af357bc3e05400eb632f3975986e1eac196f159</id>
<content type='text'>
GPU will stop working once fatal error is detected.
it will inform driver to do reset to recover from
the fatal error.

v2: squash in logic fix (Srinivasan)
v3: squash in logic fix (Dan)

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Candice Li &lt;candice.li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV</title>
<updated>2023-03-22T04:58:37Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-03-03T02:01:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fe120b9f5ce873516a2604e4ff0c19084be94e8c'/>
<id>urn:sha1:fe120b9f5ce873516a2604e4ff0c19084be94e8c</id>
<content type='text'>
Enable ras for mp0 v13_0_10 on SRIOV.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: drop ras check at asic level for new blocks</title>
<updated>2023-03-15T22:45:27Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-04T12:22:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=370808876b5cab365f8fc6dbaf8cae13a2bc6efa'/>
<id>urn:sha1:370808876b5cab365f8fc6dbaf8cae13a2bc6efa</id>
<content type='text'>
amdgpu_ras_register_ras_block should always be invoked
by ras_sw_init, where driver needs to check ras caps
at ip level, instead of asic level.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Stanley Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Rework pcie_bif ras sw_init</title>
<updated>2023-03-15T22:45:27Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-13T06:18:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fdc94d3a8c887e4e06a7ff8dcb51d55cd70e16cf'/>
<id>urn:sha1:fdc94d3a8c887e4e06a7ff8dcb51d55cd70e16cf</id>
<content type='text'>
pcie_bif ras blocks needs to be initialized as early
as possible to handle fatal error detected in hw_init
phase. also align the pcie_bif ras sw_init with other
ras blocks

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Stanley Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: change default behavior of bad_page_threshold parameter</title>
<updated>2023-02-23T22:35:59Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-21T07:25:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f3cbe70e215a87dcfdf028582a2fa94b24a08efe'/>
<id>urn:sha1:f3cbe70e215a87dcfdf028582a2fa94b24a08efe</id>
<content type='text'>
Ignore ras umc bad page threshold by default, GPU initialization won't
be stopped in this mode.

v2: refine the description of bad_page_threshold.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: exclude duplicate pages from UMC RAS UE count</title>
<updated>2023-02-23T22:35:59Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-10T08:33:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba'/>
<id>urn:sha1:4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba</id>
<content type='text'>
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable error repeatly by accessing the page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate new added bad page number.
v4: reuse save_bad_pages to calculate new added bad page number.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Adjust ras support check condition for special asic</title>
<updated>2023-01-17T21:11:51Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-06T12:16:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f453c51cfae92fded6e232985f6943c51b7829c'/>
<id>urn:sha1:8f453c51cfae92fded6e232985f6943c51b7829c</id>
<content type='text'>
[Why]:
     Amdgpu ras uses amdgpu_ras_is_supported to check whether
  the ras block supports the ras function. amdgpu_ras_is_supported
  uses .ras_enabled to determine whether the ras function of the
  block is enabled.
     But for special asic with mem ecc enabled but sram ecc not
  enabled, some ras blocks support poison mode but their ras function
  is not enabled on .ras_enabled, these ras blocks will run abnormally.

[How]:
    If the ras block is not supported on .ras_enabled but the asic
  supports poison mode and the ras block has ras configuration, it
  can be considered that the ras block supports ras function.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove unnecessary ras block support check</title>
<updated>2023-01-17T21:11:51Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-06T12:54:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8c305a3fdf9b10e3ad773d843306eae2f7b76473'/>
<id>urn:sha1:8c305a3fdf9b10e3ad773d843306eae2f7b76473</id>
<content type='text'>
[Why]:
   For special asic with mem ecc enabled but sram ecc
not enabled, some ras blocks can register their ras
configuration to ras list, but these ras blocks are not
enabled on .ras_enabled, so it can not get ras block
object using amdgpu_ras_get_ras_block.

[How]:
   Remove ras block support check. Even if the ras block
checked is not in the ras list, it will return a null
pointer and will have no effect.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
