<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v5.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-04-07T18:01:43Z</updated>
<entry>
<title>drm/amdgpu: resolve mGPU RAS query instability</title>
<updated>2020-04-07T18:01:43Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-04-07T07:08:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0b9ebd7eebb7419d18030f0364a9392ffbf1d793'/>
<id>urn:sha1:0b9ebd7eebb7419d18030f0364a9392ffbf1d793</id>
<content type='text'>
upon receiving uncorrectable error, query every GPU node for ras errors

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: protect RAS sysfs during GPU reset</title>
<updated>2020-03-20T14:45:00Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-03-19T06:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43c4d57618bef018eecd769c2805ce6f4e849a0d'/>
<id>urn:sha1:43c4d57618bef018eecd769c2805ce6f4e849a0d</id>
<content type='text'>
MMHub EDC becomes dirty after BACO reset

EDC registers should be cleared early on in reset phase

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix warning in ras_debugfs_create_all()</title>
<updated>2020-03-13T15:52:34Z</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2020-03-12T10:18:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1509f3f6fa41addf863aee5af7f406e56d110b4'/>
<id>urn:sha1:c1509f3f6fa41addf863aee5af7f406e56d110b4</id>
<content type='text'>
Fix the warning
"warn: variable dereferenced before check 'obj' (see line 1131)"
by removing unnecessary checks as amdgpu_ras_debugfs_create_all()
is only called from amdgpu_debugfs_init() where obj member in
con-&gt;head list is not NULL.
Use list_for_each_entry() instead list_for_each_entry_safe() as obj
do not to be freeing or removing from list during this process.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: update ras capability's query based on mem ecc configuration</title>
<updated>2020-03-13T15:52:34Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-03-10T10:27:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=88474ccad5f85ee934785e2919689282ba6c6767'/>
<id>urn:sha1:88474ccad5f85ee934785e2919689282ba6c6767</id>
<content type='text'>
RAS support capability needs to be updated on top of different
memeory ECC enablement, and remove redundant memory ecc check
in gmc module for vega20 and arcturus.

v2: check HBM ECC enablement and set ras mask accordingly.
v3: avoid to invoke atomfirmware interface to query twice.

Suggested-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: call ras_debugfs_create_all in debugfs_init</title>
<updated>2020-03-10T19:55:11Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2020-03-06T04:24:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=204eaac625d57d32d8b4c42b57271a359b76db5a'/>
<id>urn:sha1:204eaac625d57d32d8b4c42b57271a359b76db5a</id>
<content type='text'>
and remove each ras IP's own debugfs creation

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add function to creat all ras debugfs node</title>
<updated>2020-03-10T19:55:02Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2020-03-06T03:59:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9317014ea517e13a59b9d437948c5ab17ce4ed7'/>
<id>urn:sha1:f9317014ea517e13a59b9d437948c5ab17ce4ed7</id>
<content type='text'>
centralize all debugfs creation in one place for ras

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: enable PCS error report on VG20</title>
<updated>2020-03-06T19:31:35Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2020-02-21T13:37:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec01fe2dbf8cf2b2a25c6ac73eac357988f1bf7d'/>
<id>urn:sha1:ec01fe2dbf8cf2b2a25c6ac73eac357988f1bf7d</id>
<content type='text'>
Now driver will report XGMI/WAFL PCS error through
sysfs xgmi_wafl_err_count node on Vega20

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: move get_xgmi_relative_phy_addr to amdgpu_xgmi.c</title>
<updated>2020-02-26T19:17:32Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2020-02-24T07:36:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=19744f5f2dff50290f0767a68c0dc418c1a59cec'/>
<id>urn:sha1:19744f5f2dff50290f0767a68c0dc418c1a59cec</id>
<content type='text'>
centralize all the xgmi related function to amdgpu_xgmi.c

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Acked-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: log on non-zero error conter per IP before GPU reset</title>
<updated>2020-02-19T15:36:26Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-02-13T07:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=313c8fd33ebc1945f1c875481a7d5b6921265e0d'/>
<id>urn:sha1:313c8fd33ebc1945f1c875481a7d5b6921265e0d</id>
<content type='text'>
Once sync flood interrupt is triggered by RAS error, before
actual GPU recovery job, it's necessary to log on and print
non-zero error counter, this will help user knows where the
RAS error source is from quickly.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: added support to get mGPU DRAM base</title>
<updated>2020-01-22T21:34:07Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-01-17T04:18:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6c44d2538c469f412c3fded0de2290494d762d7'/>
<id>urn:sha1:a6c44d2538c469f412c3fded0de2290494d762d7</id>
<content type='text'>
resolves issue with RAS error injection in mGPU configuration

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
