<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h, branch v5.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-12-09T04:02:05Z</updated>
<entry>
<title>drm/amdgpu: fix debugfs creation/removal, again</title>
<updated>2020-12-09T04:02:05Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2020-12-03T23:06:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cedf788459683b84800fd3ecc63c63e2e3a5be33'/>
<id>urn:sha1:cedf788459683b84800fd3ecc63c63e2e3a5be33</id>
<content type='text'>
There is still a warning when CONFIG_DEBUG_FS is disabled:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1145:13: error: 'amdgpu_ras_debugfs_create_ctrl_node' defined but not used [-Werror=unused-function]
 1145 | static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)

Change the code again to make the compiler actually drop
this code but not warn about it.

Fixes: ae2bf61ff39e ("drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS")
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix the issue of reserving bad pages failed</title>
<updated>2020-10-30T04:57:29Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-10-22T09:44:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=676deb38770582abac87447f47d1ee643bb14681'/>
<id>urn:sha1:676deb38770582abac87447f47d1ee643bb14681</id>
<content type='text'>
In amdgpu_ras_reset_gpu, because bad pages may not be freed,
it has high probability to reserve bad pages failed.

Change to reserve bad pages when freeing VRAM.

v2:
1. avoid allocating the drm_mm node outside of amdgpu_vram_mgr.c
2. move bad page reserving into amdgpu_ras_add_bad_pages, if vram mgr
   reserve bad page failed, it will put it into pending list, otherwise
   put it into processed list;
3. remove amdgpu_ras_release_bad_pages, because retired page's info has
   been moved into amdgpu_vram_mgr

v3:
1. formate code style;
2. rename amdgpu_vram_reserve_scope as amdgpu_vram_reservation;
3. rename scope_pending as reservations_pending;
4. rename scope_processed as reserved_pages;
5. change to iterate over all the pending ones and try to insert them
   with drm_mm_reserve_node();

v4:
1. rename amdgpu_vram_mgr_reserve_scope as
amdgpu_vram_mgr_reserve_range;
2. remove unused include "amdgpu_ras.h";
3. rename amdgpu_vram_mgr_check_and_reserve as
amdgpu_vram_mgr_do_reserve;
4. refine amdgpu_vram_mgr_reserve_range to call
amdgpu_vram_mgr_do_reserve.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;hawking.zhang@amd.com&gt;
Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Wenhui Sheng &lt;Wenhui.Sheng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove redundant GPU reset</title>
<updated>2020-10-30T04:57:23Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-10-19T06:49:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5eeb45934c753322ea8f6dce6f069bf977bf5282'/>
<id>urn:sha1:5eeb45934c753322ea8f6dce6f069bf977bf5282</id>
<content type='text'>
Because bad pages saving has been moved to UMC error interrupt callback,
which will trigger a new GPU reset after saving.

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;hawking.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: change to save bad pages in UMC error interrupt callback</title>
<updated>2020-10-30T04:57:17Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-10-19T06:19:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=22503d803dab174b7f038fc9886c225ef30ee95c'/>
<id>urn:sha1:22503d803dab174b7f038fc9886c225ef30ee95c</id>
<content type='text'>
Instead of saving bad pages in amdgpu_ras_reset_gpu, it will reduce
the unnecessary calling of amdgpu_ras_save_bad_pages.

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;hawking.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: bypass querying ras error count registers</title>
<updated>2020-08-14T20:12:22Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-08-04T07:00:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f75e94d86829e92a758a26fc5bbdb4c9eba86260'/>
<id>urn:sha1:f75e94d86829e92a758a26fc5bbdb4c9eba86260</id>
<content type='text'>
Once ras recovery is issued by ras sync flood interrupt or
ras controller interrupt, add this guard to bypass or execute
ras error count register harvest of all IPs.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: break GPU recovery once it's in bad state(v4)</title>
<updated>2020-08-04T21:26:54Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-07-23T08:20:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8fbaf03429d085b05e66b153a8363b746c94b43'/>
<id>urn:sha1:e8fbaf03429d085b05e66b153a8363b746c94b43</id>
<content type='text'>
When GPU executes recovery and retriving bad GPU tag
from external eerpom device, the recovery will be broken
and error message is printed as well for user's awareness.

v2: Refine warning message in threshold reaching case, and
    fix spelling typo.

v3: Fix explicit calling of bad gpu.

v4: Rename function names.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip bad page reservation once issuing from eeprom write</title>
<updated>2020-08-04T21:26:38Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-07-23T07:50:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=35cd2cdadbccf08e9b4d5a8fca4914dc37ab48e0'/>
<id>urn:sha1:35cd2cdadbccf08e9b4d5a8fca4914dc37ab48e0</id>
<content type='text'>
Once the ras recovery is issued from eeprom write itself,
bad page reservation should be ignored, otherwise, recursive
calling of writting to eeprom would happen.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: validate bad page threshold in ras(v3)</title>
<updated>2020-08-04T21:25:58Z</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2020-07-22T02:00:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c84d46707ebbfc303268e27d5aeb8bb4e6d5b1ff'/>
<id>urn:sha1:c84d46707ebbfc303268e27d5aeb8bb4e6d5b1ff</id>
<content type='text'>
Bad page threshold value should be valid in the range between
-1 and max records length of eeprom. It could determine when
saved bad pages exceed threshold value, and proceed corresponding
actions.

v2: When using the default typical value, it should be min
value between typical value and eeprom max records length.

v3: drop the case of setting bad_page_cnt_threshold to be
    0xFFFFFFFF, as it confuses user.

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: RAS emergency restart logic refine</title>
<updated>2020-07-15T16:41:47Z</updated>
<author>
<name>Wenhui Sheng</name>
<email>Wenhui.Sheng@amd.com</email>
</author>
<published>2020-07-13T07:14:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bb5c7235eaafb4e2f957e9f0f71a187db5cf525a'/>
<id>urn:sha1:bb5c7235eaafb4e2f957e9f0f71a187db5cf525a</id>
<content type='text'>
If we are in RAS triggered situation and
BACO isn't support, emergency restart is needed,
and this code is only needed for some specific
cases(vega20 with given smu fw version).

After we add smu mode1 reset for sienna cichlid, we
need to share AMD_RESET_METHOD_MODE1 with psp mode1 reset,
so in amdgpu_device_gpu_recover, we need differentiate
which mode1 reset we are using, then decide if it's
a full reset and then decide if emergency restart is needed,
the logic will become much more complex.

After discussion with Hawking, move emergency restart logic
to an independent function.

Signed-off-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Signed-off-by: Wenhui Sheng &lt;Wenhui.Sheng@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: disable ras query and iject during gpu reset</title>
<updated>2020-04-01T18:44:42Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2020-03-25T08:01:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61380faa4b4cc577df8a7ff5db5859bac6b351f7'/>
<id>urn:sha1:61380faa4b4cc577df8a7ff5db5859bac6b351f7</id>
<content type='text'>
added flag to ras context to indicate if ras query functionality is ready

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
