linux/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.14

drm/amdgpu: disable BAR resize on Dell G5 SE

2025-02-25T17:20:27Z

A quirk was added as a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips. Rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled.

v2: update commit message, add runpm check

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar
Signed-off-by: Alex Deucher
(cherry picked from commit 5235053f443cef4210606e5fb71f99b915a9723d)
Cc: stable@vger.kernel.org
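The condition described in this commit can be sketched as a small predicate. This is a minimal illustration using mock types so that it compiles outside the kernel, not the driver's actual code. The names `mock_gpu_ids` and `should_skip_bar_resize` are hypothetical, and the device ID `0x731f` is an assumption chosen for illustration. The vendor IDs are the standard PCI IDs for ATI/AMD (`0x1002`) and Dell (`0x1028`). The `runtime_pm` argument stands in for the driver's runtime pm setting, with 0 meaning disabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_ID_ATI     0x1002 /* standard PCI vendor ID for ATI/AMD GPUs */
#define VENDOR_ID_DELL    0x1028 /* standard PCI vendor ID for Dell */
#define ASSUMED_DEVICE_ID 0x731f /* assumption: GPU device ID, for illustration only */

/* Mock of the PCI identification fields the check needs (hypothetical type). */
struct mock_gpu_ids {
    uint16_t vendor;
    uint16_t device;
    uint16_t subsystem_vendor;
};

/*
 * Return true when BAR resizing should be skipped: the GPU matches the
 * affected chip, it sits in a Dell platform, and runtime pm is not disabled.
 */
static bool should_skip_bar_resize(const struct mock_gpu_ids *ids, int runtime_pm)
{
    if (runtime_pm == 0)
        return false; /* runtime pm disabled: resizing is safe to attempt */

    return ids->vendor == VENDOR_ID_ATI &&
           ids->device == ASSUMED_DEVICE_ID &&
           ids->subsystem_vendor == VENDOR_ID_DELL;
}
```

Keying the check on the subsystem vendor leaves the original Sapphire quirk untouched for add-in boards, which matches the commit's stated intent of not narrowing that quirk.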

drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation

2025-01-24T14:55:26Z

If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180)

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
    6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev,
    6171                                         enum pci_bus_speed *speed,
    6172                                         enum pcie_link_width *width)
    6173 {
    6174         struct pci_dev *parent = adev->pdev;
    6175
    6176         if (!speed || !width)
    6177                 return;
    6178
    6179         parent = pci_upstream_bridge(parent);
    6180         if (parent && parent->vendor == PCI_VENDOR_ID_ATI) {
                     ^^^^^^ If parent is NULL
    6181                 /* use the upstream/downstream switches internal to dGPU */
    6182                 *speed = pcie_get_speed_cap(parent);
    6183                 *width = pcie_get_width_cap(parent);
    6184                 while ((parent = pci_upstream_bridge(parent))) {
    6185                         if (parent->vendor == PCI_VENDOR_ID_ATI) {
    6186                                 /* use the upstream/downstream switches internal to dGPU */
    6187                                 *speed = pcie_get_speed_cap(parent);
    6188                                 *width = pcie_get_width_cap(parent);
    6189                         }
    6190                 }
    6191         } else {
    6192                 /* use the device itself */
--> 6193                 *speed = pcie_get_speed_cap(parent);
                                                     ^^^^^^ Then we are toasted here.
    6194                 *width = pcie_get_width_cap(parent);
    6195         }
    6196 }

Fixes: 757e8b951ce2 ("drm/amdgpu: cache gpu pcie link width")
Cc: Christian König
Cc: Alex Deucher
Reported-by: Dan Carpenter
Signed-off-by: Srinivasan Shanmugam
Suggested-by: Lijo Lazar
Reviewed-by: Lijo Lazar
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
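The fix described in the first sentence of this commit can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the patched function itself. `mock_pci_dev`, `mock_upstream_bridge`, and `gpu_bandwidth` are hypothetical stand-ins for `struct pci_dev`, `pci_upstream_bridge()`, and `amdgpu_device_gpu_bandwidth()`, and plain integers replace the kernel's speed and width enums. The point is the `else` branch, which reads from the device itself instead of from a possibly NULL `parent`.

```c
#include <stddef.h>

#define VENDOR_ID_ATI 0x1002 /* standard PCI vendor ID for ATI/AMD */

/* Mock of the fields this sketch needs; not the kernel's struct pci_dev. */
struct mock_pci_dev {
    unsigned short vendor;
    int speed_cap;                 /* stand-in for the PCIe speed capability */
    int width_cap;                 /* stand-in for the PCIe width capability */
    struct mock_pci_dev *upstream; /* NULL when there is no upstream bridge */
};

/* Hypothetical stand-in for pci_upstream_bridge(). */
static struct mock_pci_dev *mock_upstream_bridge(struct mock_pci_dev *dev)
{
    return dev->upstream;
}

static void gpu_bandwidth(struct mock_pci_dev *pdev, int *speed, int *width)
{
    struct mock_pci_dev *parent;

    if (!speed || !width)
        return;

    parent = mock_upstream_bridge(pdev);
    if (parent && parent->vendor == VENDOR_ID_ATI) {
        /* use the upstream/downstream switches internal to the dGPU */
        *speed = parent->speed_cap;
        *width = parent->width_cap;
        while ((parent = mock_upstream_bridge(parent))) {
            if (parent->vendor == VENDOR_ID_ATI) {
                *speed = parent->speed_cap;
                *width = parent->width_cap;
            }
        }
    } else {
        /* parent may be NULL here, so read from the device itself */
        *speed = pdev->speed_cap;
        *width = pdev->width_cap;
    }
}
```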

drm/amdgpu: cache gpu pcie link width

2025-01-24T14:53:24Z

Get the PCIe link width of the device itself (or its integrated upstream bridge) and cache that.

v2: fix typo

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820
Reviewed-by: Yang Wang
Signed-off-by: Alex Deucher

drm/amdgpu: Refine ip detection log message

2025-01-24T14:52:58Z

'add ip block' causes confusion if the blocks are later disabled with ip_block_mask. Change the message to 'detected' instead, and also add device context.

Signed-off-by: Lijo Lazar
Reviewed-by: Asad Kamal
Suggested-by: Alex Deucher
Signed-off-by: Alex Deucher

drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

2024-12-18T17:15:40Z

If the kernel hasn't been compiled with PCIe hotplug support, this can lead to problems with dGPUs that use BOCO because they effectively drop off the bus. To prevent issues, disable BOCO support when compiled without PCIe hotplug.

Reported-by: Gabriel Marcano
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
Acked-by: Alex Deucher
Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org
Signed-off-by: Mario Limonciello
Signed-off-by: Alex Deucher
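A compile-time gate of this kind could look roughly like the sketch below. It is an illustration only: `PCIE_HOTPLUG_BUILT` and `boco_allowed` are hypothetical names, and in-kernel code would more likely test the option with `IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)` than with a raw `#ifdef`. The second parameter exists so the decision can be exercised without rebuilding.

```c
#include <stdbool.h>

/* Mirror the Kconfig option as a 0/1 constant (hypothetical helper macro). */
#ifdef CONFIG_HOTPLUG_PCI_PCIE
#define PCIE_HOTPLUG_BUILT 1
#else
#define PCIE_HOTPLUG_BUILT 0
#endif

/*
 * BOCO powers the dGPU off so that it effectively drops off the bus, so it
 * is only allowed when PCIe hotplug support was built in and the platform
 * supports BOCO in the first place.
 */
static bool boco_allowed(bool platform_supports_boco, bool pcie_hotplug_built)
{
    if (!pcie_hotplug_built)
        return false;
    return platform_supports_boco;
}
```

A caller would pass `PCIE_HOTPLUG_BUILT` as the second argument, so the result is fixed at build time for a given kernel configuration.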

drm/amdgpu: Avoid VF for RAS recovery source check

2024-12-11T22:30:59Z

A VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell if the real source is a RAS error. Therefore, the VF schedules a KFD-based reset which doesn't set the RAS source. Skip checking the RAS source for any VF-scheduled recovery.

Signed-off-by: Lijo Lazar
Reported-by: Vojislav Tomasevic
Reviewed-by: Yiqing Yao
Tested-by: Yiqing Yao
Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
Signed-off-by: Alex Deucher

drm/amd: Add the capability to mark certain firmware as "required"

2024-12-10T15:26:51Z

Some of the firmware that is loaded by amdgpu is not actually required. For example, the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core, however, will show a warning when this happens, like this:

```
Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2
```

To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware, use firmware_request_nowarn() instead of request_firmware() to avoid the warnings.

Reviewed-by: Alex Deucher
Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t
Signed-off-by: Mario Limonciello
Signed-off-by: Alex Deucher
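The required/optional split can be sketched with mock loaders. This is an illustration, not the amd-ucode helper: `mock_request_firmware` and `mock_firmware_request_nowarn` only imitate the warning behaviour of the kernel's `request_firmware()` and `firmware_request_nowarn()`, and `load_ucode`, `enum fw_requirement`, and the file name `amdgpu/present.bin` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MOCK_ENOENT 2 /* "no such file", matching the -2 in the warning above */

static int warnings_logged; /* counts warnings the mock loader printed */

/* Mock: pretend exactly one firmware file exists (hypothetical name). */
static bool mock_firmware_exists(const char *name)
{
    return strcmp(name, "amdgpu/present.bin") == 0;
}

/* Imitates a loader that warns when the file is missing. */
static int mock_request_firmware(const char *name)
{
    if (mock_firmware_exists(name))
        return 0;
    warnings_logged++;
    fprintf(stderr, "Direct firmware load for %s failed with error -%d\n",
            name, MOCK_ENOENT);
    return -MOCK_ENOENT;
}

/* Imitates the quiet variant: same result, no warning. */
static int mock_firmware_request_nowarn(const char *name)
{
    return mock_firmware_exists(name) ? 0 : -MOCK_ENOENT;
}

enum fw_requirement { FW_OPTIONAL, FW_REQUIRED };

/* The extra argument selects which loader variant is used. */
static int load_ucode(const char *name, enum fw_requirement req)
{
    if (req == FW_REQUIRED)
        return mock_request_firmware(name);
    return mock_firmware_request_nowarn(name);
}
```

Both paths return the same error code for a missing file, so callers can still decide to skip the IP block; only the log noise differs.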

drm/amdgpu: add initial support for gfx950

2024-12-10T15:26:50Z

Add basic gfx950 support.

Signed-off-by: Le Ma
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher

drm/amdgpu: device: fix spellos and punctuation

2024-12-10T15:26:50Z

Make spelling and punctuation changes to ease reading of the comments.

Signed-off-by: Randy Dunlap
Cc: Alex Deucher
Cc: Christian König
Cc: Xinhui Pan
Cc: amd-gfx@lists.freedesktop.org
Cc: David Airlie
Cc: Simona Vetter
Signed-off-by: Alex Deucher

drm/amd: Add Suspend/Hibernate notification callback support

2024-12-10T15:26:50Z

As part of the suspend sequence, VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable, we moved this into the pmops prepare() callback so that, when suspending under high memory usage, the suspend sequence would fail but the system could remain operational.

Another class of issues exists, though, where due to memory fragmentation there isn't a large enough contiguous space and swap isn't accessible.

Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary.

Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781
Reviewed-by: Lijo Lazar
Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org
Signed-off-by: Mario Limonciello
Signed-off-by: Alex Deucher
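The shape of such a notification callback can be sketched with mock types. This is an illustration under assumptions, not the driver's implementation: the event and return-code enums are local stand-ins for the kernel's PM notifier events and `NOTIFY_*` codes, and `mock_gpu`, `mock_evict_vram`, and `gpu_pm_notifier` are hypothetical. How a failed early eviction is handled is not stated in the commit message; this sketch only records the failure and leaves the later prepare() step to decide, which is an assumption.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel's PM notifier events (mock values). */
enum mock_pm_event {
    MOCK_PM_HIBERNATION_PREPARE,
    MOCK_PM_SUSPEND_PREPARE,
    MOCK_PM_POST_SUSPEND
};

/* Local stand-in for the notifier return code (mock value). */
enum mock_notify { MOCK_NOTIFY_DONE };

struct mock_gpu {
    bool is_dgpu;            /* only dGPUs have VRAM to evict */
    int vram_pages_in_use;
    bool eviction_fails;     /* test knob: simulate a failed eviction */
    bool early_evict_failed; /* set when the early eviction did not succeed */
};

/* Hypothetical eviction helper: moves everything out of VRAM or fails. */
static int mock_evict_vram(struct mock_gpu *gpu)
{
    if (gpu->eviction_fails)
        return -1;
    gpu->vram_pages_in_use = 0;
    return 0;
}

/*
 * Called before tasks are frozen, while swap is still usable. Evicts VRAM
 * early on suspend and hibernate; every other event is ignored.
 */
static enum mock_notify gpu_pm_notifier(struct mock_gpu *gpu,
                                        enum mock_pm_event event)
{
    switch (event) {
    case MOCK_PM_HIBERNATION_PREPARE:
    case MOCK_PM_SUSPEND_PREPARE:
        if (gpu->is_dgpu)
            gpu->early_evict_failed = (mock_evict_vram(gpu) != 0);
        break;
    default:
        break;
    }
    return MOCK_NOTIFY_DONE;
}
```

In the kernel, a callback like this would be wrapped in a `struct notifier_block` and registered with `register_pm_notifier()`.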