| Age | Commit message (Collapse) | Author | Lines |
|
The TIDSS hardware does not have independent maximum or minimum pixel
clock limits for each video port. Instead, these limits are determined
by the SoC's clock architecture. Previously, this constraint was
modeled using the 'max_pclk_khz' and 'min_pclk_khz' fields in
'dispc_features', but this approach is static and does not account for
the dynamic behavior of PLLs.
This patch removes the 'max_pclk_khz' and 'min_pclk_khz' fields from
'dispc_features'. The correct way to check if a requested mode's pixel
clock is supported is by using 'clk_round_rate()' in the 'mode_valid()'
hook. If the best frequency match for the mode clock falls within the
supported tolerance, it is approved. TIDSS supports a 5% pixel clock
tolerance, which is now reflected in the validation logic.
This change allows existing DSS-compatible drivers to be reused across
SoCs that only differ in their pixel clock characteristics. The
validation uses 'clk_round_rate()' for each mode, which may introduce
additional delay (about 3.5 ms for 30 modes), but this is generally
negligible. Users desiring faster validation may bypass these calls
selectively, for example, checking only the highest resolution mode,
as shown here[1].
[1]: https://lore.kernel.org/all/20250704094851.182131-3-j-choudhary@ti.com/
Tested-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com>
Signed-off-by: Swamil Jain <s-jain1@ti.com>
Link: https://patch.msgid.link/20251104151422.307162-2-s-jain1@ti.com
[Tomi: dropped 'inline' from check_pixel_clock]
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
|
|
It's not used anymore, so remove it. This allows trully independent
layer state from mixer.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-31-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
With "floating" planes in DE33, mixer can't be stored in layer structure
anymore. Find mixer using currently bound crtc.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-30-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
This allows to almost completely decouple layer code from mixer. This is
important for DE33.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-29-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Later special plane only driver for DE33 will provide separate
configuration. This change will also help layer driver migrate away from
mixer structure.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-28-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
On DE2 and DE3, UI scalers are located right after VI scalers. So in
order to calculate proper UI scaler base address, number of VI scalers
must be known. In practice, it is same as number of VI channels, but it
doesn't need to be.
Let's make a quirk for this number. Code for configuring channels and
associated functions won't have access to vi_num quirk anymore after
rework for independent planes.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-27-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
They can't be triggered if mixer configuration is properly specified in
quirks. Additionally, number of VI channels won't be available in future
due to rework for DE33 support.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-26-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Determination if FCC unit can be used for VI layer alpha depends on
number of VI channels. This info won't be available anymore in future
to VI layer driver because of DE33 way of allocating planes from same
pool to different mixers.
While order is slightly changed, it doesn't affect anything due to
double buffering of registers. New order keeps related registers
together and quirk separate.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-25-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Now that channel base calculation is straightforward, let's update VI
scaler base calculation to be simpler. At the same time, also introduce
macro to avoid magic numbers.
Note, reason why current magic value and new macro value isn't the same
is because sun8i_channel_base() already introduces offset to channel
registers. Previous value is just the difference to VI scaler registers.
However, new code calculates scaler base from channel base. This is also
easier to understand when looking into BSP driver. Macro value can be
easily found whereas old diff value was not.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-24-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
This avoids plane mapping in layers code, which allows future
refactoring, when layer code will move away from accessing mixer
structure.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-23-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Layer will be more universal, due to DE33 support.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-22-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Till DE33, there were no reason to decouple registers from mixer.
However, with future new plane driver, this will be necessary.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-21-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Layer related peripherals should take layer struct as a input. This
looks cleaner and also necessary for proper DE33 support later.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-20-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Layer related peripherals should take layer struct as a input. This
looks cleaner and also necessary for proper DE33 support later.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-19-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
This change is equally a cleanup (less arguments) and preparation for
DE33 separate plane driver. It will introduce additional register space.
No functional changes.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-18-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
This change is equally a cleanup (less arguments) and preparation for
DE33 separate plane driver. It will introduce additional register space.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-17-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
With DE33, number of planes no longer depends on mixer because layers
are shared between all mixers.
Get this value via parameter, so DE specific code can fill in proper
value.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-16-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
In the pursuit of making UI/VI layer code independent of DE version,
change meaning of UI index to index of the plane within mixer. DE33 can
split amount of VI and UI planes between multiple mixer in whatever way
it deems acceptable, so simple calculation VI num + UI index won't be
meaningful anymore.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-15-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Plane type determination logic inside layer init functions doesn't allow
index register to be repurposed to plane sequence, which it almost is.
So move out the logic to mixer, which allows further rework for DE33
support.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-14-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Taking plane state directly reduces number of arguments, avoids copying
values and allows making additional decisions. For example, when plane
is disabled, CSC should be turned off.
This is also cleanup for later patches which will move call to another
place.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-13-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Merging both function into one lets this one decide on it's own if CSC
should be enabled or not. Currently heuristics for that is pretty simple
- enable it for YUV formats and disable for RGB. DE3 and newer allows
YUV pipeline, which will be easier to implement these way.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-12-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Enable or disable layer only in layer atomic update callback. Doing so
will enable having separate layer driver later for DE33.
There is no fear that enable bit would be set incorrectly, as all
read-modify-write sequences for that register are now eliminated.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-11-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
It turns out that none of the VI channel registers were meant to be
read. Mostly it works fine but sometimes it returns incorrect values.
Rework VI layer code to write all registers in one go to avoid reads.
This rework will also allow proper code separation.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-10-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
It turns out that none of the UI channel registers were meant to be
read. Mostly it works fine but sometimes it returns incorrect values.
Rework UI layer code to write all registers in one go to avoid reads.
This rework will also allow proper code separation.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-9-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
With upcoming DE33 support, layer management must be decoupled from
other operations like blender configuration. There are two reasons:
- DE33 will have separate driver for planes and thus it will be harder
to manage different register spaces
- Architecturaly it's better to split access by modules. Blender is now
exclusively managed by mixer.
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-8-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Functions called by atomic_commit callback should not fail. None of them
actually returns error, so make them void.
No functional change.
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-7-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
DRM requires that all check are done in atomic_check callback. Move
one check from atomic_commit to atomic_check callback.
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Link: https://patch.msgid.link/20251104180942.61538-6-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
DRM requires that all checks are done in atomic_check callback. Move
one check from atomic_commit to atomic_check callback.
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-5-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
drm_universal_plane_init() can already call some callbacks, like
format_mod_supported, during initialization. Because of that, fields
should be initialized beforehand.
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-4-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Those engine versions don't need ccsc argument, since CSC units are
located on different position and for each layer.
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-3-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Properly define macros. Till now raw numbers and inappropriate macro was
used.
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251104180942.61538-2-jernej.skrabec@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Fix the following corner case:-
Consider a 2M huge page SVM allocation, followed by prefetch call for
the first 4K page. The whole range is initially mapped with single PTE.
After the prefetch, this range gets split to first page + rest of the
pages. Currently, the first page mapping is not updated on MI300A (APU)
since page hasn't migrated. However, after range split PTE mapping it not
valid.
Fix this by forcing page table update for the whole range when prefetch
is called. Calling prefetch on APU doesn't improve performance. If all
it deteriotes. However, functionality has to be supported.
v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout
VRAM
v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU
v4: Remove v2 and v3 changes. Force update_mapping when range is split
at a size that is not aligned to prange granularity
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 076470b9f6f8d9c7c8ca73a9f054942a686f9ba7)
|
|
Over allocation of save area is not fatal, only under allocation is.
ROCm has various components that independently claim authority over save
area size.
Unless KFD decides to claim single authority, relax size checks.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 15bd4958fe38e763bc17b607ba55155254a01f55)
Cc: stable@vger.kernel.org
|
|
enable parse_cs callback for JPEG5_0_1.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 547985579932c1de13f57f8bcf62cd9361b9d3d3)
Cc: stable@vger.kernel.org
|
|
When the BO pointer provided to amdgpu_bo_create_kernel() points to
non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address
rather than allocate a new BO.
This functionality is never desired for allocating ISP buffers. A new BO
should always be created when isp_kernel_buffer_alloc() is called, per the
description for isp_kernel_buffer_alloc().
Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call.
Fixes: 55d42f616976 ("drm/amd/amdgpu: Add helper functions for isp buffers")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 73c8c29baac7f0c7e703d92eba009008cbb5228e)
|
|
[Why]
When changing resolution (e.g., 4K → FHD) in mirror/clone mode with
certain monitors, the monitor blanks and loses connection due to an early
exit in vrr_settings_require_update(). The function only checks if VRR
state, fixed refresh target, or min/max refresh rate range has changed.
During mode changes, if the calculated min/max refresh values remain the
same even though the stream's v_total changed, the function returns early
without updating vrr_params.adjust.v_total_min/max, leaving the monitor's
VRR timing parameters unsynced with the new mode, causing it to blank out.
[How]
Explicitly adjust VRR parameters to the stream's nominal v_total when VRR
is supported, but inactive.
Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 607df8248a011524211ee34850345305a1913f9e)
|
|
Fix a potential deadlock caused by inconsistent spinlock usage
between interrupt and process contexts in the userq fence driver.
The issue occurs when amdgpu_userq_fence_driver_process() is called
from both:
- Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process()
- Process context: amdgpu_eviction_fence_suspend_worker() ->
amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process()
In interrupt context, the spinlock was acquired without disabling
interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock
is acquired in process context, the kernel detects inconsistent
locking since the process context acquisition would enable interrupts
while holding a lock previously acquired in interrupt context.
Kernel log shows:
[ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3},
[ 4039.310993] {IN-HARDIRQ-W} state was registered at:
[ 4039.311004] lock_acquire+0xc6/0x300
[ 4039.311018] _raw_spin_lock+0x39/0x80
[ 4039.311031] amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu]
[ 4039.311146] amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
[ 4039.311257] gfx_v11_0_eop_irq+0x132/0x170 [amdgpu]
Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly
manage interrupt state regardless of calling context.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ded3ad780cf97a04927773c4600823b84f7f3cc2)
Cc: stable@vger.kernel.org
|
|
drm_sched_entity_init wasn't called yet, so the only thing to
do is to release allocated memory.
This doesn't fix any bug since entity is zero allocated and
drm_sched_entity_fini does nothing in this case.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ec49374ccb8da86b465beaf09c367f3dfd648d8f)
|
|
Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.
Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.
This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware. In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.
Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.
Tested on:
- Dual RX 9700 XT (Navi4x) setup
- GNOME and Wayland compositor scenarios
- Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9dff2bb709e6fbd97e263fd12bf12802d2b5a0cf)
Cc: stable@vger.kernel.org
|
|
Fix the following corner case:-
Consider a 2M huge page SVM allocation, followed by prefetch call for
the first 4K page. The whole range is initially mapped with single PTE.
After the prefetch, this range gets split to first page + rest of the
pages. Currently, the first page mapping is not updated on MI300A (APU)
since page hasn't migrated. However, after range split PTE mapping it not
valid.
Fix this by forcing page table update for the whole range when prefetch
is called. Calling prefetch on APU doesn't improve performance. If all
it deteriotes. However, functionality has to be supported.
v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout
VRAM
v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU
v4: Remove v2 and v3 changes. Force update_mapping when range is split
at a size that is not aligned to prange granularity
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Sometimes the VCE PLL times out waiting for CTLACK/CTLACK2.
When it happens, the VCE still works, but much slower.
Observed on a Tahiti GPU, but not all:
- FirePro W9000 has the issue
- Radeon R9 280X not affected
- Radeon HD 7990 not affected
As a workaround, on the affected chip just don't put the
VCE PLL in sleep mode. Leaving the VCE PLL in bypass mode
or reset mode both work. Using bypass mode is simpler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the VCE1 IP block to the SI GPUs that have it.
Advertise the encoder capabilities corresponding to VCE1,
so the userspace applications can detect and use it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On SI GPUs, the SMC needs to be aware of whether or not the VCE1
is used. The VCE1 is enabled/disabled through the DPM code.
Also print VCE clocks in amdgpu_pm_info.
Users can inspect the current power state using:
cat /sys/kernel/debug/dri/<card>/amdgpu_pm_info
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Based on research and ideas by Alexandre and Christian.
VCE1 actually executes its code from the VCPU BO.
Due to various hardware limitations, the VCE1 requires
the VCPU BO to be in the low 32 bit address range.
However, VRAM is typically mapped at the high address range,
which means the VCPU can't access VRAM through the FB aperture.
To solve this, we write a few page table entries to
map the VCPU BO in the GART address range. And we make sure
that the GART is located at the low address range.
That way the VCE1 can access the VCPU BO.
v2:
- Adjust to v2 of the GART helper commit.
- Add empty line to multi-line comment.
v3:
- Instead of relying on gmc_v6 to set the GART space before GTT,
add a new function amdgpu_vce_required_gart_pages() which is
called from amdgpu_gtt_mgr_init() directly.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Access XGMI registers only if AID is active.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Implement the necessary functionality to support the VCE1.
This implementation is based on:
- VCE2 code from amdgpu
- VCE1 code from radeon (the old driver)
- Some trial and error
A subsequent commit will ensure correct mapping for
the VCPU BO, which will make this actually work.
v2:
- Use memset_io more.
- Use memcpy_toio more.
- Remove __func__ from warnings.
- Don't reserve and map the VCPU BO anymore.
- Add empty line to multi-line comments
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Load VCE1 firmware using amdgpu_ucode_request, just like
it is done for other VCE versions.
All SI chips share the same VCE1 firmware file: vce_1_0_0.bin
which will be sent to linux-firmware soon.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The sid.h header contained some VCE1 register definitions, but
they were using byte offsets (probably copied from the old radeon
driver). Move all of these to the proper VCE1 headers and ensure
they are in dword offsets.
Also add the register definitions that we need for the
firmware validation mechanism in VCE1.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The VCPU BO doesn't only contain the VCE firmware but also other
ranges that the VCE uses for its stack and data. Let's initialize
this to zero to avoid having garbage in the VCPU BO.
Additionally, don't unmap/unreserve the VCPU BO.
The VCPU BO needs to stay at the same location before and after
sleep/resume because the FW code is not relocatable once it's
started.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Try to load the VCE firmware at early_init.
When the correct firmware is not found, return -ENOENT.
This way, the driver initialization will complete even
without VCE, and the GPU will be functional, albeit
without video encoding capabilities.
This is necessary because we are planning to add support
for the VCE1, and AMD hasn't yet publised the correct
firmware for this version. So we need to anticipate that
users will try to boot amdgpu on SI GPUs without the
correct VCE1 firmware present on their system.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|