
Ryzen 7 8745HS (Zen 4 APU, Phoenix) Part I: General Performance Analysis

2 Mar 2026

System: AceMagic W1 mini PC, AMI BIOS PHXPM7B0, 32GB DDR5-5600, CachyOS

iGPU: AMD Radeon 780M (12 CUs, RDNA 3)

EC: ITE IT5570, single fan cooling solution

See also: Part II — GPU Roofline & Bandwidth Analysis for roofline modeling, hardware performance counter profiling, and cross-GPU comparison.

BIOS Modification

The stock AceMagic BIOS is bare-bones: it hides all advanced menus (AMD CBS, AMD PBS, ACPI, fTPM, SATA, etc.) behind a "Hide setup" flag (QuestionId 0x96, VarOffset 0x9D in the Setup NVRAM variable). This flag defaults to Enabled and is itself hidden by a SuppressIf True condition, making it inaccessible through the BIOS UI.

Patch method: Extracted the Setup DXE driver PE32 body (GUID 899407D7-99FE-43D8-9A21-79EC328CAC21), replaced all 16 occurrences of the IFR byte pattern 12 06 96 00 01 00 (EqIdVal QuestionId:0x96, Value:0x01) with 12 06 96 00 02 00 (Value:0x02). Since "Hide setup" only has values 0 and 1, comparing against 2 means the SuppressIf condition never triggers, permanently unhiding all menus. Reinserted patched PE32 using uefireplace and flashed via AfuEfix64.efi from UEFI Shell.
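
The byte-pattern replacement described above can be sketched in a few lines of Python. This is only an illustration of the search-and-replace step, not the tooling actually used for the patch: the function name and the synthetic test buffer are made up for the example, and a real run would operate on the PE32 body extracted from the ROM image.

```python
from typing import Optional, Tuple

# IFR EqIdVal opcode (0x12), length 0x06, QuestionId 0x0096, Value (little-endian)
OLD = bytes.fromhex("120696000100")  # compares "Hide setup" against 0x01
NEW = bytes.fromhex("120696000200")  # compares against 0x02, which the option never takes


def patch_hide_setup(body: bytes, expected: Optional[int] = None) -> Tuple[bytes, int]:
    """Return (patched_body, replacement_count).

    If `expected` is given, refuse to patch when the number of matches differs,
    so an unexpected firmware layout is caught before anything is written.
    """
    count = body.count(OLD)
    if expected is not None and count != expected:
        raise ValueError(f"expected {expected} matches, found {count}")
    return body.replace(OLD, NEW), count


# Synthetic buffer standing in for the extracted PE32 body (hypothetical data)
demo = b"\x00" * 4 + OLD + b"\xff" * 3 + OLD
patched, n = patch_hide_setup(demo, expected=2)
print(n, len(patched) == len(demo))
```

Because both patterns are six bytes long, the patched body keeps its original size, so offsets elsewhere in the module are unaffected. For the real image the expected count would be 16, per the occurrences listed above.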

Failed approaches: Direct NVRAM writes (setup_var.efi, RU.efi), changing IFR OneOfOption defaults (AMITSE ignores them), flashrom SPI read (crashes system; AMD PSP owns the SPI bus), ryzenadj --tctl-temp (SMU accepts but ignores the value).

After unlocking, TjMax was changed from 85°C to 95°C via AMD CBS > SMU Common Options > Thermal Control.

CPU Benchmarks

All tests at stock 54W unless noted. TjMax column shows the configured thermal limit.

Config	TjMax	7zip (MIPS)	stress-ng (bogo/s)	Peak Tctl	Peak Fan
Stock baseline	85°C	91,758	47,022	85.1°C	3317 RPM
BIOS unlocked	95°C	98,828 (+7.7%)	49,961 (+6.3%)	94.7°C	3412 RPM
65W ryzenadj	95°C	99,554 (+8.5%)	51,358 (+9.2%)	95.1°C	3510 RPM
35W ryzenadj	95°C	89,445 (-2.5%)	42,990 (-8.6%)	76.3°C	2743 RPM

Raising TjMax from 85°C to 95°C is the biggest single gain: the CPU was thermally throttling at 85°C and leaving performance on the table. Increasing power to 65W yields only marginal further improvement because the cooler cannot sustain much more than the stock 54W (roughly 58-60W, per the thermal sweep in the next section); the CPU just hits the 95°C thermal wall sooner. At 35W the system runs much cooler and quieter, giving up 2.5% in 7zip and 8.6% in stress-ng relative to the stock baseline.
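
For readers who want to re-derive the percentages in the table, a minimal sketch follows. The scores are copied from the table; the dictionary keys and the helper name are illustrative choices, not part of any benchmark tool.

```python
def pct_change(value: float, baseline: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Stock baseline (85°C TjMax, 54W), from the table above
BASELINE = {"7zip_mips": 91758, "stress_ng_bogo_s": 47022}

CONFIGS = {
    "BIOS unlocked (95°C)": {"7zip_mips": 98828, "stress_ng_bogo_s": 49961},
    "65W ryzenadj": {"7zip_mips": 99554, "stress_ng_bogo_s": 51358},
    "35W ryzenadj": {"7zip_mips": 89445, "stress_ng_bogo_s": 42990},
}

for name, scores in CONFIGS.items():
    deltas = {key: round(pct_change(val, BASELINE[key]), 1) for key, val in scores.items()}
    print(name, deltas)
```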

Cooler Thermal Capacity

To find the exact thermal ceiling, we ran CPU-only stress tests (stress-ng matrix, 60s) at progressively lower power limits and observed where the temperature stabilizes vs hits TjMax.

Limit	Sustained Pkg	Sustained Tctl	Peak Tctl	Fan RPM (PWM)	CPU bogo/s
65W	65→59W (throttled)	95.0°C (clamped)	95.1°C	3438 (255 max)	51,467
60W	59W (throttled)	95.0°C (clamped)	95.1°C	3438 (255 max)	51,127
55W	55W (sustained)	89-90°C (rising)	90.3°C	3312 (242)	49,491
50W	50W (sustained)	83-84°C (stable)	84.2°C	3180 (227)	47,988
45W	45W (sustained)	77-78°C (stable)	77.8°C	3041 (212)	46,283

The heatsink's sustained dissipation limit is ~58-60W. At 65W the SMU immediately thermal-throttles to ~59W to hold TjMax. At 60W it still reaches 95°C and throttles slightly. At 55W the temperature climbs to 89-90°C and is still rising at the end of the 60s run: it stays below TjMax within the test window, but would likely reach 95°C in a longer run. At 50W and below the temperature is stable.

65W and 60W produce nearly identical performance (~51K bogo/s) because both end up thermally clamped at the same effective wattage. The fan maxes out at PWM 255 (3438 RPM) for both, the cooler giving everything it has.

Power limits were adjusted at runtime using ryzenadj (STAPM, fast PPT, slow PPT all set to the same value) and sustained for 60s under stress-ng --matrix CPU load. The fan and thermal controller were left in automatic mode throughout.

The manufacturer's stock TjMax of 85°C is the real bottleneck. The cooler stabilizes at ~90°C when dissipating 54W, but with TjMax set to 85°C the CPU begins throttling 5°C before the cooler reaches its actual limit. Raising TjMax to 95°C allows the CPU to use its full 54W budget without premature throttling, which is the single biggest performance unlock (+7.7%).

Practical recommendations:

Stock 54W is already near-optimal. It sits just 1W below the tested 55W sweet spot, meaning the manufacturer's power limit is well-chosen for this cooler, once TjMax is raised to 95°C via the unlocked BIOS to stop premature throttling.
For tinkerers: Repasting with quality thermal compound and raising the power limit to 60W in the unlocked BIOS could squeeze out a few more percent. Hitting TjMax 95°C is almost certain at this wattage, but the chip handles it fine. TjMax can be raised further to 100°C, though there is little practical reason to do so; the gains are marginal and component longevity suffers.
Thermal headroom scales roughly linearly: in the 45-55W range each 5W reduction drops sustained temperature by about 6°C, useful for tuning noise/performance tradeoffs (e.g. 45W for a near-silent desk setup).
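
The linear relationship between power and temperature can be checked against the sweep data with a small least-squares fit. The sketch below uses the peak Tctl values for the three unthrottled rows (45W, 50W, 55W) and extrapolates to the power at which the fit crosses 95°C. Treat the result as a rough estimate: the 55W run was still warming up at 60s, so the true crossover is probably somewhat lower.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept) for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (power limit in W, peak Tctl in °C) from the unthrottled rows of the sweep table
PEAK_TCTL = [(45, 77.8), (50, 84.2), (55, 90.3)]

slope, intercept = fit_line(PEAK_TCTL)
watts_at_tjmax = (95.0 - intercept) / slope

print(f"{slope:.2f} °C per watt, fit crosses 95°C at about {watts_at_tjmax:.1f} W")
```

The fit gives about 1.25°C per watt and a crossover a little under 59W, in line with the ~58-60W dissipation limit inferred from the throttled 60W and 65W runs.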

The CPU and iGPU share a single 54W package power budget (STAPM). When both are loaded simultaneously, the SMU dynamically splits power between them.

Combined Load Tests (stress-ng + FurMark VK 1080p)

Config	FurMark Score	CPU bogo/s	Sustained Pkg	Peak Tctl	Peak iGPU
GPU only (54W)	1,753	—	29W	69.7°C	68°C
CPU only (54W)	—	49,892	54W	95.0°C	63°C
CPU+GPU (54W)	1,533 (-12.5%)	41,881 (-16.1%)	54W	79.6°C	67°C
CPU+GPU (75W)	1,578 (-10%)	45,259 (-9.3%)	65W	88.8°C	72°C

Under combined load at 54W, the CPU loses 16% and the GPU loses 12.5% of their standalone performance. Raising the limit to 75W helps combined workloads (+8% CPU, +3% GPU) as the package settles at ~65W before thermal throttling kicks in.

Power Budget Sweep (CPU+GPU Combined)

Limit	FurMark Score	CPU bogo/s	GPU Clock
15W	704	10,056	800 MHz
25W	1,272	18,039	800-1571 MHz
35W	1,529	28,136	1154-1944 MHz
45W	1,543	35,447	1521-2017 MHz

The GPU score plateaus at ~1,530 from 35W upward under combined load (1,529 at 35W, 1,543 at 45W, 1,533 at 54W). Beyond this point, additional power goes entirely to the CPU. This is consistent with the GPU's ~25-30W effective power ceiling (see clpeak results below).

GPU Compute Scaling (clpeak, OpenCL)

Isolated GPU workload, no CPU contention. clpeak runs pure compute kernels with no display/vsync dependency.

Pkg Limit	FP32 scalar (GFLOPS)	FP32 vec2 (GFLOPS)	FP16 vec2 (GFLOPS)	Mem BW (GB/s)
15W	3,020	2,926	3,598	70.1
20W	3,763 (+25%)	4,087 (+40%)	4,892 (+36%)	70.6
25W	4,522 (+50%)	4,981 (+70%)	4,998 (+39%)	70.4
30W	4,646 (+54%)	5,459 (+86%)	6,292 (+75%)	73.2
35W	4,649 (+54%)	5,984 (+104%)	6,140 (+71%)	71.0
45W	4,216 (+40%)	6,374 (+118%)	6,958 (+93%)	72.9
54W	4,506 (+49%)	6,579 (+125%)	6,060 (+68%)	72.3

Percentages relative to 15W baseline. FP32 scalar results at 45-54W show some variance below the 30W peak; clpeak runs short bursts and GPU clock/power fluctuate between subtests, so minor run-to-run variation is expected.

Findings:

FP32 scalar compute plateaus at ~25-30W (~4,500-4,650 GFLOPS). Beyond this, additional power yields no scalar improvement.
Vectorized compute (FP32 vec2, FP16 vec2) continues scaling up to 45-54W, reaching ~6,500 GFLOPS FP32 and ~7,000 GFLOPS FP16. Wider SIMD operations can utilize more power.
Memory bandwidth is flat at ~71-73 GB/s regardless of power, limited by DDR5-5600 shared system memory (not GDDR/HBM).
Under clpeak, the GPU will consume nearly all available package power when uncontested: at the 54W limit, the package draws 53W with a GPU-only load.

GPU Compute Scaling (Geekbench 6, Vulkan)

Geekbench 6 GPU compute runs a mix of image processing and physics workloads. Subtests are short bursts, so some run-to-run variance is expected (~2% overall).

Subtest	54W	45W	35W	30W	25W	20W	54→25W	54→20W
Overall	38,935	37,703	37,359	36,334	35,528	32,821	-8.7%	-15.7%
Background Blur	20,121	19,930	19,729	20,956	19,571	18,494	-2.7%	-8.1%
Face Detection	11,559	11,714	11,611	11,842	11,584	10,872	+0.2%	-5.9%
Horizon Detection	34,933	27,682	31,494	30,661	33,462	34,010	-4.2%	-2.6%
Edge Detection	42,963	42,830	43,185	42,876	43,031	41,543	+0.2%	-3.3%
Gaussian Blur	40,563	40,317	39,358	38,347	36,045	32,620	-11.1%	-19.6%
Feature Matching	12,041	12,181	12,340	12,146	11,853	10,956	-1.6%	-9.0%
Stereo Matching	194,693	190,753	171,398	160,269	144,369	120,429	-25.8%	-38.1%
Particle Physics	159,097	157,467	146,305	124,730	126,076	110,122	-20.8%	-30.8%

From 54W down to 25W, most subtests (Background Blur, Face Detection, Edge Detection, Feature Matching) are essentially flat (within about 3%), which points to a memory-bandwidth bound. Stereo Matching and Particle Physics scale meaningfully with power, and Gaussian Blur does so to a lesser extent (-11.1%). Horizon Detection is too noisy to classify (27,682 at 45W versus 34,010 at 20W). At 20W the previously flat subtests finally start dropping as the GPU becomes compute-starved across the board. The transition from bandwidth-bound to compute-bound occurs around 25W package power.
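
One way to make the flat-versus-scaling split concrete is to compare each subtest's 54W and 25W scores against a cutoff. The sketch below does that with the table values; the 5% cutoff is an arbitrary assumption chosen for illustration (it sits above the ~2% run-to-run variance mentioned earlier), and the function name is made up.

```python
# (score at 54W, score at 25W) per subtest, copied from the table above
SCORES_54W_25W = {
    "Background Blur": (20121, 19571),
    "Face Detection": (11559, 11584),
    "Horizon Detection": (34933, 33462),
    "Edge Detection": (42963, 43031),
    "Gaussian Blur": (40563, 36045),
    "Feature Matching": (12041, 11853),
    "Stereo Matching": (194693, 144369),
    "Particle Physics": (159097, 126076),
}


def classify(scores, threshold_pct=5.0):
    """Split subtests into (flat, power_scaling) by their 54W -> 25W drop."""
    flat, scaling = [], []
    for name, (at_54w, at_25w) in scores.items():
        drop_pct = (at_54w - at_25w) / at_54w * 100.0
        (scaling if drop_pct > threshold_pct else flat).append(name)
    return flat, scaling


flat, scaling = classify(SCORES_54W_25W)
print("flat:", flat)
print("scales with power:", scaling)
```

With this cutoff, Gaussian Blur lands in the power-scaling group alongside Stereo Matching and Particle Physics, while Horizon Detection counts as flat despite its noisy intermediate results.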

GPU FurMark Scaling (vsync-limited data)

FurMark Vulkan at 1080p was vsync-limited by the Wayland compositor at ~29 FPS, but clock data still shows GPU behavior:

Pkg Limit	FurMark Score	Avg FPS	Sustained GPU Clock	Peak iGPU Temp
15W	983	16 FPS	800 MHz (floor)	40°C
20W	1,446	24 FPS	800-1614 MHz	45°C
25W	1,763	29 FPS	1037-1463 MHz	48°C
30W	1,738	29 FPS	1553-1773 MHz	52°C
35W	1,698	28 FPS	1799-2146 MHz	55°C
45W	1,772	30 FPS	2037-2472 MHz	62°C
54W	1,734	29 FPS	2174-2600 MHz	67°C

FPS is capped at roughly 28-30 from 25W upward (compositor vsync). The GPU clock continues scaling from 800 MHz (15W) to 2600 MHz (54W) but produces no additional FPS; the extra frequency is wasted. For graphics workloads capped at display refresh, ~25W is sufficient.

Game Benchmark: Resident Evil 6 (DX9, 1080p)

GPU-only workload (built-in benchmark), power sweep to find the point of diminishing returns:

Pkg Limit	RE6 Score	vs Peak	GPU Clock (approx)
20W	7,599	-26.1%	~1600 MHz
25W	8,842	-14.0%	~1900 MHz
30W	9,683	-5.8%	~2100 MHz
35W	9,964	-3.1%	~2200 MHz
40W	10,205	-0.7%	~2200 MHz
45W	10,281	baseline	~2200 MHz
54W	10,280	0%	2600 MHz

Performance plateaus above 40W despite the GPU clock continuing to scale up to 2600 MHz at 54W. The likely explanation is DDR5 memory bandwidth: the 780M shares system memory (~73 GB/s measured via clpeak) rather than having dedicated GDDR. Once the GPU has enough compute power to saturate the memory bus, higher clocks produce no additional frames. The transition from compute-bound to memory-bound occurs around 30-35W, consistent with the clpeak FP32 scalar plateau at 25-30W.

Game Benchmark: Black Myth Wukong (1440p, FSR)

A GPU-heavy AAA title running at 1440p with FSR upscaling. Same scene tested at each package power level.

Pkg Limit	Avg FPS	Min FPS
25W	38	31
30W	40	34
35W	41	36
40W	43	36
45W	43	37
50W	43	38
54W	45	38

Even a demanding modern title confirms the bandwidth ceiling. From 40W to 54W the average gains just 2 FPS (+5%) despite 14W of additional package power. The GPU clocks scale upward with the extra budget, but the CUs spend most of their cycles idle, waiting on data from DDR5-5600 shared memory. Higher clocks without higher bandwidth just means more time stalling.

Both RE6 and Black Myth Wukong (BMW) point to the same conclusion: the GPU doesn't need much power before DDR5 bandwidth becomes the limiting factor. Once the memory bus is saturated, additional package power simply drives GPU clocks higher to no effect.
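
The point of diminishing returns in both game sweeps can be located mechanically by asking for the lowest power limit that retains a given fraction of the peak result. The following is a minimal sketch using the two tables above; the helper name and the 95% and 90% thresholds are assumptions for illustration.

```python
def min_power_for(results, fraction):
    """Lowest package limit (W) whose result is at least `fraction` of the peak result."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    peak = max(results.values())
    for watts in sorted(results):
        if results[watts] >= fraction * peak:
            return watts
    raise ValueError("no entry reaches the requested fraction")  # unreachable for valid input


# Package limit (W) -> result, copied from the tables above
RE6_SCORE = {20: 7599, 25: 8842, 30: 9683, 35: 9964, 40: 10205, 45: 10281, 54: 10280}
BMW_AVG_FPS = {25: 38, 30: 40, 35: 41, 40: 43, 45: 43, 50: 43, 54: 45}

for label, data in (("RE6", RE6_SCORE), ("BMW", BMW_AVG_FPS)):
    print(label, "95%:", min_power_for(data, 0.95), "W,", "90%:", min_power_for(data, 0.90), "W")
```

At a 95% threshold this yields 35W for RE6 and 40W for BMW; at 90% it yields 30W and 35W respectively.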

iGPU Bandwidth Wall

Every GPU benchmark tells the same story. RE6 scores plateau above 40W. BMW averages plateau above 40W. Geekbench compute subtests are flat from 54W down to 25W. clpeak FP32 scalar peaks at 25-30W. The 780M's 12 CUs have far more compute throughput than the DDR5-5600 memory bus can feed.

For most users, including gamers, this means the iGPU can run at a fraction of the rated power budget and still deliver close to full performance. At 35W package power the GPU retains over 90% of its 54W gaming performance while running significantly cooler and quieter. The power savings are essentially free: the extra wattage at higher budgets just heats the chip while the CUs idle-wait on memory fetches. Only pure compute workloads (physics simulations, certain ML inference) benefit from the full power budget, and even then the gains are modest.

This mirrors what cryptocurrency miners discovered during the Ethereum era. Ethash was memory-bandwidth bound, so miners would power-limit and undervolt the GPU core while overclocking VRAM as aggressively as possible, achieving up to 30% higher hashrates with 30% less core power. The same principle applies here: when the workload is bottlenecked by memory bandwidth, throwing more compute power at it is pure waste heat.

Single-Core Power Requirements

To determine the minimum package power needed for full single-core boost, we ran stress-ng --matrix 1 at various power limits.

Pkg Limit	Single-core bogo/s	Post Tctl	Actual Pkg Draw
10W	4,105 (-29.8%)	42°C	9W
15W	5,351 (-8.5%)	54°C	14W
20W	5,738 (-1.9%)	61°C	18W
22W	5,816 (-0.6%)	63°C	19W
25W	5,850 (baseline)	63°C	20W

A single boosting Zen 4 core draws approximately 18-20W at package level (core + uncore + idle system overhead). Performance plateaus above 20W; the core has reached its max boost clock and additional power budget goes unused.

At 10W the core is severely starved: the ~6W idle system overhead leaves only 3-4W for the core itself, cutting performance by 30%. At 15W the deficit narrows to 9%. Since these are package-level limits shared with the iGPU, occasional GPU activity (desktop compositing, video decode) can consume 3-5W and temporarily starve the CPU of boost headroom. In practice, a 22W package limit is sufficient to sustain full single-core boost with margin for incidental GPU usage, relevant for bursty single-threaded tasks like application launches, web page loads, and UI interactions.

Recommended Power Configuration

Based on all testing, the optimal configuration for this system is:

Parameter	Value
TjMax	95°C (via unlocked BIOS)
PL1 / STAPM (sustained)	54W
PL2 / Fast PPT (burst)	65W
STAPM time constant	10s
Slow PPT time constant	5s

ryzenadj --stapm-limit=54000 --fast-limit=65000 --slow-limit=54000 --stapm-time=10 --slow-time=5

This allows the CPU to burst at 65W for ~10 seconds (benefiting compilations, app launches, and other short multi-threaded loads) before settling to 54W sustained. The burst phase approaches but stays just under TjMax; sustained operation stabilizes at ~88-89°C with fan headroom remaining.

These ryzenadj settings are volatile and reset on reboot. A systemd service (ryzenadj-tuning.service) reapplies them at boot.
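
Since ryzenadj takes its limits in milliwatts, a small helper that builds the argument list from watts can reduce unit mistakes when scripting sweeps. This is a sketch only: the function name is hypothetical, it merely constructs the command and does not run it, and it uses only the ryzenadj options that appear in the command above.

```python
def ryzenadj_command(stapm_w, fast_w, slow_w, stapm_time_s=None, slow_time_s=None):
    """Build a ryzenadj argument list. Limits are given in watts and converted to milliwatts."""
    args = [
        "ryzenadj",
        f"--stapm-limit={round(stapm_w * 1000)}",
        f"--fast-limit={round(fast_w * 1000)}",
        f"--slow-limit={round(slow_w * 1000)}",
    ]
    if stapm_time_s is not None:
        args.append(f"--stapm-time={stapm_time_s}")
    if slow_time_s is not None:
        args.append(f"--slow-time={slow_time_s}")
    return args


# Recommended configuration from the table above
recommended = ryzenadj_command(54, 65, 54, stapm_time_s=10, slow_time_s=5)
# Flat limit as used in the power sweeps (all three limits set to the same value)
sweep_45w = ryzenadj_command(45, 45, 45)

print(" ".join(recommended))
print(" ".join(sweep_45w))
```

The resulting list could be passed to `subprocess.run` by a script running with root privileges, or joined into the `ExecStart` line of the boot-time service.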

Burst behavior tested

stapm-time	Burst at 60W+	Hits TjMax?	Sustained Tctl	All-core bogo/s
default (~5s)	~5s	No (peak 92°C)	88°C	50,140
20s	~15s	Yes (at 10s)	89°C	49,856
30s	~25s	Yes (at 10s)	89°C	50,264

Sustained performance is identical regardless of burst duration (~50K bogo/s). The burst only helps workloads that complete within the burst window.

System Power (Wall)

State	Wall Power	Package Power	Tctl	Fan
Idle (desktop)	12-15W	~7W	~35°C	~1350 RPM (inaudible)
All-core burst (PL2)	90W	64W	90°C	~3300 RPM (ramping)
All-core sustained (PL1)	75W	54W	88°C	~3270 RPM

The gap between wall and package power (roughly 5-8W at idle, 21W sustained, 26W during burst) is platform overhead: PSU conversion losses, VRM inefficiency, DDR5, NVMe, Wi-Fi, and display output. The overhead rises under load due to VRM losses at higher current draw. For an 8-core Zen 4 with RDNA 3 iGPU driving a 5120x1440 ultrawide, 12-15W idle at the wall is remarkably efficient.

Summary

Parameter	Value
Stock STAPM / TjMax	54W / 85°C
Optimal TjMax	95°C (+7.7% CPU perf)
Cooler sustained dissipation	~58-60W
Recommended power config	PL1 54W / PL2 65W / 10s burst
Single-core full boost floor	~20W package
Geekbench 6 Vulkan (54W)	38,935
RE6 1080p (54W)	10,280
BMW 1440p FSR (54W)	45 avg FPS
GPU compute power wall	~25-30W package (~20-25W GPU after idle overhead)
GPU gaming bandwidth wall	~35-40W package (>90% perf retained)
GPU memory bandwidth ceiling	~73 GB/s (DDR5-5600 shared)
GPU max clock observed	2600 MHz (at 54W, GPU-only)
ryzenadj STAPM/PPT control	Works (volatile, resets on reboot)
ryzenadj TjMax control	Does NOT work (SMU accepts but ignores the value)

It is possible that Proton's translation layer puts additional strain on overall system memory bandwidth, leaving less for the GPU and exaggerating the bottleneck. However, given how heavily Valve has invested in the compatibility layer and validated it on the AMD APUs it chose for its own hardware, this is unlikely to be the main cause.

Takeaways

We believe that DDR5-5600 is a significant limiting factor for Ryzen systems equipped with the Radeon 780M iGPU. In discrete GPU testing, DDR5 has shown minimal FPS uplift when the system is GPU-bound, and the added bandwidth moving from the Zen 3 / DDR4 era to Zen 4's DDR5 still falls short of what the 12 CU RDNA 3 integrated GPU demands.

LPDDR5X may help address this if OEMs implement a 128-bit bus. At 7500 MT/s on a 128-bit bus, theoretical bandwidth reaches 120 GB/s, a meaningful step up from dual-channel DDR5-5600's 89.6 GB/s. For reference, Strix Halo achieves 256 GB/s with its 256-bit LPDDR5X-8000 bus (or ~273 GB/s at 8533 MT/s). However, a 128-bit LPDDR5X configuration adds cost and complexity, requiring 8 DRAM dies at 16 bits each. We may never see this kind of configuration purposefully built just to let an iGPU perform at its best.
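
The theoretical figures quoted here follow from transfer rate times bus width. A minimal sketch of that arithmetic is below, together with the share of the theoretical DDR5-5600 peak that the clpeak measurement reached; the helper name is illustrative, and 73.2 GB/s is the highest bandwidth value in the clpeak table.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9


ddr5_5600 = peak_bandwidth_gbs(5600, 128)      # dual-channel DDR5-5600
lpddr5x_7500 = peak_bandwidth_gbs(7500, 128)   # 128-bit LPDDR5X-7500
strix_halo = peak_bandwidth_gbs(8000, 256)     # 256-bit LPDDR5X-8000

measured_gbs = 73.2  # best clpeak memory bandwidth result in this article
efficiency = measured_gbs / ddr5_5600

print(f"DDR5-5600 x128:    {ddr5_5600:.1f} GB/s")
print(f"LPDDR5X-7500 x128: {lpddr5x_7500:.1f} GB/s")
print(f"LPDDR5X-8000 x256: {strix_halo:.1f} GB/s")
print(f"measured / theoretical (DDR5-5600): {efficiency:.0%}")
```

The measured figure works out to roughly 82% of the DDR5-5600 theoretical peak, so the iGPU is already close to what this memory configuration can deliver.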

For everyday tasks, these Zen 4 APUs are extremely capable. With power limits configured properly, they strike a good balance between power efficiency (in an x86 context) and responsiveness. Shared memory bandwidth becomes a noticeable bottleneck primarily during gaming — even at 1080p in more demanding titles — and the constraint only grows at higher resolutions. This makes upscaling technologies like AMD's FSR essentially a must for a playable experience.

Test Environment

CachyOS (Arch-based rolling), kernel 6.19.5-3-cachyos
sched_ext with LAVD scheduler (v1.0.21)
amd-pstate-epp driver, powersave governor (hardware-managed P-states)
Game benchmarks run via Steam with Proton compatibility layer; results are representative of Linux/Proton performance and may differ from native Windows

Tools Used

ryzenadj v0.17.0 — runtime power limit adjustment via SMU
ryzen_smu kernel module — direct SMU access for ryzenadj
uefiextract / uefireplace v0.28.0 — BIOS ROM extraction and patching
ifrextractor-rs v1.6.0 — IFR form data extraction from PE32 binaries
AfuEfix64.efi — AMI BIOS flash utility (UEFI Shell)
FurMark v2.10.2 — GPU stress test (Vulkan)
clpeak v1.1.5 — OpenCL compute peak performance profiler
stress-ng — CPU stress testing with bogo ops metrics
Geekbench 6 v6.5.0 — cross-platform GPU compute benchmark (Vulkan)
7zip — LZMA compression benchmark
Resident Evil 6 — built-in benchmark (DX9, 1080p, Steam/Proton)
Black Myth Wukong — in-game benchmark (1440p, FSR, Steam/Proton)
