Ryzen 7 8745HS (Zen 4 APU, Phoenix): In-depth Performance Analysis

Ryzen 7 8745HS power and thermal analysis in the AceMagic W1 mini PC. BIOS unlock boosts CPU performance 7.7% by raising TjMax from 85 to 95°C. GPU benchmarks reveal a DDR5 bandwidth wall at 35-40W, where the Radeon 780M retains over 90% of peak gaming performance at a fraction of full power.


System: AceMagic W1 mini PC, AMI BIOS PHXPM7B0, 32GB DDR5-5600, CachyOS

iGPU: AMD Radeon 780M (12 CUs, RDNA 3)

EC: ITE IT5570, single fan cooling solution

BIOS Modification

The stock AceMagic BIOS is bare: all advanced menus (AMD CBS, AMD PBS, ACPI, fTPM, SATA, etc.) are hidden behind a "Hide setup" flag (QuestionId 0x96, VarOffset 0x9D in the Setup NVRAM variable). This flag defaults to Enabled and is itself hidden by SuppressIf True, making it inaccessible through the BIOS UI.

Patch method: Extracted the Setup DXE driver PE32 body (GUID 899407D7-99FE-43D8-9A21-79EC328CAC21), replaced all 16 occurrences of the IFR byte pattern 12 06 96 00 01 00 (EqIdVal QuestionId:0x96, Value:0x01) with 12 06 96 00 02 00 (Value:0x02). Since "Hide setup" only has values 0 and 1, comparing against 2 means the SuppressIf condition never triggers, permanently unhiding all menus. Reinserted patched PE32 using uefireplace and flashed via AfuEfix64.efi from UEFI Shell.
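The byte-pattern substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling actually used: the function name `patch_hide_setup` is hypothetical, and it assumes the Setup PE32 body has already been extracted into a byte string.

```python
from typing import Tuple

# IFR EqIdVal opcode (0x12), length 6, QuestionId 0x0096, 16-bit value (little-endian).
OLD = bytes.fromhex("12 06 96 00 01 00")  # compares "Hide setup" against 1
NEW = bytes.fromhex("12 06 96 00 02 00")  # compares against 2, which never matches


def patch_hide_setup(pe32: bytes) -> Tuple[bytes, int]:
    """Return (patched image, number of patterns replaced). Hypothetical helper."""
    hits = pe32.count(OLD)
    return pe32.replace(OLD, NEW), hits
```

On the image described here the count should come back as 16; any other number suggests a different BIOS revision, in which case nothing should be flashed. The replacement preserves length, so offsets inside the PE32 body are unchanged.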

Failed approaches: Direct NVRAM writes (setup_var.efi, RU.efi), changing IFR OneOfOption defaults (AMITSE ignores them), flashrom SPI read (crashes system; AMD PSP owns the SPI bus), ryzenadj --tctl-temp (SMU accepts but ignores the value).

After unlocking, TjMax was changed from 85°C to 95°C via AMD CBS > SMU Common Options > Thermal Control.

CPU Benchmarks

All tests at stock 54W unless noted. TjMax column shows the configured thermal limit.

| Config | TjMax | 7zip (MIPS) | stress-ng (bogo/s) | Peak Tctl | Peak Fan |
|---|---|---|---|---|---|
| Stock baseline | 85°C | 91,758 | 47,022 | 85.1°C | 3317 RPM |
| BIOS unlocked | 95°C | 98,828 (+7.7%) | 49,961 (+6.3%) | 94.7°C | 3412 RPM |
| 65W ryzenadj | 95°C | 99,554 (+8.5%) | 51,358 (+9.2%) | 95.1°C | 3510 RPM |
| 35W ryzenadj | 95°C | 89,445 (-2.5%) | 42,990 (-8.6%) | 76.3°C | 2743 RPM |

Raising TjMax from 85 to 95°C is the biggest single gain: the CPU was thermally throttling at 85°C and leaving performance on the table. Increasing power to 65W yields marginal further improvement because the cooler cannot dissipate more than roughly 58-60W sustained; the CPU just hits the 95°C thermal wall sooner. At 35W the system runs much cooler and quieter with only a small penalty relative to the stock baseline (-2.5% in 7zip, -8.6% in stress-ng).
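The percentages in the table are relative to the stock baseline. As a quick check, the same arithmetic can be reproduced in Python; the dictionary layout and the function name `pct_vs` are illustrative only.

```python
# Scores copied from the CPU benchmark table.
STOCK = {"7zip": 91_758, "stress_ng": 47_022}
RUNS = {
    "BIOS unlocked": {"7zip": 98_828, "stress_ng": 49_961},
    "65W ryzenadj": {"7zip": 99_554, "stress_ng": 51_358},
    "35W ryzenadj": {"7zip": 89_445, "stress_ng": 42_990},
}


def pct_vs(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline, rounded to 0.1."""
    return round((value / baseline - 1) * 100, 1)


deltas = {
    name: {bench: pct_vs(score, STOCK[bench]) for bench, score in run.items()}
    for name, run in RUNS.items()
}

# Gain from 65W over the unlocked 54W configuration, 7zip only.
extra_from_65w = pct_vs(RUNS["65W ryzenadj"]["7zip"], RUNS["BIOS unlocked"]["7zip"])
```

Comparing the 65W run against the unlocked 54W run rather than against stock isolates what the extra power budget buys on its own, which in 7zip is well under one percent.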

Cooler Thermal Capacity

To find the thermal ceiling, we ran CPU-only stress tests (stress-ng matrix, 60s) at progressively lower power limits and observed at which limits the temperature stabilizes and at which it hits TjMax.

| Limit | Sustained Pkg | Sustained Tctl | Peak Tctl | Fan RPM (PWM) | CPU bogo/s |
|---|---|---|---|---|---|
| 65W | 65→59W (throttled) | 95.0°C (clamped) | 95.1°C | 3438 (255 max) | 51,467 |
| 60W | 59W (throttled) | 95.0°C (clamped) | 95.1°C | 3438 (255 max) | 51,127 |
| 55W | 55W (sustained) | 89-90°C (rising) | 90.3°C | 3312 (242) | 49,491 |
| 50W | 50W (sustained) | 83-84°C (stable) | 84.2°C | 3180 (227) | 47,988 |
| 45W | 45W (sustained) | 77-78°C (stable) | 77.8°C | 3041 (212) | 46,283 |

The heatsink's sustained dissipation limit is ~58-60W. At 65W the SMU immediately thermal-throttles to ~59W to hold TjMax. At 60W it still reaches 95°C and throttles slightly. At 55W the package holds its full power budget for the whole 60s run and Tctl reaches 89-90°C; it was still rising slowly, so a longer run could push it closer to 95°C. At 50W and below the temperature stabilizes well under the limit.

65W and 60W produce nearly identical performance (~51K bogo/s) because both end up thermally clamped at the same effective wattage. The fan maxes out at PWM 255 (3438 RPM) for both, the cooler giving everything it has.

Power limits were adjusted at runtime using ryzenadj (STAPM, fast PPT, slow PPT all set to the same value) and sustained for 60s under stress-ng --matrix CPU load. The fan and thermal controller were left in automatic mode throughout.

The manufacturer's stock TjMax of 85°C is the real bottleneck. The cooler stabilizes at ~90°C when dissipating 54W, but with TjMax set to 85°C the CPU begins throttling 5°C before the cooler reaches its actual limit. Raising TjMax to 95°C allows the CPU to use its full 54W budget without premature throttling, which is the single biggest performance unlock (+7.7%).

Practical recommendations:

  • Stock 54W is already near-optimal. It sits just 1W below the tested 55W sweet spot, meaning the manufacturer's power limit is well-chosen for this cooler, once TjMax is raised to 95°C via the unlocked BIOS to stop premature throttling.
  • For tinkerers: Repasting with quality thermal compound and raising the power limit to 60W in the unlocked BIOS could squeeze out a few more percent. Hitting TjMax 95°C is almost certain at this wattage, but the chip handles it fine. TjMax can be raised further to 100°C, though there is little practical reason to do so; the gains are marginal and component longevity suffers.
  • Thermal headroom scales linearly: each 5W reduction drops sustained temperature by roughly 6-7°C, useful for tuning noise/performance tradeoffs (e.g. 45W for a near-silent desk setup).
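The linear relationship can be checked with a least-squares fit over the three unthrottled runs (45W, 50W, 55W). This is a rough sketch under the assumption that peak Tctl keeps rising linearly with package power outside the measured range; the helper names `fit_line` and `power_at` are illustrative.

```python
# (package power limit in W, peak Tctl in °C) from the 45W, 50W and 55W runs.
POINTS = [(45, 77.8), (50, 84.2), (55, 90.3)]


def fit_line(points):
    """Ordinary least-squares fit; returns (slope in °C/W, intercept in °C)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(POINTS)


def power_at(tctl_c: float) -> float:
    """Extrapolated package power (W) at which peak Tctl reaches tctl_c."""
    return (tctl_c - intercept) / slope
```

Under these assumptions the slope works out to about 1.25°C per watt (roughly 6°C per 5W). The fitted line crosses 95°C just under 59W, in line with the ~58-60W dissipation limit, and crosses 85°C near 51W, which is why the stock TjMax cannot sustain the full 54W. The fit ignores fan-speed changes and the short 60s window, so treat it as a rough guide.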

CPU + GPU Power Budget Sharing

The CPU and iGPU share a single 54W package power budget (STAPM). When both are loaded simultaneously, the SMU dynamically splits power between them.

Combined Load Tests (stress-ng + FurMark VK 1080p)

| Config | FurMark Score | CPU bogo/s | Sustained Pkg | Peak Tctl | Peak iGPU |
|---|---|---|---|---|---|
| GPU only (54W) | 1,753 | n/a | 29W | 69.7°C | 68°C |
| CPU only (54W) | n/a | 49,892 | 54W | 95.0°C | 63°C |
| CPU+GPU (54W) | 1,533 (-12.5%) | 41,881 (-16.1%) | 54W | 79.6°C | 67°C |
| CPU+GPU (75W) | 1,578 (-10%) | 45,259 (-9.3%) | 65W | 88.8°C | 72°C |

Under combined load at 54W, the CPU loses 16% and the GPU loses 12.5% of their standalone performance. Raising the limit to 75W helps combined workloads (+8% CPU, +3% GPU) as the package settles at ~65W before thermal throttling kicks in.

Power Budget Sweep (CPU+GPU Combined)

| Limit | FurMark Score | CPU bogo/s | GPU Clock |
|---|---|---|---|
| 15W | 704 | 10,056 | 800 MHz |
| 25W | 1,272 | 18,039 | 800-1571 MHz |
| 35W | 1,529 | 28,136 | 1154-1944 MHz |
| 45W | 1,543 | 35,447 | 1521-2017 MHz |

The GPU score plateaus at ~1,530 from 35W upward under combined load (1,529 at 35W, 1,543 at 45W). Beyond this point, additional power goes entirely to the CPU. This is consistent with the GPU's ~25-30W effective power ceiling (see clpeak results below).

GPU Compute Scaling (clpeak, OpenCL)

Isolated GPU workload, no CPU contention. clpeak runs pure compute kernels with no display/vsync dependency.

| Pkg Limit | FP32 scalar (GFLOPS) | FP32 vec2 (GFLOPS) | FP16 vec2 (GFLOPS) | Mem BW (GB/s) |
|---|---|---|---|---|
| 15W | 3,020 | 2,926 | 3,598 | 70.1 |
| 20W | 3,763 (+25%) | 4,087 (+40%) | 4,892 (+36%) | 70.6 |
| 25W | 4,522 (+50%) | 4,981 (+70%) | 4,998 (+39%) | 70.4 |
| 30W | 4,646 (+54%) | 5,459 (+86%) | 6,292 (+75%) | 73.2 |
| 35W | 4,649 (+54%) | 5,984 (+104%) | 6,140 (+71%) | 71.0 |
| 45W | 4,216 (+40%) | 6,374 (+118%) | 6,958 (+93%) | 72.9 |
| 54W | 4,506 (+49%) | 6,579 (+125%) | 6,060 (+68%) | 72.3 |

Percentages relative to 15W baseline. FP32 scalar results at 45-54W show some variance below the 30W peak; clpeak runs short bursts and GPU clock/power fluctuate between subtests, so minor run-to-run variation is expected.

Findings:

  • FP32 scalar compute plateaus at ~25-30W (~4,500-4,650 GFLOPS). Beyond this, additional power yields no scalar improvement.
  • Vectorized compute (FP32 vec2, FP16 vec2) continues scaling up to 45-54W, reaching ~6,500 GFLOPS FP32 and ~7,000 GFLOPS FP16. Wider SIMD operations can utilize more power.
  • Memory bandwidth is flat at ~70-73 GB/s regardless of power, limited by DDR5-5600 shared system memory (not GDDR/HBM).
  • The GPU will consume all available package power when uncontested: at 54W limit, the package draws 53W with GPU-only load.

GPU Compute Scaling (Geekbench 6, Vulkan)

Geekbench 6 GPU compute runs a mix of image processing and physics workloads. Subtests are short bursts, so some run-to-run variance is expected (~2% overall).

| Subtest | 54W | 45W | 35W | 30W | 25W | 20W | 54→25W | 54→20W |
|---|---|---|---|---|---|---|---|---|
| Overall | 38,935 | 37,703 | 37,359 | 36,334 | 35,528 | 32,821 | -8.7% | -15.7% |
| Background Blur | 20,121 | 19,930 | 19,729 | 20,956 | 19,571 | 18,494 | -2.7% | -8.1% |
| Face Detection | 11,559 | 11,714 | 11,611 | 11,842 | 11,584 | 10,872 | +0.2% | -5.9% |
| Horizon Detection | 34,933 | 27,682 | 31,494 | 30,661 | 33,462 | 34,010 | -4.2% | -2.6% |
| Edge Detection | 42,963 | 42,830 | 43,185 | 42,876 | 43,031 | 41,543 | +0.2% | -3.3% |
| Gaussian Blur | 40,563 | 40,317 | 39,358 | 38,347 | 36,045 | 32,620 | -11.1% | -19.6% |
| Feature Matching | 12,041 | 12,181 | 12,340 | 12,146 | 11,853 | 10,956 | -1.6% | -9.0% |
| Stereo Matching | 194,693 | 190,753 | 171,398 | 160,269 | 144,369 | 120,429 | -25.8% | -38.1% |
| Particle Physics | 159,097 | 157,467 | 146,305 | 124,730 | 126,076 | 110,122 | -20.8% | -30.8% |

From 54W down to 25W, most subtests (Background Blur, Face Detection, Edge Detection, Feature Matching) are essentially flat, which suggests they are memory bandwidth-bound. Stereo Matching and Particle Physics scale strongly with power, and Gaussian Blur scales moderately. Horizon Detection is too noisy to classify (27,682 at 45W against 34,010 at 20W). At 20W the previously flat subtests finally start dropping as the GPU becomes compute-starved across the board. The transition from bandwidth-bound to compute-bound occurs around 25W package power.
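One way to make the flat-versus-scaling split explicit is to threshold the 54W to 25W change per subtest. The 5% cut-off below is an assumption chosen to sit above the ~2% run-to-run variance, not a value from the benchmark itself.

```python
# Geekbench 6 Vulkan subtest scores at the 54W and 25W package limits.
SCORES = {
    "Background Blur": (20_121, 19_571),
    "Face Detection": (11_559, 11_584),
    "Horizon Detection": (34_933, 33_462),
    "Edge Detection": (42_963, 43_031),
    "Gaussian Blur": (40_563, 36_045),
    "Feature Matching": (12_041, 11_853),
    "Stereo Matching": (194_693, 144_369),
    "Particle Physics": (159_097, 126_076),
}

FLAT_THRESHOLD_PCT = 5.0  # assumption: smaller changes are treated as noise


def pct_change(at_54w: float, at_25w: float) -> float:
    """Percentage change going from the 54W score to the 25W score."""
    return (at_25w / at_54w - 1) * 100


power_sensitive = sorted(
    name
    for name, (w54, w25) in SCORES.items()
    if abs(pct_change(w54, w25)) >= FLAT_THRESHOLD_PCT
)
flat = sorted(set(SCORES) - set(power_sensitive))
```

With this threshold, three subtests land in `power_sensitive` (Gaussian Blur, Particle Physics, Stereo Matching) and the remaining five in `flat`. A tighter threshold would start to pull in Horizon Detection, whose scores are noisy across the sweep.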

GPU FurMark Scaling (vsync-limited data)

FurMark Vulkan at 1080p was vsync-limited by the Wayland compositor at ~29 FPS, but clock data still shows GPU behavior:

| Pkg Limit | FurMark Score | Avg FPS | Sustained GPU Clock | Peak iGPU Temp |
|---|---|---|---|---|
| 15W | 983 | 16 FPS | 800 MHz (floor) | 40°C |
| 20W | 1,446 | 24 FPS | 800-1614 MHz | 45°C |
| 25W | 1,763 | 29 FPS | 1037-1463 MHz | 48°C |
| 30W | 1,738 | 29 FPS | 1553-1773 MHz | 52°C |
| 35W | 1,698 | 28 FPS | 1799-2146 MHz | 55°C |
| 45W | 1,772 | 30 FPS | 2037-2472 MHz | 62°C |
| 54W | 1,734 | 29 FPS | 2174-2600 MHz | 67°C |

FPS is capped at 29-30 from 25W upward (compositor vsync). The GPU clock continues scaling from 800 MHz (15W) to 2600 MHz (54W) but produces no additional FPS; the extra frequency is wasted. For graphics workloads capped at display refresh, ~25W is sufficient.

Game Benchmark: Resident Evil 6 (DX9, 1080p)

GPU-only workload (built-in benchmark), power sweep to find the point of diminishing returns:

| Pkg Limit | RE6 Score | vs Peak | GPU Clock (approx) |
|---|---|---|---|
| 20W | 7,599 | -26.1% | ~1600 MHz |
| 25W | 8,842 | -14.0% | ~1900 MHz |
| 30W | 9,683 | -5.8% | ~2100 MHz |
| 35W | 9,964 | -3.1% | ~2200 MHz |
| 40W | 10,205 | -0.7% | ~2200 MHz |
| 45W | 10,281 | baseline | ~2200 MHz |
| 54W | 10,280 | 0% | 2600 MHz |

Performance plateaus above 40W despite the GPU clock continuing to scale up to 2600 MHz at 54W. The likely explanation is DDR5 memory bandwidth: the 780M shares system memory (~73 GB/s measured via clpeak) rather than having dedicated GDDR. Once the GPU has enough compute power to saturate the memory bus, higher clocks produce no additional frames. The transition from compute-bound to memory-bound occurs around 30-35W, consistent with the clpeak FP32 scalar plateau at 25-30W.

Game Benchmark: Black Myth Wukong (1440p, FSR)

A GPU-heavy AAA title running at 1440p with FSR upscaling. Same scene tested at each package power level.

| Pkg Limit | Avg FPS | Min FPS |
|---|---|---|
| 25W | 38 | 31 |
| 30W | 40 | 34 |
| 35W | 41 | 36 |
| 40W | 43 | 36 |
| 45W | 43 | 37 |
| 50W | 43 | 38 |
| 54W | 45 | 38 |

Even a demanding modern title confirms the bandwidth ceiling. From 40W to 54W the average gains just 2 FPS (+5%) despite 14W of additional package power. The GPU clocks scale upward with the extra budget, but the CUs spend most of their cycles idle, waiting on data from DDR5-5600 shared memory. Higher clocks without higher bandwidth just means more time stalling.

Both RE6 and Black Myth Wukong (BMW) point to the same conclusion: the GPU doesn't need much power before DDR5 bandwidth becomes the limiting factor. Once the memory bus is saturated, additional package power simply drives GPU clocks higher to no effect.
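The point of diminishing returns in the two game sweeps can be located mechanically as the lowest power limit that stays within a tolerance of the best result. The sketch below uses a 5% tolerance, which is an arbitrary choice for illustration, and the function name `knee` is hypothetical.

```python
# Package limit (W) -> result, copied from the two game benchmark tables.
RE6_SCORE = {20: 7_599, 25: 8_842, 30: 9_683, 35: 9_964, 40: 10_205, 45: 10_281, 54: 10_280}
BMW_AVG_FPS = {25: 38, 30: 40, 35: 41, 40: 43, 45: 43, 50: 43, 54: 45}


def knee(results: dict, tolerance: float) -> int:
    """Lowest power limit whose result is within `tolerance` of the peak."""
    peak = max(results.values())
    return min(w for w, r in results.items() if r >= (1 - tolerance) * peak)


re6_knee = knee(RE6_SCORE, 0.05)
bmw_knee = knee(BMW_AVG_FPS, 0.05)
```

With a 5% tolerance the knee falls at 35W for RE6 and 40W for Black Myth Wukong. Whole-number FPS values make the second result coarse, so it is best read as a range rather than a precise threshold.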

iGPU Bandwidth Wall

Every GPU benchmark tells the same story. RE6 scores plateau above 40W. BMW averages plateau above 40W. Geekbench compute subtests are flat from 54W down to 25W. clpeak FP32 scalar peaks at 25-30W. The 780M's 12 CUs have far more compute throughput than the DDR5-5600 memory bus can feed.

For most users, including gamers, this means the iGPU can run at a fraction of the rated power budget and still deliver close to full performance. At 35W package power the GPU retains over 90% of its 54W gaming performance while running significantly cooler and quieter. The power savings are essentially free: the extra wattage at higher budgets just heats the chip while the CUs idle-wait on memory fetches. Only pure compute workloads (physics simulations, certain ML inference) benefit from the full power budget, and even then the gains are modest.

This mirrors what cryptocurrency miners discovered during the Ethereum era. Ethash was memory-bandwidth bound, so miners would power-limit and undervolt the GPU core while overclocking VRAM as aggressively as possible, achieving up to 30% higher hashrates with 30% less core power. The same principle applies here: when the workload is bottlenecked by memory bandwidth, throwing more compute power at it is pure waste heat.

Single-Core Power Requirements

To determine the minimum package power needed for full single-core boost, we ran stress-ng --matrix 1 at various power limits.

| Pkg Limit | Single-core bogo/s | Post Tctl | Actual Pkg Draw |
|---|---|---|---|
| 10W | 4,105 (-29.8%) | 42°C | 9W |
| 15W | 5,351 (-8.5%) | 54°C | 14W |
| 20W | 5,738 (-1.9%) | 61°C | 18W |
| 22W | 5,816 (-0.6%) | 63°C | 19W |
| 25W | 5,850 (baseline) | 63°C | 20W |

A single boosting Zen 4 core draws approximately 18-20W at package level (core + uncore + idle system overhead). Performance plateaus above 20W; the core has reached its max boost clock and additional power budget goes unused.

At 10W the core is severely starved: the ~6W idle system overhead leaves only 3-4W for the core itself, cutting performance by 30%. At 15W the deficit narrows to 9%. Since these are package-level limits shared with the iGPU, occasional GPU activity (desktop compositing, video decode) can consume 3-5W and temporarily starve the CPU of boost headroom. In practice, a 22W package limit is sufficient to sustain full single-core boost with margin for incidental GPU usage, relevant for bursty single-threaded tasks like application launches, web page loads, and UI interactions.

Based on all testing, the optimal configuration for this system is:

| Parameter | Value |
|---|---|
| TjMax | 95°C (via unlocked BIOS) |
| PL1 / STAPM (sustained) | 54W |
| PL2 / Fast PPT (burst) | 65W |
| STAPM time constant | 10s |
| Slow PPT time constant | 5s |

Equivalent ryzenadj command:

ryzenadj --stapm-limit=54000 --fast-limit=65000 --slow-limit=54000 --stapm-time=10 --slow-time=5

This allows the CPU to burst at 65W for ~10 seconds (benefiting compilations, app launches, and other short multi-threaded loads) before settling to 54W sustained. The burst phase approaches but stays just under TjMax; sustained operation stabilizes at ~88-89°C with fan headroom remaining.

These ryzenadj settings are volatile and reset on reboot. A systemd service (ryzenadj-tuning.service) reapplies them at boot.
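The contents of the service are not shown here. As one possible approach, a boot-time unit could call a small Python wrapper like the sketch below, which builds the ryzenadj argument list from values in watts and seconds. The function names are hypothetical, and the wrapper assumes `ryzenadj` is on the PATH and is run as root.

```python
import subprocess
from typing import List


def ryzenadj_args(stapm_w: int, fast_w: int, slow_w: int,
                  stapm_time_s: int, slow_time_s: int) -> List[str]:
    """Build the argument list; ryzenadj takes power limits in milliwatts."""
    return [
        "ryzenadj",
        f"--stapm-limit={stapm_w * 1000}",
        f"--fast-limit={fast_w * 1000}",
        f"--slow-limit={slow_w * 1000}",
        f"--stapm-time={stapm_time_s}",
        f"--slow-time={slow_time_s}",
    ]


# Recommended configuration: 54W sustained, 65W burst, 10s / 5s time constants.
RECOMMENDED = ryzenadj_args(54, 65, 54, 10, 5)


def apply(args: List[str]) -> None:
    """Run ryzenadj with the given arguments (needs root; not invoked here)."""
    subprocess.run(args, check=True)
```

Keeping the values in watts and converting in one place avoids the easy mistake of passing 54 instead of 54000. Calling `apply(RECOMMENDED)` would then reapply the limits after each reboot or resume.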

Burst behavior tested

| stapm-time | Burst at 60W+ | Hits TjMax? | Sustained Tctl | All-core bogo/s |
|---|---|---|---|---|
| default (~5s) | ~5s | No (peak 92°C) | 88°C | 50,140 |
| 20s | ~15s | Yes (at 10s) | 89°C | 49,856 |
| 30s | ~25s | Yes (at 10s) | 89°C | 50,264 |

Sustained performance is identical regardless of burst duration (~50K bogo/s). The burst only helps workloads that complete within the burst window.

System Power (Wall)

| State | Wall Power | Package Power | Tctl | Fan |
|---|---|---|---|---|
| Idle (desktop) | 12-15W | ~7W | ~35°C | ~1350 RPM (inaudible) |
| All-core burst (PL2) | 90W | 64W | 90°C | ~3300 RPM (ramping) |
| All-core sustained (PL1) | 75W | 54W | 88°C | ~3270 RPM |

Platform overhead (wall power minus package power: roughly 5-8W at idle, 21W sustained, and 26W during burst) covers PSU conversion losses, VRM inefficiency, DDR5, NVMe, Wi-Fi, and display output. The overhead rises under load due to VRM losses at higher current draw. For an 8-core Zen 4 with RDNA 3 iGPU driving a 5120x1440 ultrawide, 12-15W idle at the wall is remarkably efficient.
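The overhead figures follow directly from the table as wall power minus package power. A short sketch of that arithmetic, with illustrative names and only the two load states that have single-valued readings:

```python
# (wall power W, package power W) for the two all-core load states.
STATES = {
    "burst (PL2)": (90, 64),
    "sustained (PL1)": (75, 54),
}


def overhead_w(wall: float, package: float) -> float:
    """Platform overhead: everything drawn at the wall that is not the package."""
    return wall - package


def package_share(wall: float, package: float) -> float:
    """Fraction of wall power that reaches the CPU/GPU package."""
    return package / wall


overheads = {state: overhead_w(*watts) for state, watts in STATES.items()}
shares = {state: package_share(*watts) for state, watts in STATES.items()}
```

In both load states a little over 70% of wall power reaches the package, so the overhead grows roughly in proportion to load rather than staying fixed.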

Summary

| Parameter | Value |
|---|---|
| Stock STAPM / TjMax | 54W / 85°C |
| Optimal TjMax | 95°C (+7.7% CPU perf) |
| Cooler sustained dissipation | ~58-60W |
| Recommended power config | PL1 54W / PL2 65W / 10s burst |
| Single-core full boost floor | ~20W package |
| Geekbench 6 Vulkan (54W) | 38,935 |
| RE6 1080p (54W) | 10,280 |
| BMW 1440p FSR (54W) | 45 avg FPS |
| GPU compute power wall | ~25-30W package (~20-25W GPU after idle overhead) |
| GPU gaming bandwidth wall | ~35-40W package (>90% perf retained) |
| GPU memory bandwidth ceiling | ~73 GB/s (DDR5-5600 shared) |
| GPU max clock observed | 2600 MHz (at 54W, GPU-only) |
| ryzenadj STAPM/PPT control | Works (volatile, resets on reboot) |
| ryzenadj TjMax control | Does NOT work (SMU accepts but ignores the value) |

It is possible that Proton's translation layer puts additional strain on system memory bandwidth, leaving less for the GPU and contributing to the bottleneck. However, given how heavily Valve has invested in the compatibility layer and that it ships AMD APUs in its own hardware, this is unlikely to be the main cause.

Takeaways

We believe that DDR5-5600 is a significant limiting factor for Ryzen systems equipped with the Radeon 780M iGPU. In discrete GPU testing, DDR5 has shown minimal FPS uplift when the system is GPU-bound, and the added bandwidth moving from the Zen 3 / DDR4 era to Zen 4's DDR5 still falls short of what the 12 CU RDNA 3 integrated GPU demands.

LPDDR5X may help address this if OEMs implement a 128-bit bus. At 7500 MT/s on a 128-bit bus, theoretical bandwidth reaches 120 GB/s, a meaningful step up from dual-channel DDR5-5600's ~89.6 GB/s. For reference, Strix Halo achieves 256 GB/s with its 256-bit LPDDR5X-8000 bus (or ~273 GB/s at 8533 MT/s). However, a 128-bit LPDDR5X configuration adds cost and complexity, requiring 8 DRAM dies at 16 bits each. We may never see this kind of configuration purposefully built just to let an iGPU perform at its best.
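The theoretical figures above come from multiplying the transfer rate by the bus width in bytes. A minimal sketch of that calculation, using decimal GB/s and an illustrative function name:

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal): MT/s x bytes per transfer."""
    return transfer_rate_mts * (bus_width_bits / 8) / 1000


ddr5_5600_dual = peak_bandwidth_gbs(5600, 128)    # dual-channel DDR5-5600
lpddr5x_7500 = peak_bandwidth_gbs(7500, 128)      # 128-bit LPDDR5X-7500
strix_halo_8000 = peak_bandwidth_gbs(8000, 256)   # 256-bit LPDDR5X-8000
strix_halo_8533 = peak_bandwidth_gbs(8533, 256)   # 256-bit LPDDR5X-8533

# Best bandwidth measured with clpeak (30W run) against the DDR5-5600 peak.
measured_efficiency = 73.2 / ddr5_5600_dual
```

Against the 89.6 GB/s theoretical peak, the best clpeak reading of 73.2 GB/s is a little over 80% of what the bus can deliver on paper, which is why the measured figure is the more useful ceiling for the GPU results.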

For everyday tasks, these Zen 4 APUs are extremely capable. With power limits configured properly, they strike a good balance between power efficiency (in an x86 context) and responsiveness. Shared memory bandwidth becomes a noticeable bottleneck primarily during gaming, even at 1080p in more demanding titles, and the constraint only grows at higher resolutions. This makes upscaling technologies like AMD's FSR essentially a must for a playable experience.

Test Environment

  • CachyOS (Arch-based rolling), kernel 6.19.5-3-cachyos
  • sched_ext with LAVD scheduler (v1.0.21)
  • amd-pstate-epp driver, powersave governor (hardware-managed P-states)
  • Game benchmarks run via Steam with Proton compatibility layer; results are representative of Linux/Proton performance and may differ from native Windows

Tools Used

  • ryzenadj v0.17.0 — runtime power limit adjustment via SMU
  • ryzen_smu kernel module — direct SMU access for ryzenadj
  • uefiextract / uefireplace v0.28.0 — BIOS ROM extraction and patching
  • ifrextractor-rs v1.6.0 — IFR form data extraction from PE32 binaries
  • AfuEfix64.efi — AMI BIOS flash utility (UEFI Shell)
  • FurMark v2.10.2 — GPU stress test (Vulkan)
  • clpeak v1.1.5 — OpenCL compute peak performance profiler
  • stress-ng — CPU stress testing with bogo ops metrics
  • Geekbench 6 v6.5.0 — cross-platform GPU compute benchmark (Vulkan)
  • 7zip — LZMA compression benchmark
  • Resident Evil 6 — built-in benchmark (DX9, 1080p, Steam/Proton)
  • Black Myth Wukong — in-game benchmark (1440p, FSR, Steam/Proton)