Ryzen 7 8745HS (Zen 4 APU, Phoenix): In-depth Performance Analysis
Ryzen 7 8745HS power and thermal analysis in the AceMagic W1 mini PC. BIOS unlock boosts CPU performance 7.7% by raising TjMax from 85 to 95°C. GPU benchmarks reveal a DDR5 bandwidth wall at 35-40W, where the Radeon 780M retains over 90% of peak gaming performance at a fraction of full power.
System: AceMagic W1 mini PC, AMI BIOS PHXPM7B0, 32GB DDR5-5600, CachyOS
iGPU: AMD Radeon 780M (12 CUs, RDNA 3)
EC: ITE IT5570, single fan cooling solution
BIOS Modification
The stock AceMagic BIOS is bare: it hides all advanced menus (AMD CBS, AMD PBS, ACPI, fTPM, SATA, etc.) behind a "Hide setup" flag (QuestionId 0x96, VarOffset 0x9D in the Setup NVRAM variable). This flag defaults to Enabled and is itself hidden by SuppressIf True, making it inaccessible through the BIOS UI.
Patch method: Extracted the Setup DXE driver PE32 body (GUID 899407D7-99FE-43D8-9A21-79EC328CAC21), replaced all 16 occurrences of the IFR byte pattern 12 06 96 00 01 00 (EqIdVal QuestionId:0x96, Value:0x01) with 12 06 96 00 02 00 (Value:0x02). Since "Hide setup" only has values 0 and 1, comparing against 2 means the SuppressIf condition never triggers, permanently unhiding all menus. Reinserted patched PE32 using uefireplace and flashed via AfuEfix64.efi from UEFI Shell.
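The byte-pattern replacement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script used for the original patch: the function name and the `expected` parameter are hypothetical, and the extracted PE32 body is assumed to be already loaded into memory.

```python
from typing import Optional, Tuple

# EqIdVal opcode 0x12, length 6, QuestionId 0x0096 and Value, both little-endian.
OLD = bytes.fromhex("12 06 96 00 01 00")  # QuestionId 0x96 == 0x01
NEW = bytes.fromhex("12 06 96 00 02 00")  # QuestionId 0x96 == 0x02 (never true)


def patch_hide_setup(pe32: bytes, expected: Optional[int] = None) -> Tuple[bytes, int]:
    """Replace every OLD pattern with NEW and return (patched_bytes, match_count).

    `expected` is a hypothetical safety check: if given, the patch is refused
    unless exactly that many occurrences are found (16 in the article's BIOS).
    """
    count = pe32.count(OLD)
    if expected is not None and count != expected:
        raise ValueError(f"found {count} occurrences, expected {expected}")
    # Both patterns have the same length, so no offsets inside the PE32 shift.
    return pe32.replace(OLD, NEW), count


if __name__ == "__main__":
    # Synthetic stand-in for the Setup driver body, just to exercise the function.
    sample = b"\x00\x01" + OLD + b"\xaa" + OLD + b"\xff"
    patched, n = patch_hide_setup(sample, expected=2)
    print(n, len(patched) == len(sample))
```

Because the replacement is length-preserving, the patched body can be reinserted without rebuilding any surrounding structures. Checking the occurrence count before writing is a cheap guard against patching the wrong file or BIOS revision.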
Failed approaches: Direct NVRAM writes (setup_var.efi, RU.efi), changing IFR OneOfOption defaults (AMITSE ignores them), flashrom SPI read (crashes system; AMD PSP owns the SPI bus), ryzenadj --tctl-temp (SMU accepts but ignores the value).
After unlocking, TjMax was changed from 85°C to 95°C via AMD CBS > SMU Common Options > Thermal Control.
CPU Benchmarks
All tests at stock 54W unless noted. TjMax column shows the configured thermal limit.
| Config | TjMax | 7zip (MIPS) | stress-ng (bogo/s) | Peak Tctl | Peak Fan |
|---|---|---|---|---|---|
| Stock baseline | 85°C | 91,758 | 47,022 | 85.1°C | 3317 RPM |
| BIOS unlocked | 95°C | 98,828 (+7.7%) | 49,961 (+6.3%) | 94.7°C | 3412 RPM |
| 65W ryzenadj | 95°C | 99,554 (+8.5%) | 51,358 (+9.2%) | 95.1°C | 3510 RPM |
| 35W ryzenadj | 95°C | 89,445 (-2.5%) | 42,990 (-8.6%) | 76.3°C | 2743 RPM |
Raising TjMax from 85 to 95°C is the biggest single gain: the CPU was thermally throttling at 85°C and leaving performance on the table. Increasing power to 65W yields only marginal further improvement because the cooler cannot sustain much more than the stock 54W (its limit is ~58-60W, measured in the next section); the CPU just hits the 95°C thermal wall sooner. At 35W the system runs much cooler and quieter with a modest performance penalty (-2.5% in 7zip, -8.6% in stress-ng).
Cooler Thermal Capacity
To find the exact thermal ceiling, we ran CPU-only stress tests (stress-ng matrix, 60s) at progressively lower power limits and observed where the temperature stabilizes vs hits TjMax.
| Limit | Sustained Pkg | Sustained Tctl | Peak Tctl | Fan RPM (PWM) | CPU bogo/s |
|---|---|---|---|---|---|
| 65W | 65→59W (throttled) | 95.0°C (clamped) | 95.1°C | 3438 (255 max) | 51,467 |
| 60W | 59W (throttled) | 95.0°C (clamped) | 95.1°C | 3438 (255 max) | 51,127 |
| 55W | 55W (sustained) | 89-90°C (rising) | 90.3°C | 3312 (242) | 49,491 |
| 50W | 50W (sustained) | 83-84°C (stable) | 84.2°C | 3180 (227) | 47,988 |
| 45W | 45W (sustained) | 77-78°C (stable) | 77.8°C | 3041 (212) | 46,283 |
The heatsink's sustained dissipation limit is ~58-60W. At 65W the SMU quickly thermal-throttles to ~59W to hold TjMax. At 60W it still reaches 95°C and throttles slightly. At 55W the package holds its full power for the whole 60s run with about 5°C of headroom, but Tctl is still creeping upward at 89-90°C when the run ends, so a longer run would likely bring it closer to 95°C.
65W and 60W produce nearly identical performance (~51K bogo/s) because both end up thermally clamped at the same effective wattage. The fan maxes out at PWM 255 (3438 RPM) for both, the cooler giving everything it has.
Power limits were adjusted at runtime using ryzenadj (STAPM, fast PPT, slow PPT all set to the same value) and sustained for 60s under stress-ng --matrix CPU load. The fan and thermal controller were left in automatic mode throughout.
The manufacturer's stock TjMax of 85°C is the real bottleneck. The cooler stabilizes at ~90°C when dissipating 54W, but with TjMax set to 85°C the CPU begins throttling 5°C before the cooler reaches its actual limit. Raising TjMax to 95°C allows the CPU to use its full 54W budget without premature throttling, which is the single biggest performance unlock (+7.7%).
Practical recommendations:
- Stock 54W is already near-optimal. It sits just 1W below the tested 55W sweet spot, meaning the manufacturer's power limit is well-chosen for this cooler, once TjMax is raised to 95°C via the unlocked BIOS to stop premature throttling.
- For tinkerers: Repasting with quality thermal compound and raising the power limit to 60W in the unlocked BIOS could squeeze out a few more percent. Hitting TjMax 95°C is almost certain at this wattage, but the chip handles it fine. TjMax can be raised further to 100°C, though there is little practical reason to do so; the gains are marginal and component longevity suffers.
- Thermal headroom scales linearly: each 5W reduction drops sustained temperature by roughly 6-7°C, useful for tuning noise/performance tradeoffs (e.g. 45W for a near-silent desk setup).
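The linear relationship can be checked with a small least-squares fit over the three unthrottled rows of the table above (45W, 50W, 55W, using the Peak Tctl column). This is only a rough sketch: three points are a thin basis for a fit, the 55W reading was still rising at the end of its run, and the helper name `fit_line` is made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Unthrottled rows from the cooler capacity table: power limit (W) -> peak Tctl (°C).
limits_w = [45, 50, 55]
peak_tctl_c = [77.8, 84.2, 90.3]

slope, intercept = fit_line(limits_w, peak_tctl_c)

# Extrapolate to the package power at which Tctl would reach the 95°C TjMax.
watts_at_tjmax = (95.0 - intercept) / slope

print(f"{slope * 5:.2f} °C per 5 W, 95 °C reached at about {watts_at_tjmax:.1f} W")
```

The fitted slope works out to a little over 6°C per 5W, and the extrapolated crossing point lands inside the ~58-60W range observed when the 60W and 65W runs were thermally clamped.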
CPU + GPU Power Budget Sharing
The CPU and iGPU share a single 54W package power budget (STAPM). When both are loaded simultaneously, the SMU dynamically splits power between them.
Combined Load Tests (stress-ng + FurMark VK 1080p)
| Config | FurMark Score | CPU bogo/s | Sustained Pkg | Peak Tctl | Peak iGPU |
|---|---|---|---|---|---|
| GPU only (54W) | 1,753 | — | 29W | 69.7°C | 68°C |
| CPU only (54W) | — | 49,892 | 54W | 95.0°C | 63°C |
| CPU+GPU (54W) | 1,533 (-12.5%) | 41,881 (-16.1%) | 54W | 79.6°C | 67°C |
| CPU+GPU (75W) | 1,578 (-10%) | 45,259 (-9.3%) | 65W | 88.8°C | 72°C |
Under combined load at 54W, the CPU loses 16% and the GPU loses 12.5% of their standalone performance. Raising the limit to 75W helps combined workloads (+8% CPU, +3% GPU) as the package settles at ~65W before thermal throttling kicks in.
Power Budget Sweep (CPU+GPU Combined)
| Limit | FurMark Score | CPU bogo/s | GPU Clock |
|---|---|---|---|
| 15W | 704 | 10,056 | 800 MHz |
| 25W | 1,272 | 18,039 | 800-1571 MHz |
| 35W | 1,529 | 28,136 | 1154-1944 MHz |
| 45W | 1,543 | 35,447 | 1521-2017 MHz |
The GPU score plateaus at ~1,530 above 35W under combined load. Beyond this point, additional power goes entirely to the CPU. This is consistent with the GPU's ~25-30W effective power ceiling (see clpeak results below).
GPU Compute Scaling (clpeak, OpenCL)
Isolated GPU workload, no CPU contention. clpeak runs pure compute kernels with no display/vsync dependency.
| Pkg Limit | FP32 scalar (GFLOPS) | FP32 vec2 (GFLOPS) | FP16 vec2 (GFLOPS) | Mem BW (GB/s) |
|---|---|---|---|---|
| 15W | 3,020 | 2,926 | 3,598 | 70.1 |
| 20W | 3,763 (+25%) | 4,087 (+40%) | 4,892 (+36%) | 70.6 |
| 25W | 4,522 (+50%) | 4,981 (+70%) | 4,998 (+39%) | 70.4 |
| 30W | 4,646 (+54%) | 5,459 (+86%) | 6,292 (+75%) | 73.2 |
| 35W | 4,649 (+54%) | 5,984 (+104%) | 6,140 (+71%) | 71.0 |
| 45W | 4,216 (+40%) | 6,374 (+118%) | 6,958 (+93%) | 72.9 |
| 54W | 4,506 (+49%) | 6,579 (+125%) | 6,060 (+68%) | 72.3 |
Percentages relative to 15W baseline. FP32 scalar results at 45-54W show some variance below the 30W peak; clpeak runs short bursts and GPU clock/power fluctuate between subtests, so minor run-to-run variation is expected.
Findings:
- FP32 scalar compute plateaus at ~25-30W (~4,500-4,650 GFLOPS). Beyond this, additional power yields no scalar improvement.
- Vectorized compute (FP32 vec2, FP16 vec2) continues scaling up to 45-54W, reaching ~6,500 GFLOPS FP32 and ~7,000 GFLOPS FP16. Wider SIMD operations can utilize more power.
- Memory bandwidth is flat at ~70-73 GB/s regardless of power, limited by DDR5-5600 shared system memory (not GDDR/HBM).
- The GPU will consume all available package power when uncontested: at 54W limit, the package draws 53W with GPU-only load.
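One way to read the scaling data is as efficiency rather than raw throughput. The sketch below takes the FP32 vec2 column and computes both GFLOPS per watt and the marginal GFLOPS gained per extra watt between adjacent steps. It uses the package limit as the denominator, which is an assumption: actual package draw was not logged for every clpeak run, so treat the numbers as approximate.

```python
# FP32 vec2 results from the clpeak table: package limit (W) -> GFLOPS.
fp32_vec2 = {15: 2926, 20: 4087, 25: 4981, 30: 5459, 35: 5984, 45: 6374, 54: 6579}


def gflops_per_watt(results):
    """Average efficiency at each limit (limit used as a stand-in for draw)."""
    return {w: g / w for w, g in results.items()}


def marginal_gflops_per_watt(results):
    """Extra GFLOPS bought by each additional watt between adjacent limits."""
    points = sorted(results.items())
    return {
        (w0, w1): (g1 - g0) / (w1 - w0)
        for (w0, g0), (w1, g1) in zip(points, points[1:])
    }


efficiency = gflops_per_watt(fp32_vec2)
marginal = marginal_gflops_per_watt(fp32_vec2)

for (w0, w1), gain in marginal.items():
    print(f"{w0}W -> {w1}W: {gain:.0f} GFLOPS per extra watt")
```

Even for the vectorized kernels that keep scaling, the return per watt falls by roughly an order of magnitude between the 15-20W step and the 45-54W step.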
GPU Compute Scaling (Geekbench 6, Vulkan)
Geekbench 6 GPU compute runs a mix of image processing and physics workloads. Subtests are short bursts, so some run-to-run variance is expected (~2% overall).
| Subtest | 54W | 45W | 35W | 30W | 25W | 20W | 54→25W | 54→20W |
|---|---|---|---|---|---|---|---|---|
| Overall | 38,935 | 37,703 | 37,359 | 36,334 | 35,528 | 32,821 | -8.7% | -15.7% |
| Background Blur | 20,121 | 19,930 | 19,729 | 20,956 | 19,571 | 18,494 | -2.7% | -8.1% |
| Face Detection | 11,559 | 11,714 | 11,611 | 11,842 | 11,584 | 10,872 | +0.2% | -5.9% |
| Horizon Detection | 34,933 | 27,682 | 31,494 | 30,661 | 33,462 | 34,010 | -4.2% | -2.6% |
| Edge Detection | 42,963 | 42,830 | 43,185 | 42,876 | 43,031 | 41,543 | +0.2% | -3.3% |
| Gaussian Blur | 40,563 | 40,317 | 39,358 | 38,347 | 36,045 | 32,620 | -11.1% | -19.6% |
| Feature Matching | 12,041 | 12,181 | 12,340 | 12,146 | 11,853 | 10,956 | -1.6% | -9.0% |
| Stereo Matching | 194,693 | 190,753 | 171,398 | 160,269 | 144,369 | 120,429 | -25.8% | -38.1% |
| Particle Physics | 159,097 | 157,467 | 146,305 | 124,730 | 126,076 | 110,122 | -20.8% | -30.8% |
From 54W down to 25W, most subtests (Background Blur, Face Detection, Edge Detection, Feature Matching) are essentially flat, which points to them being memory bandwidth-bound. Stereo Matching and Particle Physics scale strongly with power, and Gaussian Blur scales to a lesser degree. At 20W the previously flat subtests finally start dropping as the GPU becomes compute-starved across the board. The transition from bandwidth-bound to compute-bound occurs around 25W package power.
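The split between flat and power-sensitive subtests can be made explicit by thresholding the 54W to 25W drop. The 5% cutoff below is an arbitrary assumption chosen to sit above the ~2% run-to-run noise; the function name is hypothetical.

```python
# Geekbench 6 Vulkan subtest scores from the table: name -> (score at 54W, score at 25W).
scores = {
    "Background Blur": (20121, 19571),
    "Face Detection": (11559, 11584),
    "Horizon Detection": (34933, 33462),
    "Edge Detection": (42963, 43031),
    "Gaussian Blur": (40563, 36045),
    "Feature Matching": (12041, 11853),
    "Stereo Matching": (194693, 144369),
    "Particle Physics": (159097, 126076),
}


def split_by_power_sensitivity(results, threshold=0.05):
    """Return (flat, scaling) subtest names based on the relative drop from 54W to 25W."""
    flat, scaling = [], []
    for name, (at_54w, at_25w) in results.items():
        drop = (at_54w - at_25w) / at_54w
        (scaling if drop > threshold else flat).append(name)
    return flat, scaling


flat, scaling = split_by_power_sensitivity(scores)
print("flat:", flat)
print("scaling:", scaling)
```

Horizon Detection lands in the flat group at this threshold, but its 45W result (27,682) is a clear outlier, so its classification should be treated with caution.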
GPU FurMark Scaling (vsync-limited data)
FurMark Vulkan at 1080p was vsync-limited by the Wayland compositor at ~29 FPS, but clock data still shows GPU behavior:
| Pkg Limit | FurMark Score | Avg FPS | Sustained GPU Clock | Peak iGPU Temp |
|---|---|---|---|---|
| 15W | 983 | 16 FPS | 800 MHz (floor) | 40°C |
| 20W | 1,446 | 24 FPS | 800-1614 MHz | 45°C |
| 25W | 1,763 | 29 FPS | 1037-1463 MHz | 48°C |
| 30W | 1,738 | 29 FPS | 1553-1773 MHz | 52°C |
| 35W | 1,698 | 28 FPS | 1799-2146 MHz | 55°C |
| 45W | 1,772 | 30 FPS | 2037-2472 MHz | 62°C |
| 54W | 1,734 | 29 FPS | 2174-2600 MHz | 67°C |
FPS is capped at 28-30 from 25W upward (compositor vsync). The GPU clock continues scaling from 800 MHz (15W) to as high as 2600 MHz (54W) but produces no additional FPS; the extra frequency is wasted. For graphics workloads capped at display refresh, ~25W is sufficient.
Game Benchmark: Resident Evil 6 (DX9, 1080p)
GPU-only workload (built-in benchmark), power sweep to find the point of diminishing returns:
| Pkg Limit | RE6 Score | vs Peak | GPU Clock (approx) |
|---|---|---|---|
| 20W | 7,599 | -26.1% | ~1600 MHz |
| 25W | 8,842 | -14.0% | ~1900 MHz |
| 30W | 9,683 | -5.8% | ~2100 MHz |
| 35W | 9,964 | -3.1% | ~2200 MHz |
| 40W | 10,205 | -0.7% | ~2200 MHz |
| 45W | 10,281 | baseline | ~2200 MHz |
| 54W | 10,280 | 0% | 2600 MHz |
Performance plateaus above 40W despite the GPU clock continuing to scale up to 2600 MHz at 54W. The likely explanation is DDR5 memory bandwidth: the 780M shares system memory (~73 GB/s measured via clpeak) rather than having dedicated GDDR. Once the GPU has enough compute power to saturate the memory bus, higher clocks produce no additional frames. The transition from compute-bound to memory-bound occurs around 30-35W, consistent with the clpeak FP32 scalar plateau at 25-30W.
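The point of diminishing returns can be located mechanically by asking for the lowest tested limit that retains a given fraction of the peak score. A short sketch using the RE6 numbers above; the function name is made up for this example, and the retention fractions are arbitrary choices.

```python
# RE6 benchmark scores from the table: package limit (W) -> score.
re6 = {20: 7599, 25: 8842, 30: 9683, 35: 9964, 40: 10205, 45: 10281, 54: 10280}


def lowest_limit_retaining(results, fraction):
    """Lowest tested power limit whose score is at least `fraction` of the peak."""
    peak = max(results.values())
    for watts in sorted(results):
        if results[watts] >= fraction * peak:
            return watts
    return None  # unreachable for fraction <= 1, kept for safety


for fraction in (0.90, 0.95, 0.99):
    print(f"{fraction:.0%} of peak at {lowest_limit_retaining(re6, fraction)}W")
```

For this data set, 90% of peak is already available at 30W, 95% at 35W, and 99% at 40W, which is where the plateau described above begins.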
Game Benchmark: Black Myth Wukong (1440p, FSR)
A GPU-heavy AAA title running at 1440p with FSR upscaling. Same scene tested at each package power level.
| Pkg Limit | Avg FPS | Min FPS |
|---|---|---|
| 25W | 38 | 31 |
| 30W | 40 | 34 |
| 35W | 41 | 36 |
| 40W | 43 | 36 |
| 45W | 43 | 37 |
| 50W | 43 | 38 |
| 54W | 45 | 38 |
Even a demanding modern title confirms the bandwidth ceiling. From 40W to 54W the average gains just 2 FPS (+5%) despite 14W of additional package power. The GPU clocks scale upward with the extra budget, but the CUs spend most of their cycles idle, waiting on data from DDR5-5600 shared memory. Higher clocks without higher bandwidth just means more time stalling.
Both RE6 and BMW point to the same conclusion: the GPU doesn't need much power before DDR5 bandwidth becomes the limiting factor. Once the memory bus is saturated, additional package power simply drives GPU clocks higher to no effect.
iGPU Bandwidth Wall
Every GPU benchmark tells the same story. RE6 scores plateau above 40W. BMW averages plateau above 40W. Geekbench compute subtests are flat from 54W down to 25W. clpeak FP32 scalar peaks at 25-30W. The 780M's 12 CUs have far more compute throughput than the DDR5-5600 memory bus can feed.
For most users, including gamers, this means the iGPU can run at a fraction of the rated power budget and still deliver close to full performance. At 35W package power the GPU retains over 90% of its 54W gaming performance while running significantly cooler and quieter. The power savings are essentially free: the extra wattage at higher budgets just heats the chip while the CUs idle-wait on memory fetches. Only pure compute workloads (physics simulations, certain ML inference) benefit from the full power budget, and even then the gains are modest.
This mirrors what cryptocurrency miners discovered during the Ethereum era. Ethash was memory-bandwidth bound, so miners would power-limit and undervolt the GPU core while overclocking VRAM as aggressively as possible, achieving up to 30% higher hashrates with 30% less core power. The same principle applies here: when the workload is bottlenecked by memory bandwidth, throwing more compute power at it is pure waste heat.
Single-Core Power Requirements
To determine the minimum package power needed for full single-core boost, we ran stress-ng --matrix 1 at various power limits.
| Pkg Limit | Single-core bogo/s | Post Tctl | Actual Pkg Draw |
|---|---|---|---|
| 10W | 4,105 (-29.8%) | 42°C | 9W |
| 15W | 5,351 (-8.5%) | 54°C | 14W |
| 20W | 5,738 (-1.9%) | 61°C | 18W |
| 22W | 5,816 (-0.6%) | 63°C | 19W |
| 25W | 5,850 (baseline) | 63°C | 20W |
A single boosting Zen 4 core draws approximately 18-20W at package level (core + uncore + idle system overhead). Performance plateaus above 20W; the core has reached its max boost clock and additional power budget goes unused.
At 10W the core is severely starved: the ~6W idle system overhead leaves only 3-4W for the core itself, cutting performance by 30%. At 15W the deficit narrows to 9%. Since these are package-level limits shared with the iGPU, occasional GPU activity (desktop compositing, video decode) can consume 3-5W and temporarily starve the CPU of boost headroom. In practice, a 22W package limit is sufficient to sustain full single-core boost with margin for incidental GPU usage, relevant for bursty single-threaded tasks like application launches, web page loads, and UI interactions.
Recommended Power Configuration
Based on all testing, the optimal configuration for this system is:
| Parameter | Value |
|---|---|
| TjMax | 95°C (via unlocked BIOS) |
| PL1 / STAPM (sustained) | 54W |
| PL2 / Fast PPT (burst) | 65W |
| STAPM time constant | 10s |
| Slow PPT time constant | 5s |
ryzenadj --stapm-limit=54000 --fast-limit=65000 --slow-limit=54000 --stapm-time=10 --slow-time=5
This allows the CPU to burst at 65W for ~10 seconds (benefiting compilations, app launches, and other short multi-threaded loads) before settling to 54W sustained. The burst phase approaches but stays just under TjMax; sustained operation stabilizes at ~88-89°C with fan headroom remaining.
These ryzenadj settings are volatile and reset on reboot. A systemd service (ryzenadj-tuning.service) reapplies them at boot.
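As an illustration of how the table values map onto the command line, the sketch below builds the ryzenadj argument list from watts and seconds. ryzenadj takes power limits in milliwatts, which is why 54W appears as 54000 in the command above. The helper name is hypothetical, and the script only prints the command; actually applying it requires root and is left to the caller (for example via `subprocess.run(args, check=True)`).

```python
def ryzenadj_args(stapm_w, fast_w, slow_w, stapm_time_s, slow_time_s):
    """Build a ryzenadj command line; power limits are converted from W to mW."""

    def to_mw(watts):
        return int(round(watts * 1000))

    return [
        "ryzenadj",
        f"--stapm-limit={to_mw(stapm_w)}",
        f"--fast-limit={to_mw(fast_w)}",
        f"--slow-limit={to_mw(slow_w)}",
        f"--stapm-time={stapm_time_s}",
        f"--slow-time={slow_time_s}",
    ]


# Recommended configuration from the table: 54W sustained, 65W burst, 10s / 5s time constants.
args = ryzenadj_args(stapm_w=54, fast_w=65, slow_w=54, stapm_time_s=10, slow_time_s=5)
print(" ".join(args))
```

Keeping the values in watts in one place makes it harder to mistype a limit by a factor of ten when experimenting with different budgets.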
Burst behavior tested
| stapm-time | Burst at 60W+ | Hits TjMax? | Sustained Tctl | All-core bogo/s |
|---|---|---|---|---|
| default (~5s) | ~5s | No (peak 92°C) | 88°C | 50,140 |
| 20s | ~15s | Yes (at 10s) | 89°C | 49,856 |
| 30s | ~25s | Yes (at 10s) | 89°C | 50,264 |
Sustained performance is identical regardless of burst duration (~50K bogo/s). The burst only helps workloads that complete within the burst window.
System Power (Wall)
| State | Wall Power | Package Power | Tctl | Fan |
|---|---|---|---|---|
| Idle (desktop) | 12-15W | ~7W | ~35°C | ~1350 RPM (inaudible) |
| All-core burst (PL2) | 90W | 64W | 90°C | ~3300 RPM (ramping) |
| All-core sustained (PL1) | 75W | 54W | 88°C | ~3270 RPM |
Platform overhead covers PSU conversion losses, VRM inefficiency, DDR5, NVMe, Wi-Fi, and display output. The overhead rises under load due to VRM losses at higher current draw. For an 8-core Zen 4 with RDNA 3 iGPU driving a 5120x1440 ultrawide, 12-15W idle at the wall is remarkably efficient.
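The overhead figures follow directly from the table as wall power minus package power. A quick sketch of that arithmetic for the two load states (the idle row is a range, so it is left out here); the function name is made up for this example.

```python
# Wall and package power from the table: state -> (wall W, package W).
load_states = {
    "All-core burst (PL2)": (90, 64),
    "All-core sustained (PL1)": (75, 54),
}


def platform_overhead(wall_w, package_w):
    """Return (overhead in W, fraction of wall power that reaches the package)."""
    return wall_w - package_w, package_w / wall_w


for state, (wall_w, package_w) in load_states.items():
    overhead_w, share = platform_overhead(wall_w, package_w)
    print(f"{state}: {overhead_w}W overhead, {share:.0%} of wall power in the package")
```

Overhead grows from 21W at sustained load to 26W during the burst, consistent with the higher VRM losses at higher current draw noted above. At idle the same subtraction gives roughly 5-8W.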
Summary
| Parameter | Value |
|---|---|
| Stock STAPM / TjMax | 54W / 85°C |
| Optimal TjMax | 95°C (+7.7% CPU perf) |
| Cooler sustained dissipation | ~58-60W |
| Recommended power config | PL1 54W / PL2 65W / 10s burst |
| Single-core full boost floor | ~20W package |
| Geekbench 6 Vulkan (54W) | 38,935 |
| RE6 1080p (54W) | 10,280 |
| BMW 1440p FSR (54W) | 45 avg FPS |
| GPU compute power wall | ~25-30W package (~20-25W GPU after idle overhead) |
| GPU gaming bandwidth wall | ~35-40W package (>90% perf retained) |
| GPU memory bandwidth ceiling | ~73 GB/s (DDR5-5600 shared) |
| GPU max clock observed | 2600 MHz (at 54W, GPU-only) |
| ryzenadj STAPM/PPT control | Works (volatile, resets on reboot) |
| ryzenadj TjMax control | Does NOT work (SMU accepts but ignores the value) |
It is possible that Proton's translation layer puts additional strain on system memory bandwidth, leaving less for the GPU and exaggerating the bottleneck seen in the game benchmarks. However, given how heavily Valve has invested in the compatibility layer and that it chose AMD APUs as its own platform, Proton is unlikely to be the main cause.
Takeaways
We believe that DDR5-5600 is a significant limiting factor for Ryzen systems equipped with the Radeon 780M iGPU. In discrete GPU testing, DDR5 has shown minimal FPS uplift when the system is GPU-bound, and the added bandwidth moving from the Zen 3 / DDR4 era to Zen 4's DDR5 still falls short of what the 12 CU RDNA 3 integrated GPU demands.
LPDDR5X may help address this if OEMs implement a 128-bit bus. At 7500 MT/s on a 128-bit bus, theoretical bandwidth reaches 120 GB/s, a meaningful step up from dual-channel DDR5-5600's ~89.6 GB/s. For reference, Strix Halo achieves 256 GB/s with its 256-bit LPDDR5X-8000 bus (or ~273 GB/s at 8533 MT/s). However, a 128-bit LPDDR5X configuration adds cost and complexity, requiring 8 DRAM dies at 16 bits each. We may never see this kind of configuration purposefully built just to let an iGPU perform at its best.
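The theoretical figures above come from multiplying the transfer rate by the bus width in bytes. A small sketch of that calculation, with the measured clpeak peak (73.2 GB/s) compared against the DDR5-5600 theoretical number; the function name is hypothetical.

```python
def peak_bandwidth_gbs(mt_per_s, bus_bits):
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * (bus_bits / 8) / 1000


configs = {
    "DDR5-5600, 128-bit (dual channel)": (5600, 128),
    "LPDDR5X-7500, 128-bit": (7500, 128),
    "LPDDR5X-8000, 256-bit": (8000, 256),
    "LPDDR5X-8533, 256-bit": (8533, 256),
}

for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")

# Share of the theoretical DDR5-5600 bandwidth actually reached in clpeak.
measured_gbs = 73.2
achieved = measured_gbs / peak_bandwidth_gbs(5600, 128)
print(f"clpeak reaches {achieved:.0%} of theoretical")
```

The measured ~73 GB/s is a little over 80% of the dual-channel DDR5-5600 theoretical peak, so the remaining headroom on this memory configuration is small.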
For everyday tasks, these Zen 4 APUs are extremely capable. With power limits configured properly, they strike a good balance between power efficiency (in an x86 context) and responsiveness. Shared memory bandwidth becomes a noticeable bottleneck primarily during gaming — even at 1080p in more demanding titles — and the constraint only grows at higher resolutions. This makes upscaling technologies like AMD's FSR essentially a must for a playable experience.
Test Environment
- CachyOS (Arch-based rolling), kernel 6.19.5-3-cachyos
- sched_ext with LAVD scheduler (v1.0.21)
- amd-pstate-epp driver, powersave governor (hardware-managed P-states)
- Game benchmarks run via Steam with Proton compatibility layer; results are representative of Linux/Proton performance and may differ from native Windows
Tools Used
- ryzenadj v0.17.0: runtime power limit adjustment via SMU
- ryzen_smu kernel module: direct SMU access for ryzenadj
- uefiextract / uefireplace v0.28.0: BIOS ROM extraction and patching
- ifrextractor-rs v1.6.0: IFR form data extraction from PE32 binaries
- AfuEfix64.efi: AMI BIOS flash utility (UEFI Shell)
- FurMark v2.10.2: GPU stress test (Vulkan)
- clpeak v1.1.5: OpenCL compute peak performance profiler
- stress-ng: CPU stress testing with bogo ops metrics
- Geekbench 6 v6.5.0: cross-platform GPU compute benchmark (Vulkan)
- 7zip: LZMA compression benchmark
- Resident Evil 6: built-in benchmark (DX9, 1080p, Steam/Proton)
- Black Myth Wukong: in-game benchmark (1440p, FSR, Steam/Proton)