DMA / Single player always wins…

A bet you can’t lose — and somehow did

Micro Channel arbitration is a tournament. Any device that wants the bus — a DMA controller, a bus-master card, the refresh logic — drops its 4‑bit arbitration level onto the shared ARB[3:0] lines during the arbitration window, and the lowest level wins. The lines are open‑collector and active‑low: you pull down the bits that are 0 in your level, you release the bits that are 1, and the wired‑AND of everyone’s drive settles to the winner. Lose a bit to someone with higher priority, and you drop out of all the lower bits too.
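The dropout rule is easier to see in a model. The C sketch below simulates one arbitration window under simplifications of my own: every entrant samples ARB[3:0] at the same instant, released lines read high because of the terminators, and the wired‑AND is re-evaluated until it stops changing. The function names (entrant_drive, arbitrate) are illustrative and not part of WonderMCA.

```c
#include <assert.h>
#include <stdint.h>

#define ARB_IDLE 0xFu  /* all four lines released: terminators pull them high */

/* What one entrant drives on ARB[3:0] after sampling `bus`.
   A 0 in the result means "pull this line low", a 1 means "release". */
uint8_t entrant_drive(uint8_t level, uint8_t bus)
{
    uint8_t out = ARB_IDLE;
    for (int bit = 3; bit >= 0; bit--) {
        unsigned mine = (level >> bit) & 1u;
        unsigned seen = (bus >> bit) & 1u;
        if (mine == 0) {
            out &= (uint8_t)~(1u << bit);  /* our level has a 0 here: pull low */
        } else if (seen == 0) {
            /* we released, someone else pulled low: we lost this bit,
               so leave this and every lower bit released */
            break;
        }
    }
    return out;
}

/* Wired-AND of every entrant's drive, iterated until the lines settle.
   Returns the level left on ARB[3:0], i.e. the winner (0xF if nobody bids). */
uint8_t arbitrate(const uint8_t *levels, int n)
{
    assert(n >= 0);
    uint8_t bus = ARB_IDLE;
    for (int i = 0; i < n; i++)
        bus &= levels[i];  /* first drive: everyone presents its full level */
    for (int pass = 0; pass < 8; pass++) {
        uint8_t next = ARB_IDLE;
        for (int i = 0; i < n; i++)
            next &= entrant_drive(levels[i], bus);
        if (next == bus)
            break;  /* lines have settled */
        bus = next;
    }
    return bus;
}
```

Passing a one‑element array holding 0x1 returns 0x1 on the first settle: with no one else pulling lines low, the single player always wins. With {0x6, 0x5}, the 0x6 entrant sees ARB1 low where it released it, drops out of ARB1 and ARB0, and the lines settle to 0x5.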

Here’s the thing about a tournament with one entrant: the single player always wins. When WonderMCA is the only card asking for the bus, its arbitration is a formality — present level 0001 (DMA channel 1), nobody contests it, the planar grants it the bus, and the CDMA pushes a byte into our test port. That’s the whole game.

If you remember the previous episode, Sound Blaster emulation was working on the Model 50Z and Model 70: unstable, but working. As part of my broader validation, I tested WonderMCA on the Model 56 and 57 SLC2, and it was a complete disaster. The Model 56 and 57 have something in common: their chassis has zero tolerance.

Prince of Persia rebooted the machine as soon as digital audio started…

So when I armed SBDIAG, expecting the Sound Blaster “TADA” to loop forever off chassis DMA, and instead got one TADA and then a frozen POST, I had a card that was losing a tournament it was the only player in. That should be impossible. It took most of a day, three independent bugs, and a trip through the chip’s own silicon manual to understand how you lose a race you can’t lose.

Layer 0: the firmware that quietly killed Core 1

Before the arbitration even mattered, the first symptom was blunter: CHRDY never released, and the bus hung. The MCA bus master asserts a cycle, our CPLD stretches it with CHRDY (a wait‑state), and the RP2350 firmware is supposed to pulse RP_REL to tear the latch down. CHRDY staying asserted means nobody is pulsing RP_REL — i.e. Core 1, which runs the whole bus handler, is dead.
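The handshake reduces to a latch that only RP_REL can clear, which is why a stuck CHRDY points straight at Core 1. The sketch below is a hypothetical model in C (cpld_model, run_cycle and the clock budget are my own illustration, not WonderMCA firmware or CPLD source): with Core 1 alive the cycle is released after a few clocks; with Core 1 dead it never is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CPLD's wait-state latch: set when an MCA cycle
   addresses the card, cleared only by an RP_REL pulse from Core 1.
   While the latch is set, CHRDY holds the cycle in a wait state. */
typedef struct {
    bool wait_latch;
} cpld_model;

void cycle_start(cpld_model *c)  { c->wait_latch = true; }
void rp_rel_pulse(cpld_model *c) { c->wait_latch = false; }

/* Runs one stretched cycle. Core 1 (if alive) pulses RP_REL after
   `service_clks` bus clocks. Returns how many clocks the cycle was held,
   or -1 if it was still held after `budget` clocks: the hung-bus case. */
int run_cycle(cpld_model *c, bool core1_alive, int service_clks, int budget)
{
    assert(budget > 0);
    cycle_start(c);
    for (int clk = 0; clk < budget; clk++) {
        if (core1_alive && clk == service_clks)
            rp_rel_pulse(c);
        if (!c->wait_latch)
            return clk;
    }
    return -1;  /* nobody pulsed RP_REL: CHRDY never released */
}
```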

The give‑away was a regression I’d caused that morning. I’d “optimised” main_core1() by splitting its one‑time init out into a separate __not_in_flash_func(core1_init) to reclaim ~700 bytes of the 4 KB SCRATCH_X budget. Clean idea, green build, scratch_x shrank 2024 → 1312 bytes. And it bricked Core 1: the GPIO coprocessor (CP0) enable, and the fragile single‑translation‑unit assumptions of that hand‑tuned loop, do not survive being chopped into two functions across two memory regions. Pull the card, POST completes; card in, frozen.

Lesson, logged: keep main_core1 a single __scratch_x function. I reverted the split. You don’t refactor the load‑bearing wall to save a picture frame’s worth of bytes. CHRDY released again, and the real DMA debugging could start.

Layer 1: a clock that “needs patching”

With the bus alive, the DMA still didn’t arbitrate. Time for the CPLD. WonderMCA’s arbitration is an ATF1508 design that mirrors a known‑good Verilog reference (mcsb.v). I opened the fitter report — and the fitter was telling me it was unhappy:

## Warning : Placement fail        (fitter pass 1)
## Warning : Placement fail        (fitter pass 2)
RP_DMA_GRANT.C equation needs patching.
MCA_PREEMPT.OE equation needs patching.
3 control equations need patching

RP_DMA_GRANT.C is the clock of the grant flip‑flop, the FF the firmware reads to know it’s been granted the bus. A clock getting “patched” onto a feedback node instead of a clean global‑clock net is exactly the kind of fragility that makes a grant latch unreliable. The source said RP_DMA_GRANT.CK = valid_io;, a combinational product term, not a clock pin. The design comment even confessed it: “Easy revert: change CK back to plain ADL.” So I did: RP_DMA_GRANT.CK = ADL, which sits on GCLK2 (PIN 2). The patch on the grant clock vanished, and the count of control equations needing patching dropped from 3 to 1.

One down. But the boot still froze, and now I could see why on the logic analyzer.

Layer 2: ARB0 held low, forever

The LA told a strange story. With DMA channel 1 selected, the card should present 0001: drive ARB3/2/1 low, release ARB0. Instead:

  • ARB1 = 1 (not driven, although it should be driven to 0 for DMA1), and
  • ARB0 = 0 all the time, even at idle, even with nothing arbitrating.

ARB0 stuck low is catastrophic, and the arbitration math says exactly why. For DMA1 the card’s win condition reduces to a single term:

drop_cum = !MCA_ARB_0          /* card_arb0 = 1, no higher-bit dropout */
arb_won  = !drop_cum & DMA_SESSION = MCA_ARB_0 & DMA_SESSION

The card can only win if ARB0 reads HIGH. It releases ARB0 and trusts the bus terminator to pull it up; a level‑0 contender pulling it low is the only legitimate reason to lose there. With ARB0 nailed low by something, arb_won is 0 on every cycle. The single player loses every hand because the table is rigged.
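One way to expand drop_cum for any level and check that it collapses to !MCA_ARB_0 for DMA1 is sketched below in C. The MSB‑first chain is my assumption about how the cumulative dropout is built (it is consistent with the drop_1/drop_2/drop_3 terms in the .pld equations), not code lifted from the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cumulative dropout, MSB first: the card has dropped out by bit i if it
   released a line (card bit = 1) that reads back low at bit i, or if it
   had already dropped at a higher bit. Returns the final flag (drop_cum). */
bool drop_cum(uint8_t card_arb, uint8_t arb_sampled)
{
    bool drop = false;
    for (int bit = 3; bit >= 0; bit--) {
        bool card_bit = (card_arb >> bit) & 1u;
        bool line_bit = (arb_sampled >> bit) & 1u;
        drop = drop || (card_bit && !line_bit);
    }
    return drop;
}

/* arb_won = !drop_cum & DMA_SESSION */
bool arb_won(uint8_t card_arb, uint8_t arb_sampled, bool dma_session)
{
    return dma_session && !drop_cum(card_arb, arb_sampled);
}
```

For card_arb = 0x1 the card drives ARB3..ARB1 low itself, so only bit 0 can ever set the drop flag, and drop_cum(0x1, s) is true exactly when bit 0 of s is 0. Feed it any sample with ARB0 stuck low and arb_won stays false on every cycle.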

So who’s rigging it? The CPLD declares MCA_ARB_0 as a pure input — it shouldn’t be driving anything. The card‑out test settled it instantly: pull WonderMCA, POST completes. It was us. Our own CPLD was holding ARB0 low and freezing the whole planar’s arbitration.

The cross-check: what the reference actually does

Before “fixing” anything I read the Verilog reference in detail, because the LA and the source disagreed about whether ARB0 should ever be driven. mcsb.v is unambiguous:

inout  [3:0] arb;                                        // bidirectional, all 4
assign arb[i] = (~arb_en | arb_out[i]) ? 1'bZ : 1'b0;    // open-drain
assign arb_out[0] = card_arb[0] | ~arb_match[1] | ~arb_match[2] | ~arb_match[3];
card_arb (DMA1) = 0001;                                  // card_arb[0] = 1

For DMA1, card_arb[0]=1 → arb_out[0]=1 → arb[0] is always released (Hi‑Z). The reference never drives ARB0 on our channels. It also declares the line inout — a real bidirectional open‑drain pin. The .pld matched the logic (ARB0 released) but had quietly made the pin input‑only to dodge a WinCUPL quirk. That mismatch was the thread to pull.
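The reference drive logic can be checked exhaustively. Below, the two quoted mcsb.v lines are transcribed into C. Two details are assumptions on my part: arb_match[i] is read as “the sampled line equals card_arb[i]”, and arb_out[1..3] are assumed to follow the same pattern as the quoted arb_out[0] (own bit OR any mismatch above it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of arb_match[i]: the line as sampled equals our own bit. */
bool arb_match(uint8_t card_arb, uint8_t arb_sampled, int bit)
{
    assert(bit >= 0 && bit <= 3);
    return ((card_arb >> bit) & 1u) == ((arb_sampled >> bit) & 1u);
}

/* arb_out[i] = card_arb[i] | ~arb_match[j] for every j > i (1 = release). */
bool arb_out(uint8_t card_arb, uint8_t arb_sampled, int bit)
{
    bool out = (card_arb >> bit) & 1u;
    for (int j = bit + 1; j <= 3; j++)
        out = out || !arb_match(card_arb, arb_sampled, j);
    return out;
}

/* assign arb[i] = (~arb_en | arb_out[i]) ? 1'bZ : 1'b0;
   Returns true when the pin is actively pulled low, false when Hi-Z. */
bool drives_low(uint8_t card_arb, uint8_t arb_sampled, bool arb_en, int bit)
{
    return arb_en && !arb_out(card_arb, arb_sampled, bit);
}
```

Looping over all 16 sampled states with arb_en both 0 and 1, drives_low(0x1, s, en, 0) is never true: for DMA1 the reference never pulls ARB0, while ARB3..ARB1 are pulled low whenever arbitration is enabled and nothing above has been lost.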

Layer 3: the trap — when “off” means “always on”

Here’s the part that cost the most hours. On this ATF1508 / WinCUPL flow, the open‑drain emulation everyone uses looks like this:

MCA_ARB_0     = 'b'0;
MCA_ARB_0.OE  = participate & !card_arb0 & !drop_1;   /* the OE controls drive */

It traps when the OE folds to a compile‑time constant 0. Because card_arb0 = 1, !card_arb0 = 0 and the whole OE simplifies to 0. The fitter then “optimises” “output‑enable always 0” into “no OE term, output always on” and drives the pin to its value, 0 (low), forever.

The cure the design had reached for, removing the equation entirely so the pin is input‑only, looked right in the fitter report (MC88 57 -- MCA_ARB_0 INPUT), but a stale JED on the part still carried the old driving build. Even MCA_CD_DS16.OE = 'b'0 (my attempt to “disable” DS16) hit the same trap: the fit showed that macrocell on, driving /CD-DS16 low, which forces 16‑bit on. Three different ways to say “don’t drive this pin,” three ways the fitter said “fine, I’ll drive it low.”

This is the moment the whole bug clicked: you cannot disable an ATF1508 output by giving it a constant‑0 enable. The OE trick is a footgun the fitter loads for you.
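A behavioural model makes the trap explicit. In the C sketch below, arb0_oe transcribes the OE equation and oe_is_constant_zero enumerates the run‑time inputs to show it is 0 for every combination when card_arb0 = 1. as_fitted_pin only reproduces the symptom observed on the part; it is an assumption for illustration, not how WinCUPL or the ATF15xx fitter is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PIN_HIZ, PIN_LOW } pin_state;

/* MCA_ARB_0.OE = participate & !card_arb0 & !drop_1; */
bool arb0_oe(bool participate, bool card_arb0, bool drop_1)
{
    return participate && !card_arb0 && !drop_1;
}

/* True if the OE expression is 0 for every value of the run-time inputs,
   i.e. it folds to a compile-time constant 0 for this card_arb0. */
bool oe_is_constant_zero(bool card_arb0)
{
    for (int p = 0; p <= 1; p++)
        for (int d = 0; d <= 1; d++)
            if (arb0_oe(p, card_arb0, d))
                return false;
    return true;
}

/* What the source intended: OE gates the constant 'b'0 onto the pin. */
pin_state intended_pin(bool oe)
{
    return oe ? PIN_LOW : PIN_HIZ;
}

/* Symptom model only: an OE that folds to constant 0 loses its OE term,
   the buffer ends up always enabled, and the pin carries 'b'0 forever. */
pin_state as_fitted_pin(bool oe, bool oe_constant_zero)
{
    if (oe_constant_zero)
        return PIN_LOW;
    return intended_pin(oe);
}
```

For DMA1 the intended pin is always Hi‑Z, while the modelled fitted pin is always low: the same “never drive” source, opposite hardware.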

The fix: use the silicon’s real open-collector mode

The ATF150x macrocells have a hardware open‑collector output — drive low or release, in the buffer itself, no OE product term to collapse. It’s not in the WinCUPL language manual (that only documents .OE); it lives in the ATF15xx Device Fitter User’s Guide, as a fitter strategy you select from CUPL source:

PROPERTY ATMEL { open_collector = MCA_ARB_0, MCA_ARB_1, MCA_ARB_2, MCA_ARB_3, MCA_PREEMPT };

With the pin in real OC mode you stop emulating and just write the bus value — 0 drives low, 1 releases — which is exactly the Verilog:

MCA_ARB_0   = !participate # card_arb0 # drop_1;   /* folds to 1 -> always released, no jam */
MCA_ARB_1   = !participate # card_arb1 # drop_2;
MCA_ARB_2   = !participate # card_arb2 # drop_3;
MCA_ARB_3   = !participate # card_arb3;
MCA_PREEMPT = !preempt_drive;

No .OE anywhere on these pins, so there is no constant‑0 OE for the fitter to invert into a permanent low. ARB0’s equation folds to 1 — and in OC mode 1 means release, not drive high — so it sits Hi‑Z, exactly as the single‑player tournament requires. /CD-DS16, by contrast, is genuinely a totem‑pole output (it must actively drive both 8‑ and 16‑bit states for a fast early sample), so it stays push‑pull and out of the open‑collector list: MCA_CD_DS16 = !ds16_drive;.

The one caveat I left in the source in bold: verify the .fit echoes Open_collector = MCA_ARB_0, …. If WinCUPL silently ignores the directive, those value‑equations become push‑pull outputs driving the open‑drain bus high — bus contention, worse than the bug. The fitter’s own echo is the ground truth.
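The OC fix and the caveat about the .fit echo can both be checked with a one‑line bus model. This C sketch rests on simplifications of my own (a single line, ideal drivers, a terminator pull‑up, no timing): in open‑collector mode an equation value of 1 becomes “release”, while a push‑pull buffer drives 1 high and collides with any other card pulling the line low.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DRV_RELEASE, DRV_LOW, DRV_HIGH } drive_t;
typedef enum { LINE_LOW, LINE_HIGH, LINE_CONTENTION } line_t;

/* Output buffer: in open-collector mode a 1 means release, not drive high. */
drive_t buffer(bool value, bool open_collector)
{
    if (!value)
        return DRV_LOW;
    return open_collector ? DRV_RELEASE : DRV_HIGH;
}

/* Resolve one line: the terminator pulls it high unless something drives it;
   a driver pushing high against one pulling low is contention. */
line_t resolve(const drive_t *drivers, int n)
{
    assert(n >= 0);
    bool any_low = false, any_high = false;
    for (int i = 0; i < n; i++) {
        any_low  = any_low  || drivers[i] == DRV_LOW;
        any_high = any_high || drivers[i] == DRV_HIGH;
    }
    if (any_low && any_high)
        return LINE_CONTENTION;
    return any_low ? LINE_LOW : LINE_HIGH;
}

/* MCA_ARB_0 = !participate # card_arb0 # drop_1; */
bool arb0_value(bool participate, bool card_arb0, bool drop_1)
{
    return !participate || card_arb0 || drop_1;
}
```

With the directive honoured, ARB0 evaluates to 1, the buffer releases, and the line reads high alone or low under a level‑0 contender. With the directive ignored, the same 1 becomes DRV_HIGH and resolve reports LINE_CONTENTION.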

The single player wins again

Rev 72 burned to the CPLD. ARB0 released to the terminator’s HIGH, the card presented 0001, the planar granted it the bus, the CDMA streamed bytes into the SB port — and the TADA looped, continuously, the way a 22 kHz auto‑init DMA transfer is supposed to. The tournament has one entrant again, and the one entrant wins.

What this one taught me

  • A tool’s optimiser is part of your circuit. The bug wasn’t in my logic: drop_cum, arb_out and the levels all matched the reference. It was WinCUPL turning “never drive” into “always drive low.” Read the fitter report like it’s a teammate; it was warning me in plain English for hours.
  • Use the device’s real feature, not an emulation of it. Hardware open‑collector existed the whole time. The ='b'0; .OE=cond pattern is a decades‑old habit that happens to detonate on this part when the condition is constant.
  • The manual you grabbed might be the wrong manual. The open‑collector syntax wasn’t in the WinCUPL language reference at all — it was a fitter strategy in a separate Microchip guide. An afternoon of “it’s not documented” was really “it’s documented elsewhere.”
  • Bisect across layers, not just lines. Three unrelated bugs stacked — a firmware regression, a clock placement, and the OE trap — each masking the next. Card‑out tests, the fitter echo, and the LA each isolated one layer. Guessing would never have peeled them apart.

Different MCA chassis, different fitter quirks, the same lesson WonderMCA keeps teaching: the bus is honest, the silicon is honest, the tools have opinions — and you don’t really understand a system until you understand how it can lose a game it can’t lose.
