
Seedbox and Storage Box Disk Priority

From Pulsed Media Wiki


Pulsed Media runs shared seedbox and storage box hosting on its own hardware in Finland. On a shared host, several customers share the same physical disks — so disk I/O priority decides how much disk bandwidth your account gets when the server is busy. This article explains how Pulsed Media handles that: what determines your share of the disk, how plan tiers map to priority, and — for the technically curious — the kernel-level machinery underneath, including a long-standing upstream issue Pulsed Media works around to make tiered priority actually function.

It is organized top-down: what it means for you first, the deep technical detail second.

What this means for you

Here is what you are actually buying on shared hardware: every megabyte per second the disk can deliver, delivered to whoever needs it, the instant they need it. No idle capacity sitting unused while your download waits its turn. No artificial speed cap throttling you to your "tier" when the disk is free. When you are moving data and nobody else is competing for the same platter, the entire disk is yours — full speed, every time. You are not paying for a slice of the disk; you are paying for first claim on it when it is contested, and the whole thing when it is not.

This works because the scheduler underneath is work-conserving: it never lets the disk sit partly idle while there is work to do. It drives utilization toward 100% constantly, handing the full disk to whatever account is active right now. Priority weights only enter the picture during a genuine traffic jam — several accounts hammering the same platter in the same instant. Then, and only then, is the disk's bandwidth split by tier. The rest of the time — which is most of the time — your tier is irrelevant and you get everything the hardware can give. A higher tier does not get you more speed on a quiet disk (there is no more to give — you already have it all); it gets you a bigger share of the disk during the rare moments it is genuinely fought over.

Sharing physical disks with other accounts is invisible most of the time: when you are the only account doing heavy disk work, you get the full disk. That is the single most important thing to understand:

When the disk is not busy, your tier does not matter. You get full bandwidth regardless of plan.

Tier priority only comes into play during contention — the moments when several accounts hammer the disk at the same instant (a mass re-check after a tracker rebalance, simultaneous imports, cross-seed bursts). During those windows, the disk's bandwidth is divided between the competing accounts in proportion to their priority weight. A higher tier gets a larger slice of that contested bandwidth; a lower tier gets a smaller slice. Once the burst clears, everyone returns to taking as much as they need.

This is the right model for a seedbox: it does not artificially throttle you when the disk is free, and it protects higher-tier customers' responsiveness when the disk is genuinely contested.

Plan tiers and priority

Pulsed Media sells two product families on its shared hosts — active seedboxes and background-class storage boxes — and the plans group into a small set of priority bands. The band a plan sits in reflects two things: whether it is a storage box or an active seedbox, and where it sits in the price ladder.

At a glance, lowest to highest contested-disk priority:

Disk-priority tiers: relative share when the disk is contended

| Tier | Typical plans | Relative priority |
| --- | --- | --- |
| Idle floor | free trials, throwaway promos | 1 |
| Promo base | paid limited-edition promos | 5 |
| Storage | storage boxes, any size | 10 |
| Entry seedbox | cheapest active seedbox | 30 |
| Standard seedbox | mid-catalog seedboxes | 100 |
| Premium seedbox | upper-mid seedboxes | 250 |
| Pro / dedicated | top shared + dedicated-resource | 500 |
| Maximum | top dedicated seedbox | 700 |

The step between adjacent paying tiers is uneven: roughly 2× to 3.3× through most of the ladder (5 → 10 → 30 → 100 → 250 → 500), narrowing to 1.4× for the final step to 700. Stepping up the catalog buys disproportionately more contended-disk priority, not just proportionally more: the top tier carries a 700-to-1 priority ratio against the idle floor during contention.

Storage boxes deliberately sit low. A storage box is for holding data, not for winning a disk-bandwidth race against active seedboxes — so the whole storage line sits flat at a background priority. It still gets the entire disk whenever the disk is uncontested (the work-conserving rule below); the low tier only matters during a genuine simultaneous-contention burst.

Two worked examples make the behavior concrete.

You alone on a quiet disk. You run your torrent client; nobody else is doing heavy disk work. You get effectively 100% of disk bandwidth. Your tier is never consulted, because there is nothing to arbitrate. A free-tier account alone on the disk gets the same throughput as a top-tier account alone on the disk.

Five accounts competing at peak. Suppose five accounts hit the disk simultaneously: two standard-tier (priority 100 each), one premium (250), one pro (500), one entry (30). Total contested weight is 980. During that contention window the pro account gets 500/980 ≈ 51% of bandwidth, the premium account gets 250/980 ≈ 26%, the two standard accounts get 100/980 ≈ 10% each, and the entry account gets 30/980 ≈ 3%. As soon as the burst eases (one account finishes, others go idle), the remaining accounts get more, up to the full disk when alone.

These are time-averaged shares over seconds-to-minutes. Over millisecond windows the scheduler hands each account a contiguous burst of disk service in turn, sized by its weight — this is what keeps a rotational disk from thrashing.
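The arithmetic in both worked examples can be reproduced with a few lines of Python. This is a minimal sketch of the proportional-share rule only, not of the scheduler; the function name `contended_shares` and the account labels are made up for illustration.

```python
def contended_shares(active_weights):
    """Split the disk among the accounts that are active right now.

    active_weights maps an account label to its priority weight.
    Idle accounts are simply left out, which is what makes the
    model work-conserving: one active account always gets 1.0.
    """
    total = sum(active_weights.values())
    return {name: weight / total for name, weight in active_weights.items()}


# Example 1: alone on a quiet disk, tier is irrelevant.
print(contended_shares({"entry": 30}))  # {'entry': 1.0}

# Example 2: five accounts competing at peak (total weight 980).
burst = {"standard-1": 100, "standard-2": 100,
         "premium": 250, "pro": 500, "entry": 30}
for name, share in contended_shares(burst).items():
    print(f"{name}: {share:.1%}")
# standard-1: 10.2%
# standard-2: 10.2%
# premium: 25.5%
# pro: 51.0%
# entry: 3.1%
```

Dropping an account from the dictionary models that account going idle: the remaining shares grow automatically because the total weight shrinks.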

The point that ties both examples together: the disk itself is driven to full utilization in both cases. Alone, you fill it. In contention, the competing accounts fill it between them. The disk is never sitting half-idle waiting on a priority decision — the priority weight only decides whose work fills the disk during the contested window, never whether the disk runs full. You are never paying the cost of wasted capacity; the only thing tiering changes is who wins the bandwidth during a genuine fight for it.

Every Pulsed Media seedbox and storage box runs on exactly this model — full disk speed when the platter is yours, protected priority when it is contested, and never an artificial cap on a quiet disk. See the seedbox range and storage boxes, all on Pulsed Media's own hardware in Finland, from 1Gbps entry plans through 10Gbps performance tiers.

The deep technical detail

Everything below is the "dirt" — the kernel and systemd machinery that implements the priority model described above. Customers do not need it; sysadmins and the curious may want it.

What BFQ is, and why Pulsed Media uses it

The scheduler is BFQ — Budget Fair Queueing — designed by Paolo Valente, in the Linux kernel since 4.12. It is built for rotational HDD storage with many independent tenants sharing one disk.

The alternatives on a modern Linux host are typically mq-deadline (latency-bounded FIFO with light prioritization) and none (a passthrough that defers to hardware). Both suit single-tenant workloads. Neither implements proportional bandwidth sharing across cgroups during contention. Pulsed Media runs multi-tenant seedbox hosts where multiple accounts share rotational drives, so proportional sharing under contention is exactly the primitive needed.

BFQ provides three properties at once:

  • Work-conserving — a single active account gets 100% of disk bandwidth regardless of weight. Weights only enforce ratios when multiple accounts compete at the same instant. (This is the "your tier does not matter on a quiet disk" property from above.)
  • Budget-based dispatching — each cgroup gets a "budget" of sectors to dispatch before yielding to the next. Budget size is proportional to weight. This amortizes the seek cost across a chunk of contiguous-ish requests per cgroup — critical for HDD throughput.
  • Proportional fairness over time-averaged windows — over seconds-to-minutes the bandwidth split converges to the weight ratio. Over milliseconds it does not; instantaneous slicing would seek-thrash the disk.

The trade-off is explicit: BFQ sacrifices strict instant proportionality for amortized throughput. On HDDs this trade is unambiguously correct. On NVMe SSDs the budget mechanism matters less — no seek cost — so weight ratio becomes a more direct bandwidth ratio.
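The difference between instantaneous and time-averaged shares can be illustrated with a toy model. The sketch below follows this article's description (accounts served in turn, each turn sized by weight); it is not the kernel's actual scheduling algorithm, and the names `simulate_turns`, `window_shares` and `sectors_per_weight_unit` are invented for the example.

```python
from collections import Counter


def simulate_turns(weights, rounds, sectors_per_weight_unit=8):
    """Toy dispatch timeline: each account gets one contiguous turn per round,
    with a budget (in sectors) proportional to its weight."""
    timeline = []
    for _ in range(rounds):
        for account, weight in weights.items():
            timeline.append((account, weight * sectors_per_weight_unit))
    return timeline


def window_shares(timeline):
    """Fraction of sectors served to each account within a slice of the timeline."""
    served = Counter()
    for account, sectors in timeline:
        served[account] += sectors
    total = sum(served.values())
    return {account: sectors / total for account, sectors in served.items()}


timeline = simulate_turns({"A": 100, "B": 400}, rounds=50)

# A very short window sees only one account: no proportionality at all.
print(window_shares(timeline[:1]))   # {'A': 1.0}

# Over the whole run the split matches the 1:4 weight ratio.
print(window_shares(timeline))       # {'A': 0.2, 'B': 0.8}
```

The contiguous turns are the point: on a rotational disk, serving one account's requests in a run avoids seeking back and forth between tenants on every request.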

Weight semantics: the kernel accepts 1 to 1000

Each cgroup carries a single integer weight in blkio.bfq.weight (cgroup v1) or io.bfq.weight (cgroup v2). The kernel accepts 1 to 1000, default 100.

The weight is a proportional-sharing coefficient. During contention, a cgroup with weight W gets a share of disk bandwidth proportional to W divided by the sum of weights across all actively-competing cgroups. If account A (weight 100) and account B (weight 400) are the only two competing, B gets 400/(100+400) = 80% and A gets 20%.

Three properties matter:

  1. A high weight does not guarantee high bandwidth. It guarantees a high share if and only if there is contention. An idle high-weight account gets nothing extra.
  2. A low weight does not starve an account when the disk is uncontested.
  3. The relative ratio of weights matters, not absolute values. Weights 100 and 400 behave identically to 200 and 800: the same 1:4 split.
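Written as a formula, the sharing rule and its three properties look like this. The symbols are introduced here for illustration: $A$ is the set of cgroups actively competing at a given time, $W_i$ is the weight of cgroup $i$, and $s_i$ is its long-run share of disk bandwidth.

```latex
s_i = \frac{W_i}{\sum_{j \in A} W_j} \quad (i \in A),
\qquad
A = \{i\} \;\Rightarrow\; s_i = 1,
\qquad
\frac{k\,W_i}{\sum_{j \in A} k\,W_j} = s_i \quad \text{for any } k > 0 .
```

The middle expression is the uncontested case (a lone cgroup gets everything regardless of weight), and the last one shows why only the ratio of weights matters.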

The systemd v252 cgroup-v1 translation chain

The kernel exposes the full 1–1000 range directly. But when systemd writes the weight (via the unit property IOWeight=N, any integer 1–10000), it goes through a translation step first.

On cgroup-v2 hosts, systemd writes io.bfq.weight directly via its set_bfq_weight() function, mapping the 1–10000 input onto the 1–1000 kernel range with its BFQ_WEIGHT() macro.

On cgroup-v1 hybrid hosts, systemd writes blkio.bfq.weight instead — and on this path, v252's source code chains two transformations:

  1. cgroup_weight_io_to_blkio() first maps the systemd IOWeight (1–10000) onto the legacy CFQ-era blkio.weight space (10–1000) using IOWeight × 5, clamped to [10, 1000].
  2. BFQ_WEIGHT() then maps that 10–1000 value onto the kernel blkio.bfq.weight with a piecewise linear curve.

The first transformation saturates at IOWeight = 200. Once IOWeight × 5 ≥ 1000, the clamp kicks in and the input to BFQ_WEIGHT() becomes constant 1000 regardless of the original IOWeight. BFQ_WEIGHT(1000) = 181, so every value with systemd IOWeight ≥ 200 lands at kernel bfq.weight = 181. Below 200, the chain produces a compressed mapping: IOWeight = 100 → kernel 136, IOWeight = 50 → kernel 113.

This was verified empirically by writing values via systemctl set-property and reading back blkio.bfq.weight from the kernel cgroup file. Every IOWeight from 200 upward produced the same kernel value: 181. Values below 200 produced a narrow band of 50–181.

The practical implication: any two systemd IOWeight values at or above 200 (say 3000 and 10000) produce the same kernel bfq.weight = 181 at the layer that actually controls bandwidth. Differentiation expressed through the systemd IOWeight setting vanishes at the kernel. This is precisely why a direct-write approach is needed to deliver tiered priority on cgroup-v1 hosts.
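The saturation is easy to reproduce on paper. The following Python sketch models the two-step chain as described above; it is not systemd's code. The function and constant names are illustrative, and the exact piecewise-linear formula in `bfq_weight` is an assumption chosen because it reproduces the values quoted in this section (181, 136, 113) using integer arithmetic.

```python
# Ranges as stated in the text; the names are illustrative only.
IO_MIN, IO_DEFAULT, IO_MAX = 1, 100, 10000        # systemd IOWeight
BLKIO_MIN, BLKIO_MAX = 10, 1000                   # legacy blkio.weight space
BFQ_MIN, BFQ_DEFAULT, BFQ_MAX = 1, 100, 1000      # kernel bfq.weight


def io_to_blkio(io_weight):
    """Step 1: IOWeight x 5, clamped to [10, 1000]."""
    return max(BLKIO_MIN, min(BLKIO_MAX, io_weight * 5))


def bfq_weight(weight):
    """Step 2 (assumed formula): piecewise-linear map of a 1..10000 input
    onto 1..1000, with 100 mapping to 100. Integer division throughout."""
    if weight <= IO_DEFAULT:
        return BFQ_DEFAULT - (IO_DEFAULT - weight) * (BFQ_DEFAULT - BFQ_MIN) // (IO_DEFAULT - IO_MIN)
    return BFQ_DEFAULT + (weight - IO_DEFAULT) * (BFQ_MAX - BFQ_DEFAULT) // (IO_MAX - IO_DEFAULT)


def v1_chain(io_weight):
    """cgroup-v1 path: both transformations chained."""
    return bfq_weight(io_to_blkio(io_weight))


for io_weight in (50, 100, 200, 3000, 10000):
    print(io_weight, "-> v1 chain:", v1_chain(io_weight),
          "| single mapping:", bfq_weight(io_weight))
# 50 -> v1 chain: 113 | single mapping: 50
# 100 -> v1 chain: 136 | single mapping: 100
# 200 -> v1 chain: 181 | single mapping: 109
# 3000 -> v1 chain: 181 | single mapping: 363
# 10000 -> v1 chain: 181 | single mapping: 1000
```

The "single mapping" column applies only the second step, which is how the cgroup-v2 path is described; it keeps large IOWeight values distinct, while the chained v1 path collapses them all to 181.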

The maintainers acknowledge this

This is not a misunderstanding. The systemd maintainers describe the cgroup-v1 BFQ situation in their own words. From systemd PR #20522 (August 2021), Lennart Poettering:

urks, bfq. we should have never merged support for that mess. systemd should not be the place to work around kernel politics. And it sucks we always make two writes to these files, of which one then fails.

From systemd issue #21187 (November 2021), Poettering again:

Honestly, I'd just drop the whole bfq stuff. It's clearly not ready if you ask me, I am really not sure why we bother at all.

In the same thread, Zbigniew Jędrzejewski-Szmek:

Yeah, it would have been wise to not step into this mess in the first place. But I think it's too late now: people are using bfq, and if we rip out support, people will be unhappy.

The exact bandwidth-mis-distribution issue (systemd issue #27622) has been open since May 2023 with no fix queued. An alternative clamp-only proposal was not merged; the maintainers kept the current chained scaling, which they describe as preferring a uniform API. The translation chain is a deliberate upstream design, not a Debian-specific patch.

Pulsed Media's bypass: direct kernel write with self-heal

The kernel accepts 1–1000 written directly to blkio.bfq.weight on cgroup-v1 hosts. PMSS deploys a small cron job that reads each account's intended weight from its per-user configuration, computes the desired kernel value, and writes it directly to the cgroup file — bypassing the systemd chain entirely.

The mechanism is /scripts/cron/cgroupBfqWeightApply.php in the public PMSS repository, GPL-3.0. Anyone running Debian 12 with systemd 252+ and cgroup-v1 hybrid mode can read or adapt it.

Behavior:

  • Runs every five minutes via the PMSS root cron, plus once after reboot.
  • Reads each per-user configuration and computes the target weight.
  • Writes the target to /sys/fs/cgroup/blkio/user.slice/user-N.slice/blkio.bfq.weight only when the current kernel value differs. Idempotent — no writes on no-op cycles.
  • Self-heals systemd overwrite events. Several systemd operations (daemon-reload, systemctl set-property, user-slice reactivation) re-apply the chain-derived value. The cron catches the overwrite within at most five minutes and restores the intended value.
  • Falls back to a deterministic formula when no explicit weight is set: round(3.535 × √ramMiB) clamped to [1, 700], a smooth curve from small accounts up through larger ones.

The five-minute cycle balances catching overwrites quickly against not generating unnecessary writes. In practice, systemd overwrite events are uncommon — most cycles write nothing.
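The two core ideas in that behavior list, the fallback formula and the write-only-when-different rule, fit in a short sketch. The code below is a hypothetical Python illustration, not the PMSS script (which is PHP); the function names are invented, and the rounding mode of the real script is assumed to be round-half-up.

```python
import math
from pathlib import Path


def fallback_weight(ram_mib):
    """Fallback formula from the text: round(3.535 * sqrt(ramMiB)), clamped to [1, 700].
    floor(x + 0.5) is used for rounding; the real script's rounding mode is an assumption."""
    raw = math.floor(3.535 * math.sqrt(ram_mib) + 0.5)
    return max(1, min(700, raw))


def apply_weight(weight_file, target):
    """Write target to the cgroup weight file only if the current value differs.
    Returns True when a write happened, False on a no-op cycle."""
    path = Path(weight_file)
    # Take the last token so a prefixed form such as "default 100" also parses.
    current = int(path.read_text().split()[-1])
    if current == target:
        return False
    path.write_text(f"{target}\n")
    return True


print(fallback_weight(1024))    # 113
print(fallback_weight(4096))    # 226
print(fallback_weight(65536))   # 700 (clamped)
```

Run on a timer, `apply_weight` is what gives the self-heal property: if something resets the file to the chain-derived value, the next cycle sees the mismatch and restores the target; otherwise it touches nothing.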

Cgroup v1 versus v2

The translation chain is specific to cgroup-v1 hybrid mode. On cgroup-v2 unified hosts, systemd writes io.bfq.weight directly via set_bfq_weight(), which does not chain the two transformations — so the saturation-at-200 behavior does not occur there.

So why run cgroup-v1 at all, when Debian 12 defaults to cgroup-v2? It is a deliberate choice in service of a customer feature. Pulsed Media forces the legacy hierarchy (systemd.unified_cgroup_hierarchy=0 on the kernel command line) so that rootless Docker keeps working on customer seedboxes — rootless Docker on current kernels needs the v1 hierarchy and must coexist with the process-isolation hardening Pulsed Media runs on every host. That single decision — keep rootless Docker available to customers — is what places the whole fleet on the v1 path, and the v1 path is the one carrying the broken systemd BFQ translation chain. The direct-write cron is the price of that trade: it neutralizes the chain so customers get both rootless Docker and correct tiered disk priority.

Pulsed Media's current production hosts therefore run cgroup-v1 hybrid mode. The direct-write cron delivers correct kernel weights on that configuration at no cost to running services. Where a host runs cgroup-v2 unified mode, systemd already writes the correct value and the cron simply finds nothing to change on each cycle.
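To find out which of these modes a given host is in, one common heuristic is to look at the layout under /sys/fs/cgroup. The sketch below assumes the conventional mount points (a cgroup.controllers file at the root on unified hosts, a separate unified/ mount on hybrid hosts); the function name is made up, and a file-layout check like this is a convenience, not an authoritative test.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup setup from the directory layout (heuristic).

    unified: cgroup v2 mounted directly at the root
    hybrid:  v1 controller hierarchies plus a v2 mount at <root>/unified
    legacy:  v1 hierarchies only
    """
    base = Path(root)
    if (base / "cgroup.controllers").exists():
        return "unified"
    if (base / "unified" / "cgroup.controllers").exists():
        return "hybrid"
    return "legacy"


if __name__ == "__main__":
    print(cgroup_mode())
```

On a host reporting "hybrid" or "legacy", the weight file to inspect is blkio.bfq.weight; on "unified" it is io.bfq.weight.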

Understanding the weight on a host

The value that determines a cgroup's proportional disk-bandwidth share lives in blkio.bfq.weight under the account's slice. Remember the work-conserving rule: this weight only affects bandwidth during contention. When a disk is uncontested, every account, regardless of weight, gets full bandwidth. Different tiers carrying different weights is by design and changes nothing about throughput when you are the only active workload on the disk.
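For a quick look at what the kernel currently holds, the weight files can be read directly. This is a minimal sketch assuming the cgroup-v1 path layout quoted earlier in this article (user.slice/user-N.slice under the blkio hierarchy); the function name is hypothetical.

```python
from pathlib import Path


def user_slice_weights(blkio_root="/sys/fs/cgroup/blkio"):
    """Return {uid: weight} for every user slice that has a blkio.bfq.weight file."""
    weights = {}
    pattern = "user.slice/user-*.slice/blkio.bfq.weight"
    for weight_file in sorted(Path(blkio_root).glob(pattern)):
        slice_name = weight_file.parent.name            # e.g. "user-1000.slice"
        uid = slice_name[len("user-"):-len(".slice")]   # e.g. "1000"
        # Last token, so a prefixed form such as "default 100" also parses.
        weights[uid] = int(weight_file.read_text().split()[-1])
    return weights


if __name__ == "__main__":
    for uid, weight in user_slice_weights().items():
        print(f"uid {uid}: bfq weight {weight}")
```

If every slice reports the same value (for example 181) despite different configured tiers, that is the signature of the translation chain described above.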

For sysadmins running similar Debian 12 / systemd 252 / cgroup-v1 setups: the same translation chain affects you. The PMSS direct-write script is public and licensed under GPL-3.0; its core logic is around 100 lines of PHP and ports readily to bash or any other language.

References