MKDS Overlay System — Extensible Overlays for Mario Kart DS¶
Composable overlay framework for the DeSmuME renderer.
Compute geometry from live emulator memory, project to screen space, and enqueue drawing primitives for a GTK/Cairo render loop. Easily add your own overlays by writing a single overlay(emu, device) function.
- Focus: Overlays (utils/overlay.py) and drawing stack (utils/draw.py)
- Renderer example: main.py (GTK window & render loop integration)
Overview¶
The overlay system is designed to be highly extensible: write small, pure functions that read live game state from the emulator, project points to screen space (256×192), and enqueue Cairo drawing operations. The GTK renderer consumes those operations on each draw event, layering them over the emulator frame.
Core pieces:
- utils/overlay.py: A set of example overlays showing best practices.
- utils/draw.py: A composable draw queue built on a Cairo Context, plus a decorator to enqueue draw calls from any thread.
- main.py: A concrete GTK + DeSmuME render loop showing how to plug in overlays.
How It Works¶
Execution Flow¶
1. Emulator tick: emu.cycle() advances the game by one frame.
2. Overlay evaluation (worker thread): Each overlay (overlay(emu, device)) reads memory, computes geometry, projects to screen, and enqueues draw ops via draw_* helpers.
3. GTK draw event (UI thread): on_draw_main calls consume_draw_stack(ctx), which dequeues closures and draws them onto a transparent overlay surface. That surface is composited over the emulator screen.
Threading Model¶
- Overlays typically run in a worker thread (see worker() and tick() in main.py).
- draw_queue is a thread-safe queue.Queue; decorated draw_* functions push closures that are executed later in the GTK thread with a Cairo Context.
- The GTK draw callback maintains an overlay surface cache to reuse the last overlay frame when no new draw ops were enqueued (reduces flicker and work).
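The handoff between the two threads can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the module-level draw_queue mirrors the description above, while enqueue_point, worker, and drain are hypothetical names, and a plain list stands in for the Cairo Context.

```python
import queue
import threading

# Stand-in for the draw queue described above (assumed to be a queue.Queue).
draw_queue = queue.Queue()


def enqueue_point(x, y):
    """Worker side: push a closure instead of drawing immediately."""
    draw_queue.put(lambda ctx: ctx.append(("point", x, y)))


def worker(frames):
    """Simulated overlay worker: enqueue one op per frame."""
    for frame in range(frames):
        enqueue_point(frame, frame * 2)


def drain(ctx):
    """UI side: run every pending closure with the drawing context."""
    executed = 0
    while True:
        try:
            op = draw_queue.get_nowait()
        except queue.Empty:
            return executed
        op(ctx)
        executed += 1


thread = threading.Thread(target=worker, args=(3,))
thread.start()
thread.join()  # only for this demo; a UI thread would not wait like this

recorded = []  # a plain list stands in for a Cairo Context
count = drain(recorded)
```

Because the worker only ever calls put and the UI side only ever calls get_nowait, neither thread needs its own lock; queue.Queue handles the synchronization.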
Getting Started¶
from utils.overlay import (
    collision_overlay,
    checkpoint_overlay_1, checkpoint_overlay_2,
    player_overlay, raycasting_overlay, camera_overlay,
    stats_overlay,
)

# In your run loop, pass a list of overlays:
run_emulator(
    generate_trainer,
    [
        collision_overlay,
        checkpoint_overlay_1,
        checkpoint_overlay_2,
        # player_overlay,
        # raycasting_overlay,
        # camera_overlay,
        # stats_overlay,  # example: queue text/numbers
    ],
)
Each overlay is called as overlay(emu, device) and should only enqueue draw ops. It must not block.
Coordinate System & Screen Space¶
- World space follows MKDS conventions (Y-up).
- Screen space is 256×192 pixels; origin at top-left (X→right, Y→down).
- Use project_to_screen(emu, points, device) to map (N,3) world coordinates → (N,4) screen data [x_px, y_px, clip_z, depth_norm].
- Use z_clip_mask(screen_tensor) to cull points outside the near/far planes; it keeps points where (clip_z < -Z_NEAR) & (clip_z > -Z_FAR).
- Most overlays pass depth as the third channel of the draw_points/draw_lines buffers, where the primitive uses it as a scale/depth value (see below).
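The clip test and the channel reshuffle can be illustrated without tensors. The sketch below is a plain-Python analogue, not the project's implementation: the Z_NEAR and Z_FAR values are placeholders, and z_clip_keep and to_draw_points are hypothetical helper names. The sign convention follows the mask expression above, which suggests visible points have negative clip_z.

```python
# Placeholder plane distances; the real constants live in the project.
Z_NEAR = 1.0
Z_FAR = 1000.0


def z_clip_keep(clip_z):
    """True when clip_z lies between the near and far planes."""
    return (clip_z < -Z_NEAR) and (clip_z > -Z_FAR)


def to_draw_points(screen_rows):
    """Drop clipped rows of [x_px, y_px, clip_z, depth_norm]
    and emit [x_px, y_px, depth_norm] for the survivors."""
    return [
        [x, y, depth]
        for x, y, clip_z, depth in screen_rows
        if z_clip_keep(clip_z)
    ]


rows = [
    [128.0, 96.0, -50.0, 0.4],    # between the planes: kept
    [10.0, 20.0, 5.0, 0.9],       # positive clip_z: culled
    [200.0, 40.0, -5000.0, 0.1],  # beyond the far plane: culled
]
visible = to_draw_points(rows)
```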
Overlay API (Built-ins)¶
All overlays follow the signature: overlay(emu: DeSmuME, device=None) -> None.
collision_overlay¶
Purpose: Draw edges of collision triangles around the player, colored by attribute (walls and offroad types).
Flow:
1. Load the KCL (load_current_kcl) and the player position.
2. indices = kcl.search_triangles(position) finds nearby triangle indices (octree leaf).
3. Filter by attributes:
   - is_wall → magenta (1, 0, 1)
   - collision_type in {2,3,5} (offroad) → pink (1, 0, 0.3)
4. Project triangle vertices v1, v2, v3 to screen; z-clip, then emit edges via draw_triangles.
Notes:
- The third component passed to draw is the normalized depth (used for stylized scaling).
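The attribute filter in the flow above amounts to a small color lookup. The sketch below assumes that walls take precedence over offroad types and that unmatched triangles are skipped; both are assumptions, and edge_color plus the dictionary records are hypothetical stand-ins for the real KCL fields.

```python
MAGENTA = (1.0, 0.0, 1.0)  # walls, per the flow above
PINK = (1.0, 0.0, 0.3)     # offroad, per the flow above
OFFROAD_TYPES = {2, 3, 5}


def edge_color(is_wall, collision_type):
    """Return the RGB color for a triangle, or None to skip it.

    Checking is_wall first is an assumption about precedence.
    """
    if is_wall:
        return MAGENTA
    if collision_type in OFFROAD_TYPES:
        return PINK
    return None


# Hypothetical triangle records; the real KCL fields may be named differently.
triangles = [
    {"is_wall": True, "collision_type": 0},
    {"is_wall": False, "collision_type": 3},
    {"is_wall": False, "collision_type": 1},
]
colored = [
    (index, edge_color(tri["is_wall"], tri["collision_type"]))
    for index, tri in enumerate(triangles)
]
drawn = [(index, color) for index, color in colored if color is not None]
```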
raycasting_overlay¶
Purpose: Demonstrate dynamic raycasting to the nearest obstacle in front of the kart, smoothing the intersection point over time.
Flow:
1. Read the player position.
2. read_facing_point_obstacle samples a cone of rays to find the nearest hit (walls/offroad).
3. Smooth a global current_point toward the latest hit via interpolate (0.1 factor).
4. Compute forward/left/right distances for logging or additional overlays.
Notes:
- This overlay does not draw by itself; it prepares values for other overlays or logs. Add your own draw_points call if desired.
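The smoothing step is most likely a linear interpolation toward the latest hit. The sketch below shows that idea with a 0.1 factor; lerp_toward is a hypothetical stand-in, and the project's interpolate helper may have a different signature or operate on tensors.

```python
def lerp_toward(current, target, factor=0.1):
    """Move each coordinate a fixed fraction of the way toward the target."""
    return tuple(c + factor * (t - c) for c, t in zip(current, target))


current_point = (0.0, 0.0, 0.0)
latest_hit = (10.0, 0.0, -20.0)

# One frame of smoothing.
first_step = lerp_toward(current_point, latest_hit)

# Repeated frames converge on the hit; the remaining error shrinks
# by a factor of 0.9 per frame.
smoothed = current_point
for _ in range(50):
    smoothed = lerp_toward(smoothed, latest_hit)
```

A small factor trades responsiveness for stability: the marker lags behind sudden changes in the hit point but does not jitter from frame to frame.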
camera_overlay¶
Purpose: Visualize the current camera target position.
Flow:
1. Read the target via read_camera_target_position.
2. Project to screen and plot a red point via draw_points (larger radius).
checkpoint_overlay_1¶
Purpose: Draw a line segment between the two endpoints of the next checkpoint, aligned to the player’s current Y (so it’s visible in 2D screen space).
Flow:
1. read_next_checkpoint_position → (2,3) endpoints.
2. Replace the endpoints' Y with the player's Y (visual flattening).
3. Project to screen and z-clip; if only one endpoint remains, draw a single green dot.
4. Otherwise, concat [x, y, depth_norm, clip_z] and emit via draw_lines (green).
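Two small decisions drive this overlay: flattening the endpoints to the player's height, and choosing a primitive based on how many endpoints survive clipping. The sketch below isolates both; flatten_to_player_y and pick_primitive are hypothetical names, and the case where no endpoint survives is assumed to draw nothing.

```python
def flatten_to_player_y(endpoints, player_y):
    """Replace each endpoint's Y with the player's Y, keeping X and Z."""
    return [(x, player_y, z) for x, _, z in endpoints]


def pick_primitive(visible_count):
    """Choose what to draw from the number of endpoints left after z-clipping."""
    if visible_count >= 2:
        return "line"
    if visible_count == 1:
        return "dot"
    return None  # assumption: nothing is drawn when both endpoints are clipped


checkpoint = [(100.0, 35.0, -40.0), (160.0, 12.0, -42.0)]
flat = flatten_to_player_y(checkpoint, player_y=20.0)
```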
checkpoint_overlay_2¶
Purpose: Draw a ray from the kart to its intersection with the next checkpoint boundary.
Flow:
1. read_direction and read_facing_point_checkpoint → intersection point.
2. Project the intersection to screen, z-clip, and draw a green point.
3. Project the current player position, normalize both to a small Z for visibility, and draw a blue line from kart → intersection.
player_overlay¶
Purpose: Scatter-plot visible objects by category: map_objects, racer_objects, item_objects, dynamic_objects.
Flow:
1. read_objects() groups object IDs by flags.
2. For each group, collect positions with read_object_position (skips deleted/ignored via safe_object).
3. Project to screen, z-clip, and draw points in the group color.
Notes:
- Uses four palette colors; each color is broadcast to all points in its group.
stats_overlay¶
Purpose: Example overlay that reads stats without drawing (clock, forward distances). In the GTK example, textual stats are drawn in the on_draw_main callback using draw_paragraph directly (outside the queue).
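Suggested: To keep everything in the queue, convert this into a drawing overlay that formats the stats and enqueues the text. Because draw_paragraph is an immediate helper, that means wrapping it (or a similar text helper) with @draw_stack_op first.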
Drawing Primitives & Queue¶
All drawing helpers live in utils/draw.py. They do not draw immediately. Instead, they’re decorated with @draw_stack_op, so calling them enqueues a closure that will be executed with a Cairo Context in the GTK draw thread.
draw_points¶
draw_points(pts: np.ndarray, colors: np.ndarray, radius_scale: float | np.ndarray)
Inputs:
- pts: shape (N, 3) or (3,) → [x_px, y_px, z]. z acts as a scale factor for the dot radius: radius = radius_scale * z
- colors: shape (N, 3) or (3,) in [0,1] RGB; a single color is broadcast to all points.
- radius_scale: typically a float. (Per-point arrays are not currently indexed per row.)
Behavior: Fills a disk at each point.
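The input handling described above (single point or batch, single color broadcast, radius scaled by z) can be mimicked in plain Python. This is only a sketch of the documented behavior using lists instead of NumPy arrays; expand_points is a hypothetical helper and performs no Cairo drawing.

```python
def expand_points(pts, colors, radius_scale):
    """Normalize inputs and return one (x, y, radius, color) tuple per point."""
    # A single [x, y, z] point is promoted to a batch of one.
    if not isinstance(pts[0], (list, tuple)):
        pts = [pts]
    # A single RGB color is broadcast to every point.
    if not isinstance(colors[0], (list, tuple)):
        colors = [colors] * len(pts)
    return [
        (x, y, radius_scale * z, tuple(color))
        for (x, y, z), color in zip(pts, colors)
    ]


disks = expand_points(
    [[10, 20, 0.5], [30, 40, 1.0]],
    (0.0, 1.0, 0.0),
    radius_scale=6.0,
)
single = expand_points([5, 5, 1.0], (1.0, 0.0, 0.0), radius_scale=2.0)
```

Passing z = 1.0 for every point gives fixed-size markers whose radius equals radius_scale.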
draw_lines¶
draw_lines(pts1: np.ndarray, pts2: np.ndarray, colors: np.ndarray, stroke_width_scale=1.0)
Inputs:
- pts1, pts2: shapes (N,3) or (3,). Only X,Y are used for line endpoints.
- colors: (N,3) or (3,) RGB; a single color broadcasts.
- stroke_width_scale: Cairo line width.
Behavior: Draws segments pts1[i] → pts2[i] per batch entry.
draw_triangles¶
draw_triangles(pts1: np.ndarray, pts2: np.ndarray, pts3: np.ndarray, colors: np.ndarray)
Inputs: Three point arrays, shape (N,3) each. colors is (3,) or (N,3) RGB.
Behavior: Renders triangle edges by internally enqueuing three draw_lines calls (one per edge).
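Decomposing triangles into three line batches can be sketched as follows. The edge order (v1→v2, v2→v3, v3→v1) is an assumption, and triangle_edges is a hypothetical helper that returns the batches rather than enqueuing them.

```python
def triangle_edges(pts1, pts2, pts3):
    """Return three (start, end) batches, one per triangle edge."""
    return [(pts1, pts2), (pts2, pts3), (pts3, pts1)]


# One triangle, given as three batches of [x_px, y_px, depth].
v1 = [(0, 0, 1.0)]
v2 = [(10, 0, 1.0)]
v3 = [(0, 10, 1.0)]

batches = triangle_edges(v1, v2, v3)

# Flatten to 2D segments, since only X and Y matter for line endpoints.
segments = [
    (start[i][:2], end[i][:2])
    for start, end in batches
    for i in range(len(start))
]
```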
draw_text¶
Immediate text drawing helper (not queued).
draw_text(text, pos=(x,y), color=(r,g,b), alpha=1.0, font_size=12, font_family="Sans")
Use directly in a Cairo context (e.g., from on_draw_main) for HUD-like text that you want to render every frame.
draw_paragraph¶
Immediate block text (multiple lines).
draw_paragraph(text, pos=(x,y), color, alpha, font_size, vertical_spacing, font_family)
consume_draw_stack¶
Consume and execute up to max_items enqueued draw operations.
consume_draw_stack(ctx: cairo.Context, max_items: int | None = None) -> int
Returns the number of ops executed.
Use from the GTK draw callback to render overlays on a transparent surface.
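A bounded drain loop matching that description might look like the sketch below. It is an illustration under stated assumptions rather than the code in utils/draw.py: consume_sketch is a hypothetical name, the queue is a local stand-in, and max_items=None is taken to mean "drain everything".

```python
import queue

draw_queue = queue.Queue()  # stand-in for the project's draw queue


def consume_sketch(ctx, max_items=None):
    """Run up to max_items pending ops with ctx; return how many ran."""
    executed = 0
    while max_items is None or executed < max_items:
        try:
            op = draw_queue.get_nowait()
        except queue.Empty:
            break
        op(ctx)
        executed += 1
    return executed


# Enqueue five ops; i=i binds the loop value at definition time.
for i in range(5):
    draw_queue.put(lambda ctx, i=i: ctx.append(i))

ctx = []  # a plain list stands in for a Cairo Context
first = consume_sketch(ctx, max_items=2)
rest = consume_sketch(ctx)
idle = consume_sketch(ctx)
```

A return value of 0 is what lets the caller detect that nothing new was drawn on this frame.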
Decorator: draw_stack_op¶
@draw_stack_op
def draw_points(ctx: Context, ...):
    ...
- Wraps a function of shape (ctx, *args) into an enqueuer that accepts *args now and pushes a closure to call the original with a Cairo Context later.
- This decouples CPU-side geometry work from UI-thread rendering.
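One way such a decorator could be written is sketched below. It follows the description above but is not taken from utils/draw.py; keyword-argument forwarding and the draw_label example function are assumptions added for illustration.

```python
import functools
import queue

draw_queue = queue.Queue()  # stand-in for the project's draw queue


def draw_stack_op(fn):
    """Turn fn(ctx, *args) into a function that enqueues the call."""

    @functools.wraps(fn)
    def enqueue(*args, **kwargs):
        # Capture the arguments now; the context is supplied at draw time.
        draw_queue.put(lambda ctx: fn(ctx, *args, **kwargs))

    return enqueue


@draw_stack_op
def draw_label(ctx, text, pos):
    """Hypothetical primitive: record the label on the context."""
    ctx.append((text, pos))


draw_label("lap 1", (4, 12))   # returns immediately, nothing drawn yet
pending = draw_queue.qsize()

ctx = []                       # a plain list stands in for a Cairo Context
draw_queue.get_nowait()(ctx)   # the UI thread runs the closure later
```

Note that callers never pass ctx themselves; the decorated function's public signature is the original one minus its first parameter.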
Creating Your Own Overlay¶
An overlay is a simple function:
import numpy as np
import torch

from utils.draw import draw_points
from utils.memory import read_position, project_to_screen, z_clip_mask


def my_overlay(emu, device=None):
    # 1) Read from emulator
    pos = read_position(emu, device=device).unsqueeze(0)  # (1,3)
    # 2) Project to screen
    sp = project_to_screen(emu, pos, device=device)  # (1,4) [x,y,clip_z,depth_norm]
    # 3) Depth cull
    mask = z_clip_mask(sp)
    sp = sp[mask]
    if sp.shape[0] == 0:
        return
    # 4) Prepare np arrays (x, y, depth_norm) and color
    pts = torch.cat([sp[:, :2], sp[:, 3, None]], dim=-1).cpu().numpy()
    color = np.array([0.2, 0.8, 0.6])
    # 5) Enqueue draw op
    draw_points(pts, colors=color, radius_scale=6.0)
Tips:
- Never block in overlays. Do reads/computation; enqueue and return.
- Convert to NumPy right before enqueuing (.detach().cpu().numpy()).
- Use z_clip_mask after projection to avoid drawing behind-camera artifacts.
- To draw edges, prepare two (N,3) arrays (pts1, pts2) and call draw_lines.
Integration Example (GTK)¶
main.py shows a complete integration with GTK and DeSmuME:
- Creates an EmulatorWindow with a Gtk.DrawingArea scaled by SCALE.
- On each draw:
  - Renders the emulator frame into the Cairo target.
  - Creates a transparent overlay surface, clears it, then calls consume_draw_stack(overlay_ctx).
  - Falls back to the cached overlay surface when no draws occurred (saves work).
  - Composites the overlay over the emulator frame via cairo.OPERATOR_OVER.
  - Optionally draws HUD text via draw_paragraph (immediate helper).
- A timed callback (GLib.timeout_add(16, tick)) advances the emulator at ~60 FPS, pushes the current emu into the worker queue, and schedules a redraw.
- The worker thread pulls emu and runs each overlay in order, enqueuing draw ops via the queue-safe decorators.
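The surface cache decision in the draw callback can be reduced to a few lines. The sketch below is a guess at the logic based on the description above, not a copy of main.py: OverlayCache and resolve are hypothetical names, and strings stand in for Cairo surfaces.

```python
class OverlayCache:
    """Remember the last overlay surface that actually received draw ops."""

    def __init__(self):
        self.surface = None

    def resolve(self, fresh_surface, ops_executed):
        """Return the surface to composite for this draw event."""
        if ops_executed > 0 or self.surface is None:
            # New content arrived (or nothing is cached yet): keep it.
            self.surface = fresh_surface
        # Otherwise reuse the previous frame's overlay.
        return self.surface


cache = OverlayCache()
shown_1 = cache.resolve("surface A", ops_executed=3)  # fresh draws
shown_2 = cache.resolve("surface B", ops_executed=0)  # nothing new: reuse A
shown_3 = cache.resolve("surface C", ops_executed=1)  # fresh draws again
```

Reusing the previous surface matters because the draw callback can fire more often than the worker produces overlay frames; without the cache the overlay would blink off on those events.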
Key bindings: Mapped to DS keys via pynput and desmume.controls (see KEY_MAP).
Performance Notes¶
- Minimize device↔host transfers: keep tensors on device during math; transfer to CPU only when creating NumPy arrays for drawing.
- Cull early: use z_clip_mask prior to NumPy conversion.
- Batch operations: project arrays of points at once; avoid per-point projection.
- Overlay surface cache: already implemented; reuses the last overlay frame when nothing changed.
- No anti-aliasing: overlay uses cairo.ANTIALIAS_NONE and FILTER_NEAREST to preserve the DS aesthetic.
FAQ¶
Q: Do I need to call draw_* with a Cairo context?
A: No. The decorator enqueues a closure; the GTK thread supplies the Context later.
Q: Why does draw_points use the 3rd channel of pts?
A: It scales the point radius (radius_scale * z) to hint depth/parallax. For fixed-size markers, pass a constant z=1 per point.
Q: Can I pass per-point radii?
A: The current implementation expects a float radius_scale; per-point scaling isn’t indexed per row. If you need it, extend the function to accept a per-point array and index it inside the loop.
Q: Where do project_to_screen, z_clip_mask, etc. come from?
A: They’re provided by the memory/geometry utilities (utils.memory), which wrap camera pose reads and projection math.
Q: What if my overlay needs previous-frame state?
A: Use module-level globals (as raycasting_overlay does with current_point) or keep a small state object; be mindful of the worker thread context.
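As an alternative to module-level globals, a small callable object can carry previous-frame state while still matching the overlay(emu, device) call shape. The sketch below is illustrative only: DisplacementOverlay and its read_pos argument are hypothetical, and a dictionary stands in for the emulator.

```python
class DisplacementOverlay:
    """Callable overlay that remembers the previous position between frames."""

    def __init__(self, read_pos):
        self.read_pos = read_pos   # function: emu -> (x, y, z)
        self.previous = None
        self.last_delta = None

    def __call__(self, emu, device=None):
        pos = tuple(self.read_pos(emu))
        if self.previous is not None:
            # Per-frame displacement, usable by later draw or log code.
            self.last_delta = tuple(p - q for p, q in zip(pos, self.previous))
        self.previous = pos


# A dictionary stands in for the emulator in this demo.
overlay = DisplacementOverlay(read_pos=lambda emu: emu["position"])
overlay({"position": (0.0, 0.0, 0.0)})
after_first = overlay.last_delta
overlay({"position": (3.0, 0.0, 4.0)})
after_second = overlay.last_delta
```

Since the instance is only called from the worker thread, its attributes need no locking as long as nothing on the UI thread reads them directly.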
License¶
This overlay system documentation is part of the project repository. See LICENSE in the repo root for terms.