Documentation

Complete configuration reference for DemoDSL v2.7.0

Overview

DemoDSL is a DSL-driven automated product demo video generator. You describe your demo in a single YAML or JSON configuration file covering browser automation, voice narration, visual effects, video editing, audio mixing, and multi-format export. DemoDSL then orchestrates the full pipeline to produce a polished video.

A configuration file has 11 top-level sections. Only metadata is required; every other section is optional and has sensible defaults.

Root structure

metadata:        # REQUIRED: title, description, author, version
voice:           # TTS engine configuration
audio:           # Background music, voice processing, effects
device_rendering: # 3D device mockup settings
video:           # Intro, outro, transitions, watermark
subtitle:        # Subtitle overlay styles and timing
languages:       # Multi-language narration and subtitle tracks
scenarios:       # Browser automation steps
pipeline:        # Post-processing chain
output:          # Export filenames, formats, social presets
analytics:       # Engagement tracking

Config Format

DemoDSL accepts both YAML (.yaml / .yml) and JSON (.json) configuration files. The format is auto-detected from the file extension.

demo.yaml

metadata:
  title: "My Demo"
scenarios:
  - name: "Tour"
    url: "https://example.com"
    steps:
      - action: "navigate"
        url: "https://example.com"

demo.json

{
  "metadata": {
    "title": "My Demo"
  },
  "scenarios": [{
    "name": "Tour",
    "url": "https://example.com",
    "steps": [{
      "action": "navigate",
      "url": "https://example.com"
    }]
  }]
}

💡Use demodsl init to generate a YAML template, or demodsl init -o demo.json for JSON.
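
The extension-based detection described above can be sketched in a few lines of Python. This is only an illustration: detect_format is a hypothetical helper, not part of DemoDSL, and the case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extension-to-format table taken from the text above.
FORMATS = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}


def detect_format(path: str) -> str:
    """Return "yaml" or "json" based on the file extension.

    Hypothetical helper; matching the extension case-insensitively
    is an assumption, not documented DemoDSL behaviour.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported config extension: {suffix!r}")
    return FORMATS[suffix]
```

A loader built on this would then hand the file to a YAML or JSON parser depending on the returned value.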

Live Example: YAML / JSON format switching

config.yaml

scenarios:
  - name: "Tab Switching"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "scroll"
        direction: "down"
        pixels: 1800
        narration: "Scroll to the code example section."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "JSON"
        narration: "Click JSON tab to see JSON format."
        wait: 2.5
      - action: "click"
        locator:
          type: "text"
          value: "YAML"
        narration: "Switch back to YAML."
        wait: 2.0

metadata

The only required top-level section. Provides descriptive information about the demo.

Property	Type	Default	Description
title	string	—	Required. The demo title used in logs and output metadata.
description	string \| null	null	Optional description for documentation.
author	string \| null	null	Author name.
version	string \| null	null	Version string (e.g. "2.0.0").

Minimal valid config

metadata:
  title: "My Demo"

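ℹ️title is the only field that every config must contain. Fields marked Required elsewhere in this reference (for example scenarios[].name or audio.background_music.file) apply only when their optional parent section is present.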

voice

Configures the Text-to-Speech engine used to generate narration audio from the narration field in steps.

Property	Type	Default	Description
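engine	"elevenlabs" \| "openai" \| "google" \| "azure" \| "aws_polly" \| "cosyvoice" \| "coqui" \| "piper" \| "local_openai" \| "espeak" \| "gtts" \| "custom"	"elevenlabs"	TTS provider to use. See Supported Engines below.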
voice_id	string	"josh"	Voice identifier. Provider-specific.
speed	float	1.0	Playback speed multiplier (0.5 = half speed, 2.0 = double).
pitch	int	0	Pitch adjustment in semitones.
reference_audio	string \| null	null	Path to a .wav/.mp3 sample of your voice for voice cloning. Supported by: elevenlabs, coqui, cosyvoice, custom.

Example

voice:
  engine: "elevenlabs"
  voice_id: "josh"
  speed: 1.0
  pitch: 0

Supported Engines

Engine	Description
elevenlabs	High-quality neural TTS. Requires ELEVENLABS_API_KEY.
openai	OpenAI TTS (tts-1-hd). Voices: alloy, echo, fable, onyx, nova, shimmer. Requires OPENAI_API_KEY.
google	Google Cloud TTS (Wavenet). Requires GOOGLE_APPLICATION_CREDENTIALS (service account JSON path).
azure	Azure Cognitive Services Speech (Neural). Requires AZURE_SPEECH_KEY + AZURE_SPEECH_REGION.
aws_polly	Amazon Polly (Neural). Requires AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY.
cosyvoice	CosyVoice (Alibaba/Qwen). Local server. Set COSYVOICE_API_URL (default localhost:50000).
coqui	Coqui XTTS v2. Local inference via the TTS library. Set COQUI_MODEL to override the model.
piper	Piper TTS. Fast offline TTS via CLI. Requires PIPER_MODEL (path to .onnx).
local_openai	Any OpenAI-compatible local server (vLLM, LocalAI, AllTalk…). Set LOCAL_TTS_URL.
espeak	eSpeak-NG, a robotic vintage voice. Zero-dependency debug TTS. Set ESPEAK_BIN to override the binary.
gtts	Google Translate TTS (gTTS). Free, no API key. Install with pip install gtts.

Voice IDs by Engine

Each engine uses its own voice naming convention. Set voice_id to a valid identifier for your chosen engine:

Engine	Field	Default	Description
elevenlabs	voice_id	"josh"	ElevenLabs voice ID. Find IDs at elevenlabs.io/voices.
openai	voice_id	"alloy"	One of: alloy, echo, fable, onyx, nova, shimmer.
google	voice_id	"en-US-Wavenet-D"	Full voice name (e.g. "en-US-Wavenet-D", "fr-FR-Wavenet-A").
azure	voice_id	"en-US-JennyNeural"	Full voice name. Must contain "Neural" for neural voices.
aws_polly	voice_id	"Matthew"	Polly voice name (capitalized). E.g. "Joanna", "Matthew", "Léa".
cosyvoice	voice_id	"中文女"	Speaker name supported by your CosyVoice model.
coqui	voice_id	"speaker.wav"	Path to a reference .wav for voice cloning, or a built-in speaker name.
piper	voice_id	"en_US-lessac-medium.onnx"	.onnx model path, or same as PIPER_MODEL.
local_openai	voice_id	"alloy"	Voice name supported by your local server.
espeak	voice_id	"en"	eSpeak voice/language code. E.g. "en", "fr", "de", "en+whisper".
gtts	voice_id	"en"	Language code (ISO 639-1). E.g. "en", "fr", "es", "ja".
custom	voice_id	"default"	Any string. Passed as-is in the JSON body to your endpoint.

⚠️If no API key is found for the selected engine, DemoDSL automatically falls back to a DummyVoiceProvider that generates silent audio clips sized to match the narration text (~150 words per minute). This is useful for development and dry-runs.
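
To get a feel for how long those silent placeholder clips run, the quoted rate of roughly 150 words per minute can be turned into a duration estimate. The function below is a hypothetical sketch, not DemoDSL's DummyVoiceProvider; it assumes words are split on whitespace.

```python
WORDS_PER_MINUTE = 150  # approximate rate quoted above


def estimated_duration_seconds(narration: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Estimate clip length from the word count (whitespace split is an assumption)."""
    word_count = len(narration.split())
    return word_count * 60.0 / wpm
```

Under these assumptions, a 150-word narration maps to a 60-second silent clip.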

Live Example: gTTS voice narration synced to actions

config.yaml

voice:
  engine: "gtts"
  voice_id: "en"
  speed: 1.0

scenarios:
  - name: "Narrated Tour"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: >
          Welcome to DemoDSL. Every step can include
          a narration field converted to speech.
        wait: 3.0
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: >
          DemoDSL supports twelve voice engines,
          from ElevenLabs to local Piper and eSpeak.
        wait: 3.0

Custom TTS endpoint

voice:
  engine: "custom"
  voice_id: "my-voice"
  speed: 1.0

# Environment variables:
#   CUSTOM_TTS_URL=https://my-tts-server.com/synthesize
#   CUSTOM_TTS_API_KEY=sk-...          (optional)
#   CUSTOM_TTS_RESPONSE_FORMAT=mp3     (mp3 or wav)

ℹ️The custom engine POSTs a JSON body {text, voice_id, speed, pitch} to your endpoint and expects raw audio bytes in the response. This lets you integrate any TTS service with a simple HTTP wrapper.
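
A wrapper for the contract described above could look like the following standard-library sketch. It is an illustration only: synthesize is a placeholder that returns one second of silence, and the handler name, port, and sample rate are assumptions. A real wrapper would forward the payload to an actual TTS backend.

```python
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer


def synthesize(payload: dict) -> bytes:
    """Placeholder: return one second of silent 16-bit mono WAV.

    A real wrapper would pass payload["text"], payload["voice_id"],
    payload["speed"] and payload["pitch"] to a TTS backend instead.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(22050)    # assumed sample rate
        wav.writeframes(b"\x00\x00" * 22050)
    return buffer.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body {text, voice_id, speed, pitch}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        audio = synthesize(payload)
        # Reply with raw audio bytes, as the custom engine expects.
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)


def serve(port: int = 8000) -> None:
    """Run the wrapper locally (hypothetical port)."""
    HTTPServer(("127.0.0.1", port), TTSHandler).serve_forever()
```

With this sketch running, CUSTOM_TTS_URL would point at http://127.0.0.1:8000 and CUSTOM_TTS_RESPONSE_FORMAT would be set to wav.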

Voice Cloning (reference_audio)

Set reference_audio to a path to your own voice recording (.wav or .mp3) and DemoDSL will clone your voice on engines that support it. This way, the narration uses your voice instead of a stock voice.

Engine	Supported	Method	Description
elevenlabs	✓	Instant Voice Cloning	Uploads your sample via the Add Voice API. The cloned voice is cached for the session.
coqui	✓	XTTS v2 speaker_wav	Passes reference audio directly to tts_to_file(speaker_wav=...). Zero-shot cloning.
cosyvoice	✓	Zero-shot mode	Sends base64-encoded reference audio with mode="zero_shot" in the API payload.
custom	✓	Forwarded in JSON	Adds a base64-encoded reference_audio field to the JSON payload for your endpoint.
openai	✗	Not supported	OpenAI TTS does not support voice cloning.
google	✗	Not supported	Google Cloud TTS does not support voice cloning.
azure	✗	Not supported	Azure TTS does not support voice cloning.
aws_polly	✗	Not supported	Amazon Polly does not support voice cloning.
piper	✗	Not supported	Piper uses pre-trained .onnx models.
espeak	✗	Not supported	eSpeak is a formant synthesizer.
gtts	✗	Not supported	gTTS uses Google Translate voices.

Voice cloning with Coqui XTTS

voice:
  engine: "coqui"
  voice_id: "default"
  reference_audio: "samples/my_voice.wav"
  speed: 1.0

Voice cloning with ElevenLabs

voice:
  engine: "elevenlabs"
  voice_id: "josh"          # fallback if cloning fails
  reference_audio: "samples/my_voice.wav"
  speed: 1.0

ℹ️When reference_audio is set on an unsupported engine, a warning is logged and the field is ignored. The narration still generates using the standard voice_id.

audio

Controls background music, voice processing, and audio effects applied during the mix_audio pipeline stage.

audio.background_music

Property	Type	Default	Description
file	string	—	Required. Path to the audio file (MP3, WAV, OGG).
volume	float	0.3	Base volume (0.0–1.0). Converted to dB internally.
ducking_mode	"none" \| "light" \| "moderate" \| "heavy"	"moderate"	Volume reduction during narration.
loop	bool	true	Loop the music to cover the entire video duration.

Ducking modes control how much the background music volume drops when narration is playing:

Mode	Reduction	Description
none	0 dB	No ducking; music stays at full volume.
light	−6 dB	Subtle reduction. Music still audible.
moderate	−12 dB	Balanced. Default for most demos.
heavy	−20 dB	Near-silent music during speech.
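
The interaction between volume and ducking_mode can be illustrated with a small calculation. This is a sketch under stated assumptions: it treats volume as a linear amplitude converted with 20·log10, and it adds the ducking offset from the table above while narration plays. DemoDSL's internal conversion may differ.

```python
import math

# Ducking offsets in dB, copied from the table above.
DUCKING_DB = {"none": 0.0, "light": -6.0, "moderate": -12.0, "heavy": -20.0}


def volume_to_db(volume: float) -> float:
    """Convert a linear volume to dB (assumes an amplitude ratio)."""
    if volume <= 0.0:
        return float("-inf")
    return 20.0 * math.log10(volume)


def music_gain_db(volume: float, ducking_mode: str, narrating: bool) -> float:
    """Music gain in dB, with the ducking offset applied during narration."""
    base = volume_to_db(volume)
    return base + DUCKING_DB[ducking_mode] if narrating else base
```

Under these assumptions, the default volume of 0.3 sits at about −10.5 dB, and moderate ducking lowers it by a further 12 dB while the narrator speaks.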

audio.voice_processing

Property	Type	Default	Description
normalize	bool	true	Normalize audio loudness.
target_dbfs	int	-20	Target loudness in dBFS (decibels relative to full scale).
remove_silence	bool	true	Strip leading/trailing silence from clips.
silence_threshold	int	-40	dBFS below which audio is considered silence.
enhance_clarity	bool	false	Apply EQ boost to voice presence frequencies.
enhance_warmth	bool	false	Apply low-end EQ warmth to voice.
noise_reduction	bool	false	Remove background noise from recordings.

audio.effects

Property	Type	Default	Description
eq_preset	string \| null	null	EQ preset name (e.g. "podcast", "broadcast").
reverb_preset	string \| null	null	Reverb preset (e.g. "small_room", "hall").
compression	Compression \| null	null	Dynamic range compression settings.

audio.effects.compression

Property	Type	Default	Description
threshold	int	-20	Compression threshold in dB.
ratio	float	3.0	Compression ratio (e.g. 3.0 = 3:1).
attack	int	5	Attack time in milliseconds.
release	int	50	Release time in milliseconds.

Full audio example

audio:
  background_music:
    file: "audio/bg.mp3"
    volume: 0.3
    ducking_mode: "moderate"
    loop: true
  voice_processing:
    normalize: true
    target_dbfs: -20
    noise_reduction: true
  effects:
    eq_preset: "podcast"
    reverb_preset: "small_room"
    compression:
      threshold: -20
      ratio: 3.0
      attack: 5
      release: 50

device_rendering (Beta)

Wraps the captured browser video inside a 3D device mockup frame, processed during the render_device_mockup pipeline stage.

Property	Type	Default	Description
device	string	"iphone_15_pro"	Device model name.
orientation	"portrait" \| "landscape"	"portrait"	Screen orientation.
quality	"low" \| "medium" \| "high"	"high"	Render quality level.
render_engine	"eevee" \| "cycles"	"eevee"	Blender render engine. Eevee is faster, Cycles is more realistic.
camera_animation	string	"orbit_smooth"	Camera movement type around the device.
lighting	string	"studio"	Lighting preset.

Example

device_rendering:
  device: "iphone_15_pro"
  orientation: "portrait"
  quality: "high"
  render_engine: "eevee"
  camera_animation: "orbit_smooth"
  lighting: "studio"

ℹ️The render_device_mockup pipeline stage is optional. If it fails (e.g. Blender not installed), the pipeline continues with the raw video.

video

Controls video editing: intro/outro sequences, transitions between steps, watermark overlay, and output optimization. Processed during the edit_video pipeline stage.

video.intro

Property	Type	Default	Description
duration	float	3.0	Intro duration in seconds.
type	string	"fade_in"	Animation type for the intro.
text	string \| null	null	Main title text overlay.
subtitle	string \| null	null	Subtitle text below the title.
font_size	int	60	Font size in pixels.
font_color	string	"#FFFFFF"	Font color (hex).
background_color	string	"#1a1a1a"	Background color (hex).

video.transitions

Property	Type	Default	Description
type	"crossfade" \| "slide" \| "zoom" \| "dissolve"	"crossfade"	Transition style between steps.
duration	float	0.5	Transition duration in seconds.

video.watermark

Property	Type	Default	Description
image	string	—	Required. Path to the watermark image (PNG recommended).
position	"top_left" \| "top_right" \| "bottom_left" \| "bottom_right" \| "center"	"bottom_right"	Watermark position on the video.
opacity	float	0.7	Watermark opacity (0.0–1.0).
size	int	100	Watermark size in pixels (longest side).

video.outro

Property	Type	Default	Description
duration	float	4.0	Outro duration in seconds.
type	string	"fade_out"	Animation type for the outro.
text	string \| null	null	Main text overlay.
subtitle	string \| null	null	Subtitle text.
cta	string \| null	null	Call-to-action text (e.g. "Get Started").

video.optimization

Property	Type	Default	Description
target_size_mb	int \| null	null	Target file size. Bitrate is auto-calculated.
web_optimized	bool	true	Move moov atom for fast web streaming start.
compression_level	"low" \| "balanced" \| "high"	"balanced"	Encoding compression preset.
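
As a rough illustration of how a bitrate can be derived from target_size_mb, the sketch below divides the size budget by the video duration and reserves room for audio. The function name, the 128 kbps audio allowance, and the use of decimal megabytes are all assumptions; the source does not describe DemoDSL's actual formula.

```python
def target_video_bitrate_kbps(
    target_size_mb: float,
    duration_s: float,
    audio_kbps: float = 128.0,  # assumed audio allowance
) -> float:
    """Video bitrate (kbps) that fits the size budget; 1 MB = 10**6 bytes (assumption)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_kbits = target_size_mb * 8 * 1000  # megabytes -> kilobits
    return total_kbits / duration_s - audio_kbps
```

For example, a 50 MB budget over a 100-second video leaves 3872 kbps for the video stream under these assumptions.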

Full video example

video:
  intro:
    duration: 3.0
    type: "fade_in"
    text: "Product Name"
    subtitle: "v2.0"
    font_size: 60
    font_color: "#FFFFFF"
    background_color: "#1a1a1a"
  transitions:
    type: "crossfade"
    duration: 0.5
  watermark:
    image: "logo.png"
    position: "bottom_right"
    opacity: 0.7
    size: 100
  outro:
    duration: 4.0
    type: "fade_out"
    text: "Try it today!"
    cta: "Get Started"
  optimization:
    target_size_mb: 50
    web_optimized: true
    compression_level: "balanced"

Recording Quality

DemoDSL uses two recording backends depending on the browser. When browser: "chrome" is set, a high-quality CDP screenshot pipeline captures frames via a direct DevTools Protocol connection — completely bypassing Playwright's low-bitrate VP8 screencast. For WebKit and Firefox, an spp + hqdn3d deblocking filter is applied during export to smooth VP8 artefacts.

Native VP8 (webkit): ~330 KB

CDP H.264 (chrome): ~70 KB

	Native (VP8)	CDP (H.264)
Recording method	VP8 screencast	CDP screenshots
VP8 artefacts	Yes (deblocked)	None
File size	~330 KB	~70 KB
Total time	~13 s	~13 s
Supported browsers	All	Chromium only

💡Set browser: "chrome" in your scenario to automatically use CDP recording — no extra config needed. WebKit and Firefox fall back to native VP8 with post-processing deblocking.

subtitle

Burns styled subtitles into the video, synced word-by-word to narration timing. Subtitles are generated as ASS files and composited via ffmpeg. Can be set at the top level (applies to all scenarios) or per-scenario.

Property	Type	Default	Description
enabled	bool	true	Enable subtitle overlay.
style	"classic" \| "tiktok" \| "color" \| "word_by_word" \| "typewriter" \| "karaoke" \| "bounce" \| "cinema" \| "highlight_line" \| "fade_word" \| "emoji_react"	"classic"	Subtitle display style (see table below).
speed	"slow" \| "normal" \| "fast" \| "tiktok"	"normal"	Display speed preset — controls words per second.
font_size	int	48	Font size in pixels.
font_family	string	"Arial"	Font family name.
font_color	string	"#FFFFFF"	Primary text color (hex).
background_color	string	"rgba(0,0,0,0.6)"	Background fill behind text (hex or rgba).
position	"bottom" \| "center" \| "top"	"bottom"	Vertical position on screen.
highlight_color	string	"#FFD700"	Accent color for highlighted words.
max_words_per_line	int	8	Maximum words per subtitle line.
animation	"none" \| "fade" \| "pop" \| "slide"	"none"	Text entrance animation.

Subtitle Styles

Each style preset configures defaults for font size, position, colors, and animation. User values always override the preset.

Style	Preset defaults	Description
classic	42px, bottom, white on dark box	Traditional subtitle bar at the bottom. Clean, readable.
tiktok	64px, center, bold word-by-word	Large centered text, one highlighted word at a time. Social media style.
color	48px, bottom, word highlight	Full line visible, current word changes to accent color.
word_by_word	56px, center, single word	One word at a time, centered. Maximum emphasis.
typewriter	44px, bottom, green on black	Characters appear letter by letter. Terminal/hacker aesthetic.
karaoke	52px, bottom, progressive fill	Words fill with color progressively, karaoke-bar style.
bounce	60px, center, scale animation	Words pop in with a bounce scale effect (120% → 100%).
cinema	38px, bottom, italic serif	Elegant italic serif font with shadow. Film subtitle look.
highlight_line	46px, bottom, dim/bright	Current line is bright white, rest stays dimmed gray.
fade_word	50px, center, fade-in	Each word fades in with a smooth alpha transition.
emoji_react	52px, bottom, emoji prefix	Auto-picks a contextual emoji based on narration keywords.

Style Demos

Each video below shows a subtitle style in action on short sample narration text.

Live Example: classic (traditional bottom bar)

config.yaml

subtitle:
  style: "classic"
  speed: "normal"
  font_size: 42
  position: "bottom"

Live Example: tiktok (bold centered word-by-word)

config.yaml

subtitle:
  style: "tiktok"
  speed: "fast"
  font_size: 64
  position: "center"
  highlight_color: "#FFD700"

Live Example: color (current word highlight)

config.yaml

subtitle:
  style: "color"
  speed: "normal"
  highlight_color: "#00FF88"

Live Example: word_by_word (one word at a time)

config.yaml

subtitle:
  style: "word_by_word"
  speed: "normal"
  font_size: 56
  position: "center"

Live Example: typewriter (letter-by-letter reveal)

config.yaml

subtitle:
  style: "typewriter"
  font_color: "#00FF00"
  background_color: "rgba(0,0,0,0.8)"

Live Example: karaoke (progressive color fill)

config.yaml

subtitle:
  style: "karaoke"
  highlight_color: "#FF4444"
  position: "bottom"

Live Example: bounce (scale-pop animation)

config.yaml

subtitle:
  style: "bounce"
  font_size: 60
  position: "center"

Live Example: cinema (italic serif with shadow)

config.yaml

subtitle:
  style: "cinema"
  font_family: "Georgia"
  font_size: 38

Live Example: highlight_line (dim/bright current line)

config.yaml

subtitle:
  style: "highlight_line"
  highlight_color: "#FFFFFF"
  font_color: "#888888"

Live Example: fade_word (smooth alpha fade-in)

config.yaml

subtitle:
  style: "fade_word"
  font_size: 50
  position: "center"

Live Example: emoji_react (contextual emoji prefix)

config.yaml

subtitle:
  style: "emoji_react"
  font_size: 52
  highlight_color: "#FFD700"

Speed Presets

Preset	Speed (words per second)	Description
slow	1.5 wps	Slow pace, good for technical content or tutorials.
normal	2.5 wps	Standard reading pace.
fast	4.0 wps	Fast pace for experienced viewers.
tiktok	6.0 wps	Very fast, matches TikTok/Reels pacing.
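
To see how a speed preset and max_words_per_line combine, the sketch below splits a narration into lines and gives each line an on-screen window of word count divided by words per second. It is a simplified model with hypothetical function names; the real renderer syncs subtitles to narration timing and may behave differently.

```python
# Words-per-second values copied from the table above.
WORDS_PER_SECOND = {"slow": 1.5, "normal": 2.5, "fast": 4.0, "tiktok": 6.0}


def line_timings(narration: str, speed: str = "normal", max_words_per_line: int = 8):
    """Return (text, start_s, end_s) per subtitle line under a fixed-rate model."""
    wps = WORDS_PER_SECOND[speed]
    words = narration.split()
    timings = []
    start = 0.0
    for i in range(0, len(words), max_words_per_line):
        line = words[i:i + max_words_per_line]
        duration = len(line) / wps  # seconds this line stays on screen
        timings.append((" ".join(line), start, start + duration))
        start += duration
    return timings
```

With the normal preset, a ten-word narration yields an eight-word line shown for about 3.2 seconds followed by a two-word line shown for about 0.8 seconds.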

Top-level subtitle (all scenarios)

subtitle:
  enabled: true
  style: "tiktok"
  speed: "fast"
  font_size: 64
  highlight_color: "#FFD700"
  position: "center"

scenarios:
  - name: "Demo"
    url: "https://myapp.com"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "This text becomes a subtitle!"

pipeline:
  - generate_narration: {}
  - burn_subtitles: {}
  - edit_video: {}

Per-scenario subtitle override

scenarios:
  - name: "Intro"
    url: "https://myapp.com"
    subtitle:
      enabled: true
      style: "cinema"
      speed: "slow"
      font_family: "Georgia"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "An elegant introduction."
  - name: "Features"
    subtitle:
      style: "bounce"
      speed: "fast"
    steps:
      - action: "scroll"
        direction: "down"
        pixels: 500
        narration: "Fast-paced feature showcase!"

💡Add burn_subtitles: {} to your pipeline to enable subtitle rendering. Subtitles are generated from the narration field of each step — no separate subtitle file needed.

ℹ️The emoji_react style automatically picks emojis based on narration keywords: 👆 for "click", 📜 for "scroll", ⚡ for "fast", 🎬 for "video", and more. A 💬 default is used when no keyword matches.
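
The keyword lookup described above can be sketched as a simple table scan. Only the four keywords named in the text are included; the case-insensitive substring match and the first-match-wins order are assumptions, and pick_emoji is a hypothetical name.

```python
# Keyword-to-emoji pairs named in the text above (the real list is longer).
EMOJI_BY_KEYWORD = {"click": "👆", "scroll": "📜", "fast": "⚡", "video": "🎬"}
DEFAULT_EMOJI = "💬"


def pick_emoji(narration: str) -> str:
    """Return the emoji for the first keyword found, else the default."""
    lowered = narration.lower()
    for keyword, emoji in EMOJI_BY_KEYWORD.items():
        if keyword in lowered:
            return emoji
    return DEFAULT_EMOJI
```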

languages

Generate multi-track audio narration and multi-language subtitles in a single render. The same scenario is recorded once, then narration is synthesised in every requested language and either embedded as additional tracks in the final MP4 or written as sidecar files (narration_{lang}.mp3, subtitles_{lang}.ass).

Overview

Property	Type	Default	Description
default	string	"en"	Source language used by step narration: fields. BCP-47 (e.g. en, fr, en-US).
targets	list[string]	[]	Additional languages to render.
voices	dict[str, VoiceConfig]	{}	Optional per-language voice override (engine, voice_id, etc.).
embed	bool	true	When true, mux all audio + subtitle tracks into a single MP4. When false, write sidecar files next to the output.
burn_default	bool	false	When true, also burn the default-language subtitles into the picture (useful for social clips).
audio_only	bool	false	Only generate per-language audio tracks (no subtitle tracks).
subtitle_only	bool	false	Only generate per-language subtitle tracks (no audio tracks).

demo_multilang.yaml

metadata:
  title: Multilang demo

voice:
  engine: gtts
  voice_id: fr

languages:
  default: fr
  targets: [en, de]
  embed: true
  voices:
    en: { engine: gtts, voice_id: en }
    de: { engine: gtts, voice_id: de }

scenarios:
  - name: tour
    url: https://example.com
    steps:
      - narration: Bienvenue sur notre site.
        narrations:
          en: Welcome to our website.
          de: Willkommen auf unserer Website.
        action: scroll
        amount: 400

Per-step translations

Each step gains an optional narrations mapping. Keys are BCP-47 language codes; values are the translated narration text. When a translation is missing, the engine falls back to the base narration field, so a partial translation never blocks the render.

steps:
  - narration: Cliquez ici pour commencer.
    narrations:
      en: Click here to get started.
      de: Klicken Sie hier, um zu beginnen.
    action: click
    target: "#start"
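
The fallback rule can be expressed in a couple of lines. This is a sketch over plain dictionaries shaped like the step above; narration_for is a hypothetical helper, not a DemoDSL function.

```python
from typing import Optional


def narration_for(step: dict, lang: str) -> Optional[str]:
    """Return the translation for lang, or the base narration if it is missing."""
    translations = step.get("narrations") or {}
    return translations.get(lang, step.get("narration"))
```

For the step above, requesting en returns the English text, while requesting a language with no entry (for example ja) returns the French base narration.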

Per-language voices

The languages.voices map lets each language use its own TTS engine, voice id, speed, etc. Any field omitted in the override inherits from the top-level voice block.

voice:
  engine: elevenlabs
  voice_id: french_voice_id

languages:
  default: fr
  targets: [en, ja]
  voices:
    en:
      engine: elevenlabs
      voice_id: english_voice_id
    ja:
      engine: openai
      voice_id: nova
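
The inheritance rule amounts to a shallow merge of the top-level voice block with the per-language override. The sketch below assumes both are plain dictionaries; resolve_voice is a hypothetical name.

```python
def resolve_voice(base_voice: dict, overrides: dict, lang: str) -> dict:
    """Merge the per-language override onto the top-level voice settings."""
    # Keys present in the override win; everything else is inherited.
    return {**base_voice, **overrides.get(lang, {})}
```

With a top-level speed of 1.0, the ja entry above would resolve to the openai engine with voice nova and the inherited speed of 1.0.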

Embedded vs sidecar

With embed: true (default), the final MP4 contains one AAC audio track per language (with proper language= metadata, default-disposition on the source language) and one mov_text subtitle track per language. Players such as VLC, QuickTime, YouTube and Vimeo expose them as selectable tracks.

With embed: false, the engine still produces a single MP4 (containing the default-language audio) plus sidecar files: narration_en.mp3, subtitles_en.ass, and so on for each target language. This is useful when uploading to platforms that require external caption files.

ℹ️When languages is active, the regular subtitle burn-in is skipped to keep the picture clean — set languages.burn_default: true to re-enable burning of the default-language subtitles.
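
For readers curious what a multi-track mux involves, the sketch below assembles an ffmpeg argument list with one AAC audio track and one mov_text subtitle track per language. It is not DemoDSL's actual command: build_mux_command and the file names are hypothetical. MP4 language tags typically use three-letter ISO 639-2 codes, so the example passes fra and eng rather than BCP-47 tags.

```python
from typing import List, Tuple


def build_mux_command(
    video: str,
    tracks: List[Tuple[str, str, str]],  # (language, audio path, subtitle path)
    default_lang: str,
    output: str,
) -> List[str]:
    """Build an ffmpeg command that muxes per-language audio and subtitles."""
    cmd = ["ffmpeg", "-y", "-i", video]
    for _, audio, _ in tracks:
        cmd += ["-i", audio]
    for _, _, subs in tracks:
        cmd += ["-i", subs]

    count = len(tracks)
    cmd += ["-map", "0:v"]                     # picture from the recording
    for i in range(count):
        cmd += ["-map", f"{1 + i}:a"]          # audio inputs follow the video
    for i in range(count):
        cmd += ["-map", f"{1 + count + i}:s"]  # subtitle inputs come last

    cmd += ["-c:v", "copy", "-c:a", "aac", "-c:s", "mov_text"]
    for i, (lang, _, _) in enumerate(tracks):
        cmd += [f"-metadata:s:a:{i}", f"language={lang}"]
        cmd += [f"-metadata:s:s:{i}", f"language={lang}"]
        # Only the source language is flagged as the default audio track.
        cmd += [f"-disposition:a:{i}", "default" if lang == default_lang else "0"]
    cmd.append(output)
    return cmd
```

The resulting list can be passed to subprocess.run once the input files exist.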

CLI usage

# Standard render — picks up languages: from the YAML
demodsl run demo_multilang.yaml

# Inspect the planned tracks without rendering
demodsl run demo_multilang.yaml --dry-run

⚠️ffmpeg is required for multi-track muxing. If muxing fails, the engine gracefully falls back to a single-track export.

scenarios

A list of browser automation scenarios. Each scenario captures a recording from a web application. Multiple scenarios are concatenated in the final video.

Live Example: Two scenarios in one config

config.yaml

scenarios:
  - name: "Landing Page Overview"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Scenario one: the landing page."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 800
        narration: "Scroll through features."
        wait: 2.0

  - name: "Docs Deep Dive"
    url: "https://fran-cois.github.io/demodsl/docs"
    browser: "webkit"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/docs"
        narration: "Scenario two: the docs page."
        wait: 2.0

Property	Type	Default	Description
name	string	—	Required. Human-readable scenario name.
url	string	—	Required. Base URL for the scenario.
browser	"chrome" \| "firefox" \| "webkit"	"chrome"	Browser engine (Playwright).
viewport	Viewport	1920×1080	Browser viewport dimensions.
cursor	CursorConfig	null	Visible cursor overlay mode. Shows mouse movement and click effects.
glow_select	GlowSelectConfig	null	Apple Intelligence-style animated glow highlight around clicked elements.
popup_card	PopupCardConfig	null	Popup card overlay synced with narration. Shows text and progressive item reveals.
avatar	AvatarConfig	null	Animated avatar overlay synced with narration audio. Free (animated) or paid (D-ID, HeyGen) providers.
subtitle	SubtitleConfig	null	Subtitle overlay config (per-scenario override). Overrides top-level subtitle settings.
steps	Step[]	[]	List of automation steps.

scenarios[].viewport

Property	Type	Default	Description
width	int	1920	Viewport width in pixels.
height	int	1080	Viewport height in pixels.

💡Common viewport sizes: 1920×1080 (Full HD), 1280×720 (HD), 390×844 (iPhone 14), 1024×768 (tablet).

Example

scenarios:
  - name: "Main Demo"
    url: "https://myapp.com"
    browser: "chrome"
    viewport:
      width: 1920
      height: 1080
    steps:
      - action: "navigate"
        url: "https://myapp.com"

scenarios[].cursor

Injects a visible fake cursor overlay captured in the recorded video. The cursor animates towards each target element before click/type actions and plays a visual effect on click.

Property	Type	Default	Description
visible	bool	true	Whether the cursor is shown.
style	"dot" \| "pointer"	"dot"	Cursor shape. Dot = circle, pointer = arrow SVG.
color	string	"#ef4444"	Cursor color (hex).
size	int	20	Cursor size in pixels.
click_effect	"ripple" \| "pulse" \| "none"	"ripple"	Visual effect on click.
smooth	float	0.4	Animation duration in seconds (ease-out).

Live Example: Cursor overlay (visible mouse movement + click ripple)

config.yaml

scenarios:
  - name: "Cursor Showcase"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    cursor:
      visible: true
      style: "dot"
      color: "#ef4444"
      size: 20
      click_effect: "ripple"
      smooth: 0.4
    steps:
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Cursor moves to the button and clicks."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Documentation"
        narration: "Smooth animation to each target."
        wait: 2.0

scenarios[].glow_select

Apple Intelligence-style animated gradient glow that highlights elements before click and type actions. The glow pulses with a rotating hue and fades out after the action.

Property	Type	Default	Description
enabled	bool	true	Whether glow-select is active.
colors	string[]	["#a855f7","#6366f1","#ec4899","#a855f7"]	Gradient color stops for the glow border.
duration	float	0.8	Hue rotation cycle duration in seconds.
padding	int	8	Extra padding around the element bounding box.
border_radius	int	12	Border radius of the glow overlay.
intensity	float	0.9	Glow opacity (0–1).

💡Combine cursor and glow_select for a polished demo experience. The cursor animates into the glowing element, then clicks.

Live Example: Glow select (Apple Intelligence-style highlight on click)

config.yaml

scenarios:
  - name: "Glow Select Showcase"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    cursor:
      style: "dot"
      color: "#a855f7"
    glow_select:
      enabled: true
      colors: ["#a855f7","#6366f1","#ec4899","#a855f7"]
      duration: 0.8
      padding: 8
      border_radius: 12
    steps:
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Glow appears around the button."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Documentation"
        narration: "Each element gets the glow treatment."
        wait: 2.0

scenarios[].popup_card

The popup_card mode injects styled overlay cards that appear in sync with the narration. When a step has a card field with a list of items, the items are revealed progressively: each bullet appears one at a time, timed to match the narrator.

Property	Type	Default	Description
enabled	bool	true	Enable the popup card overlay.
position	"bottom-right" \| "bottom-left" \| "top-right" \| "top-left" \| "bottom-center" \| "top-center"	"bottom-right"	Card position on screen.
theme	"glass" \| "dark" \| "light" \| "gradient"	"glass"	Visual theme for the card.
max_width	int	420	Maximum card width in pixels.
animation	"slide" \| "fade" \| "scale"	"slide"	Entrance/exit animation style.
accent_color	string	"#818cf8"	Accent color for bullets and progress bar.
show_icon	bool	true	Show emoji icon in the card header.
show_progress	bool	true	Show a progress bar synced with narration duration.

Each step can include a card object with:

Property	Type	Default	Description
card.title	string	null	Card title text.
card.body	string	null	Card body/description text.
card.items	string[]	null	Bullet-point list. Revealed progressively when narration is present.
card.icon	string	null	Emoji or short text shown in the header (e.g. "🚀").

Live Example: Popup cards (synced text overlays with progressive item reveal)

config.yaml

scenarios:
  - name: "Card Overlay Tour"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    popup_card:
      enabled: true
      position: "bottom-right"
      theme: "glass"
      animation: "slide"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Welcome to DemoDSL."
        card:
          title: "DemoDSL"
          body: "A DSL-driven automated demo generator."
          icon: "🎬"
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Six integrated phases."
        card:
          title: "Six Phases"
          icon: "⚡"
          items:
            - "Browser Automation"
            - "Voice Narration"
            - "Visual Effects"
            - "Video Composition"
            - "Audio Mixing"
            - "Multi-format Export"

scenarios[].avatar

An animated avatar overlay that reacts to narration audio in real time. The avatar lip-syncs to TTS amplitude and is composited on top of the video at the chosen corner. Two provider types are available: animated (free, Pillow-generated) and API-based (D-ID, HeyGen, SadTalker — paid or self-hosted).

Property	Type	Default	Description
enabled	bool	true	Whether the avatar overlay is active.
provider	"animated" \| "d-id" \| "heygen" \| "sadtalker"	"animated"	Avatar generation engine. animated is free; the other providers are paid or self-hosted.
image	string \| null	null	Path, URL (http/https), or preset name ("default", "robot", "circle"). URLs are downloaded and cached locally.
position	"bottom-right" \| "bottom-left" \| "top-right" \| "top-left"	"bottom-right"	Corner position of the avatar on the video.
size	int	120	Avatar diameter in pixels.
style	"bounce" \| "waveform" \| "pulse" \| "equalizer" \| "xp_bliss" \| "clippy" \| "visualizer"	"bounce"	Animation style (animated provider only). The type shown here lists only the first seven names; the Animation Styles table below has the full set.
shape	"circle" \| "rounded" \| "square"	"circle"	Avatar outline shape.
background	string	"rgba(0,0,0,0.5)"	Background fill behind the avatar (CSS color or rgba).
background_shape	"square" \| "circle" \| "rounded"	"square"	Shape of the avatar background. Use circle for a fully round overlay.
api_key	string \| null	null	API key for paid providers. Supports env-var syntax: "${D_ID_API_KEY}".
show_subtitle	bool	false	Display narration text below the avatar box during playback.
subtitle_font_size	int	18	Font size for the avatar subtitle text.
subtitle_font_color	string	"#FFFFFF"	Font color for the avatar subtitle.
subtitle_bg_color	string	"rgba(0,0,0,0.7)"	Background color for the avatar subtitle box.
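
As a rough illustration of the less common properties in the table above, the fragment below uses a built-in preset image, a round background, and custom subtitle colors. The scenario name and URL are placeholders and the chosen values are arbitrary; only the property names and allowed values come from the table.

```yaml
scenarios:
  - name: "Avatar Property Sketch"      # placeholder name
    url: "https://example.com"          # placeholder URL
    avatar:
      enabled: true
      provider: "animated"              # free provider, no api_key needed
      image: "robot"                    # preset name instead of a path or URL
      position: "top-left"
      size: 96                          # diameter in pixels
      shape: "rounded"
      background: "rgba(0,0,0,0.4)"
      background_shape: "circle"        # fully round overlay behind the avatar
      show_subtitle: true
      subtitle_font_size: 14
      subtitle_font_color: "#FFFFFF"
      subtitle_bg_color: "rgba(0,0,0,0.7)"
    steps:
      - action: "navigate"
        url: "https://example.com"
        narration: "The subtitle box below the avatar shows this text."
```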

Animation Styles (free)

These styles are available with the animated provider. Each generates a different visual animation from the narration audio waveform.

Style	Type	Default	Description
bounce	—	—	A circle that scales up and down with audio amplitude. Simple and clean.
waveform	—	—	Radial wave ring that expands from the center with audio pulses.
pulse	—	—	Glowing disc with a pulsing aura effect. Subtle and professional.
equalizer	—	—	Neon equalizer bars (Windows XP era). Retro audio visualizer look.
xp_bliss	—	—	Windows XP Bliss-inspired hills, sun and floating music notes.
clippy	—	—	Animated paperclip with googly eyes. A nostalgic Microsoft Office mascot.
visualizer	—	—	Circular spectrum analyzer with rainbow gradient bars.
pacman	—	—	Pac-Man chomping dots with a colorful ghost. Arcade nostalgia.
space_invader	—	—	Pixel-art Space Invaders alien with shields and cannon. Retro arcade.
mario_block	—	—	Bouncing Mario "?" block that pops coins on loud audio. Iconic gaming.
nyan_cat	—	—	Pixel-art cat on a rainbow trail with scrolling stars. Internet classic.
matrix	—	—	Cascading green Matrix code rain with avatar in the center.
pickle_rick	—	—	Pickle Rick with rat limbs, expressive eyes, and yelling mouth. Wubba lubba dub dub!
chrome_dino	—	—	Chrome's offline T-Rex dinosaur with desert, cacti, and 'No internet' message.
marvin	—	—	Marvin the Paranoid Android with sad eyes and depressive quotes. H2G2 classic.
mac128k	—	—	Macintosh 128K with expressive face on green screen. Retro computing icon.
floppy_disk	—	—	3.5" floppy disk with face, label, and '1.44 MB' nostalgia.
bsod	—	—	Blue Screen of Death with progressive error text and sad :( emoticon.
bugdroid	—	—	Android's green Bugdroid robot with waving arms and antennae.
qr_code	—	—	QR code pattern with expressive eyes in the center. 'SCAN ME!'
gpu_sweat	—	—	Sweating GPU with spinning fan, temperature display, and sweat drops.
rubber_duck	—	—	Yellow rubber duck debugging companion with judgmental speech bubbles.
fail_whale	—	—	Twitter's Fail Whale carried by birds. 'Twitter is over capacity.'
server_rack	—	—	Overheating server rack with red eyes, smoke, blinking LEDs, and temp bar.
cursor_hand	—	—	Windows pointing hand cursor that bosses you around. 'Click here!'
vhs_tape	—	—	VHS cassette with spinning reels, label, and scanlines. 'Be kind, rewind!'
cloud	—	—	Cute but capricious cloud with rain, lightning, and data ownership jokes.
wifi_low	—	—	Wi-Fi icon with one bar that stutters and cuts off mid-sen—
nokia3310	—	—	The indestructible Nokia 3310 with Snake and warrior quotes.
cookie	—	—	Browser cookie with creepy eyes that knows your browsing habits.
modem56k	—	—	56k modem with blinking LEDs, dial-up sounds, and green waveform.
esc_key	—	—	Panicked Escape key trying to break free — sweat drops & frantic quotes.
sad_mac	—	—	Classic dead Macintosh with X-eyed icon, error codes & hardware trauma.
usb_cable	—	—	Tangled USB-A cable frustrated by 3-try insertion. Always wrong side.
hourglass	—	—	Windows hourglass that speaks very slowly while sand trickles down.
firewire	—	—	Forgotten FireWire 400 cable living in a drawer, reminiscing glory days.
ai_hallucinated	—	—	Glitching robot mixing facts with recipes — spiral eye & glitch lines.
tamagotchi	—	—	Abandoned pixel egg pet asking why you haven't fed it since 1998.
lasso_tool	—	—	Obsessive Photoshop selection tool with marching ants on checkerboard.
battery_low	—	—	Battery at 1% — red, blinking, talks fast then cuts off abruptly.
incognito	—	—	Chrome Incognito detective with fedora & glasses. Sees nothing.
rainbow_wheel	—	—	Mac spinning rainbow wheel — hypnotic, unstoppable, rage-inducing.
error_404	—	—	Lost 404 page wandering around with question marks, literally unfindable.
google_blob	—	—	Google’s old melted blob emoji, nostalgic for its expressive past.
bit	—	—	Binary bit (0/1) with matrix rain — answers only Yes or No.
pc_fan	—	—	Spinning PC fan screaming at full RPM when you open 3 Chrome tabs.
captcha	—	—	Twisted, illegible CAPTCHA yelling PROVE YOU’RE HUMAN!
bluetooth	—	—	Bluetooth logo desperately searching, always failing to pair.
registry_key	—	—	Windows Registry key — bureaucratic folder controlling everything.
high_ping	—	—	999ms ping avatar with buffering spinner, responds 10 sec late.
scratched_cd	—	—	Scratched CD-ROM with rainbow reflections, st-st-stuttering speech.
kermit	—	—	Kermit sipping tea — but that's none of my business.
this_is_fine	—	—	Dog sitting in flames saying 'This is fine.'
trollface	—	—	Classic Trollface with a mocking grin — Problem?
no_idea_dog	—	—	Golden retriever at a computer — I have no idea what I'm doing.
surprised_pikachu	—	—	Pikachu with open mouth — feigned surprise at the obvious.
distracted_bf	—	—	Distracted boyfriend looking at the shiny new framework.
success_kid	—	—	Kid with clenched fist celebrating small victories.
expanding_brain	—	—	Luminous expanding brain — transcended enlightenment.
doge	—	—	Shiba Inu with floating 'such wow' 'much code' words.
wiki_globe	—	—	Wikipedia puzzle globe with glasses — [citation needed].

Style	Preview caption
bounce	Scales up/down with audio
waveform	Radial wave ring
pulse	Glowing aura effect
equalizer	Neon retro bars
xp_bliss	Windows XP hills & notes
clippy	Animated paperclip mascot
visualizer	Circular spectrum analyzer
pacman	Arcade chomper & ghost
space_invader	Pixel-art alien arcade
mario_block	Bouncing "?" block with coins
nyan_cat	Rainbow trail pixel cat
matrix	Cascading green code rain
pickle_rick	Pickle Rick with rat limbs
chrome_dino	Chrome's offline T-Rex
marvin	Paranoid Android, depressive quotes
mac128k	Macintosh 128K retro green screen
floppy_disk	3.5" floppy with 1.44 MB nostalgia
bsod	Blue Screen of Death :(
bugdroid	Android's green robot
qr_code	QR code pattern: SCAN ME!
gpu_sweat	Sweating GPU with spinning fan
rubber_duck	Debugging companion duck
fail_whale	Twitter's over capacity whale
server_rack	Overheating server with smoke
cursor_hand	Bossy pointing hand cursor
vhs_tape	VHS cassette: Be kind, rewind!
cloud	Cute capricious cloud with rain
wifi_low	One bar Wi-Fi, cuts off mid-sen—
nokia3310	Indestructible Nokia with Snake
cookie	Creepy browser cookie that knows all
modem56k	56k modem: psshhh-kkkk-ding
esc_key	Panicked Esc key: LET ME OUT!
sad_mac	Dead Macintosh with X eyes
usb_cable	Tangled USB: wrong side, again
hourglass	Slow hourglass: Please… wait…
firewire	Forgotten cable in a drawer
ai_hallucinated	Glitching robot mixing facts
tamagotchi	Abandoned pet since 1998
lasso_tool	Obsessive selection tool
battery_low	1% battery: dying fast
incognito	Chrome detective sees nothing
rainbow_wheel	Mac spinning wheel of doom
error_404	Lost page, literally unfindable
google_blob	Old melted blob emoji, nostalgic
bit	Binary 0/1: answers Yes or No
pc_fan	Screaming fan: MAX RPM!
captcha	PROVE YOU'RE HUMAN!
bluetooth	Desperately searching, pairing failed
registry_key	Bureaucratic folder, controls all
high_ping	999ms: responds 10 sec late
scratched_cd	Sk-sk-skip! Stuttering CD
kermit	None of my business… *sips tea*
this_is_fine	Dog in flames: everything is fine
trollface	Problem? U mad bro?
no_idea_dog	I have no idea what I'm doing
surprised_pikachu	Feigned surprise :O
distracted_bf	Looking at the new framework
success_kid	Fist pump! It compiled!
expanding_brain	Transcended the codebase
doge	Such code. Much wow. Very deploy.
wiki_globe	[citation needed]

Free animated avatar (equalizer)

scenarios:
  - name: "Demo with Avatar"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      style: "equalizer"
      position: "bottom-right"
      size: 100
      shape: "circle"
      background: "rgba(0,0,0,0.6)"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "The avatar reacts to this narration."
        wait: 2.0

Avatar with custom image from URL

scenarios:
  - name: "Demo with Custom Avatar"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      image: "https://avatars.githubusercontent.com/u/22380190?v=4"
      style: "bounce"
      position: "bottom-right"
      size: 120
      shape: "circle"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "My avatar uses an image loaded from a URL."
        wait: 2.0

Paid D-ID avatar (talking head)

scenarios:
  - name: "Demo with Talking Head"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "d-id"
      image: "presenter.jpg"
      position: "bottom-left"
      size: 200
      api_key: "${D_ID_API_KEY}"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "A real talking-head avatar powered by D-ID."
        wait: 3.0

💡 Combine avatar with cursor and glow_select for a fully polished demo experience. Add composite_avatar to your pipeline to enable the overlay.
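
A minimal sketch of the second half of that tip, assuming composite_avatar is declared as a pipeline stage with an empty options map (the form used in the live example later in this section). The scenario name and URL are placeholders.

```yaml
scenarios:
  - name: "Avatar with Pipeline"   # placeholder name
    url: "https://example.com"     # placeholder URL
    avatar:
      enabled: true
      provider: "animated"
      style: "pulse"
    steps:
      - action: "navigate"
        url: "https://example.com"
        narration: "The avatar is composited onto the video."

pipeline:
  - composite_avatar: {}   # enables the avatar overlay
```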

Avatar with inline subtitles

scenarios:
  - name: "Demo with Avatar Subtitles"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      style: "clippy"
      position: "bottom-right"
      size: 100
      show_subtitle: true
      subtitle_font_size: 16
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "Narration text appears right below the avatar."
        wait: 2.0

Live Example: Avatar + subtitles (synced to narration)

config.yaml

subtitle:
  enabled: true
  style: "cinema"
  speed: "normal"

scenarios:
  - name: "Site Tour"
    url: "https://fran-cois.github.io/demodsl/"
    avatar:
      enabled: true
      provider: "animated"
      style: "clippy"
      position: "bottom-right"
      size: 100
      shape: "circle"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "The avatar pulses to each narration."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Subtitles appear in cinema style."
        wait: 2.0

pipeline:
  - composite_avatar: {}
  - burn_subtitles: {}
  - edit_video: {}
  - mix_audio: {}
  - optimize: {}

steps

Steps define individual browser actions within a scenario. Each step has an action type and action-specific fields. All steps also support optional narration, wait, and effects.

Common Fields (all actions)

Property	Type	Default	Description
action	"navigate" \| "click" \| "type" \| "scroll" \| "wait_for" \| "screenshot"	—	Required. The action type.
narration	string \| null	null	Text-to-speech narration played during this step.
wait	float \| null	null	Seconds to wait after the action completes.
effects	Effect[]	null	Visual effects to apply during this step.
card	CardContent \| null	null	Popup card content (title, body, items, icon). Shown synced with narration when popup_card mode is enabled.
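
To show how the common fields fit together, here is a sketch of a single step that sets all of them. It assumes popup_card is enabled on the enclosing scenario so that card has a visible effect; the URL and text values are placeholders.

```yaml
steps:
  - action: "navigate"                  # required
    url: "https://example.com/pricing"  # action-specific field for navigate
    narration: "Here is the pricing page."
    wait: 1.5                           # seconds to pause after the action
    effects:
      - type: "spotlight"
        duration: 2.0
    card:
      title: "Pricing"
      body: "Placeholder card text."
      icon: "💳"
```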

action: "navigate"

Navigate the browser to a URL.

Property	Type	Default	Description
url	string	—	Required. The URL to navigate to.

- action: "navigate"
  url: "https://myapp.com/dashboard"
  narration: "Let's visit the dashboard."
  wait: 2.0

action: "click"

Click on an element identified by a locator.

Property	Type	Default	Description
locator.type	"css" \| "id" \| "xpath" \| "text"	"css"	Locator strategy.
locator.value	string	—	Required. The selector/identifier.

- action: "click"
  locator:
    type: "css"
    value: "#submit-btn"
  narration: "Click submit."
  effects:
    - type: "highlight"
      color: "#FFD700"

Live Example: Click Actions (using text locators)

config.yaml

scenarios:
  - name: "Click Interactions"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Open the documentation site."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Click Get Started via text locator."
        wait: 1.5
      - action: "click"
        locator:
          type: "text"
          value: "GitHub →"
        narration: "Click the GitHub link."
        wait: 2.0

action: "type"

Type text into an input field.

Property	Type	Default	Description
locator.type	"css" \| "id" \| "xpath" \| "text"	"css"	Locator strategy.
locator.value	string	—	Required. The selector.
value	string	—	Required. The text to type.

- action: "type"
  locator:
    type: "id"
    value: "email"
  value: "user@example.com"
  effects:
    - type: "typewriter"
      speed: 0.1

action: "scroll"

Scroll the page in a direction.

Property	Type	Default	Description
direction	"up" \| "down" \| "left" \| "right"	"down"	Scroll direction.
pixels	int	300	Number of pixels to scroll.

- action: "scroll"
  direction: "down"
  pixels: 500
  narration: "Scrolling to see more features."

Live Example: Navigate & Scroll (generated from this config)

config.yaml

scenarios:
  - name: "Navigate and Scroll"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Navigate to the target URL."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 400
        narration: "Scroll down 400 pixels."
        wait: 1.5
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Continue scrolling."
        wait: 1.5
      - action: "scroll"
        direction: "up"
        pixels: 300
        narration: "Scroll back up."
        wait: 1.5

action: "wait_for"

Wait for an element to appear in the DOM.

Property	Type	Default	Description
locator.type	"css" \| "id" \| "xpath" \| "text"	"css"	Locator strategy.
locator.value	string	—	Required. The selector.
timeout	float	5.0	Maximum wait time in seconds.

- action: "wait_for"
  locator:
    type: "css"
    value: ".dashboard-loaded"
  timeout: 10.0
  narration: "Waiting for the dashboard to load."

⚠️ If the element is not found within timeout seconds, the step throws an error and the scenario stops.

Live Example: wait_for (wait for elements before interacting)

config.yaml

steps:
  - action: "navigate"
    url: "https://fran-cois.github.io/demodsl/docs"
    narration: "Navigate to the docs page."
    wait: 2.0
  - action: "wait_for"
    locator:
      type: "css"
      value: "nav a"
    timeout: 5.0
    narration: "Wait for the sidebar nav to load."
    wait: 1.5
  - action: "click"
    locator:
      type: "css"
      value: "a[href='#effects']"
    narration: "Click the effects link."
    wait: 2.0
  - action: "wait_for"
    locator:
      type: "css"
      value: "#effects"
    timeout: 5.0
    narration: "Wait for effects heading to appear."
    wait: 1.5

action: "screenshot"

Capture a screenshot of the current page.

Property	Type	Default	Description
filename	string	"screenshot.png"	Output filename. Saved to the workspace frames directory.

- action: "screenshot"
  filename: "final_state.png"
  narration: "Here's the final result."

Live Example: Mobile Viewport & Screenshot capture

config.yaml

scenarios:
  - name: "Mobile Capture"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport:
      width: 390
      height: 844
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Load in mobile viewport, 390x844."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 400
        narration: "See the responsive layout."
        wait: 1.5
      - action: "screenshot"
        filename: "mobile_capture.png"
        narration: "Take a screenshot."
        wait: 1.0

Locator Types

Four locator strategies are available for identifying elements:

Strategy	Type	Default	Description
css	—	—	CSS selector. Examples: "#id", ".class", "button[type=submit]"
id	—	—	Element ID (shorthand for #id). Example: "email-input"
xpath	—	—	XPath expression. Example: "//div[@class='card']"
text	—	—	Visible text content. Example: "Sign Up"

💡 Prefer css selectors for stability. Use text locators for buttons or links where the visible text is more stable than the CSS structure.

Live Example: Locator types in action (text and css)

config.yaml

steps:
  - action: "click"
    locator:
      type: "text"
      value: "Get Started"
    narration: "Text locator: click by visible text."
    wait: 2.0
  - action: "click"
    locator:
      type: "text"
      value: "Documentation"
    narration: "Text locator: click Documentation."
    wait: 2.0
  - action: "click"
    locator:
      type: "css"
      value: "a[href='#pipeline']"
    narration: "CSS locator: jump to pipeline."
    wait: 2.0
  - action: "scroll"
    direction: "down"
    pixels: 400
    narration: "Supports css, id, xpath, and text."
    wait: 1.5
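
The example above exercises the text and css strategies. For completeness, a sketch of the other two follows. The element ID and XPath expression are the illustrative values from the Locator Types table, not selectors known to exist on any particular page.

```yaml
steps:
  - action: "type"
    locator:
      type: "id"
      value: "email-input"            # element ID, equivalent to the css selector "#email-input"
    value: "user@example.com"
  - action: "wait_for"
    locator:
      type: "xpath"
      value: "//div[@class='card']"   # XPath expression
    timeout: 5.0
```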

effects

53 visual effects are available, split into five categories: browser effects (21, injected as CSS/JS during capture), cursor trail variants (6, animated trails following the cursor), fun / celebration effects (6, confetti-style canvas overlays), post-processing effects (7, applied to the rendered video via MoviePy), and camera & cinematic effects (13, advanced camera movements and cinematic post-processing). Effects are attached to individual steps.

Property	Type	Default	Description
type	EffectType	—	Required. Effect name (see tables below).
duration	float \| null	null	Effect duration in seconds.
intensity	float \| null	null	Effect intensity (0.0–1.0).
color	string \| null	null	Effect color (hex or rgba string). Used by highlight, glow, neon_glow, and other effects that list a color parameter.
speed	float \| null	null	Animation speed. Used by typewriter, camera_shake, rotate.
scale	float \| null	null	Zoom scale factor. Used by zoom_pulse, drone_zoom, ken_burns, zoom_to, elastic_zoom.
depth	int \| null	null	Parallax depth. Used by parallax.
direction	string \| null	null	Direction ("left", "right", "up", "down"; focus_pull takes "in" or "out" instead). Used by slide_in, ken_burns, whip_pan, focus_pull.
target_x	float \| null	null	Normalized X position (0.0–1.0). Used by drone_zoom, zoom_to.
target_y	float \| null	null	Normalized Y position (0.0–1.0). Used by drone_zoom, zoom_to.
angle	float \| null	null	Rotation angle in degrees. Used by rotate.
ratio	float \| null	null	Aspect ratio (e.g. 2.35 for cinemascope). Used by letterbox.
preset	string \| null	null	Color grade preset ("warm", "cool", "desaturate", "vintage", "cinematic"). Used by color_grade.
focus_position	float \| null	null	Focus band position (0.0–1.0). Used by tilt_shift.

Browser Effects (real-time JS injection)

These effects inject CSS/JavaScript into the browser during capture, creating real-time visual overlays.

Effect	Parameters (defaults)	Default	Description
spotlight	duration(2), intensity(0.7)	—	Radial gradient spotlight overlay, darkens edges.
highlight	duration(2), color(#FFD700), intensity(0.8)	—	Glowing box-shadow on hovered elements.
confetti	duration(3), count(150), colors([list]), speed_min(1.5), speed_range(3.0)	—	Animated falling confetti particles (canvas).
typewriter	duration(2), caret_color(#333), blink_speed(0.7), bg_color, text_color, font_size(18), label	—	Blinking caret animation on input fields.
glow	duration(2), color(#6366f1)	—	Inner box-shadow glow around the viewport.
shockwave	duration(0.8), color(#FF5722), glow_color, border_width(4), max_size(600), glow(15)	—	Expanding ring animation from center.
sparkle	duration(3), count(80), color(#FFD700), min_size(2), max_size(8)	—	Random sparkling golden dots (canvas).
cursor_trail	duration(3), color(#a855f7), size(22), glow(14), fade_duration(1.2), max_dots(80)	—	Trailing particles following the cursor.
ripple	duration(0.6), color(#4FC3F7), glow_color, border_width(3), max_size(200), glow(12)	—	Click ripple effect on interactions.
neon_glow	duration(2), color(#FF00FF)	—	Neon-colored glow border around the viewport.
success_checkmark	duration(1.2), color(#4CAF50), size(140), glow(20), symbol(✓)	—	Animated green checkmark overlay.
frosted_glass	duration(3), intensity(0.5)	—	Frosted glass blur overlay.
morphing_background	duration(5), colors([list])	—	Animated gradient background morphing.
matrix_rain	duration(5), color(#00FF41), density(0.05), speed(1.0)	—	Matrix-style falling green characters.
text_highlight	duration(2), color(#FFD700)	—	Highlighted text background animation.
text_scramble	duration(2), speed(50)	—	Text scramble/decode animation.
magnetic_hover	duration(3), intensity(0.5)	—	Magnetic attraction effect on hover.
tooltip_annotation	duration(3), text, color(#333)	—	Tooltip annotation popup.
progress_bar	duration(3), color(#4CAF50), position(top), intensity(4)	—	Animated progress bar filling horizontally.
countdown_timer	duration(5), color(#333), position(center)	—	Countdown circle timer overlay.
callout_arrow	duration(3), text, color(#FF6B6B), target_x(0.5), target_y(0.5)	—	Arrow callout pointing to coordinates.

Live Example: spotlight (radial gradient overlay)

config.yaml

effects:
  - type: "spotlight"
    intensity: 0.8
    duration: 2.0

Live Example: highlight (glowing box-shadow on hover)

config.yaml

effects:
  - type: "highlight"
    color: "#FFD700"
    intensity: 0.9
    duration: 2.0

Live Example: confetti (falling particles)

config.yaml

effects:
  - type: "confetti"
    duration: 2.0
    count: 200
    colors: ["#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A"]
    speed_min: 2.0

Live Example: typewriter (blinking caret on inputs)

config.yaml

effects:
  - type: "typewriter"
    duration: 2.0

Live Example: glow (inner box-shadow glow)

config.yaml

effects:
  - type: "glow"
    color: "#6366f1"
    duration: 2.0

Live Example: shockwave (expanding ring animation)

config.yaml

effects:
  - type: "shockwave"
    duration: 1.0
    color: "#FF5722"
    max_size: 800
    glow: 20

Live Example: sparkle (golden sparkling dots)

config.yaml

effects:
  - type: "sparkle"
    duration: 2.0
    count: 100
    color: "#FFD700"
    max_size: 10

Live Example: cursor_trail (trailing particles)

config.yaml

effects:
  - type: "cursor_trail"
    duration: 2.0
    color: "#a855f7"
    size: 22
    glow: 14
    max_dots: 80

Live Example: ripple (click ripple effect)

config.yaml

effects:
  - type: "ripple"
    duration: 2.0

Live Example: neon_glow (vivid neon border)

config.yaml

effects:
  - type: "neon_glow"
    color: "#FF00FF"
    duration: 2.0

Live Example: success_checkmark (animated green ✓)

config.yaml

effects:
  - type: "success_checkmark"
    duration: 2.0

Cursor Trail Variants

Six animated cursor trail styles — each follows mouse movement with a unique visual style. All are browser-injected effects.

Effect	Parameters (defaults)	Default	Description
cursor_trail_rainbow	duration(3), size(18), hue_step(12), glow(12), fade_duration(1.4), lifetime(2200)	—	Rainbow-colored dots cycling through hues.
cursor_trail_comet	duration(3), color(rgba(168,85,247,1)), glow_color, layers(4), size(22), size_step(3), fade_duration(0.8)	—	Comet tail with size gradient (3 particles per move).
cursor_trail_glow	duration(3), color(#00BFFF), size(36), glow_inner(24), glow_outer(48), fade_duration(1.5), lifetime(2000), scale_end(2.5)	—	Soft glowing trail with radial gradient and box-shadow.
cursor_trail_line	duration(3), color(rgba(168,85,247,1)), max_points(60), min_width(2), max_width(7)	—	Connected SVG line segments following the cursor.
cursor_trail_particles	duration(3), count(6), min_size(8), size_range(6), spread(35), hue_base(180), hue_range(60), glow(8), fade_delay(200), lifetime(1400)	—	Particle burst on each mouse move (5 per event).
cursor_trail_fire	duration(3), sparks(5), min_size(10), size_range(12), glow(10), hue_base(10), hue_range(40), fade_delay(300), lifetime(1500)	—	Warm orange/red fire sparks rising and fading.

Live Example: cursor_trail_rainbow (rainbow cycling dots)

config.yaml

effects:
  - type: "cursor_trail_rainbow"
    duration: 3.0
    size: 22
    hue_step: 15
    glow: 16

Live Example: cursor_trail_comet (size gradient tail)

config.yaml

effects:
  - type: "cursor_trail_comet"
    duration: 3.0
    color: "rgba(168,85,247,1)"
    layers: 5
    size: 26

Live Example: cursor_trail_glow (soft glowing trail)

config.yaml

effects:
  - type: "cursor_trail_glow"
    color: "#00BFFF"
    duration: 3.0

Live Example: cursor_trail_line (connected SVG segments)

config.yaml

effects:
  - type: "cursor_trail_line"
    duration: 3.0
    color: "rgba(168,85,247,1)"
    max_points: 80
    max_width: 10

Live Example: cursor_trail_particles (particle burst)

config.yaml

effects:
  - type: "cursor_trail_particles"
    duration: 3.0
    count: 8
    spread: 45
    hue_base: 200
    hue_range: 80

Live Example: cursor_trail_fire (fire sparks)

config.yaml

effects:
  - type: "cursor_trail_fire"
    duration: 3.0
    sparks: 8
    hue_base: 0
    hue_range: 50

Fun / Celebration Effects

Six celebration-style canvas overlays for joyful moments. All of them clean up automatically once their animation completes.

Effect	Parameters (defaults)	Default	Description
emoji_rain	duration(4), count(60), min_size(22), size_range(20), speed_min(1.5), speed_range(2.5), emojis([🎉,🔥,❤️,⭐,🚀,💯])	—	Rain of emojis (🎉🔥❤️⭐🚀💯) falling from the top.
fireworks	duration(3), initial_rockets(8), launch_interval(1200), particles_per_rocket(50), particle_speed_min(1.5), particle_speed_range(4), gravity(0.05), fade_rate(0.012)	—	Rockets launching and exploding into colorful particles.
bubbles	duration(4), count(45), min_radius(10), max_radius(35), speed_min(0.5), speed_range(1.5), hue_base(180), hue_range(60)	—	Translucent bubbles rising with sinusoidal wobble.
snow	duration(5), count(120), min_radius(3), max_radius(8), color(rgba(200,230,255,0.85)), glow_color, glow(4), speed_min(0.8), speed_max(2.8)	—	Snowflakes drifting down with gentle wind drift.
star_burst	duration(3), count(80), speed_min(2), speed_range(5), hue_base(40), hue_range(60), decay(0.006)	—	5-pointed stars exploding from the center.
party_popper	duration(3), count(55), colors([list]), min_size(8), size_range(10), speed_min(4), speed_range(7), gravity(0.12), fade_rate(0.003)	—	Confetti shapes (rect/circle/triangle) from both bottom corners.

Live Example: emoji_rain (falling emojis 🎉🔥⭐)

config.yaml

effects:
  - type: "emoji_rain"
    duration: 4.0
    count: 80
    emojis: ["🎉", "🔥", "❤️", "⭐", "🚀", "💯", "🎊"]
    speed_min: 2.0

Live Example: fireworks (rockets and explosions 🎆)

config.yaml

effects:
  - type: "fireworks"
    duration: 5.0
    initial_rockets: 12
    particles_per_rocket: 80
    launch_interval: 800

Live Example: bubbles (translucent rising bubbles)

config.yaml

effects:
  - type: "bubbles"
    duration: 4.0

Live Example: snow (drifting snowflakes ❄️)

config.yaml

effects:
  - type: "snow"
    duration: 6.0
    count: 150
    min_radius: 2
    max_radius: 10
    speed_min: 0.5
    speed_max: 3.0

Live Example: star_burst (exploding stars ⭐)

config.yaml

effects:
  - type: "star_burst"
    duration: 3.0
    count: 100
    hue_base: 0
    hue_range: 360

Live Example: party_popper (corner confetti 🎊)

config.yaml

effects:
  - type: "party_popper"
    duration: 4.0
    count: 80
    gravity: 0.15
    colors: ["#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A"]

Post-Processing Effects (MoviePy)

These effects are applied to the video during the apply_effects pipeline stage.

Effect	Parameters	Default	Description
parallax	duration, depth	—	Subtle zoom for a depth illusion.
zoom_pulse	duration, scale	—	Pulsing zoom in/out following a sine wave.
fade_in	duration	—	Clip fades in from black.
fade_out	duration	—	Clip fades out to black.
vignette	duration, intensity	—	Dark vignette border around the frame.
glitch	duration, intensity	—	Random horizontal slice displacement.
slide_in	duration, direction	—	Slide-in entrance animation (implemented as crossfade).
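
As an illustration, the sketch below attaches three of these effects to steps and adds the apply_effects stage to the pipeline. It assumes apply_effects is declared like the other pipeline stages shown in this reference (a key with an empty options map). The scenario name, URL, and values are placeholders.

```yaml
scenarios:
  - name: "Post-processing Sketch"   # placeholder name
    url: "https://example.com"       # placeholder URL
    steps:
      - action: "navigate"
        url: "https://example.com"
        effects:
          - type: "fade_in"
            duration: 1.0
          - type: "vignette"
            duration: 3.0
            intensity: 0.4           # 0.0 to 1.0
      - action: "scroll"
        direction: "down"
        pixels: 400
        effects:
          - type: "glitch"
            duration: 0.5
            intensity: 0.3

pipeline:
  - apply_effects: {}   # assumed stage syntax, mirroring composite_avatar: {}
```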

Combining effects

steps:
  - action: "click"
    locator: { type: "css", value: "#cta" }
    narration: "Click the call to action!"
    effects:
      - type: "highlight"
        color: "#FFD700"
        duration: 1.5
      - type: "confetti"
        duration: 2.0
      - type: "zoom_pulse"
        scale: 1.2
        duration: 1.0

ℹ️ Browser effects execute before the step action. Multiple effects on one step are applied sequentially, each waiting for its duration before the next.
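
To make the timing concrete, here is a sketch annotated with the delays that the note above implies. It assumes the rule applies exactly as stated, that effects run in list order, and that the action waits for the last effect to finish. The selector is a placeholder.

```yaml
steps:
  - action: "click"
    locator: { type: "css", value: "#cta" }   # placeholder selector
    effects:
      - type: "highlight"   # runs first, for 1.5 s
        duration: 1.5
      - type: "confetti"    # starts once highlight is done, runs 2.0 s
        duration: 2.0
    # Under these assumptions the click fires roughly 1.5 + 2.0 = 3.5 s
    # after the step begins.
```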

Live Example: 5 browser effects (spotlight, highlight, glow, neon_glow, success_checkmark)

config.yaml

steps:
  - action: "navigate"
    url: "https://fran-cois.github.io/demodsl/"
    narration: "Effects are injected via JS during capture."
    wait: 2.0
    effects:
      - type: "spotlight"
        duration: 2.0
        intensity: 0.8
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Highlight adds a glowing box-shadow."
    effects:
      - type: "highlight"
        duration: 2.0
        color: "#FFD700"
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Glow creates an inner glow."
    effects:
      - type: "glow"
        duration: 2.0
        color: "#6366f1"
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Neon glow adds a vivid border."
    effects:
      - type: "neon_glow"
        duration: 2.0
        color: "#FF00FF"
  - action: "screenshot"
    narration: "Success checkmark overlay."
    effects:
      - type: "success_checkmark"
        duration: 2.0

Camera & Cinematic Effects

13 advanced camera and cinematic effects for professional-looking demos. These are all post-processing effects applied via MoviePy — they simulate real camera movements and cinematic grading on the rendered video.

Camera Movement Effects

Effect	Parameters	Default	Description
drone_zoom	scale, target_x, target_y	—	Smooth progressive zoom towards a target point — simulates a drone descent.
ken_burns	scale, direction	—	Classic documentary pan + zoom (slow push with lateral drift).
zoom_to	scale, target_x, target_y	—	Zoom to a specific point and hold — great for highlighting UI elements.
dolly_zoom	intensity	—	Vertigo / dolly-zoom: zoom in while widening the crop.
elastic_zoom	scale	—	Zoom with elastic overshoot bounce (ease-out-back).
camera_shake	intensity, speed	—	Subtle camera shake / handheld feel.
whip_pan	direction	—	Fast horizontal/vertical pan with motion blur — great for transitions.
rotate	angle, speed	—	Gentle animated rotation — subtle tilt for dynamic feel.

Live Example: drone_zoom (smooth descent towards a target)

config.yaml

effects:
  - type: "drone_zoom"
    scale: 1.4
    target_x: 0.5   # center horizontally
    target_y: 0.3   # focus on upper third

Live Example: ken_burns (classic documentary pan + zoom)

config.yaml

effects:
  - type: "ken_burns"
    scale: 1.15
    direction: "right"  # left, right, up, down

Live Example: zoom_to (zoom and hold on a UI element)

config.yaml

effects:
  - type: "zoom_to"
    scale: 1.8
    target_x: 0.5
    target_y: 0.4

Live Example: dolly_zoom (dramatic vertigo effect)

config.yaml

effects:
  - type: "dolly_zoom"
    intensity: 0.3

Live Example: elastic_zoom (bouncy zoom with overshoot)

config.yaml

effects:
  - type: "elastic_zoom"
    scale: 1.3

Live Example: camera_shake (subtle handheld feel)

config.yaml

effects:
  - type: "camera_shake"
    intensity: 0.3
    speed: 8.0

Live Example: whip_pan (fast transition with motion blur)

config.yaml

effects:
  - type: "whip_pan"
    direction: "right"  # left, right, up, down

Live Example: rotate (gentle animated tilt)

config.yaml

effects:
  - type: "rotate"
    angle: 3.0    # degrees
    speed: 1.0    # oscillations per clip

Cinematic Effects

Effect	Parameters	Default	Description
letterbox	ratio	—	Cinematic black bars (e.g. 2.35:1 cinemascope).
film_grain	intensity	—	Analog film grain overlay.
color_grade	preset	—	Color grading presets: warm, cool, desaturate, vintage, cinematic.
focus_pull	direction, intensity	—	Rack focus: transition from sharp to blurry (or reverse).
tilt_shift	intensity, focus_position	—	Miniature / tilt-shift: sharp band in center, blurred edges.

Live Example: letterbox (cinematic 2.35:1 black bars)

config.yaml

effects:
  - type: "letterbox"
    ratio: 2.35   # cinemascope
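
As a rough guide to what a given ratio does to the frame, the sketch below computes the height of each black bar. It assumes bars are added at the top and bottom only and rounded to whole pixels; letterbox_bar_height is a hypothetical helper, not part of DemoDSL.

```python
def letterbox_bar_height(width: int, height: int, ratio: float) -> int:
    """Height in pixels of each black bar (top and bottom) for a target ratio."""
    visible_height = width / ratio
    if visible_height >= height:
        return 0  # the frame is already at least as wide as the target ratio
    return round((height - visible_height) / 2)


# On a 1280x720 recording, 2.35:1 leaves about 545 visible rows of pixels.
bars_720p = letterbox_bar_height(1280, 720, 2.35)
```

Under these assumptions a 720p recording loses roughly 88 pixels at the top and 88 at the bottom, so keep important UI away from those bands.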

Live Example: film_grain (analog film texture)

config.yaml

effects:
  - type: "film_grain"
    intensity: 0.3

Live Example: color_grade (cinematic color grading)

config.yaml

effects:
  - type: "color_grade"
    preset: "cinematic"  # warm, cool, desaturate, vintage, cinematic

Live Example: focus_pull (rack focus transition)

config.yaml

effects:
  - type: "focus_pull"
    direction: "out"   # in = blur→sharp, out = sharp→blur
    intensity: 0.5

Live Example: tilt_shift (miniature effect)

config.yaml

effects:
  - type: "tilt_shift"
    intensity: 0.6
    focus_position: 0.5  # 0.0=top, 0.5=center, 1.0=bottom

💡Combine camera effects for professional results: pair letterbox + color_grade + film_grain for a cinematic look, or drone_zoom + vignette for a dramatic reveal.

Full cinematic combo example

steps:
  - action: "navigate"
    url: "https://example.com"
    narration: "A cinematic reveal of our product."
    effects:
      - type: "drone_zoom"
        scale: 1.4
        target_x: 0.5
        target_y: 0.3
      - type: "letterbox"
        ratio: 2.35
      - type: "color_grade"
        preset: "cinematic"
      - type: "film_grain"
        intensity: 0.2
      - type: "vignette"
        intensity: 0.4

pipeline

The pipeline defines the post-processing chain using a Chain of Responsibility pattern. Each stage is a single-key dictionary. Stages execute in order, passing context to the next.

Each stage is either critical (failure stops the pipeline) or optional (failure is logged and skipped).

Stage	Criticality	Default parameters	Description
restore_audio	optional	{ denoise, normalize }	Audio restoration: noise removal, loudness normalization.
restore_video	optional	{ stabilize, sharpen }	Video restoration: stabilization, sharpening.
apply_effects	optional	{}	Apply post-processing visual effects from step definitions.
generate_narration	critical	{}	Generate TTS audio clips and sync to video timeline.
composite_avatar	optional	{}	Overlay avatar clips on the video. Requires avatar config in the scenario.
burn_subtitles	optional	{}	Burn ASS subtitles into the video. Requires subtitle config (top-level or per-scenario).
render_device_mockup	optional	{}	Overlay video into a 3D device frame.
edit_video	critical	{}	Apply intro, outro, transitions, and watermark.
mix_audio	critical	{}	Mix voice narration with background music (ducking).
optimize	critical	{ format, codec, quality, target_size_mb }	Final encoding, compression, and format export.
fit_duration	optional	{ target_duration, strategy, min_speed, max_speed }	Adjust video speed so that the final video matches a target duration.
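
The critical/optional distinction in the table above can be illustrated with a small runner. This is a sketch of the general pattern under stated assumptions, not DemoDSL's implementation: the Stage tuple, run_pipeline, and the use of RuntimeError are all hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("pipeline_sketch")

# (stage name, is_critical, handler that takes and returns the shared context)
Stage = tuple[str, bool, Callable[[dict], dict]]


def run_pipeline(stages: list[Stage], context: dict) -> dict:
    """Run stages in order, passing the context from one to the next."""
    for name, critical, handler in stages:
        try:
            context = handler(context)
        except Exception as exc:
            if critical:
                # A critical failure stops the whole pipeline.
                raise RuntimeError(f"critical stage {name!r} failed") from exc
            # An optional failure is logged and the stage is skipped.
            logger.warning("optional stage %r failed, skipping: %s", name, exc)
    return context
```

Because a failed optional stage leaves the context untouched, later stages see the output of the last stage that succeeded.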

Pipeline syntax

pipeline:
  # Each stage is a single-key dict
  - restore_audio:
      denoise: true
      normalize: true
  - restore_video:
      stabilize: true
      sharpen: true
  - apply_effects: {}
  - generate_narration: {}
  - composite_avatar: {}
  - burn_subtitles: {}
  - render_device_mockup: {}
  - edit_video: {}
  - mix_audio: {}
  - fit_duration:
      target_duration: 60
      strategy: "any"
  - optimize:
      format: "mp4"
      codec: "h264"
      quality: "high"
      target_size_mb: 50

optimize stage parameters

Property	Type	Default	Description
format	string	"mp4"	Output format: "mp4", "webm", "gif".
codec	string	"h264"	Video codec: "h264", "h265", "vp9", etc.
quality	string	"high"	Encoding quality: "low", "medium", "high".
target_size_mb	int | null	null	Target file size in MB. Overrides quality if set.

⚠️Pipeline stages with {} (empty dict) use all defaults. Each stage dict must have exactly one key — multiple keys in a single dict will raise a validation error.

💡You can reorder stages or omit optional ones. A minimal pipeline might be just generate_narration, edit_video, mix_audio, and optimize.

fit_duration stage parameters

Automatically adjusts video playback speed so that the final video matches a given target_duration in seconds. Useful when you need to produce a demo that fits a specific time slot (e.g. a 60-second social clip or a 3-minute explainer).

Property	Type	Default	Description
target_duration	float	—	Required. Target duration in seconds.
strategy	"any" | "speed_up" | "slow_down"	"any"	Direction constraint. "any" allows both speed up and slow down.
min_speed	float	0.25	Minimum speed factor (prevents extreme slow-motion).
max_speed	float	4.0	Maximum speed factor (prevents unwatchable fast-forward).
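
The interaction between these four parameters can be sketched as a single speed-factor calculation. The exact semantics are an assumption here: the sketch treats "speed_up" as "never go below 1.0x" and "slow_down" as "never go above 1.0x", then clamps to the min_speed/max_speed range. fit_speed is a hypothetical helper.

```python
def fit_speed(current, target, strategy="any", min_speed=0.25, max_speed=4.0):
    """Playback speed factor so that `current` seconds play in about `target` seconds."""
    if current <= 0 or target <= 0:
        raise ValueError("durations must be positive")
    factor = current / target  # > 1 speeds the video up, < 1 slows it down
    if strategy == "speed_up":
        factor = max(factor, 1.0)  # never slow down
    elif strategy == "slow_down":
        factor = min(factor, 1.0)  # never speed up
    elif strategy != "any":
        raise ValueError(f"unknown strategy: {strategy!r}")
    # The speed limits win over the target duration.
    return min(max(factor, min_speed), max_speed)


# A 90-second render fitted to 60 seconds needs to play at 1.5x.
factor_90_to_60 = fit_speed(90, 60)
```

Note the consequence of the clamp: when the required factor falls outside the allowed range, the final video will not hit the target duration exactly.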

fit_duration example

pipeline:
  - generate_narration: {}
  - edit_video: {}
  - mix_audio: {}
  - fit_duration:
      target_duration: 60       # make the video exactly 60 seconds
      strategy: "any"            # speed up or slow down as needed
      min_speed: 0.5             # never slower than 0.5x
      max_speed: 3.0             # never faster than 3x

ℹ️Place fit_duration after edit_video and speed but before optimize so that intro/outro and manual speed changes are applied first, then the whole video is time-fitted.

output

Defines output filenames, formats, thumbnail generation, and social media export presets.

Property	Type	Default	Description
filename	string	"output.mp4"	Main output filename.
directory	string	"output/"	Output directory path.
formats	string[]	["mp4"]	Export formats: "mp4", "webm", "gif".
thumbnails	Thumbnail[]	null	Auto-generated thumbnail frames.
social	SocialExport[]	null	Platform-specific export presets.

output.thumbnails

Property	Type	Default	Description
timestamp	float	—	Required. Time in seconds to capture the thumbnail.

output.social

Generate platform-optimized versions automatically. Each preset re-encodes the video with platform-specific constraints.

Property	Type	Default	Description
platform	string	—	Required. Platform name (for labeling).
resolution	string | null	null	Output resolution (e.g. "1920x1080").
bitrate	string | null	null	Target bitrate (e.g. "8000k").
aspect_ratio	string | null	null	Crop to aspect ratio (e.g. "1:1", "9:16").
max_duration	int | null	null	Maximum duration in seconds (trims end).
max_size_mb	int | null	null	Maximum file size in MB.

Full output example

output:
  filename: "demo.mp4"
  directory: "output/"
  formats:
    - "mp4"
    - "webm"
    - "gif"
  thumbnails:
    - timestamp: 0.0
    - timestamp: 5.0
    - timestamp: 10.0
  social:
    - platform: "youtube"
      resolution: "1920x1080"
      bitrate: "8000k"
    - platform: "instagram"
      resolution: "1080x1080"
      aspect_ratio: "1:1"
      max_duration: 60
    - platform: "twitter"
      resolution: "1280x720"
      max_duration: 140
      max_size_mb: 15
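
To see what an aspect_ratio crop removes from a recording, the sketch below computes the largest centered crop for a "W:H" string. Centering the crop is an assumption, and center_crop is a hypothetical helper rather than a DemoDSL function.

```python
from fractions import Fraction


def center_crop(width: int, height: int, aspect_ratio: str):
    """Largest centered (x, y, w, h) crop of the frame with the given "W:H" ratio."""
    num, den = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(num, den)
    if Fraction(width, height) > ratio:
        # Frame is too wide for the target ratio: trim the sides.
        crop_w, crop_h = int(height * ratio), height
    else:
        # Frame is too tall for the target ratio: trim top and bottom.
        crop_w, crop_h = width, int(width / ratio)
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h


# A 1:1 crop of a 1920x1080 recording keeps only the middle 1080 columns.
square = center_crop(1920, 1080, "1:1")
```

A 1:1 crop of a 16:9 recording discards 420 pixels on each side, which is worth keeping in mind when placing key UI elements near the edges of the viewport.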

analytics (Beta)

Optional engagement tracking metadata embedded in the output.

Property	Type	Default	Description
track_engagement	bool	false	Track viewer engagement metrics.
heatmap	bool	false	Generate click/attention heatmap data.
click_tracking	bool	false	Track interactive click positions.

analytics:
  track_engagement: true
  heatmap: true
  click_tracking: true

Pre-flight Checks

Before launching the recording browser, DemoDSL probes the URLs your scenario depends on, so you learn that a page will show a challenge or come back empty before you wait for a full render. All pre-flight checks are advisory: they never abort the demo. Network failures (DNS, timeout, offline) are treated as "passing" so demos still run on locked-down machines.

Page accessibility (anti-bot / WAF detection)

Many production sites are fronted by anti-bot or WAF services that serve a JavaScript challenge or a 403/429/503 to non-browser clients. Recording such a page captures the challenge instead of your real UI. DemoDSL fetches each scenario.url and every navigate step URL with a short GET probe (only the first 16 KB of the body) and inspects the response for known fingerprints. When a block is detected, a WARNING is logged with the protection name, the HTTP status, and the matching signal. The demo still runs, so the recording documents the issue.

Detected protections include:

Protection	Category	Default	Detected signals
cloudflare	WAF	—	cf-ray, cf-mitigated, __cf_bm, “Just a moment…”, Turnstile, Attention Required.
datadome	Anti-bot	—	x-dd-b / x-datadome headers, datadome cookie, geo.captcha-delivery.com markers.
akamai	WAF / Bot Manager	—	AkamaiGHost server, ak_bmsc / _abck cookies, “Access Denied” reference page.
imperva	WAF (Incapsula)	—	X-Iinfo header, visid_incap_ / incap_ses_ cookies, “Incapsula incident id”.
aws-waf	WAF	—	x-amzn-waf-action header, aws-waf-token cookie, AWSWAFCaptcha page.
f5-shape	WAF / bot defense	—	BIG-IP cookie + block, “The requested URL was rejected”, Shape JS.
sucuri	WAF	—	Server: Sucuri/Cloudproxy, Sucuri firewall block page.
kasada	Anti-bot	—	x-kpsdk-ct header, /kpsdk/ markers.
perimeterx	Anti-bot (HUMAN)	—	_px3 / _pxhd cookies, px-captcha / _pxCaptcha markers.
captcha	Generic gate	—	hCaptcha, reCAPTCHA, Cloudflare Turnstile, Arkose/FunCaptcha interstitials.
—	HTTP status	—	401, 403, 404, 429, 451, 5xx with friendlier reason text.
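
The general shape of fingerprint matching can be sketched with a handful of the header signals from the table. This is an illustration only: it covers four headers, ignores cookies and body markers, and detect_protection is a hypothetical function, not the logic inside demodsl.page_precheck.

```python
# A small subset of the header signals listed in the table above.
HEADER_SIGNALS = {
    "cf-mitigated": "cloudflare",
    "x-datadome": "datadome",
    "x-amzn-waf-action": "aws-waf",
    "x-kpsdk-ct": "kasada",
}
BLOCKING_STATUSES = {401, 403, 404, 429, 451}


def detect_protection(status: int, headers: dict) -> tuple:
    """Return (protection, signal); either element is None when nothing matched."""
    lowered = {key.lower(): value for key, value in headers.items()}
    for header, protection in HEADER_SIGNALS.items():
        if header in lowered:
            return protection, f"{header}: {lowered[header]}"
    if status in BLOCKING_STATUSES or 500 <= status <= 599:
        return None, f"HTTP {status}"  # blocked, but by an unknown gate
    return None, None


blocked = detect_protection(403, {"CF-Mitigated": "challenge"})
```

Header names are compared case-insensitively because HTTP header names are not case-sensitive.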

Sample log output for a Cloudflare-protected URL:

WARNING [page-precheck] https://example.com → not accessible · protected by cloudflare · HTTP 403 · cf-mitigated: challenge — the recorded demo is likely to show a challenge or error page. Consider using a different URL, a fixture/screenshot, or running the demo on an allow-listed network.

Recommended workarounds when a URL is blocked:

Run the demo from an IP that is on the site's allow-list (office network, VPN).
Replace the live URL with a static fixture, a local mirror, or a pre-recorded screenshot.
For partner sites, ask for a User-Agent / IP exemption for the demo runner.
If the page is only blocked for the precheck (HEAD/GET) but loads in a real browser, the warning is harmless — the recording will still succeed.

You can call the probe programmatically:

from demodsl.page_precheck import probe_page_accessible, precheck_urls

result = probe_page_accessible("https://example.com")
if not result.accessible:
    print(result.format_warning())
    # → "https://example.com → not accessible · protected by cloudflare · HTTP 403 · cf-mitigated: challenge"

# Batch probe with built-in WARNING logging
precheck_urls(["https://a.example", "https://b.example"])

Iframe embeddability (secondary windows)

When a scenario uses background.secondary_windows[].url to embed a live site behind the main browser, DemoDSL probes each URL for headers that block iframing:

X-Frame-Options: DENY or SAMEORIGIN
Content-Security-Policy: frame-ancestors with a restrictive value (other than *, http:, https:)
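
The two header rules above can be expressed as a short check. This sketch assumes a frame-ancestors directive counts as permissive when it lists *, http:, or https: among its sources; blocks_iframing is a hypothetical helper, not a DemoDSL API.

```python
def blocks_iframing(headers: dict) -> bool:
    """True if the response headers forbid embedding the page in a cross-origin iframe."""
    lowered = {key.lower(): value for key, value in headers.items()}

    # Rule 1: X-Frame-Options: DENY or SAMEORIGIN
    if lowered.get("x-frame-options", "").strip().upper() in {"DENY", "SAMEORIGIN"}:
        return True

    # Rule 2: a restrictive frame-ancestors directive in the CSP
    for directive in lowered.get("content-security-policy", "").split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = {source.lower() for source in parts[1:]}
            return not (sources & {"*", "http:", "https:"})
    return False


denied = blocks_iframing({"X-Frame-Options": "DENY"})
```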

When a window's URL is blocked, DemoDSL automatically records a short headless clip of the page in an isolated Playwright instance and substitutes a muted, looping <video> overlay for the iframe. If recording fails (or the helper is disabled), the window falls back to its background_color / screenshot. This step happens before the main browser is launched to avoid running two Chromium processes concurrently on memory-constrained hosts.

Recordings are cached on disk (keyed by URL + dimensions), so repeat runs are fast.

CLI Reference

demodsl run

Parse and execute a DemoDSL config file.

demodsl run <config> [OPTIONS]

Arguments:
  config    Path to the YAML or JSON config file.

Options:
  -o, --output-dir PATH  Output directory (default: output/)
  --dry-run              Validate and log all steps without executing
  --skip-voice           Skip TTS generation (development mode)
  --turbo                Fast preview: minimal waits, skip heavy post-processing
  -v, --verbose          Enable debug logging

💡Use --dry-run to validate your config and preview all steps without launching a browser or calling TTS APIs.

Turbo mode

Add --turbo to generate a fast preview. All browser waits are clamped to 50 ms, and heavy post-processing passes are skipped: avatar compositing, 3D device rendering, subtitle burning, post-effects (freeze frames, speed ramps), global speed re-encode, and watermark overlay.

turbo preview

demodsl run demo.yaml --turbo

The YAML config itself does not change — turbo is purely a runtime flag. Define avatars, subtitles, and effects as usual, then iterate quickly with --turbo and remove it for the final high-quality render.

Category	Turbo behavior
Skipped	Avatars, 3D rendering, subtitles, post-effects, speed re-encode, watermark
Kept	Browser recording, narration (TTS), browser effects, basic video editing
Waits	All time.sleep() pauses clamped to 50 ms

demodsl validate

Validate a config file without executing any actions.

demodsl validate <config> [OPTIONS]

Arguments:
  config    Path to the YAML or JSON config file.

Options:
  -v, --verbose    Enable debug logging

Outputs a summary: title, version, number of scenarios, total steps, and pipeline stage count. Exits with code 1 on validation failure.

demodsl init

Generate a minimal config template.

demodsl init [OPTIONS]

Options:
  -o, --output PATH   Output file (default: demo.yaml)
                      Use .json extension for JSON output.

Examples:
  demodsl init                    # Creates demo.yaml
  demodsl init -o my-demo.yaml   # Custom filename
  demodsl init -o demo.json      # JSON format

Edge Cases & Gotchas

Minimal Config

The smallest valid config requires only metadata.title. Everything else has defaults or is optional:

Minimal valid config

metadata:
  title: "Empty Demo"

This will validate successfully but produce no output (no scenarios, no pipeline).

YAML vs JSON Detection

File format is detected by file extension only: .json → JSON parser, anything else → YAML parser. If you name a JSON file config.yaml, it will fail to parse.
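
The extension rule amounts to a one-line decision, sketched below. Treating the extension case-insensitively is an assumption, and detect_format is a hypothetical name.

```python
from pathlib import Path


def detect_format(path: str) -> str:
    """Return "json" for a .json extension and "yaml" for anything else."""
    return "json" if Path(path).suffix.lower() == ".json" else "yaml"


# File contents are never inspected, so JSON saved as config.yaml goes to the YAML parser.
chosen = detect_format("config.yaml")
```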

Voice Provider Fallback

If the configured TTS engine's API key is missing, DemoDSL falls back to DummyVoiceProvider which generates silent audio clips. The dummy calculates duration from word count at ~150 words per minute. This is intentional behavior for local development.
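
To estimate how long a silent placeholder clip will be, the 150 words-per-minute rule can be applied directly. This sketch assumes words are split on whitespace; dummy_clip_duration is a hypothetical helper, not the DummyVoiceProvider code.

```python
def dummy_clip_duration(narration: str, words_per_minute: float = 150.0) -> float:
    """Approximate clip length in seconds for a narration string."""
    words = len(narration.split())
    return words * 60.0 / words_per_minute


# Six words at 150 wpm.
seconds = dummy_clip_duration("Scroll to the code example section.")
```

This is useful for predicting step timing during local development, since real TTS clips will differ in length from the estimate.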

Pipeline Stage Format

Each pipeline entry must be a single-key dictionary. Multiple keys in one entry will raise a validation error:

❌ Invalid — multiple keys

pipeline:
  - restore_audio: { denoise: true }
    restore_video: { sharpen: true }  # Error!

✅ Valid — one key per entry

pipeline:
  - restore_audio: { denoise: true }
  - restore_video: { sharpen: true }
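
A pre-check for the single-key rule is easy to write if you generate configs programmatically. The sketch below operates on an already parsed pipeline list; validate_pipeline and the choice of ValueError are assumptions, since the exact exception DemoDSL raises is not specified here.

```python
def validate_pipeline(pipeline: list) -> list:
    """Return the stage names in order, or raise ValueError on a malformed entry."""
    names = []
    for index, entry in enumerate(pipeline):
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(
                f"pipeline[{index}] must be a dict with exactly one key, got {entry!r}"
            )
        names.append(next(iter(entry)))
    return names


stage_names = validate_pipeline([
    {"restore_audio": {"denoise": True}},
    {"restore_video": {"sharpen": True}},
])
```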

Critical vs Optional Stages

If a critical stage fails (generate_narration, edit_video, mix_audio, optimize), the entire pipeline stops and raises an error. If an optional stage fails (restore_audio, restore_video, apply_effects, composite_avatar, burn_subtitles, render_device_mockup, fit_duration), it logs a warning and the pipeline continues.

Effect Execution Order

Browser effects are injected before the step action and execute sequentially. If an effect has a duration, the engine sleeps for that duration before the next effect. All effects complete before the browser action fires.
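
The ordering described above can be sketched as a simple loop. The callables inject_effect and perform_action stand in for whatever the engine does internally and are hypothetical; the sleep parameter is injectable so the sketch can be exercised without real delays.

```python
import time


def run_step(step: dict, inject_effect, perform_action, sleep=time.sleep) -> None:
    """Inject each effect in order, wait out its duration, then fire the action."""
    for effect in step.get("effects", []):
        inject_effect(effect)
        duration = effect.get("duration")
        if duration:
            sleep(duration)  # block until this effect has finished
    perform_action(step)  # the browser action runs only after all effects
```

One practical consequence: effect durations add up, so a step with three 2-second effects delays its action by about 6 seconds.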

Required Fields by Action

Action	Required fields	Default	Notes
navigate	url	—	Raises ValueError if url is missing.
click	locator	—	Raises ValueError if locator is missing.
type	locator + value	—	Raises ValueError if either is missing.
scroll	(none)	—	Defaults: direction="down", pixels=300.
wait_for	locator	—	Raises ValueError if locator is missing. Default timeout: 5s.
screenshot	(none)	—	Default filename: "screenshot.png".
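
The table above can be mirrored in a small validator, which is handy when building steps in code. This is a sketch, not DemoDSL's validator: normalize_step is a hypothetical helper, and the field names timeout and filename are assumed from the defaults described in the table.

```python
REQUIRED_FIELDS = {
    "navigate": ("url",),
    "click": ("locator",),
    "type": ("locator", "value"),
    "scroll": (),
    "wait_for": ("locator",),
    "screenshot": (),
}
DEFAULTS = {
    "scroll": {"direction": "down", "pixels": 300},
    "wait_for": {"timeout": 5},                    # assumed field name, seconds
    "screenshot": {"filename": "screenshot.png"},  # assumed field name
}


def normalize_step(step: dict) -> dict:
    """Check required fields for the action and fill in the documented defaults."""
    action = step.get("action")
    if action not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {action!r}")
    missing = [name for name in REQUIRED_FIELDS[action] if step.get(name) in (None, "")]
    if missing:
        raise ValueError(f"{action} step is missing: {', '.join(missing)}")
    return {**DEFAULTS.get(action, {}), **step}


scroll_step = normalize_step({"action": "scroll"})
```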

Viewport and Recording

The browser records video at the viewport resolution. For high-quality social media exports, set the scenario viewport to the maximum resolution you need — downscaling social presets is better than upscaling.

Ducking Without Music

If audio.background_music is not set or the file doesn't exist, the mix_audio stage skips music mixing entirely. Narration-only audio still works.

Empty Pipeline

If pipeline is an empty list or omitted, no post-processing runs. Raw browser recordings are copied directly to the output directory.

Multiple Scenarios

Multiple scenarios are executed sequentially. Each gets its own browser instance. Currently, the pipeline processes only the first scenario's video. Multi-scenario concatenation is handled by the edit_video stage.

Dry Run Behavior

With --dry-run, the engine validates the config, logs every step and effect with [DRY-RUN] prefix, but does not:

Launch a browser
Call any TTS API
Execute pipeline stages
Produce any output files

Environment Variables

Variable	Type	Default	Description
ELEVENLABS_API_KEY	string	—	ElevenLabs TTS API key.
OPENAI_API_KEY	string	—	OpenAI API key. Required for openai engine.
GOOGLE_APPLICATION_CREDENTIALS	string	—	Path to Google Cloud service account JSON file.
AZURE_SPEECH_KEY	string	—	Azure Cognitive Services Speech subscription key.
AZURE_SPEECH_REGION	string	"eastus"	Azure region (e.g. eastus, westeurope).
AWS_ACCESS_KEY_ID	string	—	AWS access key for Polly.
AWS_SECRET_ACCESS_KEY	string	—	AWS secret key for Polly.
AWS_DEFAULT_REGION	string	"us-east-1"	AWS region for Polly.
COSYVOICE_API_URL	string	"http://localhost:50000"	CosyVoice API server URL.
COQUI_MODEL	string	"xtts_v2"	Coqui TTS model name.
COQUI_LANGUAGE	string	"en"	Language code for Coqui TTS.
PIPER_BIN	string	"piper"	Path to piper binary.
PIPER_MODEL	string	—	Required. Path to Piper .onnx voice model.
LOCAL_TTS_URL	string	"http://localhost:8000"	Base URL for OpenAI-compatible local TTS server.
LOCAL_TTS_API_KEY	string	"not-needed"	API key for local server (if required).
LOCAL_TTS_MODEL	string	"tts-1"	Model name to pass to local server.
ESPEAK_BIN	string	"espeak-ng"	Path to eSpeak-NG binary.
CUSTOM_TTS_URL	string	—	Required. Full URL of your custom TTS HTTP endpoint.
CUSTOM_TTS_API_KEY	string	—	Bearer token for custom TTS (optional).
CUSTOM_TTS_RESPONSE_FORMAT	string	"mp3"	Audio format returned by the endpoint: "mp3" or "wav".
D_ID_API_KEY	string	—	D-ID API key for talking-head avatar generation.
HEYGEN_API_KEY	string	—	HeyGen API key for avatar video generation.

If the required environment variable for the selected engine is not set, DemoDSL automatically falls back to DummyVoiceProvider which generates silent audio clips. This allows development without API credentials.
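
Before a final render it can be worth checking that the variables for your engine are actually set, since the fallback is silent by design. The sketch below is an assumption-heavy illustration: the mapping covers only a few engines, the engine names other than "openai" are guesses, and missing_env and choose_provider are hypothetical helpers.

```python
import os

# Partial, illustrative mapping of engine name to required variables.
REQUIRED_ENV = {
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "piper": ["PIPER_MODEL"],
    "custom": ["CUSTOM_TTS_URL"],
}


def missing_env(engine: str, environ=None) -> list:
    """Names of required variables that are unset or empty for the engine."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(engine, []) if not environ.get(name)]


def choose_provider(engine: str, environ=None) -> str:
    """Return the engine name, or "dummy" when a required variable is missing."""
    return "dummy" if missing_env(engine, environ) else engine


provider = choose_provider("openai", environ={})
```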

DemoDSL v2.7.0 — MIT License — GitHub
