Documentation

Complete configuration reference for DemoDSL v2.7.0

Overview

DemoDSL is a DSL-driven automated product demo video generator. You describe your demo in a single YAML or JSON configuration file covering browser automation, voice narration, visual effects, video editing, audio mixing, and multi-format export. DemoDSL then orchestrates the full pipeline to produce a polished video.

A configuration file has 10 top-level sections. Only metadata is required β€” every other section is optional and has sensible defaults.

Root structure
metadata:        # REQUIRED β€” title, description, author, version
voice:           # TTS engine configuration
audio:           # Background music, voice processing, effects
device_rendering: # 3D device mockup settings
video:           # Intro, outro, transitions, watermark
subtitle:        # Subtitle overlay styles and timing
scenarios:       # Browser automation steps
pipeline:        # Post-processing chain
output:          # Export filenames, formats, social presets
analytics:       # Engagement tracking

Config Format

DemoDSL accepts both YAML (.yaml / .yml) and JSON (.json) configuration files. The format is auto-detected from the file extension.

demo.yaml
metadata:
  title: "My Demo"
scenarios:
  - name: "Tour"
    url: "https://example.com"
    steps:
      - action: "navigate"
        url: "https://example.com"
demo.json
{
  "metadata": {
    "title": "My Demo"
  },
  "scenarios": [{
    "name": "Tour",
    "url": "https://example.com",
    "steps": [{
      "action": "navigate",
      "url": "https://example.com"
    }]
  }]
}
πŸ’‘Use demodsl init to generate a YAML template, or demodsl init -o demo.json for JSON.
Live ExampleYAML / JSON format switching
config.yaml
scenarios:
  - name: "Tab Switching"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "scroll"
        direction: "down"
        pixels: 1800
        narration: "Scroll to the code example section."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "JSON"
        narration: "Click JSON tab to see JSON format."
        wait: 2.5
      - action: "click"
        locator:
          type: "text"
          value: "YAML"
        narration: "Switch back to YAML."
        wait: 2.0

metadata

The only required top-level section. Provides descriptive information about the demo.

PropertyTypeDefaultDescription
titlestringβ€”Required. The demo title used in logs and output metadata.
descriptionstring | nullnullOptional description for documentation.
authorstring | nullnullAuthor name.
versionstring | nullnullVersion string (e.g. "2.0.0").
Minimal valid config
metadata:
  title: "My Demo"
ℹ️title is the only truly required field in the entire config. Every other section and property has defaults or is optional.

voice

Configures the Text-to-Speech engine used to generate narration audio from the narration field in steps.

PropertyTypeDefaultDescription
engine"elevenlabs" | "google" | "azure" | "aws_polly" | "openai" | "custom""elevenlabs"TTS provider to use.
voice_idstring"josh"Voice identifier. Provider-specific.
speedfloat1.0Playback speed multiplier (0.5 = half speed, 2.0 = double).
pitchint0Pitch adjustment in semitones.
reference_audiostringnullPath to a .wav/.mp3 sample of your voice for voice cloning. Supported by: elevenlabs, coqui, cosyvoice, custom.
Example
voice:
  engine: "elevenlabs"
  voice_id: "josh"
  speed: 1.0
  pitch: 0

Supported Engines

PropertyTypeDefaultDescription
elevenlabsβ€”β€”High-quality neural TTS. Requires ELEVENLABS_API_KEY.
openaiβ€”β€”OpenAI TTS (tts-1-hd). Voices: alloy, echo, fable, onyx, nova, shimmer. Requires OPENAI_API_KEY.
googleβ€”β€”Google Cloud TTS (Wavenet). Requires GOOGLE_APPLICATION_CREDENTIALS (service account JSON path).
azureβ€”β€”Azure Cognitive Services Speech (Neural). Requires AZURE_SPEECH_KEY + AZURE_SPEECH_REGION.
aws_pollyβ€”β€”Amazon Polly (Neural). Requires AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY.
cosyvoiceβ€”β€”CosyVoice (Alibaba/Qwen). Local server. COSYVOICE_API_URL (default localhost:50000).
coquiβ€”β€”Coqui XTTS v2. Local inference via TTS library. COQUI_MODEL to override model.
piperβ€”β€”Piper TTS. Fast offline TTS via CLI. Requires PIPER_MODEL (path to .onnx).
local_openaiβ€”β€”Any OpenAI-compatible local server (vLLM, LocalAI, AllTalk…). LOCAL_TTS_URL.
espeakβ€”β€”eSpeak-NG β€” robotic vintage voice. Zero-dependency debug TTS. ESPEAK_BIN to override binary.
gttsβ€”β€”Google Translate TTS (gTTS) β€” free, no API key. pip install gtts.

Voice IDs by Engine

Each engine uses its own voice naming convention. Set voice_id to a valid identifier for your chosen engine:

PropertyTypeDefaultDescription
elevenlabsvoice_id"josh"ElevenLabs voice ID. Find IDs at elevenlabs.io/voices.
openaivoice_id"alloy"One of: alloy, echo, fable, onyx, nova, shimmer.
googlevoice_id"en-US-Wavenet-D"Full voice name (e.g. "en-US-Wavenet-D", "fr-FR-Wavenet-A").
azurevoice_id"en-US-JennyNeural"Full voice name. Must contain "Neural" for neural voices.
aws_pollyvoice_id"Matthew"Polly voice name (capitalized). E.g. "Joanna", "Matthew", "LΓ©a".
cosyvoicevoice_id"δΈ­ζ–‡ε₯³"Speaker name supported by your CosyVoice model.
coquivoice_id"speaker.wav"Path to a reference .wav for voice cloning, or a built-in speaker name.
pipervoice_id"en_US-lessac-medium.onnx".onnx model path, or same as PIPER_MODEL.
local_openaivoice_id"alloy"Voice name supported by your local server.
espeakvoice_id"en"eSpeak voice/language code. E.g. "en", "fr", "de", "en+whisper".
gttsvoice_id"en"Language code (ISO 639-1). E.g. "en", "fr", "es", "ja".
customvoice_id"default"Any string. Passed as-is in the JSON body to your endpoint.
⚠️If no API key is found for the selected engine, DemoDSL automatically falls back to a DummyVoiceProvider that generates silent audio clips sized to match the narration text (~150 words per minute). This is useful for development and dry-runs.
Live ExamplegTTS voice narration synced to actions
config.yaml
voice:
  engine: "gtts"
  voice_id: "en"
  speed: 1.0

scenarios:
  - name: "Narrated Tour"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: >
          Welcome to DemoDSL. Every step can include
          a narration field converted to speech.
        wait: 3.0
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: >
          DemoDSL supports twelve voice engines,
          from ElevenLabs to local Piper and eSpeak.
        wait: 3.0
Custom TTS endpoint
voice:
  engine: "custom"
  voice_id: "my-voice"
  speed: 1.0

# Environment variables:
#   CUSTOM_TTS_URL=https://my-tts-server.com/synthesize
#   CUSTOM_TTS_API_KEY=sk-...          (optional)
#   CUSTOM_TTS_RESPONSE_FORMAT=mp3     (mp3 or wav)
ℹ️The custom engine POSTs a JSON body {text, voice_id, speed, pitch} to your endpoint and expects raw audio bytes in the response. This lets you integrate any TTS service with a simple HTTP wrapper.

Voice Cloning (reference_audio)

Set reference_audio to a path to your own voice recording (.wav or .mp3) and DemoDSL will clone your voice on engines that support it. This way, the narration uses your voice instead of a stock voice.

PropertyTypeDefaultDescription
elevenlabsβœ“Instant Voice CloningUploads your sample via the Add Voice API. The cloned voice is cached for the session.
coquiβœ“XTTS v2 speaker_wavPasses reference audio directly to tts_to_file(speaker_wav=...). Zero-shot cloning.
cosyvoiceβœ“Zero-shot modeSends base64-encoded reference audio with mode="zero_shot" in the API payload.
customβœ“Forwarded in JSONAdds a base64-encoded reference_audio field to the JSON payload for your endpoint.
openaiβœ—Not supportedOpenAI TTS does not support voice cloning.
googleβœ—Not supportedGoogle Cloud TTS does not support voice cloning.
azureβœ—Not supportedAzure TTS does not support voice cloning.
aws_pollyβœ—Not supportedAmazon Polly does not support voice cloning.
piperβœ—Not supportedPiper uses pre-trained .onnx models.
espeakβœ—Not supportedeSpeak is a formant synthesizer.
gttsβœ—Not supportedgTTS uses Google Translate voices.
Voice cloning with Coqui XTTS
voice:
  engine: "coqui"
  voice_id: "default"
  reference_audio: "samples/my_voice.wav"
  speed: 1.0
Voice cloning with ElevenLabs
voice:
  engine: "elevenlabs"
  voice_id: "josh"          # fallback if cloning fails
  reference_audio: "samples/my_voice.wav"
  speed: 1.0
ℹ️When reference_audio is set on an unsupported engine, a warning is logged and the field is ignored. The narration still generates using the standard voice_id.

audio

Controls background music, voice processing, and audio effects applied during the mix_audio pipeline stage.

audio.background_music

PropertyTypeDefaultDescription
filestringβ€”Required. Path to the audio file (MP3, WAV, OGG).
volumefloat0.3Base volume (0.0–1.0). Converted to dB internally.
ducking_mode"none" | "light" | "moderate" | "heavy""moderate"Volume reduction during narration.
loopbooltrueLoop the music to cover the entire video duration.

Ducking modes control how much the background music volume drops when narration is playing:

PropertyTypeDefaultDescription
noneβ€”0 dBNo ducking β€” music stays at full volume.
lightβ€”βˆ’6 dBSubtle reduction. Music still audible.
moderateβ€”βˆ’12 dBBalanced. Default for most demos.
heavyβ€”βˆ’20 dBNear-silent music during speech.

audio.voice_processing

PropertyTypeDefaultDescription
normalizebooltrueNormalize audio loudness.
target_dbfsint-20Target loudness in dBFS (decibels relative to full scale).
remove_silencebooltrueStrip leading/trailing silence from clips.
silence_thresholdint-40dBFS below which audio is considered silence.
enhance_clarityboolfalseApply EQ boost to voice presence frequencies.
enhance_warmthboolfalseApply low-end EQ warmth to voice.
noise_reductionboolfalseRemove background noise from recordings.

audio.effects

PropertyTypeDefaultDescription
eq_presetstring | nullnullEQ preset name (e.g. "podcast", "broadcast").
reverb_presetstring | nullnullReverb preset (e.g. "small_room", "hall").
compressionCompression | nullnullDynamic range compression settings.

audio.effects.compression

PropertyTypeDefaultDescription
thresholdint-20Compression threshold in dB.
ratiofloat3.0Compression ratio (e.g. 3.0 = 3:1).
attackint5Attack time in milliseconds.
releaseint50Release time in milliseconds.
Full audio example
audio:
  background_music:
    file: "audio/bg.mp3"
    volume: 0.3
    ducking_mode: "moderate"
    loop: true
  voice_processing:
    normalize: true
    target_dbfs: -20
    noise_reduction: true
  effects:
    eq_preset: "podcast"
    reverb_preset: "small_room"
    compression:
      threshold: -20
      ratio: 3.0
      attack: 5
      release: 50

device_rendering Beta

Wraps the captured browser video inside a 3D device mockup frame, processed during the render_device_mockup pipeline stage.

PropertyTypeDefaultDescription
devicestring"iphone_15_pro"Device model name.
orientation"portrait" | "landscape""portrait"Screen orientation.
quality"low" | "medium" | "high""high"Render quality level.
render_engine"eevee" | "cycles""eevee"Blender render engine. Eevee is faster, Cycles is more realistic.
camera_animationstring"orbit_smooth"Camera movement type around the device.
lightingstring"studio"Lighting preset.
Example
device_rendering:
  device: "iphone_15_pro"
  orientation: "portrait"
  quality: "high"
  render_engine: "eevee"
  camera_animation: "orbit_smooth"
  lighting: "studio"
ℹ️The render_device_mockup pipeline stage is optional. If it fails (e.g. Blender not installed), the pipeline continues with the raw video.

video

Controls video editing: intro/outro sequences, transitions between steps, watermark overlay, and output optimization. Processed during the edit_video pipeline stage.

video.intro

PropertyTypeDefaultDescription
durationfloat3.0Intro duration in seconds.
typestring"fade_in"Animation type for the intro.
textstring | nullnullMain title text overlay.
subtitlestring | nullnullSubtitle text below the title.
font_sizeint60Font size in pixels.
font_colorstring"#FFFFFF"Font color (hex).
background_colorstring"#1a1a1a"Background color (hex).

video.transitions

PropertyTypeDefaultDescription
type"crossfade" | "slide" | "zoom" | "dissolve""crossfade"Transition style between steps.
durationfloat0.5Transition duration in seconds.

video.watermark

PropertyTypeDefaultDescription
imagestringβ€”Required. Path to the watermark image (PNG recommended).
position"top_left" | "top_right" | "bottom_left" | "bottom_right" | "center""bottom_right"Watermark position on the video.
opacityfloat0.7Watermark opacity (0.0–1.0).
sizeint100Watermark size in pixels (longest side).

video.outro

PropertyTypeDefaultDescription
durationfloat4.0Outro duration in seconds.
typestring"fade_out"Animation type for the outro.
textstring | nullnullMain text overlay.
subtitlestring | nullnullSubtitle text.
ctastring | nullnullCall-to-action text (e.g. "Get Started").

video.optimization

PropertyTypeDefaultDescription
target_size_mbint | nullnullTarget file size. Bitrate is auto-calculated.
web_optimizedbooltrueMove moov atom for fast web streaming start.
compression_level"low" | "balanced" | "high""balanced"Encoding compression preset.
Full video example
video:
  intro:
    duration: 3.0
    type: "fade_in"
    text: "Product Name"
    subtitle: "v2.0"
    font_size: 60
    font_color: "#FFFFFF"
    background_color: "#1a1a1a"
  transitions:
    type: "crossfade"
    duration: 0.5
  watermark:
    image: "logo.png"
    position: "bottom_right"
    opacity: 0.7
    size: 100
  outro:
    duration: 4.0
    type: "fade_out"
    text: "Try it today!"
    cta: "Get Started"
  optimization:
    target_size_mb: 50
    web_optimized: true
    compression_level: "balanced"

Recording Quality

DemoDSL uses two recording backends depending on the browser. When browser: "chrome" is set, a high-quality CDP screenshot pipeline captures frames via a direct DevTools Protocol connection β€” completely bypassing Playwright's low-bitrate VP8 screencast. For WebKit and Firefox, an spp + hqdn3d deblocking filter is applied during export to smooth VP8 artefacts.

Native VP8 (webkit)~330 KB
CDP H.264 (chrome)~70 KB
Native (VP8)CDP (H.264)
Recording methodVP8 screencastCDP screenshots
VP8 artefactsYes (deblocked)None
File size~330 KB~70 KB
Total time~13 s~13 s
Supported browsersAllChromium only
πŸ’‘Set browser: "chrome" in your scenario to automatically use CDP recording β€” no extra config needed. WebKit and Firefox fall back to native VP8 with post-processing deblocking.

subtitle

Burns styled subtitles into the video, synced word-by-word to narration timing. Subtitles are generated as ASS files and composited via ffmpeg. Can be set at the top level (applies to all scenarios) or per-scenario.

PropertyTypeDefaultDescription
enabledbooltrueEnable subtitle overlay.
style"classic" | "tiktok" | "color" | "word_by_word" | "typewriter" | "karaoke" | "bounce" | "cinema" | "highlight_line" | "fade_word" | "emoji_react""classic"Subtitle display style (see table below).
speed"slow" | "normal" | "fast" | "tiktok""normal"Display speed preset β€” controls words per second.
font_sizeint48Font size in pixels.
font_familystring"Arial"Font family name.
font_colorstring"#FFFFFF"Primary text color (hex).
background_colorstring"rgba(0,0,0,0.6)"Background fill behind text (hex or rgba).
position"bottom" | "center" | "top""bottom"Vertical position on screen.
highlight_colorstring"#FFD700"Accent color for highlighted words.
max_words_per_lineint8Maximum words per subtitle line.
animation"none" | "fade" | "pop" | "slide""none"Text entrance animation.

Subtitle Styles

Each style preset configures defaults for font size, position, colors, and animation. User values always override the preset.

PropertyTypeDefaultDescription
classic42px, bottom, white on dark boxβ€”Traditional subtitle bar at the bottom. Clean, readable.
tiktok64px, center, bold word-by-wordβ€”Large centered text, one highlighted word at a time. Social media style.
color48px, bottom, word highlightβ€”Full line visible, current word changes to accent color.
word_by_word56px, center, single wordβ€”One word at a time, centered. Maximum emphasis.
typewriter44px, bottom, green on blackβ€”Characters appear letter by letter. Terminal/hacker aesthetic.
karaoke52px, bottom, progressive fillβ€”Words fill with color progressively, karaoke-bar style.
bounce60px, center, scale animationβ€”Words pop in with a bounce scale effect (120% β†’ 100%).
cinema38px, bottom, italic serifβ€”Elegant italic serif font with shadow. Film subtitle look.
highlight_line46px, bottom, dim/brightβ€”Current line is bright white, rest stays dimmed gray.
fade_word50px, center, fade-inβ€”Each word fades in with a smooth alpha transition.
emoji_react52px, bottom, emoji prefixβ€”Auto-picks a contextual emoji based on narration keywords.

Style Demos

Each video below shows a subtitle style in action on short sample narration text.

Live Exampleclassic β€” traditional bottom bar
config.yaml
subtitle:
  style: "classic"
  speed: "normal"
  font_size: 42
  position: "bottom"
Live Exampletiktok β€” bold centered word-by-word
config.yaml
subtitle:
  style: "tiktok"
  speed: "fast"
  font_size: 64
  position: "center"
  highlight_color: "#FFD700"
Live Examplecolor β€” current word highlight
config.yaml
subtitle:
  style: "color"
  speed: "normal"
  highlight_color: "#00FF88"
Live Exampleword_by_word β€” one word at a time
config.yaml
subtitle:
  style: "word_by_word"
  speed: "normal"
  font_size: 56
  position: "center"
Live Exampletypewriter β€” letter-by-letter reveal
config.yaml
subtitle:
  style: "typewriter"
  font_color: "#00FF00"
  background_color: "rgba(0,0,0,0.8)"
Live Examplekaraoke β€” progressive color fill
config.yaml
subtitle:
  style: "karaoke"
  highlight_color: "#FF4444"
  position: "bottom"
Live Examplebounce β€” scale-pop animation
config.yaml
subtitle:
  style: "bounce"
  font_size: 60
  position: "center"
Live Examplecinema β€” italic serif with shadow
config.yaml
subtitle:
  style: "cinema"
  font_family: "Georgia"
  font_size: 38
Live Examplehighlight_line β€” dim/bright current line
config.yaml
subtitle:
  style: "highlight_line"
  highlight_color: "#FFFFFF"
  font_color: "#888888"
Live Examplefade_word β€” smooth alpha fade-in
config.yaml
subtitle:
  style: "fade_word"
  font_size: 50
  position: "center"
Live Exampleemoji_react β€” contextual emoji prefix
config.yaml
subtitle:
  style: "emoji_react"
  font_size: 52
  highlight_color: "#FFD700"

Speed Presets

PropertyTypeDefaultDescription
slow1.5 wpsβ€”Slow pace β€” good for technical content or tutorials.
normal2.5 wpsβ€”Standard reading pace.
fast4.0 wpsβ€”Fast pace for experienced viewers.
tiktok6.0 wpsβ€”Very fast β€” matches TikTok/Reels pacing.
Top-level subtitle (all scenarios)
subtitle:
  enabled: true
  style: "tiktok"
  speed: "fast"
  font_size: 64
  highlight_color: "#FFD700"
  position: "center"

scenarios:
  - name: "Demo"
    url: "https://myapp.com"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "This text becomes a subtitle!"

pipeline:
  - generate_narration: {}
  - burn_subtitles: {}
  - edit_video: {}
Per-scenario subtitle override
scenarios:
  - name: "Intro"
    url: "https://myapp.com"
    subtitle:
      enabled: true
      style: "cinema"
      speed: "slow"
      font_family: "Georgia"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "An elegant introduction."
  - name: "Features"
    subtitle:
      style: "bounce"
      speed: "fast"
    steps:
      - action: "scroll"
        direction: "down"
        pixels: 500
        narration: "Fast-paced feature showcase!"
πŸ’‘Add burn_subtitles: {} to your pipeline to enable subtitle rendering. Subtitles are generated from the narration field of each step β€” no separate subtitle file needed.
ℹ️The emoji_react style automatically picks emojis based on narration keywords: πŸ‘† for "click", πŸ“œ for "scroll", ⚑ for "fast", 🎬 for "video", and more. A πŸ’¬ default is used when no keyword matches.

languages

Generate multi-track audio narration and multi-language subtitles in a single render. The same scenario is recorded once, then narration is synthesised in every requested language and either embedded as additional tracks in the final MP4 or written as sidecar files (narration_{lang}.mp3, subtitles_{lang}.ass).

overview

PropertyTypeDefaultDescription
defaultstring"en"Source language used by step narration: fields. BCP-47 (e.g. en, fr, en-US).
targetslist[string][]Additional languages to render.
voicesdict[str, VoiceConfig]{}Optional per-language voice override (engine, voice_id, etc.).
embedbooltrueWhen true, mux all audio + subtitle tracks into a single MP4. When false, write sidecar files next to the output.
burn_defaultboolfalseWhen true, also burn the default-language subtitles into the picture (useful for social clips).
audio_onlyboolfalseOnly generate per-language audio tracks (no subtitle tracks).
subtitle_onlyboolfalseOnly generate per-language subtitle tracks (no audio tracks).
demo_multilang.yaml
metadata:
  title: Multilang demo

voice:
  engine: gtts
  voice_id: fr

languages:
  default: fr
  targets: [en, de]
  embed: true
  voices:
    en: { engine: gtts, voice_id: en }
    de: { engine: gtts, voice_id: de }

scenarios:
  - name: tour
    url: https://example.com
    steps:
      - narration: Bienvenue sur notre site.
        narrations:
          en: Welcome to our website.
          de: Willkommen auf unserer Website.
        action: scroll
        amount: 400

per-step translations

Each step gains an optional narrations mapping. Keys are BCP-47 language codes; values are the translated narration text. When a translation is missing, the engine falls back to the basenarration field so a partial translation never blocks the render.

steps:
  - narration: Cliquez ici pour commencer.
    narrations:
      en: Click here to get started.
      de: Klicken Sie hier, um zu beginnen.
    action: click
    target: "#start"

per-language voices

The languages.voices map lets each language use its own TTS engine, voice id, speed, etc. Any field omitted in the override inherits from the top-level voice block.

voice:
  engine: elevenlabs
  voice_id: french_voice_id

languages:
  default: fr
  targets: [en, ja]
  voices:
    en:
      engine: elevenlabs
      voice_id: english_voice_id
    ja:
      engine: openai
      voice_id: nova

embedded vs sidecar

With embed: true (default), the final MP4 contains one AAC audio track per language (with proper language= metadata, default-disposition on the source language) and one mov_text subtitle track per language. Players such as VLC, QuickTime, YouTube and Vimeo expose them as selectable tracks.

With embed: false, the engine still produces a single MP4 (default-language audio burnt-in) plus sidecar files:narration_en.mp3, subtitles_en.ass, and so on for each target language. Useful when uploading to platforms that require external caption files.

ℹ️When languages is active, the regular subtitle burn-in is skipped to keep the picture clean β€” set languages.burn_default: true to re-enable burning of the default-language subtitles.

CLI usage

# Standard render β€” picks up languages: from the YAML
demodsl run demo_multilang.yaml

# Inspect the planned tracks without rendering
demodsl run demo_multilang.yaml --dry-run
⚠️ffmpeg is required for multi-track muxing. If muxing fails, the engine gracefully falls back to a single-track export.

scenarios

A list of browser automation scenarios. Each scenario captures a recording from a web application. Multiple scenarios are concatenated in the final video.

Live ExampleTwo scenarios in one config
config.yaml
scenarios:
  - name: "Landing Page Overview"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Scenario one: the landing page."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 800
        narration: "Scroll through features."
        wait: 2.0

  - name: "Docs Deep Dive"
    url: "https://fran-cois.github.io/demodsl/docs"
    browser: "webkit"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/docs"
        narration: "Scenario two: the docs page."
        wait: 2.0
PropertyTypeDefaultDescription
namestringβ€”Required. Human-readable scenario name.
urlstringβ€”Required. Base URL for the scenario.
browser"chrome" | "firefox" | "webkit""chrome"Browser engine (Playwright).
viewportViewport1920Γ—1080Browser viewport dimensions.
cursorCursorConfignullVisible cursor overlay mode. Shows mouse movement and click effects.
glow_selectGlowSelectConfignullApple Intelligence-style animated glow highlight around clicked elements.
popup_cardPopupCardConfignullPopup card overlay synced with narration. Shows text and progressive item reveals.
avatarAvatarConfignullAnimated avatar overlay synced with narration audio. Free (animated) or paid (D-ID, HeyGen) providers.
subtitleSubtitleConfignullSubtitle overlay config (per-scenario override). Overrides top-level subtitle settings.
stepsStep[][]List of automation steps.

scenarios[].viewport

PropertyTypeDefaultDescription
widthint1920Viewport width in pixels.
heightint1080Viewport height in pixels.
πŸ’‘Common viewport sizes: 1920Γ—1080 (Full HD), 1280Γ—720 (HD), 390Γ—844 (iPhone 14), 1024Γ—768 (tablet).
Example
scenarios:
  - name: "Main Demo"
    url: "https://myapp.com"
    browser: "chrome"
    viewport:
      width: 1920
      height: 1080
    steps:
      - action: "navigate"
        url: "https://myapp.com"

scenarios[].cursor

Injects a visible fake cursor overlay captured in the recorded video. The cursor animates towards each target element before click/type actions and plays a visual effect on click.

PropertyTypeDefaultDescription
visiblebooltrueWhether the cursor is shown.
style"dot" | "pointer""dot"Cursor shape. Dot = circle, pointer = arrow SVG.
colorstring"#ef4444"Cursor color (hex).
sizeint20Cursor size in pixels.
click_effect"ripple" | "pulse" | "none""ripple"Visual effect on click.
smoothfloat0.4Animation duration in seconds (ease-out).
Live ExampleCursor overlay β€” visible mouse movement + click ripple
config.yaml
scenarios:
  - name: "Cursor Showcase"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    cursor:
      visible: true
      style: "dot"
      color: "#ef4444"
      size: 20
      click_effect: "ripple"
      smooth: 0.4
    steps:
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Cursor moves to the button and clicks."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Documentation"
        narration: "Smooth animation to each target."
        wait: 2.0

scenarios[].glow_select

Apple Intelligence-style animated gradient glow that highlights elements before click and type actions. The glow pulses with a rotating hue and fades out after the action.

PropertyTypeDefaultDescription
enabledbooltrueWhether glow-select is active.
colorsstring[]["#a855f7","#6366f1","#ec4899","#a855f7"]Gradient color stops for the glow border.
durationfloat0.8Hue rotation cycle duration in seconds.
paddingint8Extra padding around the element bounding box.
border_radiusint12Border radius of the glow overlay.
intensityfloat0.9Glow opacity (0–1).
πŸ’‘Combine cursor and glow_select for a polished demo experience. The cursor animates into the glowing element, then clicks.
Live ExampleGlow select β€” Apple Intelligence-style highlight on click
config.yaml
scenarios:
  - name: "Glow Select Showcase"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    cursor:
      style: "dot"
      color: "#a855f7"
    glow_select:
      enabled: true
      colors: ["#a855f7","#6366f1","#ec4899","#a855f7"]
      duration: 0.8
      padding: 8
      border_radius: 12
    steps:
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Glow appears around the button."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Documentation"
        narration: "Each element gets the glow treatment."
        wait: 2.0

scenarios[].popup_card

The popup_card mode injects styled overlay cards that appear synced with narration. When a step has a card field with a list of items, they are revealed progressively β€” each bullet appears one by one, timed to match the narrator.

PropertyTypeDefaultDescription
enabledbooleantrueEnable the popup card overlay.
position"bottom-right" | "bottom-left" | "top-right" | "top-left" | "bottom-center" | "top-center""bottom-right"Card position on screen.
theme"glass" | "dark" | "light" | "gradient""glass"Visual theme for the card.
max_widthnumber420Maximum card width in pixels.
animation"slide" | "fade" | "scale""slide"Entrance/exit animation style.
accent_colorstring"#818cf8"Accent color for bullets and progress bar.
show_iconbooleantrueShow emoji icon in the card header.
show_progressbooleantrueShow a progress bar synced with narration duration.

Each step can include a card object with:

PropertyTypeDefaultDescription
card.titlestringnullCard title text.
card.bodystringnullCard body/description text.
card.itemsstring[]nullBullet-point list. Revealed progressively when narration is present.
card.iconstringnullEmoji or short text shown in the header (e.g. "πŸš€").
Live ExamplePopup cards β€” synced text overlays with progressive item reveal
config.yaml
scenarios:
  - name: "Card Overlay Tour"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    popup_card:
      enabled: true
      position: "bottom-right"
      theme: "glass"
      animation: "slide"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Welcome to DemoDSL."
        card:
          title: "DemoDSL"
          body: "A DSL-driven automated demo generator."
          icon: "🎬"
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Six integrated phases."
        card:
          title: "Six Phases"
          icon: "⚑"
          items:
            - "Browser Automation"
            - "Voice Narration"
            - "Visual Effects"
            - "Video Composition"
            - "Audio Mixing"
            - "Multi-format Export"

scenarios[].avatar

An animated avatar overlay that reacts to narration audio in real time. The avatar lip-syncs to TTS amplitude and is composited on top of the video at the chosen corner. Two provider types are available: animated (free, Pillow-generated) and API-based (D-ID, HeyGen, SadTalker β€” paid or self-hosted).

PropertyTypeDefaultDescription
enabledbooltrueWhether the avatar overlay is active.
provider"animated" | "d-id" | "heygen" | "sadtalker""animated"Avatar generation engine. Animated is free, others require an API key.
imagestring | nullnullPath, URL (http/https), or preset name ("default", "robot", "circle"). URLs are downloaded and cached locally.
position"bottom-right" | "bottom-left" | "top-right" | "top-left""bottom-right"Corner position of the avatar on the video.
sizeint120Avatar diameter in pixels.
style"bounce" | "waveform" | "pulse" | "equalizer" | "xp_bliss" | "clippy" | "visualizer""bounce"Animation style (animated provider only). See table below.
shape"circle" | "rounded" | "square""circle"Avatar outline shape.
backgroundstring"rgba(0,0,0,0.5)"Background fill behind the avatar (CSS color or rgba).
background_shape"square" | "circle" | "rounded""square"Shape of the avatar background. Use circle for a fully round overlay.
api_keystring | nullnullAPI key for paid providers. Supports env-var syntax: "${D_ID_API_KEY}".
show_subtitleboolfalseDisplay narration text below the avatar box during playback.
subtitle_font_sizeint18Font size for the avatar subtitle text.
subtitle_font_colorstring"#FFFFFF"Font color for the avatar subtitle.
subtitle_bg_colorstring"rgba(0,0,0,0.7)"Background color for the avatar subtitle box.

Animation Styles (free)

These styles are available with the animated provider. Each generates a different visual animation from the narration audio waveform.

PropertyTypeDefaultDescription
bounceβ€”β€”A circle that scales up and down with audio amplitude. Simple and clean.
waveformβ€”β€”Radial wave ring that expands from the center with audio pulses.
pulseβ€”β€”Glowing disc with a pulsing aura effect. Subtle and professional.
equalizerβ€”β€”Neon equalizer bars (Windows XP era). Retro audio visualizer look.
xp_blissβ€”β€”Windows XP Bliss-inspired hills, sun and floating music notes.
clippyβ€”β€”Animated paperclip with googly eyes. A nostalgic Microsoft Office mascot.
visualizerβ€”β€”Circular spectrum analyzer with rainbow gradient bars.
pacmanβ€”β€”Pac-Man chomping dots with a colorful ghost. Arcade nostalgia.
space_invaderβ€”β€”Pixel-art Space Invaders alien with shields and cannon. Retro arcade.
mario_blockβ€”β€”Bouncing Mario "?" block that pops coins on loud audio. Iconic gaming.
nyan_catβ€”β€”Pixel-art cat on a rainbow trail with scrolling stars. Internet classic.
matrixβ€”β€”Cascading green Matrix code rain with avatar in the center.
pickle_rickβ€”β€”Pickle Rick with rat limbs, expressive eyes, and yelling mouth. Wubba lubba dub dub!
chrome_dinoβ€”β€”Chrome's offline T-Rex dinosaur with desert, cacti, and 'No internet' message.
marvinβ€”β€”Marvin the Paranoid Android with sad eyes and depressive quotes. H2G2 classic.
mac128kβ€”β€”Macintosh 128K with expressive face on green screen. Retro computing icon.
floppy_diskβ€”β€”3.5" floppy disk with face, label, and '1.44 MB' nostalgia.
bsodβ€”β€”Blue Screen of Death with progressive error text and sad :( emoticon.
bugdroidβ€”β€”Android's green Bugdroid robot with waving arms and antennae.
qr_codeβ€”β€”QR code pattern with expressive eyes in the center. 'SCAN ME!'
gpu_sweatβ€”β€”Sweating GPU with spinning fan, temperature display, and sweat drops.
rubber_duckβ€”β€”Yellow rubber duck debugging companion with judgmental speech bubbles.
fail_whaleβ€”β€”Twitter's Fail Whale carried by birds. 'Twitter is over capacity.'
server_rackβ€”β€”Overheating server rack with red eyes, smoke, blinking LEDs, and temp bar.
cursor_handβ€”β€”Windows pointing hand cursor that bosses you around. 'Click here!'
vhs_tapeβ€”β€”VHS cassette with spinning reels, label, and scanlines. 'Be kind, rewind!'
cloudβ€”β€”Cute but capricious cloud with rain, lightning, and data ownership jokes.
wifi_lowβ€”β€”Wi-Fi icon with one bar that stutters and cuts off mid-senβ€”
nokia3310β€”β€”The indestructible Nokia 3310 with Snake and warrior quotes.
cookieβ€”β€”Browser cookie with creepy eyes that knows your browsing habits.
modem56kβ€”β€”56k modem with blinking LEDs, dial-up sounds, and green waveform.
esc_keyβ€”β€”Panicked Escape key trying to break free β€” sweat drops & frantic quotes.
sad_macβ€”β€”Classic dead Macintosh with X-eyed icon, error codes & hardware trauma.
usb_cableβ€”β€”Tangled USB-A cable frustrated by 3-try insertion. Always wrong side.
hourglassβ€”β€”Windows hourglass that speaks very slowly while sand trickles down.
firewireβ€”β€”Forgotten FireWire 400 cable living in a drawer, reminiscing glory days.
ai_hallucinatedβ€”β€”Glitching robot mixing facts with recipes β€” spiral eye & glitch lines.
tamagotchiβ€”β€”Abandoned pixel egg pet asking why you haven't fed it since 1998.
lasso_toolβ€”β€”Obsessive Photoshop selection tool with marching ants on checkerboard.
battery_lowβ€”β€”Battery at 1% β€” red, blinking, talks fast then cuts off abruptly.
incognitoβ€”β€”Chrome Incognito detective with fedora & glasses. Sees nothing.
rainbow_wheelβ€”β€”Mac spinning rainbow wheel β€” hypnotic, unstoppable, rage-inducing.
error_404β€”β€”Lost 404 page wandering around with question marks, literally unfindable.
google_blobβ€”β€”Google’s old melted blob emoji, nostalgic for its expressive past.
bitβ€”β€”Binary bit (0/1) with matrix rain β€” answers only Yes or No.
pc_fanβ€”β€”Spinning PC fan screaming at full RPM when you open 3 Chrome tabs.
captchaβ€”β€”Twisted, illegible CAPTCHA yelling PROVE YOU’RE HUMAN!
bluetoothβ€”β€”Bluetooth logo desperately searching, always failing to pair.
registry_keyβ€”β€”Windows Registry key β€” bureaucratic folder controlling everything.
high_pingβ€”β€”999ms ping avatar with buffering spinner, responds 10 sec late.
scratched_cdβ€”β€”Scratched CD-ROM with rainbow reflections, st-st-stuttering speech.
kermitβ€”β€”Kermit sipping tea β€” but that's none of my business.
this_is_fineβ€”β€”Dog sitting in flames saying 'This is fine.'
trollfaceβ€”β€”Classic Trollface with a mocking grin β€” Problem?
no_idea_dogβ€”β€”Golden retriever at a computer β€” I have no idea what I'm doing.
surprised_pikachuβ€”β€”Pikachu with open mouth β€” feigned surprise at the obvious.
distracted_bfβ€”β€”Distracted boyfriend looking at the shiny new framework.
success_kidβ€”β€”Kid with clenched fist celebrating small victories.
expanding_brainβ€”β€”Luminous expanding brain β€” transcended enlightenment.
dogeβ€”β€”Shiba Inu with floating 'such wow' 'much code' words.
wiki_globeβ€”β€”Wikipedia puzzle globe with glasses β€” [citation needed].

bounce

Scales up/down with audio

waveform

Radial wave ring

pulse

Glowing aura effect

equalizer

Neon retro bars

xp_bliss

Windows XP hills & notes

clippy

Animated paperclip mascot

visualizer

Circular spectrum analyzer

pacman

Arcade chomper & ghost

space_invader

Pixel-art alien arcade

mario_block

Bouncing "?" block with coins

nyan_cat

Rainbow trail pixel cat

matrix

Cascading green code rain

pickle_rick

Pickle Rick with rat limbs

chrome_dino

Chrome's offline T-Rex

marvin

Paranoid Android, depressive quotes

mac128k

Macintosh 128K retro green screen

floppy_disk

3.5" floppy with 1.44 MB nostalgia

bsod

Blue Screen of Death :(

bugdroid

Android's green robot

qr_code

QR code pattern β€” SCAN ME!

gpu_sweat

Sweating GPU with spinning fan

rubber_duck

Debugging companion duck

fail_whale

Twitter's over capacity whale

server_rack

Overheating server with smoke

cursor_hand

Bossy pointing hand cursor

vhs_tape

VHS cassette β€” Be kind, rewind!

cloud

Cute capricious cloud with rain

wifi_low

One bar Wi-Fi, cuts off mid-senβ€”

nokia3310

Indestructible Nokia with Snake

cookie

Creepy browser cookie that knows all

modem56k

56k modem β€” psshhh-kkkk-ding

esc_key

Panicked Esc key β€” LET ME OUT!

sad_mac

Dead Macintosh with X eyes

usb_cable

Tangled USB β€” wrong side, again

hourglass

Slow hourglass β€” Please… wait…

firewire

Forgotten cable in a drawer

ai_hallucinated

Glitching robot mixing facts

tamagotchi

Abandoned pet since 1998

lasso_tool

Obsessive selection tool

battery_low

1% battery β€” dying fast

incognito

Chrome detective sees nothing

rainbow_wheel

Mac spinning wheel of doom

error_404

Lost page, literally unfindable

google_blob

Old melted blob emoji, nostalgic

bit

Binary 0/1 β€” answers Yes or No

pc_fan

Screaming fan β€” MAX RPM!

captcha

PROVE YOU'RE HUMAN!

bluetooth

Desperately searching, pairing failed

registry_key

Bureaucratic folder, controls all

high_ping

999ms β€” responds 10 sec late

scratched_cd

Sk-sk-skip! Stuttering CD

kermit

None of my business… *sips tea*

this_is_fine

Dog in flames β€” everything is fine

trollface

Problem? U mad bro?

no_idea_dog

I have no idea what I'm doing

surprised_pikachu

Feigned surprise :O

distracted_bf

Looking at the new framework

success_kid

Fist pump! It compiled!

expanding_brain

Transcended the codebase

doge

Such code. Much wow. Very deploy.

wiki_globe

[citation needed]

Free animated avatar (equalizer)
scenarios:
  - name: "Demo with Avatar"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      style: "equalizer"
      position: "bottom-right"
      size: 100
      shape: "circle"
      background: "rgba(0,0,0,0.6)"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "The avatar reacts to this narration."
        wait: 2.0
Avatar with custom image from URL
scenarios:
  - name: "Demo with Custom Avatar"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      image: "https://avatars.githubusercontent.com/u/22380190?v=4"
      style: "bounce"
      position: "bottom-right"
      size: 120
      shape: "circle"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "My avatar uses an image loaded from a URL."
        wait: 2.0
Paid D-ID avatar (talking head)
scenarios:
  - name: "Demo with Talking Head"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "d-id"
      image: "presenter.jpg"
      position: "bottom-left"
      size: 200
      api_key: "${D_ID_API_KEY}"
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "A real talking-head avatar powered by D-ID."
        wait: 3.0
πŸ’‘Combine avatar with cursor and glow_select for a fully polished demo experience. Add composite_avatar to your pipeline to enable the overlay.
Avatar with inline subtitles
scenarios:
  - name: "Demo with Avatar Subtitles"
    url: "https://myapp.com"
    avatar:
      enabled: true
      provider: "animated"
      style: "clippy"
      position: "bottom-right"
      size: 100
      show_subtitle: true
      subtitle_font_size: 16
    steps:
      - action: "navigate"
        url: "https://myapp.com"
        narration: "Narration text appears right below the avatar."
        wait: 2.0
Live ExampleAvatar + subtitles β€” synced to narration
config.yaml
subtitle:
  enabled: true
  style: "cinema"
  speed: "normal"

scenarios:
  - name: "Site Tour"
    url: "https://fran-cois.github.io/demodsl/"
    avatar:
      enabled: true
      provider: "animated"
      style: "clippy"
      position: "bottom-right"
      size: 100
      shape: "circle"
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "The avatar pulses to each narration."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Subtitles appear in cinema style."
        wait: 2.0

pipeline:
  - composite_avatar: {}
  - burn_subtitles: {}
  - edit_video: {}
  - mix_audio: {}
  - optimize: {}

steps

Steps define individual browser actions within a scenario. Each step has an action type and action-specific fields. All steps also support optional narration, wait, and effects.

Common Fields (all actions)

PropertyTypeDefaultDescription
action"navigate" | "click" | "type" | "scroll" | "wait_for" | "screenshot"β€”Required. The action type.
narrationstring | nullnullText-to-speech narration played during this step.
waitfloat | nullnullSeconds to wait after the action completes.
effectsEffect[]nullVisual effects to apply during this step.
cardCardContent | nullnullPopup card content (title, body, items, icon). Shown synced with narration when popup_card mode is enabled.

action: "navigate"

Navigate the browser to a URL.

PropertyTypeDefaultDescription
urlstringβ€”Required. The URL to navigate to.
- action: "navigate"
  url: "https://myapp.com/dashboard"
  narration: "Let's visit the dashboard."
  wait: 2.0

action: "click"

Click on an element identified by a locator.

PropertyTypeDefaultDescription
locator.type"css" | "id" | "xpath" | "text""css"Locator strategy.
locator.valuestringβ€”Required. The selector/identifier.
- action: "click"
  locator:
    type: "css"
    value: "#submit-btn"
  narration: "Click submit."
  effects:
    - type: "highlight"
      color: "#FFD700"
Live ExampleClick Actions β€” using text locators
config.yaml
scenarios:
  - name: "Click Interactions"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Open the documentation site."
        wait: 2.0
      - action: "click"
        locator:
          type: "text"
          value: "Get Started"
        narration: "Click Get Started via text locator."
        wait: 1.5
      - action: "click"
        locator:
          type: "text"
          value: "GitHub β†’"
        narration: "Click the GitHub link."
        wait: 2.0

action: "type"

Type text into an input field.

PropertyTypeDefaultDescription
locator.type"css" | "id" | "xpath" | "text""css"Locator strategy.
locator.valuestringβ€”Required. The selector.
valuestringβ€”Required. The text to type.
- action: "type"
  locator:
    type: "id"
    value: "email"
  value: "user@example.com"
  effects:
    - type: "typewriter"
      speed: 0.1

action: "scroll"

Scroll the page in a direction.

PropertyTypeDefaultDescription
direction"up" | "down" | "left" | "right""down"Scroll direction.
pixelsint300Number of pixels to scroll.
- action: "scroll"
  direction: "down"
  pixels: 500
  narration: "Scrolling to see more features."
Live ExampleNavigate & Scroll β€” generated from this config
config.yaml
scenarios:
  - name: "Navigate and Scroll"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport: { width: 1280, height: 720 }
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Navigate to the target URL."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 400
        narration: "Scroll down 400 pixels."
        wait: 1.5
      - action: "scroll"
        direction: "down"
        pixels: 600
        narration: "Continue scrolling."
        wait: 1.5
      - action: "scroll"
        direction: "up"
        pixels: 300
        narration: "Scroll back up."
        wait: 1.5

action: "wait_for"

Wait for an element to appear in the DOM.

PropertyTypeDefaultDescription
locator.type"css" | "id" | "xpath" | "text""css"Locator strategy.
locator.valuestringβ€”Required. The selector.
timeoutfloat5.0Maximum wait time in seconds.
- action: "wait_for"
  locator:
    type: "css"
    value: ".dashboard-loaded"
  timeout: 10.0
  narration: "Waiting for the dashboard to load."
⚠️If the element is not found within timeout seconds, the step throws an error and the scenario stops.
Live Examplewait_for β€” wait for elements before interacting
config.yaml
steps:
  - action: "navigate"
    url: "https://fran-cois.github.io/demodsl/docs"
    narration: "Navigate to the docs page."
    wait: 2.0
  - action: "wait_for"
    locator:
      type: "css"
      value: "nav a"
    timeout: 5.0
    narration: "Wait for the sidebar nav to load."
    wait: 1.5
  - action: "click"
    locator:
      type: "css"
      value: "a[href='#effects']"
    narration: "Click the effects link."
    wait: 2.0
  - action: "wait_for"
    locator:
      type: "css"
      value: "#effects"
    timeout: 5.0
    narration: "Wait for effects heading to appear."
    wait: 1.5

action: "screenshot"

Capture a screenshot of the current page.

PropertyTypeDefaultDescription
filenamestring"screenshot.png"Output filename. Saved to the workspace frames directory.
- action: "screenshot"
  filename: "final_state.png"
  narration: "Here's the final result."
Live ExampleMobile Viewport & Screenshot capture
config.yaml
scenarios:
  - name: "Mobile Capture"
    url: "https://fran-cois.github.io/demodsl/"
    browser: "webkit"
    viewport:
      width: 390
      height: 844
    steps:
      - action: "navigate"
        url: "https://fran-cois.github.io/demodsl/"
        narration: "Load in mobile viewport, 390x844."
        wait: 2.0
      - action: "scroll"
        direction: "down"
        pixels: 400
        narration: "See the responsive layout."
        wait: 1.5
      - action: "screenshot"
        filename: "mobile_capture.png"
        narration: "Take a screenshot."
        wait: 1.0

Locator Types

Four locator strategies are available for identifying elements:

PropertyTypeDefaultDescription
cssβ€”β€”CSS selector. Examples: "#id", ".class", "button[type=submit]"
idβ€”β€”Element ID (shorthand for #id). Example: "email-input"
xpathβ€”β€”XPath expression. Example: "//div[@class='card']"
textβ€”β€”Visible text content. Example: "Sign Up"
πŸ’‘Prefer css selectors for stability. Use text locators for buttons or links where the visible text is more stable than the CSS structure.
Live ExampleAll locator types in action
config.yaml
steps:
  - action: "click"
    locator:
      type: "text"
      value: "Get Started"
    narration: "Text locator: click by visible text."
    wait: 2.0
  - action: "click"
    locator:
      type: "text"
      value: "Documentation"
    narration: "Text locator: click Documentation."
    wait: 2.0
  - action: "click"
    locator:
      type: "css"
      value: "a[href='#pipeline']"
    narration: "CSS locator: jump to pipeline."
    wait: 2.0
  - action: "scroll"
    direction: "down"
    pixels: 400
    narration: "Supports css, id, xpath, and text."
    wait: 1.5

effects

43 visual effects are available, split into five categories: browser effects (11 β€” injected as CSS/JS during capture), cursor trail variants (6 β€” animated trails following the cursor), fun / celebration effects (6 β€” confetti-style canvas overlays), post-processing effects (7 β€” applied to the rendered video via MoviePy), and camera & cinematic effects (13 β€” advanced camera movements and cinematic post-processing). Effects are attached to individual steps.

PropertyTypeDefaultDescription
typeEffectTypeβ€”Required. Effect name (see tables below).
durationfloat | nullnullEffect duration in seconds.
intensityfloat | nullnullEffect intensity (0.0–1.0).
colorstring | nullnullEffect color (hex). Used by highlight, glow, neon_glow.
speedfloat | nullnullAnimation speed. Used by typewriter, camera_shake, rotate.
scalefloat | nullnullZoom scale factor. Used by zoom_pulse, drone_zoom, ken_burns, zoom_to, elastic_zoom.
depthint | nullnullParallax depth. Used by parallax.
directionstring | nullnullDirection ("left", "right", "up", "down"). Used by slide_in, ken_burns, whip_pan, focus_pull.
target_xfloat | nullnullNormalized X position (0.0–1.0). Used by drone_zoom, zoom_to.
target_yfloat | nullnullNormalized Y position (0.0–1.0). Used by drone_zoom, zoom_to.
anglefloat | nullnullRotation angle in degrees. Used by rotate.
ratiofloat | nullnullAspect ratio (e.g. 2.35 for cinemascope). Used by letterbox.
presetstring | nullnullColor grade preset ("warm", "cool", "desaturate", "vintage", "cinematic"). Used by color_grade.
focus_positionfloat | nullnullFocus band position (0.0–1.0). Used by tilt_shift.

Browser Effects (real-time JS injection)

These effects inject CSS/JavaScript into the browser during capture, creating real-time visual overlays.

PropertyTypeDefaultDescription
spotlightduration(2), intensity(0.7)β€”Radial gradient spotlight overlay, darkens edges.
highlightduration(2), color(#FFD700), intensity(0.8)β€”Glowing box-shadow on hovered elements.
confettiduration(3), count(150), colors([list]), speed_min(1.5), speed_range(3.0)β€”Animated falling confetti particles (canvas).
typewriterduration(2), caret_color(#333), blink_speed(0.7), bg_color, text_color, font_size(18), labelβ€”Blinking caret animation on input fields.
glowduration(2), color(#6366f1)β€”Inner box-shadow glow around the viewport.
shockwaveduration(0.8), color(#FF5722), glow_color, border_width(4), max_size(600), glow(15)β€”Expanding ring animation from center.
sparkleduration(3), count(80), color(#FFD700), min_size(2), max_size(8)β€”Random sparkling golden dots (canvas).
cursor_trailduration(3), color(#a855f7), size(22), glow(14), fade_duration(1.2), max_dots(80)β€”Trailing particles following the cursor.
rippleduration(0.6), color(#4FC3F7), glow_color, border_width(3), max_size(200), glow(12)β€”Click ripple effect on interactions.
neon_glowduration(2), color(#FF00FF)β€”Neon-colored glow border around the viewport.
success_checkmarkduration(1.2), color(#4CAF50), size(140), glow(20), symbol(βœ“)β€”Animated green checkmark overlay.
frosted_glassduration(3), intensity(0.5)β€”Frosted glass blur overlay.
morphing_backgroundduration(5), colors([list])β€”Animated gradient background morphing.
matrix_rainduration(5), color(#00FF41), density(0.05), speed(1.0)β€”Matrix-style falling green characters.
text_highlightduration(2), color(#FFD700)β€”Highlighted text background animation.
text_scrambleduration(2), speed(50)β€”Text scramble/decode animation.
magnetic_hoverduration(3), intensity(0.5)β€”Magnetic attraction effect on hover.
tooltip_annotationduration(3), text, color(#333)β€”Tooltip annotation popup.
progress_barduration(3), color(#4CAF50), position(top), intensity(4)β€”Animated progress bar filling horizontally.
countdown_timerduration(5), color(#333), position(center)β€”Countdown circle timer overlay.
callout_arrowduration(3), text, color(#FF6B6B), target_x(0.5), target_y(0.5)β€”Arrow callout pointing to coordinates.
Live Examplespotlight β€” radial gradient overlay
config.yaml
effects:
  - type: "spotlight"
    intensity: 0.8
    duration: 2.0
Live Examplehighlight β€” glowing box-shadow on hover
config.yaml
effects:
  - type: "highlight"
    color: "#FFD700"
    intensity: 0.9
    duration: 2.0
Live Exampleconfetti β€” falling particles
config.yaml
effects:
  - type: "confetti"
    duration: 2.0
    count: 200
    colors: ["#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A"]
    speed_min: 2.0
Live Exampletypewriter β€” blinking caret on inputs
config.yaml
effects:
  - type: "typewriter"
    duration: 2.0
Live Exampleglow β€” inner box-shadow glow
config.yaml
effects:
  - type: "glow"
    color: "#6366f1"
    duration: 2.0
Live Exampleshockwave β€” expanding ring animation
config.yaml
effects:
  - type: "shockwave"
    duration: 1.0
    color: "#FF5722"
    max_size: 800
    glow: 20
Live Examplesparkle β€” golden sparkling dots
config.yaml
effects:
  - type: "sparkle"
    duration: 2.0
    count: 100
    color: "#FFD700"
    max_size: 10
Live Examplecursor_trail β€” trailing particles
config.yaml
effects:
  - type: "cursor_trail"
    duration: 2.0
    color: "#a855f7"
    size: 22
    glow: 14
    max_dots: 80
Live Exampleripple β€” click ripple effect
config.yaml
effects:
  - type: "ripple"
    duration: 2.0
Live Exampleneon_glow β€” vivid neon border
config.yaml
effects:
  - type: "neon_glow"
    color: "#FF00FF"
    duration: 2.0
Live Examplesuccess_checkmark β€” animated green βœ“
config.yaml
effects:
  - type: "success_checkmark"
    duration: 2.0

Cursor Trail Variants

Six animated cursor trail styles β€” each follows mouse movement with a unique visual style. All are browser-injected effects.

PropertyTypeDefaultDescription
cursor_trail_rainbowduration(3), size(18), hue_step(12), glow(12), fade_duration(1.4), lifetime(2200)β€”Rainbow-colored dots cycling through hues.
cursor_trail_cometduration(3), color(rgba(168,85,247,1)), glow_color, layers(4), size(22), size_step(3), fade_duration(0.8)β€”Comet tail with size gradient (3 particles per move).
cursor_trail_glowduration(3), color(#00BFFF), size(36), glow_inner(24), glow_outer(48), fade_duration(1.5), lifetime(2000), scale_end(2.5)β€”Soft glowing trail with radial gradient and box-shadow.
cursor_trail_lineduration(3), color(rgba(168,85,247,1)), max_points(60), min_width(2), max_width(7)β€”Connected SVG line segments following the cursor.
cursor_trail_particlesduration(3), count(6), min_size(8), size_range(6), spread(35), hue_base(180), hue_range(60), glow(8), fade_delay(200), lifetime(1400)β€”Particle burst on each mouse move (5 per event).
cursor_trail_fireduration(3), sparks(5), min_size(10), size_range(12), glow(10), hue_base(10), hue_range(40), fade_delay(300), lifetime(1500)β€”Warm orange/red fire sparks rising and fading.
Live Examplecursor_trail_rainbow β€” rainbow cycling dots
config.yaml
effects:
  - type: "cursor_trail_rainbow"
    duration: 3.0
    size: 22
    hue_step: 15
    glow: 16
Live Examplecursor_trail_comet β€” size gradient tail
config.yaml
effects:
  - type: "cursor_trail_comet"
    duration: 3.0
    color: "rgba(168,85,247,1)"
    layers: 5
    size: 26
Live Examplecursor_trail_glow β€” soft glowing trail
config.yaml
effects:
  - type: "cursor_trail_glow"
    color: "#00BFFF"
    duration: 3.0
Live Examplecursor_trail_line β€” connected SVG segments
config.yaml
effects:
  - type: "cursor_trail_line"
    duration: 3.0
    color: "rgba(168,85,247,1)"
    max_points: 80
    max_width: 10
Live Examplecursor_trail_particles β€” particle burst
config.yaml
effects:
  - type: "cursor_trail_particles"
    duration: 3.0
    count: 8
    spread: 45
    hue_base: 200
    hue_range: 80
Live Examplecursor_trail_fire β€” fire sparks
config.yaml
effects:
  - type: "cursor_trail_fire"
    duration: 3.0
    sparks: 8
    hue_base: 0
    hue_range: 50

Fun / Celebration Effects

Six celebration-style canvas overlays for joyful moments. All auto-cleanup after their animation completes.

PropertyTypeDefaultDescription
emoji_rainduration(4), count(60), min_size(22), size_range(20), speed_min(1.5), speed_range(2.5), emojis([πŸŽ‰,πŸ”₯,❀️,⭐,πŸš€,πŸ’―])β€”Rain of emojis (πŸŽ‰πŸ”₯β€οΈβ­πŸš€πŸ’―) falling from the top.
fireworksduration(3), initial_rockets(8), launch_interval(1200), particles_per_rocket(50), particle_speed_min(1.5), particle_speed_range(4), gravity(0.05), fade_rate(0.012)β€”Rockets launching and exploding into colorful particles.
bubblesduration(4), count(45), min_radius(10), max_radius(35), speed_min(0.5), speed_range(1.5), hue_base(180), hue_range(60)β€”Translucent bubbles rising with sinusoidal wobble.
snowduration(5), count(120), min_radius(3), max_radius(8), color(rgba(200,230,255,0.85)), glow_color, glow(4), speed_min(0.8), speed_max(2.8)β€”Snowflakes drifting down with gentle wind drift.
star_burstduration(3), count(80), speed_min(2), speed_range(5), hue_base(40), hue_range(60), decay(0.006)β€”5-pointed stars exploding from the center.
party_popperduration(3), count(55), colors([list]), min_size(8), size_range(10), speed_min(4), speed_range(7), gravity(0.12), fade_rate(0.003)β€”Confetti shapes (rect/circle/triangle) from both bottom corners.
Live Exampleemoji_rain β€” falling emojis πŸŽ‰πŸ”₯⭐
config.yaml
effects:
  - type: "emoji_rain"
    duration: 4.0
    count: 80
    emojis: ["πŸŽ‰", "πŸ”₯", "❀️", "⭐", "πŸš€", "πŸ’―", "🎊"]
    speed_min: 2.0
Live Examplefireworks β€” rockets and explosions πŸŽ†
config.yaml
effects:
  - type: "fireworks"
    duration: 5.0
    initial_rockets: 12
    particles_per_rocket: 80
    launch_interval: 800
Live Examplebubbles β€” translucent rising bubbles
config.yaml
effects:
  - type: "bubbles"
    duration: 4.0
Live Examplesnow β€” drifting snowflakes ❄️
config.yaml
effects:
  - type: "snow"
    duration: 6.0
    count: 150
    min_radius: 2
    max_radius: 10
    speed_min: 0.5
    speed_max: 3.0
Live Examplestar_burst β€” exploding stars ⭐
config.yaml
effects:
  - type: "star_burst"
    duration: 3.0
    count: 100
    hue_base: 0
    hue_range: 360
Live Exampleparty_popper β€” corner confetti 🎊
config.yaml
effects:
  - type: "party_popper"
    duration: 4.0
    count: 80
    gravity: 0.15
    colors: ["#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A"]

Post-Processing Effects (MoviePy)

These effects are applied to the video during the apply_effects pipeline stage.

PropertyTypeDefaultDescription
parallaxduration, depthβ€”Subtle zoom for a depth illusion.
zoom_pulseduration, scaleβ€”Pulsing zoom in/out following a sine wave.
fade_indurationβ€”Clip fades in from black.
fade_outdurationβ€”Clip fades out to black.
vignetteduration, intensityβ€”Dark vignette border around the frame.
glitchduration, intensityβ€”Random horizontal slice displacement.
slide_induration, directionβ€”Slide-in entrance animation (implemented as crossfade).
Combining effects
steps:
  - action: "click"
    locator: { type: "css", value: "#cta" }
    narration: "Click the call to action!"
    effects:
      - type: "highlight"
        color: "#FFD700"
        duration: 1.5
      - type: "confetti"
        duration: 2.0
      - type: "zoom_pulse"
        scale: 1.2
        duration: 1.0
ℹ️Browser effects execute before the step action. Multiple effects on one step are applied sequentially, each waiting for its duration before the next.
Live Example5 browser effects: spotlight, highlight, glow, neon_glow, checkmark
config.yaml
steps:
  - action: "navigate"
    url: "https://fran-cois.github.io/demodsl/"
    narration: "Effects are injected via JS during capture."
    wait: 2.0
    effects:
      - type: "spotlight"
        duration: 2.0
        intensity: 0.8
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Highlight adds a glowing box-shadow."
    effects:
      - type: "highlight"
        duration: 2.0
        color: "#FFD700"
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Glow creates an inner glow."
    effects:
      - type: "glow"
        duration: 2.0
        color: "#6366f1"
  - action: "scroll"
    direction: "down"
    pixels: 500
    narration: "Neon glow adds a vivid border."
    effects:
      - type: "neon_glow"
        duration: 2.0
        color: "#FF00FF"
  - action: "screenshot"
    narration: "Success checkmark overlay."
    effects:
      - type: "success_checkmark"
        duration: 2.0

Camera & Cinematic Effects

13 advanced camera and cinematic effects for professional-looking demos. These are all post-processing effects applied via MoviePy β€” they simulate real camera movements and cinematic grading on the rendered video.

Camera Movement Effects

PropertyTypeDefaultDescription
drone_zoomscale, target_x, target_yβ€”Smooth progressive zoom towards a target point β€” simulates a drone descent.
ken_burnsscale, directionβ€”Classic documentary pan + zoom (slow push with lateral drift).
zoom_toscale, target_x, target_yβ€”Zoom to a specific point and hold β€” great for highlighting UI elements.
dolly_zoomintensityβ€”Vertigo / dolly-zoom: zoom in while widening the crop.
elastic_zoomscaleβ€”Zoom with elastic overshoot bounce (ease-out-back).
camera_shakeintensity, speedβ€”Subtle camera shake / handheld feel.
whip_pandirectionβ€”Fast horizontal/vertical pan with motion blur β€” great for transitions.
rotateangle, speedβ€”Gentle animated rotation β€” subtle tilt for dynamic feel.
Live Exampledrone_zoom β€” smooth descent towards a target
config.yaml
effects:
  - type: "drone_zoom"
    scale: 1.4
    target_x: 0.5   # center horizontally
    target_y: 0.3   # focus on upper third
Live Exampleken_burns β€” classic documentary pan + zoom
config.yaml
effects:
  - type: "ken_burns"
    scale: 1.15
    direction: "right"  # left, right, up, down
Live Examplezoom_to β€” zoom and hold on a UI element
config.yaml
effects:
  - type: "zoom_to"
    scale: 1.8
    target_x: 0.5
    target_y: 0.4
Live Exampledolly_zoom β€” dramatic vertigo effect
config.yaml
effects:
  - type: "dolly_zoom"
    intensity: 0.3
Live Exampleelastic_zoom β€” bouncy zoom with overshoot
config.yaml
effects:
  - type: "elastic_zoom"
    scale: 1.3
Live Examplecamera_shake β€” subtle handheld feel
config.yaml
effects:
  - type: "camera_shake"
    intensity: 0.3
    speed: 8.0
Live Examplewhip_pan β€” fast transition with motion blur
config.yaml
effects:
  - type: "whip_pan"
    direction: "right"  # left, right, up, down
Live Examplerotate β€” gentle animated tilt
config.yaml
effects:
  - type: "rotate"
    angle: 3.0    # degrees
    speed: 1.0    # oscillations per clip

Cinematic Effects

PropertyTypeDefaultDescription
letterboxratioβ€”Cinematic black bars (e.g. 2.35:1 cinemascope).
film_grainintensityβ€”Analog film grain overlay.
color_gradepresetβ€”Color grading presets: warm, cool, desaturate, vintage, cinematic.
focus_pulldirection, intensityβ€”Rack focus: transition from sharp to blurry (or reverse).
tilt_shiftintensity, focus_positionβ€”Miniature / tilt-shift: sharp band in center, blurred edges.
Live Exampleletterbox β€” cinematic 2.35:1 black bars
config.yaml
effects:
  - type: "letterbox"
    ratio: 2.35   # cinemascope
Live Examplefilm_grain β€” analog film texture
config.yaml
effects:
  - type: "film_grain"
    intensity: 0.3
Live Examplecolor_grade β€” cinematic color grading
config.yaml
effects:
  - type: "color_grade"
    preset: "cinematic"  # warm, cool, desaturate, vintage, cinematic
Live Examplefocus_pull β€” rack focus transition
config.yaml
effects:
  - type: "focus_pull"
    direction: "out"   # in = blur→sharp, out = sharp→blur
    intensity: 0.5
Live Exampletilt_shift β€” miniature effect
config.yaml
effects:
  - type: "tilt_shift"
    intensity: 0.6
    focus_position: 0.5  # 0.0=top, 0.5=center, 1.0=bottom
πŸ’‘Combine camera effects for professional results: pair letterbox + color_grade + film_grain for a cinematic look, or drone_zoom + vignette for a dramatic reveal.
Full cinematic combo example
steps:
  - action: "navigate"
    url: "https://example.com"
    narration: "A cinematic reveal of our product."
    effects:
      - type: "drone_zoom"
        scale: 1.4
        target_x: 0.5
        target_y: 0.3
      - type: "letterbox"
        ratio: 2.35
      - type: "color_grade"
        preset: "cinematic"
      - type: "film_grain"
        intensity: 0.2
      - type: "vignette"
        intensity: 0.4

pipeline

The pipeline defines the post-processing chain using a Chain of Responsibility pattern. Each stage is a single-key dictionary. Stages execute in order, passing context to the next.

Each stage is either critical (failure stops the pipeline) or optional (failure is logged and skipped).

PropertyTypeDefaultDescription
restore_audiooptional{ denoise, normalize }Audio restoration: noise removal, loudness normalization.
restore_videooptional{ stabilize, sharpen }Video restoration: stabilization, sharpening.
apply_effectsoptional{}Apply post-processing visual effects from step definitions.
generate_narrationcritical{}Generate TTS audio clips and sync to video timeline.
composite_avataroptional{}Overlay avatar clips on the video. Requires avatar config in the scenario.
burn_subtitlesoptional{}Burn ASS subtitles into the video. Requires subtitle config (top-level or per-scenario).
render_device_mockupoptional{}Overlay video into a 3D device frame.
edit_videocritical{}Apply intro, outro, transitions, and watermark.
mix_audiocritical{}Mix voice narration with background music (ducking).
optimizecritical{ format, codec, quality, target_size_mb }Final encoding, compression, and format export.
fit_durationoptional{ target_duration, strategy, min_speed, max_speed }Adjust video speed so that the final video matches a target duration.
Pipeline syntax
pipeline:
  # Each stage is a single-key dict
  - restore_audio:
      denoise: true
      normalize: true
  - restore_video:
      stabilize: true
      sharpen: true
  - apply_effects: {}
  - generate_narration: {}
  - composite_avatar: {}
  - burn_subtitles: {}
  - render_device_mockup: {}
  - edit_video: {}
  - mix_audio: {}
  - fit_duration:
      target_duration: 60
      strategy: "any"
  - optimize:
      format: "mp4"
      codec: "h264"
      quality: "high"
      target_size_mb: 50

optimize stage parameters

PropertyTypeDefaultDescription
formatstring"mp4"Output format: "mp4", "webm", "gif".
codecstring"h264"Video codec: "h264", "h265", "vp9", etc.
qualitystring"high"Encoding quality: "low", "medium", "high".
target_size_mbint | nullnullTarget file size in MB. Overrides quality if set.
⚠️Pipeline stages with {} (empty dict) use all defaults. Each stage dict must have exactly one key β€” multiple keys in a single dict will raise a validation error.
πŸ’‘You can reorder stages or omit optional ones. A minimal pipeline might be just generate_narration, edit_video, mix_audio, and optimize.

fit_duration stage parameters

Automatically adjusts video playback speed so that the final video matches a given target_duration in seconds. Useful when you need to produce a demo that fits a specific time slot (e.g. a 60-second social clip or a 3-minute explainer).

PropertyTypeDefaultDescription
target_durationfloatβ€”Required. Target duration in seconds.
strategy"any" | "speed_up" | "slow_down""any"Direction constraint. "any" allows both speed up and slow down.
min_speedfloat0.25Minimum speed factor (prevents extreme slow-motion).
max_speedfloat4.0Maximum speed factor (prevents unwatchable fast-forward).
fit_duration example
pipeline:
  - generate_narration: {}
  - edit_video: {}
  - mix_audio: {}
  - fit_duration:
      target_duration: 60       # make the video exactly 60 seconds
      strategy: "any"            # speed up or slow down as needed
      min_speed: 0.5             # never slower than 0.5x
      max_speed: 3.0             # never faster than 3x
ℹ️Place fit_duration after edit_video and speed but before optimize so that intro/outro and manual speed changes are applied first, then the whole video is time-fitted.

output

Defines output filenames, formats, thumbnail generation, and social media export presets.

PropertyTypeDefaultDescription
filenamestring"output.mp4"Main output filename.
directorystring"output/"Output directory path.
formatsstring[]["mp4"]Export formats: "mp4", "webm", "gif".
thumbnailsThumbnail[]nullAuto-generated thumbnail frames.
socialSocialExport[]nullPlatform-specific export presets.

output.thumbnails

PropertyTypeDefaultDescription
timestampfloatβ€”Required. Time in seconds to capture the thumbnail.

output.social

Generate platform-optimized versions automatically. Each preset re-encodes the video with platform-specific constraints.

PropertyTypeDefaultDescription
platformstringβ€”Required. Platform name (for labeling).
resolutionstring | nullnullOutput resolution (e.g. "1920x1080").
bitratestring | nullnullTarget bitrate (e.g. "8000k").
aspect_ratiostring | nullnullCrop to aspect ratio (e.g. "1:1", "9:16").
max_durationint | nullnullMaximum duration in seconds (trims end).
max_size_mbint | nullnullMaximum file size in MB.
Full output example
output:
  filename: "demo.mp4"
  directory: "output/"
  formats:
    - "mp4"
    - "webm"
    - "gif"
  thumbnails:
    - timestamp: 0.0
    - timestamp: 5.0
    - timestamp: 10.0
  social:
    - platform: "youtube"
      resolution: "1920x1080"
      bitrate: "8000k"
    - platform: "instagram"
      resolution: "1080x1080"
      aspect_ratio: "1:1"
      max_duration: 60
    - platform: "twitter"
      resolution: "1280x720"
      max_duration: 140
      max_size_mb: 15

analytics Beta

Optional engagement tracking metadata embedded in the output.

PropertyTypeDefaultDescription
track_engagementboolfalseTrack viewer engagement metrics.
heatmapboolfalseGenerate click/attention heatmap data.
click_trackingboolfalseTrack interactive click positions.
analytics:
  track_engagement: true
  heatmap: true
  click_tracking: true

Pre-flight Checks

Before launching the recording browser, DemoDSL probes the URLs your scenario depends on so you find out why a recording shows a challenge or empty page before waiting for the full render. All pre-flight checks are advisory: they never abort the demo. Network failures (DNS, timeout, offline) are treated as "passing" so demos still run on locked-down machines.

Page accessibility (anti-bot / WAF detection)

Many production sites are fronted by anti-bot or WAF services that serve a JavaScript challenge or a 403/429/503 to non-browser clients. Recording such a page captures the challenge instead of your real UI. DemoDSL fetches each scenario.url and everynavigate step URL with a short GET probe (only the first 16Β KB of the body) and inspects the response for known fingerprints. When a block is detected a WARNINGis logged with the protection name, the HTTP status and the matching signal β€” the demo still runs so the recording acts as documentation of the issue.

Detected protections include:

PropertyTypeDefaultDescription
cloudflareWAFβ€”cf-ray, cf-mitigated, __cf_bm, β€œJust a moment…”, Turnstile, Attention Required.
datadomeAnti-botβ€”x-dd-b / x-datadome headers, datadome cookie, geo.captcha-delivery.com markers.
akamaiWAF / Bot Managerβ€”AkamaiGHost server, ak_bmsc / _abck cookies, β€œAccess Denied” reference page.
impervaWAF (Incapsula)β€”X-Iinfo header, visid_incap_ / incap_ses_ cookies, β€œIncapsula incident id”.
aws-wafWAFβ€”x-amzn-waf-action header, aws-waf-token cookie, AWSWAFCaptcha page.
f5-shapeWAF / bot defenseβ€”BIG-IP cookie + block, β€œThe requested URL was rejected”, Shape JS.
sucuriWAFβ€”Server: Sucuri/Cloudproxy, Sucuri firewall block page.
kasadaAnti-botβ€”x-kpsdk-ct header, /kpsdk/ markers.
perimeterxAnti-bot (HUMAN)β€”_px3 / _pxhd cookies, px-captcha / _pxCaptcha markers.
captchaGeneric gateβ€”hCaptcha, reCAPTCHA, Cloudflare Turnstile, Arkose/FunCaptcha interstitials.
β€”HTTP statusβ€”401, 403, 404, 429, 451, 5xx with friendlier reason text.

Sample log output for a Cloudflare-protected URL:

WARNING [page-precheck] https://example.com β†’ not accessible Β· protected by cloudflare Β· HTTP 403 Β· cf-mitigated: challenge β€” the recorded demo is likely to show a challenge or error page. Consider using a different URL, a fixture/screenshot, or running the demo on an allow-listed network.

Recommended workarounds when a URL is blocked:

You can call the probe programmatically:

from demodsl.page_precheck import probe_page_accessible, precheck_urls

result = probe_page_accessible("https://example.com")
if not result.accessible:
    print(result.format_warning())
    # β†’ "https://example.com β†’ not accessible Β· protected by cloudflare Β· HTTP 403 Β· cf-mitigated: challenge"

# Batch probe with built-in WARNING logging
precheck_urls(["https://a.example", "https://b.example"])

Iframe embeddability (secondary windows)

When a scenario uses background.secondary_windows[].urlto embed a live site behind the main browser, DemoDSL probes each URL for headers that block iframing:

When a window's URL is blocked, DemoDSL automatically records a short headless clip of the page in an isolated Playwright instance and substitutes a muted, looping <video> overlay for the iframe. If recording fails (or the helper is disabled), the window falls back to its background_color /screenshot. This step happens before the main browser is launched to avoid running two Chromium processes concurrently on memory-constrained hosts.

Recordings are cached on disk (keyed by URL + dimensions), so repeat runs are fast.

CLI Reference

demodsl run

Parse and execute a DemoDSL config file.

demodsl run <config> [OPTIONS]

Arguments:
  config    Path to the YAML or JSON config file.

Options:
  -o, --output-dir PATH  Output directory (default: output/)
  --dry-run              Validate and log all steps without executing
  --skip-voice           Skip TTS generation (development mode)
  --turbo                Fast preview: minimal waits, skip heavy post-processing
  -v, --verbose          Enable debug logging
πŸ’‘Use --dry-run to validate your config and preview all steps without launching a browser or calling TTS APIs.

Turbo mode

Add --turbo to generate a fast preview. All browser waits are clamped to 50 ms, and heavy post-processing passes are skipped: avatar compositing, 3D device rendering, subtitle burning, post-effects (freeze frames, speed ramps), global speed re-encode, and watermark overlay.

turbo preview
demodsl run demo.yaml --turbo

The YAML config itself does not change β€” turbo is purely a runtime flag. Define avatars, subtitles, and effects as usual, then iterate quickly with --turbo and remove it for the final high-quality render.

PropertyTypeDefaultDescription
SkippedAvatars, 3D rendering, subtitles, post-effects, speed re-encode, watermark
KeptBrowser recording, narration (TTS), browser effects, basic video editing
WaitsAll time.sleep() pauses clamped to 50 ms

demodsl validate

Validate a config file without executing any actions.

demodsl validate <config> [OPTIONS]

Arguments:
  config    Path to the YAML or JSON config file.

Options:
  -v, --verbose    Enable debug logging

Outputs a summary: title, version, number of scenarios, total steps, and pipeline stage count. Exits with code 1 on validation failure.

demodsl init

Generate a minimal config template.

demodsl init [OPTIONS]

Options:
  -o, --output PATH   Output file (default: demo.yaml)
                      Use .json extension for JSON output.

Examples:
  demodsl init                    # Creates demo.yaml
  demodsl init -o my-demo.yaml   # Custom filename
  demodsl init -o demo.json      # JSON format

Edge Cases & Gotchas

Minimal Config

The smallest valid config requires only metadata.title. Everything else has defaults or is optional:

Minimal valid config
metadata:
  title: "Empty Demo"

This will validate successfully but produce no output (no scenarios, no pipeline).

YAML vs JSON Detection

File format is detected by file extension only: .json β†’ JSON parser, anything else β†’ YAML parser. If you name a JSON file config.yaml, it will fail to parse.

Voice Provider Fallback

If the configured TTS engine's API key is missing, DemoDSL falls back to DummyVoiceProvider which generates silent audio clips. The dummy calculates duration from word count at ~150 words per minute. This is intentional behavior for local development.

Pipeline Stage Format

Each pipeline entry must be a single-key dictionary. Multiple keys in one entry will raise a validation error:

❌ Invalid β€” multiple keys
pipeline:
  - restore_audio: { denoise: true }
    restore_video: { sharpen: true }  # Error!
βœ… Valid β€” one key per entry
pipeline:
  - restore_audio: { denoise: true }
  - restore_video: { sharpen: true }

Critical vs Optional Stages

If a critical stage fails (generate_narration, edit_video, mix_audio, optimize), the entire pipeline stops and raises an error. If an optional stage fails (restore_audio, restore_video, apply_effects, render_device_mockup), it logs a warning and the pipeline continues.

Effect Execution Order

Browser effects are injected before the step action and execute sequentially. If an effect has a duration, the engine sleeps for that duration before the next effect. All effects complete before the browser action fires.

Required Fields by Action

PropertyTypeDefaultDescription
navigateurlβ€”Raises ValueError if url is missing.
clicklocatorβ€”Raises ValueError if locator is missing.
typelocator + valueβ€”Raises ValueError if either is missing.
scroll(none)β€”Defaults: direction="down", pixels=300.
wait_forlocatorβ€”Raises ValueError if locator is missing. Default timeout: 5s.
screenshot(none)β€”Default filename: "screenshot.png".

Viewport and Recording

The browser records video at the viewport resolution. For high-quality social media exports, set the scenario viewport to the maximum resolution you need β€” downscaling social presets is better than upscaling.

Ducking Without Music

If audio.background_music is not set or the file doesn't exist, the mix_audio stage skips music mixing entirely. Narration-only audio still works.

Empty Pipeline

If pipeline is an empty list or omitted, no post-processing runs. Raw browser recordings are copied directly to the output directory.

Multiple Scenarios

Multiple scenarios are executed sequentially. Each gets its own browser instance. Currently, the pipeline processes only the first scenario's video. Multi-scenario concatenation is handled by the edit_video stage.

Dry Run Behavior

With --dry-run, the engine validates the config, logs every step and effect with [DRY-RUN] prefix, but does not:

Environment Variables

PropertyTypeDefaultDescription
ELEVENLABS_API_KEYstringβ€”ElevenLabs TTS API key.
OPENAI_API_KEYstringβ€”OpenAI API key. Required for openai engine.
GOOGLE_APPLICATION_CREDENTIALSstringβ€”Path to Google Cloud service account JSON file.
AZURE_SPEECH_KEYstringβ€”Azure Cognitive Services Speech subscription key.
AZURE_SPEECH_REGIONstring"eastus"Azure region (e.g. eastus, westeurope).
AWS_ACCESS_KEY_IDstringβ€”AWS access key for Polly.
AWS_SECRET_ACCESS_KEYstringβ€”AWS secret key for Polly.
AWS_DEFAULT_REGIONstring"us-east-1"AWS region for Polly.
COSYVOICE_API_URLstring"http://localhost:50000"CosyVoice API server URL.
COQUI_MODELstring"xtts_v2"Coqui TTS model name (default: xtts_v2).
COQUI_LANGUAGEstring"en"Language code for Coqui TTS.
PIPER_BINstring"piper"Path to piper binary.
PIPER_MODELstringβ€”Required. Path to Piper .onnx voice model.
LOCAL_TTS_URLstring"http://localhost:8000"Base URL for OpenAI-compatible local TTS server.
LOCAL_TTS_API_KEYstring"not-needed"API key for local server (if required).
LOCAL_TTS_MODELstring"tts-1"Model name to pass to local server.
ESPEAK_BINstring"espeak-ng"Path to eSpeak-NG binary.
CUSTOM_TTS_URLstringβ€”Required. Full URL of your custom TTS HTTP endpoint.
CUSTOM_TTS_API_KEYstringβ€”Bearer token for custom TTS (optional).
CUSTOM_TTS_RESPONSE_FORMATstring"mp3"Audio format returned by the endpoint: "mp3" or "wav".
D_ID_API_KEYstringβ€”D-ID API key for talking-head avatar generation.
HEYGEN_API_KEYstringβ€”HeyGen API key for avatar video generation.

If the required environment variable for the selected engine is not set, DemoDSL automatically falls back to DummyVoiceProvider which generates silent audio clips. This allows development without API credentials.

DemoDSL v2.7.0 β€” MIT License β€” GitHub