28-day Challenge - Descript AI

Descript Mastery Course | Advanced Video & Audio Production Training

DESCRIPT MASTERY

Professional Video & Audio Production Program

MODULE 1: Foundation & Text-Based Editing

Master Descript's revolutionary text-based editing workflow and interface fundamentals for professional video and podcast production.

Why This Module Matters

Descript transforms video editing from a complex timeline puzzle into intuitive document editing. This foundational module teaches you the core paradigm shift that makes Descript much faster than traditional editors: editing by manipulating text instead of waveforms. By mastering these fundamentals, you can cut editing time by roughly 70-80% while maintaining broadcast-quality output.

Time Savings: 70-80%

Learning Curve: 2-3 Days

Skill Level: Beginner

Understanding Descript's Interface Architecture

The Three-Panel Paradigm

Descript's interface is built around three core panels that work in harmony. Understanding how these panels interact is essential for efficient editing.

Left Panel - Script View: This is your primary editing interface. The transcript appears as editable text, where every word corresponds directly to audio/video. When you delete text here, you're removing that spoken content from your media. This is where 80% of your editing happens.

Center Panel - Canvas/Preview: Your visual workspace showing the current video frame or waveform. For video projects, this displays your composition with layers. For audio-only projects, this shows the waveform. The canvas updates in real-time as you edit the script.

Right Panel - Properties & Layers: Context-sensitive controls that change based on what you've selected. This panel houses audio effects, video effects, layer management, and composition properties. It's your detailed control center.

Workspace Setup Best Practice:

For first-time users: Keep all three panels visible initially. Once comfortable, hide the right panel (Cmd/Ctrl + /) when doing pure transcript editing to maximize script space. Toggle it back when you need effects or layer controls.

Script View vs Timeline View

Descript offers two editing modes, each optimized for different tasks. Knowing when to use each is crucial for maximum efficiency.

When to Use Script View (Primary Mode):

  • Removing filler words, ums, ahs, and pauses
  • Cutting entire sections of spoken content
  • Reordering interview segments
  • Finding and replacing words across the entire project
  • Reading through content to identify weak sections
  • Adding word-by-word emphasis or corrections

When to Use Timeline View:

  • Precise audio fade adjustments at edit points
  • Fine-tuning video transitions frame-by-frame
  • Synchronizing B-roll with specific moments
  • Adjusting music bed levels under dialogue
  • Placing overlay graphics at exact timestamps
  • Working with non-transcribed audio (sound effects, music)

Workflow Pattern:

Start in Script View for content editing (removing bad takes, filler words, long pauses). Switch to Timeline View only when you need visual precision for B-roll placement, effects timing, or audio mixing. Toggle between views with Cmd/Ctrl + Shift + V.

Project Types and When to Use Each

Descript offers multiple project templates, each optimized for specific use cases.

Video Project: Use when your final output is video content. This enables the full canvas with composition layers, screen recording, webcam, and video file imports. Ideal for: YouTube videos, training content, social media videos, video podcasts.

Audio Project: Optimized for podcast workflows. Simpler interface focused on audio tracks and waveforms without video composition overhead. Ideal for: podcast episodes, audiobooks, audio-only interviews, radio segments.

Recording Project: For capturing new content directly in Descript. Choose this when starting a recording session rather than importing existing files. Supports screen recording, webcam, microphone, or system audio capture.

Project Selection Decision Tree:

Ask: "Does the final deliverable include video?"

  • YES → Video Project
  • NO → Audio Project
  • Starting fresh recording? → Recording Project (which then converts to a Video or Audio project)

Mastering Text-Based Editing

The Core Editing Paradigm

Understanding how text manipulation affects media is fundamental to Descript mastery. Every action in the script has a direct, predictable impact on your audio/video.

Delete Text = Remove Media: When you select words in the script and press Delete, Descript removes that spoken audio/video segment. The media before and after automatically connects. This is how you eliminate mistakes, false starts, and rambling sections.

Cut/Copy/Paste Text = Rearrange Media: Standard text editing commands work on media. Select a paragraph, cut it (Cmd/Ctrl + X), move your cursor to a new location, and paste (Cmd/Ctrl + V). You've just reordered your video content by manipulating text.

Find & Replace = Batch Editing: Use Cmd/Ctrl + F to find specific words across your entire transcript. When you replace a word, Descript can regenerate the audio using Overdub (covered in Module 3). Alternatively, use Find on its own to step through every instance for manual review.

Practical Example - Removing a Rambling Section:

Scenario: Your interview subject rambles for 2 minutes in the middle of an otherwise good answer.

Traditional editing: Scrub through timeline, find start/end points, make precise cuts, adjust audio fades.

Descript method:

  1. Read the transcript
  2. Select the rambling paragraph (click at start, Shift+click at end)
  3. Press Delete
  4. Done: about 10 seconds vs 5 minutes
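To make the text-to-media mapping concrete, here is a minimal Python sketch. It assumes each transcript word carries a start and end time in the source media; the `Word` type and `keep_ranges` function are hypothetical names for illustration and do not describe Descript's internal data model or any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the source media
    end: float


def keep_ranges(words, deleted):
    """Return the (start, end) media ranges left after deleting words.

    `deleted` is a set of indexes into `words`. Neighbouring surviving
    words are merged into one range, so playback jumps straight from the
    media before a deletion to the media after it.
    """
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # The previous word also survived: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
    return ranges


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.9),
    Word("welcome", 0.9, 1.5),
    Word("back", 1.5, 1.9),
]
print(keep_ranges(words, deleted={1}))  # [(0.0, 0.3), (0.9, 1.9)]
```

Deleting the word "um" leaves two ranges that play back to back, which is the same effect as selecting text in the script and pressing Delete.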

Selection Techniques for Efficiency

How you select text dramatically impacts editing speed. Master these selection patterns to edit faster.

Word Selection: Double-click any word to select it. This selects the word plus its associated timing. Useful for: removing single filler words ("um," "uh," "like"), replacing mispronunciations, isolating specific terms.

Sentence Selection: Triple-click to select an entire sentence. The selection respects natural speech boundaries. Useful for: removing complete thoughts, reordering key points, batch deletion of weak statements.

Range Selection: Click at start point, hold Shift, click at endpoint. This works across multiple paragraphs. Useful for: removing large segments, moving entire sections, selecting multi-speaker exchanges.

Multi-Selection (Advanced): Hold Cmd/Ctrl and click multiple non-contiguous words. This selects separate words that aren't adjacent. Useful for: removing multiple filler words at once ("um... and uh... you know"), isolating repeated words across a section.

Speed Editing Exercise:

Challenge: Remove all instances of "you know" from a 10-minute transcript.

Method:

  1. Cmd/Ctrl + F to open Find
  2. Type "you know"
  3. Click through each instance (Cmd/Ctrl + G for next)
  4. For each: Press Delete if it's filler, skip it if it's meaningful usage
  5. Complete in under 3 minutes

Traditional timeline editing: 20-30 minutes
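The reason step 4 is a manual decision can be seen in a small Python sketch. It assumes the transcript is available as a plain list of words; `find_phrase` is a hypothetical helper, not a Descript feature.

```python
def find_phrase(words, phrase):
    """Return the starting index of every occurrence of `phrase`.

    Matching ignores case and trailing punctuation, similar in spirit to a
    Find box, but it cannot tell filler from meaningful usage.
    """
    target = phrase.lower().split()
    cleaned = [w.lower().strip(".,!?") for w in words]
    n = len(target)
    return [i for i in range(len(cleaned) - n + 1) if cleaned[i:i + n] == target]


transcript = (
    "Well you know the thing is, you know, it depends if you know the answer"
).split()
print(find_phrase(transcript, "you know"))  # [1, 6, 11]
```

All three hits match the search, but the last one ("if you know the answer") carries meaning, so deleting every match blindly would damage the sentence.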

Handling Silence and Pauses

Descript represents silence as gaps in the transcript. Learning to manage these gaps is essential for pacing control.

Viewing Silence: Enable "Show Gaps" in View menu (Cmd/Ctrl + Shift + G). Pauses appear as teal markers labeled with duration (e.g., "2.5s"). This reveals the pacing rhythm of your content.

Shortening Pauses: Click a gap marker to select it. Press Delete to remove entirely, or drag the marker edges to shorten the pause to your desired length. This tightens pacing without affecting the spoken content.

Adding Pauses: Place your cursor between words and press "Add Gap" or use the Add menu. Specify duration. This creates breathing room for dramatic effect or comprehension time.

Batch Gap Reduction: Select a large section of transcript, go to Edit → Shorten Word Gaps. Set maximum gap duration (try 1.2 seconds). Descript automatically reduces all pauses longer than this threshold, dramatically tightening pacing across entire segments.

Podcast Pacing Optimization:

Common Issue: Guest pauses 3-5 seconds between sentences, making content feel slow.

Solution:

  1. Select the guest's entire speaking section
  2. Edit → Shorten Word Gaps
  3. Set max gap to 1.5 seconds
  4. Apply

Result: All 3-5 second pauses become 1.5 seconds, maintaining natural rhythm while eliminating dead air. Reduces overall episode length by 10-15% without cutting content.
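The effect of a maximum gap setting is simple to model. The sketch below assumes you have a list of pause lengths in seconds; `shorten_gaps` and `seconds_saved` are illustrative names, and the sample pause values are made up.

```python
def shorten_gaps(gaps, max_gap):
    """Clamp every pause longer than `max_gap` seconds down to `max_gap`."""
    return [min(gap, max_gap) for gap in gaps]


def seconds_saved(gaps, max_gap):
    """Total running time removed by clamping the pauses."""
    return sum(gaps) - sum(shorten_gaps(gaps, max_gap))


gaps = [0.5, 3.0, 5.0, 1.0, 4.5]   # hypothetical pause lengths
print(shorten_gaps(gaps, 1.5))     # [0.5, 1.5, 1.5, 1.0, 1.5]
print(seconds_saved(gaps, 1.5))    # 8.0
```

Pauses already shorter than the threshold are left alone, which is why the natural rhythm of quick exchanges survives.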

Advanced Transcript Manipulation

Correcting Transcription Errors

Descript's AI transcription is highly accurate but occasionally misinterprets words. Corrections improve both readability and searchability.

Inline Correction: Click any word to edit it directly. The correction appears in the script but doesn't change the underlying audio. This is perfect for proper nouns, technical terms, or brand names that were transcribed phonetically.

When to Correct vs Leave: Correct if the word affects meaning or professional appearance in captions/transcripts. Don't waste time correcting minor transcription quirks that won't be visible to audiences (like "gonna" vs "going to").

Building Vocabulary: When correcting specialized terms that will recur in future projects, add them to Descript's vocabulary (right-click → Add to Vocabulary). This trains the AI to correctly transcribe these terms in future projects.

Brand Name Correction Example:

Scenario: Technical podcast discusses "Kubernetes" but Descript transcribed it as "communities."

Fix:

  1. Find first instance (Cmd/Ctrl + F: communities)
  2. Click word, type: Kubernetes
  3. Right-click → Add to Vocabulary
  4. Use Find & Replace (Cmd/Ctrl + H) to correct remaining instances
  5. Future projects will correctly transcribe "Kubernetes"

Speaker Labels and Organization

Multi-speaker content requires proper speaker labeling for clarity and advanced editing features.

Automatic Detection: Descript attempts to detect different speakers and labels them as Speaker 1, Speaker 2, etc. This works best with distinct voice characteristics and good recording technique (separate tracks per speaker).

Manual Assignment: Click any speaker label to rename it (e.g., "Sarah," "John"). Once renamed, click other sections labeled "Speaker 1" and reassign them to "Sarah." Descript learns voice patterns and improves future auto-detection.

Speaker-Based Editing: Once speakers are labeled, you can isolate content by speaker. Right-click speaker name → Select All [Speaker Name]. This selects everything that person said, enabling bulk operations like applying consistent audio processing or removing entire speaker tracks.

Interview Cleanup Workflow:

Use Case: 60-minute interview, remove all interviewer questions to create highlight reel.

Process:

  1. Label speakers: "Host" and "Guest"
  2. Cmd/Ctrl + F, search for "Host:"
  3. Review each segment, decide if question adds context
  4. Delete interviewer segments that don't add value
  5. Result: 60-minute interview becomes 15-minute expert monologue
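The speaker-based cleanup can be sketched in Python. This assumes the labeled transcript is represented as `(speaker, start, end)` tuples in seconds; the functions and sample timings are hypothetical and only illustrate the selection logic.

```python
from collections import defaultdict


def duration_by_speaker(segments):
    """Total speaking time per speaker, in seconds."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def drop_speaker(segments, speaker, keep=frozenset()):
    """Remove one speaker's segments, except the indexes listed in `keep`.

    `keep` models step 3: questions that add context stay in the edit.
    """
    return [
        seg for i, seg in enumerate(segments)
        if seg[0] != speaker or i in keep
    ]


segments = [
    ("Host", 0, 20),
    ("Guest", 20, 140),
    ("Host", 140, 155),
    ("Guest", 155, 300),
]
reel = drop_speaker(segments, "Host", keep={0})
print(reel)
print(duration_by_speaker(reel))  # {'Host': 20.0, 'Guest': 265.0}
```

Keeping the opening question (index 0) while dropping the follow-up shows how a reviewed subset of host segments can remain in the highlight reel.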

Using Comments for Collaboration

Comments enable async collaboration and self-review, crucial for client work and team projects.

Adding Comments: Select any text, click the comment icon (or Cmd/Ctrl + Shift + M). Type your note. Comments are timestamp-specific and visible to all project collaborators.

Comment Types: Use comments for different purposes: "Client approval needed here," "Find B-roll of this product," "Audio quality drops - re-record?," "Fact-check this claim," "Consider removing this section."

Resolution Workflow: Once addressed, mark comments as resolved. They disappear from active view but remain in comment history. This creates a clean audit trail of feedback and revisions.

Client Review Process:

Professional workflow for client-facing projects:

  1. Create initial edit
  2. Add internal comments: "Verify spelling," "B-roll needed," etc.
  3. Share project link with client (Share → Invite)
  4. Client adds comments on specific sections they want changed
  5. You receive notifications for each comment
  6. Make edits, reply to comments explaining changes
  7. Mark resolved as you address feedback
  8. Result: a clear paper trail of all decisions and revisions

Essential Keyboard Shortcuts

Core Navigation Shortcuts

Memorizing these shortcuts transforms your editing speed from methodical to lightning-fast.

  • Space: Play/Pause (works from anywhere in the interface)
  • Cmd/Ctrl + Shift + V: Toggle between Script and Timeline view
  • Cmd/Ctrl + /: Hide/show right panel for more script space
  • Cmd/Ctrl + Shift + G: Show/hide word gaps (silence markers)
  • Cmd/Ctrl + F: Find text
  • Cmd/Ctrl + G: Find next instance
  • Tab: Jump to next edit point or gap
  • Shift + Tab: Jump to previous edit point or gap

Muscle Memory Exercise:

Practice this sequence 10 times:

  1. Space (play)
  2. Space (pause on word to remove)
  3. Double-click word to select
  4. Delete
  5. Space (play to hear result)
  6. Tab (jump to next gap)
  7. Repeat

Goal: Edit without looking at menus or mouse. This sequence alone handles 70% of podcast editing tasks.

Advanced Editing Shortcuts

These shortcuts handle complex editing operations in seconds.

  • Cmd/Ctrl + Shift + R: Remove selected audio while keeping video (or vice versa)
  • Cmd/Ctrl + D: Duplicate selected clip/layer
  • Cmd/Ctrl + B: Split clip at playhead (timeline view)
  • Cmd/Ctrl + Shift + D: Delete selected and close gap
  • Cmd/Ctrl + Shift + M: Add comment at current position
  • I: Set In point (timeline view)
  • O: Set Out point (timeline view)
  • Option/Alt + Delete: Ripple delete (removes selection and shifts everything after)

Pro Tip - Ripple Delete:

Regular Delete: Removes content, leaves gap (silence)

Ripple Delete (Option/Alt + Delete): Removes content AND closes gap automatically

When to use each:

  • Regular Delete: When you want to preserve timing/sync with music
  • Ripple Delete: When you want content to flow immediately (most cases)

This one shortcut saves hours on long projects.

Monetization Opportunities

Video Editing Services Using Text-Based Workflow

The text-based editing skills you've learned in this module directly translate to professional video editing services. The speed advantage of Descript's workflow allows you to deliver faster turnarounds than traditional editors while maintaining competitive pricing or higher margins.

Service Package: Podcast & Interview Editing

Leverage Descript's text-based efficiency to offer rapid-turnaround podcast editing that clients can't get elsewhere.

Package Deliverables:

  • Complete transcript editing: filler word removal, long pause reduction, false start elimination
  • Speaker labeling and organization
  • Audio level balancing and enhancement (covered in Module 4)
  • Export in client's preferred format (MP3, WAV, AAC)
  • Clean transcript file for show notes/SEO
  • Timestamps for key topics/segments

Pricing Structure:

TIER 1 - Basic Podcast Edit

  • Up to 45 minutes of content
  • Filler word removal, pause tightening
  • Speaker labels, basic audio cleanup
  • Deliverables: Edited MP3 + transcript
  • Time investment: 1.5-2 hours
  • Price: $150-200

TIER 2 - Professional Podcast Edit

  • Up to 90 minutes of content
  • Everything in Tier 1, plus:
  • Content restructuring (moving segments for flow)
  • Detailed timestamp markers
  • Audiogram clips for social media (3-5 clips)
  • Time investment: 3-4 hours
  • Price: $300-400

TIER 3 - Premium Interview Package

  • Up to 2 hours of content
  • Everything in Tier 2, plus:
  • Multiple export versions (full episode + highlight reel)
  • Custom intro/outro music integration
  • Advanced audio enhancement
  • Client revision round included
  • Time investment: 5-6 hours
  • Price: $600-800

RETAINER - Weekly Podcast Editing

  • 4 episodes per month (up to 60 min each)
  • Tier 2 service level
  • Priority turnaround (24-48 hours)
  • Monthly price: $1,000-1,200

Why Clients Pay: A traditional podcast editor who charges $100-150 per episode and takes 3-4 hours for a 60-minute episode earns roughly $25-50/hour. Your text-based workflow completes the same work in 1.5-2 hours, and you charge based on value (per episode) rather than time. Clients get faster turnaround, and you earn a higher effective hourly rate: the tier prices and time investments listed above work out to roughly $75-160/hour.
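The effective hourly rate implied by each tier can be checked with a few lines of Python. The prices and time investments are copied from the pricing structure above; the `TIERS` table and `hourly_range` helper are illustrative names only.

```python
# (price_low, price_high) in dollars and (hours_low, hours_high) per job,
# copied from the tier list above.
TIERS = {
    "Tier 1": ((150, 200), (1.5, 2.0)),
    "Tier 2": ((300, 400), (3.0, 4.0)),
    "Tier 3": ((600, 800), (5.0, 6.0)),
}


def hourly_range(price, hours):
    """Lowest rate = low price over the longest job; highest = the reverse."""
    return price[0] / hours[1], price[1] / hours[0]


for name, (price, hours) in TIERS.items():
    low, high = hourly_range(price, hours)
    print(f"{name}: ${low:.0f}-{high:.0f}/hour")
# Tier 1: $75-133/hour
# Tier 2: $75-133/hour
# Tier 3: $100-160/hour
```

Running the same calculation with your own measured editing times is a quick way to check whether a quoted package price is worth taking.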

Service Package: YouTube Content Editing

YouTube creators need fast turnarounds for algorithm consistency but often lack editing skills. Your Descript efficiency enables multi-client management.

Package Deliverables:

  • Raw footage to polished video: remove mistakes, long pauses, repetitions
  • Pacing optimization for viewer retention
  • Basic graphics/text overlays (templates)
  • SRT subtitle file for accessibility
  • Multiple format exports (16:9 full video, 9:16 Shorts, 1:1 Instagram)

Pricing Structure:

Per-Video Pricing:

  • 10-15 min video: $200-300
  • 15-25 min video: $350-500
  • 25-40 min video: $500-750

Monthly Retainer (4-8 videos):

  • 4 videos/month (10-15 min each): $800
  • 8 videos/month: $1,400
  • Priority editing + revisions included

Why retainers work: Consistent income, you batch process similar content, clients get dedicated attention and faster turnaround.

Target Market & Positioning

Ideal Clients:

  • B2B podcasters (business advice, industry interviews) - high budget, value speed
  • Course creators - need fast turnaround for multiple lessons
  • YouTube educational channels - consistent upload schedules
  • Corporate communications teams - internal training videos, webinars
  • Real estate agents - property tours, client testimonials

Your Positioning: "Text-based video editing for 48-hour turnarounds. I use advanced AI-assisted workflows that cut editing time by 70% while maintaining broadcast quality. Perfect for content creators who need consistency without the traditional editor bottleneck."

Finding Clients: Join podcasting Facebook groups, post portfolio samples on Twitter/LinkedIn with #PodcastEditing, cold outreach to YouTube channels under 50K subs (growing but not yet able to hire in-house), list services on Upwork/Fiverr initially to build portfolio and testimonials.

MODULE 2: Advanced Composition & Layers

Master Descript's composition engine, layer management, and visual design system to create professional multi-layered video content with precision control.

Why Composition Mastery Matters

While text-based editing handles content structure, compositions control your visual presentation. This module teaches you to build complex, multi-layered videos with the same efficiency you brought to editing. Master templates, layer organization, and positioning systems to create consistent, professional visual branding across all content while maintaining rapid production speed.

Production Speed: 5x Faster

Visual Quality: Broadcast

Reusability: 100%

Understanding Compositions and Canvas

What is a Composition?

A composition in Descript is a visual container that holds and organizes all your video elements—video clips, images, text, shapes, and effects. Think of it as a Photoshop document for video: multiple layers stacked vertically, each layer representing a different visual element.

How Compositions Work: When you create a video project, Descript automatically creates a composition. The canvas (center panel) displays this composition. Every element you add—whether recorded video, imported footage, or graphics—becomes a layer within this composition.

Timeline Integration: Compositions exist at specific points in your timeline. You can have multiple compositions in a single project, each with different layouts. For example: intro composition (logo animation), main content composition (picture-in-picture), outro composition (call-to-action screen).

Layer Hierarchy: Layers stack from bottom to top. Bottom layers appear behind, top layers appear in front. This z-axis ordering determines visibility when elements overlap. Understanding this hierarchy is crucial for complex layouts.

Composition Anatomy:

Standard YouTube Video Composition (layered bottom to top):

  • Layer 1 (Bottom): Background image or color
  • Layer 2: Screen recording or main video footage
  • Layer 3: Webcam video (picture-in-picture)
  • Layer 4: Lower third graphic (name/title)
  • Layer 5: Animated logo (corner branding)
  • Layer 6 (Top): Captions/subtitles

Each layer can be repositioned, resized, and timed independently while maintaining this visual stacking order.
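The stacking rule can be expressed as a short Python sketch. It assumes each layer is a rectangle given as `(name, x, y, width, height)` and that the list is ordered bottom to top, as in the composition above; `topmost_layer_at` and the sample coordinates are hypothetical.

```python
def topmost_layer_at(layers, x, y):
    """Return the name of the layer visible at canvas point (x, y).

    `layers` is ordered bottom to top. Every layer covering the point
    overwrites the previous answer, so the last match is the one in front.
    """
    visible = None
    for name, left, top, width, height in layers:
        if left <= x < left + width and top <= y < top + height:
            visible = name
    return visible


layers = [
    ("Background", 0, 0, 1920, 1080),
    ("ScreenRecording", 0, 0, 1920, 1080),
    ("Webcam", 1486, 814, 384, 216),
]
print(topmost_layer_at(layers, 1600, 900))  # Webcam
print(topmost_layer_at(layers, 100, 100))   # ScreenRecording
```

Reordering the list changes the answer for overlapping points, which mirrors what happens when you drag layers up or down in the stack.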

Canvas Controls and Aspect Ratios

The canvas is your visual workspace where you arrange and preview all composition elements. Proper canvas configuration is essential for platform-specific content.

Accessing Canvas Settings: Click the composition name at the top of the canvas or go to Properties panel (right side). Here you control aspect ratio, resolution, and background properties.

Common Aspect Ratios and When to Use Each:

  • 16:9 (1920x1080): YouTube, Vimeo, horizontal social posts, traditional video. This is the standard widescreen format.
  • 9:16 (1080x1920): Instagram Reels, TikTok, YouTube Shorts, Stories. Vertical mobile-first format.
  • 1:1 (1080x1080): Instagram feed posts, LinkedIn posts, Twitter videos. Square format fills more of a mobile feed than 16:9.
  • 4:5 (1080x1350): Instagram feed (vertical). Taller than square, more screen space than 1:1 without full 9:16.
  • Custom: Any resolution for specific client requirements or unique platforms.

Multi-Platform Workflow:

Scenario: Create one piece of content optimized for YouTube, Instagram, and TikTok.

Strategy - Safe Zone Composition:

  1. Record/edit in 16:9 (1920x1080) - YouTube native
  2. Keep critical elements (face, text, key visuals) in center "safe zone"
  3. Safe zone dimensions: 1080x1080 square in center of 16:9 frame
  4. After final edit, duplicate project twice
  5. Project 1: Keep as 16:9 for YouTube
  6. Project 2: Change canvas to 9:16, reposition layers if needed (TikTok/Reels)
  7. Project 3: Change canvas to 1:1 (Instagram feed)

Result: Three platform-optimized versions from one edit session.
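To see how much of the 16:9 frame survives in each format, the sketch below computes the largest centered rectangle of a given aspect ratio. It assumes the alternate versions are produced by center-cropping the original frame, which is only one way to adapt a layout; `center_crop` is a hypothetical helper.

```python
from fractions import Fraction


def center_crop(src_w, src_h, ratio_w, ratio_h):
    """Largest centered ratio_w:ratio_h rectangle inside the source frame.

    Returns (x, y, width, height) in pixels, rounded down.
    """
    target = Fraction(ratio_w, ratio_h)
    if Fraction(src_w, src_h) > target:
        # Source is wider than the target: keep full height, trim the sides.
        height = src_h
        width = int(src_h * target)
    else:
        # Source is taller than the target: keep full width, trim top/bottom.
        width = src_w
        height = int(src_w / target)
    return ((src_w - width) // 2, (src_h - height) // 2, width, height)


print(center_crop(1920, 1080, 1, 1))   # (420, 0, 1080, 1080)
print(center_crop(1920, 1080, 4, 5))   # (528, 0, 864, 1080)
print(center_crop(1920, 1080, 9, 16))  # (656, 0, 607, 1080)
```

Under this assumption the 1:1 crop matches the 1080x1080 safe zone exactly, while a straight 9:16 crop is only about 607 pixels wide. That is why step 6 calls for repositioning layers in the vertical version rather than relying on the square safe zone alone.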

Background Management

Backgrounds set the foundation for your composition's visual identity. Professional content uses intentional backgrounds rather than default settings.

Solid Color Backgrounds: Click canvas, go to Properties → Background → Color. Choose from presets or use hex codes for brand colors. Ideal for: clean professional looks, corporate content, minimalist designs, when content should be the focus.

Gradient Backgrounds: Properties → Background → Gradient. Set start and end colors, adjust angle. Creates depth without distraction. Ideal for: modern aesthetics, tech content, when you need visual interest without competing with foreground elements.

Image Backgrounds: Drag image file onto canvas, then right-click → Send to Back (or drag to bottom layer). Adjust opacity if needed. Ideal for: branded content (company colors/patterns), topical content (relevant imagery), creating environmental context.

Video Backgrounds: Import video file, send to back layer, often applied with blur or opacity reduction. Ideal for: dynamic content, B-roll as environmental context, creating movement in static presentations.

Professional Background Choice Decision Tree:

Ask: "What's the content focus?"

INFORMATIONAL/EDUCATIONAL:
  → Use solid color or subtle gradient
  → Minimal distraction from teaching content
  → Example: #F5F5F5 light gray or white

BRAND-FOCUSED:
  → Use brand colors (solid or gradient)
  → Add subtle brand pattern/texture at low opacity
  → Example: Company primary color with 10% texture overlay

DYNAMIC/ENTERTAINMENT:
  → Use relevant imagery or video background
  → Apply 50-70% opacity or blur effect to prevent competition with foreground
  → Example: Blurred city footage for tech reviews

CLIENT WORK:
  → Always request brand guidelines
  → Use provided color palette
  → Ask for background assets (logos, patterns)

Advanced Layer Control

The Layers Panel

The Layers panel (right side, below Properties) is your command center for complex compositions. Efficient layer management separates amateur from professional work.

Viewing Layers: If the Layers panel isn't visible, go to View → Show Layers (or Cmd/Ctrl + Shift + L). The panel lists all layers in your composition from top (foreground) to bottom (background).

Layer Operations:

  • Renaming: Double-click any layer name to rename it. Use descriptive names: "Main_Camera," "ScreenRecording," "Logo_TopRight," not "Layer 1," "Layer 2."
  • Reordering: Drag layers up/down in the panel to change stacking order. Top layers appear in front of bottom layers.
  • Visibility Toggle: Click the eye icon to hide/show layers. Hidden layers remain in the project but don't appear in output.
  • Locking: Click the lock icon to prevent accidental changes. Lock background layers or finalized elements.
  • Grouping: Select multiple layers (Cmd/Ctrl + click), right-click → Group. Groups collapse for cleaner organization.

Professional Layer Naming Convention:

Use this naming pattern for complex projects: [Type]_[Position]_[Description]

Examples:

  • BG_Main (background, main)
  • Video_Center_TalkingHead (video layer, center positioned)
  • Video_BottomRight_ScreenRec (video layer, bottom-right position)
  • Text_Lower_Name (text layer, lower third position)
  • Image_TopLeft_Logo (image layer, top-left position)
  • Shape_Full_Overlay (shape layer, full-screen overlay)

Benefits:

  • Instantly understand what each layer is
  • Layers auto-sort by type when alphabetically organized
  • Easy to find specific elements in complex compositions (20+ layers)
  • Professional appearance when sharing projects with clients/team
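If you keep a list of layer names outside Descript (for example in a project checklist), the convention can be checked mechanically. The regular expression below is an assumption derived from the examples above: it accepts the five type prefixes shown and one or two further parts, so that `BG_Main` passes. `invalid_layer_names` is a hypothetical helper.

```python
import re

# Type prefix taken from the examples above, then one or two
# underscore-separated parts (position and/or description).
LAYER_NAME = re.compile(r"(BG|Video|Text|Image|Shape)(_[A-Za-z0-9]+){1,2}")


def invalid_layer_names(names):
    """Return the names that do not follow the naming convention."""
    return [name for name in names if not LAYER_NAME.fullmatch(name)]


names = ["BG_Main", "Layer 1", "Text_Lower_Name", "video_center", "Image_TopLeft_Logo"]
print(invalid_layer_names(names))  # ['Layer 1', 'video_center']
```

Extending the prefix list (for example with an audio type) only requires editing the first group of the pattern.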

Layer Positioning and Sizing

Precise positioning creates professional layouts. Descript offers multiple methods for positioning layers, each optimized for different scenarios.

Manual Positioning (Canvas): Click and drag any layer on the canvas to reposition. Handles appear at corners and edges for resizing. Hold Shift while dragging to constrain proportions. This is intuitive but imprecise.

Numeric Positioning (Properties Panel): Select layer, go to Properties → Position & Size. Set exact X/Y coordinates and width/height in pixels. This provides pixel-perfect placement for professional consistency.

Alignment Tools: Select layer, go to Layout menu or right-click → Align. Options: Align Left, Center Horizontal, Align Right, Align Top, Center Vertical, Align Bottom. Use these to snap elements to canvas edges or center.

Preset Positions: Descript offers quick position presets in the Properties panel: Top Left, Top Center, Top Right, Middle Left, Center, Middle Right, Bottom Left, Bottom Center, Bottom Right. One-click positioning for common layouts.

Standard Video Layouts - Exact Positioning:

For 1920x1080 (16:9) Canvas:

PICTURE-IN-PICTURE (Bottom-Right):

  • Main video: Full screen (0, 0, 1920x1080)
  • Webcam: 384x216 positioned at (1486, 814)
  • Result: Small webcam in bottom-right with 50px margin

SIDE-BY-SIDE INTERVIEW:

  • Person 1: 960x1080 at (0, 0) - left half
  • Person 2: 960x1080 at (960, 0) - right half
  • Result: Perfect 50/50 split

TUTORIAL LAYOUT (Screen + Webcam):

  • Screen recording: 1344x756 at (0, 0) - top 70%
  • Webcam: 384x324 at (768, 756) - centered below
  • Result: Main content focus with instructor presence

Save these as templates (covered next section) for instant reuse.
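The coordinates above follow from simple arithmetic, sketched here in Python. The sketch assumes (x, y) is the top-left corner of a layer measured from the top-left of the canvas, which is consistent with the numbers listed; `corner_position` and `side_by_side` are hypothetical helpers.

```python
def corner_position(canvas, layer, margin, corner="bottom-right"):
    """Top-left (x, y) that places `layer` in a canvas corner with `margin` px.

    `canvas` and `layer` are (width, height) tuples in pixels.
    """
    canvas_w, canvas_h = canvas
    layer_w, layer_h = layer
    vertical, horizontal = corner.split("-")
    x = margin if horizontal == "left" else canvas_w - layer_w - margin
    y = margin if vertical == "top" else canvas_h - layer_h - margin
    return x, y


def side_by_side(canvas, count=2):
    """Split the canvas into `count` equal-width columns as (x, y, w, h)."""
    canvas_w, canvas_h = canvas
    width = canvas_w // count
    return [(i * width, 0, width, canvas_h) for i in range(count)]


CANVAS = (1920, 1080)
print(corner_position(CANVAS, (384, 216), margin=50))  # (1486, 814)
print(side_by_side(CANVAS))  # [(0, 0, 960, 1080), (960, 0, 960, 1080)]
```

The same helper gives coordinates for other canvas sizes, for example a 9:16 canvas of `(1080, 1920)`, without recalculating margins by hand.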

Layer Effects and Properties

Layer-specific effects enhance visual quality and professionalism without requiring external editing tools.

Opacity: Select layer → Properties → Opacity slider (0-100%). Reduce opacity for subtle overlays, watermarks, or blended backgrounds. At 0%, layer is invisible (same as hiding). At 50%, layer is semi-transparent.

Corner Radius: Properties → Corner Radius. Adds rounded corners to any layer. Set to 10-20px for subtle softening, or to a very large value such as 999px on a square layer for a perfect circle (a non-square layer becomes a pill shape instead). Ideal for: modern aesthetics, softening webcam edges, creating circular profile displays.

Border: Properties → Border. Add colored borders around layers. Set color, width (pixels), and opacity. Use for: emphasizing elements, creating separation from background, branded accent colors.

Shadow: Properties → Shadow. Add drop shadows for depth. Adjust X/Y offset (direction), blur (softness), and opacity. Subtle shadows (2-3px offset, 30% opacity) add professional depth without being obvious.

Rotation: Properties → Rotation. Rotate layer in degrees. Use for: creative compositions, fixing crooked source footage, dynamic transitions.

Professional Webcam Enhancement:

Transform basic webcam to polished presentation layer:

  1. Position webcam layer (use preset or custom position)
  2. Corner Radius: 20px (softens hard edges)
  3. Border: 3px, white (#FFFFFF), 80% opacity
  4. Shadow: X:2, Y:2, Blur:8, Opacity:40%
  5. Optional: Reduce opacity to 95% for slight background blending

Result: Clean, professional webcam presentation that looks intentional rather than default.

  • Before: Raw webcam rectangle
  • After: Polished, dimensional presence with visual hierarchy

This takes 30 seconds and instantly elevates production value.

Creating and Using Templates

Template System Overview

Templates are pre-configured compositions that you can reuse across projects. They're the secret to consistent branding and rapid production. Build once, reuse infinitely.

What Templates Store: Templates save your entire composition setup—layer arrangement, positioning, sizing, effects, colors, fonts, and placeholder content. When you apply a template, it recreates this exact setup in your new project.

Template Use Cases: Brand consistency (same layout across all videos), rapid production (new video in minutes, not hours), client work (one template per client for their brand), platform-specific formats (YouTube template, Instagram template, TikTok template), recurring content (podcast intro/outro, weekly show format).

When to Create Templates:

Create a template if you answer YES to any:

  1. Will I create 3+ videos with this same layout?
  2. Do I need consistent branding across content?
  3. Am I creating recurring content (weekly show, podcast series)?
  4. Do multiple team members need to use this design?
  5. Do I work with clients who need brand consistency?

Example: YouTube tutorial series

  • Same intro screen
  • Same main layout (screen + webcam position)
  • Same lower third for your name
  • Same outro with call-to-action

Without template: Rebuild layout every video (20-30 min)

With template: Apply template (30 seconds)

Creating Your First Template

Building effective templates requires strategic thinking about what varies versus what stays constant across videos.

Step-by-Step Template Creation:

  1. Build Ideal Composition: Create a composition with your perfect layout. Add all layers: background, positioning boxes for video/images, text placeholders, logo, branding elements.
  2. Use Placeholder Content: For varying content (main video, images), add temporary placeholders or leave empty layers with clear names like "INSERT_MAIN_VIDEO_HERE."
  3. Lock Static Elements: Lock layers that should never change (background, logo position, brand colors). This prevents accidental modification.
  4. Name Everything: Give every layer a descriptive name. Template users need to understand what goes where.
  5. Test the Layout: Add sample content to each placeholder to ensure sizing/positioning works correctly.
  6. Save as Template: Go to File → Save as Template. Give it a descriptive name and category.

YouTube Tutorial Template Example:

TEMPLATE: "YouTube_Tutorial_Standard"

Layers (top to bottom):
  1. Logo_TopRight (locked): Your channel logo, 150x150px, positioned (1720, 50)
  2. Text_Lower_Description: Editable text layer for tutorial topic
  3. Video_Center_Webcam: 640x360px, positioned (640, 360) - Replace with webcam footage
  4. Video_Fullscreen_Screen: Full canvas - Replace with screen recording
  5. BG_Gradient (locked): Brand gradient background

Instructions for use:
  1. Apply template to new project
  2. Replace Video_Fullscreen_Screen with your screen recording
  3. Replace Video_Center_Webcam with your webcam footage
  4. Edit Text_Lower_Description with your topic
  5. Logo and background stay consistent automatically

Time savings: 25 minutes per video × 4 videos/month = 100 minutes saved monthly
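The layer coordinates in this example can be sanity-checked before you build the template. The sketch below is not part of Descript. It assumes a 1920x1080 canvas and reads each position as the layer's top-left corner, and the `Layer` class and helper names are made up for illustration.

```python
from dataclasses import dataclass

# Assumption: 1920x1080 canvas, positions given as the layer's top-left corner.
CANVAS_W, CANVAS_H = 1920, 1080


@dataclass(frozen=True)
class Layer:
    name: str
    x: int
    y: int
    width: int
    height: int
    locked: bool = False


def fits_canvas(layer: Layer) -> bool:
    """True when the whole layer lies inside the canvas."""
    return (
        layer.x >= 0
        and layer.y >= 0
        and layer.x + layer.width <= CANVAS_W
        and layer.y + layer.height <= CANVAS_H
    )


def is_centered(layer: Layer) -> bool:
    """True when the layer's midpoint coincides with the canvas midpoint."""
    return (
        2 * layer.x + layer.width == CANVAS_W
        and 2 * layer.y + layer.height == CANVAS_H
    )


# Values taken from the template example above.
logo = Layer("Logo_TopRight", 1720, 50, 150, 150, locked=True)
webcam = Layer("Video_Center_Webcam", 640, 360, 640, 360)

for layer in (logo, webcam):
    print(layer.name, "fits:", fits_canvas(layer), "centered:", is_centered(layer))
```

Under these assumptions the logo sits 50px from the right edge and the webcam box is exactly centered, which matches the layer names.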

Applying and Managing Templates

Knowing how to efficiently apply and modify templates accelerates your workflow dramatically.

Applying Templates: Create new project → Click "Start from template" → Select your template → Click "Use Template." Descript creates a new project with all template layers and settings intact.

Updating Templates: Open a project using the template → Make improvements → File → Update Template. This updates the master template; existing projects built from it remain unchanged.

Template Libraries: Organize templates into categories: "YouTube Formats," "Instagram Content," "Client Work," "Podcast Layouts." This organization prevents template overwhelm as your library grows.

Sharing Templates: Export templates (File → Export Template) to share with team members or clients. They can import (File → Import Template) to maintain consistency across distributed teams.

Professional Template Library Structure:

Recommended folder organization:

YOUTUBE/
├── Intro_Standard.template
├── Main_Tutorial_Screenshare.template
├── Main_Talking_Head.template
├── Outro_CTA.template

INSTAGRAM/
├── Reel_9x16_Standard.template
├── Feed_1x1_Quote.template
├── Story_Behind_Scenes.template

CLIENTS/
├── [ClientName]_YouTube.template
├── [ClientName]_Social.template

PODCAST/
├── Intro_Animated.template
├── Main_Interview_Split.template
├── Outro_Sponsors.template

BUILD_BLOCKS/
├── Lower_Third_Standard.template
├── Subscribe_Reminder.template
├── Logo_Watermark.template

This structure enables:
  - Quick template location
  - Consistent naming across projects
  - Easy onboarding of team members
  - Professional client presentations

Advanced Media Layer Techniques

Video Layer Management

Video layers are your primary content carriers. Understanding how to manipulate them efficiently is essential for complex productions.

Adding Video Layers: Drag video files from your computer directly onto the canvas. Alternatively, record directly in Descript (screen + webcam) and layers auto-generate. Each video becomes a separate layer you can position independently.

Trimming Video Layers: In Timeline view, each video layer has independent timing. Drag the edges to trim start/end points. This is crucial for picture-in-picture scenarios where webcam and screen recording need different in/out points.

Fill vs Fit: Properties → Scaling. "Fill" scales the video to cover the entire layer space, so edges may be cropped. "Fit" shows the entire video, so letterboxing may appear. Choose based on whether the edges of the frame carry important content.

Crop Tool: Select layer → Properties → Crop. Manually adjust which portions of the video are visible. Use for: removing unwanted screen edges, focusing on specific areas, creating custom aspect ratios from source footage.

Picture-in-Picture Best Practices:

Common scenario: Screen recording as main content + webcam for presenter presence.

Setup:
  1. Add screen recording - set as full canvas layer (bottom)
  2. Add webcam footage - position in corner (top layer)
  3. Resize webcam: 25-30% of canvas height
  4. Position: Bottom-right with 40-50px margins
  5. Add subtle corner radius (15px) and shadow

Webcam timing strategy:
  - Only show webcam during key explanations/introductions
  - Trim/hide webcam during detailed screen work (less distraction)
  - Use timeline view to set precise in/out points for webcam layer
  - Main screen recording plays continuously underneath

Result: Dynamic presentation that emphasizes content when needed, personality when it adds value.
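The sizing and margin guidelines above translate into simple arithmetic. As a rough sketch, assuming a 1920x1080 canvas, a 16:9 webcam feed, and top-left pixel coordinates (the function name `pip_rect` is hypothetical), the webcam box can be computed like this:

```python
def pip_rect(canvas_w=1920, canvas_h=1080, height_frac=0.25, margin=40,
             aspect_w=16, aspect_h=9):
    """Return (x, y, width, height) of a bottom-right picture-in-picture box."""
    height = round(canvas_h * height_frac)        # 25-30% of canvas height
    width = round(height * aspect_w / aspect_h)   # keep the webcam's aspect ratio
    x = canvas_w - margin - width                 # margin from the right edge
    y = canvas_h - margin - height                # margin from the bottom edge
    return x, y, width, height


print(pip_rect())                              # smallest recommended setup
print(pip_rect(height_frac=0.30, margin=50))   # largest recommended setup
```

The two calls cover both ends of the recommended range, so any value you enter in the Properties panel should land between them.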

Image Layers and Graphics

Static images serve as logos, backgrounds, callouts, product shots, and brand elements. Strategic image layer use elevates production value.

Supported Formats: PNG (with transparency - ideal for logos/graphics), JPEG (no transparency - backgrounds/photos), GIF (animated - use sparingly), SVG (vector - perfect for logos, scales without quality loss).

Transparency Handling: PNG files with transparent backgrounds automatically blend with your composition. This is crucial for logos, brand elements, and callout graphics. Position them over video without white boxes.

Layer Duration: Image layers can span your entire video or appear briefly. In Timeline view, drag image layer edges to control when it appears. Use for: logo watermarks (full duration), product callouts (brief appearance), slide deck images (specific segments).

Professional Logo Watermark Setup:

Task: Add persistent channel logo without distraction

Process:
  1. Export logo as PNG with transparent background (300x300px minimum)
  2. Drag PNG onto canvas
  3. Resize to 80-120px (small but recognizable)
  4. Position: Top-right or bottom-right corner
  5. Add 30-40px margins from edges
  6. Reduce opacity to 70-85% (present but not distracting)
  7. Optional: Add subtle shadow for separation from video content
  8. In timeline, extend layer to full video duration
  9. Lock layer to prevent accidental movement

Pro tip: Save this as a "Logo Watermark" template. Add to any video in 10 seconds.

Text Layers and Typography

Text layers communicate information, emphasize points, and establish brand identity. Professional text layer usage separates amateur from polished content.

Adding Text: Click "T" in toolbar or Insert → Text. A text box appears on canvas. Type your content, then configure styling in Properties panel.

Typography Controls:

  • Font: Choose from system fonts. For brand consistency, use the same 2-3 fonts across all content.
  • Size: Measured in pixels. Minimum 36px for mobile visibility, 48-72px for emphasis.
  • Weight: Light, Regular, Bold, etc. Use bold for emphasis, regular for body text.
  • Color: Click color picker for exact brand colors (use hex codes for consistency).
  • Alignment: Left, center, right. Center for titles/emphasis, left for body text.
  • Line Height: Space between text lines. 1.2-1.4x for tight layouts, 1.5-2x for readability.

Lower Third Name Card Setup:

Professional speaker identification:

LAYER 1 - Background Shape:
  - Insert → Rectangle
  - Size: 400px × 100px
  - Position: Bottom-left, 60px from edges
  - Color: Brand color with 90% opacity
  - Corner radius: 8px

LAYER 2 - Name Text:
  - Font: Bold, 32px
  - Color: White
  - Position: Inside rectangle, left-aligned with 20px padding
  - Content: "JOHN SMITH"

LAYER 3 - Title Text:
  - Font: Regular, 24px
  - Color: White at 80% opacity
  - Position: Below name, same left alignment
  - Content: "Marketing Director"

Timeline: Display for 5-8 seconds when speaker first appears, then fade out.

Save as template: "Lower_Third_Speaker" for instant reuse with different names.

Shape Layers for Design

Shapes create visual structure, backgrounds for text, highlighting areas, and geometric design elements.

Available Shapes: Rectangle, Circle, Line. Access via Insert menu or toolbar. Each shape is a vector layer that scales without quality loss.

Rectangle Uses: Backgrounds for text (lower thirds, title cards), full-screen overlays with transparency (darkening video for text readability), borders and frames, geometric design elements.

Circle Uses: Highlighting specific areas (red circle around UI element), profile frames (circular mask for headshots), bullet points and decorative elements, loading animations.

Line Uses: Underlines for emphasis, separators between sections, arrows and pointers, creating geometric compositions.

Full-Screen Text Slide Creation:

Scenario: Create chapter title card between video segments

Setup:
  1. Insert → Rectangle
  2. Size: Full canvas (1920x1080)
  3. Color: Dark blue (#1E3A5F) at 100% opacity
  4. Position: 0, 0 (covers entire frame)
  5. Add text layer on top:
     - Font: Bold, 84px
     - Color: White
     - Alignment: Center horizontal + vertical
     - Content: "CHAPTER 2: Advanced Techniques"
  6. Optional accent:
     - Insert → Line
     - Width: 400px, Height: 4px
     - Color: Accent color (#4A7BA7)
     - Position: Below text, centered

Timeline: Display for 2-3 seconds, acts as chapter divider

Save as a template for consistent chapter breaks throughout the video.
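If you keep brand colors and layout values in a notes file or spreadsheet, two small helpers cover the arithmetic in this example. This is only an illustrative sketch. The function names are hypothetical, and it assumes the 1920px canvas width used above.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def centered_x(layer_width, canvas_width=1920):
    """Left edge that centers a layer horizontally on the canvas."""
    return (canvas_width - layer_width) // 2


print(hex_to_rgb("#1E3A5F"))   # background color of the title card
print(hex_to_rgb("#4A7BA7"))   # accent line color
print(centered_x(400))         # x position of the 400px accent line
```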

Monetization Opportunities

Professional Video Production Services

The composition and layering mastery you've developed enables you to deliver broadcast-quality video production at a fraction of traditional costs. Your template-based workflow allows you to maintain high visual standards while serving multiple clients efficiently.

Service Package: Branded Video Content Production

Companies need consistent, professional video content but lack in-house resources. Your template system delivers their brand consistently across all videos.

Package Deliverables:

  • Custom template creation matching client brand guidelines (colors, fonts, logo placement)
  • Multi-platform templates (YouTube 16:9, Instagram 9:16, LinkedIn 1:1)
  • Complete video production: editing, layering, graphics, text overlays
  • Consistent visual branding across all content
  • Template files for client's internal use (optional upsell)
  • Revision round included

Pricing Structure:

SETUP PHASE - Custom Template Creation
One-time fee per client
  - Brand analysis and template design: $800-1,200
  - Includes: 3 master templates (YouTube, Instagram, LinkedIn)
  - Unlimited revisions until approval
  - Client owns templates for internal use (optional: +$400)

PRODUCTION PHASE - Per Video
  - Single video (5-15 minutes): $300-500
  - Video batch (4 videos/month): $1,000-1,400
  - Video series (8+ videos): $1,800-2,400

RETAINER MODEL - Monthly Ongoing
  - 4 videos/month + template updates: $1,500
  - 8 videos/month + template updates: $2,600
  - 12 videos/month + priority service: $3,600

Time Investment:
  - Template creation: 4-6 hours (one-time)
  - Per video production: 2-3 hours (with template)
  - Effective hourly rate: $100-150/hour

Why Clients Pay: Traditional video production agencies charge $2,000-5,000 per video with 2-week turnarounds. Your template system delivers equivalent quality in 48-72 hours at 60-70% lower cost. Clients value speed, consistency, and cost-effectiveness over agency prestige for content marketing videos.

Service Package: Social Media Video Optimization

Content creators produce long-form content but lack time/skills to optimize for multiple platforms. You transform one video into platform-specific versions.

Package Deliverables:

  • Receive client's primary video (usually YouTube 16:9 format)
  • Create 3-5 platform-optimized versions with appropriate aspect ratios
  • Add platform-specific elements (Instagram-style captions, TikTok hooks)
  • Adjust layer positioning for each format (face/text in safe zones)
  • Custom thumbnails/cover frames for each platform
  • SRT subtitle files for accessibility
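For reference, an SRT file is plain text: a running cue number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text, and a blank line between cues. Descript can export subtitles for you, so the sketch below is only meant to show what the deliverable looks like. The helper names and sample captions are invented for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Sample cues (made up for this example).
sample = [(0.0, 2.5, "Welcome back."), (2.5, 5.0, "Let's get started.")]
print(to_srt(sample))
```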

Pricing Structure:

REPURPOSING SERVICE - Per Video
Input: One primary video (client provides edited version)
Output: Multiple platform-specific versions

Basic Package: $200
  - 1 source video → 3 optimized versions
  - Formats: YouTube (16:9), Instagram Feed (1:1), Stories/Reels (9:16)
  - Basic repositioning and cropping
  - Turnaround: 48 hours

Professional Package: $350
  - 1 source video → 5 optimized versions
  - Formats: YouTube, Instagram Feed, Reels, TikTok, LinkedIn
  - Custom text overlays per platform
  - Platform-specific hooks (first 3 seconds optimized)
  - Thumbnail creation for each version
  - Turnaround: 72 hours

Retainer: 4 videos/month = $1,200 (Professional package per video)

Time Investment: 1.5-2 hours per video (with templates)
Effective Rate: $100-175/hour
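Before repositioning layers for each format, it helps to know how much of the 16:9 source survives a straight center crop. The sketch below assumes a 1920x1080 source and rounds sizes down to even numbers, on the assumption that even dimensions are safer for video encoders. The function name `center_crop` is hypothetical.

```python
def center_crop(src_w, src_h, ratio_w, ratio_h):
    """Largest centered crop with the requested aspect ratio.

    Returns (x, y, width, height) in source pixels.
    """
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        width, height = src_h * ratio_w // ratio_h, src_h
    else:
        # Source is taller than (or equal to) the target: keep full width.
        width, height = src_w, src_w * ratio_h // ratio_w
    # Assumption: round down to even sizes for encoder compatibility.
    width -= width % 2
    height -= height % 2
    x = (src_w - width) // 2
    y = (src_h - height) // 2
    return x, y, width, height


for name, ratio in {"YouTube": (16, 9), "Feed": (1, 1), "Reels": (9, 16)}.items():
    print(name, center_crop(1920, 1080, *ratio))
```

A 9:16 crop keeps only about a third of the original width, which is why faces and text usually need manual repositioning rather than a plain crop.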

Service Package: Template Design & Licensing

Many creators want professional templates but don't have design skills. You create and sell ready-to-use templates.

Package Deliverables:

  • Pre-designed professional templates for specific niches
  • Complete documentation on using each template
  • Customization instructions (changing colors, fonts, logos)
  • Video tutorial showing template usage
  • Template files in .descript format

Pricing Structure:

TEMPLATE MARKETPLACE MODEL

Individual Templates: $29-79 each
  - YouTube tutorial template pack (intro/main/outro): $49
  - Instagram Reel templates (5 variations): $39
  - Podcast video templates (interview layouts): $59
  - Course content templates (lesson format): $69

Template Bundles: $149-299
  - Complete YouTube Creator Bundle: $199 (10 templates)
  - Social Media Master Pack: $249 (15 templates, all platforms)
  - Business Content Suite: $299 (20 professional templates)

Where to Sell:
  - Your own website (keep 100% revenue)
  - Creative Market / Gumroad (exposure, but 10-25% fees)
  - Direct to clients as add-on service

Business Model:
  - Create template once: 3-4 hours investment
  - Sell indefinitely with zero marginal cost
  - Passive income stream alongside service work
  - One template at $49 × 50 sales = $2,450 (passive)

Promotion Strategy:
  - Post template demos on YouTube/Twitter
  - Offer one free template for email list building
  - Create YouTube tutorials using your templates (indirect marketing)
  - Partner with content creator communities
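The $2,450 figure above is gross revenue on your own site. A quick sketch shows what the same 50 sales would net on a marketplace, using the 10-25% fee range quoted above. The function name is hypothetical, and payment processing fees are ignored.

```python
def net_revenue(price, sales, fee_percent=0):
    """Revenue left after a marketplace takes fee_percent of each sale."""
    return price * sales * (100 - fee_percent) / 100


for label, fee in (("Own website", 0), ("Marketplace, 10% fee", 10),
                   ("Marketplace, 25% fee", 25)):
    print(f"{label}: ${net_revenue(49, 50, fee):,.2f}")
```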

Target Market & Positioning

Ideal Clients for Production Services:

  • B2B SaaS companies (explainer videos, feature demos, customer testimonials)
  • Online course creators (lesson videos, promotional content)
  • Real estate agencies (property tours, agent introductions)
  • Coaching/consulting businesses (thought leadership content)
  • Corporate training departments (internal training videos)

Ideal Clients for Repurposing Services:

  • YouTubers expanding to Instagram/TikTok
  • Podcasters adding video versions
  • Webinar hosts repurposing recordings
  • Conference speakers sharing presentation videos

Your Positioning: "Template-driven video production for consistent brand presence across platforms. I deliver broadcast-quality videos in 48 hours using proprietary workflows that eliminate the traditional production bottleneck. Perfect for businesses that need professional content at scale without agency timelines or budgets."

MODULE 3: AI Voice & Overdub Mastery

Master Descript's revolutionary Overdub technology for voice cloning, corrections, and AI-powered audio generation that eliminates costly re-recording sessions.

Why Overdub Changes Everything

Overdub is Descript's signature feature—AI voice cloning that generates realistic speech in your voice from typed text. This eliminates the need to re-record for script corrections, updates, or mistakes. One 10-minute training session creates unlimited audio in your voice. This technology transforms content creation economics: what previously required studio time, retakes, and post-production now happens instantly through text editing.

Re-Record Elimination

95%

Setup Time

10 Minutes

Cost Savings

$500+/video

Understanding Overdub Technology

What is Overdub?

Overdub is AI-powered voice synthesis that creates realistic speech from text input. You train the AI on your voice (or a selected voice), then generate new audio by typing. The AI matches tone, cadence, and accent, producing natural-sounding speech indistinguishable from real recordings in most contexts.

Core Use Cases: Fixing mistakes without re-recording (said the wrong word, mispronounced something), updating content with new information (product names, statistics, dates), removing filler words cleanly (replace "um" with smooth audio), making script corrections discovered in post-production, creating consistent narration across multiple videos, producing multilingual content with accent consistency.

How it Works Technically: The AI analyzes your voice training data to learn your speech patterns, pronunciation tendencies, pitch range, speaking rhythm, and accent characteristics. When you type new text, it generates audio matching these learned patterns. The result sounds like you recorded those exact words naturally.

Real-World Scenario:

Problem: You recorded a 20-minute tutorial mentioning "version 2.3" throughout. The product updated to "version 2.4" before publishing.

Traditional Solution:
  - Re-record all segments mentioning version number
  - Match audio quality/room tone
  - Edit back into timeline
  - Time: 2-3 hours

Overdub Solution:
  1. Find & Replace: "2.3" → "2.4"
  2. Select replacement text
  3. Click "Overdub" button
  4. Generate in your voice
  5. Done
  Time: 2 minutes

Cost Savings: Studio time ($100/hour), editing labor, project delays eliminated.
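One thing to watch with a blanket find and replace is partial matches: "2.3" also appears inside strings such as "12.3" or "2.31". The sketch below works on exported transcript text only, not on the audio. It uses a regular expression that replaces the version number only when it stands alone. The function name and the sample transcript are invented for illustration.

```python
import re


def replace_phrase(transcript, old, new):
    """Replace standalone occurrences of `old`; return (new_text, count)."""
    # Not preceded by a word character or dot, and not followed by a word
    # character or by ".<digit>" (so "12.3", "2.31" and "2.3.1" are skipped).
    pattern = re.compile(rf"(?<![\w.]){re.escape(old)}(?!\w|\.\d)")
    return pattern.subn(new, transcript)


# Sample transcript text (made up for this example).
transcript = (
    "Welcome to version 2.3. In version 2.3 we added export. "
    "Build 12.3 and 2.31 are unrelated."
)
updated, count = replace_phrase(transcript, "2.3", "2.4")
print(count, "replacements")
print(updated)
```

Running a check like this on the transcript first tells you how many Overdub generations to expect before you start editing.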

Creating Your Voice Clone

The voice training process captures your unique vocal characteristics. Quality training data determines output realism.

Training Requirements: Descript requires 10 minutes of clean, clear speech for basic training. For best results, record 30+ minutes across multiple sessions. The AI needs variety: different sentences, emotions, speaking speeds, and contexts to capture your full vocal range.

Recording Environment Setup:

  • Quiet Space: No background noise (AC, traffic, computer fans). Background sounds get baked into your voice model.
  • Consistent Microphone: Use the same mic you'll use for content. Voice characteristics change between microphones.
  • Proper Distance: 6-8 inches from mic. Consistent positioning across all training recordings.
  • Natural Delivery: Speak normally, not in "announcer voice." The AI clones your natural speaking pattern.
  • Emotional Range: Include enthusiastic sections, calm explanations, and conversational tones. Varied training = versatile output.

Optimal Training Session Workflow:

SESSION 1 - Foundation (10 minutes)
  Read provided Descript training script verbatim
  Purpose: Captures basic phonetic patterns

SESSION 2 - Natural Speech (10 minutes)
  Explain your typical content topics naturally
  Example: "Today I'm going to show you how to..."
  Purpose: Captures your actual delivery style

SESSION 3 - Variety (10 minutes)
  Include questions, exclamations, different paces
  Mix enthusiastic and calm deliveries
  Purpose: Expands emotional range

Total Time: 30 minutes of training
Result: High-quality voice model for professional use

Pro Tip: Record training audio in the same environment where you'll create content. Room acoustics become part of your voice model.

Training Process Step-by-Step

Navigate to Account Settings → Overdub → Create New Voice. Follow the guided process.

  1. Name Your Voice: Use descriptive names like "John_Professional" or "Sarah_Casual" if creating multiple versions.
  2. Record or Upload: Either record directly in Descript or upload pre-recorded training files. Upload is better for controlled studio recordings.
  3. Review Recording Quality: Descript analyzes audio quality. If training data has issues (noise, inconsistent levels), you'll receive warnings. Address these before proceeding.
  4. Processing Time: AI training takes 5-30 minutes depending on audio length. You'll receive an email notification when it completes.
  5. Voice Preview: Test your new voice with sample text before using in projects. This ensures quality meets expectations.

Voice Quality Checklist:

Before approving your voice model, test these phrases:
  "This is a test of my AI voice clone."
  "In this tutorial, we'll explore advanced techniques."
  "Don't forget to subscribe and hit the notification bell."
  "Let's dive into today's topic."

Listen for:
  ✓ Natural pronunciation
  ✓ Smooth word transitions
  ✓ Appropriate pacing
  ✓ Emotional consistency
  ✓ No robotic artifacts

If any issues: Record additional training data focusing on problematic sounds or patterns.

Stock AI Voices

Descript offers pre-trained professional voices when you don't want to use your own voice or need variety.

When to Use Stock Voices: Character voices for storytelling, professional narration without personal branding, placeholder audio during editing (replace with real voice later), multilingual content, client work where you're behind the scenes, anonymous content creation.

Available Voice Types: Male and female voices, multiple accents (American, British, Australian), various ages and tones (authoritative, friendly, enthusiastic), professional narrator styles, conversational styles.

Stock Voice Limitations: Cannot be customized or retrained, limited emotional range compared to custom voices, same voice used by other Descript users, less authentic for personal brand content.

Stock Voice Use Cases:

APPROPRIATE:
  - Explainer video voiceover (corporate/neutral)
  - Character dialogue in animated content
  - Placeholder audio during client approval process
  - A/B testing different narrator styles
  - Quick projects without time for voice training

INAPPROPRIATE:
  - Personal brand YouTube channel (lacks authenticity)
  - Podcast hosting (listeners expect real personality)
  - Thought leadership content (undermines credibility)
  - Content requiring emotional depth

Decision: If your face/personality is on camera, use your real voice or custom Overdub. If you're an invisible narrator, stock voices work well.

Professional Overdub Application

Word-Level Replacements

The most common Overdub use case—replacing individual words or phrases without disturbing surrounding audio.

Process: In your transcript, select the word(s) to replace. Type the correction directly. Click the Overdub button (or right-click → Overdub Selection). Choose your voice. Generate. The AI creates that word in your voice, matching the surrounding audio's tone and pace.

Best Practices for Natural Replacements:

  • Include surrounding words for context: Replace "the blue button" not just "blue" for better flow
  • Match original sentence structure: Keep similar word count and rhythm
  • Preview before finalizing: Listen to the edit in context
  • Adjust surrounding gaps: Overdub may need slight gap adjustment before/after for natural timing

Common Replacement Scenarios:

SCENARIO 1: Mispronunciation
  Original: "Let's look at the Kub-er-netes dashboard"
  Fix: Select "Kub-er-netes" → type "Kubernetes" → Overdub
  Result: Correct pronunciation in your voice

SCENARIO 2: Script Change
  Original: "Available for ninety-nine dollars"
  Fix: Select phrase → type "Available for seventy-nine dollars" → Overdub
  Result: Updated pricing without re-recording

SCENARIO 3: Factual Correction
  Original: "This was released in 2022"
  Fix: Select "2022" → type "2023" → Overdub
  Result: Accurate information

SCENARIO 4: Removing Verbal Tic
  Original: "So, um, let me show you, uh, the dashboard"
  Fix: Delete "um" and "uh" → Overdub smooth transitions if gaps sound unnatural
  Result: Professional, polished delivery

Sentence and Paragraph Generation

Beyond word replacement, Overdub generates entirely new sentences—perfect for adding content discovered missing during editing.

When to Generate New Content: Forgot to mention key point during recording, need transitional sentences between sections, client requests additional explanation, creating updated version of old content, script expansion after initial recording.

Writing for Overdub: Write the way you naturally speak—use contractions ("don't" not "do not"), include natural pauses with punctuation, avoid complex compound sentences, match your typical speaking vocabulary, consider emphasis (Overdub respects sentence structure for natural flow).

Adding New Content Example:

Situation: Tutorial complete, but realized you didn't explain a prerequisite step.

Location: Between introduction and main tutorial
Gap: How to install required software

Solution:
  1. Place cursor at insertion point in transcript
  2. Type new content: "Before we begin, make sure you have the software installed. You can download it from the official website. The installation process takes about five minutes."
  3. Select the new text
  4. Click Overdub
  5. Generate in your voice
  6. Review for natural flow with surrounding audio
  7. Adjust gaps if needed (add 0.5s pause before, 0.3s after)

Result: Seamlessly integrated new content without studio session.

Time Saved: 30-60 minutes (setup, recording, editing)
Cost Saved: $100+ in studio time

Overdub Quality Optimization

Achieving natural-sounding results requires understanding Overdub's strengths and limitations.

What Works Best: Short phrases (under 30 words per generation), natural sentence structure, your typical vocabulary, moderate pacing, standard pronunciation.

What's Challenging: Long complex sentences (over 50 words), extreme emotional range (yelling, whispering), rapid speech or unusual pacing, made-up words or unusual names, heavy regional dialect variations.

Improving Output Quality:

  • Break Long Sentences: Generate in shorter chunks, then combine
  • Add Pronunciation Guides: Use phonetic spelling in your typed text for difficult words
  • Regenerate with Variations: Try different word orders or phrasing if first attempt sounds unnatural
  • Blend with Real Audio: Use Overdub for small corrections within real recordings rather than full synthetic narration
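The "generate in shorter chunks" advice can be prepared outside Descript by splitting a script at sentence boundaries before pasting it in. This is a minimal sketch, assuming sentences end with `.`, `!` or `?` and using the 30-word guideline mentioned above as the default limit. The function name is hypothetical.

```python
import re


def split_for_generation(script, max_words=30):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept as its own chunk,
    which flags it as a candidate for rewriting.
    """
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current, current_words = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        words = len(sentence.split())
        if current and current_words + words > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += words
    if current:
        chunks.append(" ".join(current))
    return chunks


script = (
    "Before we begin, make sure you have the software installed. "
    "You can download it from the official website. "
    "The installation process takes about five minutes."
)
chunks = split_for_generation(script, max_words=20)
for number, chunk in enumerate(chunks, start=1):
    print(number, len(chunk.split()), "words:", chunk)
```

Splitting only at sentence boundaries keeps each generation a complete thought, which tends to matter more for natural pacing than hitting the word limit exactly.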

Quality Troubleshooting:

PROBLEM: Overdub sounds robotic
SOLUTIONS:
  - Shorten generated phrase length
  - Simplify sentence structure
  - Check if word is in your training vocabulary
  - Add more training data with similar context

PROBLEM: Pacing doesn't match surrounding audio
SOLUTIONS:
  - Manually adjust word gaps in generated section
  - Rewrite with different rhythm/structure
  - Break into multiple shorter generations

PROBLEM: Pronunciation is incorrect
SOLUTIONS:
  - Use phonetic spelling (e.g., "Koo-ber-net-eez" for Kubernetes)
  - Record that specific word/phrase in training session
  - Use alternate word if pronunciation is critical

PROBLEM: Doesn't sound like "me"
SOLUTIONS:
  - Record additional training data (more emotional range)
  - Use voice in contexts similar to training recordings
  - For extreme emotions, use real recording instead

Ethical Overdub Use

AI voice cloning carries ethical responsibilities. Professional use requires transparent practices.

Your Own Voice Ethics: Using Overdub on your own voice for your own content is ethically clear. You're simply improving efficiency. Disclosure isn't legally required but builds audience trust when appropriate.

Client Work Ethics: When creating content for clients using your voice, inform them you use AI assistance for corrections. Include this in your service agreement. Most clients appreciate the efficiency; transparency prevents future issues.

Third-Party Voices: Never create Overdub voices of other people without explicit written permission. This includes celebrities, colleagues, or clients. Legal and ethical violations can result in serious consequences.

Disclosure Best Practices: For content representing yourself, subtle disclosure maintains authenticity: "Edited using AI-assisted tools," or acknowledgment in video description. For content where voice is critical to message (testimonials, expert opinions), err toward full transparency.

Professional Disclosure Framework:

MINIMAL DISCLOSURE (Appropriate for):
  - Your own YouTube videos
  - Your podcast (fixing mistakes)
  - Your course content
  - Your marketing materials
  Method: Video description or about page mention
  Example: "Content created with assistance from AI editing tools"

FULL DISCLOSURE (Required for):
  - Client testimonials
  - Expert witness statements
  - Legal/medical content
  - Journalistic content
  Method: On-screen or verbal disclosure
  Example: "Minor corrections made using AI voice synthesis"

NO OVERDUB (Never use for):
  - Impersonating others
  - Fraud or deception
  - Content attributed to someone else
  - Situations requiring legal authentication

Professional Standard: When in doubt, disclose. Transparency protects your reputation and builds trust.

Overdub in Production Workflows

The Script-First Workflow

Advanced creators write scripts, generate complete Overdub narration, then add visuals. This inverts traditional video production.

Traditional Workflow: Write outline → Record video → Edit → Realize script issues → Re-record → Final edit. Time: 8-12 hours for 10-minute video.

Script-First Workflow: Write complete script → Generate Overdub narration → Review/refine script → Add screen recording/visuals → Final polish. Time: 4-6 hours for 10-minute video.

Advantages: Perfect script before any visual work, no re-recording for script improvements, consistent pacing and tone, ability to A/B test different narration approaches, faster iteration during client approval process.

Script-First Production Process:

PHASE 1: Script Development (1-2 hours)
  - Write complete script in document
  - Read aloud to identify awkward phrasing
  - Time script (aim for natural speaking pace)
  - Revise until perfect

PHASE 2: Narration Generation (30 min)
  - Copy script into Descript project
  - Generate entire narration with Overdub
  - Listen to full narration
  - Revise any unnatural-sounding sections
  - Adjust pacing with gap controls

PHASE 3: Visual Production (2-3 hours)
  - Record screen/camera footage to match narration
  - Add B-roll, graphics, text overlays
  - Sync visuals to existing audio timeline
  - Much faster than traditional editing

PHASE 4: Final Polish (30-60 min)
  - Fine-tune transitions
  - Add music/sound effects
  - Color grade if needed
  - Export

Total Time: 4-6 hours
Traditional Method: 8-12 hours
Time Savings: 40-50%

Ideal For: Tutorial content, explainer videos, course lessons, documentary-style content
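For the "time script" step in Phase 1, a word count gives a quick first estimate before you read anything aloud. The sketch below assumes a speaking pace of 150 words per minute. That figure is an assumption, not a Descript setting, so measure your own pace from an existing recording and adjust it.

```python
def estimate_minutes(script, words_per_minute=150):
    """Rough narration length in minutes, from word count alone."""
    return len(script.split()) / words_per_minute


def target_word_count(minutes, words_per_minute=150):
    """Approximate script length needed to fill a given running time."""
    return round(minutes * words_per_minute)


draft = "word " * 1500            # stand-in for a 1,500-word script
print(estimate_minutes(draft))    # estimated minutes of narration
print(target_word_count(10))      # words needed for a 10-minute video
```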

Hybrid Recording Approach

Combine real recordings with strategic Overdub usage for optimal quality and efficiency.

When to Use Real Voice: Emotional content (personal stories, motivational speeches), spontaneous reactions and authentic moments, live interviews and conversations, content where "realness" is the value proposition.

When to Use Overdub: Factual corrections (dates, names, statistics), script refinements discovered in editing, filler word replacements, retakes of specific sentences (rather than full sections), updates to existing content.

Hybrid Workflow: Record primary content naturally and authentically. During editing, use Overdub for specific corrections and improvements rather than re-recording entire sections. This preserves authentic energy while eliminating inefficient retakes.

Hybrid Workflow Example - Podcast Episode:

RECORDING PHASE:
  - Record 60-minute conversation naturally
  - Don't stop for mistakes or filler words
  - Focus on authentic dialogue

EDITING PHASE - Real Audio:
  - Keep all substantive conversation
  - Preserve authentic reactions and energy
  - Maintain natural back-and-forth flow

EDITING PHASE - Overdub Corrections:
  - Remove/replace verbal tics (um, uh, you know)
  - Fix mispronunciations of guest names
  - Correct factual errors mentioned
  - Smooth awkward transitions
  - Fill gaps if removing content created confusion

RESULT:
  - Authentic conversation preserved
  - Professional polish applied
  - No need for scripted, unnatural retakes
  - 30-40% time savings vs traditional editing

Overdub Usage: 5-10% of total audio
Real Recording: 90-95% of total audio
Best of Both: Natural + Polished

Content Update Strategy

Overdub enables sustainable evergreen content that stays current without complete re-production.

The Evergreen Problem: Tutorial videos become outdated (UI changes, new features, updated processes). Traditional solution: Complete re-recording. Cost: High. Frequency: Limited.

Overdub Solution: Update only changed sections. Replace outdated sentences with current information. Maintain 90% of original content while keeping video current.

Update Workflow: Review content quarterly, identify outdated sections, rewrite affected sentences with current information, generate new audio with Overdub, replace old segments, re-export video.

Evergreen Content Maintenance:

SCENARIO: Software tutorial video from 2023

OUTDATED ELEMENTS (2024 update needed):
- "Click the blue Settings button" (now green)
- "Navigate to the Dashboard tab" (now called Overview)
- "Premium plan costs $49/month" (now $59/month)
- "This feature is in beta" (now fully released)

TRADITIONAL UPDATE COST:
- Re-record entire 15-minute tutorial: 3-4 hours
- Cost: $300-400 in time

OVERDUB UPDATE APPROACH:
1. Locate each outdated sentence in transcript
2. Rewrite with current information:
   - "Click the green Settings button"
   - "Navigate to the Overview tab"
   - "Premium plan costs $59/month"
   - "This feature is fully available"
3. Generate Overdub for each correction (5 min)
4. Replace old audio segments (10 min)
5. Re-export video (5 min)

Total Time: 20 minutes
Cost: Essentially free

BUSINESS IMPACT:
- Keep 10 tutorial videos current
- Traditional: 30-40 hours annually
- Overdub: 3-4 hours annually
- Savings: $3,000-4,000 in production time

Evergreen content becomes actually sustainable.
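Steps 1 and 2 of the update approach (locating outdated sentences and rewriting them) can be prepared outside Descript before any audio is regenerated. The sketch below is a minimal illustration in plain Python, assuming you have exported the transcript as text. The `plan_updates` helper and the sample transcript are hypothetical and are not part of any Descript API.

```python
# Hypothetical helper: find outdated phrases in an exported transcript and
# produce the corrected script text. This does not call Descript.
OUTDATED_TO_CURRENT = {
    "Click the blue Settings button": "Click the green Settings button",
    "Navigate to the Dashboard tab": "Navigate to the Overview tab",
    "Premium plan costs $49/month": "Premium plan costs $59/month",
    "This feature is in beta": "This feature is fully available",
}


def plan_updates(transcript: str, replacements: dict[str, str]) -> tuple[str, list[str]]:
    """Return the updated transcript and the outdated phrases that were found."""
    found = []
    updated = transcript
    for old, new in replacements.items():
        if old in updated:
            found.append(old)
            updated = updated.replace(old, new)
    return updated, found


# Made-up transcript excerpt for illustration.
transcript = (
    "Welcome back. Click the blue Settings button to begin. "
    "Premium plan costs $49/month and includes exports."
)
updated, found = plan_updates(transcript, OUTDATED_TO_CURRENT)
print(found)    # phrases that need new Overdub audio
print(updated)  # script text to paste back into the editor
```

The `found` list doubles as a first draft of the change log: each entry is a sentence that needs regenerated audio.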

Monetization Opportunities

Voice Services and AI-Assisted Production

Overdub mastery enables services impossible with traditional production methods. Your ability to update content instantly, fix mistakes without re-recording, and maintain consistency across projects creates unique value propositions.

Service Package: Evergreen Content Maintenance

Offer ongoing content update services for clients with tutorial/educational content that requires periodic updates.

Package Deliverables:

  • Quarterly content review identifying outdated information
  • Script updates for changed elements
  • Overdub generation matching original voice/tone
  • Seamless audio replacement maintaining quality
  • Re-export of updated videos
  • Change log documenting all updates

Pricing Structure:

SETUP FEE - Initial Onboarding
$400-600 per client
- Voice analysis and Overdub setup
- Content audit and documentation
- Update workflow establishment

QUARTERLY MAINTENANCE - Per Video Library
Small Library (5-10 videos): $500/quarter
Medium Library (11-25 videos): $900/quarter
Large Library (26-50 videos): $1,500/quarter
Enterprise (51+ videos): $2,500/quarter

Per-Update Pricing (Alternative):
Minor updates (1-3 changes): $80-120
Major updates (4-10 changes): $200-300
Complete refresh (10+ changes): $400-600

Time Investment Per Video:
Review: 15 minutes
Script updates: 10 minutes
Overdub generation: 5 minutes
Integration & QA: 10 minutes
Total: 40 minutes average

Effective Rate: $120-180/hour

TARGET CLIENTS:
- SaaS companies (product tutorials)
- Online course platforms
- Training content providers
- Technical documentation with video
- How-to content creators
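The effective rate quoted above follows from the minor-update price ($80-120) and the 40-minute time estimate. Here is a small sketch of that arithmetic. The function name is illustrative, and the same calculation can be reused to sanity-check any other fixed-price package.

```python
def effective_hourly_rate(price_usd: float, minutes_worked: float) -> float:
    """Revenue per hour of work for a fixed-price job."""
    return price_usd * 60 / minutes_worked


# Per-video time from the breakdown above, in minutes.
TASK_MINUTES = {
    "review": 15,
    "script updates": 10,
    "overdub generation": 5,
    "integration & QA": 10,
}
total_minutes = sum(TASK_MINUTES.values())

# Minor-update pricing: $80-120 per video.
low = effective_hourly_rate(80, total_minutes)
high = effective_hourly_rate(120, total_minutes)
print(f"{total_minutes} min per video -> ${low:.0f}-{high:.0f}/hour")
```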

Why Clients Pay: Traditional video updates require full re-production ($500-1,500 per video). Your Overdub-based service delivers equivalent results at 60-70% savings. Clients maintain current content without budget/time barriers that previously caused outdated libraries.

Service Package: Voiceover Production

Create professional voiceover content using your custom Overdub voice or stock voices for clients needing narration.

Package Deliverables:

  • Script refinement for natural delivery
  • Overdub voice generation (custom or stock)
  • Audio post-production (levels, clarity, pacing)
  • Delivery in required formats (WAV, MP3, AAC)
  • Revision round included
  • Commercial usage rights

Pricing Structure:

Per-Project Pricing (Based on Duration):

Short Form (30-90 seconds):
- Explainer videos, ads, social content
- Price: $150-250
- Turnaround: 24 hours

Medium Form (2-5 minutes):
- Product demos, training modules
- Price: $300-500
- Turnaround: 48 hours

Long Form (6-15 minutes):
- Course lessons, documentary narration
- Price: $600-1,000
- Turnaround: 72 hours

Extended (16-30 minutes):
- Audiobook chapters, webinars
- Price: $1,200-1,800
- Turnaround: 5 days

Bulk Pricing (10+ videos):
20% discount on per-project rates
Monthly retainer options available

PREMIUM ADD-ONS:
- Custom voice training (client's voice): +$600
- Rush delivery (24 hours): +50%
- Multiple language versions: +$200 each
- Script writing service: +$100-300

Time Investment Example (5-minute video):
Script review/editing: 20 min
Overdub generation: 10 min
Audio polish: 20 min
Review & revision: 10 min
Total: 60 minutes

Effective Rate: $300-500/hour (vs. traditional VO: $100-200/hour)

Service Package: Content Localization

Use Overdub to produce accent-consistent versions of content across English-speaking markets, or as a foundation for translation workflows.

Package Deliverables:

  • Script adaptation for target market (terminology, cultural references)
  • Overdub generation in appropriate accent/style
  • Lip-sync adjustment if video includes on-camera talent
  • Complete localized video with adapted narration
  • Quality assurance by native speakers

Pricing Structure:

ACCENT ADAPTATION (English variants):
US → UK English: $250-400 per 5-minute video
US → Australian English: $250-400 per 5-minute video

Process:
- Script review for terminology differences
- Overdub generation with appropriate accent voice
- Integration into existing video
- Native speaker QA review

Time Investment: 2-3 hours per video

FOUNDATION FOR TRANSLATION:
Prepare content structure for professional translation services
- Clean transcript generation
- Timing markers for sync
- Cultural adaptation notes
- Fee: $150-250 per video

Partner with translation services for full localization
Your role: Technical production
Their role: Translation/cultural adaptation
Combined offering: Complete localization service

Positioning & Marketing

Your Unique Value Proposition: "AI-assisted voice production for sustainable content strategies. I eliminate the traditional re-recording bottleneck, enabling you to keep video content current without breaking the budget. Using advanced voice synthesis technology, I deliver updates in days instead of weeks, at a fraction of traditional production costs."

Target Markets:

  • SaaS companies with product tutorial libraries
  • Online course creators (ongoing content updates)
  • Marketing agencies (client explainer videos)
  • Corporate training departments
  • YouTube educational channels (consistency across uploads)

Objection Handling: Some clients may hesitate about AI voices. Position as "AI-assisted" rather than "AI-generated." Emphasize: You're using your real voice, AI just eliminates re-recording, output is indistinguishable from traditional recording, cost savings enable more content/updates, industry-standard technology (used by major media companies).

MODULE 4: Studio Sound & Audio Enhancement

Transform amateur recordings into broadcast-quality audio using Descript's Studio Sound AI and professional audio processing tools.

Why Audio Quality Determines Success

Audiences tolerate average video quality but abandon content with poor audio. Studio Sound is Descript's one-click AI that removes background noise, echo, and room tone while enhancing voice clarity—transforming home recordings into studio-quality audio. This module teaches you to leverage these tools strategically, achieving professional sound without expensive equipment or acoustic treatment.

Audio Improvement

90%+

Processing Time

1 Click

Equipment Cost Saved

$2,000+

Studio Sound Technology

What Studio Sound Does

Studio Sound is AI-powered audio enhancement that analyzes your recording and automatically removes unwanted elements while preserving voice quality. It handles multiple issues simultaneously: background noise (HVAC, traffic, computer fans), room echo and reverb, inconsistent levels, mouth clicks and breathing sounds, low-quality microphone characteristics.

How It Works: The AI was trained on thousands of professional studio recordings paired with their raw source files. It learned to identify the difference between "amateur" and "professional" audio, then applies transformations to make your recording match studio characteristics. You trigger it with one click, although long recordings can take some time to process.

When Studio Sound Excels: Home office recordings with AC noise, interviews recorded in non-studio environments, podcast recordings with varying microphone quality, screen recordings with computer fan noise, outdoor recordings with wind or ambient sound, content recorded on budget equipment.

Studio Sound Limitations: Cannot fix severely distorted audio (clipping), cannot separate overlapping speakers effectively, cannot recover completely inaudible content, works best on speech—music/sound effects may sound unnatural, extreme processing can introduce slight artifacts.

Studio Sound Decision Framework:

ALWAYS USE Studio Sound for:
- Home/office recordings
- Interviews in untreated spaces
- Recordings with noticeable background noise
- Inconsistent audio levels across clips
- Budget microphone recordings

USE SELECTIVELY for:
- Professional studio recordings (may not need it)
- Music or sound effects (can sound over-processed)
- Intentional ambient sound (atmospheric recordings)

NEVER USE for:
- Already heavily processed audio
- Audio with extreme distortion
- Content where room tone is intentional (ASMR, atmospheric)

Quick Test: Toggle Studio Sound on/off and listen. If improvement is obvious with no artifacts, use it. If subtle or introduces weirdness, skip it.

Applying Studio Sound

Studio Sound can be applied at the track level or to individual clips. Understanding when to use each approach optimizes results.

Track-Level Application: Select the entire audio track in timeline, click the Studio Sound button in properties panel. This processes all audio in that track consistently. Ideal for: single-speaker recordings, consistent recording environment, podcast monologues, tutorial voiceovers.

Clip-Level Application: Select specific audio clips that need enhancement while leaving others untouched. Right-click clip → Apply Studio Sound. Ideal for: multi-speaker podcasts (different recording quality per person), recordings with mixed environments, interviews where only guest audio needs fixing, selectively enhancing problematic sections.

Intensity Control: Studio Sound offers an intensity slider (0-100%). The default is 100% (maximum processing). Reduce it to 50-75% if full processing sounds over-processed or unnatural. Higher intensity means more aggressive noise removal but a greater risk of artifacts.

Studio Sound Application Strategy:

SCENARIO 1: Solo Podcast Recording
Recording: Consistent environment, one microphone
Approach: Track-level Studio Sound at 100%
Result: Uniform quality throughout episode

SCENARIO 2: Interview with Remote Guest
Recording: Your audio is clean, guest audio is poor (home office background noise)
Approach:
- Leave your track untouched
- Apply Studio Sound to guest track only at 100%
Result: Balanced quality without over-processing good audio

SCENARIO 3: On-Location Interview
Recording: Some sections in quiet room, others with street noise
Approach:
- Identify noisy clips
- Apply Studio Sound at 85% to noisy sections only
- Leave clean sections natural
Result: Smooth overall quality without robotic sound

SCENARIO 4: Varied Recording Equipment
Recording: Main mic sounds good, backup mic is poor quality
Approach:
- Studio Sound at 75% on backup mic audio
- Studio Sound at 30% on main mic (light polish)
Result: Consistent sonic character across sources

Studio Sound Best Practices

Maximizing Studio Sound effectiveness requires strategic application and awareness of its processing characteristics.

  • Apply Early in Workflow: Run Studio Sound before other audio processing (EQ, compression). This provides the cleanest foundation for further refinement.
  • Monitor for Artifacts: Listen carefully at normal playback speed. If voice sounds "underwater" or has digital artifacts, reduce intensity or try clip-level application.
  • Compare Before/After: Toggle Studio Sound on/off multiple times. Sometimes subtle improvements are hard to notice until you hear the original again.
  • Consider Content Type: Aggressive enhancement suits educational content where clarity matters most. Lighter touch preserves authenticity for personal/storytelling content.
  • Save Processing Time: Studio Sound processes audio, which takes time for long recordings. Apply to clips rather than full tracks if only portions need enhancement.

Common Studio Sound Issues & Solutions:

ISSUE: Voice sounds robotic or artificial
SOLUTION: Reduce intensity to 60-75%, or apply to shorter clips rather than entire track

ISSUE: Background noise still audible
SOLUTION: Increase intensity to 100%, or combine with manual noise reduction (covered next section)

ISSUE: Processing removed too much voice warmth
SOLUTION: Reduce intensity to 40-50%, or use EQ to add back low-end warmth after Studio Sound

ISSUE: Inconsistent processing across clips
SOLUTION: Apply Studio Sound uniformly across all clips at same intensity, or normalize levels first

ISSUE: Studio Sound makes voice sound distant
SOLUTION: Reduce intensity, or add presence boost with EQ after processing

Pro Tip: For most content, 100% Studio Sound works perfectly. Only reduce if you notice problems. Don't overthink it: the default is usually optimal.

Advanced Audio Processing

Volume and Level Management

Consistent audio levels are essential for professional content. Descript offers multiple tools for volume control.

Clip Volume: Select clip, adjust volume slider in properties panel. Measured in dB (decibels). 0dB is original level, negative values reduce, positive values boost. Use for: balancing multiple speakers, adjusting music bed levels, fixing quiet sections.
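Because clip volume is expressed in decibels, it helps to know how a dB change maps to a linear amplitude multiplier. The sketch below uses the standard amplitude relation, gain = 10^(dB/20). The function names are illustrative and unrelated to Descript's internals.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a dB change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier back into dB."""
    return 20 * math.log10(gain)


for db in (-6, 0, 6):
    print(f"{db:+d} dB -> x{db_to_gain(db):.3f} amplitude")
```

A useful rule of thumb falls out of this: every 6 dB roughly halves or doubles the amplitude, so a -6dB adjustment is a much bigger change than the small number suggests.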

Track Volume: Adjust entire track's level uniformly. Affects all clips in that track. Use for: overall mix balance, creating consistent output levels, managing background music tracks.

Normalization: Select audio → Right-click → Normalize. This analyzes the audio's level and applies gain to bring it to a target loudness. Ideal for: balancing interview participants, evening out level variations, preparing audio for specific platforms (podcasts have different standards than YouTube).

Standard Level Settings:

LOUDNESS TARGETS BY PLATFORM:

YOUTUBE VIDEO:
- Target: -14 LUFS (integrated loudness)
- Voice peaks: -6dB to -3dB
- Music beds: -20dB to -18dB (under voice)

PODCAST (STREAMING):
- Target: -16 LUFS
- Voice peaks: -6dB to -4dB
- More dynamic range acceptable

SOCIAL MEDIA (Instagram/TikTok):
- Target: -12 LUFS (louder to compete)
- Voice peaks: -3dB to -1dB
- Aggressive limiting acceptable

INTERVIEW MIX RATIOS:
- Primary speaker: 0dB (reference level)
- Secondary speaker: -2dB to 0dB (match primary)
- Background music: -22dB to -18dB (supporting, not competing)
- Sound effects: -12dB to -8dB (noticeable but not jarring)

Quick Method:
1. Apply Studio Sound to all voice tracks
2. Normalize primary speaker to -16 LUFS
3. Match other speakers to primary by ear
4. Set music bed 20dB below voice level
5. Final limiter at -1dB ceiling (prevents clipping)
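Loudness normalization comes down to a single gain offset: the target loudness minus the measured loudness. The sketch below assumes you already have a measured integrated loudness and peak level from a loudness meter. The platform targets come from the list above, while the function names and the example measurements are hypothetical.

```python
# Platform targets from the list above, in LUFS.
PLATFORM_TARGETS_LUFS = {"youtube": -14.0, "podcast": -16.0, "social": -12.0}
LIMITER_CEILING_DB = -1.0


def normalization_gain_db(measured_lufs: float, platform: str) -> float:
    """Gain (in dB) that moves the measured loudness onto the platform target."""
    return PLATFORM_TARGETS_LUFS[platform] - measured_lufs


def peak_after_gain_db(measured_peak_db: float, gain_db: float) -> float:
    """Peaks shift by the same number of dB as the applied gain."""
    return measured_peak_db + gain_db


# Made-up measurements: a quiet recording at -21 LUFS with peaks at -5 dB.
gain = normalization_gain_db(-21.0, "podcast")
new_peak = peak_after_gain_db(-5.0, gain)
needs_limiter = new_peak > LIMITER_CEILING_DB
print(f"apply {gain:+.1f} dB, peaks land at {new_peak:+.1f} dB, limiter needed: {needs_limiter}")
```

In this example the gain needed to reach -16 LUFS pushes the peaks above the -1dB ceiling, which is why the Quick Method ends with a limiter.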

EQ (Equalization) Fundamentals

EQ shapes the tonal balance of audio—boosting or cutting specific frequency ranges. Strategic EQ makes voices clearer and more pleasant.

When to Use EQ: After Studio Sound processing, when voice lacks clarity or warmth, to reduce harsh frequencies, to differentiate multiple speakers (each gets slightly different EQ), to compensate for microphone characteristics.

Frequency Ranges Explained:

  • Low (80-250 Hz): Warmth and body. Boost for thin voices, cut for muddy/boomy sound.
  • Low-Mid (250-500 Hz): Fullness. Excessive can sound boxy. Cut slightly if voice sounds muffled.
  • Mid (500-2000 Hz): Voice clarity and presence. This is where speech intelligibility lives.
  • High-Mid (2000-4000 Hz): Articulation and definition. Boost for clarity, cut if harsh.
  • High (4000-8000 Hz): Brightness and air. Boost for crisp sound, cut for softness.

Universal Voice EQ Starting Point:

PODCAST/VOICE EQ PRESET (Apply in this order):

1. HIGH-PASS FILTER at 80 Hz
   - Removes rumble and low-frequency noise
   - Speech carries little useful content below 80 Hz

2. CUT 200-300 Hz by -2 to -3 dB
   - Reduces muddiness and boxiness
   - Creates space for clarity

3. BOOST 2000-3000 Hz by +2 to +3 dB
   - Enhances presence and intelligibility
   - Makes voice "forward" in mix

4. GENTLE BOOST 8000-10000 Hz by +1 to +2 dB
   - Adds airiness and polish
   - Creates professional "sheen"

Access EQ in Descript:
- Select audio track/clip
- Properties panel → Audio Effects → EQ
- Use parametric or graphic EQ
- Adjust bands as specified above

Result: Clear, professional voice that cuts through the mix. Adjust amounts based on individual voice characteristics.
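To see what steps 1 and 3 of the preset do to the frequency response, the sketch below builds a high-pass filter and a peaking boost from the widely used "Audio EQ Cookbook" biquad formulas and evaluates their gain at a few frequencies. It only illustrates how such filters behave. It is not Descript's implementation, and the 48 kHz sample rate and Q values are assumptions.

```python
import cmath
import math


def _normalize(b, a):
    """Scale coefficients so that a[0] == 1."""
    return [x / a[0] for x in b], [x / a[0] for x in a]


def highpass_coeffs(f0, fs, q=1 / math.sqrt(2)):
    """Second-order high-pass biquad (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return _normalize(b, a)


def peaking_coeffs(f0, fs, gain_db, q=1.0):
    """Peaking EQ biquad: boosts or cuts around f0 by gain_db."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    return _normalize(b, a)


def gain_db_at(b, a, freq, fs):
    """Magnitude response of the biquad at one frequency, in dB."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))


FS = 48_000  # assumed sample rate
hp_b, hp_a = highpass_coeffs(80, FS)            # step 1: high-pass at 80 Hz
pk_b, pk_a = peaking_coeffs(2500, FS, gain_db=3)  # step 3: +3 dB presence boost

for f in (20, 80, 1000, 2500):
    print(f"{f:>5} Hz  high-pass {gain_db_at(hp_b, hp_a, f, FS):+6.1f} dB"
          f"  presence {gain_db_at(pk_b, pk_a, f, FS):+5.1f} dB")
```

The high-pass leaves the speech range essentially untouched while strongly attenuating rumble, and the peaking band lifts only the region around its center frequency.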

Compression for Consistency

Compression evens out volume variations—loud parts become quieter, quiet parts become more audible. This creates consistent, professional-sounding audio.

Why Compression Matters: Natural speech varies dramatically in volume. Emphasized words spike louder, while mumbled phrases drop quiet. Compression narrows this range, making everything audible without constant volume riding.

When to Compress: After Studio Sound and EQ, for content with dynamic speaking (enthusiasm creates volume spikes), to match levels between multiple speakers, for podcast/video where consistent intelligibility matters, before final limiting.

Compression Settings: Select track → Properties → Compression. Ratio controls aggressiveness (3:1 for gentle, 6:1 for obvious, 10:1 for limiting). Threshold determines when compression engages (-20dB for constant compression, -10dB for peaks only). Attack/Release control how fast compression responds (medium settings work for most voice).

Voice Compression Recipe:

STANDARD PODCAST COMPRESSION:
- Ratio: 4:1 (moderate compression)
- Threshold: -18dB (catches most dynamics)
- Attack: 10ms (fast enough to catch peaks)
- Release: 100ms (natural recovery)
- Makeup Gain: Auto or +2 to +4dB
Result: Smooth, consistent voice without pumping or artifacts

AGGRESSIVE COMPRESSION (Energetic content):
- Ratio: 6:1
- Threshold: -15dB
- Attack: 5ms
- Release: 50ms
- Makeup Gain: +4 to +6dB
Result: Very consistent, broadcast-style sound

LIGHT COMPRESSION (Natural, authentic feel):
- Ratio: 2:1
- Threshold: -24dB
- Attack: 20ms
- Release: 150ms
- Makeup Gain: +1 to +2dB
Result: Subtle smoothing, preserves natural dynamics

Warning: Over-compression sounds lifeless and fatiguing. Start conservative, add more only if needed. Less is often more.
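Ratio and threshold are easier to reason about with numbers. The sketch below models only the static behavior of a hard-knee compressor: levels below the threshold pass unchanged, and each dB above the threshold is divided by the ratio. Attack and release timing are deliberately ignored, and the function is an illustration rather than Descript's compressor.

```python
def compressed_level_db(input_db: float, threshold_db: float, ratio: float,
                        makeup_db: float = 0.0) -> float:
    """Output level of an idealized hard-knee compressor (no attack/release)."""
    if input_db <= threshold_db:
        output_db = input_db
    else:
        # Every dB over the threshold is reduced to 1/ratio dB.
        output_db = threshold_db + (input_db - threshold_db) / ratio
    return output_db + makeup_db


# Standard podcast preset from above: 4:1 ratio, -18 dB threshold.
for level in (-30, -18, -12, -6):
    out = compressed_level_db(level, threshold_db=-18, ratio=4)
    print(f"in {level:+4d} dB -> out {out:+6.1f} dB")
```

With these settings a 24 dB spread between quiet (-30 dB) and loud (-6 dB) speech shrinks to 15 dB, which is the "narrowed range" that compression is meant to deliver. Makeup gain then raises the whole signal by a fixed amount.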

Limiting and Final Polish

Limiting is the final safety net—preventing distortion while maximizing loudness. It's the last process before export.

What Limiting Does: Acts as aggressive compression that absolutely prevents audio from exceeding a set ceiling. Think of it as a brick wall—nothing gets through. Use for: preventing clipping (distortion), achieving platform loudness standards, catching unexpected peaks, final quality control.

Limiter Settings: Ceiling: -1dB to -0.5dB (prevents digital clipping with small safety margin). This is the absolute maximum level. Everything above this ceiling gets compressed down. Apply to master output, not individual tracks.

Complete Audio Chain (Order Matters!):

AUDIO PROCESSING ORDER:

STAGE 1: Cleanup
- Studio Sound (removes noise, improves quality)
- Manual noise reduction if needed

STAGE 2: Tonal Shaping
- EQ (frequency balance, clarity enhancement)

STAGE 3: Dynamics Control
- Compression (evening out volume variations)

STAGE 4: Level Matching
- Normalize or adjust volumes to target loudness

STAGE 5: Final Safety
- Limiter at -1dB ceiling (prevents clipping)

WHY THIS ORDER?
- Clean signal first (Studio Sound)
- Shape tone before dynamics (EQ before compression)
- Control dynamics before limiting (compression before limiter)
- Limiter catches any remaining peaks

Common Mistake: Applying processes in wrong order reduces quality. For example, compressing before EQ amplifies frequency imbalances. Limiting before compression wastes limiting headroom. Descript applies some of this automatically, but understanding the chain helps when manual intervention is needed.
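If you document your processing chain per project (for example in a mastering preset note), a short check can flag stages that run against the order above. This is a minimal sketch with hypothetical stage names. It does not read anything from Descript.

```python
# Stage names are illustrative labels for the five stages listed above.
RECOMMENDED_ORDER = [
    "studio_sound", "noise_reduction", "eq", "compression", "normalize", "limiter",
]


def misordered_pairs(chain: list[str]) -> list[tuple[str, str]]:
    """Return adjacent (earlier, later) stages that contradict the recommended order."""
    rank = {stage: i for i, stage in enumerate(RECOMMENDED_ORDER)}
    return [(a, b) for a, b in zip(chain, chain[1:]) if rank[a] > rank[b]]


print(misordered_pairs(["studio_sound", "compression", "eq", "limiter"]))
print(misordered_pairs(["studio_sound", "eq", "compression", "normalize", "limiter"]))
```

The first chain reports the compression-before-EQ mistake described above, and the second chain passes with an empty list.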

Advanced Audio Scenarios

Music and Background Audio

Background music enhances mood but must be balanced carefully to avoid competing with speech.

Ducking: Automatically lowers music volume when speech is present, raises it during silence. This maintains musical presence without interference. Enable: Select music track → Properties → Ducking → Choose voice track to follow.

Manual Music Mixing: Set base music level at -20dB to -18dB below voice. During intro/outro without speech, boost music to -6dB to -3dB for impact. Use fade in/out for smooth transitions (2-3 seconds typical).

Music Selection Guidelines: Instrumental only (vocals compete with speech), consistent energy level (avoid dynamic builds during speech), appropriate mood (match content tone), royalty-free or licensed (avoid copyright issues).

Background Music Workflow:

STEP 1: Import Music
- Add music track below voice track in timeline
- Position to start before first speech

STEP 2: Set Base Level
- Reduce music volume to -20dB
- Should be noticeable but not distracting during speech

STEP 3: Intro/Outro Boost
- Sections without speech: boost music to -6dB
- Use volume automation or keyframes
- 2-second fade transitions between levels

STEP 4: Enable Ducking (Optional)
- Properties → Ducking → Select voice track
- Set duck amount: -8dB to -12dB
- Attack: 0.3s (how fast it ducks)
- Release: 1s (how fast it recovers)

STEP 5: Final Check
- Play full content
- Voice should always be clearly intelligible
- Music should enhance, not compete

Pro Tip: Better to have music too quiet than too loud. Audience complaints about inaudible voice far outweigh complaints about soft music.
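The attack and release settings in Step 4 describe how quickly the music level moves when speech starts and stops. The sketch below models that as a simple linear ramp over 0.1-second steps. It is an assumption-laden illustration: the -8 dB starting level, the linear ramp shape, and the function itself are hypothetical, and Descript's actual ducking curve may differ.

```python
def ducking_curve(speech_active, step_s=0.1, base_db=-8.0, duck_db=-12.0,
                  attack_s=0.3, release_s=1.0):
    """Music level (dB) per time step, ramping down under speech and back up after."""
    levels = []
    offset = 0.0  # current ducking offset: 0 = not ducked, duck_db = fully ducked
    for active in speech_active:
        if active:
            # Move toward full ducking over attack_s seconds.
            offset = max(duck_db, offset + duck_db * step_s / attack_s)
        else:
            # Recover toward the base level over release_s seconds.
            offset = min(0.0, offset - duck_db * step_s / release_s)
        levels.append(round(base_db + offset, 2))
    return levels


# 0.4 s of speech followed by 1.1 s of silence, in 0.1 s steps.
levels = ducking_curve([True] * 4 + [False] * 11)
print(levels)
```

With a -12 dB duck amount the music settles at -20 dB under speech, matching the base level in Step 2, then climbs back over one second once the voice stops.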

Multi-Speaker Balance

Interviews and conversations require careful level matching between speakers for professional results.

The Reference Speaker Method: Choose the speaker with best audio quality as reference at 0dB. Match all other speakers to this reference by ear. Slight differences are acceptable (humans don't speak at identical volumes), but perception should be balanced.

Compensation Strategies: If one speaker is significantly quieter after level matching, apply additional compression to that track only. This brings up quiet parts while controlling peaks. If one speaker sounds thin compared to others, use EQ to add low-end warmth matching the fuller speaker.

Two-Person Interview Balance:

SCENARIO: Podcast host + guest, recorded separately

STEP 1: Individual Processing
- Apply Studio Sound to each track (100%)
- Apply voice EQ preset to both
- Compress each track (4:1 ratio, -18dB threshold)

STEP 2: Level Matching
- Solo host track, normalize to -16 LUFS
- Solo guest track, match by ear to host level
- Fine-tune: Guest should sound equally present

STEP 3: Tonal Matching
- If guest sounds thinner: boost 150-250 Hz by 2dB
- If host sounds harsh: cut 3000-4000 Hz by 1-2dB
- Goal: Cohesive sonic character

STEP 4: Final Balance
- Play full conversation
- Both speakers should sound like same recording session
- No jarring level changes when switching speakers

STEP 5: Master Processing
- Apply limiter to master output (-1dB ceiling)
- Final normalization to platform target

Result: Professional, balanced conversation that sounds like a single cohesive recording despite separate sources.
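Matching "by ear" in Step 2 is the final judgment, but a rough numeric starting point can save time. The sketch below compares the RMS level of two sample sequences and reports the gain that would bring the guest up to the host. RMS is only a crude proxy for perceived loudness, and the synthetic sine waves stand in for real exported audio samples.

```python
import math


def rms_db(samples) -> float:
    """RMS level of a sample sequence in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def matching_gain_db(reference, other) -> float:
    """Gain to apply to `other` so its RMS level matches `reference`."""
    return rms_db(reference) - rms_db(other)


# Synthetic stand-ins: the guest was recorded at half the host's amplitude.
host = [0.5 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
guest = [0.25 * math.sin(2 * math.pi * k / 100) for k in range(1000)]

gain = matching_gain_db(host, guest)
print(f"raise guest by {gain:.1f} dB, then fine-tune by ear")
```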

Fixing Problem Audio

Sometimes audio has specific issues that require targeted solutions beyond Studio Sound.

Room Echo/Reverb: Studio Sound handles moderate reverb. For extreme cases: use additional de-reverb plugin, or re-record in better environment if critical content.

Plosives (P-Pops): Harsh "p" and "b" sounds. Solution: Use de-esser targeting low frequencies (80-150 Hz), or manually reduce volume of specific plosive instances, or high-pass filter more aggressively (100 Hz instead of 80 Hz).

Sibilance (Harsh S Sounds): Overly sharp "s" sounds. Solution: De-esser targeting 6000-8000 Hz, or EQ cut at sibilant frequencies.

Clipping/Distortion: Audio recorded too loud, causing digital distortion. Studio Sound cannot fix true clipping. Prevention is critical—monitor levels during recording, keep peaks below -6dB. Distorted audio requires re-recording.
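Since clipping cannot be repaired, it is worth checking a test recording before committing to a long session. The sketch below shows one hedged way to do that on raw sample values scaled to the range -1.0 to 1.0: it reports the peak level in dB and counts runs of consecutive samples stuck at full scale, which is the typical signature of digital clipping. The thresholds and function names are assumptions, not a Descript feature.

```python
import math


def peak_db(samples) -> float:
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def count_clipped_runs(samples, threshold=0.999, min_run=3) -> int:
    """Count runs of at least `min_run` consecutive samples at or above `threshold`."""
    runs = 0
    current = 0
    for s in samples:
        if abs(s) >= threshold:
            current += 1
        else:
            if current >= min_run:
                runs += 1
            current = 0
    if current >= min_run:  # a run that reaches the end of the recording
        runs += 1
    return runs


# Made-up sample data for illustration.
healthy = [0.1, 0.4, -0.3, 0.2]
clipped = [0.2, 1.0, 1.0, 1.0, 0.3, -1.0, -1.0, 0.1, 1.0, 1.0, 1.0, 1.0]

print(f"healthy peak: {peak_db(healthy):.1f} dB, clipped runs: {count_clipped_runs(healthy)}")
print(f"clipped peak: {peak_db(clipped):.1f} dB, clipped runs: {count_clipped_runs(clipped)}")
```

A peak below -6 dB with zero clipped runs, as in the first example, matches the recording target recommended above.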

Problem Audio Decision Tree:

ISSUE: Background noise
TRY: Studio Sound at 100%
IF STILL PRESENT: Noise reduction plugin (manual frequency targeting)
LAST RESORT: Re-record

ISSUE: Echo/reverb
TRY: Studio Sound at 100%
IF STILL PRESENT: Additional de-reverb effect
LAST RESORT: Record in treated space

ISSUE: Harsh sibilance
TRY: De-esser plugin (6-8kHz, gentle reduction)
ALTERNATIVE: EQ cut at 7kHz by -2 to -3dB

ISSUE: Plosive pops
TRY: High-pass filter at 100 Hz + de-esser at low freq
ALTERNATIVE: Manual volume reduction on each plosive instance

ISSUE: Mouth clicks/smacking
TRY: Studio Sound often handles this
ALTERNATIVE: Manual editing (remove click silences)

ISSUE: Clipping/distortion
TRY: Nothing can fix true clipping
SOLUTION: Re-record with proper levels

ISSUE: Distant/thin sound
TRY: EQ boost 150-250 Hz (warmth) + 2-4 kHz (presence)
TRY: Slight compression (brings up quiet parts)

Prevention >>> Correction
Always aim for clean source audio. No amount of processing can fix fundamentally bad recordings.

Monetization Opportunities

Audio Enhancement Services

Audio mastery differentiates professional services from amateur offerings. Your ability to transform poor recordings into broadcast quality enables premium service tiers and allows you to accept projects others must decline.

Service Package: Audio Rescue & Enhancement

Many clients have valuable content recorded in suboptimal conditions. They need audio rescue services to salvage otherwise unusable material.

Package Deliverables:

  • Complete audio analysis and problem identification
  • Studio Sound AI processing
  • Manual noise reduction for specific issues
  • EQ and compression for professional tone
  • Level balancing across entire project
  • Before/after audio samples for client approval

Pricing Structure:

AUDIO RESCUE PRICING (Per Hour of Content):

Tier 1 - Standard Enhancement: $150-200/hour
- Studio Sound processing
- Basic EQ and compression
- Level normalization
- Suitable for: Good source audio needing polish

Tier 2 - Problem Audio Correction: $250-350/hour
- Everything in Tier 1, plus:
- Manual noise reduction
- Targeted frequency problem solving
- Multi-track balancing
- Suitable for: Home recordings, untreated spaces

Tier 3 - Extreme Audio Rescue: $400-500/hour
- Everything in Tier 2, plus:
- Multiple processing passes
- Clip-by-clip problem solving
- Custom EQ per speaker/section
- Suitable for: Conference recordings, poor equipment, extreme noise

Minimum Project: $200 (covers projects under 90 minutes)
Rush Service: +50% (24-hour turnaround)
Standard Turnaround: 48-72 hours

TARGET CLIENTS:
- Conference organizers (recording quality varies)
- Authors (audiobook recordings from home studios)
- Businesses (internal training videos, poor AV setup)
- Content creators (salvaging old content for repurposing)

Service Package: Podcast Mastering

Professional podcast mastering ensures consistent quality across episodes and competitive loudness with major shows.

Package Deliverables:

  • Multi-speaker level balancing
  • Consistent processing across all episodes
  • Platform-optimized loudness (-16 LUFS standard)
  • Final limiting and quality control
  • Multiple format exports (MP3, WAV)
  • Ongoing mastering preset for series consistency

Pricing Structure:

PER-EPISODE PRICING:
- Up to 45 minutes: $75-100
- 46-90 minutes: $125-175
- 91-120 minutes: $175-225

MONTHLY RETAINER:
- 4 episodes/month (up to 60 min each): $400
- 8 episodes/month: $700
- 12 episodes/month: $900

SETUP FEE (One-time): $200-300 per podcast
- Custom processing chain creation
- Host/guest voice profiling
- Mastering preset development

WHAT'S INCLUDED:
- Complete audio enhancement workflow
- Multi-speaker balancing
- Music bed integration (if provided)
- Platform-optimized output
- 1 revision round per episode

TIME INVESTMENT PER EPISODE:
- Setup and import: 5 min
- Processing application: 15 min
- Review and fine-tuning: 15 min
- Export and delivery: 5 min
Total: 40 minutes per episode

EFFECTIVE RATE: $112-300/hour

VALUE PROPOSITION:
"Professional podcast mastering delivering consistent, broadcast-quality audio that competes with major shows. Your episodes will sound polished and professional on any platform."

Positioning & Client Acquisition

Your Positioning: "Audio enhancement specialist using AI-assisted workflows and professional mastering techniques. I transform home recordings into studio-quality audio, enabling content creators to maintain professional standards without expensive equipment or facilities."

Finding Clients:

  • Reach out to podcasters with inconsistent audio quality (easy to identify by listening)
  • Offer free audio enhancement sample for first 5 minutes of their show
  • Join podcasting communities and offer expertise in discussion threads
  • Create before/after audio samples showcasing your capabilities
  • Partner with video editors who lack audio expertise (referral network)
  • List services on podcast-specific marketplaces and directories

Portfolio Building: Offer discounted services to 3-5 podcasts in exchange for testimonials and before/after rights. These case studies demonstrate tangible value and attract premium clients.

MODULE 5: Screen Recording & Collaboration

Master Descript's screen recording capabilities and collaboration features for seamless team workflows and client communication.

Why Integrated Recording and Collaboration Matter

Descript combines recording, editing, and collaboration in one platform—eliminating the fragmented workflow of separate tools. Record screen and webcam simultaneously, edit instantly with text-based tools, and collaborate with clients/team members without file transfers. This integration reduces production time by 60% and streamlines approval processes from days to hours.

Workflow Integration

All-in-One

Approval Speed

10x Faster

Tool Consolidation

5 → 1

Professional Screen Recording

Screen Recording Setup

Descript's screen recorder captures screen, webcam, and microphone simultaneously—ideal for tutorials, demos, and presentations. Unlike standalone screen recorders, recordings immediately become editable projects.

Starting a Recording: File → New Recording (or Cmd/Ctrl + Shift + R). Choose recording sources: screen only, screen + webcam, webcam only, or audio only. Select microphone and audio sources. Configure screen area (full screen, window, or custom area).

Recording Source Options:

  • Full Screen: Captures entire display. Use for: software demos, presentations, anything requiring full context.
  • Window: Captures specific application window. Use for: focused tutorials, when other windows contain sensitive info.
  • Custom Area: Define specific screen region. Use for: hiding taskbars/menus, focusing on specific UI areas.
  • Webcam: Simultaneously records your face. Position and size adjustable post-recording.

Optimal Recording Configuration:

SOFTWARE TUTORIAL:
- Screen: Full screen (or specific window)
- Webcam: Yes (bottom-right corner for presenter presence)
- Microphone: Yes (narration)
- System Audio: Optional (if demonstrating audio features)
- Resolution: 1920x1080 minimum

PRESENTATION RECORDING:
- Screen: Full screen
- Webcam: Yes (picture-in-picture)
- Microphone: Yes
- System Audio: No (unless playing media)

REMOTE INTERVIEW:
- Screen: Not needed (or show interviewer's window)
- Webcam: Yes (full frame)
- Microphone: Yes (high quality)
- System Audio: Yes (capture remote audio)

Pro Tip: Always do a 10-second test recording before important sessions. Check all sources are active and quality is acceptable.

Recording Best Practices

Quality recordings start with proper preparation and technique, not post-production fixes.

Pre-Recording Checklist:

  • Close Unnecessary Apps: Prevent notification popups and reduce CPU load for smooth recording.
  • Hide Personal Info: Close email, Slack, anything with private information visible.
  • Prepare Screen: Open applications you'll demonstrate, arrange windows optimally.
  • Clear Desktop: Cluttered desktops appear unprofessional. Use clean background or folder organization.
  • Zoom Appropriately: Text should be readable. Increase browser/app zoom to 125-150% if needed.
  • Disable Notifications: Turn on Do Not Disturb mode to prevent interruptions.

Professional Screen Recording Workflow:

PREPARATION (5 minutes):
1. Close all non-essential applications
2. Clear desktop or set clean wallpaper
3. Prepare browser tabs/windows in advance
4. Test microphone levels (speak at normal volume)
5. Position webcam if using (eye level, well-lit)
6. Enable Do Not Disturb mode

RECORDING TECHNIQUE:
- Speak clearly and at moderate pace
- Narrate actions: "Now I'm clicking the Settings button"
- Pause between major steps (easier to edit later)
- Don't worry about mistakes; you'll edit in Descript
- Move mouse deliberately (not erratically)
- Highlight UI elements you're discussing with cursor

POST-RECORDING:
- Recording automatically opens in Descript as editable project
- Transcription begins immediately
- Start editing while transcription processes
- Remove mistakes, pauses, filler words with text editing

Time Saved vs Traditional Workflow:
- No separate transcription tool needed
- No file imports/syncing required
- Edit while transcription happens
- Instant project setup

Result: Record → Edit → Publish in single environment

Screen Recording + Webcam Layout

When recording screen and webcam simultaneously, Descript creates separate layers you can position independently.

Default Behavior: Screen recording becomes full-frame layer. Webcam appears as picture-in-picture in corner. Both layers are fully editable—reposition, resize, or hide as needed.

Common Layout Patterns:

  • Corner Webcam: Screen full-frame, webcam small in corner (bottom-right typical). Use for: software tutorials where screen is primary focus.
  • Side-by-Side: Screen takes 70% width, webcam 30% on side. Use for: presentations, interviews, when presenter presence is important.
  • Webcam-Focused: Webcam full-frame, screen as small reference. Use for: reaction videos, commentary content.
  • Dynamic Switching: Toggle between layouts during video. Use for: varied content needs within single recording.

Post-Recording Layout Adjustment:

SCENARIO: Recorded tutorial with webcam in default corner position, want to improve layout.

STEP 1: Access Composition
- Recording creates composition automatically
- Open Layers panel (right side)
- See: ScreenRecording layer + Webcam layer

STEP 2: Reposition Webcam
- Select Webcam layer
- Resize: 320x180 pixels (for 1920x1080 canvas)
- Position: Bottom-right with 40px margins
- Add corner radius: 12px (soften edges)
- Add subtle shadow for depth

STEP 3: Create Sections Without Webcam
- Some sections don't need presenter (detailed screen work)
- Timeline view: Trim webcam layer for those sections
- Screen recording continues, webcam disappears
- Professional focus on content when appropriate

STEP 4: Add Intro/Outro with Full Webcam
- Intro: Webcam full-frame (personal introduction)
- Main content: Picture-in-picture layout
- Outro: Webcam full-frame (call-to-action)

Result: Dynamic, professional presentation that emphasizes appropriate elements throughout video.
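The webcam placement in Step 2 is simple arithmetic, sketched below for anyone entering position values by hand. This is an illustration only: it assumes a top-left origin measured in pixels, which may not match how Descript's position fields are expressed, and the function names are hypothetical.

```python
def pip_top_left(canvas_w, canvas_h, layer_w, layer_h, margin):
    """Top-left (x, y) of a layer anchored to the bottom-right corner.

    Assumes a top-left origin in pixels (an assumption, not Descript's API).
    """
    x = canvas_w - layer_w - margin
    y = canvas_h - layer_h - margin
    return x, y


def scaled_pip(canvas_w, ref_canvas_w=1920, ref_size=(320, 180), ref_margin=40):
    """Scale the 320x180 / 40px reference values to another canvas width."""
    scale = canvas_w / ref_canvas_w
    w, h = ref_size
    return round(w * scale), round(h * scale), round(ref_margin * scale)


print(pip_top_left(1920, 1080, 320, 180, 40))  # (1560, 860)
print(scaled_pip(3840))                        # (640, 360, 80)
```

On a 1920x1080 canvas the webcam layer's top-left corner lands at (1560, 860); on a 4K canvas the same proportions give a 640x360 layer with 80px margins.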

System Audio Capture

Recording system audio captures sounds playing on your computer—essential for demos involving audio playback, video reviews, or music production tutorials.

When to Enable System Audio: Demonstrating audio features (music apps, sound design), reviewing videos/podcasts (capture their audio), recording webinars/presentations (capture presenter audio), game recordings (capture game sound).

When to Disable System Audio: General tutorials (unnecessary), when you want only your narration (cleaner audio), when other apps may play alerts during recording (system audio captures notification sounds).

System Audio as Separate Track: Descript records system audio on dedicated track, independent from microphone. This enables individual volume control and processing—critical for balancing narration over demonstration audio.

System Audio Mixing:

SCENARIO: Tutorial demonstrating audio software with playback examples

RECORDING SETUP:
- Microphone: Your narration
- System Audio: Software's output
- Both record to separate tracks

POST-RECORDING MIX:
- Microphone Track: Primary (0dB reference)
- System Audio Track: -15dB to -20dB (supporting, not competing)

DUCKING APPLICATION:
- Enable ducking on System Audio track
- Follow: Microphone track
- When you speak, system audio automatically reduces
- When you pause, system audio comes forward

RESULT:
- Clear narration always intelligible
- Demonstration audio present but not overwhelming
- Professional balance without manual volume riding

Alternative Approach (Manual):
- Reduce system audio globally to -18dB
- Boost to -6dB during sections you want audience to hear audio clearly
- Return to -18dB when resuming narration
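To get a feel for what those decibel values mean, the sketch below converts them to linear amplitude using the standard relation gain = 10^(dB/20). The `manual_level_db` helper is a hypothetical stand-in for the manual approach above, not a model of Descript's ducking.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def manual_level_db(narrating, background_db=-18.0, featured_db=-6.0):
    """System-audio level for the manual approach: low under narration,
    raised when the audience should hear the demo audio clearly."""
    return background_db if narrating else featured_db


for level in (0, -6, -15, -18, -20):
    print(f"{level:>4} dB -> {db_to_gain(level):.3f}x amplitude")

print(manual_level_db(narrating=True))   # -18.0
print(manual_level_db(narrating=False))  # -6.0
```

At -18dB the system audio plays at roughly one eighth of the narration's amplitude; at -6dB it is about half.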

Team Collaboration Workflows

Sharing and Permissions

Descript's collaboration features enable real-time teamwork without file exports, version confusion, or email chains.

Sharing Methods: Click Share button (top-right) to access sharing options. Can Share (edit permissions), View Only (review without editing), or Publish (public link). Each method serves different collaboration needs.

Permission Levels Explained:

  • Owner: Full control. Can edit, delete, share, manage permissions. Usually the project creator.
  • Can Edit: Full editing access. Can modify content, add comments, export. Cannot delete project or change permissions.
  • Can Comment: Read-only with commenting. Can view, play, add comments. Cannot edit content. Ideal for clients/stakeholders.
  • Can View: Read-only without commenting. Can view and play only. Useful for final review before publishing.

Collaboration Strategy by Role:

CLIENT PROJECTS:

Phase 1 - Draft Stage:
- Project Owner: You (full control)
- Team Members: Can Edit (editors, designers)
- Client: No access yet (work in progress)

Phase 2 - Client Review:
- Project Owner: You
- Team Members: Can Edit
- Client: Can Comment (provides feedback)
- Stakeholders: Can View (awareness, no input needed)

Phase 3 - Revisions:
- Apply client comments
- Mark comments as resolved
- Client maintains Can Comment access
- Update until approval

Phase 4 - Final Approval:
- Client: Can View (final review)
- All comments resolved
- Approval documented in comments

Phase 5 - Delivery:
- Export final files
- Optional: Publish (public link for client sharing)

SECURITY NOTE:
- Never give "Can Edit" to clients (prevents accidental changes)
- Use "Can Comment" for all external feedback
- Revoke access after project completion if needed
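If you track access outside Descript (for example in a project checklist), the phase plan above can be written down as data and checked against the security note. The sketch below is a planning aid with hypothetical names; it does not call any Descript API, and the ordering of permission levels is an assumption made for comparison.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Ordered from least to most access (ordering is an assumption).
    NONE = 0
    CAN_VIEW = 1
    CAN_COMMENT = 2
    CAN_EDIT = 3
    OWNER = 4


# Only the roles each phase explicitly lists are included.
PHASES = {
    "draft": {"you": Permission.OWNER, "team": Permission.CAN_EDIT,
              "client": Permission.NONE},
    "client_review": {"you": Permission.OWNER, "team": Permission.CAN_EDIT,
                      "client": Permission.CAN_COMMENT,
                      "stakeholders": Permission.CAN_VIEW},
    "revisions": {"client": Permission.CAN_COMMENT},
    "final_approval": {"client": Permission.CAN_VIEW},
}

EXTERNAL_ROLES = {"client", "stakeholders"}


def violations(phases):
    """Return (phase, role) pairs where an external role exceeds Can Comment."""
    return [
        (phase, role)
        for phase, roles in phases.items()
        for role, perm in roles.items()
        if role in EXTERNAL_ROLES and perm > Permission.CAN_COMMENT
    ]


print(violations(PHASES))  # []
```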

Comments for Feedback

Comments enable precise, timestamped feedback without back-and-forth emails or vague notes.

Adding Comments: Select text in transcript or click specific timestamp in timeline. Click comment icon or press Cmd/Ctrl + Shift + M. Type your comment. Comment appears at exact location for context.

Comment Types and Uses:

  • Edit Notes: "Remove this section" or "Rewrite for clarity"
  • Questions: "Is this data current?" or "Can we include B-roll here?"
  • Approvals: "This section looks great" or "Approved for final"
  • Technical Issues: "Audio quality drops here" or "Screen too small to read"
  • Content Requests: "Add transition here" or "Include link in description"

Professional Commenting Workflow:

CLIENT REVIEW PROCESS:

STEP 1: Prepare for Review
- Complete initial edit
- Add internal comments for team: "TODO: Add B-roll"
- Resolve internal comments before client sees project
- Share with client (Can Comment permission)

STEP 2: Client Reviews
- Client watches/reads through project
- Adds timestamped comments at specific points
- Examples:
  - "Can we emphasize the ROI here?"
  - "Remove mention of old product name"
  - "This transition feels abrupt"

STEP 3: Implement Feedback
- You receive notification for each comment
- Address each comment systematically
- Reply to comment explaining changes made
- Mark resolved when complete

STEP 4: Second Review (if needed)
- Client reviews updates
- Comments on any remaining issues
- Approves via comment: "All changes look good, approved"

BENEFITS vs EMAIL FEEDBACK:
- No confusion about which timestamp/section
- Complete feedback history preserved
- Visual reference (they see exact moment)
- No version control issues
- Faster iteration cycles

Time Savings: 2-3 revision cycles in 2 days vs 1-2 weeks with email feedback

Version History and Recovery

Descript automatically tracks all project changes. Access previous versions at any time without manual saving or file duplication.

Accessing Versions: File → Version History. See timeline of all changes with timestamps and author attribution. Preview any version. Restore previous version if needed.

When Version History Saves You: Client changes mind after you've made extensive edits (revert to earlier state), accidental deletion of content (recover from before deletion), testing different edits (revert if new approach doesn't work), tracking who made specific changes (team accountability).

Version History Use Cases:

SCENARIO 1: Client Changes Direction
- You edit based on initial feedback
- Client: "Actually, I prefer the original flow"
- Solution: File → Version History → Select version from yesterday → Restore
- Result: Back to original in 30 seconds vs hours of re-editing

SCENARIO 2: Experimental Editing
- Current version is good
- Want to try alternative structure
- Make experimental changes freely
- If experiment fails: revert to last good version
- If experiment succeeds: keep new version

SCENARIO 3: Team Collaboration Issue
- Multiple editors working
- Someone accidentally deletes section
- Version History shows: John Doe deleted section at 2:47 PM
- Restore version from 2:45 PM
- Recovered without blame or lost work

SCENARIO 4: Audit Trail
- Client questions: "When did this change happen?"
- Version History provides exact timestamp and author
- Professional accountability and transparency

Best Practice: Review Version History before major structural changes. Having a restore point provides confidence to experiment.

Publishing and Public Sharing

Publish creates shareable link for public viewing—ideal for client presentations, portfolio pieces, or getting feedback from non-Descript users.

Publishing Options: Click Publish button. Choose what viewers can see: video only, video + transcript, allow downloading, enable comments. Generate unique link. Update published version anytime without changing link.

Published Link Use Cases: Client final review (send link instead of file), portfolio samples (showcase work publicly), stakeholder presentations (no login required), social proof (embed on website), team updates (share progress without granting edit access).

Publishing Strategy:

INTERNAL REVIEW:
- Use: Share with Can Comment permission
- Why: Full platform features, controlled access
- Who: Team members, direct clients

EXTERNAL STAKEHOLDER REVIEW:
- Use: Published link with comments disabled
- Why: No login required, professional appearance
- Who: Client's stakeholders, executives who shouldn't edit

PORTFOLIO SHOWCASE:
- Use: Published link with transcript visible
- Why: Demonstrates both video and editing capabilities
- Who: Potential clients, job applications

CLIENT FINAL DELIVERY:
- Use: Published link + exported file
- Why: Client can share internally without Descript access
- Who: Client team, their stakeholders

SECURITY CONSIDERATIONS:
- Published links are semi-public (anyone with link can view)
- Don't publish sensitive/confidential content
- Can unpublish anytime to revoke access
- Password protection available for sensitive shares

Update Strategy:
- The published link stays the same after you modify the project
- Make changes → Republish → Same link, new content
- No need to send new links after revisions

Production Workflow Integration

Record-to-Publish Workflow

Descript's integrated environment enables record-to-publish workflows that are impossible with fragmented tool chains.

Traditional Fragmented Workflow: Record (OBS/ScreenFlow) → Transfer files → Import to editor → Sync audio → Edit → Export → Send for review → Email feedback → Re-import → Make changes → Re-export → Final delivery. Time: 8-12 hours for 10-minute video.

Descript Integrated Workflow: Record → Auto-transcribe → Edit in same interface → Share link for review → Apply feedback in same project → Export final. Time: 3-5 hours for 10-minute video.

Optimized Tutorial Production:

PHASE 1: RECORDING (15 minutes)
- Launch Descript recording
- Record screen + webcam + narration simultaneously
- Don't worry about mistakes (will edit)
- Recording automatically becomes project

PHASE 2: IMMEDIATE EDITING (45 minutes)
- Transcription processes automatically
- Edit while transcription completes
- Remove mistakes by deleting text
- Tighten pacing with gap reduction
- Reposition webcam layer if needed
- Add lower thirds or graphics

PHASE 3: AUDIO ENHANCEMENT (10 minutes)
- Apply Studio Sound (one click)
- Adjust levels if needed
- Add background music

PHASE 4: CLIENT REVIEW (async)
- Share project with Can Comment
- Client reviews on their schedule
- Adds timestamped feedback
- You receive notifications

PHASE 5: REVISIONS (30 minutes)
- Address each comment
- Mark resolved
- Client reviews again if needed

PHASE 6: FINAL EXPORT (5 minutes)
- Export optimized for platform
- Multiple formats if needed
- Upload directly or deliver file

TOTAL TIME: ~2 hours active work + async client review
vs TRADITIONAL: 8-12 hours active work + days of email back-and-forth

Roughly 75-85% less active time for this optimized tutorial workflow
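The totals can be checked directly from the phase durations. The sketch below takes the listed minutes at face value (the async client review adds no active time) and compares them with the 8-12 hour traditional estimate; the variable names are illustrative.

```python
# Active minutes per phase, as listed above (client review is async).
PHASE_MINUTES = {
    "recording": 15,
    "immediate_editing": 45,
    "audio_enhancement": 10,
    "revisions": 30,
    "final_export": 5,
}


def reduction(active_minutes, baseline_hours):
    """Fraction of active time saved relative to a baseline in hours."""
    return 1 - active_minutes / (baseline_hours * 60)


active = sum(PHASE_MINUTES.values())
low = reduction(active, 8)    # against the 8-hour traditional estimate
high = reduction(active, 12)  # against the 12-hour traditional estimate

print(f"{active} minutes active")             # 105 minutes active
print(f"{low:.0%} to {high:.0%} less time")   # 78% to 85% less time
```

The phases sum to 105 minutes, which is where the "~2 hours" figure comes from.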

Multi-Platform Content Strategy

Create once in Descript, optimize for multiple platforms without separate projects or tool chains.

Single-Source Strategy: Record/edit master version in Descript. Duplicate project for each platform. Adjust canvas size and layer positioning per platform. Export all versions from same source material.

Multi-Platform Production Workflow:

MASTER PROJECT: YouTube (16:9, 1920x1080)
- Record/edit complete content
- Full-length format (10-15 minutes)
- Comprehensive coverage of topic

PLATFORM ADAPTATIONS:

YouTube Shorts (9:16, 1080x1920):
- Duplicate master project
- Change canvas to 9:16
- Reposition layers for vertical
- Extract 60-second highlight segment
- Export

Instagram Reels (9:16, 1080x1920):
- Use same 9:16 project as Shorts
- Trim to 30-60 seconds
- Add text overlays (Instagram style)
- Export

TikTok (9:16, 1080x1920):
- Use 9:16 project
- Add hook in first 3 seconds
- 15-60 second edit
- Export

LinkedIn (1:1, 1080x1080):
- Duplicate master project
- Change canvas to 1:1
- Adjust composition for square
- 2-3 minute professional cut
- Export

Twitter/X (16:9, 1280x720):
- Use master project
- Export lower resolution
- 2-3 minute highlight version

RESULT FROM ONE RECORDING SESSION:
- 1 full YouTube video
- 1 YouTube Short
- 1 Instagram Reel
- 1 TikTok video
- 1 LinkedIn post
- 1 Twitter video

Traditional Approach: Create each separately (6x work)
Descript Approach: Create once, adapt efficiently (20% additional time for all platforms)
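Changing the canvas from 16:9 to 9:16 or 1:1 means only part of the original frame fits without letterboxing, which is why layers need repositioning. The sketch below computes the largest centered region of a source frame for a target aspect ratio. It is a planning aid with a hypothetical function name, not a description of what Descript does internally.

```python
def center_crop(src_w, src_h, ratio_w, ratio_h):
    """Largest centered region of the source with aspect ratio_w:ratio_h.

    Returns (x, y, width, height) in source pixels, rounded down.
    """
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        h = src_h
        w = src_h * ratio_w // ratio_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        w = src_w
        h = src_w * ratio_h // ratio_w
    return (src_w - w) // 2, (src_h - h) // 2, w, h


print(center_crop(1920, 1080, 1, 1))    # (420, 0, 1080, 1080)
print(center_crop(1920, 1080, 9, 16))   # (656, 0, 607, 1080)
print(center_crop(1920, 1080, 16, 9))   # (0, 0, 1920, 1080)
```

A vertical cut keeps only about a third of the width of a 1920x1080 screen recording, so plan to zoom or reposition the screen layer so the important UI stays in view.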

Monetization Opportunities

Collaboration-Enabled Services

Descript's collaboration features enable service offerings impossible with traditional tools. Your ability to provide real-time collaboration, instant feedback loops, and transparent workflows creates premium value for clients.

Service Package: White-Label Video Production

Agencies and businesses need video content but lack in-house expertise. Your collaborative workflow enables seamless client partnership.

Package Deliverables:

  • Complete video production from concept to final delivery
  • Client collaboration via shared Descript projects
  • Real-time feedback and revision cycles
  • Multi-platform optimization (YouTube, social media, website)
  • Transparent process (clients see work in progress)
  • Fast turnaround enabled by collaboration efficiency

Pricing Structure:

PROJECT-BASED PRICING:

Tutorial/Educational Video (5-10 min):
- Recording and editing
- Client collaboration (2 revision rounds)
- Multi-platform exports (YouTube + 2 social platforms)
- Price: $800-1,200
- Turnaround: 5-7 days

Product Demo (3-5 min):
- Professional screen recording
- Script development
- Enhanced audio/visuals
- Client review via Descript
- Price: $600-900
- Turnaround: 3-5 days

Testimonial/Case Study (2-3 min):
- Interview recording/editing
- B-roll integration
- Multi-platform versions
- Price: $500-750
- Turnaround: 3-5 days

RETAINER MODEL:
4 videos/month (mixed types): $2,500-3,500
8 videos/month: $4,500-6,000
Priority turnaround included

VALUE PROPOSITION: "Transparent video production with client collaboration at every step. Review work in progress, provide timestamped feedback, and see revisions instantly, all in one platform. No email confusion, no version control nightmares."

Service Package: Done-With-You Video Creation

Teach clients to record their own content while you handle editing and post-production. Collaboration features make this hybrid model efficient.

Package Deliverables:

  • Recording setup consultation and training
  • Client records raw footage (screen/webcam)
  • You import to Descript and edit professionally
  • Client reviews via shared project
  • Final polish and delivery

Pricing Structure:

SETUP PHASE (One-time): $300-500
- Recording equipment consultation
- Descript training (1-hour session)
- Template creation for their brand
- Recording best practices guide

MONTHLY PRODUCTION:

Per Video: $200-350
- Client records and shares raw footage
- You edit, enhance audio, add graphics
- Client reviews in Descript
- You finalize and deliver

Monthly Retainer: $800-1,500
- 4-6 videos/month
- Ongoing support and consultation
- Template updates as needed

TIME INVESTMENT PER VIDEO:
- Client records: 30-45 min (their time)
- Your editing: 60-90 min
- Client review: async (their time)
- Your revisions: 15-30 min

EFFECTIVE RATE: $150-200/hour

TARGET CLIENTS:
- Solopreneurs (personal brand content)
- Small business owners (marketing videos)
- Course creators (lesson recordings)
- Consultants (thought leadership content)
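The effective rate follows from the per-video price and your own editing plus revision time. A minimal sketch of that calculation, using only the figures listed above (the function name is illustrative):

```python
def hourly_rate(price, edit_min, revision_min):
    """Effective hourly rate for one video, counting only your own time."""
    hours = (edit_min + revision_min) / 60
    return price / hours


best = hourly_rate(350, 60, 15)      # top price, fastest turnaround
worst = hourly_rate(200, 90, 30)     # bottom price, slowest turnaround
middle = hourly_rate(275, 75, 22.5)  # midpoints of each range

print(f"best ${best:.0f}/h, middle ${middle:.0f}/h, worst ${worst:.0f}/h")
# best $280/h, middle $169/h, worst $100/h
```

The midpoint case sits inside the quoted $150-200/hour range; the extremes show how sensitive the rate is to editing time, so track your real minutes per video before committing to a price.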

Positioning Strategy

Your Competitive Advantage: "Collaborative video production that eliminates traditional bottlenecks. Unlike agencies that work in black boxes, I involve clients in real-time review and revision—cutting approval cycles from weeks to days while maintaining complete creative control."

Client Acquisition: Demonstrate collaboration workflow in sales calls (share sample project, show commenting in action), offer pilot project at reduced rate to prove efficiency, emphasize time savings (quantify days saved vs traditional), provide case studies showing revision cycle speed.

MODULE 6: Export Optimization & Advanced Monetization

Master platform-specific export settings, optimization techniques, and build a sustainable Descript-based content business generating consistent revenue.

From Creation to Cash Flow

Perfect content means nothing if exported incorrectly or monetized poorly. This final module teaches platform-specific optimization ensuring maximum quality and performance, then reveals advanced monetization strategies that transform Descript skills into sustainable income streams. You'll learn to position services, price strategically, and build systems that generate $5,000-15,000+ monthly.

Export Quality

Platform-Perfect

Revenue Potential

$5K-15K/mo

Business Model

Scalable

Platform-Specific Export Optimization

Understanding Export Settings

Export settings determine final file quality, size, and compatibility. Wrong settings waste upload time, reduce quality, or cause playback issues.

Key Export Parameters:

  • Resolution: Pixel dimensions (1920x1080, 3840x2160, etc.). Higher = better quality but larger files.
  • Frame Rate: Frames per second (24, 30, 60 fps). Match source footage or platform requirements.
  • Codec: Compression method (H.264, H.265). H.264 = universal compatibility, H.265 = better compression.
  • Bitrate: Data rate determining quality/size balance. Higher = better quality, larger file.
  • Audio Settings: Codec (AAC standard), bitrate (128-320 kbps), sample rate (48kHz standard).

Export Settings by Platform:

YOUTUBE:
Resolution: 1920x1080 (1080p) or 3840x2160 (4K)
Frame Rate: 24, 30, or 60 fps (match source)
Codec: H.264
Bitrate: 8-12 Mbps (1080p), 35-45 Mbps (4K)
Audio: AAC, 256 kbps, 48kHz

INSTAGRAM FEED (1:1):
Resolution: 1080x1080
Frame Rate: 30 fps
Codec: H.264
Bitrate: 5-8 Mbps
Audio: AAC, 128 kbps, 48kHz
Max Duration: 60 seconds

INSTAGRAM REELS/TIKTOK (9:16):
Resolution: 1080x1920
Frame Rate: 30 fps
Codec: H.264
Bitrate: 5-8 Mbps
Audio: AAC, 128 kbps, 48kHz
Max Duration: 90 seconds (Reels), 10 min (TikTok)

LINKEDIN:
Resolution: 1920x1080 or 1080x1080
Frame Rate: 30 fps
Codec: H.264
Bitrate: 5-10 Mbps
Audio: AAC, 128 kbps

PODCAST AUDIO:
Format: MP3 or AAC
Bitrate: 128 kbps (mono) or 192 kbps (stereo)
Sample Rate: 44.1kHz or 48kHz
Loudness: -16 LUFS
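Bitrate and duration together determine file size, which is worth estimating before a long upload. The sketch below uses the basic relation size = bitrate × duration / 8 and ignores container overhead, so treat the results as approximations. The function name is illustrative.

```python
def estimated_size_mb(video_mbps, audio_kbps, duration_s):
    """Approximate file size in megabytes (1 MB = 10**6 bytes).

    Ignores container overhead and assumes a constant bitrate.
    """
    total_mbps = video_mbps + audio_kbps / 1000
    return total_mbps * duration_s / 8  # megabits -> megabytes


# 10-minute YouTube 1080p export at 10 Mbps video + 256 kbps audio
print(round(estimated_size_mb(10, 256, 600), 1))  # 769.2

# 90-second Reel at 8 Mbps video + 128 kbps audio
print(round(estimated_size_mb(8, 128, 90), 1))    # 91.4
```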

Descript Export Process

Descript simplifies exports with platform presets and custom control for advanced users.

Basic Export: File → Export → Choose format (Video or Audio). Select preset (YouTube, Instagram, etc.) or Custom. Choose destination and filename. Click Export. Descript processes and saves file.

Platform Presets: Descript includes optimized presets for major platforms. These handle resolution, codec, and bitrate automatically. Use presets for: standard uploads, when unsure about technical settings, fast exports without configuration.

Custom Settings: Advanced users can override presets. Access: Export → Custom Settings. Adjust resolution, frame rate, codec, bitrate manually. Use for: specific client requirements, non-standard platforms, file size optimization, maximum quality for archival.

Export Workflow Strategy:

STANDARD PROJECTS: Use Descript presets
- Fastest export
- Proven quality
- Platform-optimized

HIGH-STAKES PROJECTS: Custom settings with manual review
- Client-specific requirements
- Maximum quality needed
- Specific file size targets

MULTI-PLATFORM STRATEGY: Export once per platform
- YouTube preset → 1080p master
- Instagram preset → 1:1 version
- TikTok preset → 9:16 version
Queue all exports simultaneously

ARCHIVAL MASTER: Custom: Maximum quality
- Resolution: Source resolution
- Codec: H.264 High Profile
- Bitrate: Maximum (50+ Mbps)
- Purpose: Future re-editing/re-exporting

Time Management:
- Exports process in background
- Continue working on other projects
- Batch export multiple videos overnight

Caption and Subtitle Export

Captions improve accessibility and engagement. Descript generates captions automatically from transcripts.

Caption Export Options: Burned-in (visible in video, cannot be disabled), SRT file (separate file, user-controllable), VTT file (web video standard), embedded (included in video file, toggle-able on platforms supporting captions).

When to Use Each: Burned-in captions for social media (auto-play without sound, captions ensure engagement). SRT/VTT files for YouTube, Vimeo (professional, accessible, multiple languages possible). Embedded captions for accessibility compliance, professional video platforms.

Caption Strategy by Platform:

SOCIAL MEDIA (Instagram, TikTok, LinkedIn):
Format: Burned-in captions
Style: Large, bold, high contrast
Position: Center or lower-third
Why: Auto-play muted, captions drive engagement

YOUTUBE:
Format: SRT file upload separately
Why: User-controllable, SEO benefit, multi-language support
Also: YouTube auto-generates, but yours are more accurate

VIMEO/PROFESSIONAL:
Format: VTT or embedded
Why: Professional appearance, accessibility compliance

CLIENT DELIVERABLES:
Provide: Both burned-in version AND separate SRT
Why: Maximum flexibility for client's distribution needs

CAPTION STYLING (Burned-in):
- Font: Sans-serif, bold
- Size: Large (readable on mobile)
- Color: White text, black background (high contrast)
- Position: Lower-third (doesn't obscure faces)
- Animation: Fade in/out per sentence

Export Process:
1. Descript automatically transcribes
2. Review/correct transcript for accuracy
3. Export → Select caption format
4. Customize styling if burned-in
5. Export with captions included
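Descript writes SRT files for you, but knowing the format helps when a client asks for a small fix after delivery. An SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank line. The sketch below builds one from hypothetical cue data; the cue timings and text are made up for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical cues for illustration only.
cues = [
    (0.0, 2.4, "Welcome to the tutorial."),
    (2.4, 5.0, "Let's open the settings panel."),
]
print(to_srt(cues))
```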

Quality Control Before Export

Final review prevents embarrassing mistakes and ensures professional delivery.

Pre-Export Checklist:

  • Full Playthrough: Watch entire video start to finish. Check for awkward cuts, audio glitches, visual errors.
  • Audio Levels: Consistent volume throughout. No clipping (red meters). Background music balanced under voice.
  • Transcript Accuracy: All text spelled correctly (affects captions/SEO). Proper punctuation and capitalization.
  • Visual Elements: Graphics appear at correct times. Layers positioned properly. No unintended elements visible.
  • Branding: Logo present if required. Colors match brand guidelines. Consistent styling throughout.
  • Export Settings: Correct resolution for platform. Appropriate aspect ratio. Captions configured correctly.

Professional QC Process:

STEP 1: Internal Review (You)
- Watch on large screen
- Listen with quality headphones
- Check every transition
- Verify all text elements
- Confirm timing of all graphics

STEP 2: Test Export
- Export 30-second sample
- Review sample at actual export quality
- Catches export-specific issues
- Confirms settings are correct

STEP 3: Fresh Eyes Review
- Step away for 1+ hours
- Return with fresh perspective
- Catch things you missed initially

STEP 4: Client Review (if applicable)
- Share via Descript (Can Comment)
- Collect all feedback
- Make revisions
- Re-review after changes

STEP 5: Platform Test (Critical Projects)
- Upload to private/unlisted
- View on actual platform
- Check mobile and desktop
- Verify captions work
- Confirm quality meets standards

STEP 6: Final Export
- All issues resolved
- Settings confirmed correct
- Export final master
- Archive project files

Common Issues Caught by QC:
- Jump cuts that feel awkward
- Background music too loud
- Captions with spelling errors
- Graphics appearing at wrong times
- Audio glitches at edit points
- Incorrect aspect ratio
- Color/brightness inconsistencies

Time Investment: 30-60 min QC saves hours of re-work and protects reputation.

Building a Sustainable Descript Business

Service Packaging Strategy

Successful Descript businesses offer clear, value-based packages rather than hourly rates. Packages set expectations and improve profitability.

The Three-Tier Model: Offer three service levels (Basic, Professional, Premium) with clear value differentiation. This gives clients choice while steering most toward profitable middle tier.

Complete Service Package Structure:

TIER 1: ESSENTIAL EDIT
Price: $400-600 per video
Includes:
- Professional editing (text-based workflow)
- Filler word removal
- Studio Sound audio enhancement
- Basic color correction
- Single platform optimization (YouTube 16:9)
- 1 revision round
Turnaround: 5-7 days
Best For: Budget-conscious clients, simple content

TIER 2: PROFESSIONAL PRODUCTION (Most Popular)
Price: $800-1,200 per video
Includes:
- Everything in Essential, plus:
- Custom graphics and lower thirds
- Advanced composition/layers
- Multi-platform export (3 formats)
- Background music integration
- Burned-in captions
- 2 revision rounds
- Priority support
Turnaround: 3-5 days
Best For: Businesses, course creators, professional content

TIER 3: PREMIUM COMPLETE
Price: $1,500-2,500 per video
Includes:
- Everything in Professional, plus:
- Script consultation/development
- Screen recording assistance
- AI voice corrections (Overdub)
- Advanced motion graphics
- Multi-platform optimization (5+ formats)
- Thumbnail design
- SEO optimization (titles, descriptions)
- Unlimited revisions
- 24-hour rush available
Turnaround: 2-3 days
Best For: High-value content, corporate clients, launches

ADD-ON SERVICES (Extra Revenue):
- Thumbnail design: +$50-100
- Additional platform version: +$100-150
- Rush delivery (24hr): +50% project cost
- Caption translation: +$150 per language
- Audiogram creation (5 clips): +$200
- YouTube SEO package: +$150
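A small quote calculator keeps tier and add-on pricing consistent across proposals. The sketch below uses the price ranges listed above. It assumes the rush surcharge applies to the base tier price only, since "project cost" could also be read as including add-ons; the names are illustrative.

```python
# (low, high) price ranges in USD, taken from the tiers and add-ons above.
TIERS = {
    "essential": (400, 600),
    "professional": (800, 1200),
    "premium": (1500, 2500),
}
ADD_ONS = {
    "thumbnail": (50, 100),
    "extra_platform": (100, 150),
    "caption_translation": (150, 150),  # per language
    "audiograms": (200, 200),           # 5 clips
    "youtube_seo": (150, 150),
}
RUSH_SURCHARGE = 0.50  # assumption: applied to the base tier price only


def quote(tier, add_ons=(), rush=False):
    """Return a (low, high) quote range in USD."""
    low, high = TIERS[tier]
    rush_low = low * RUSH_SURCHARGE if rush else 0
    rush_high = high * RUSH_SURCHARGE if rush else 0
    for name in add_ons:
        add_low, add_high = ADD_ONS[name]
        low += add_low
        high += add_high
    return low + rush_low, high + rush_high


print(quote("professional", ["thumbnail", "extra_platform"], rush=True))
# (1350.0, 2050.0)
```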

Retainer Model for Recurring Revenue

Retainers provide predictable income and deeper client relationships. Monthly packages create stable business foundation.

Retainer Benefits: Guaranteed monthly income, priority scheduling, deeper understanding of client needs, reduced sales effort (maintain vs acquire), compound efficiency (templates reused monthly), client loyalty and long-term relationships.

Retainer Package Structure:

STARTER RETAINER: $1,500-2,000/month
Includes:
- 4 videos per month (up to 10 min each)
- Professional tier service level
- Dedicated project manager (you)
- Monthly strategy call
- 48-hour standard turnaround
- Rollover: 1 unused video to next month

GROWTH RETAINER: $2,800-3,500/month
Includes:
- 8 videos per month (up to 10 min each)
- Professional tier service level
- Priority 24-hour turnaround
- Bi-weekly strategy calls
- Custom templates for brand consistency
- Multi-platform optimization all videos
- Rollover: 2 unused videos

ENTERPRISE RETAINER: $5,000-8,000/month
Includes:
- 16 videos per month (up to 15 min each)
- Premium tier service level
- Dedicated account management
- Weekly strategy sessions
- Rush delivery available
- Full content consultation
- Team training (client's team uses Descript)
- Rollover: 4 unused videos

RETAINER ADVANTAGES:

Client Side:
- Predictable monthly cost
- Priority access to your schedule
- Consistent quality/branding
- No per-project negotiations

Your Side:
- Stable monthly income ($1,500-8,000)
- Easier scheduling and planning
- Higher lifetime client value
- Efficiency gains (reusable templates)

Pricing Strategy:
- 10-20% discount vs per-project pricing
- Clients commit 3-6 months minimum
- Auto-renewal encouraged
- Annual payment: additional 10% discount

Target Clients for Retainers:
- Content marketing agencies
- Course creators (ongoing lessons)
- B2B SaaS (regular feature demos)
- YouTube channels (consistent uploads)
- Podcasters (video versions)

Conversion Path:
1. Client hires for single project
2. Deliver exceptional results
3. Propose retainer for ongoing needs
4. Offer first month at discount
5. Auto-renew after initial period
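Before quoting a retainer, it helps to work out the implied price per video and compare it with your per-project rate. The sketch below does that for the three tiers above and applies the 10% annual prepay discount; the function names are illustrative.

```python
# Monthly price ranges (USD) and video counts from the retainer tiers above.
RETAINERS = {
    "starter": {"monthly": (1500, 2000), "videos": 4},
    "growth": {"monthly": (2800, 3500), "videos": 8},
    "enterprise": {"monthly": (5000, 8000), "videos": 16},
}


def per_video(tier):
    """Implied (low, high) price per video for a retainer tier."""
    low, high = RETAINERS[tier]["monthly"]
    count = RETAINERS[tier]["videos"]
    return low / count, high / count


def discount_vs_project(retainer_per_video, project_price):
    """Fractional discount of the retainer rate against a per-project price."""
    return 1 - retainer_per_video / project_price


def annual_prepay(monthly, discount=0.10):
    """Total for 12 months paid up front with the annual discount."""
    return monthly * 12 * (1 - discount)


for tier in RETAINERS:
    print(tier, per_video(tier))
# starter (375.0, 500.0)
# growth (350.0, 437.5)
# enterprise (312.5, 500.0)

print(round(annual_prepay(2000)))  # 21600
```

Run `discount_vs_project` with your own per-project price to confirm the retainer lands at the discount you intend. With the tier prices listed here, the per-video figure sits well below the Professional per-project range, so adjust video counts or monthly prices if you are targeting 10-20%.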

Scaling Through Systems

Move from trading time for money to building scalable systems that increase income without proportional time increase.

Scalability Strategies: Template-based workflows (reduce per-project time), documented processes (enable delegation), tool mastery (work faster), selective client acceptance (higher-value projects only), automated communication (reduce admin time), batch production (efficiency gains).

Scaling Roadmap (Solo to Team):

PHASE 1: FOUNDATION (Months 1-3)
Revenue: $2,000-5,000/month
Workload: 40-60 hours/week
Strategy:
- Accept diverse projects (build portfolio)
- Create templates for common scenarios
- Document your workflow
- Build client testimonials
- Establish pricing structure

PHASE 2: OPTIMIZATION (Months 4-6)
Revenue: $5,000-8,000/month
Workload: 40-50 hours/week
Strategy:
- Niche down to ideal clients
- Increase rates 20-30%
- Convert 2-3 clients to retainers
- Systemize repetitive tasks
- Create proposal templates

PHASE 3: LEVERAGE (Months 7-12)
Revenue: $8,000-12,000/month
Workload: 30-40 hours/week (on revenue-generating work)
Strategy:
- Hire VA for admin tasks ($500-800/month)
- Train junior editor for basic edits ($15-20/hour)
- You focus on client relations + complex work
- Raise rates another 20%
- 50% of clients on retainers

PHASE 4: SCALE (Year 2+)
Revenue: $12,000-20,000+/month
Workload: 20-30 hours/week (strategic work)
Strategy:
- Full-time editor ($3,000-4,000/month salary)
- You handle sales, strategy, QC only
- Target enterprise clients ($5K+ retainers)
- Create done-for-you packages
- Consider agency positioning

SYSTEMS TO BUILD:
- Client onboarding checklist
- Project management workflow (Notion/Asana)
- Template library (organized by type)
- Communication templates (proposals, invoices)
- Quality control checklist
- Delivery process automation

Time Allocation (Phase 3+):
- Client acquisition: 20%
- Strategy/planning: 20%
- High-value editing: 30%
- Quality control: 15%
- Admin (delegated): 15%

Client Acquisition Strategies

Consistent client flow requires systematic lead generation and conversion processes.

Outbound Strategies: Direct outreach to potential clients, portfolio showcasing, cold email campaigns (personalized), LinkedIn connection + value-first engagement, partnership with complementary services (web designers, marketers).

Inbound Strategies: YouTube tutorials demonstrating expertise, social media presence (show before/afters), content marketing (blog about video production), SEO (rank for searches such as "video editor near me"), referral program (incentivize existing clients).

90-Day Client Acquisition Plan:

MONTH 1: FOUNDATION

Week 1-2: Portfolio Development
- Edit 3 sample videos showcasing range
- Create case studies with results
- Build simple website or portfolio page
- Set up business social media profiles

Week 3-4: Initial Outreach
- Identify 50 target clients (ideal profile)
- Connect on LinkedIn with personalized notes
- Join 5 relevant Facebook/LinkedIn groups
- Engage authentically (comments, helpful answers)

Goal: 5 discovery calls, 1-2 paid projects

MONTH 2: MOMENTUM

Week 1-2: Content Creation
- Publish 2 YouTube tutorials (Descript tips)
- Share before/after examples on social
- Write 2 blog posts about video production
- Engage in online communities daily

Week 3-4: Direct Outreach
- Email 100 potential clients (personalized)
- Offer free video audit (15-min review)
- Follow up with previous month's connections
- Ask existing clients for referrals

Goal: 8 discovery calls, 3-4 paid projects

MONTH 3: SCALING

Week 1-2: Authority Building
- Guest post on industry blog
- Podcast interview (video production topic)
- Host free webinar (video tips)
- Expand social media presence

Week 3-4: Conversion Optimization
- Refine pitch based on feedback
- Increase prices (proven demand)
- Create proposal template
- Implement referral incentive program

Goal: 10+ discovery calls, 5-6 paid projects, 1 retainer

ONGOING ACTIVITIES:
- Daily: 30 min social media engagement
- Weekly: 2-3 pieces of content
- Weekly: 10-15 outreach emails
- Monthly: Strategy review and adjustment

Success Metrics:
- Month 1: $2,000-3,000 revenue
- Month 2: $4,000-6,000 revenue
- Month 3: $6,000-10,000 revenue
- Ongoing growth from there
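To plan outreach volume, it helps to work backwards from the number of paid projects you want. The sketch below is a rough planning aid, not a forecast. The function name is hypothetical, and the conversion rates are assumptions borrowed from the Month 2 goals above (100 emails, 8 discovery calls, 3 paid projects); in practice some calls will come from content and referrals rather than email, so replace the rates with your own tracked numbers.

```python
from fractions import Fraction
from math import ceil


def outreach_needed(target_projects, call_rate, close_rate):
    """Work backwards from paid projects to discovery calls and messages.

    call_rate:  share of outreach messages that lead to a discovery call.
    close_rate: share of discovery calls that become paid projects.
    Rates are passed as Fractions so the rounding up is exact.
    """
    if target_projects < 0:
        raise ValueError("target_projects must not be negative")
    if not (0 < call_rate <= 1 and 0 < close_rate <= 1):
        raise ValueError("rates must be in the range (0, 1]")
    calls = ceil(target_projects / close_rate)
    messages = ceil(calls / call_rate)
    return calls, messages


if __name__ == "__main__":
    # Assumed rates, taken from the Month 2 goals: 8 calls per 100 emails,
    # and 3 paid projects per 8 calls.
    call_rate = Fraction(8, 100)
    close_rate = Fraction(3, 8)
    for projects in (3, 4):
        calls, messages = outreach_needed(projects, call_rate, close_rate)
        print(f"{projects} projects: {calls} calls, {messages} messages")
```

With those assumed rates, 3 paid projects requires 8 calls and 100 messages, while 4 paid projects requires 11 calls and 138 messages. At the ongoing pace of 10-15 outreach emails per week, that gap is several weeks of outreach, which is why the plan pairs email with content and referrals.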

🎯 Your Path Forward

Congratulations on Completing Descript Mastery

You've mastered the complete Descript ecosystem—from text-based editing fundamentals to advanced composition, AI voice technology, audio enhancement, collaboration workflows, and monetization strategies. You now possess skills that separate professional content creators from amateurs.

What You've Learned:

  • Module 1: Text-based editing foundation that makes you up to 10x faster than traditional editors
  • Module 2: Advanced composition and layer management for broadcast-quality visuals
  • Module 3: AI voice cloning and Overdub for eliminating re-recording waste
  • Module 4: Studio Sound and audio processing for professional-quality output
  • Module 5: Screen recording and collaboration for seamless client workflows
  • Module 6: Export optimization and business systems for sustainable income

Your 30-Day Action Plan:

WEEK 1: PORTFOLIO BUILDING
- Create 3 sample projects showcasing different styles
- Document your process (before/after examples)
- Set up basic website or portfolio page
- Define your niche and ideal client profile

WEEK 2: SYSTEMS SETUP
- Create service packages (3-tier structure)
- Develop proposal template
- Set up project management system
- Build template library for common projects

WEEK 3: CLIENT ACQUISITION
- Identify 30 potential clients
- Reach out with personalized messages
- Offer value-first (free audit, tips)
- Join relevant online communities

WEEK 4: DELIVERY & REFINEMENT
- Land and deliver first paid projects
- Collect testimonials
- Refine pricing based on market response
- Begin ongoing content marketing

BEYOND 30 DAYS:
- Consistent outreach (10-15 per week)
- Regular content creation (1-2 pieces/week)
- Ongoing skill refinement
- Scale through systems and hiring

Your Revenue Trajectory:
- Month 1: $2,000-4,000 (foundation)
- Month 3: $5,000-8,000 (momentum)
- Month 6: $8,000-12,000 (optimization)
- Month 12: $12,000-20,000+ (scale)

This is achievable with Descript mastery + consistent execution.

Final Thoughts: From Skills to Income

Technical Descript mastery is only half the equation. Your success depends on positioning these skills as valuable services that solve real business problems.

Remember: clients don't buy video editing. They buy faster time-to-market for their content, a professional brand image, engagement with their audience, freedom from production bottlenecks, and scalable content systems.

Your Competitive Advantages: Speed (text-based editing is 5-10x faster), collaboration (real-time feedback vs email chains), versatility (recording + editing + audio in one tool), adaptability (quick updates via Overdub), professionalism (broadcast-quality output from home studio).

The Descript ecosystem eliminates traditional video production friction. You've learned to leverage this advantage. Now execute consistently, deliver exceptional results, and build systems that scale. Your first $10,000 month is not just possible—it's probable with proper execution.

Thank you for completing Descript Mastery. Your journey from learner to professional starts now. Go create exceptional content, deliver outstanding value, and build the business you deserve.