An unofficial AI song generation manual
Good songs tell stories, create feelings, and make people move. They don't explain themselves, apologize for existing, or try too hard to be clever.
Recent Project Examples
Quick Start
Basic API Call
curl -X POST https://api.elevenlabs.io/v1/music/detailed \
  -H "Content-Type: application/json" \
  -H "xi-api-key: YOUR_KEY" \
  -d @your_composition.json \
  --output song.mp3
Minimum Viable Composition
{
  "composition_plan": {
    "positive_global_styles": ["electronic pop", "125 BPM"],
    "negative_global_styles": ["slow", "acoustic"],
    "sections": [{
      "section_name": "Full Song",
      "positive_local_styles": ["energetic"],
      "negative_local_styles": ["boring"],
      "duration_ms": 60000,
      "lines": ["Your lyrics here"]
    }]
  }
}
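If you would rather generate the file than write it by hand, the same structure can be built in Python. This is a minimal sketch: `build_composition` and `save_composition` are hypothetical helper names, not part of any ElevenLabs SDK, and the field names are copied from the example above.

```python
import json


def build_composition(lyrics, duration_ms=60000):
    """Return a one-section composition plan shaped like the example above."""
    return {
        "composition_plan": {
            "positive_global_styles": ["electronic pop", "125 BPM"],
            "negative_global_styles": ["slow", "acoustic"],
            "sections": [
                {
                    "section_name": "Full Song",
                    "positive_local_styles": ["energetic"],
                    "negative_local_styles": ["boring"],
                    "duration_ms": duration_ms,
                    "lines": list(lyrics),
                }
            ],
        }
    }


def save_composition(composition, path):
    """Write the plan as UTF-8 JSON so accented lyrics survive intact."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(composition, handle, indent=2, ensure_ascii=False)
```

Calling `save_composition(build_composition(["Your lyrics here"]), "your_composition.json")` produces the file that the curl command in Basic API Call sends.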
JSON Structure Deep Dive
Global Styles (Apply to Entire Song)
"positive_global_styles": [
  "genre",        // Primary: "electronic pop", "dark techno", "hyperpop"
  "BPM",          // Critical: "125 BPM", "140 BPM", "90 BPM"
  "vocal_style",  // Character: "sultry", "aggressive", "theatrical"
  "production",   // Sound: "bass-heavy", "synth-driven", "minimal"
  "mood"          // Feeling: "chaotic", "mysterious", "playful"
]
Section Anatomy
{
  "section_name": "Chorus",     // Standard: Intro, Verse, Pre-Chorus, Chorus, Bridge, Outro
  "duration_ms": 24000,         // 3000-120000ms per section
  "positive_local_styles": [],  // Section-specific additions
  "negative_local_styles": [],  // Section-specific exclusions
  "lines": []                   // Lyrics with performance notes
}
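The section shape above is easy to check before you spend a generation on it. A minimal sketch, assuming the five keys and the 3000-120000 ms range quoted in the example are the ones that matter; `section_problems` is a hypothetical helper, not an API call.

```python
REQUIRED_KEYS = (
    "section_name",
    "duration_ms",
    "positive_local_styles",
    "negative_local_styles",
    "lines",
)
MIN_MS, MAX_MS = 3000, 120000  # range quoted in the section example


def section_problems(section):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in section]
    duration = section.get("duration_ms")
    if isinstance(duration, int) and not MIN_MS <= duration <= MAX_MS:
        problems.append(f"duration_ms {duration} outside {MIN_MS}-{MAX_MS}")
    return problems
```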
Performance Notation in Lyrics
"lines": [
  "Main lyric line here",
  "[whispered] Soft delivery",
  "(ad-lib) Yeah! Uh!",
  "[falsetto] High note section",
  "(Marx! Marx!) Background chants",
  "[gasping] Breathless delivery"
]
Style Engineering
The Power of Negative Styles
Negative styles are MORE POWERFUL than you think. Use them to prevent:
- Genre bleeding:
"negative": ["country", "folk", "jazz"]
- Energy drops:
"negative": ["slow", "ballad", "ambient"]
- Vocal issues:
"negative": ["monotone", "spoken word", "whispered"]
BPM Sweet Spots
- 70-90: Slow burn, dark, menacing
- 90-110: Hip-hop, trip-hop, groovy
- 120-128: Pop, house, standard dance
- 140: Dubstep, trap, aggressive
- 160-180: Drum & bass, hardcore
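The ranges above can be kept as a small lookup so a chosen tempo is checked against the intended feel. A minimal sketch: the table is copied from the list, while `bpm_feels` and `bpm_tag` are hypothetical helper names.

```python
BPM_RANGES = [
    (70, 90, "Slow burn, dark, menacing"),
    (90, 110, "Hip-hop, trip-hop, groovy"),
    (120, 128, "Pop, house, standard dance"),
    (140, 140, "Dubstep, trap, aggressive"),
    (160, 180, "Drum & bass, hardcore"),
]


def bpm_feels(bpm):
    """Return every label whose range contains bpm (ranges touch at 90)."""
    return [label for low, high, label in BPM_RANGES if low <= bpm <= high]


def bpm_tag(bpm):
    """Format a tempo the way the style lists in this guide write it."""
    return f"{bpm} BPM"
```

A tempo that falls between ranges, such as 115, returns an empty list, which is a prompt to double-check the choice rather than an error.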
Vocal Style Matrix
Style | Use For | Avoid With |
theatrical | Drama, humor | Minimal production |
breathy | Intimate, sexy | Heavy bass |
aggressive | Punk, metal | Slow tempos |
falsetto | Emotional peaks | Low-energy sections |
chanting | Hooks, hypnotic | Complex lyrics |
Avoiding AI Weirdness
❌ COMMON AI FAILS
- Non-words: "Satisfize", "Chorizo-ing"
- Awkward scansion: Lines that don't fit the beat
- Random melodic jumps: Unsingable intervals
- Energy mismatches: Soft vocals over aggressive beats
✅ PREVENTION STRATEGIES
1. Natural Language Only
// BAD
"Marx-ifying your senses"
"Chorizolicious fever dream"
// GOOD
"Marx is calling out your name"
"Chorizo fever in my brain"
2. Syllable Counting
Count syllables to match rhythm:
- In 4/4 time, lines of 4 or 8 syllables (or multiples of them) tend to work best
- Leave space for breath between phrases
- Test by speaking lyrics in rhythm
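Counting by hand gets tedious across a full song, so a rough estimator can flag lines worth a second look. A minimal sketch using a vowel-group heuristic; `estimate_syllables` and `line_syllables` are hypothetical helpers, and the heuristic will miscount some English words, so treat the numbers as a prompt to speak the line aloud rather than as ground truth.

```python
import re


def estimate_syllables(word):
    """Estimate syllables by counting vowel groups, minus a silent final e."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing "e" as in "name" is usually silent; "-le" as in "table" is not.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def line_syllables(line):
    """Sum the per-word estimates for one lyric line."""
    return sum(estimate_syllables(word) for word in line.split())
```

On the two GOOD lines from Natural Language Only, the estimator gives 7 and 8 syllables, which matches a spoken count.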
3. Concrete Imagery
// VAGUE (AI struggles)
"Feelings of desire"
// SPECIFIC (AI succeeds)
"Behind the freezer door"
4. Musical Directions
Be explicit about delivery:
"positive_local_styles": [
  "on-beat vocals",      // Prevents off-beat weirdness
  "clear enunciation",   // Prevents mumbling
  "melodic stability"    // Prevents random pitch jumps
]
Advanced Techniques
1. Hook Engineering
Create earworms through:
- Repetition: Same phrase 2-3 times
- Call-response: Question → Answer pattern
- Sonic branding: Recurring sound/chant (e.g., "Marx! Marx!")
2. Energy Mapping
Intro       ▁▁▂▂  (20% energy)
Verse 1     ▂▂▃▃  (40% energy)
Pre-Chorus  ▃▃▄▅  (60% building)
Chorus      ▅▅▆▇  (85% peak)
Verse 2     ▃▃▄▄  (50% energy)
Bridge      ▄▅▆▇  (70-90% climb)
Final       ▇▇██  (100% maximum)
Outro       ▆▄▂▁  (Fade out)
3. Transition Smoothing
Add transition cues:
"lines": [
  "...",
  "(building to chorus)",  // AI understands this
  "...",
  "[drums intensify]"      // Production cue
]
4. Multilingual Integration
"lines": [
  "English main line",
  "¡Español for emphasis!",
  "(French whisper: oh là là)"
]
5. Genre Fusion Formula
"positive_global_styles": [
  "primary_genre",    // 60% influence
  "secondary_genre",  // 30% influence
  "accent_genre"      // 10% spice
]
Genre Templates
Dark Electronic
{
  "positive_global_styles": [
    "dark techno", "140 BPM", "industrial",
    "heavy bass", "distorted vocals", "menacing"
  ],
  "negative_global_styles": [
    "happy", "major key", "acoustic", "soft"
  ]
}
Hyperpop Chaos
{
  "positive_global_styles": [
    "hyperpop", "150 BPM", "autotuned",
    "glitchy", "chaotic", "maximalist production"
  ],
  "negative_global_styles": [
    "minimal", "organic", "subtle", "relaxed"
  ]
}
Sultry R&B
{
  "positive_global_styles": [
    "R&B", "90 BPM", "sultry vocals",
    "smooth bass", "intimate", "late-night vibes"
  ],
  "negative_global_styles": [
    "aggressive", "fast", "harsh", "childish"
  ]
}
Comedy Rap
{
  "positive_global_styles": [
    "hip-hop", "95 BPM", "comedic delivery",
    "bouncy beat", "clear enunciation", "playful"
  ],
  "negative_global_styles": [
    "serious", "mumble rap", "dark", "aggressive"
  ]
}
Pro Tips from Testing
- Test Incrementally: Generate 30-second tests before full songs
- Version Control: Number your JSONs (v1, v2, v3...)
- A/B Testing: Make two versions with one variable changed
- Streaming Mode: Use /v1/music/stream for real-time preview
- Batch Generation: Queue multiple versions overnight
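- Documentation: Annotate your draft JSONs with // comments. Standard JSON has no comment syntax, so strip them before sending unless you have confirmed the API ignores them.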
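Standard JSON has no comment syntax, so a draft annotated with // may be rejected by a strict parser. A minimal sketch of one cautious approach is to strip the comments locally before sending; `strip_line_comments` and `load_commented_json` are hypothetical helpers, and the regex only handles // comments, not /* */ blocks or trailing commas.

```python
import json
import re

# Match a double-quoted JSON string (kept) or a // comment to end of line (dropped).
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_line_comments(text):
    """Remove // comments while leaving any // inside string values alone."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ""
    return TOKEN.sub(keep_strings, text)


def load_commented_json(text):
    """Parse an annotated draft into plain Python data."""
    return json.loads(strip_line_comments(text))
```

Matching strings first is what keeps a lyric containing a URL or a literal // from being truncated.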
Troubleshooting
Issue: Melody sounds "off"
Fix: Add "melodic stability" and "on-beat vocals" to positive styles
Issue: Energy drops mid-song
Fix: Add "consistent energy" globally, avoid "dynamic range"
Issue: Vocals unclear
Fix: Use "clear enunciation" and avoid "reverb-heavy"
Issue: Wrong genre bleeding in
Fix: Be aggressive with negative styles, list everything to avoid
Issue: Boring/repetitive
Fix: Vary your local styles per section, add "progressive arrangement"
Quality Checklist
Before generating, verify:
BPM specified in global styles
All sections total 60-180 seconds
Lyrics scan naturally when spoken
Performance directions in brackets/parentheses
Negative styles prevent unwanted genres
Energy arc makes sense
No made-up words or AI-speak
Concrete, specific imagery
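The first two checklist items are mechanical and can be automated. A minimal sketch, assuming the composition uses the `composition_plan` wrapper from Minimum Viable Composition and that a BPM tag looks like "125 BPM"; `preflight` is a hypothetical helper, and the remaining checklist items still need a human ear.

```python
import re

BPM_TAG = re.compile(r"^\d{2,3} BPM$")


def preflight(composition):
    """Return a list of checklist failures; an empty list means both checks pass."""
    plan = composition["composition_plan"]
    problems = []
    styles = plan.get("positive_global_styles", [])
    if not any(BPM_TAG.match(style) for style in styles):
        problems.append("no BPM tag in positive_global_styles")
    total_ms = sum(s.get("duration_ms", 0) for s in plan.get("sections", []))
    if not 60_000 <= total_ms <= 180_000:
        problems.append(f"sections total {total_ms / 1000:.0f}s, expected 60-180s")
    return problems
```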
Next-Level Composition Psychology
The AI responds to:
- Confidence: Bold, declarative style descriptions work better
- Specificity: "125 BPM" > "fast tempo"
- Cultural references: It knows genre conventions
- Production terminology: Use real music production terms
- Emotional clarity: One clear mood > mixed emotions
Remember: The AI wants to make good music. Give it clear, professional direction and it will deliver.
Document Version 1.0 | Based on extensive testing with ElevenLabs Music API
🚫 Songwriting Anti-Patterns & Solutions
How to Write Songs That Don't Sound AI-Generated
The Uncanny Valley of AI Music
What Makes a Song Sound "AI"?
- Prosody Violations - Words that don't match the rhythm
- Semantic Drift - Lyrics that lose coherent meaning
- Emotional Inconsistency - Mood swings that don't make sense
- Melodic Randomness - Notes that don't follow musical logic
- Energy Mismatches - Production that fights the vocals
Anti-Pattern #1: The Word Salad
❌ AI-Generated Garbage
"Mystical sensations flowing through existence
Dancing particles of emotional persistence
Universe calling with synthetic dreams
Reality fracturing at quantum seams"
✅ Human-Sounding Alternative
"Late night in your apartment
City lights through the window
You're dancing in the kitchen
To a song on the radio"
Rule: Use concrete scenes, not abstract concepts
Anti-Pattern #2: The Rhythm Wrecker
❌ Broken Scansion
"I'm desperately wanting to find you" (10 syllables)
"Come to me" (3 syllables)
"The universe is calling out for our love" (11 syllables)
"Yeah" (1 syllable)
✅ Rhythmically Consistent
"I've been searching for you" (6)
"Come and find me too" (5)
"Every star above us" (6)
"Knows our love is true" (5)
Rule: Count syllables, maintain patterns
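Once each line has a syllable count, checking the pattern is simple arithmetic. A minimal sketch that takes the counts as given (counted by hand, as in the examples above); `syllable_spread` and `follows_pattern` are hypothetical helpers.

```python
def syllable_spread(counts):
    """Difference between the longest and shortest line in a stanza."""
    return max(counts) - min(counts)


def follows_pattern(counts, period=2):
    """True when every line matches the count of the line `period` lines earlier."""
    return all(counts[i] == counts[i - period] for i in range(period, len(counts)))
```

The broken example, with counts 10, 3, 11, 1, has a spread of 10 and no repeating pattern; the consistent example, 6, 5, 6, 5, has a spread of 1 and alternates cleanly.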
Anti-Pattern #3: The Fake Slang
❌ AI Trying to Be Cool
"Vibing on that fleek sauce
Getting all algorithmically boss
My neural network's on fire
Quantum entangled desire"
✅ Actual Slang That Works
"Got me feeling some type of way
Your moves are crazy, I must say
This energy's off the charts
You're playing games with my heart"
Rule: Use real slang or none at all
Anti-Pattern #4: The Mood Whiplash
❌ Emotional Chaos
Verse: "I'm so depressed and lonely"
Chorus: "PARTY TIME! WOO! YEAH!"
Bridge: "Contemplating existence..."
Outro: "RAGE AGAINST THE MACHINE!"
✅ Emotional Journey
Verse: "Starting to feel the rhythm"
Chorus: "Now we're dancing freely"
Bridge: "This moment's everything"
Outro: "Never want this night to end"
Rule: Emotions should evolve, not randomly switch
The Comedy Problem
How to Be Funny Without Being Cringe
❌ AI "Humor"
- Random references ("Banana hammock Tuesday!")
- Forced wordplay ("Meat me at the meat meet")
- Over-explaining jokes ("That's funny because...")
✅ Actually Funny
- Unexpected truth ("Your mom likes my playlist")
- Clever innuendo (implied, not stated)
- Callback humor (reference earlier lines)
- Situational absurdity (believable but weird)
The Innuendo Matrix
Subtlety Levels
Level 1 - Too Obvious ❌
"I want to put my meat in your mouth"
Level 2 - Just Right ✅
"Serving up something hot tonight"
Level 3 - Too Vague ❌
"Things are happening with stuff"
Double Meaning Masterclass
Surface Meaning | Hidden Meaning | Line Example |
Food service | Sexual tension | "Order up, coming hot" |
Temperature | Arousal | "Thermometer's rising" |
Workplace | Roleplay | "Working overtime tonight" |
Spanish food | Passion | "That chorizo heat" |
Song Structure Psychology
The Attention Curve
0-8 sec: Hook them (MUST grab attention)
8-20 sec: Set scene (establish world)
20-45 sec: Build tension (create need)
45-60 sec: Release (satisfy with chorus)
60-90 sec: Develop (add complexity)
90-120 sec: Climax (peak energy)
120+ sec: Resolution (satisfying end)
The Energy Map That Works
Intro:      ████░░░░░░ 40%
Verse 1:    ██████░░░░ 60%
Pre-Chorus: ███████░░░ 70%
Chorus:     █████████░ 90%
Verse 2:    ██████░░░░ 60%
Chorus:     █████████░ 90%
Bridge:     ████████░░ 80%
Final:      ██████████ 100%
Outro:      ████░░░░░░ 40%
Lyrical Frameworks That Never Fail
1. The Story Arc
- Setup: Where/Who/When
- Conflict: The problem/desire
- Rising: Things intensify
- Climax: Peak moment
- Resolution: How it ends
2. The List Song
- Verse 1: List examples
- Chorus: The main point
- Verse 2: More examples
- Bridge: The twist
3. The Conversation
- Verse 1: You said...
- Verse 2: I said...
- Chorus: We both know...
- Bridge: But really...
4. The Progression
- Verse 1: Beginning
- Verse 2: Middle
- Bridge: Transformation
- Final: End state
Vocal Delivery Specifications
What Actually Works
"[whispered]" - Start of intimate sections
"(ad-lib: yeah)" - Between main lines
"[falsetto]" - Emotional peaks only
"(harmony: ooh)" - Background vocals
"[spoken]" - Breakdowns, not verses
What Sounds Robotic
"[randomly yelling]"
"(constant ad-libs every line)"
"[switching delivery mid-word]"
"(unclear what this means)"
The Spanish Meat Cookbook 🌶️
Actual Spanish Meats (Use These)
- Chorizo - Spicy sausage
- Jamón ibérico - Premium ham
- Morcilla - Blood sausage
- Lomo - Pork loin
- Cecina - Cured beef
- Sobrasada - Spreadable sausage
- Butifarra - Catalan sausage
Temperature/Preparation Terms
- "Sizzling" > "Hot meating"
- "Grilled to perfection" > "Meat-ified"
- "Slow-cooked" > "Meat processing"
- "Marinated" > "Meat-soaked"
Marx Foodservice Specific Guidelines
What Works
- Marx as a chant/hook
- Specific location references (break room, freezer)
- Actual food service terminology
- College setting details
- Workplace hierarchy humor
What Doesn't
- Over-explaining the company
- Making up food service terms
- Forcing "Marx" into weird compounds
- Generic workplace references
Testing Your Lyrics
The Speak-Aloud Test
- Read your lyrics out loud
- Do they sound like something a human would say?
- Can you say them in rhythm?
- Do they make sense without music?
The Cringe Test
- Would you be embarrassed if someone found these lyrics?
- Would a stranger understand the humor?
- Is the innuendo clever or just vulgar?
- Would this work at a party?
The Energy Test
- Map energy levels 1-10 for each section
- Does the progression make sense?
- Are transitions smooth?
- Is the climax actually climactic?
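The energy test can be partly mechanized once each section has a 1-10 score. A minimal sketch, assuming a jump of more than 3 points counts as abrupt and that the peak should land in the second half of the song; both thresholds and the `energy_report` name are assumptions, not rules from the API.

```python
def energy_report(levels, max_jump=3):
    """levels is an ordered list of (section_name, energy 1-10) pairs."""
    notes = []
    for (prev_name, prev), (name, cur) in zip(levels, levels[1:]):
        if abs(cur - prev) > max_jump:
            notes.append(f"abrupt change: {prev_name} {prev} -> {name} {cur}")
    energies = [energy for _, energy in levels]
    peak_index = energies.index(max(energies))
    if peak_index < len(levels) / 2:
        notes.append(f"peak arrives early, in {levels[peak_index][0]}")
    return notes
```

A deliberate outro fade will register as an abrupt change at the default threshold, so either raise `max_jump` or leave the outro out of the list when that drop is intended.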
Common Fixes
Problem: "It sounds like a robot wrote this"
Fix: Add specific details, real places, actual emotions
Problem: "The rhythm is off"
Fix: Count syllables, emphasize the right words
Problem: "It's not funny"
Fix: Use surprise, not randomness
Problem: "Too vulgar/obvious"
Fix: Imply, don't state; suggest, don't show
Problem: "Energy is flat"
Fix: Vary section dynamics, add builds and drops
The Ultimate Checklist
Before sending to ElevenLabs:
Every line sounds natural when spoken
Syllables match the beat structure
Energy progression makes sense
Humor lands without explanation
No made-up words or compounds
Specific, concrete imagery
Emotional consistency throughout
Performance directions are clear
No "AI-isms" or robot speak