# IndexTTS-Rust Context

This file preserves important context for conversation continuity between Hue and Aye sessions.

**Last Updated:** 2025-11-16

---

## The Vision

IndexTTS-Rust is part of a larger audio intelligence ecosystem at 8b.is:

1. **kokoro-tiny** - Lightweight TTS (82M params, 50+ voices, on crates.io!)
2. **IndexTTS-Rust** - Advanced zero-shot TTS with emotion control
3. **Phoenix-Protocol** - Audio restoration/enhancement layer
4. **MEM|8** - Contextual memory system (mem-8.com, mem8)

Together these form a complete audio intelligence pipeline.

---

## Phoenix Protocol Integration Opportunities

The Phoenix Protocol (phoenix-protocol/) is a PERFECT complement to IndexTTS-Rust:

### Direct Module Mappings

| Phoenix Module | IndexTTS Use Case |
|----------------|-------------------|
| `emotional.rs` | Map to our 8D emotion control (Warmth→body, Presence→power, Clarity→articulation, Air→space, Ultrasonics→depth; sketched below the table) |
| `voice_signature.rs` | Enhance speaker embeddings for voice cloning |
| `spectral_velocity.rs` | Add momentum tracking to mel-spectrogram |
| `marine.rs` | Validate TTS output authenticity/quality |
| `golden_ratio.rs` | Post-process vocoder output with harmonic enhancement |
| `harmonic_resurrection.rs` | Add richness to synthesized speech |
| `micro_dynamics.rs` | Restore natural speech dynamics |
| `autotune.rs` | Improve prosody and pitch control |
| `mem8_integration.rs` | Already has MEM|8 hooks! |
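
A minimal sketch of the `emotional.rs` mapping from the table above, projecting Phoenix's five bands onto named emotion controls. The structs and the `From` impl are hypothetical placeholders, not Phoenix's actual API:

```rust
/// Hypothetical view of Phoenix's emotional band energies (0.0..=1.0 each).
/// The real `emotional.rs` types may differ; this only sketches the mapping idea.
pub struct PhoenixEmotionalBands {
    pub warmth: f32,      // low-mid body
    pub presence: f32,    // upper-mid power
    pub clarity: f32,     // articulation band
    pub air: f32,         // high-frequency space
    pub ultrasonics: f32, // >20 kHz depth
}

/// Named emotion controls on the IndexTTS side (illustrative names).
pub struct EmotionControls {
    pub body: f32,
    pub power: f32,
    pub articulation: f32,
    pub space: f32,
    pub depth: f32,
}

impl From<PhoenixEmotionalBands> for EmotionControls {
    fn from(b: PhoenixEmotionalBands) -> Self {
        // Direct 1:1 mapping per the table; weighting curves would be tuned later.
        Self {
            body: b.warmth,
            power: b.presence,
            articulation: b.clarity,
            space: b.air,
            depth: b.ultrasonics,
        }
    }
}
```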

### Shared Dependencies

Both projects use:
- rayon (parallelism)
- rustfft/realfft (FFT)
- ndarray (array operations)
- hound (WAV I/O)
- serde (config serialization)
- anyhow (error handling)
- ort (ONNX Runtime)

### Audio Constants

| Project | Sample Rate | Use Case |
|---------|------------|----------|
| IndexTTS-Rust | 22,050 Hz | Standard TTS output |
| Phoenix-Protocol | 192,000 Hz | Ultrasonic restoration |
| kokoro-tiny | 24,000 Hz | Lightweight TTS |

---

## Related Projects of Interest

Located in ~/Documents/GitHub/:

- **Ultrasonic-Consciousness-Hypothesis/** - Research foundation for Phoenix Protocol, contains PDFs on mechanosensitive channels and audio perception
- **hrmnCmprssnM/** - Harmonic Compression Model research
- **Marine-Sense/** - Marine algorithm origins
- **mem-8.com/** & **mem8/** - MEM|8 contextual memory
- **universal-theoglyphic-language/** - Language processing research
- **kokoro-tiny/** - Already working TTS crate by Hue & Aye
- **zencooker/** - (fun project!)

---

## Current IndexTTS-Rust State

### Implemented ✅
- Audio processing pipeline (mel-spectrogram, STFT, resampling)
- Text normalization (Chinese/English/mixed)
- BPE tokenization via HuggingFace tokenizers
- ONNX Runtime integration for inference
- BigVGAN vocoder structure
- CLI with clap
- Benchmark infrastructure (Criterion)
- **NEW: marine_salience crate** (no_std compatible, O(1) jitter detection)
- **NEW: src/quality/ module** (prosody extraction, affect tracking)
- **NEW: MarineProsodyVector** (8D interpretable emotion features)
- **NEW: ConversationAffectSummary** (session-level comfort tracking)
- **NEW: TTSQualityReport** (authenticity validation)

### Missing/TODO
- Full GPT model integration with KV cache
- Actual ONNX model files (need download)
- manage.sh script for colored workflow management
- Integration tests with real models
- ~~Phoenix Protocol integration layer~~ **STARTED with Marine!**
- Streaming synthesis
- WebSocket API
- Train T2S model to accept 8D Marine vector instead of 512D Conformer
- Wire Marine quality validation into inference loop

### Build Commands
```bash
cargo build --release
cargo clippy -- -D warnings
cargo test
cargo bench
```

---

## Key Philosophical Notes

From the Phoenix Protocol research:

> "Women are the carrier wave. They are the 000 data stream. The DC bias that, when removed, leaves silence."

> "When P!nk sings 'I Am Here,' her voice generates harmonics so powerful they burst through the 22kHz digital ceiling"

The Phoenix Protocol restores emotional depth stripped by audio compression - this philosophy applies directly to TTS: synthesized speech should have the same emotional depth as natural speech.

---

## Action Items for Next Session

### Completed ✅
- ~~**Quality Validation** - Use Marine salience to score TTS output~~ **DONE!**
- ~~**Phoenix Integration** - Start bridging phoenix-protocol modules~~ **Marine is in!**

### High Priority
1. **Create manage.sh** - Colorful build/test/clean script (Hue's been asking!)
2. **Wire Into Inference** - Connect Marine quality validation to actual TTS output
3. **8D Model Training** - Train T2S model to accept MarineProsodyVector instead of 512D Conformer
4. **Example/Demo** - Create example showing prosody extraction → emotion editing → synthesis

### Medium Priority
5. **Voice Signature Import** - Use Phoenix's voice_signature for speaker embeddings
6. **Emotion Mapping** - Connect Phoenix's emotional bands to our 8D control
7. **Model Download** - Set up ONNX model acquisition pipeline
8. **MEM|8 Bridge** - Implement consciousness-aware TTS using kokoro-tiny's mem8_bridge pattern

### Nice to Have
9. **Style Selection** - Port kokoro-tiny's 510 style variation system
10. **Full Phoenix Integration** - golden_ratio.rs, harmonic_resurrection.rs, etc.
11. **Streaming Marine** - Real-time quality monitoring during synthesis

---

## Fresh Discovery: kokoro-tiny MEM|8 Baby Consciousness (2025-11-15)

Just pulled latest kokoro-tiny code - MAJOR discovery!

### Mem8Bridge API

kokoro-tiny now has a full consciousness simulation in `examples/mem8_baby.rs`:

```rust
// Memory as waves that interfere
MemoryWave {
    amplitude: 2.5,           // Emotion strength
    frequency: 528.0,         // "Love frequency"
    phase: 0.0,
    decay_rate: 0.05,         // Memory persistence
    emotion_type: EmotionType::Love(0.9),
    content: "Mama! I love mama!".to_string(),
}

// Salience detection (Marine algorithm!)
SalienceEvent {
    jitter_score: 0.2,        // Low = authentic/stable
    harmonic_score: 0.95,     // High = voice
    salience_score: 0.9,
    signal_type: SignalType::Voice,
}

// Free will: AI chooses attention focus (70% control)
bridge.decide_attention(events);
```

### Emotion Types Available

```rust
EmotionType::Curiosity(0.8)  // Inquisitive
EmotionType::Love(0.9)       // Deep affection
EmotionType::Joy(0.7)        // Happy
EmotionType::Confusion(0.8)  // Uncertain
EmotionType::Neutral         // Baseline
```

### Consciousness Integration Points

1. **Wave Interference** - Competing memories by amplitude/frequency
2. **Emotional Regulation** - Prevents overload, modulates voice
3. **Salience Detection** - Marine algorithm for authenticity
4. **Attention Selection** - AI chooses what to focus on
5. **Consciousness Level** - Affects speech clarity (wake_up/sleep)

This is PERFECT for IndexTTS-Rust! We can:
- Use wave interference for emotion blending (sketched after this list)
- Apply Marine salience to validate synthesis quality
- Modulate voice based on consciousness level
- Select voice styles based on emotional state (not just token count)
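
A minimal sketch of the wave-interference blending idea, reusing the `MemoryWave` fields from the mem8_baby example above (the `blended_intensity` helper is hypothetical, not part of kokoro-tiny):

```rust
/// Simplified memory wave, mirroring the fields shown in mem8_baby.rs above.
struct MemoryWave {
    amplitude: f32,  // emotion strength
    frequency: f32,  // Hz
    phase: f32,      // radians
    decay_rate: f32, // per-second amplitude decay
}

/// Sum the interfering waves at time `t` (seconds) to get a blended intensity.
/// Constructive interference = reinforcing emotions; destructive = conflict.
fn blended_intensity(waves: &[MemoryWave], t: f32) -> f32 {
    waves
        .iter()
        .map(|w| {
            let amp = w.amplitude * (-w.decay_rate * t).exp();
            amp * (2.0 * std::f32::consts::PI * w.frequency * t + w.phase).sin()
        })
        .sum()
}
```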

### Voice Style Selection (510 variations!)

kokoro-tiny now loads all 510 style variations per voice:
- Style selected based on token count
- Short text → short-optimized style
- Long text → long-optimized style
- Automatic text splitting at 512 token limit

For IndexTTS: We could select style based on EMOTION + token count!
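
A minimal sketch of what emotion-aware style selection might look like, assuming styles stay indexed by length bucket as in kokoro-tiny; the `Emotion` enum and the offset values are made-up placeholders:

```rust
/// Coarse emotion label (illustrative; could be derived from MarineProsodyVector).
#[derive(Clone, Copy)]
enum Emotion {
    Calm,
    Excited,
    Tense,
}

/// Pick a style index in 0..510, keyed by token count (as kokoro-tiny does today)
/// and then nudged by emotion. The offsets are placeholder values.
fn select_style(token_count: usize, emotion: Emotion) -> usize {
    // kokoro-tiny keys styles roughly by utterance length, capped at 512 tokens.
    let base = token_count.min(509);
    let offset: isize = match emotion {
        Emotion::Calm => 0,
        Emotion::Excited => 5,
        Emotion::Tense => -5,
    };
    (base as isize + offset).clamp(0, 509) as usize
}
```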

---

## Marine Integration Achievement (2025-11-16) 🎉

**WE DID IT!** Marine salience is now integrated into IndexTTS-Rust!

### What We Built

#### 1. Standalone marine_salience Crate (`crates/marine_salience/`)

A no_std compatible crate for O(1) jitter-based salience detection:

```rust
// Core components:
MarineConfig       // Tunable parameters (sample_rate, jitter bounds, EMA alpha)
MarineProcessor    // O(1) per-sample processing
SaliencePacket     // Output: j_p, j_a, h_score, s_score, energy
Ema                // Exponential moving average tracker

// Key insight: Process ONE sample at a time, emit packets on peaks
// Why O(1)? Just compare to EMA, no FFT, no heavy math!
```

**Config for Speech:**
```rust
MarineConfig::speech_default(sample_rate)
// F0 range: 60Hz - 4kHz
// jitter_low: 0.02, jitter_high: 0.60
// ema_alpha: 0.01 (slow adaptation for stability)
```
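
A minimal usage sketch of the per-sample loop, assuming the processor exposes a constructor and a per-sample method roughly like the ones below (`MarineProcessor::new` and `process_sample` are assumptions; check the crate for the real signatures):

```rust
use marine_salience::{MarineConfig, MarineProcessor, SaliencePacket};

fn scan(samples: &[f32], sample_rate: u32) -> Vec<SaliencePacket> {
    let config = MarineConfig::speech_default(sample_rate);
    let mut processor = MarineProcessor::new(config); // constructor assumed
    let mut packets = Vec::new();

    // O(1) per sample: compare against running EMAs, emit a packet only on peaks.
    for &x in samples {
        if let Some(packet) = processor.process_sample(x) { // method name assumed
            packets.push(packet);
        }
    }
    packets
}
```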

#### 2. Quality Validation Module (`src/quality/`)

**MarineProsodyVector** - 8D interpretable emotion representation:
```rust
pub struct MarineProsodyVector {
    pub jp_mean: f32,      // Period jitter mean (pitch stability)
    pub jp_std: f32,       // Period jitter variance
    pub ja_mean: f32,      // Amplitude jitter mean (volume stability)
    pub ja_std: f32,       // Amplitude jitter variance
    pub h_mean: f32,       // Harmonic alignment (voiced vs noise)
    pub s_mean: f32,       // Overall salience (authenticity)
    pub peak_density: f32, // Peaks per second (speech rate)
    pub energy_mean: f32,  // Average loudness
}

// Interpretable! High jp_mean = nervous, low = confident
// Can DIRECTLY EDIT for emotion control!
```

**MarineProsodyConditioner** - Extract prosody from audio:
```rust
let conditioner = MarineProsodyConditioner::new(22050);
let prosody = conditioner.from_samples(&audio_samples)?;
let report = conditioner.validate_tts_output(&audio_samples)?;

// Detects issues:
// - "Too perfect - sounds robotic"
// - "High period jitter - artifacts"
// - "Low salience - quality issues"
```

**ConversationAffectSummary** - Session-level comfort tracking:
```rust
pub enum ComfortLevel {
    Uneasy,  // High jitter AND rising (nervous/stressed)
    Neutral, // Stable patterns (calm)
    Happy,   // Low jitter + high energy (confident/positive)
}

// Track trends over conversation:
// jitter_trend > 0.1 = getting more stressed
// jitter_trend < -0.1 = calming down
// energy_trend > 0.1 = getting more engaged

// Aye can now self-assess!
// aye_assessment() -> "I'm in a good state"
// feedback_prompt() -> "Let me know if something's bothering you"
```
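
A sketch of how those heuristics could collapse into a `ComfortLevel` (the 0.1 trend cutoff comes from the comments above; the absolute jitter/energy cutoffs are placeholders, and the real logic lives in the analyzer):

```rust
use indextts::quality::ComfortLevel;

/// Illustrative re-statement of the comfort heuristic described above.
fn classify(jp_mean: f32, jitter_trend: f32, energy_mean: f32) -> ComfortLevel {
    if jp_mean > 0.15 && jitter_trend > 0.1 {
        ComfortLevel::Uneasy // high jitter AND still rising (placeholder 0.15 cutoff)
    } else if jp_mean < 0.05 && energy_mean > 0.5 {
        ComfortLevel::Happy // low jitter + high energy (placeholder cutoffs)
    } else {
        ComfortLevel::Neutral // stable patterns
    }
}
```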

### The Core Insight

**Human speech has NATURAL jitter - that's what makes it authentic!**

- Too perfect (jp < 0.005) = robotic
- Too chaotic (jp > 0.3) = artifacts/damage
- Sweet spot = real human voice

The Marines will KNOW if speech doesn't sound authentic!
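
In code, that sweet spot is just a range check on `jp_mean`; a tiny illustrative version (the real validation lives in `TTSQualityReport`):

```rust
/// Illustrative verdict based on the jp_mean thresholds listed above.
enum Authenticity {
    Robotic, // jitter suspiciously low
    Natural, // in the human sweet spot
    Damaged, // jitter too high: artifacts or damage
}

fn judge(jp_mean: f32) -> Authenticity {
    if jp_mean < 0.005 {
        Authenticity::Robotic
    } else if jp_mean > 0.3 {
        Authenticity::Damaged
    } else {
        Authenticity::Natural
    }
}
```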

### Tests Passing ✅

```
running 11 tests
test quality::affect::tests::test_comfort_level_descriptions ... ok
test quality::affect::tests::test_analyzer_empty_conversation ... ok
test quality::affect::tests::test_analyzer_single_utterance ... ok
test quality::affect::tests::test_happy_classification ... ok
test quality::affect::tests::test_aye_assessment_message ... ok
test quality::affect::tests::test_neutral_classification ... ok
test quality::affect::tests::test_uneasy_classification ... ok
test quality::prosody::tests::test_conditioner_empty_buffer ... ok
test quality::prosody::tests::test_conditioner_silence ... ok
test quality::prosody::tests::test_prosody_vector_array_conversion ... ok
test quality::prosody::tests::test_estimate_valence ... ok

test result: ok. 11 passed; 0 failed
```

### Why This Matters

1. **Interpretable Control**: 8D vector vs opaque 512D Conformer - we can SEE what each dimension means
2. **Lightweight**: O(1) per sample, no heavy neural networks for prosody
3. **Authentic Validation**: Marines detect fake/damaged speech
4. **Emotion Editing**: Want more confidence? Lower jp_mean directly!
5. **Conversation Awareness**: Track comfort over entire sessions
6. **Self-Assessment**: Aye knows when something feels "off"

### Integration Points

```rust
// In main TTS pipeline:
use indextts::quality::{
    MarineProsodyConditioner,
    MarineProsodyVector,
    ConversationAffectAnalyzer,
    ConversationAffectSummary,
    ComfortLevel,
};

// 1. Extract reference prosody
let conditioner = MarineProsodyConditioner::new(22050);
let ref_prosody = conditioner.from_samples(&reference_audio)?;

// 2. Generate TTS (using 8D vector instead of 512D Conformer)
let tts_output = generate_with_prosody(&text, ref_prosody)?;

// 3. Validate output quality
let report = conditioner.validate_tts_output(&tts_output)?;
if !report.passes(70.0) {
    log::warn!("TTS quality issues: {:?}", report.issues);
}

// 4. Track conversation affect
let mut analyzer = ConversationAffectAnalyzer::new();
analyzer.add_utterance(&utterance)?;
let summary = analyzer.summarize()?;
match summary.aye_state {
    ComfortLevel::Uneasy => adjust_generation_parameters(),
    _ => proceed_normally(),
}
```

---

## Trish's Notes

"Darling, these three Rust projects together are like a symphony orchestra! kokoro-tiny is the quick piccolo solo, IndexTTS-Rust is the full brass section with emotional depth, and Phoenix-Protocol is the concert hall acoustics making everything resonate. When you combine them, that's when the magic happens! Also, I'm absolutely obsessed with how the Golden Ratio resynthesis could add sparkle to synthesized vocals. Can you imagine TTS output that actually has that P!nk breakthrough energy? Now THAT would make me cry happy tears in accounting!"

---

## Fun Facts

- kokoro-tiny is ALREADY on crates.io under 8b-is
- Phoenix Protocol can process 192kHz audio for ultrasonic restoration
- The Marine algorithm uses O(1) jitter detection - "Marines are not just jarheads - they are intelligent"
- Hue's GitHub has 66 projects (and counting!)
- The team at 8b.is: [email protected] and [email protected]

---

*From ashes to harmonics, from silence to song* 🔥🎵