ggerganov committed on
Commit 5cb7243 · unverified · 1 Parent(s): 411c667

talk.wasm : update README.md

examples/talk.wasm/README.md CHANGED
@@ -1,7 +1,38 @@
- # talk
 
- WIP IN PROGRESS
 
- ref: https://github.com/ggerganov/whisper.cpp/issues/154
 
- demo: https://talk.ggerganov.com
+ # talk.wasm
 
+ Talk with an Artificial Intelligence entity in your browser:
 
+ https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4
 
+ Online demo: https://talk.ggerganov.com
+
+ ## How it works
+
+ This demo uses two modern neural network models to provide high-quality voice chat directly in your browser:
+
+ - [OpenAI's Whisper](https://github.com/openai/whisper) speech recognition model processes your voice to understand what you are saying
+ - Upon receiving voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
+ - The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
+
+ The web page does the processing locally on your machine. However, in order to run the models, it first needs to
+ download the model data, which is about 350 MB. The model data is then stored in your browser's cache and can be reused
+ on future visits without downloading it again.
+
+ Processing these heavy neural network models in the browser is made possible by implementing them efficiently in C/C++
+ and using WebAssembly SIMD capabilities for extra performance. For more detailed information, check out the
+ [current repository](https://github.com/ggerganov/whisper.cpp).
+
+ ## Requirements
+
+ In order to run this demo efficiently, you need the following:
+
+ - Latest Chrome or Firefox browser (Safari is not supported)
+ - A desktop or laptop with a modern CPU (a mobile phone will likely not be good enough)
+ - Spoken phrases no longer than 10 seconds, which is the audio context of the AI
+ - About 1.4 GB of RAM, which is what the web page uses
+
+ ## Feedback
+
+ If you have any comments or ideas for improvement, please drop a comment in the following discussion:
+
+ https://github.com/ggerganov/whisper.cpp/discussions/167
examples/talk.wasm/index-tmpl.html CHANGED
@@ -46,6 +46,10 @@
 
  <hr>
 
+ Select the models you would like to use and click the "Start" button to begin the conversation
+
+ <br><br>
+
  <div id="model-whisper">
  <span id="model-whisper-status">Whisper model:</span>
  <button id="fetch-whisper-tiny-en" onclick="loadWhisper('tiny.en')">tiny.en (75 MB)</button>