talk.wasm : update README.md
examples/talk.wasm/README.md
CHANGED
|
@@ -1,7 +1,38 @@
|
|
| 1 |
-
# talk
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
|
| 7 |
-
demo: https://talk.ggerganov.com
| 1 |
+
# talk.wasm
|
| 2 |
|
| 3 |
+
Talk with an Artificial Intelligence entity in your browser:
|
| 4 |
|
| 5 |
+
https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4
|
| 6 |
|
| 7 |
+
Online demo: https://talk.ggerganov.com
|
| 8 |
+
|
| 9 |
+
## How it works
|
| 10 |
+
|
| 11 |
+
This demo uses two modern neural network models to create a high-quality voice chat directly in your browser:
|
| 12 |
+
|
| 13 |
+
- [OpenAI's Whisper](https://github.com/openai/whisper) speech recognition model is used to process your voice and understand what you are saying
|
| 14 |
+
- Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
|
| 15 |
+
- The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
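
The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the demo's actual code: `makeTalkPipeline`, `transcribe`, `generate`, and `speakWithBrowser` are hypothetical names, and the recognition and generation stages are passed in as plain functions so the sketch makes no assumptions about how Whisper or GPT-2 are invoked. Only `speechSynthesis` and `SpeechSynthesisUtterance` come from the Web Speech API.

```javascript
// Hypothetical three-stage loop: audio -> text -> reply -> speech.
// The stage functions are injected by the caller.
function makeTalkPipeline({ transcribe, generate, speak }) {
  return function handleUtterance(audio) {
    const heard = transcribe(audio); // stage 1: speech recognition
    const reply = generate(heard);   // stage 2: text generation
    speak(reply);                    // stage 3: vocalize the reply
    return { heard, reply };
  };
}

// Stage 3 via the Web Speech API. Returns false when the API is not
// available (for example outside a browser) instead of throwing.
function speakWithBrowser(text) {
  if (typeof speechSynthesis === "undefined" ||
      typeof SpeechSynthesisUtterance === "undefined") {
    return false;
  }
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

In a browser, `speakWithBrowser` could be passed as the `speak` stage; the other two stages would wrap whatever the page uses to run the models.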
|
| 16 |
+
|
| 17 |
+
The web page does the processing locally on your machine. However, in order to run the models, it first needs to
|
| 18 |
+
download the model data, which is about 350 MB. The model data is then stored in your browser's cache and can be reused
|
| 19 |
+
in future visits without downloading it again.
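
The download-once behavior described above follows a cache-first pattern, which might look roughly like the sketch below. It is an assumption-laden illustration: `loadModelCached`, `store`, and `download` are hypothetical, and `store` is any Map-like object standing in for whatever persistent browser storage the page actually uses.

```javascript
// Cache-first loader (illustrative only).
// `store`    : Map-like object, a stand-in for persistent browser storage.
// `download` : hypothetical async function returning the bytes for a model.
async function loadModelCached(name, store, download) {
  if (store.has(name)) {
    // Fast path: a previous visit already stored the data.
    return { data: store.get(name), fromCache: true };
  }
  const data = await download(name); // slow path: first visit only
  store.set(name, data);             // keep it for future visits
  return { data, fromCache: false };
}
```

With this shape, the expensive download runs only when the store has no entry for the requested model.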
|
| 20 |
+
|
| 21 |
+
Running these heavy neural network models in the browser is made possible by implementing them efficiently in C/C++
|
| 22 |
+
and using WebAssembly SIMD capabilities for extra performance. For more detailed information, check out the
|
| 23 |
+
[current repository](https://github.com/ggerganov/whisper.cpp).
|
| 24 |
+
|
| 25 |
+
## Requirements
|
| 26 |
+
|
| 27 |
+
In order to run this demo efficiently, you need to have the following:
|
| 28 |
+
|
| 29 |
+
- Latest Chrome or Firefox browser (Safari is not supported)
|
| 30 |
+
- Run this on a desktop or laptop with a modern CPU (a mobile phone will likely not be good enough)
|
| 31 |
+
- Speak phrases that are no longer than 10 seconds, since this is the audio context of the AI
|
| 32 |
+
- The web page uses about 1.4 GB of RAM
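
To make the 10-second limit concrete, here is a small sketch of how a recording could be trimmed to fit that window. It is not taken from the demo: `clampToAudioContext` is a hypothetical helper, the 16 kHz default reflects the sample rate Whisper models expect, and keeping the most recent samples (rather than the earliest) is an assumption.

```javascript
// Trim a mono sample buffer so it fits the audio context (illustrative).
// Assumes 16 kHz samples; keeps the most recent `maxSeconds` of audio.
function clampToAudioContext(samples, maxSeconds = 10, sampleRate = 16000) {
  const maxSamples = maxSeconds * sampleRate;
  if (samples.length <= maxSamples) {
    return samples; // already short enough
  }
  // subarray() returns a view on the same buffer, so no copy is made.
  return samples.subarray(samples.length - maxSamples);
}
```

At 16 kHz, a 10-second window is 160,000 samples, so anything spoken beyond that would not be seen by the model in this sketch.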
|
| 33 |
+
|
| 34 |
+
## Feedback
|
| 35 |
+
|
| 36 |
+
If you have any comments or ideas for improvement, please drop a comment in the following discussion:
|
| 37 |
+
|
| 38 |
+
https://github.com/ggerganov/whisper.cpp/discussions/167
|
examples/talk.wasm/index-tmpl.html
CHANGED
|
@@ -46,6 +46,10 @@
|
|
| 46 |
|
| 47 |
<hr>
|
| 48 |
|
| 49 |
<div id="model-whisper">
|
| 50 |
<span id="model-whisper-status">Whisper model:</span>
|
| 51 |
<button id="fetch-whisper-tiny-en" onclick="loadWhisper('tiny.en')">tiny.en (75 MB)</button>
|
| 46 |
|
| 47 |
<hr>
|
| 48 |
|
| 49 |
+
Select the models you would like to use and click the "Start" button to begin the conversation.
|
| 50 |
+
|
| 51 |
+
<br><br>
|
| 52 |
+
|
| 53 |
<div id="model-whisper">
|
| 54 |
<span id="model-whisper-status">Whisper model:</span>
|
| 55 |
<button id="fetch-whisper-tiny-en" onclick="loadWhisper('tiny.en')">tiny.en (75 MB)</button>
|