Rise of the ChatBots (2) - They can hear and speak

- Introduction
- Integrate Speech-to-Text with ChatGPT Library
- Integrate Speech-to-Text with direct API call (Rust)
- Integrate Text-to-Speech with ChatGPT Library
- Integrate Text-to-Speech with direct API call (Rust)
Introduction
Integrating Speech-to-Text and Text-to-Speech with OpenAI's models lets an application convert spoken language into written text and vice versa. With the recent updates to OpenAI's library, developers can easily incorporate these features into their software, whether through the Python library or direct API calls from Rust.
Get ready to unlock new avenues of interaction within your projects by harnessing the power of speech recognition and synthesis with OpenAI's cutting-edge technology.
Integrate Speech-to-Text with ChatGPT Library
OpenAI has just updated its library, and in Python, it is now possible to transcribe an audio file with just a few lines of code.
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Create the OpenAI client with your API key
# (loaded from an environment variable or secret management service)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Declare the path for your file
file = "question.m4a"

# Open the file in binary mode
audio_file = open(file, "rb")

# Transcription
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    # You can choose your output language!
    language="en"
)

# Print the result
print(transcript.text)
You will notice that it is possible to influence the output language... For example, you can speak in French with language="en" and the transcription will come out in English. Strictly speaking, OpenAI documents the language parameter as a hint giving the language of the input audio, which improves accuracy and latency.
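If you want guaranteed English output regardless of the spoken language, OpenAI also exposes a dedicated translations endpoint. Here is a minimal sketch of calling it directly from Rust, in the same style as the next section; the endpoint URL and response shape come from OpenAI's documentation, while the function name translate_audio is mine.
use std::{error::Error, path::Path};
use reqwest::{multipart, Client};

// The translations endpoint returns the same JSON shape as transcriptions.
#[derive(serde::Deserialize)]
struct TranslationResponse {
    text: String,
}

// Sketch: translate any spoken language to English text.
pub async fn translate_audio(file_path: &Path) -> Result<String, Box<dyn Error>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;

    // Read the audio file and wrap it in a multipart form.
    let file_content = tokio::fs::read(file_path).await?;
    let part = multipart::Part::bytes(file_content)
        .file_name("audio.m4a")
        .mime_str("audio/m4a")?;
    let form = multipart::Form::new()
        .part("file", part)
        .text("model", "whisper-1");

    // Send the request and parse the JSON response.
    let response = Client::new()
        .post("https://api.openai.com/v1/audio/translations")
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await?
        .error_for_status()?;

    let parsed = response.json::<TranslationResponse>().await?;
    Ok(parsed.text)
}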
Integrate Speech-to-Text with direct API call (Rust)
The code below sends an audio file to OpenAI's servers for transcription and handles the response, with proper error handling in place. You can check out the full source code from my GitHub repo: https://github.com/claziosi/RustAI
use std::path::Path;
use reqwest::{multipart, Client};

// Define a structure for deserializing the response JSON.
// Note: The actual structure may vary depending on OpenAI's API response format.
#[derive(serde::Deserialize)]
struct TranscriptionResponse {
    text: String,
}

const API_URL: &str = "https://api.openai.com/v1/audio/transcriptions";

pub async fn transcription(file_path: &Path) -> Result<String, Box<dyn std::error::Error>> {
    let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
    let model_name = "whisper-1";

    // Read the file content into a byte vector
    let file_content = tokio::fs::read(file_path).await?;

    // Create a multipart form
    let part = multipart::Part::bytes(file_content)
        .file_name("audio.m4a")
        .mime_str("audio/m4a")?; // Make sure to set the correct MIME type for your audio file
    let form = multipart::Form::new()
        .part("file", part)
        .text("model", model_name.to_string());

    // Build the client and make the request
    let client = Client::new();
    let response = client
        .post(API_URL)
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await?;

    if response.status().is_success() {
        // Deserialize the JSON body into our response structure
        match response.json::<TranscriptionResponse>().await {
            Ok(transcription_response) => Ok(transcription_response.text),
            Err(_) => Err("Failed to parse JSON response".into()),
        }
    } else {
        Err(format!("Error making request: {:?}", response.status()).into())
    }
}
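As a quick check, here is a minimal usage sketch, assuming the transcription function above is in scope and tokio's macros feature is enabled (question.m4a is just an example file name):
use std::path::Path;

#[tokio::main]
async fn main() {
    // Transcribe a local audio file and print the result.
    match transcription(Path::new("question.m4a")).await {
        Ok(text) => println!("Transcript: {}", text),
        Err(e) => eprintln!("Transcription failed: {}", e),
    }
}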
Integrate Text-to-Speech with ChatGPT Library
The Audio API provides a speech endpoint based on our TTS (text-to-speech) model. It comes with 6 built-in voices and can be used to:
- Narrate a written blog post
- Produce spoken audio in multiple languages
- Give real time audio output using streaming (see the streaming sketch after the Python example below)
source: https://platform.openai.com/docs/guides/text-to-speech
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Save the generated audio next to this script
speech_file_path = Path(__file__).parent / "speech.mp3"

# Generate speech with the built-in "alloy" voice
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)

# Write the audio stream to the MP3 file
response.stream_to_file(speech_file_path)
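The snippet above buffers the whole file before writing it. For the real time streaming use case mentioned in the list, here is a hedged sketch in Rust, in the same direct-API style as the next section; it assumes reqwest is built with the stream feature, that the futures-util and serde_json crates are available, and speech_streaming is a name chosen for illustration.
use std::{error::Error, fs::File, io::Write};
use futures_util::StreamExt;
use reqwest::Client;

// Sketch: write the TTS response to disk chunk by chunk instead of
// buffering the whole audio file in memory.
pub async fn speech_streaming(input_text: &str) -> Result<(), Box<dyn Error>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;

    // Same endpoint and body as the Rust function in the next section.
    let body = serde_json::json!({
        "model": "tts-1",
        "voice": "alloy",
        "input": input_text,
    });

    let response = Client::new()
        .post("https://api.openai.com/v1/audio/speech")
        .bearer_auth(api_key)
        .json(&body)
        .send()
        .await?
        .error_for_status()?;

    // Consume the body as a stream and write each chunk as it arrives.
    let mut file = File::create("speech.mp3")?;
    let mut stream = response.bytes_stream();
    while let Some(chunk) = stream.next().await {
        file.write_all(&chunk?)?;
    }
    Ok(())
}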
Integrate Text-to-Speech with direct API call (Rust)
This code defines an asynchronous function called text_to_speech, which sends a JSON payload to OpenAI's text-to-speech endpoint and saves the resulting audio stream as an .mp3 file.
use std::{path::PathBuf, error::Error, fs::File, io::Write};
use reqwest::Client;

// Define a structure for the request body.
#[derive(serde::Serialize)]
struct TextToSpeechRequest {
    model: String,
    input: String,
    voice: String,
}

pub async fn text_to_speech(input_text: &str, voice: &str) -> Result<PathBuf, Box<dyn Error>> {
    const API_URL: &str = "https://api.openai.com/v1/audio/speech";
    let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");

    // Prepare the request body.
    let body = TextToSpeechRequest {
        model: "tts-1".to_string(),
        input: input_text.to_string(),
        voice: voice.to_string(),
    };

    // Create an HTTP client instance.
    let client = Client::new();

    // Perform the POST request.
    let response_bytes = client
        .post(API_URL)
        .bearer_auth(api_key)
        .json(&body)
        .send()
        .await?
        .error_for_status()? // Ensure we have a successful response code (e.g., 2xx).
        .bytes()
        .await?;

    // Write the received bytes into an MP3 file.
    let output_path = PathBuf::from("speech.mp3");
    let mut file = File::create(&output_path)?;
    file.write_all(&response_bytes)?;

    Ok(output_path)
}
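And a minimal usage sketch, again assuming the function above is in scope (the input text and voice are just examples):
#[tokio::main]
async fn main() {
    // Generate speech with the "alloy" voice and report where the MP3 landed.
    match text_to_speech("Today is a wonderful day!", "alloy").await {
        Ok(path) => println!("Audio saved to {}", path.display()),
        Err(e) => eprintln!("Text-to-speech failed: {}", e),
    }
}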
We are now able to talk to bots and hear their responses.