Make your own ChatGPT-based Voice Assistant

In this tutorial, we will make our own ChatGPT-based voice assistant using an ESP32. If you read this article carefully and perform each step correctly, you should be able to build this project without trouble.

Hardware Preparation

Let’s start with the hardware. To make this project, you will need the following components.

  • ESP32 development board
  • I2S microphone
  • I2S audio amplifier
  • Speaker
  • Infrared sensor module
  • TP4056 Battery Charger Module
  • Battery

Connect all components correctly according to the schematic. Next, design a custom printed circuit board (PCB) for it. Cut a large hole in the PCB so that the speaker can be easily fixed to it.

You can download DFM (design for manufacturability) analysis software for Windows, which analyzes your PCB file in detail and generates a report listing any problems with the board. With the help of this report, you can modify the PCB to fix the problems and then send the design to the manufacturer.

After receiving the PCB, solder the parts one by one. Male header pins were used so that the components can be reused after the project. There is only one surface-mount component in this project, the HD7333 voltage regulator IC. After soldering all the components, the final project looks simple and compact.

Getting the API Keys

Now that the hardware part is done, let’s get the API keys and write the code. There are two API keys to get, one from Google Cloud and the other from OpenAI.

Google Cloud API key

To get the API key from Google Cloud, visit the Google Cloud website and sign in with your Google account. Next, search for “speech to text” and click the link; you’ll see that you can start with $300 of free API credit. After clicking “Free Trial”, select your country and enter your company name.

At this point you’ll need to provide your credit card information. Google will charge a small fee to verify the card, so don’t worry when you see it. Once you have provided your card details and entered your CVV number and OTP, the first step is complete. In the second step, you’ll need to provide further information, such as proof of credit card ownership and proof of identity.

In the past, providing the credit card was enough, but the rules have changed: you now have to upload documents proving that the credit card is yours, as well as proof of identity. As proof of the card, prepare a document showing the front and back of your credit card. Upload both sides together, because in this case only the name is on the front and the card number is on the back. You can hide sensitive information such as the expiration date and CVV code, and show only the last four digits of the card number. As proof of identity, upload your PAN card; the name on the identity document must exactly match the name on the credit card.

Once you’ve uploaded both documents, click Submit and wait a few days for the review. When it is complete, refresh the page and go to your Google Cloud account homepage. The general process for enabling the Speech-to-Text API is as follows: search for “Speech to Text”, click “Speech to Text API”, and then click “Enable”. Then go to “Credentials”, choose to create credentials, and select “API Key”. An API key is generated automatically. Copy it and save it on your computer, because it will be needed later in the code.
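For orientation, here is roughly what a request authorized by this key looks like. This is a minimal Python sketch, not the code from the project: it assumes the public `speech:recognize` REST endpoint of the Speech-to-Text v1 API, raw 16-bit PCM audio at 16 kHz, and the `en-IN` language code. The function names and the placeholder key are hypothetical. The sketch only builds the request; it does not send it.

```python
import base64
import json
import urllib.request

# Assumed endpoint: Speech-to-Text v1 synchronous recognition.
STT_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_stt_request(api_key, pcm_bytes, language_code="en-IN", sample_rate=16000):
    """Build (but do not send) a recognize request for raw 16-bit PCM audio."""
    body = {
        "config": {
            "encoding": "LINEAR16",          # uncompressed 16-bit samples
            "sampleRateHertz": sample_rate,  # assumed microphone sample rate
            "languageCode": language_code,   # assumed language setting
        },
        # The REST API expects the audio bytes as a base64 string.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    }
    return urllib.request.Request(
        url=f"{STT_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response_json):
    """Return the top transcript from a recognize response, or "" if none."""
    results = response_json.get("results", [])
    if not results:
        return ""
    alternatives = results[0].get("alternatives", [])
    return alternatives[0].get("transcript", "") if alternatives else ""


if __name__ == "__main__":
    # Placeholder key and a few milliseconds of silence, for illustration only.
    request = build_stt_request("YOUR_GOOGLE_API_KEY", b"\x00\x00" * 160)
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the parsed JSON reply can then be handed to `extract_transcript`.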

OpenAI API key

That completes the steps for getting the speech-to-text API key from Google Cloud. The next step is to get the OpenAI API key. Visit the OpenAI website and sign in with your Google account. After that, fill in the basic details and click “Agree”. If you go to the “Usage” section, you’ll see that you initially get $5 of free credit to use the API.

Then go to the “API Keys” page, where you first need to verify your mobile phone number before you can create a key. Once the verification is done, give the key a name such as “ChatGPT”. You will get a secret key. Again, copy it and save it on your computer, because you will not be able to view it again.

Once you’ve got the key, verify that everything works with OpenAI. Open the Postman application and click File > New > HTTP Request. In the “Headers” section, add two key-value pairs: Content-Type set to application/json, and Authorization set to “Bearer” followed by a space and the API key you just created.

In the “Body”, paste the content that was used previously, which includes fields for the model name, prompt, temperature and maximum tokens, and click Send. The request now fails with an error saying that the model has been deprecated. This has become a very common error recently, and the solution is to check the official OpenAI page on deprecated models. There you can see that the old model was shut down on January 4 and which new model replaces it. Copy the new model name, paste it into the model field of the body, and try again. This time you get a full response from OpenAI, which means everything is fine. If this new model is also shut down in the future, you now know how to replace it and keep the project working.
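The same check can be scripted instead of using Postman. The following is a minimal Python sketch under stated assumptions: it targets OpenAI's legacy completions endpoint (the one that takes a `prompt` field), and the model name `gpt-3.5-turbo-instruct`, the temperature and the token limit are illustrative defaults, not values taken from the project. Substitute whatever model the OpenAI deprecations page currently recommends. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: the legacy completions API, which accepts a "prompt" field.
COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key, prompt, model="gpt-3.5-turbo-instruct",
                             temperature=0.7, max_tokens=100):
    """Build (but do not send) a completion request with the two headers."""
    body = {
        "model": model,              # replace this when a model is retired
        "prompt": prompt,
        "temperature": temperature,  # illustrative default
        "max_tokens": max_tokens,    # illustrative default
    }
    return urllib.request.Request(
        url=COMPLETIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Return the text of the first choice, or "" if there is none."""
    choices = response_json.get("choices", [])
    return choices[0].get("text", "").strip() if choices else ""


if __name__ == "__main__":
    # Placeholder key; no network call is made here.
    request = build_completion_request("YOUR_OPENAI_API_KEY", "Say hello")
    print(request.get_method(), request.full_url)
    print(json.loads(request.data))
```

Keeping the model name as a parameter means that a future deprecation only requires changing one value.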

Code Integration

Now that we have the two API keys, it’s time to start coding and see how to integrate them in the code.

For the code part, you need to visit this GitHub repository:

Open the “ESP32 ChatGPT v2” repository and download the ZIP file. After unzipping it, you will find two folders: Speech to Text and Text to Speech. Open the Speech to Text folder first.

This folder contains several library files. Keep them all in the same folder and do not move or replace them; a common mistake is to separate the Arduino sketch from the library files. Double-clicking the Arduino sketch file in this folder opens it in the Arduino IDE.

The code itself lists all the precautions you need to follow to upload it from your system and get it working properly.

First, you need to install version 1.0.6 of the ESP32 board package. To get this version, go to File > Preferences in the Arduino IDE and paste the specified link into the Additional Boards Manager URLs field. Then go to Tools > Board > Boards Manager, type in “ESP32” and select version 1.0.6.

That is the first prerequisite. Second, all files must be located in the same folder, named “ESP32TextToSpeech”, as already mentioned. Third, all credentials, such as the WiFi SSID, password, Google Cloud key and OpenAI API key, need to be entered into a single header file called “credentials.h”. This is simpler than before, when the credentials were spread across separate header files.

In credentials.h, first provide the SSID and password of your WiFi router. Next, provide the Google Speech-to-Text API key and select the language to use, which is currently set to Hindi English. After that come the ChatGPT credentials: first the ChatGPT API token, then the OpenAI model. The earlier model was called Davinci, but it no longer works; the new model is GPT-3.5 Turbo. Last are the temperature and maximum-token variables, which you can learn more about in the ChatGPT documentation.

All credentials are entered into this one header file. It is the only file you need to change; everything else is left as is. Once everything is filled in, you can upload the code to the ESP32 board. First select the board: go to Tools > Board > ESP32 Boards, scroll down and select “ESP32 DevKitV1”. After that, select the correct COM port and click the Upload button to upload the speech-to-text code to the ESP32 board.

Next, we move on to the code for the text-to-speech functionality, which is much simpler. Go back to the folder you downloaded from GitHub and open the “TextToSpeech” folder this time. There is only one Arduino file here, so open it. Again, there are a few prerequisites to make sure the code works properly on your system.

First, you still need version 1.0.6 of the ESP32 board package. Second, you need to install the Audio.h library, the link to which is given in the code. Follow that link to open the GitHub repository and click “Code” > “Download ZIP” to download the ZIP file. Then go to Sketch > Include Library > Add .ZIP Library in the Arduino IDE, browse to the folder where you downloaded the library, select the ZIP file, and confirm. This adds the library to your Arduino IDE. Make sure you add the exact library mentioned in the code; many people have installed a different Audio.h library and run into problems.

Third, you need to enter the WiFi credentials directly in this same Arduino file. The WiFi SSID and password are all you need to change in this code; the rest stays the same.

Now, after selecting the correct board and port, click the Upload button again to upload the text-to-speech code to the second ESP32 board. This is the easiest way to get the project running on your own setup without errors.

Test Run

After uploading the code to both ESP32 boards, power up the device and ask ChatGPT some questions. Wave your hand in front of the proximity sensor; the light turns red, which means the device is listening to your voice. When the light turns green, your voice has been sent to Google Cloud for speech-to-text conversion. Once the text is available, it is sent to the OpenAI servers, and the light turns blue. After all the processing, the response from ChatGPT is received and played through the speaker.
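When debugging, it can help to think of one question-and-answer cycle as a fixed sequence of stages, each with its own light color. The sketch below models that sequence in Python purely as an illustration. The stage names and the `run_cycle` function are hypothetical and do not come from the project code, and the light color during playback is not stated in this article, so it is left as `None`.

```python
# Stage order and light colors as described in the test run above.
STAGES = [
    ("listening", "red"),       # microphone is recording the question
    ("transcribing", "green"),  # audio has been sent to Google Cloud
    ("asking", "blue"),         # text has been sent to the OpenAI servers
    ("speaking", None),         # reply is played; color not stated in the article
]


def run_cycle(hand_detected):
    """Return the (stage, color) steps for one cycle, or [] if nothing triggers it."""
    if not hand_detected:
        return []
    return list(STAGES)


if __name__ == "__main__":
    for stage, color in run_cycle(hand_detected=True):
        print(f"{stage}: light is {color if color else 'unspecified'}")
```

If the light stops at a given color, the corresponding stage is the first place to look: for example, a light that stays green points at the Google Cloud request rather than at OpenAI.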
