A friend of mine was looking for an AI transcription tool. After a quick Google search turned up nothing in the target price range (free), I decided to ask ChatGPT 4 to help me spin up a Python-based tool using some widely available libraries. The end result isn’t perfect, but it’s fantastic all things considered.
Step 1: Installing Python
For Windows Users:
- Download Python:
- Visit the official Python website at python.org.
- Click on the “Download Python” button. This should automatically select the version suitable for Windows.
- Install Python:
- Open the downloaded .exe file to start the installation.
- Make sure to check the box that says “Add Python X.X to PATH” at the bottom of the installation window. This step is crucial as it allows you to run Python from the Command Prompt.
- Click “Install Now” and follow the on-screen instructions to complete the installation.
For macOS Users:
- Install Homebrew (if not installed):
- Open the Terminal.
- Paste the following command and press Enter, then follow the on-screen instructions:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Python:
- After Homebrew is installed, type brew install python in the Terminal and press Enter.
For Linux Users:
- Most Linux distributions come with Python pre-installed. You can check by opening a Terminal and typing python3 --version. If it’s not installed, or you need a different version, you can install it via your distribution’s package manager. For example, on Ubuntu, you would use sudo apt-get install python3.
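Whichever platform you are on, it can help to confirm which interpreter your terminal actually picks up before installing anything else. A minimal sketch, using only the standard library (the function name describe_interpreter is made up for illustration):

```python
import sys


def describe_interpreter() -> str:
    """Return the running interpreter's version and location as one line."""
    v = sys.version_info
    return f"Python {v.major}.{v.minor}.{v.micro} at {sys.executable}"


if __name__ == "__main__":
    print(describe_interpreter())
```

Save it under any name and run it with python3; if the reported path is not the installation you expected, the PATH setting from the install step is the first thing to check.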
Step 2: Installing Required Libraries
- Open Command Prompt or Terminal:
- Windows: Open the Start menu and type “cmd”, then press Enter.
- macOS/Linux: Open the Terminal application.
- Install Libraries:
- Type the following command and press Enter:
pip3 install SpeechRecognition pydub imageio-ffmpeg
- If pip3 doesn’t work, try replacing it with pip.
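To check that the installation worked, you can ask Python whether it can find each library. The sketch below is one way to do that; note that the import names differ from the pip package names (SpeechRecognition is imported as speech_recognition, imageio-ffmpeg as imageio_ffmpeg). The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names differ from the pip package names:
# SpeechRecognition -> speech_recognition, imageio-ffmpeg -> imageio_ffmpeg.
REQUIRED_MODULES = ["speech_recognition", "pydub", "imageio_ffmpeg"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported as not installed, the usual cause is that pip3 and python3 point at different Python installations.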
Step 3: Preparing the Script
- Download the Python Script:
- Save the script, xscribe.py (downloadable here), to your Desktop or to any folder you can easily access.
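The script itself isn’t reproduced in this post, so the following is only a rough sketch of how a tool like this can be put together. It is not the actual contents of xscribe.py. It assumes Google’s free Web Speech endpoint (via SpeechRecognition’s recognize_google) as the recognizer, and it calls the ffmpeg binary shipped with imageio-ffmpeg directly instead of going through pydub, to keep the moving parts visible. The function names are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_ffmpeg_command(ffmpeg_exe, src, dst):
    """Build an ffmpeg command that converts `src` to 16 kHz mono WAV at `dst`."""
    return [ffmpeg_exe, "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def transcribe(audio_path, output_path):
    """Transcribe `audio_path` and write the recognized text to `output_path`."""
    # Third-party imports live here so the helper above works without them.
    import imageio_ffmpeg
    import speech_recognition as sr

    audio_path = Path(audio_path)
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = audio_path
        if audio_path.suffix.lower() != ".wav":
            # Convert non-WAV input using the ffmpeg binary from imageio-ffmpeg.
            wav_path = Path(tmp) / "converted.wav"
            cmd = build_ffmpeg_command(
                imageio_ffmpeg.get_ffmpeg_exe(), audio_path, wav_path
            )
            subprocess.run(cmd, check=True, capture_output=True)

        recognizer = sr.Recognizer()
        with sr.AudioFile(str(wav_path)) as source:
            audio = recognizer.record(source)  # read the whole file into memory

    # Assumption: Google's free Web Speech endpoint (needs an internet connection).
    text = recognizer.recognize_google(audio)
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

Calling transcribe("file.mp3", "transcription.txt") would convert the MP3 to a temporary WAV, send it for recognition, and write the result to the text file. Long recordings may need to be split into shorter chunks first, and recognize_google raises sr.UnknownValueError or sr.RequestError when the audio is unintelligible or the service can’t be reached.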
Step 4: Running the Script
- Open Command Prompt or Terminal again:
- Navigate to the folder where you saved xscribe.py. You can use the cd command followed by the directory path. For example, cd Desktop if you saved it on the Desktop.
- Execute the Script:
- Type the following command, replacing path/to/file with the path to your audio file and path/to/transcription.txt with the desired path for your output file. For example:
python3 xscribe.py /path/to/your/audio/file.mp3 /path/to/your/transcription.txt
- Press Enter.
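A script invoked this way has to read two paths from the command line: the audio file and the output file. How xscribe.py does this isn’t shown here; one common approach is Python’s built-in argparse module, sketched below with hypothetical argument names.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional paths; `argv=None` means use the real command line."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file to text.")
    parser.add_argument("audio_path", help="path to the input audio file")
    parser.add_argument("output_path", help="path for the transcription text file")
    return parser.parse_args(argv)
```

With this in place, parse_args(["file.mp3", "transcription.txt"]) yields an object whose audio_path and output_path attributes hold the two paths, and running the script with a missing argument prints a usage message instead of a traceback.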
Additional Notes
- For Non-WAV Files: The first time you run the script with an MP3 file, it might take a little longer because it’s downloading ffmpeg in the background.
- Troubleshooting: If you encounter any errors related to permissions (especially on macOS and Linux), you might need to prepend sudo to the pip installation command and enter your computer’s password when prompted.
Performance
But does it work? I wasn’t sure, so I asked ChatGPT to write a test script designed to push the limits of common transcription algorithms. It gave me this:
Speech Recognition Test Script
Good morning, everyone. Today, we’re embarking on a fascinating journey through the realms of artificial intelligence and machine learning. These technologies have revolutionized how we interact with the digital world, from simple tasks like asking Siri for the weather forecast to complex financial analyses.
Let’s delve into some specifics:
- Homophones: She sells sea shells by the sea shore. The principal principle of this algorithm is its principal component analysis.
- Technical Jargon: Quantum computing offers a paradigm shift in computational capabilities, leveraging qubits over binary bits. Cryptocurrency mining has become a computational arms race, with Ethereum and Bitcoin at the forefront.
- Numbers and Dates: The rover landed on Mars on February 18, 2021, marking a significant milestone. Its mission is to explore the Jezero crater, believed to be an ancient lakebed.
- Challenging Words: The pseudopseudohypoparathyroidism diagnosis was confirmed through comprehensive tests. Worcestershire sauce is often mispronounced and is a common ingredient in savory dishes.
- Conclusion: As we navigate through these advancements, it’s imperative to critically assess the implications on privacy, security, and societal norms. The future is bright, but it’s our collective responsibility to steer this technology towards the betterment of humanity.
Which resulted in this:
Note this was all on one line. Not the best for formatting purposes, but hey – you get what you pay for!
good morning everyone today we’re embarking on a fascinating Journey Through the Realms of
artificial intelligence and machine learning these technologies have revolutionized how I
interact with the digital world from simple tasks like asking Siri for the weather forecast to
complex financial analysis let’s delve into some specifics homophones she sells seashells by the
seashore the principal principal of this algorithm is its principal component analysis to
technical jargon Quantum Computing offers a paradigm shift and computational capabilities
leveraging qubits over binary bits cryptocurrency mining has become a computational 18th 2021
marketing a significant Milestone its mission is to explore the Jewish believe to be an ancient
words hypothyroidism diagnosis was confirmed through comprehensive tests or sauce is pronounced
and his common is a common ingredients conclusion as we navigate through these advancements it’s
imperative to critically assess the implications on privacy security and societal Norms the
future is bright but it’s our Collective responsibility to steer this technology towards the
betterment of humanity
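Since the transcription comes back as one long line, a small post-processing step can make it easier to read. A minimal sketch using Python’s standard textwrap module; the function name and the default width of 100 characters are arbitrary choices, not part of the original script:

```python
import textwrap


def wrap_transcript(text, width=100):
    """Re-flow a one-line transcript into lines of at most `width` characters."""
    # Collapse any stray whitespace first, then wrap on word boundaries.
    return textwrap.fill(" ".join(text.split()), width=width)
```

This only changes the line breaks. The recognizer’s output has no punctuation, so sentence boundaries would still have to be added by hand.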