When your employer produces an image video and you ask yourself why it does not have any subtitles.
Deaf people are not really your target audience
But what about when your phone is muted, or when you are in a loud environment? Being impaired does not have to mean all the time.
I'm sorry, but NO.
Speech-to-Text is an API from Google for transcribing audio content into accurate captions. All customers get 60 minutes for free per month.
To use it, we need the audio track of our image video as a WAV file (not MP3).
$ ffmpeg -i Imagevideo-newcubator-1080.webm Imagevideo-newcubator-1080.wav
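The transcription step further down asks for the channel count and sample rate of the audio. As a minimal sketch, assuming the converted file is a plain PCM WAV, both values can be read with Python's standard `wave` module. The helper name `wav_properties` is made up for this illustration.

```python
import wave


def wav_properties(path):
    """Return (channel_count, sample_rate_hertz) of a PCM WAV file.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels(), wav_file.getframerate()
```

Calling `wav_properties("Imagevideo-newcubator-1080.wav")` would give the two numbers to pass on as `--audio_channel_count` and `--sample_rate_hertz`.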
We can then hand this audio file to the Google Speech-to-Text service. (https://console.cloud.google.com/speech/transcriptions/imagevideo-newcubator-720-b4ee1092432122cc4b4a37d4518b45cc-3d780f784be3685e-4d0e2/workspace/newcubator/recognitionTask/19199a3a-9091-492e-be63-a9360db218ac?project=arkania-253918&supportedpurview=project)
As a result we get a nice JSON file with the exact position of each recognized word. But this is no help for my VLC player or any web browser. What we want is an SRT file. Luckily, Google has a sample project for that:
$ git clone https://github.com/GoogleCloudPlatform/community.git
$ cd community/tutorials/speech2srt
$ pip3 install -r requirements.txt
$ gcloud iam service-accounts keys create ~/Desktop/ml-dev.json --iam-account email@example.com
$ python3 speech2srt.py --storage_uri gs://newcubator/imagevideo-newcubator-720-b4ee1092432122cc4b4a37d4518b45cc.wav --language_code de --audio_channel_count 2 --sample_rate_hertz 44100
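To get a feeling for what turning word timings into subtitles involves, here is a minimal sketch in Python. It is not the code of the sample project. It assumes the word timings have already been pulled out of the JSON as `(word, start_seconds, end_seconds)` tuples, which is a simplification of the real output. The names `srt_timestamp`, `words_to_srt` and `max_words` are made up for this illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start, end) tuples into numbered SRT cues.

    Assumption: a fixed number of words per cue. A real tool would
    probably also split on pauses or line length.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = chunk[0][1]   # start time of the first word in the cue
        end = chunk[-1][2]    # end time of the last word in the cue
        text = " ".join(word for word, _, _ in chunk)
        number = len(cues) + 1
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # SRT separates cues with a blank line
    return "\n".join(cues)
```

Each cue consists of a running number, a time range and the text, which is the structure VLC and web browsers expect from an SRT file.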
Time to see some results and have some fun. The speech-to-text subtitles are far from perfect, but they are a big help. To tweak the result, I found happyscribe.