susom/docker-ffmpeg

Name: docker-ffmpeg

Owner: Stanford School of Medicine

Description: a ffmpeg web service

Created: 2017-05-17 18:39:22.0

Updated: 2018-04-25 22:27:38.0

Pushed: 2017-05-17 18:56:35.0

Homepage: null

Size: 6

Language: PHP

README

docker-ffmpeg

a ffmpeg web service

This is a container that does two things:

1) It wraps ffmpeg as a web service, so you can POST audio and get back a compressed version of the audio in a variety of formats.
2) It integrates with IBM Bluemix Watson to provide audio-to-text capability.

https://www.ibm.com/watson/developercloud/doc/speech-to-text/index.html

Building Container

docker build -t ffmpeg-web-service:latest .

Running Container

docker run -d --name 'ffmpeg' \
    -e "bluemix_user_pass=xxx:yyy" \
    -p 8080:80 \
    --restart 'unless-stopped' \
    ffmpeg-web-service:latest

where xxx:yyy is the username and password from the service credentials of your Watson service. You must first create a Watson service with the speech-to-text engine. From there you should be able to provision credentials like the following:


{
  "url": "https://stream.watsonplatform.net/speech-to-text/api",
  "username": "xxx",
  "password": "yyy"
}
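
As a small illustration of how these credentials map onto the xxx:yyy value passed to the container, the sketch below joins the username and password fields with a colon. The helper name is hypothetical and not part of this repository; it only assumes the JSON layout shown above.

```python
import json


def bluemix_user_pass(credentials_json: str) -> str:
    """Build the 'username:password' string from Watson service credentials.

    Hypothetical helper: assumes the JSON has 'username' and 'password' keys,
    as in the sample credentials above.
    """
    creds = json.loads(credentials_json)
    return f"{creds['username']}:{creds['password']}"


example = """
{
  "url": "https://stream.watsonplatform.net/speech-to-text/api",
  "username": "xxx",
  "password": "yyy"
}
"""

print(bluemix_user_pass(example))  # prints xxx:yyy
```

The printed value is what would go after bluemix_user_pass= in the run command.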

Web Service

This web service runs FFMPEG to convert a single input file into a single output file.

file:   The uploaded file contents, with a valid file suffix to indicate the format
action: TRANSCODE (default) or TRANSCRIBE

Transcoding:
format: The output file format (wav/mp3/etc...)
rate:   (optional) override the encoding rate in Hz (e.g. 16000)

Transcribing:
language: en (default), es (Spanish), zh (Chinese), pt (Portuguese)

Example Request:

POST:
    file => "audio_1.wav"
    format => 'mp3'
    rate => '48000'

Will return a streamed MP3 file or an error message.
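
A client for the transcoding request might look roughly like the sketch below, which uses only the Python standard library. The field names file, format and rate come from the description above; the URL http://localhost:8080/, the multipart/form-data encoding, and the helper names are assumptions, since the README does not state the endpoint path or the expected encoding.

```python
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one 'file' upload as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcode_request(url, filename, file_bytes, fmt, rate=None):
    """Prepare (but do not send) a POST asking for transcoding to `fmt`."""
    fields = {"format": fmt}
    if rate is not None:
        fields["rate"] = str(rate)  # optional encoding rate in Hz
    body, content_type = build_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


# Assumed URL; placeholder bytes stand in for real audio data.
req = transcode_request(
    "http://localhost:8080/", "audio_1.wav", b"placeholder audio", "mp3", 48000
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req).read() would return the response body.
```

Building the request separately from sending it makes it easy to inspect the encoded body before pointing it at a running container.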

POST:
    file => "audio_1.amr"
    language => 'es'

Will return a JSON object with the transcription of the audio, such as:


"results": [
    {
        "alternatives": [
            {
                "confidence": 0.28,
                "transcript": "flat out a few birds "
            }
        ],
        "final": true
    }
],
"result_index": 0

Since the transcription service only accepts samples of 16 kHz or higher, the 8 kHz AMR file in the example above will first be transcoded to a temporary 16 kHz WAV file and then sent off for transcription. This can take 30 seconds or longer.
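
The resampling step described above can be sketched as follows. The service's actual ffmpeg invocation is not shown in this README, so the command below is an assumption: it uses ffmpeg's standard -i (input), -ar (audio sample rate) and -y (overwrite output) options, and the function names are hypothetical.

```python
MIN_TRANSCRIBE_RATE_HZ = 16000  # minimum rate accepted for transcription, per the text above


def needs_resample(sample_rate_hz: int) -> bool:
    """True when the audio is below the minimum rate and must be transcoded first."""
    return sample_rate_hz < MIN_TRANSCRIBE_RATE_HZ


def ffmpeg_resample_args(src: str, dst: str, rate_hz: int = MIN_TRANSCRIBE_RATE_HZ):
    """Build an ffmpeg argument list that rewrites `src` to `dst` at `rate_hz`."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate_hz), dst]


print(needs_resample(8000))   # an 8 kHz AMR file needs the extra step
print(needs_resample(16000))  # a 16 kHz file can be sent as-is
print(ffmpeg_resample_args("audio_1.amr", "audio_1.wav"))
# The list can be passed to subprocess.run(..., check=True) where ffmpeg is installed.
```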


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.