voxmedia/Transcriber

Name: Transcriber

Owner: Vox Media

Description: NWJS-based OS X desktop application that, given a video or audio file, returns a transcription using the IBM Watson Speech to text API

Created: 2016-05-26 18:13:33.0

Updated: 2018-04-01 13:06:31.0

Pushed: 2017-01-09 21:43:32.0

Homepage: https://voxmedia.github.io/Transcriber

Size: 8303

Language: HTML

README

Lightweight Speech to text desktop app for OSX Using IBM Watson API

This app was an initial prototype to test the quality of IBM STT and is no longer actively supported. I am now working on a more fully fledged version at https://github.com/OpenNewsLabs/autoEdit_2 ( http://www.autoedit.io ).

IBM Speech to text API

To use this app you need IBM Watson API keys for the Speech to text service, which you get by creating an account with Bluemix.

Usage - Development

If you clone the repo you can start the app with npm start.

Usage - User

Or you can get the latest release packaged and ready for use here

This is a Tray Menu app.

Transcriber menu

First, use Select Media to choose the audio or video you'd like to transcribe.

Notifications show when a transcription has started and when it has finished.

On completion, an editable text area shows you the transcription.

By default the transcription is also saved to clipboard.

You can disable Autosave to clipboard if you are editing text or using the system clipboard with another program, so that the transcription does not overwrite what is already on the clipboard.

Setting IBM Watson API keys

The first time you start the application, you will be prompted to set the API keys.

Should you need to change them, use the shortcut cmd + shift + a.

The keys are saved inside the app as a JSON file, wttskeys.json, at the root of the application.

This file is listed in .gitignore so that it does not accidentally get added to git when working in development mode.
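As an illustration of the storage described above, here is a minimal sketch of reading and writing wttskeys.json with Node. The helper names saveKeys and loadKeys and the username / password fields are assumptions made for this example, not taken from the app's source.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write the keys to wttskeys.json at the app root.
// The shape of `keys` (username / password) is an assumption.
function saveKeys(appRoot, keys) {
  const file = path.join(appRoot, 'wttskeys.json');
  fs.writeFileSync(file, JSON.stringify(keys, null, 2));
  return file;
}

// Hypothetical helper: read the keys back, or return null when the file
// does not exist yet (for example on first launch, before the prompt).
function loadKeys(appRoot) {
  const file = path.join(appRoot, 'wttskeys.json');
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

A null result from loadKeys is one possible signal for showing the key prompt.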

Overview of project
Technical overview
Convert video to audio

The video_to_audio module converts a video or audio file into audio that meets the IBM audio specs. It was initially adapted from Sam Lavine's gist.

Audio files are saved in the ./tmp/audio folder.
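A conversion step like this is commonly done by shelling out to ffmpeg. The sketch below is not the module's actual code: the function names are hypothetical, and the target format (16 kHz, mono, WAV) is an assumption chosen for illustration, so check the IBM documentation for the formats it accepts. It requires ffmpeg to be installed and on the PATH.

```javascript
const path = require('path');
const { spawn } = require('child_process');

// Build the ffmpeg arguments for one input file.
// Assumed target: mono, 16 kHz WAV written into outDir.
function ffmpegArgs(input, outDir) {
  const base = path.basename(input, path.extname(input));
  const output = path.join(outDir, base + '.wav');
  // -y overwrite, -vn drop video, -ac 1 mono, -ar 16000 sample rate
  const args = ['-y', '-i', input, '-vn', '-ac', '1', '-ar', '16000', output];
  return { output, args };
}

// Run ffmpeg and resolve with the path of the audio file it produced.
function videoToAudio(input, outDir) {
  const { output, args } = ffmpegArgs(input, outDir);
  return new Promise((resolve, reject) => {
    const proc = spawn('ffmpeg', args);
    proc.on('error', reject); // e.g. ffmpeg not installed
    proc.on('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error('ffmpeg exited with code ' + code));
    });
  });
}
```

For example, videoToAudio('talk.mp4', './tmp/audio') would resolve with the path of talk.wav inside ./tmp/audio, provided that folder exists.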

IBM Speech to text API

The stt folder contains the module that interacts with the IBM Speech to text API. If you want to dig deeper, IBM's documentation on how to interact with the API is pretty good.
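To give a feel for what comes back from the service, here is a small sketch that flattens a recognition response into plain text. It assumes the response has a results array whose entries each carry an alternatives array with a transcript string, which is how I understand the IBM documentation; the function name toPlainText is made up for this example and is not part of the stt module.

```javascript
// Join the top alternative of every result into one plain-text string.
// Assumed response shape: { results: [ { alternatives: [ { transcript } ] } ] }
function toPlainText(response) {
  return (response.results || [])
    .map((result) => {
      const best = result.alternatives && result.alternatives[0];
      return best && best.transcript ? best.transcript.trim() : '';
    })
    .filter((text) => text.length > 0)
    .join(' ');
}
```

Only the first alternative of each result is used here; a fuller version might also keep confidence scores or timestamps if they were requested.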

Transcribing video

transcribe.js requires both modules described above and brings them together.

It converts the video into audio and then sends the audio to Watson for transcription. Transcriptions are saved to a text file in the ./tmp/text folder.

The module returns the path to the text file.

index.js abstracts transcribe.js in case the interface needs to change at a later stage.
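The flow described in this section can be sketched roughly as follows. This is an assumption-laden outline rather than the contents of transcribe.js: convert and recognize are hypothetical stand-ins for the video_to_audio and stt modules, passed in as parameters, and their real signatures may differ.

```javascript
const fs = require('fs');
const path = require('path');

// Convert the media to audio, send it for transcription, save the text,
// and return the path of the text file.
// `convert(mediaPath)` resolves to an audio path (assumed signature).
// `recognize(audioPath)` resolves to the transcription text (assumed signature).
async function transcribe(mediaPath, { convert, recognize, textDir }) {
  const audioPath = await convert(mediaPath);
  const text = await recognize(audioPath);

  // Make sure the output folder (e.g. ./tmp/text) exists before writing.
  fs.mkdirSync(textDir, { recursive: true });
  const name = path.basename(mediaPath, path.extname(mediaPath)) + '.txt';
  const textPath = path.join(textDir, name);
  fs.writeFileSync(textPath, text);
  return textPath;
}
```

Passing the two steps in as functions keeps the outline independent of how each module is actually implemented.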

NWJS

index.html contains the implementation of the NWJS app, including the Menu Tray added to the application.

See the comments in the code in ./index.html, the nwjs wiki, and the nwjs documentation for more on this.
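For orientation, a tray menu in NWJS is typically assembled from nw.Tray, nw.Menu and nw.MenuItem. The sketch below is not copied from index.html: the function name buildTray, the tray title and the menu labels are illustrative assumptions, and the nw object is passed in as a parameter so the function can also be exercised outside the NWJS runtime.

```javascript
// Build a tray icon with a menu made from a list of { label, click } items.
// `nw` is the global object provided by the NWJS runtime.
function buildTray(nw, items) {
  const tray = new nw.Tray({ title: 'Transcriber' });
  const menu = new nw.Menu();
  for (const { label, click } of items) {
    menu.append(new nw.MenuItem({ label, click }));
  }
  // Assigning the menu to the tray makes it open when the tray icon is clicked.
  tray.menu = menu;
  return tray;
}
```

Inside the app this could be called as buildTray(nw, [{ label: 'Select Media', click: onSelect }]), where onSelect is whatever handler opens the file picker.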

User flow

When a user selects a video, it is transcribed, and system notifications are triggered when the transcription starts and when it ends.

When it is done, the transcription is saved to the clipboard unless that option is un-ticked.

If the option is un-ticked, the user can click Copy transcriptions to clipboard to get the transcription.
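The user flow can be summarised in code along these lines. Everything here is a hypothetical outline: handleSelection is not a function from the app, and transcribe, notify and copyToClipboard are injected stand-ins for the app's own transcription, notification and clipboard code.

```javascript
// Run one transcription from media selection to finished text.
// `autosave` mirrors the Autosave to clipboard option.
async function handleSelection(mediaPath, { transcribe, notify, copyToClipboard, autosave }) {
  notify('Transcription started', mediaPath);
  const text = await transcribe(mediaPath);
  if (autosave) {
    // Skipped when the option is un-ticked, so the clipboard is left alone.
    copyToClipboard(text);
  }
  notify('Transcription finished', mediaPath);
  return text;
}
```

When autosave is off, the returned text is what a manual copy action would later place on the clipboard.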

Build NWJS app
Option 1

Use the deploy script:

node deploy.js

This creates a build folder inside the repo. The build folder is also in .gitignore to avoid accidentally pushing it to remote.

Option 2

To rebuild the app in NWJS, refer to the documentation.

Install nw-builder:

npm install -g nw-builder

From one level above the application folder (cd .. from the root of the repo), run:

nwbuild -p osx64 ./transcriber

This creates a build folder that contains the app.

Todo

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.