Flac or Wav files are preferred.
Sample rate of 16000 Hz is preferred. Values lower than this may impair speech recognition accuracy, and higher levels have no appreciable effect on speech recognition quality.
We have found the "video" model to be most effective for recorded audio (not necessarily from video). Include model=video
.
To include timestamps, set the enableWordTimeOffsets
parameter to true in your request configuration enableWordTimeOffsets=true
For purposes of pricing, a "document" is 1000 characters including whitespace. So breaking up text into 1000 or just multiples of 1000 characters is the best value. 1100 characters is priced as 2 documents or twice the cost of 1000 chars.