DronesoundTV

Extreme YouTube ChatBot on Raspberry Pi

Platform

Raspberry Pi, YouTube

Components

Perl, Java, YouTube API, Freesound API

Release

This project is a YouTube Livestream - go there.

See also User Guide 1.0 and GitHub

DronesoundTV is a livestream generated by a Raspberry Pi single-board Linux computer, mixing the latest content from the Freesound.org sampling community with visual displays and audience interaction into an avant-garde 21st-century remake of television "Info" channels. Yes, it's a bit Punk Rock. Viewers participate directly from the YouTube chat by adding to the keywords used to search Freesound for samples: "Drone violins", for example, adds violin samples to the mix.

Users furthermore have near-complete control of the entire show: they can change layouts, background colors and images, import animated GIFs, and even extend the remote-control command set.
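
As a rough sketch, here is how such chat commands might be parsed and dispatched in Java. Every command name below is hypothetical - the project's real command set (and its Perl/Java implementation) will differ:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Consumer;

    // Hypothetical chat-command dispatcher. Handlers just print here;
    // the real ones would touch the mixer, layout engine, etc.
    public class ChatCommands {
        private final Map<String, Consumer<String>> handlers = new HashMap<>();

        public ChatCommands() {
            // "Drone violins" -> add "violins" to the Freesound search keywords
            handlers.put("drone",      args -> System.out.println("add keywords: " + args));
            handlers.put("layout",     args -> System.out.println("switch layout: " + args));
            handlers.put("background", args -> System.out.println("set background: " + args));
            handlers.put("gif",        args -> System.out.println("import GIF: " + args));
        }

        public void dispatch(String chatLine) {
            String[] parts = chatLine.trim().split("\\s+", 2);
            Consumer<String> handler = handlers.get(parts[0].toLowerCase());
            if (handler != null) {
                handler.accept(parts.length > 1 ? parts[1] : "");
            }
        }

        public static void main(String[] args) {
            ChatCommands commands = new ChatCommands();
            commands.dispatch("Drone violins");    // adds violin samples to the mix
            commands.dispatch("Background blue");
        }
    }

A table of handlers like this is also what makes a command set easy to extend: adding a command is just registering one more entry.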

The ultimate goal is a channel that is defined by its own user community and the latest content from popular culture. More sources of real-time content - such as SoundCloud and News Feeds - are in the works.

Behind the scenes is "StageGhost", my ongoing project for a "live stream performer's remote stagehand". DronesoundTV is basically the reference implementation of StageGhost, on YouTube.

Background

This is the latest incarnation of my "Sonic Monkey" family of "dronesound" gadgets, such as the two for Android - but beyond the shared concept, everything in DTV was built from scratch, since in this case all of the content comes from off-board.

Geek Details

See the project's Official Hackaday Site for even more geekery.

In the not-great-but-passable "Geek Sesh" video above, I describe most of DronesoundTV's theory of operation.

YouTube Data API / Streaming API

It's by way of these APIs that viewer requests are accepted, sample credits are reported in chat, and - most recently - bans are automatically issued to chat users who use expletives.
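
For the curious, here is a minimal sketch of the two YouTube Data API v3 calls involved: liveChatMessages.list for reading the chat and liveChatBans.insert for ejecting the potty-mouthed. OAuth token acquisition, JSON parsing and the expletive filter itself are all omitted, and the project's actual code will differ:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch of DronesoundTV-style chat moderation against the YouTube
    // Data API v3. Assumes an OAuth2 bearer token obtained elsewhere.
    public class ChatModeration {
        private static final String API = "https://www.googleapis.com/youtube/v3";
        private final HttpClient http = HttpClient.newHttpClient();
        private final String accessToken;

        public ChatModeration(String accessToken) { this.accessToken = accessToken; }

        // liveChatMessages.list: fetch the latest chat messages for a stream.
        public String fetchMessages(String liveChatId) throws Exception {
            HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(API + "/liveChat/messages?liveChatId=" + liveChatId
                        + "&part=snippet,authorDetails"))
                .header("Authorization", "Bearer " + accessToken)
                .GET().build();
            return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }

        // liveChatBans.insert: ban a chat user, e.g. after an expletive match.
        public void banUser(String liveChatId, String channelId) throws Exception {
            String body = "{\"snippet\":{\"liveChatId\":\"" + liveChatId + "\","
                + "\"type\":\"permanent\","
                + "\"bannedUserDetails\":{\"channelId\":\"" + channelId + "\"}}}";
            HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(API + "/liveChat/bans?part=snippet"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
            http.send(req, HttpResponse.BodyHandlers.ofString());
        }
    }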

Freesound API and Analysis Features

The diagram in this article shows the method by which DronesoundTV selects sounds from a combination of configured keywords and viewer submissions.

But one shortcoming of the diagram is that it doesn't sufficiently emphasize the effect of sounds leading to other sounds based on their similarity. Imagine an arrow going from the "mix" back into the left side: this part of the logic is a feedback loop with a decay rate of about 50%.

Most of the sounds playing at any given time were selected for their similarity to other sounds that had been playing previously, and for being highly rated by the Freesound community. Similarity is based on the tuning and detected rhythmic properties of a sound - not the words with which it was associated.

By this method, user-requested violin samples might lead to an opera vocal of the same pitch but a different timbre, which in turn leads to a poetry reading because it has a compatible BPM (beats per minute).
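
In sketch form, the loop might look like the Java below: roughly half the time the next sound continues from something already in the mix via Freesound's "similar sounds" endpoint, otherwise it decays back to a fresh keyword search. Endpoint paths follow the Freesound APIv2 docs; JSON parsing, rating filters and URL-encoding are left out:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.Random;

    // Sketch of the ~50%-decay feedback loop described above.
    public class SoundSelector {
        private static final String API = "https://freesound.org/apiv2";
        private final HttpClient http = HttpClient.newHttpClient();
        private final Random random = new Random();
        private final String apiKey; // Freesound API token

        public SoundSelector(String apiKey) { this.apiKey = apiKey; }

        public String nextSoundJson(List<Long> playing, List<String> keywords) throws Exception {
            String url;
            if (!playing.isEmpty() && random.nextDouble() < 0.5) {
                // Feedback path: continue from a sound already in the mix.
                long seed = playing.get(random.nextInt(playing.size()));
                url = API + "/sounds/" + seed + "/similar/";
            } else {
                // Decay path: fall back to the configured/viewer keywords,
                // preferring community-rated results.
                url = API + "/search/text/?query=" + String.join("+", keywords)
                    + "&sort=rating_desc";
            }
            HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Token " + apiKey)
                .GET().build();
            return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
    }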

This is only made possible by Freesound's excellent Sound Analysis features, which provide an extensive sonic profile for each sample.
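
For instance, per-sample descriptors such as the detected tempo and key can be pulled from the analysis endpoint. The descriptor names below follow the Freesound APIv2 documentation, but treat this as an illustrative sketch rather than the project's actual query:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Fetch a few analysis descriptors (detected tempo and key) for one sample.
    public class SoundAnalysis {
        public static String fetch(long soundId, String apiKey) throws Exception {
            String url = "https://freesound.org/apiv2/sounds/" + soundId
                + "/analysis/?descriptors=rhythm.bpm,tonal.key_key,tonal.key_scale";
            HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Token " + apiKey)
                .GET().build();
            return HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
    }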

The result is a constantly-morphing stream of sound, always changing because the Freesound community is always active.

SoX

The venerable SoX audio tool is used to re-mix the sounds, also applying special effects such as pitch bends, delays and reverbs - an area of autonomous creativity that the project will be exploring in depth.
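
As an illustration, a mix-and-mangle pass could shell out to SoX roughly like this - the effect parameters are placeholders, not the values the project actually uses:

    import java.io.IOException;

    // Mix two samples with SoX and run the result through a small
    // effects chain: a pitch bend, per-channel delays and a reverb.
    public class SoxMixer {
        public static void mix(String inA, String inB, String out)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(
                "sox", "-m", inA, inB,  // -m: mix the inputs together
                out,
                "pitch", "300",         // pitch bend: up 300 cents
                "delay", "0.5", "0.7",  // per-channel delays in seconds
                "reverb");              // SoX's default reverb
            pb.inheritIO();
            int status = pb.start().waitFor();
            if (status != 0) throw new IOException("sox exited with " + status);
        }
    }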

Future Plans

I've always wanted this to be more of an AI project - but without resorting to the sort of mimicry for which I've criticized the Google Art Project, a system like this requires an audience for feedback.

In other words: it either learns to impersonate something that is pre-defined as "sounding good", or it is directed in an open-ended search for "a good sound" by external feedback - i.e., viewers.

As more interactions become available, I'll be moving from mostly-static logic to a malleable, genetic-algorithm-inspired model with reinforcement powered by viewers.
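
To give a flavor of what that might mean, here is a toy version of viewer-powered reinforcement, in which each search keyword carries a weight nudged up or down by audience reactions. None of this is implemented yet - the names and update rates are pure placeholders:

    import java.util.HashMap;
    import java.util.Map;

    // Toy reinforcement sketch: keyword weights grow when viewers react
    // well to the sounds they produced, and shrink otherwise.
    public class KeywordWeights {
        private final Map<String, Double> weights = new HashMap<>();

        public void reinforce(String keyword, boolean positive) {
            double w = weights.getOrDefault(keyword, 1.0);
            // Simple multiplicative update; the rates are arbitrary.
            weights.put(keyword, positive ? w * 1.1 : w * 0.9);
        }

        public double weight(String keyword) {
            return weights.getOrDefault(keyword, 1.0);
        }
    }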

With no direct video output from the Raspberry Pi, I'm limited to simple ASCII and ANSI displays for now; once I get some video-capture capabilities, I'll be able to explore tying the visuals directly to the sound.
