Jihadi pollution

How content labelling could drive out the jihadi pollution of the internet

In mid-March 2017 I published, on my LinkedIn page, an article on Age Verification under the Digital Economy Bill 2016-17 (DEB). In it I set out how the provisions of Clause 91 of the DEB (now Section 104 of the Digital Economy Act 2017) would effectively mandate age verification to protect children from seeing inappropriate content, and I outlined how SafeCast Headcode labelling could enable the DEB to achieve its child protection goals. Our work has established that just six headcodes are enough to label all video content on television and the internet; together they form the SafeCast Headcode labelling system. (For further information, please see the video on the SafeCast website and try out the demonstrators on the SafeCast site.)
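
To make the idea concrete, here is a minimal sketch in Python of how headcode-style labelling and filtering might work. The six category names below are illustrative placeholders for the general idea, not the actual SafeCast headcodes, and the data model is an assumption made for the purposes of the example.

```python
from dataclasses import dataclass
from enum import Enum

# A minimal sketch of headcode-style labelling. SafeCast uses six
# headcodes; their real definitions are not reproduced here, so these
# six placeholder categories are purely illustrative.
class Headcode(Enum):
    U = "universal: suitable for all viewers"
    PG = "parental guidance suggested"
    TEEN = "suitable for teenagers"
    MATURE = "adult themes, unsuitable for children"
    EXPLICIT = "explicit adult content"
    UNRATED = "not yet labelled"

@dataclass
class VideoLabel:
    content_id: str      # the platform's identifier for the video
    headcode: Headcode   # the single headcode attached to the content
    labelled_by: str     # who attached the label (publisher, platform, ...)

def passes_filter(label: VideoLabel, allowed: set) -> bool:
    """A child-safe feed simply drops anything outside the allowed set."""
    return label.headcode in allowed

# Example: a filter for young children admits only 'U'-labelled content.
child_safe = {Headcode.U}
video = VideoLabel("vid-001", Headcode.U, labelled_by="publisher")
assert passes_filter(video, child_safe)
```

The point of the sketch is how little machinery a filter needs once every item carries a label: the hard problem is not the filtering but keeping the labels honest, which is what the two mechanisms below address.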


Since then various parties have asked me how content labelling could stop 'bad people' from simply mislabelling their content so that it escapes filtering. My reply is straightforward: there are far more good people than bad people, and there are two connected mechanisms that can be deployed.


The first is a universal practice of labelling content correctly, backed by a human feedback system that quickly identifies mislabelled content through complaints and subjects it to immediate take-down notices. Systems like this are already in place for the copyright infringement notices served on Google, Facebook and others by the major movie and music companies.
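
As an illustration, the feedback loop can be as simple as counting complaints against a piece of content and issuing a take-down notice once a threshold is crossed. The threshold and data model below are assumptions made for the sketch, not any platform's actual policy.

```python
from collections import Counter

COMPLAINT_THRESHOLD = 3   # illustrative: complaints needed before take-down

complaints = Counter()    # complaint tallies per content id
taken_down = set()        # content already subject to a notice

def report_mislabelled(content_id: str) -> None:
    """Record a viewer complaint; issue a take-down once the threshold is hit."""
    if content_id in taken_down:
        return
    complaints[content_id] += 1
    if complaints[content_id] >= COMPLAINT_THRESHOLD:
        taken_down.add(content_id)
        print(f"Take-down notice issued for {content_id}")

# Three complaints about the same mislabelled video trigger the notice.
for _ in range(3):
    report_mislabelled("vid-042")
```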


The second is to apply the advances in Artificial Intelligence and Big Data to the huge volume of correctly labelled content. Large companies such as Google, YouTube and Facebook can deploy Artificial Intelligence on their platforms to support the creation of a safer internet for children without having to become internet censors. So long as there is far more correctly labelled good content than incorrectly labelled bad content, the good content can be used to teach artificial intelligence systems what constitutes good content and how to distinguish it from bad.
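
In machine-learning terms this is ordinary supervised classification. The toy sketch below trains a text classifier on a handful of labelled video descriptions; the dataset, features and library choice (scikit-learn) are mine for illustration, and a real system would train on far richer signals at vastly larger scale.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: correctly labelled 'good' content
# plus a smaller set of known 'bad' content.
descriptions = [
    "children's cartoon about friendly animals",
    "family cooking show episode",
    "nature documentary on coral reefs",
    "extremist recruitment propaganda film",
    "graphic violence compilation",
]
labels = ["good", "good", "good", "bad", "bad"]

# Turn each description into TF-IDF features and fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(descriptions, labels)

# Score a new, unlabelled upload; on this toy data it leans 'good'.
print(model.predict(["documentary about coral and sea life"]))
```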


Like the latest cytology systems in hospital labs, which count and identify cervical cancer cells in Pap smear samples far more reliably than humans can, modern digital platforms in the cloud can be made capable of cleaning up the internet so that it becomes safe for children, leaving only a few doubtful cases that require human intervention and take-down notices.
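
The triage that leaves "only a few doubtful cases" for humans can be expressed as simple thresholds on the classifier's confidence. The cut-off values below are illustrative assumptions.

```python
def triage(p_bad: float) -> str:
    """Route content by the model's estimated probability that it is 'bad'."""
    if p_bad >= 0.95:
        return "automatic take-down"
    if p_bad <= 0.05:
        return "automatic approval"
    return "human review"   # the few doubtful cases in the middle band

for p in (0.99, 0.50, 0.01):
    print(f"p_bad={p:.2f} -> {triage(p)}")
```

Tightening or widening the middle band trades automation against the volume of cases sent to human reviewers.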


In support of my arguments, I turn to the work of Professor Andrew Blake, Founding Director of the Alan Turing Institute. In a lecture at the Institute on 2 December 2016, Professor Blake, whose work at Microsoft Research has been especially directed at vision systems, gave a short masterclass on the latest developments in Machine Learning and on how Big Data is changing the way we can identify and label anything using deep neural networks, so long as we have enough examples to feed into our systems. (For the full analysis, please see Professor Blake's YouTube video lecture here.)


Professor Blake's insights were echoed by the commentator Benedict Evans, who works for Andreessen Horowitz ('a16z'), a Silicon Valley venture capital firm that invests in technology companies. Three days after Professor Blake's masterclass, Ben Evans published his annual end-of-year presentation which, for 2016, was entitled "Mobile is Eating the World". About six minutes in, Ben identifies the huge shift that has taken place in Machine Learning, through which Artificial Intelligence will be able to search and classify still and video images intelligently, enabling computers to attach meaning to video images. We are quickly arriving in a world where computers will be able to read images as easily as text.


Already one company is offering an application development toolkit for scanning millions of hours of video. Matroid, founded by Stanford University adjunct professor Reza Zadeh, can scan video in an intelligent manner. A brand advertiser at a car company may want to know when its product appears, unscripted, in hundreds of hours of TV shows or online videos; or when Piers Morgan appears; or how often a man holding a banana is recorded. Users can easily write a filter of their own (Matroid calls them 'detectors') to find particular people or objects, or they can pick from a library of pre-programmed filters designed by the startup.
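
Matroid's actual SDK is not described here, so the sketch below is a purely hypothetical detector abstraction of my own: a detector is just a named predicate over a video frame, and a library of detectors can be run across a stream of frames.

```python
from typing import Callable, Iterable

Frame = dict                         # stand-in for a decoded video frame
Detector = Callable[[Frame], bool]   # a detector fires (True) on a frame

def scan(frames: Iterable, detectors: dict) -> dict:
    """Count how many frames each named detector fires on."""
    hits = {name: 0 for name in detectors}
    for frame in frames:
        for name, detect in detectors.items():
            if detect(frame):
                hits[name] += 1
    return hits

# Toy frames tagged with the objects they contain; a real system would
# run an image-recognition model instead of reading ready-made tags.
frames = [{"objects": {"car", "road"}}, {"objects": {"man", "banana"}}]
detectors = {
    "brand_car": lambda f: "car" in f["objects"],
    "man_with_banana": lambda f: {"man", "banana"} <= f["objects"],
}
print(scan(frames, detectors))   # {'brand_car': 1, 'man_with_banana': 1}
```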


When detectors such as Matroid's are linked to the Big Data of video content labelled with the SafeCast Headcodes, we can give our children and society at large a "Cleanfeed" internet without censorship, and at the same time 'take back control' of the internet from the black-flag-waving cohorts of ISIS and their deluded supporters.