5 STAR AI.IO

TOOLS

FOR YOUR BUSINESS

Our new site offers free AI tools and free guides to sites with 5-star artificial intelligence tools that will help you run your business quickly and efficiently and increase your sales.

5-STAR AI PRESENTS OUR FREE   

CHATAI TOOLS

OPEN TO THE PUBLIC

Hi! My name is Star.

What can I help you with?

Feel free to ask!

HELLO 

HELLO

I am Star, and I am your assistant, here to guide you and your business into the AI and IoT world.

So... finally, welcome to our new site:

The 5-Star AI & IO Tools for Your Business.

This is our new website, and it's all about the top and best AI & IoT tools on the net.

We provide you with the best artificial intelligence tools and services that can be used to create and improve your business websites, bots, and channels.

This site includes tools for creating interactive visuals, animations, 3D, and videos,

as well as tools for SEO, marketing, and web development.

It also includes tools for creating and editing text, images, and audio.

The website is intended to provide users with a comprehensive list of AI-based tools to help them develop and improve their businesses.

This website is a collection of artificial intelligence tools and services that can be used to create and improve websites.

It includes tools for creating interactive visuals, animations, and videos, as well as tools for SEO, marketing, and web development.

Hello

Hello

I am Star the AI, and I am your professional assistant on the 5-Star site, here to guide you and your business into the new world of artificial intelligence.

So... welcome to our new site:

5-star-rated artificial intelligence tools for your business.

The site covers all the new 5-star-rated tools on the net.

We provide you with the best artificial intelligence tools and services available today for creating and improving business websites and channels.

This site includes tools for creating interactive visuals, animations, 3D, and videos.

The site also presents AI tools for SEO, marketing, and web development.

The site also includes tools for creating and editing text, images, and audio.

The site is intended to provide users with a comprehensive list of AI-based tools to help them develop and improve their businesses, all free of charge.

This site is a collection of artificial intelligence tools and services that can be used to create and improve websites.


The site is intended to provide users with a comprehensive list of AI-based tools, including explanations, to help you develop and improve your digital business for free, professionally and efficiently.

HELLO & WELCOME TO THE

5 STAR AI.IO

TOOLS

FOR YOUR BUSINESS

LIP READ

Generate Your First Professional

AI TensorFlow Project & Take Your

Business to Another Level.

5 Star Free AI YouTube Shorts Title Generator

Feel free to use it!

How to Code a Machine Learning Lip Reading App with Python Tensorflow and Streamlit

Entire transcript (no timestamps). Code sketches of the main steps described here follow the transcript.

a couple of weeks ago I managed to release the most amazing machine learning model that I've ever had the chance to work on in my entire life it takes in a number of video frames passes it through to a machine learning model which is built in tensorflow and Python and is in effect able to go and perform that's right it's able to take a set of videos and transcribe what a person is saying now whilst I'm still waiting for my juicy defense contract we're going to take this a step further and build it out into a full stack application using streamlit python tensorflow and a whole bunch of other great python libraries now this can be extended to a whole bunch of use cases If eventually you wanted to go and replace the video feeds with a webcam if you wanted to go and deploy it on a edge device there's a whole range of possibilities plus this being an absolutely brilliant example of what is possible with machine learning great for Eurasia map anyway ready to do it let's get to it alrighty so the first thing that we are going to go on ahead and do is get our project open up inside of vs code because we're going to be doing all of the coding inside of vs code now if you haven't gone and checked out the original tutorial I'll include a link somewhere up there so you can go and check that out but all of the code is available on GitHub so if you want to go and pick this up you want to get the existing model checkpoints those are going to be available there so you can definitely go on ahead and grab those now what we're going going to go on ahead and do is jump into vs code so I'm going to open up the existing folder that I've got and just type in code Dot and you can see we've got a ton of stuff floating around up here let me quickly explain this model structure so over here we've got our data folder this was set up in the previous tutorial we've got our lips virtual environment and we've got our pre-trained model checkpoints those pre-trained model checkpoints are available inside of GitHub I've included the checkpoints for 50 epochs and I think 96 epochs 96 epochs Works absolutely brilliantly so you can go on ahead and pick those up we're going to be mainly doing our work inside of the app folder now I've got two existing files now these are going to look really familiar too if you saw the previous tutorial because really all we're doing here is we're instantiating our deep learning model which I went through how to build from scratch in the previous video and we are going to be loading up the weight so right down the bottom you can see that we are loading the existing weights from the 96 Epoch checkpoint so this will give us the ability ability to run this function here load model inside of our full stack application and bring that into our app we interrupt your regular programming to tell you the courses from it is officially a lot if you'd like to get up and running in machine learning deep learning and data science head on over to www.courses from Nick to find the latest and greatest I'm also going to be releasing a free python for data science course in the upcoming weeks so be sure to stay in the know but if you're ready to hit the ground running well I highly recommend you check out the full stack machine learning course this goes through seven different projects 20 hours of content all the way through full stack production ready machine learning projects head on over to www.courses from nick4 bundles forward slash full stack ML and use the discount code YouTube 50 to get 50 off back to our 
regular programming now the core advantage of having these existing checkpoints is we're taking a bit more of a full stack feel this means that we're picking up from the existing machine learning endpoint right it's almost as though you've gone and done the entire machine learning engineering bit you're now passing this off to a software engineer and it's good practice to get into a habit of learning how to take a deep learning model or take a machine learning model and bring it into a full stack environment and really get it out there into your users this is exactly what the focus of this particular video is going to be we're taking this deep learning trained deep learning model and we're going to be integrating it inside of a streamlit app now the other helper file that we've also got is this utils.pi file again this is coming directly from the existing tutorial we've got a couple of imports we are handling our vocab and we're defining a chart to num function and a num to char this effectively takes our characters that our machine learning model is going to spit out convert them to numbers the number to char model or the num2 chart function does the opposite it takes a number of tokens and converts them to cash is because in fact our lipnet model is actually outputting characters now this is one of the cool things about libnet in the fact that it is a character-based model this means that if we were to go and pass through additional videos in the future it would be able to go and learn how to decode those specific words even though it might not have necessarily seen that entire word before there was a question on the previous video about whether or not this is in fact the case yeah it is a character-based model which is what makes it so cool the other functions that we've got inside of here are load video load alignments and load data we're not going to use those specific sub functions we're going to be using load data which is going to return to us our pre-process video and our alignments now again inside of that bigger tutorial I actually went through what that actually looks like and the cool thing is if you actually go and take a look at the thumbnail from that particular video this actual function was used to go and generate those thumbnails which is kind of meta but anyway I thought that was kind of cool all right so we've got two helper functions so we've got utils.pi and we have model util.pi we're going to be using those inside of our full stack streamlit app and I'll make these two files available inside of the updated GitHub repo so you can go into that we're going to be working inside of the app folder all right let's actually do some coding now so I'm going to open up my uh we are opening up a command prompt so I'm going to send it into app and we are now inside of the app folder let's just see that can we zoom in let me zoom in so wow that's very zoomed in as you can see we can't let's zoom out a little so if we open bring that up you can see that we are inside of our main project folder and then we're inside of a subdirectory called app so if I actually go and type in LS you can see I should have let's bring this let's do that again so you can see that we've got two files in there right now model util.play and utils.play cool brilliant okay so let's actually get to building this app because right now we've just been messing around and taking a look at what our project looks like so we're going to create a new file and we're going to call this streamlit app dot pie 
beautiful and then what we're going to do is we are first I'm going to import a couple of libraries so first up we are going to import streamlit as St so this is going to last work with streamlit we're then going to import OS as that's just going to make it a whole bunch easier to work with our different file paths and we're also going to be using it to list out the files inside of our directory we're also going to import image IO and that is going to allow us to take a series of videos and convert it into a gif which just looks really really cool and it allows you to see what our machine learning model is actually going to be able to take in as input before making a prediction all right so we've got streamlit we've got OS and we have image i o we then need to import tensorflow as TF what do you reckon should I start doing some more tutorials on pytorch I've been doing quite a fair bit on tensorflow but I know that people like pytorch as well let me know in the comments below all right cool so we've got import tens flows TF we then need to bring in some stuff from model util and utils.pi so we are going to import uh actually we're going to from in from utils import we need load data and we need num to char beautiful and then from model util we are going to load up the load model function import load model alrighty cool so those are our six key Imports that we're going to need now come here want to know a secret are you looking for your next dream job in data science machine learning deep learning or just data in general or you need to join jobs from net each and every week I send you a curated list of the best jobs in data these are jobs with great perks great people and great roles plus you'll get access to exclusive content like amas interviews and resume reviews so why not join plus it's completely free link is in the description below what are you waiting for all right back to the video so you're probably thinking why are we using streamlit why are we using tensorflow why are we using each one of these libraries well extremely it just makes it ridiculously easy to go and build full stack applications and it's kind of designed for machine learning Engineers data scientists to be able to take their production eyes or their trained machine learning models and bring it into a full stack environment that's why we're using streamlit image i o it really is going to be about that gif so if you don't want to go and create or convert your video to a give to be able to visualize it you don't really need that but I personally think it just makes things look a whole lot nicer to tensorflow actually gives us our deep learning capability if you've got a GPU on your machine it's going to run way faster but that being said you don't necessarily need a GPU although highly recommend it anyway let's get back to it so now that we've got our Imports done so these are all our Imports so import all of the dependencies let's save that first things first let's go and try to kick off our applications we haven't really done much inside of there so far so I'm just going to open up a terminal so on vs code I typically hit control and then tilde which gives me this terminal ability right now I'm running it inside of Powershell but you can run it inside of a command prompt inside of a Mac OS machine you're probably going to be doing this inside of your terminal so what are we going to do we're going to start up our app so if we type in streamlit and then we are going to type in streamlit run let's make this a little 
bit smaller so we can see it streamlit run and we're going to run streamlit app.pi so keep in mind we are inside of the app folder so hence why we're able to do this all right so if we go and run this this should open up our streamlined app inside of a browser but we're not going to see anything but let's go in ahead and take a look that's all looking promising so this is our streamlit app blank screen doesn't matter it's going to work we are going to build this okay so that is the beanings of our app doesn't look like we've got any errors yet so we are in a good State let's bring this over here okay so now that we've got that done we should probably start adding a little bit of some uh structure to it so let's go ahead and do that let's first up add a sidebar so we can type in with st dot sidebar and then we're going to add St dot image and I'm just going to grab an image that I personally like hold up a few moments later so I've got this link to this image it just looks kind of cool it's got a data and AI feel to it so if I go and paste that in that's sort of what it looks like it shows massive throughput going through a GPU or a bunch of hands but anyway you've got to get the idea so we're going to type in with st dot sidebar this is going to give us sidebar capability inside of our streamlit app we are then going to add a title so we can do that using St dot title and we might call this lip buddy oh wow my typing is a shocker today and then we can add St dot info so that's going to give us the ability to add some information about our application so we might say um this application is originally developed from the lip net deep learning my head's covering that now deep learning model cool so if we go and save that let's just make these single quotes got a bit of a bad habit of using both what do you guys do let's get rid of that beautiful okay cool let's uh go and refresh our app now we can hit r let's open up our sidebar boom take a look at that the beginnings of our applications we've now got the image so that corresponds to this line here St dot image we've got the title which you can see is lib buddy that corresponds to that and we've also got the info box which corresponds to that bit down there so that is the beginning structure of our application now up and running so the next thing that we're going to go on ahead and do is start laying out our app and giving it a little bit of structure so we're going to set up two separate columns so one column will be able to display the originating video the other one is going to go through all of our machine learning steps so first up we're going to create a little bit of a gif which decodes what we're Transforming Our original video to post processing we're then going to have the raw predictions and then we're going to have the decoder predictions which take the individual number tokens and convert this into an actual set of words before we do that we actually need to be able to go and get an existing set of videos that we're going to have selected we're going to use St but select box to go ahead and do this so let's go ahead and start setting out our options now if you cast your mind back to the original video inside of the data folder we had all of our videos inside of this S1 folder now there's one thing that I noticed which was a little bit of a pain when it came to streamline in that it wasn't able to play these dot MPG files because I think they're an older codec we're going to solve this using ffmpeg so I'm going to show you how to 
do that as well okay let's go ahead and write a little more code so so far I've done our dependencies we've set up the sidebar set up the sidebar what we're now going to go ahead and do is get a list of drop downs for all of our potential options so here is where OS is going to play actually we're going to do one additional thing we're going to set the layout through the streamlit app as wide so to do that we can type in St dot set page config boom and then we can type in layout equals to wide so this is going to give us the ability to have our page set up to White I'll include a little bit of documentation over there so you can take a look at what that means that is our page config now done now let's jump back over to these options so get so what we're really doing is we're generating a list of options or videos so in the future if we wanted to go and sub this out for a webcam I'd imagine that this is where I'd be plugging in that block of code let me know if we want to extend this app out to that okay so what we're going to do is first up we're going to get all of our potential options so let's go and type in options and we're going to say options is equal to os.path.join and remember our options are going to be our videos so our videos are going to be inside of a folder called so we need to back out of our app folder we're then going to go into our data folder and then we're going to go in a folder called S1 now ys1 well the original lipnet data set had a ton of different speakers I'll include a display of what that looks like there are a ton of speakers S1 is Speaker one but there were a whole bunch of others which were included in the original data set that was built for the lipnet architecture we're just using one speaker okay so that is going to give us the file path to that folder but if we type in OS Dot dot uh Listia that's going to give us all of the different options so if we go and type in print options let's just go and let's open this up and let's go and refresh our app I'll be printing out uh what's Happening Here can only be called set page config can only be called once per app so we need to go and restart our apps let's do that shut this down and then restart good work okay boom that looks promising all things holding equal we should be able to print out full of do we have an error there what's happening St dot pagecon I think we need to bring this up this should actually be further up let's bring this back up here beautiful so this should be above the St dot sidebar that's my bad let's go and refresh that beautiful all right and you can see here that we're printing out all of the different video names right so we've now got a set of video names that we can use as a drop down inside of our app so to do that we can then go and type in um we're going to save this as a variable so we're going to say selected video is equal to St dot select box we're then going to pass through an option or we're going to pass for a title actually we're going to say choose video and then we're going to pass through our options to that so if we go and save that now we'll bring up that app hit refresh boom so we've now got our sidebar we've got this ability to go in ahead and choose our video so later on when we go and select this video first I'm going to render it so you'll actually be able to play it and then we'll go through all of that deep learning process which is going to be great okay where are we at so we've now gone and set up at the ability to go and choose a video so the next thing 
that we want to go on ahead and do is start setting up that layout because we don't have that set up right now so we can first up double check that we've got an option selected so we can say if options and then we're going to create two columns so let's create should we create outside options now let's just do whatever here so we're going to say call one or call One Call two is equal to St dot columns and then we're going to pass through two so I'll include a little bit of documentation up on the screen right about now so you can see what columns does but really this is going to generate two columns and then we're going to say with call one we're going to do something and then with call 2 we're going to do something beautiful so let's just have some text for now so if we take in St dot text and we're going to say this is column one and then let's paste that over here and then we're going to say this column two so let's get rid of these past statements and let's open up our app backup so let's refresh all right so you can see that we've now got column one we've now got column two so we've now gone and set up our layout so just to recap so we've now got our sidebar we've now got the ability to go on ahead and choose our video and we've now got column one over here and if we scroll on over we've got column two so that is the base layout of our app now ready now the next thing that we're going to need to go on ahead and do is actually get a video set up so that we can play it inside a streamlit I sort of alluded to this a little bit previously in that streamlit doesn't like the dot MPG file format which is perfectly okay we can use an extension or a helper Library called ffmpeg which allows us to go and convert this dot MPG file into MP4 so we're going to do that we're then going to read in our video and then we're going to display it inside of our application so that we are able to go and play it inside of our streamlit app we're going to be using the St dot video function so to streamlit dot video display helper function which actually allows us to go and render the video inside of our application now in order to get our video up and running we are going to first up need to grab that video and we need the full file path so right now this is just going to give us the specific file name we need the entire file path so for now let's get rid are we happy with this St dot now we don't want this column um let's actually go and do our magic to get our files so first up what we need to do is we need to get the full file path so we're going to say file path is equal to OS and the reason that we need this file path is because right now if we go and choose one of these options the value selected file over here or selected video is really only going to hold this so bbafn dot MPG it's not going to hold the entire file path to that particular file we need the full file path to be able to go and do this conversion using ffmpeg so let's go on ahead and do that we're just going to effectively append this together so os.path.join we are going to then say we want to go back out of this folder we're then going to go into Data we're then going to go into S1 we are then going to go and pass through our selected video so that should give us the full file path then what we're going to do is we're going to do a little bit of system magic using ffmpeg so ffmpeg it does a ton of uh video processing so if you've ever used youtube.dlp to get some YouTube videos completely legally ladies and gentlemen we got them ffmpeg 
is actually behind that so we are going to be converting this so we are going to say os.system and this is going to allow us to run a command line call so we are then going to run the full command line call which is going to be ffmpeg we are then going to pass through Dash I which will will allow us to pass through your file path and we are going to pass through a string or pass through our variable there so inside the squiggly brackets I've gone and passed through the variable file path so the full line is OS dot system using F we're able to go on ahead and use some string formatting we're then passing through or calling ffmpeg Dash I we're passing through this variable into our string so if we closed it that would we'd get rid of those errors but now we need to say what we want to go and convert this dot MPG file to we are going to convert it into a mp4 file format so to do that we can pass you dash V codec uh and it's going to be lib x264 I think something like that I think that's right and then we're going to Output this file to a file called test video dot MP4 and then we need to pass through Dash yes to say that we want to do that conversion and overwrite the lib executed all right so let's take a look at that full line let me zoom out a little bit so we've now gone and written file path os.path.join we're grabbing we're jumping out of our existing folder going into Data going into S1 and grabbing our selected video so this will give us the full file path to our selected video we're then running os.system and then we're running our FFM Peg conversion to take our DOT MPG file and convert it to MP4 I know it's a little bit of a effort to go and get to this stage but hopefully that should make our life easier when it comes to rendering our video so let's actually go and test this out now so if we go and are we still printing out all the file paths no okay that's much better we don't want to go on ahead and do that again so let's uh are we still running it up let's scroll on down looks like we are let's just shut down our app and restart cool so that looks like it's running the conversion already so that looks promising right so take a look we've now got our test video.mp4 so that looks let me put my headphones on so I can hear this I cannot hear anything maybe you don't get audio out of out of whatever I'm playing it in so let's go into the main folder which will be inside of here pin blue F2 now cool so successfully gone and converted our video and these are the original videos from the lipnet model I'll include a link in the description if you don't want to go and take a look at the original entire data set okay but that looks promising so we've now successfully gone and converted it what we now want to do is render it inside of our application so we're now going to do some rendering inside of the app it's pretty straightforward it's just three lines of code to go on ahead and do this so the first line is let's grab our video so we can type in video equals open and then we're going to be opening up test video.mp4 and we're going to be reading it as a binary so we're going to pass through RB so video equals open passing through test underscore video dot MP4 and RB to read it as binary we're then going to read it so video bytes is equal to video dot read and then we want to render it so we're going to use St dot video and pass through video bytes and this should give us the ability to see a video inside of our app so let's go and rerun our app let's close these now boom boom boom boom 
boom take a look at that that is our video now rendering inside of our application so we've got to play it pin blue F2 now all right let's add a little bit of info but take a look so pimp blue F2 now now the determining characteristic as to whether or not this is working is if we go and choose a different video are we going to be able to see that rendering so if we go and choose a different one let's choose this one so bbal 8p means I think binblue at L8 please bimb blue L8 please there you go all right cool we're looking positive so it looks like it's successfully converting each one of these videos let's add a little bit of information to our app so just above options let's add a title we're going to say St dot title and we're going to say lipnet full stack app boom beautiful and then above our video let's add in an info box so we can type in St dot info this is con uh display the video below displays the converted video in MP4 format save that all right let's go and rerun beautiful all right so we've now got our title and we've now got our info box so if we go and change our video I've got that YouTube speed player hence why it's playing so quick so we can drop that down to been blue at L7 soon beautiful all right cool that is our video now rendering so the video is now successfully rendering but what we actually need to do is we need to take this video and pre-process it before passing it through to our lipnet app itself now luckily we've got that utils file which is going to allow us to load in data simply by passing through a file path this is going to return back the pre-process video which is actually going and isolating the mouth within the original video so you're actually going to see the isolated component of that we're also going to get the associated annotations for that video so if we wanted to we could actually go and compare now we've got the video itself so we can actually see what was being spoken and determine whether or not our lipnet model is actually performing well or not now I'm going to add in a little bit of flare here and we're actually going to take that video and output it as a gif which just looks really cool if you wanted to go and embed this inside a markdown or share the results or share the pre-processing of this particular video before it goes to the lipnap model this is exactly what we are going to be able to see d right now so on to loading some data it is so let's clean this up let's just double check how we're doing so far so we've gone and brought in our dependencies gone and set up our layout set up a sidebar set up a title we've got our options and we've got our column one which is all about rendering the video and then the next thing that we are going to want to go on ahead and do is there's going to be three parts to this to the second column right so let's just add in a couple of info boxes so if I type in St dot info it's one of my favorite streamlit functions it just looks really cool because it helps tell your user what on Earth they're looking at I find it so often that people are building different applications and nobody really has an idea as to how they work adding a little bit of info just adds that extra little bit of flair and tells your user that you care about them I just think it goes that little step further anyway all right estee.info so first up we are going to display um this is the pre-process gif which will actually we can actually this is all the machine learning model sees when making a prediction because that it's true 
right like when when we actually go and run this pre-processing that is all our machine learning model is going to be seeing so their second info box is going to display the predictions this is the output of the machine learning model as tokens so our machine learning model actually returns a set of tokens which goes through a decoder called CTC it's called connectionist temporal classification there's a brilliant blog post that I found about it which actually explains how this works it's also used quite a fair bit for automatic speech recognition which I think is kind of cool so that is what we are going to be returning back and then the last thing that we're going to do is we're going to run St dot info and we are going to decode the raw tokens into um into words and this is where numb to char is going to work in so num2 chart is going to be helping us there okay first things first what we want to go on ahead and do is pre-process our data I've just kicked to my table all right so let's go ahead and do this the first thing we need to do is load data so we're going to get back the video and we're going to get back annotations back and no notations and to that we need a part or run the low data function remember we run imported load data right up here so we're going to be using that now to the full line is video comma annotations equals load underscore data and then to that we are going to be passing through the full file path which we just defined up here so we're going to take that file path we are going to pass it through to here but before we do that we need to convert this into a tensorflow tensor because that is the expected file format that this particular function is expecting to do that relatively easy we can type in TF dot convert underscore 2 underscore tensor and if we close that this will actually return back our video and our annotations now we're not going to stop that oh no what we're now going to do is we're going to take this video and we're going to convert it into a gif and then we're going to render the GIF inside of our streamlit app I know so what we're now going to do is to use image i o so image IO Dot mimsave to that we are going to specify what our output GIF is going to be called so it's going to be called animation dot gif and then to that we'll go in and pass through our video and the number of frames per second that we want our app would give to be so three lines out so we've got our St info which is going to say this is all the machine learning model sees when making a prediction we're then using the load data function and we this should be convert 210 sub we're passing through our file path which we defined up here from that we're going to get video and annotations back or the pre-processed video back and as well as the annotations we don't really need the annotations but our low data function brings them back anyway and then we're using imageio dot mimsafe to actually go and convert this video into a gif so imageio.memsave we pass through the name of the GIF that we want to Output we pass through the video and we pass through how many frames per second we want our GIF to be then what we can do is we can actually go and render this inside of our app we can use St dot image and then we're going to pass through animation.give because it's actually going to be actually before I do this let me just show you this working so if we go and save this let's go is our app still running app looks like it's still running let's go and refresh what should happen if this 
runs successfully is that when we go and run this we're going to get a file called animation.gif inside of our app file so let's go and refresh this just by hitting r so that looks promising right so it doesn't look like we've got any errors if we go back to our app take a look we've got animation.gif and so the cool thing is that this GIF is all the machine learning model sees when it makes a prediction how cool is that no audios passed that is all that's being used to go and do that lip reading which is why I find this absolutely phenomenal we just need one more line to render this inside of our application now so we can go and pass through St dot image and pass through animation.gif so if we go and refresh we should be able to see this take a look that's our gift there it is a little bit small so we can make that a little bit bigger if we go back into St dot image and pass through width equals 400 save that go and refresh boom much better how cool is that so we've now got our video playing we've now got our pre-processed gift playing what we now need to do is actually going ahead and start making some Productions right so predictions what we first are going to need to do is use the load model function that we imported from Models util to load up our sequential model which we originally authored inside of that previous tutorial inside of a Jupiter notebook so we're going to load that up that is also going to load the pre-trained weights which effectively means that we've got a trained deep learning model to be able to go on ahead and use inside of our application what we're then going to do is we're going to take the video that we just loaded using the load data function we're going to pass that through to our model using the model.predict method and then we're going to take the predictions and pass it through the CTC decoder which comes from tensorflow Keras this is going to help take any duplicate predictions and condense them down this is the beautiful thing about the CTC decoder it kind of does all of this tricky coding for us again that blog post that I referenced previously makes your life a ton easier when it comes to understanding connectionist temporal classification highly recommend you check that out anyway let's start making some predictions so what we need to do now is load up our models so that's exactly what we're going to do now keep in mind that we already brought in this load model function from model util now again that file is going to be available on GitHub so you can pick that up you don't need to go on ahead and write it yourself I just wanted to make your life a little bit easier so we're going to load in our model let's go ahead and do this so we're going to say model equals load model right and this is going to return our tensorflow Keras model inside of this variable model which means we're going to have all of the amazing things that Keras and tensorflow gives us we're going to be able to namely use the dot predict method to go and make our predictions so model equals load model we then want to make some predictions so we're going to say y hat is equal to model.predict and then we're going to take this video and we're going to pass it through to the model.predict method but keep in mind when our model dot or when we use model.predict it's expecting a batch of inputs we only have one input or one example that we're going to be passing through to our model so we need to go and wrap it inside of another set of arrays so in order to do this relatively easily we 
can just run TF Dot expand dims Pastor our video and then pass through axis equals zero and then we need to close that so that's going to return our predictions what we then need to do is we need to go and run this through the Keras CTC decoder I'll include some information about the CTC decoder up there it isn't exactly a hugely well documented feature inside of Keras but there is a little bit of information which I managed to find so I'll uh give you a bit of that info right about now over here yeah cool all right cool so now what we're going to go ahead and do is we are going to take this model or take these set of predictions and we are going to run it through that decoder so we can then go and take y hat and we're going to pass it through to this so we're going to set TF or we're going to set a variable called decoder.tf dot Keras dot back end dot CTC decoder let me zoom out a little bit and we're going to pass through the predictions to that we then need to pass through the length of those predictions the length is 75 we established this when we created the original tutorial we're then going to specify what type of algorithm we want to use when it comes to decoding so we are going to use a greedy algorithm which means we're going to take the most probable prediction when it comes to generating our outputs we're going to say greedy is equal to true now when you get this prediction back it's nested inside of a bunch of arrays so we're going to grab the first value and the first value again so this should give us what we actually need so that should effectively be our decoded output if we go and just print this out we should be in a good position so let's quickly recap so we've gone and loaded our model we've gone and made some predictions we are then running it through that CTC decoder so if we run SC or we're then going to Output it as st.txt and we'll pass through decoder save that let's go and refresh this now so if this works we should get a prediction in this little space here so let's refresh oh we have got an error CTC decoder module Keras API Keras backend has no attribute CTC decoder uh CTC is it one word a ctcd code not CTC decoder okay my bad save that let's go and refresh now foreign you can see there that those are our outputs and so this is sort of what I mean so we're going to get be getting back our tokens from our deep learning model this isn't necessarily a set of words as of yet so what we actually need to do is we need to go and convert this into a set of words actually before I do that let me show you what this would look like if I output it before running it through the decoder so if I type in st.txt and just pass through y hat you'll see the before and after so we're going to up with the roll predictions and then the set of predictions that we get after running it through the decoder so let's refresh this let me zoom out and you should see that we get a bunch of duplicates uh so this is just giving us the raw probabilities back um if we ran through uh tf.org Max and we said access equals one let's see what we get back there foreign there you go all right so take a look so you're getting back a bunch of duplicates right so you can see we've got our output 62 14 6 19 19 10 23 19. 
so you can see that we're getting all of these duplicates additionally when we go and run it through the CTC decoder there is a special algorithm which goes and decodes this to get us better sets of predictions which is exactly the reason that we're going ahead and use this so you can see that's the raw output if we were just to go and run it through an ARG Max function this is what we actually get back once we run it through that CTC decoder so we are not going to use this we are going to use the CTC decoder output so if we save that that is our decoder now we can also just grab the raw string by typing dot numpy rather than outputting the entire tensorflow value so if we go and refresh that boom much better so you can say that that is the raw output out of our machine learning model but we're not going to stop there oh no no no what we're now going to do is we're now going to take these set of tokens and run it through the num to char function which we previously imported from the utils library that we went and created and this is going to take that raw set of numbers and convert it into a set of words at the same time what we're going to do is run it through a function called tf.strings dot reduce join which is going to concatenate it together into a single sentence let's go wrap this up home stretch now so what we now need to do is we now need to take this decoder and or this decoder output and we are going to go and decode those raw tokens into words so first things first we're going to go num to char and pass it I'm not numb to chat all right enough Chit Chat Empire sit through that decoder output beautiful so if we just go and output that so if I run St dot text and pass that through so just to quickly recap so we're taking this decoder output we're passing it through to the num to char function which comes from up there so if we go and save our app and let's go and refresh you'll see we got a bunch of words down here A bunch of letters right so right now we're just getting all of these letters and we've got a bunch of blank spaces so that's really difficult to read so what we can actually do is we can condense this down into something which looks a little bit more sensical so let's go and do that so we can go and grab let's just do it on another line um we'll say converted prediction and we are going to say so numb to Cha we're effectively just grabbing this right now so setting that to that and then what we're going to do is we'll run tf.strings dot reduce join pass that through and then we're going to take this converted prediction and pass it through our St dot text method so this is coming from streamlit it's just a streamlit uh text element but we can get rid of that comment down here let's actually add it up here so convert prediction to text beautiful all right so let's go and refresh all things holding equal should see magic down here take a look at that binblue at L7 soon okay so right now this is still inside of a tensorflow tensor we can actually we can leave it like this or we can actually go and clean it up so if we wanted to clean it up we can just do uh over here we can type in Dot numpy let me zoom out so you can see this Dot numpy and then dot d code and then pass through utf-8 boom and then if we go and refresh we're gonna test this out take a look so that is our raw prediction so bin blew at L7 soon so let's go and play this and see how we're actually doing so if we go and select a video let's go and choose one right down here so LG we're going to choose this one 
you can say that this is our model doing our video conversion oh god that's way too fast lay green with G5 again lay grain with G5 again and take a look that is what our model is predicted lay green with G5 again what if we chose another one um let's choose way further down uh what about this praj1s please swear to J1 soon based red at J1 soon how amazing is that so we're able to go on ahead take a video pass it through this entire pipeline which has quite a fair few steps but it's able to make a prediction now keep in mind that this isn't using the audio it's just using the raw video to be able to go and decode what is being said so crazy anyway let's go and grab another one what about this one so red 01 again set red at 01 again he's got a bit of an English accent we'll we'll give him that what about uh this one blue F2 now in blue at f2 now absolutely amazing let's go all the way down what's the last video a thumbs DB that's incorrect we don't need that one uh swv9a set right with v9 again set why all right so in this particular case not perfect set white with SP S9 again not too bad so maybe we could do some additional training there and this sort of goes to show that we're not faking this that this is a genuine set of predictions please wait by Q7 again Place White by Q7 again is that what I said play Sprite by Q7 again how cool is that guys that is the complete application now done now again all of this code is going to be available on GitHub so you can go on ahead and check it out and that is the application now built so we've gone through quite a fair few steps we set up our streamlit framework we then loaded in our video and we built through some of those challenges that we encountered right so streamlit can't play The Dot MPG file formats we converted it using ffmpeg we then set up our layout so that we're able to go and visualize our videos as well as take a look at each one of the steps that our machine learning model is going through we took a look at the pre-processed gift we then took a look at those output tokens and then we took a look at what it happened or what happened once we went and converted those raw tokens into a set of words and that gives us our final application let me know what you thought of this tutorial hopefully you enjoyed this I know that we put up the poll determining whether or not we wanted this as a code that episode or rather as a raw tutorial but I wanted to give this a little bit more Flair even though we won't necessarily do encode that hopefully you enjoyed it I will catch you you in the next one now as I mentioned all of this code is going to be available by GitHub so if you jump on over to knick knock knock and go to the lipnet repository I'm going to make sure that the entire application is uploaded inside of there so you can just run a quick git clone and grab that yourself let me know if you end up putting this on your resume or inside of your portfolio projects I'd love to hear all about it anyway thanks so much for tuning in thanks so much for tuning in guys hopefully you've enjoyed this video if you have be sure to give it a big thumbs up hit subscribe and tick that Bell and all that other good stuff and let me know what you thought of this video do you think we should take it a little bit further do you think we should make some amendments where would you like to see this go anyway in the meantime I am working on a ton of amazing new sets of tutorials namely around Transformers plus the math for ML course is underway and that'll all be 
released on YouTube so you'll get a chance to get up to speed hopefully you're also enjoying the shorts I've been putting in a ton more effort into those to be able to share a little bit or a couple of nuggets of knowledge that I'm gaining as I'm going through this process and on The Learning Journey when it comes to picking up machine learning data science and deep learning hopefully you're enjoying it anyway I will catch you in the next one peace
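To make the walkthrough above easier to follow, the next three sketches reconstruct the helper files and the Streamlit app it describes. First, the character lookups from utils.py. The vocabulary string and module layout are assumptions based on the original tutorial rather than copies of the repository file, so treat this as a sketch.

```python
# utils.py (sketch) -- character <-> number lookups the transcript refers to.
# The vocabulary below is assumed from the original LipNet tutorial; verify it
# against the repository before relying on it.
import tensorflow as tf

vocab = [x for x in "abcdefghijklmnopqrstuvwxyz'?!123456789 "]

# Maps characters to integer tokens, and back again, for the CTC-based model.
char_to_num = tf.keras.layers.StringLookup(vocabulary=vocab, oov_token="")
num_to_char = tf.keras.layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), oov_token="", invert=True
)

# utils.py also defines load_data(path), which returns the pre-processed mouth
# frames and the alignment tokens for a clip (not reproduced here).
```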
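Next, a schematic of modelutil.py, which the transcript uses to load the trained model. The layer sizes, input shape, and checkpoint path below are assumptions based on the original LipNet tutorial; check them against the GitHub repository before reusing this. The final Dense layer has one unit per character plus a blank token because the model is trained with a CTC loss.

```python
# modelutil.py (sketch) -- loads the trained LipNet-style model described in
# the transcript. The architecture is a schematic reconstruction (Conv3D
# feature extraction feeding bidirectional LSTMs with a per-frame softmax);
# layer sizes and the checkpoint path are assumptions.
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPool3D, Activation,
                                     TimeDistributed, Flatten, Bidirectional,
                                     LSTM, Dropout, Dense)

def load_model() -> Sequential:
    model = Sequential([
        # 75 frames of 46x140 grayscale mouth crops (assumed input shape).
        Conv3D(128, 3, padding="same", input_shape=(75, 46, 140, 1)),
        Activation("relu"),
        MaxPool3D((1, 2, 2)),
        Conv3D(256, 3, padding="same"),
        Activation("relu"),
        MaxPool3D((1, 2, 2)),
        Conv3D(75, 3, padding="same"),
        Activation("relu"),
        MaxPool3D((1, 2, 2)),
        TimeDistributed(Flatten()),
        Bidirectional(LSTM(128, return_sequences=True)),
        Dropout(0.5),
        Bidirectional(LSTM(128, return_sequences=True)),
        Dropout(0.5),
        Dense(41, activation="softmax"),  # assumed: vocab size + CTC blank
    ])
    # Load the 96-epoch checkpoint mentioned in the video (path assumed).
    model.load_weights(os.path.join("..", "models", "checkpoint"))
    return model
```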
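Finally, a consolidated sketch of streamlit_app.py covering the steps the transcript walks through: page config, sidebar, video selection, ffmpeg conversion, the pre-processing GIF, and the CTC-decoded prediction. Folder paths, the banner image URL, and the GIF frame rate are illustrative assumptions, and ffmpeg must be available on the PATH.

```python
# streamlit_app.py (sketch) -- a condensed reconstruction of the app built in
# the transcript above.
import os
import imageio
import streamlit as st
import tensorflow as tf

from utils import load_data, num_to_char    # helpers sketched above
from modelutil import load_model            # helper sketched above

# Wide layout; set_page_config must be the first Streamlit call in the script.
st.set_page_config(layout="wide")

with st.sidebar:
    st.image("https://example.com/ai_banner.png")  # placeholder banner image
    st.title("LipBuddy")
    st.info("This application is originally developed from the LipNet deep learning model.")

st.title("LipNet Full Stack App")

# Let the user pick one of the GRID speaker-1 clips.
options = os.listdir(os.path.join("..", "data", "s1"))
selected_video = st.selectbox("Choose video", options)

if options:
    col1, col2 = st.columns(2)

    with col1:
        st.info("The video below displays the converted video in mp4 format")
        file_path = os.path.join("..", "data", "s1", selected_video)
        # Streamlit cannot play .mpg, so convert the clip to .mp4 first.
        os.system(f"ffmpeg -i {file_path} -vcodec libx264 test_video.mp4 -y")
        video = open("test_video.mp4", "rb")
        st.video(video.read())

    with col2:
        st.info("This is all the machine learning model sees when making a prediction")
        # Same pre-processing the model was trained on; returns the cropped mouth frames.
        video_frames, annotations = load_data(tf.convert_to_tensor(file_path))
        # Depending on your imageio version you may need to cast frames to uint8 first.
        imageio.mimsave("animation.gif", video_frames, fps=10)
        st.image("animation.gif", width=400)

        st.info("This is the output of the machine learning model as tokens")
        model = load_model()
        # model.predict expects a batch, so add a leading batch dimension.
        yhat = model.predict(tf.expand_dims(video_frames, axis=0))
        # Greedy CTC decoding; 75 is the number of frames per GRID clip.
        decoder = tf.keras.backend.ctc_decode(yhat, [75], greedy=True)[0][0].numpy()
        st.text(decoder)

        st.info("Decode the raw tokens into words")
        converted = tf.strings.reduce_join(num_to_char(decoder)).numpy().decode("utf-8")
        st.text(converted)
```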

Links: CTC Blog Post: https://distill.pub/2017/ctc Oh, and don't forget to connect with me! LinkedIn: https://bit.ly/324Epgo Facebook: https://bit.ly/3mB1sZD GitHub: https://bit.ly/3mDJllD Patreon: https://bit.ly/2OCn3UW Join the Discussion on Discord: https://bit.ly/3dQiZsV



Get notified of the free Python course on the home page at https://www.coursesfromnick.com Sign up for the Full Stack course here and use YOUTUBE50 to get 50% off: https://www.coursesfromnick.com/bundl... Hopefully you enjoyed this video. 💼 Find AWESOME ML Jobs: https://www.jobsfromnick.com Get the Code: https://github.com/nicknochnack/LipNet

TensorFlow 2.0 Complete Course - Python Neural Networks for Beginners Tutorial

What is Tensorflow for Python?

Published on: August 19, 2020 by Sagnik Banerjee

All AI and ML engineers work with one programming language or another, such as Python, Scala, R, or C++, and to work effectively in these languages they either need in-depth knowledge of coding from scratch or libraries that make their lives easier. Python is one such language: it supports object-oriented programming, which allows code reusability, and it offers a wide range of libraries. As the language most preferred by data scientists, it has proven to be a true friend when it comes to solving machine learning and deep learning problems.

TensorFlow, Python's library: One reason for Python's success is that it has many libraries and a huge community that contributes knowledge to make the language simpler every day. One such powerful library, widely used by data scientists, is TensorFlow. It is used to carry out machine learning and deep learning work, and the core concept it builds on is the neural network.

Yes, this mirrors the way our brain works by responding to stimuli, and this is the concept behind TensorFlow. The library was developed by Google for internal use but was made open source because Google found it could help the data science community to a great extent in solving its problems. It was released on November 9, 2015 and is licensed under the Apache License 2.0. Read more: 5 Most common programming languages used in AI (Artificial Intelligence)


TensorFlow offers many interesting features; for further details, you can visit the official website and also follow tutorials from sources such as Udemy, DataCamp, Intellipaat, and Coursera.
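To make the neural-network idea above concrete, here is a minimal, self-contained sketch of defining and training a tiny network with TensorFlow's Keras API. The data is random and purely illustrative.

```python
# A minimal neural-network sketch using TensorFlow's Keras API.
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 4).astype("float32")     # 100 samples, 4 features
y = (x.sum(axis=1) > 2.0).astype("float32")      # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, verbose=0)

print(model.predict(x[:3]))   # probabilities for the first three samples
```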

Conclusion

If you want to learn the concepts of neural networks and experiment with them in your own programs, this library will make your life easy. If you have the requisite skills, you can also showcase your talent with it, and you can contribute to the development of the API by submitting code that enhances how TensorFlow works.



Deep Learning for Computer Vision with Python and TensorFlow – Complete Course

GitHub.com/TensorFlow

          

Documentation


TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward compatible API for other languages.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for the pip package, for enabling GPU support, for using a Docker container, and for building from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow


Other devices (DirectX and MacOS-metal) are supported using Device plugins.

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu


To update TensorFlow to the latest version, add --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPi.

Try your first TensorFlow program

$ python

>>> import tensorflow as tf

>>> tf.add(1, 2).numpy()

3

>>> hello = tf.constant('Hello, TensorFlow!')

>>> hello.numpy()

b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.
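As a slightly larger follow-on example (a sketch, not part of the official README), here is how automatic differentiation looks with tf.GradientTape, one of TensorFlow's core building blocks.

```python
# Computing a gradient with tf.GradientTape.
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2 * x          # y = x^2 + 2x
dy_dx = tape.gradient(y, x)     # dy/dx = 2x + 2, which is 8 at x = 3
print(dy_dx.numpy())            # 8.0
```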

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs; please see the TensorFlow Forum for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Patching guidelines

Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.

Official Builds

Build Type                   Artifacts
Linux CPU                    PyPI
Linux GPU                    PyPI
Linux XLA                    TBA
macOS                        PyPI
Windows CPU                  PyPI
Windows GPU                  PyPI
Android                      Download
Raspberry Pi 0 and 1         Py3
Raspberry Pi 2 and 3         Py3
Libtensorflow MacOS CPU      Nightly Binary, Official GCS (status temporarily unavailable)
Libtensorflow Linux CPU      Nightly Binary, Official GCS (status temporarily unavailable)
Libtensorflow Linux GPU      Nightly Binary, Official GCS (status temporarily unavailable)
Libtensorflow Windows CPU    Nightly Binary, Official GCS (status temporarily unavailable)
Libtensorflow Windows GPU    Nightly Binary, Official GCS (status temporarily unavailable)

Resources

Learn more about the TensorFlow community and how to contribute.

Courses

License

Apache License 2.0

tensorflow/tensorflow 

Taking Your Existing Business

With AI TensorFlow

Build a Deep Learning Model that can LIP READ using Python and Tensorflow | Full Tutorial

Entire transcript (no timestamps). Setup code sketches follow the transcript.

the field of machine learning is taking Monumental leaps each and every day there's a new machine learning model which is pushing what we thought was possible we've seen the likes of whisper chat jpt daily 2 and large language models so I wanted to take this opportunity to help you build your very own game changer model we're going to be teaching machine learning how to lip read [Music] what's happening guys my name is Nicholas tonight and in this tutorial we are going to be building our very own machine learning model that is able to read lips now all this code is going to be available as well as the data so you're going to be able to build it up from scratch and get this up and running now Nick why do we need to build a machine learning model that's able to read lips well this is almost like an extension of the sign language model that we've built previously it improves accessibility and gives Society the additional power to be able to use machine learning for good so how are we going to build it well we are going to be using a range of Technologies we're going to be using opencv to be able to read in our videos when we're going to be using tensorflow to be able to build up our deep learning model we're then going to bring it all together and test it out so that we're able to decode what a person might be saying and again this is going to be using a client conversation format so you'll be able to get an understanding of what we are doing at each single point in time ready to do it let's go and have a chat to our client yo Nick what's up Johnny yeah not much oh no this isn't one of your ml startup ideas again is it actually uh all right what is it I was hoping you could get me to use machine learning to do lip reading lip reading yup what are you the FBI why lip reading well it kind of goes hand in hand with the stuff that you did around sign language but flipped around okay fair interesting I'll even give you 10 of the company I'm gonna need you to code it all though all right fine let's do it but not because of the 10 cent because of you guys hopefully you enjoyed this tutorial first thing we need to do is install and import our dependencies let's go get em co-founder let's do this thank you we interrupt your regular programming to tell you the courses from me is officially a lot if you'd like to get up and running in machine learning deep learning and data science head on over to www.courses from Nick to find the latest to end greatest I'm also going to be releasing a free python for data science course in the upcoming weeks so be sure to stay in the know but if you're ready to hit the ground running well I highly recommend you check out the full stack machine learning course this goes through seven different projects 20 hours of content all the way through full stack production ready machine learning projects head on over to www.courses from nick4 bundles forward slash full stack ML and use the discount code YouTube 50 to get 50 off back to our regular programming alrighty so the first thing that we told our client that we would be doing was installing and importing some dependencies so let's go on ahead and do that okay so the first thing that we need to do is install a bunch of dependencies so we've got our first line of code which is exclamation mark pip install opencv Dash python map plot lib image i o and G down now we also need to add tensorflow to this list if you don't have it installed already Okay so we're going to be using opencv to pre-process our data and I'm going to 
I've actually built a little script that's going to download the data for you from my Google Drive. Matplotlib is going to be used to render the results, so we'll actually be able to see the outputs of our pre-processed videos. We're going to use imageio to create a quick little GIF, so you'll be able to see a couple of the frames stacked together; it looks pretty cool. gdown is going to be used to actually download our dataset; it works really seamlessly with Google Drive, and I think that's what I'm going to start doing for datasets going forward. And TensorFlow is going to allow us to build a deep neural network. So if we go and run that cell, it should go ahead and install all of our dependencies. That's looking pretty promising. If we want to take a look at the different versions of the libraries that we're going to be using (we really should be moving this cell up), we can run pip list. So we installed opencv-python, and the version we're using is 4.6.0.66; matplotlib is 3.6.2; imageio is 2.23.0; gdown is 4.6.0; and for tensorflow we're using 2.10.1. Also, this code is going to be available on GitHub, so if you jump over to GitHub, go to nicknochnack and then repositories, this file is under the LipNet repo. It's private right now, but I'm going to make it public. If you take a look, I've included pre-trained model checkpoints, so you don't need to train this yourself; you can kick things off with a pre-trained model. I've also included the Jupyter notebook, so a quick git clone of that particular repository and you're going to be able to get started with all the code that you see here. Okay, those are the packages we needed installed. The next thing that we need to do is import a bunch of dependencies. First up we're importing os, so the first line is import os; this is just going to make it a lot easier to navigate and traverse different file systems, and it works a lot more seamlessly whether you're on a Windows machine or a Linux machine (there are a few nuances that I had to handle, particularly for the data splitting, but I'll explain that a little bit later). Our second line is import cv2; this imports OpenCV, which is needed to pre-process and load up our videos. Then we've got TensorFlow, so import tensorflow as tf; that's going to be our primary deep learning framework, and we're going to use TensorFlow data pipelines as well. Now tf.data can be a little bit tricky, so I always try to evaluate whether something like this is better done in TensorFlow versus PyTorch; who knows, maybe at some stage I'll actually transition. But tf.data is a great data pipeline framework: it allows you to actually go and transform data. It can be a little bit fiddly at times, so sometimes you do need to do things that are slightly nuanced, which is going to be the case regardless of which deep learning framework you're using, but we're going to be using a proper TensorFlow data pipeline.
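Pulling the imports described here and over the next few lines into one cell, the top of the notebook looks roughly like this; the GPU memory-growth guard explained a little further below is included as well, and it's a safe no-op on a CPU-only machine.

```python
import os                             # path handling across Windows / Linux
import cv2                            # OpenCV: reading and pre-processing the videos
import tensorflow as tf               # deep learning framework + tf.data pipelines
import numpy as np                    # general array work
from typing import List               # light type annotations on the loader functions
from matplotlib import pyplot as plt  # rendering pre-processed frames
import imageio                        # turning a stack of frames into a GIF

# Stop TensorFlow from grabbing all GPU memory up front (explained just below).
physical_devices = tf.config.list_physical_devices('GPU')
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except (IndexError, RuntimeError):
    pass  # no GPU available, or memory growth was already configured
```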
A tf.data pipeline is probably a lot more closely aligned to proper machine learning operations, so we are going to be building up our data pipeline using the tf.data API. The next thing that we're importing is numpy, so import numpy as np; it's always good to have numpy along for the ride if you need to pre-process any arrays. I've then imported typing; this is something I've personally taken on as a bit of a challenge or stretch goal this year, to start using type annotations a little bit better. I'm not great at it, but I am improving, hence why we're going to be using the List type annotation. We've then imported matplotlib, so we've written from matplotlib import pyplot as plt; this is going to allow us to render the pre-processed or post-processed output of our data loading function. And I've also brought in imageio; there's a one-liner that allows you to convert a numpy array to a GIF, which looks really cool and lets you see what you've actually got after pre-processing, which is particularly useful when you're dealing with videos. Okay, so those are our imports: import os, import cv2, import tensorflow as tf, import numpy as np, typing, matplotlib and imageio. Our dependencies are now imported. The next thing that we need to do is prevent exponential memory growth. If you're running this on a GPU, which I highly recommend you do, whether that be Colab, some other cloud service, or a CUDA-enabled GPU on your own machine, I highly recommend you run this line, because it's going to prevent your machine from sucking up all the GPU memory and hitting out-of-memory errors. You've probably seen me use this a bunch in other deep learning videos as well. First up we grab all of our physical devices, so physical_devices = tf.config.list_physical_devices. We need to run our imports first, so let's let that run. Five minutes later... now, if I go and run this, we should be able to see which physical devices we have on the machine, and you can see I've got my one GPU showing up there. What we then say is that we are going to prevent any exponential memory growth, so tf.config.experimental.set_memory_growth, and we assign that to the one GPU that we've got and set it to True. If we do have a GPU we'll be able to successfully set that; if we don't, we just pass. You can then go and run this, and you need to do it pretty much straight away, before you do any modelling or anything, otherwise it's not going to take. Okay, that's our set of dependencies installed and imported. Just to recap: we've installed opencv-python, matplotlib, imageio, gdown and tensorflow, we've imported all of our dependencies, and we've set memory growth to True for TensorFlow, which is particularly applicable if you are training on a GPU. Those are our dependencies now installed and imported; back to our client. So we're going to be working with the GRID dataset for this? Nice. Is this something that we'd eventually be able to use with a custom dataset, say of ourselves? Sure, I've actually got this planned: we just need to capture frames of a person speaking, then use a speech-to-text
model to transcribe what they're saying that data set could then be subbed into the model training pipeline let me know if you want that tutorial ah got it so the grid data set for now yep we need to build two data loading functions one to load the videos and one to load the Align transcriptions got it let's roll alrighty so now that we've gone and installed and imported our dependencies the next thing that we want to go on ahead and do is build our data loading function so there's two key data loading functions that we're going to need to build here the first is to load up our videos and then the second is to actually pre-process our annotations and our annotations in this case are sentences which our particular person in the videos is actually gone and talked about now the data set that we're going to be using is an extract of the original grid data set so this data set was built to be able to go and build lip reading models now I've gone and made your life a little bit easier by just loading this data into my Google Drive so you're just going to be able to download the specific sections or parts that I use to actually go and about how to build this that you're going to be able to go ahead and build this so first things first what we need to do is import G down so that full line is import G down and G down is a library that just makes it super straightforward to going ahead and grab data out of Google Drive once you've got that the next thing that we're going to go on ahead and do is download the data itself we are going to Output it inside of a file called data.zip and then we'll extract it all into its own separate folder so the full line is URL equals and then I've got this specific URL here so you can actually grab that paste it into your browser and you'll be able to download the data set we're just going to use Python to draw it because it makes your life return easier we're going to Output the file to a file called data.zip we're going to use gdown.download to that we pass through the URL which is this over here we also pass through the output file name and we've set quite equal to false we can then extract that as well because we're going to be downloading it as a zip file we don't need it as a zip file we need it unpacked so we can use gdown.extract all and pass through data.zip to be able to go and extract that so if I actually go and run this you'll see it should start downloading our data and there you go we're now downloading data so it's around about 423 Megs this is only one speaker the original grid data set I think has something like 34 different speakers so if you wanted to extend this way further up specifically using the grids data set you definitely could but I'm going to take this a different direction later on and we're actually going to grab data of ourselves and be able to train it on that so let's let that download and then we'll be able to kick things off come here want to know a secret are you looking for your next dream job in data science machine learning deep learning or just data in general will you need to join jobs from Nick each and every week I send you a curated list of the best jobs in data these are jobs with great perks great people and great roles plus you'll get access to exclusive content like amas interviews and resume reviews so why not join plus it's completely free link is in the description below what are you waiting for all right back to the video a few moments later alrighty so that is our data now downloaded you can say that we've gone 
and successfully downloaded it there now if we actually go and open this up you'll actually see that we have got our data.zip file now downloaded and we've also got this new folder called data which is what the extractor function would have done app is something that I got planned for in the future if you want to see this as part of a code that episode inside of a full stack app let me know in the comments but we are most interested in this over here so this data folder so if we open this up and this is what this uh section or code cell is going to create we're going to have a file called alignments and inside of that a file called S1 or a folder called S1 and this represents all of our annotations they're in this dot align file format which is interesting to say the least if you open them up this is what an annotation looks like now these specific videos are really around moving certain things to certain places so it still means silence and then we've got these different commands so binblue at f2 now if we go and take a look at another one um let's scroll on down there's this one lay white by F5 again so you can see that it's not necessarily things that you'd encounter out there in the real world but we're definitely going to be able to train a model to be able to decode this from purely a video no audio so still mean silent so we're actually going to get rid of those when it comes to pre-processing annotations but we want to really extract this so we want lay white by F5 again and then we in this particular case we'd want bin blue at f2 now now if we actually go and take a look at their videos we've actually got videos in here as well so if I jump into our S1 folder so if I go into my root folder so datum and then S1 and then you can see I've got MP4s here so that's not going to play it within um what do we do inside a jupyter notebook so let's actually just open it up so if I go into the folder that I'm currently working in and we go into data and we go into S1 you can see I've got all of these MPEG files right so these are all about videos if I go and play one pin blue F4 please let me Chuck my headphones on is that been blue F4 please in blue at F4 please so you can see that we've got different videos of a particular person saying something now eventually it's Blue by C7 again Place B Blue by B7 again so you've actually got matchy annotations right so if we go and open up a specific annotation go to alignments S1 and go and open up this one so BB af2n this should effectively represent The annotation for the matching video so bin blew at F true now so if we go and find that particular video which should be the first one so BB AF let me zoom in boom boom boom pbaf2n so we should be able to go and play this blue F2 now right so it just said Ben blue at f2 now in blue at f2 now so this is the data that we're going to begin working with now that is our data now downloaded the next thing that we want to go on ahead and do is actually get this into a data loading function so I've gone and written this function called load video this is going to take a data path and then it's going to Output a list of floats which is going to represent our video so what we do is we first create a CV2 instance a video capture instance which takes in our path and then we're going to Loop through each one of these frames and store it inside of an array called frames what we then do is we reduce them or we calculate the mean we calculate the standard deviation and then we standardize or scale our particular image 
features so we subtract the mean and we divide it by a standard deviation I'm also doing something here which is effectively isolating the mouth region now I'm doing this using a static slicing function so I'm basically saying go from position 190 to 236 and position 80 to 220 to isolate the mouse there's there is a slightly more advanced way to do this using a specific face detector to extract the lips which is what the original lipnet paper actually does so if I show you the lymph paper lip net paper so lipnet actually uses I think it uses dlib to be able to go and extract the mouth so if I go d-lib search from within here d-lib yeah so they use dlib to be able to go and isolate their mouth now I've just gone and done it statically for the sake of keeping this relatively straightforward but that is effectively what we're doing there so We're looping through every single video we're storing the frame inside of our own set of arrays we're converting it from RGB to grayscale as well that means that we're going to have less dado to pre-pros pre-process and then we're isolating the mouth using this static slicing function we're then standardizing it so we're calculating the mean calculating the standard deviation that's just good practice to scale your data and then we're casting it to a float 32 and dividing it by the standard deviation so if we go and run that that is our load video function now done now we're going to go on ahead and Define our vocab now a vocab is really just going to be every single character which we might expect to encounter within our annotation so bin blew at f2 now we've also got a couple of numbers in there as well just in case we need them so if I go and run that and we can actually go and take a look at our vocab so you can see it's just a list which contains each and every potential integer there now the cool thing about this is that we can actually go and use the Kara string lookup function to be able to go and look up or convert our characters to numbers and our numbers to characters so over here you can see I've got Char to num and this is originally from the Keras CTC uh I think it's ASR tutorial so it actually uses this specific loss function to do automatic speech recognition so they I've actually thought that this was a really neat way to do it and it keeps everything nice and clean so we've got two functions here Charter num and num to chart the first one takes a character and converts it to a number and the second takes a number and converts it to a character so it just makes your life a ton easier when it comes to actually converting text to string and string to text so if I go and type in child to num I think we can go and pass through um let's say a e d and you can see it's converting it to one two three now if I went and typed in um n I C okay it should be a comma you can see it's converting each one of these characters to an integer over here so this is effectively one hotting not necessarily one hot encoding our data set but it is tokenizing it and returning a specific token value or an index effectively now we're going to be able to pass through this data to our loss function to be able to calculate our overall loss because our model is going to be returning a one hot encoded version of this now likewise we can actually go and decode this so if I go and use num to char and if I pass through this array which is 14 9 3 and 11 we should get the reverse which gives back Nick boom and you can see that there so it's a byte encoded value but you can see 
I've got n i c k so these are going to be our lookup functions that allow us to convert and reconvert our text to encodings okay that is our vocabulary now defined so the full line there is chart underscore two underscore num is equal to TF dot keras.layers.stringlookup we pass through our vocabulary and then we're setting out a vocabulary token so if it encounters a character that it hasn't seen before it's just going to be a blank value then we're doing the opposite we're creating a num to child function which is equal to TF dot keras.layers.stringlookup we are then going and passing through the reverse so we can actually go and get our vocabulary out of this so if I type in chart to num dot get vocabulary boom you can say it's just returning back all of our unique characters beautiful again we're setting out of uh vocabulary token and we're using invert equals to true to say that we want to convert numbers to characters not the other way around okay and then we're printing out our vocabulary uh vocabulary and the size all right and then we're going to use a function to actually load up our alignments so our alignments being these we are going to take in a specific path which eventually is going to map through to these paths or alignments forward slash S1 we're going to open up that path and then we're going to split out each one of these lines if the line contains the value silence we are going to ignore it because we don't necessarily need it we're then going to append these into an array called tokens and we're going to convert them from characters to numbers so we're going to go and split that data out and convert it into a set of characters now there's one last thing that we need to do before we can go on ahead and test this out we need to go and load the alignments and the videos simultaneously so we're going to extract both of those paths and we're going to return the pre-processed videos and the pre-processed alignments together so we need a load data function so this is going to take in the path to our video we are then going to split it out and convert it so that we have a video path and an alignment path what we're then going to do is we're going to use both of our functions so we're going to use our load video function and our load alignments function over here and we're going to return the frames and the alignments out of each one of those functions so if I go and run that what we can then do is get a test data path so if we just go and grab this particular video which is our first one so bba16n and then what we can go on ahead and do is pass that to our load data function which is what we had here I'm going to wrap that specific path inside of our tf.convert to tensor function and this is just going to convert a raw string to a TENS flow tensor so TF dot convert utensa if I pass through test path you can see we're going to get a tensor back now to grab the tensor value we can type in Dot numpy and that's going to grab that and then I believe we can type in decode and it should be UTF -8 and then we're going to be able to grab that specific path inside of our load data function what we're doing is we're actually splitting now if you're running this on Windows you're perfectly fine to run this as is if you're going to run this on a Linux or Mac machine comment this line out and uncomment this so this is going to be I believe to be able to run it on a Linux machine I had to play around with it when I was running it on colab versus running it on my Windows machine so that's the 
only change that you do need to go and make if you're going to run it on a different type of machine so I'm going to comment that out and leave the windows bit open so what we'll effectively be doing is we'll be grabbing this string and then we'll be splitting it so if I type in dot split I'm going to split on the double backward slash so that we are now able to unpack the entire path because what we actually want to do is we want to grab this file name here because we're going to grab the matching alignments to that because the alignment will be called bba1 or l6n dot align and it's going to be in a slightly different folder so this magic that is happening here in these three lines is exactly what is happening so I'm then going and splitting on oh we're actually grabbing the last value which is index negative one so you can see that we've now got the file name there and then we're splitting again on a DOT so we've now got the file name and the file extension we can then grab the file name like so by grabbing the first index and you can see that there so that is grabbing the file name we're then appending it using os.path.join so we'll grab in the video path and the alignment path remember if we go and take a look at our data it's freezing up a bit all right so for inside of our data folder we've got a folder called alignments in S1 S1 contains all of our videos and alignments contains an S1 folder which contains all of our alignments with speaker one and we've only got one speaker because I've cut down the data set so if we actually go and run this load data function this should return our pre-processed videos as well as our pre-processed alignments which would which we should then be able to go and use inside of our deep learning function Okay cool so take a look so we've got a tensor return which is a 75 frames in length which is 46 pixels High by 140 pixels wide by one channel because we've gone and converted that to RGB if we go and take a look what do we have in our next cell so this is returning a mappable function let me just quickly run you through this first up so if we actually get what we're going to be getting back from this low data function is frames and then alignments so if we go and take a look at frames that is our frames data set so you can see that we've got the shape there if we wanted to go and plot out an example so I could run plot.im show and if we just grab one frame we should be able to show it so that's so you can actually see the person's mouth right there pretty cool right so this actually allows you to go and see all of the different frames that we're going to process and as we go through each one of these frames you're going to see the mouth move so if I jump ahead to frame 40 you can see that the lips are moving and this is the impact of subtracting the mean and the standard deviation we're really isolating these regions that you can see highlighted in yellow here now if we go and take a look at our alignments alignments this is the word representation of what is being said so if we actually go and run this through uh it should be numb to char which remember are our pre-processing functions that allow us to convert numbers to characters we should be able to go and pre-process this so if we grab the numpy value out of that boom you can see uh it's let's go and decode utf-8 and we might need to go and loop through 4X in we're going to return x dot d code utf-8 and what do we get in there it has no attribute decode we just used that up there let's just print 
it out. Oh, I think we need to grab the numpy value first... there we go. And then we should be able to call decode('utf-8')... that doesn't want to do it... all right, there we go, much better. So this is the result of our transformation: you can see it says bin blue at l6 now, so that is the result of actually transforming our alignment. I've just gone and looped and printed, which is a long-winded way to do it and not exactly the most efficient, but it is showing us our result, so you can see there that we've got our final end result. There's a way to condense this down as well; I think it's tf.strings.reduce_join... unmatched bracket, what have we done there, what is that enclosing... boom, there you go. So that is the result of our transformation: you can see we've got bin blue at l6 now, which is us undoing all of our transformation. The raw representation of that is just the alignments tensor, so each one of those individual numbers just represents a character in this specific sentence, bin blue at l6 now. Okay, that is our set of alignments done. The last thing that we're going to do is wrap this inside of a mappable function, which is going to allow us to return float32s and int64s, and to use our raw string processing. This is one of the nuances I noticed when dealing with a TensorFlow data pipeline: typically, if you want to use pure Python string processing inside the pipeline, you've got to wrap it inside a tf.py_function. So if we go and run that, the next thing we'll be able to do is create our data pipeline. Let's quickly recap: we have successfully downloaded our data using gdown, created a pre-processing function to load our video, defined our vocabulary, defined a character-to-number function, a number-to-character function, a load_alignments function and a load_data function, and then tested it all out using our test path. You can see that we are now returning a bunch of frames showing our person's mouth, which should effectively show the mouth moving when we stack all of the frames together. We've also converted our alignments from the raw text into an encoded sequence that we'll be able to pass through to our machine learning model, and we've created the mappable function which we're going to need for our data pipeline in a second. Alrighty, let's jump back over to our client. So that's our data loaded, right? Right-ish; we need to build a data pipeline. This will be used to train the deep learning model: TensorFlow will draw random samples from our dataset in order to complete one training step. Oh okay, anything else? Yeah, we also need to look at the data to make sure our transformations have worked successfully. Nice, off to the pipeline we go then. So we're now on to creating our data pipeline, so let's go ahead and do this. The first thing I do here is import matplotlib, which we already had imported; that's me importing stuff multiple times, ignore that. Most importantly, we are going to create our data pipeline, and this is probably one of the most important bits of this entire build, because creating the neural network is great.
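Before assembling the pipeline, here's a minimal sketch of the mappable wrapper just described, together with the tf.data pipeline that gets built up over the next few steps. It assumes the load_data helper defined earlier (returning frames and alignments), the ./data/s1/*.mpg layout, and the train/test split that only gets added later in the walkthrough.

```python
# Wrap the plain-Python loader so it can run inside a tf.data map.
# load_data is the helper defined earlier (returns frames + alignments).
def mappable_function(path: str) -> List[List]:
    result = tf.py_function(load_data, [path], (tf.float32, tf.int64))
    return result

# The pipeline assembled over the next few steps, roughly:
data = tf.data.Dataset.list_files('./data/s1/*.mpg')
data = data.shuffle(500, reshuffle_each_iteration=False)
data = data.map(mappable_function)
data = data.padded_batch(2, padded_shapes=([75, None, None, None], [40]))
data = data.prefetch(tf.data.AUTOTUNE)

# Train/test partitions (added back into the pipeline later in the video).
train = data.take(450)
test = data.skip(450)
```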
actually having a data pipeline is just as important so first things first we are going to create our data set and to do that we are running tf.data.dataset.list files so this is going to go inside of our data folder inside of our S1 folder and it's going to look for anything that ends in MPG which is the file format that our videos are currently stored in it's quite old but it still definitely does work we're then shuffling it using data.shuffle and we're specifying the cache size to 500 so this will grab the first 500 Shuffle those up and then return a value out of that we're then mapping it so we're going to take the raw file format so if I actually comment this out let me show you what it looks like so at its base file format if I run data dot as numpy iterator dot next this is just going to return a file path which is then going to be passed through to our load data function which is going to do the splitting and then we're going to run two sub functions which is load video and load alignments so that is exactly what we need to do by running the map function so even if I run Shuffle you're still only getting back files right or file paths this isn't returning data yet which is where the map function comes in so data equals data.map and we're running this mappable function the mappable function is just wrapping our load data function inside of a tf.pi function and this is going to allow us to work with our specific file path formats which now if I go and run this and if I go and run this particular cell we're actually going to get our data back so this is returning our frames and our alignments so if I go and take a look at frames boom that's our set of frames we can run plot dot IM show grab one set of frames boom uh we need to close this not boom boom all right so you can see that we're now getting our data back inside of our data pipeline if I go and take a look at our alignments boom you can say that we're now getting our alignments back okay so then what we want to do is we want to pad this out because right now we're going to have variable lengths for each one of these sets of alignments if I go and run this again you can see that one's a different length that one's a different length that one's a different length that one's a different length and this is because there's going to be a different number of characters inside of each one of these sets of alignments that we've got over here so what we can do is we can convert these to a padded batch so we've then gone and overridden our data Pipeline with data equals data dot padded batch we're batching into group sizes of two so each one of these is going to have two videos and two sets of alignments we're then patting out our shapes so we're not really going to pad out our videos we are going to ensure that we have 75 frames we're not going to pad out the actual image itself we're just going to ensure that we have 75 frames for each one of these videos and we're going to ensure that we have 40 tokens for each one of our alignments if there's less than 40 it's going to be padded out to zero and then we're prefetching to ensure that we optimize our data pipeline so that we're loading or pre-loading as our machine learning model is still training so if we go and run this full pipeline that is brilliant we can then go and run this particular line here which is going to now load two videos and two sets of alignments so you can see that we've now got two sets of alignments and you can see that we've got trailing zeros at the end because 
it is padding out our alignment likewise if we go and take a look at a frame you can say that we should have two frames now so if I type in land frames boom we've got two sets of videos inside of each batch okay that is looking brilliant now what we can actually go ahead and do is run through this so I'm going to run data dot as numpy iterator so this allows you to iterate through exactly the same as what we're doing up there you can see that by running dot next we're going to get a so Val 0 is going to be returning our frames then this is my favorite function right so imageio dot mimsafe actually converts a numpy array to a gif so if I go and run this line here so imageio dot mimsave it's actually going to take our data set which is what we've gone into find over here and it's going to grab the second value which should actually it's going to grab the second instance of our video so if I set I could set this to zero or one because we've got two sets of videos inside of each batch and it's going to convert it into a gif so if I actually go and run this inside of your folder now you should have a file called animation.gif and you can see that this is what our lipnet model is going to learn to decode so purely based on the gift that you're seeing it's going to learn to try to decode what the person is saying and convert this to text this is the amazing thing about this model because we'll actually be able to take nothing but this types of data and convert it into a sentence really and this is going to get even better once we go and convert this onto our data set which is probably going to come inside of another tutorial but I wanted to get this base one out okay so that is what image io.mimsave does we can then go and plot out our image which you've sort of seen already so plot.im show what we're doing here is we're grabbing our sample data set which we've just gone and created over here we're grabbing our video so index 0 is going to reference the so let me explain this indexing or this subscripting so the first zero is representing that we want our videos so that's what our first zero is referencing the second video is saying give me the first video out of the batch and then the third zero is telling me give me the so the third zero is giving me a return the first frame my head's blocking this frame in the video right so if we wanted to go and grab the last frame I could pass through 74 because remember we're going to have 74 or 75 frames per video that is the last frame in our video we could go and grab right in the middle which is going to be 35 which you can see is the mouth moving I could even go and grab the second video by changing this index to one here so you can see that that is a completely different video now and then we've also got our alignments which we went and took a look at so oh wow I didn't need to go and do all of that decoding I knew I had a way more efficient way to write it so tf.strings.reduce join we're then going and looping through every single word inside of our alignment and you can see that this is the end result so bin white by n two now which is going to be The annotation for our first video over here so if we actually go and take a look this is doing our this right now is grabbing our second video so if we went and created the GIF for the first video so that should have gone and done it if we've gone reopen our animation so this animation that you can see here is the representation that we're going to be passing through to our neural network which actually 
represents in white by N2 now so this is almost like moving chess pieces it's not chess pieces but that's sort of the feeling that you get right or that's sort of the set of commands that we're actually getting out or that a person is actually communicating back through and this animation so we're going to effectively produce the Deep learning model which takes this input and is able to Output this bin wide by N2 now pretty cool right anyway that is our data pipeline now built so kind of straightforward there so we've gone and created a tensorflow data set we've gone and tested it out using the dot as numpy iterator method and using the dot next method to grab the next instance out of our data pipeline we've also used imageio dot mimsave to convert a numpy array into a gif so that you can see what this actually looks like and we have also gone and taken a look at the pre-processed images as well as the pre-processed annotations alrighty that is our data Pipeline and now ready for training so remember we've now got a data pipeline over here we're not going to be splitting this out into training and validation although you definitely he could and now we're just going to be running on this particular data set best practice is you split this out into a training and validation partition but if you guys do this as part of the tutorial let me know and I'll add it to the GitHub repository okay that is our data pipeline now ready alrighty on to modeling I'm destined for the catwalk man bruh seriously though check out my palette face I'm the male embodiment of Bella Hadid yeah well we've got to build this model now we're going to use 3D convolutions to pass the videos and eventually condense it down to a classification dense layer which predicts characters so single Letters At A Time yep we'll use a special loss function called CTC AKA connectionist temporal classification to handle this output interesting why use that loss function well it works great when you have word transcriptions that aren't specifically aligned to frames given the structure of this model it's likely to repeat the same letter or word multiple times if we use a standard cross-entropy loss function this would look like our model's way up CTC is built for this and reduces the duplicates using a special token but our data set was aligned yeah you bang on but when it comes to eventually subbing out the data with data that we create it's going to be way more cost effective to Simply use non-align data our model is going to be ready for it ah got it after the catwalk then the next thing that we need to do as we told our client is actually design our deep neural network although we're not going to be on the catwalk we are going to be working in tensorflow so first things first we are going to go ahead and import our dependencies there's quite a fair few here so we've actually got Os Oh man I need to go and clean up some of those inputs so first up we're going to be importing the sequential model class so from tens loader Carousel models import sequential we're then going to be importing a bunch of layers so from tensorflow.keras.layers import con 3D so this is a 3D convolution the con 3D tensorflow absolutely brilliant when working with videos or we're going to be performing a 3D convolution or spatial convolution over volumes to use quite a fair bit for video processing or video classification in this that particular example that I was sort of looking at previously uh we are then going to be using an lstm so this is going to give us 
our current neural network eventually I want to convert this over to a Transformer neural network so that we've sort of moving over to state of the art we're using a dense layer Dropout layer a bi-directional layer to be able to go and convert or pass through our temporal component when we're using our lstm we are also using I think we need to clean this up but we've got maxpool 3D activation reshape spatial Dropout batch normalization time distributed and flattened I think I don't actually use all of those there might be leftovers from me prototyping this but we'll take a look we've then got our Optimizer so from tensorflow.cars.optimizes import atom and then we've also got our callback so from tensorflow.carastore callbacks import our model checkpoint so this is going to allow us to save down our model every X number of epochs I think we're doing it every single Epoch and our learning rate scheduler so ideally we don't we want to sort of start out fast and then slow down as we get to our optimization point or the minimum value of loss that we could potentially get to okay then we've got our neural network so this is a couple sets of convolutions we're then flattening it out using a Time distributed layer we've got a two sets of lstms and then we're using a dense layer to drop this out let me walk you through this so first up we're instantiating our model by running model dot sequential or model equal sequential we're then passing through a convolution with an relu activation with a Max pooling layer I could actually condense this down by just passing through activation over here that's perfectly okay this was again prototyping in its process so then I've gone and read a model dot add conf3d so we're going to have 128 com 3D convolution kernels these are going to be three by three by three in size our input shape is going to be 75 by 46 by 140 and that is the representation that we've got from our data so remember data dot as numpy iterator got next and if I grab the first value or the first value we should be able to go let's grab zero dot shape you can say it is 75 by 46 by 140x4 we're passing through that exact same shape into our neural network we're specifying padding equals same so that we preserve the shape of our inputs then we're using our relu activation to give us some non-linearity to our neural network and we are condensing this down using a 3D Max pooling layer so this is effectively going to take the max values inside of each one of our frames and it's going to condense it down between a two by two square so it's going to take the max value to be able to halve the shape of our inputs then we're doing pretty much the same three times so except the only difference is that we're then going to have 256 3D comms layers or three 3D con units and then 75 3D con units and then we've got this time distributed layer over here so this is effectively going to allow us to have 75 inputs into our lstm neural network so that we eventually will output 75 units which represents our text-based characters we've then got two lstm layers of 128 units we've got a specific form of Kernel initialization I actually found a great repo which shows the pure lipnet model and they were using orthogonal kernel initialization we also are going to be returning sequences so that our lstm layer does not just return a single unit returns all 75. 
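Putting the layers described here, plus the dropout layers and the dense softmax head covered just below, into one place, the Sequential model looks roughly like this. It's a sketch rather than the repo code verbatim: the 75x46x140x1 input shape and the 41-way output (the 40-character vocabulary plus the CTC blank token) come from the shapes discussed earlier.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPool3D, Activation,
                                     TimeDistributed, Flatten,
                                     Bidirectional, LSTM, Dropout, Dense)

model = Sequential([
    # 3D convolutions over (frames, height, width, channels)
    Conv3D(128, 3, input_shape=(75, 46, 140, 1), padding='same'),
    Activation('relu'),
    MaxPool3D((1, 2, 2)),   # halve the spatial dims, keep all 75 frames

    Conv3D(256, 3, padding='same'),
    Activation('relu'),
    MaxPool3D((1, 2, 2)),

    Conv3D(75, 3, padding='same'),
    Activation('relu'),
    MaxPool3D((1, 2, 2)),

    # Preserve the 75 time steps, flatten the spatial dims for the RNN.
    TimeDistributed(Flatten()),

    Bidirectional(LSTM(128, kernel_initializer='Orthogonal', return_sequences=True)),
    Dropout(0.5),
    Bidirectional(LSTM(128, kernel_initializer='Orthogonal', return_sequences=True)),
    Dropout(0.5),

    # One softmax distribution over the vocabulary (+1 blank token) per frame.
    Dense(41, activation='softmax'),
])

model.summary()
```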
we're also specifying bi-directional so we're passing our state from left to right and right to left because it is likely to impact how we actually go and translate this out and it's I believe it's best practice and what was originally done in the paper actually let me show it to you in the paper so if we actually scroll up so they've got there you go so they're using a group as opposed to an lstm so they've got a spatial convolutional neural network they've got a bi-directional Groove we've got a bi-directional lstm and again they're using CTC loss as well then we've got Dropout after each one of our lstm layers so we've got a little bit of regularization and we're dropping out 50 of the units so we've got to drop out after our lstm layer and we've got to drop up out of our other lstm layout and then we've got a dense layer which is going to take in I believe it's 46 units so just taking our vocabulary size plus one to be able to handle our special token the vocabulary size so you can see it's 40 so it'll be 41 outputs so this means that we're going to have our output should be let's take a look so it's going to be 75 by 45. this represents we're going to get one output per frame that we pass through and 45 is going oh 40 actually it's 41 not 45 and 41 is going to represent a one hot encoded representation of our final output now we are using a soft Max Activation so we're going to be able to use an ARG max value to be able to go and return the most likely value we're also going to be doing a little bit of sampling and I think we're using a greedy algorithm later on so that we get the most likely character returned back okay that is our deep neural network so let's go and create that have we not we haven't gone and imported this so let's import this let's go and create our neural network so that is instantiating right now beautiful and then what we can go ahead and do is run the summary so this shows you a little in a little bit better detail what we're actually building up let me zoom out so we've got our convolutional layers we've got our activation and another Max pooling layer again com 3D activation Max pooling con 3D activation Max pooling we've got our time distributed layer right so if you take a look at the last output that we're getting from our conv layers it's going to have the shape of 75 effectively think of this as 75 time steps by 5 by 17 by 75 so that is the last set of output now what we want to do is sort of preserve that temporal component so we keep it at 75 and then we're flattening this down so we've got 6375 here so this is just let me add another cell so this is just these sets of values flattened so it's 5 by 17 by what is it 75 boom 6375 which is what you've got there 6375 then these values are then passed through to our lstm layers which have got 256 units actually they've got 128 units but it's doubled because we're bi-directional so we've effectively got two sets of lstm layers there and then we're then passing that through to a dense layer we've also got our Dropout over here and our drop out over here and then we're passing it through to our dense layer which is going to Output as I said 75 frames by 41 outputs which are one hot encoder representations of our characters total parameters are 8.4 million so it's a it's live but it's definitely no chat GPT large in in that particular respect so that is our deep neural network now created so again you can step through this you can tweak it if you want if you come up with a better architecture by all means do 
let me know I actually saw on papers with code um code they are grid Corpus that somebody might have already built it using attention I haven't taken a look at this but you can see they've got a CTC attention model which would be really really interesting to take a look at but uh if you wanted to go and dig into that by all means do take a look there's also this model here which is on GitHub this was the official model so it is a brilliant example of how to go in ahead and build this up it is a little bit more hardcore to step through and it the architecture is a little bit different to mine but if you actually go into this particular GitHub repository brilliant implementation of lipnet mine just works through it doesn't use as much data and is a little bit more straightforward to walk through okay that is the model summary I'll include that link in the description below by the way all right cool so let's go and test this out so this is gonna suck at the moment but I like to always do this when I'm prototyping a neural network pass through some inputs just to see what we're outputting so right now if I go and grab our model and use the dot predict method we can pass through our validation or our original sample data which we stored as Val and if we go and pass through Val zero we should get some predictions back might take a little bit of time because we're first initializing it so it'll be loading into GPU memory now let's give it a sec perfect we now have a prediction so if we actually go and take a look at the result of that prediction you can see it's just returning random gibberish right now so that is actually what our model is predicting so three exclamation mark exclamation mark KKK bunch of exclamation marks a bunch of these bunch of exclamation marks a bunch of K's so nothing crazy there so we are just using exactly the same as what we did previously so tf.strings dot reduce join we're then passing through or we're using a greedy algorithm and just grabbing the maximum prediction return back if I show you the raw prediction what we're getting back so if I just grab one example so what we're getting back is a set of let's take a look at the shape so we're getting 75 outputs each one of these represented as an array with 41 values which is just a one hot encoder representation of our vocabulary so if I went and run TF dot ARG Max and said axis equals one this is returning back what our model is actually predicted so a bunch of characters so right now we're returning uh the second prediction there this is what it actually looks like right so a whole bunch of characters return back if we went and ran this through our num to character pipeline so four x in that and then if we were going uh what is it num to char plus through X you can say that we've got all of our characters there and if we run TF dot strings dot reduce join come on buddy boom boom boom right you can say that those are our predictions this is exactly the same as this almost identical right um slight difference in that I'm using ARG Max over here rather than over here but that is effectively showing us what our model is currently predicting this sucks right now we're going to make it way better okay we can also take a look at our model input shape and we can also take a look at our model output shape which we've already sort of given or extracted when we when nrenmodel.summary but that is our deep neural network now defined so if I scroll on back up what we've got to Define is we've gone and imported all of our core 
dependencies for our deep neural network we've then gone and defined our neural network over here which is giving us that model which has a 8.5 million parameters and we're able to go and pass through our frames to get a set of predictions back out which right now doesn't look so great but once but keep in mind we haven't actually gone and trained this so once we train it we're going to get much better predictions alrighty that is our deep neural network now defined let's jump back on over to our client we're on the home stretch just needed to find our loss function and a callback so we can see how the model is progressing nice well Chop Chop then get coding let's do it so we're now pretty close to training our model the first thing that we need to do is to find a learning rate scheduler so this is just basically going to give us a learning rate of whatever we're passing through if we are below 30 epochs if not we're going to drop it down using an exponential function alrighty cool that's now defined the next thing that we're doing is we're defining our CTC loss this particular block of code I'm going to give original credit to this automatic speech recognition model which I believe is defined somewhere a little further down was it from this one I believe it was where's their CTC lost CTC plus yeah it's over here so this basically allows us to use a similar method so they're passing in uh Audio Waves we're going to be passing through videos to be able to go in ahead and do this so what we're doing is we're taking through our batch length we're calculating our input length and our label length and then we're passing it through to tf.keras.backend.ctc batch cost so there isn't a ton of documentation on this which is funnily enough a lot of times you guys ask me Nick should I learn tensorflow or pytorch sometimes where I feel tensorflow falls short is in some of the documentation within some of this nuanced stuff so this is one example where I'd be like I wish I'd gone and learn pytorch but it definitely works very very well regardless of that fact Okay so we've got CTC loss defined there this is going to take in our y True Value our y pred value our input length which is consequently the length of our y prediction value which should be 75 and our label length which should be 40. so this is going to take in our y true predictions which is going to be our alignments this is going to be our one hot encoded predictions this is going to be the value 75 because it's going to be the same shape of the output of our machine learning model and then our label length over here is going to be 40. 
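A sketch of the scheduler and CTC loss just described. The exponential decay factor is an assumption (the video only says the learning rate is dropped exponentially after epoch 30), and the loss follows the Keras automatic speech recognition example that the video credits.

```python
def scheduler(epoch: int, lr: float) -> float:
    # Hold the learning rate for the first 30 epochs, then decay it.
    if epoch < 30:
        return lr
    return lr * tf.math.exp(-0.1)  # decay factor is an assumption

def CTCLoss(y_true, y_pred):
    # Batch size, model output length (75 frames) and label length (40 tokens).
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")

    input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
    label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")

    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
```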
so that is our loss function defined then this is a lot of code but really what we're doing is we're going to be outputting a set of predictions so we're going to Output the original prediction or the original annotation and then the prediction itself in order to do that we're using a special function called tf.keras.backend.ctcd code which is specifically designed to decode the outputs of a CTC trained model which we'll also use to make a prediction down here so this is an example of a callback so I've written class produce example and then to that we are passing through the Keras callbacks function we are going to be subclassing that in order to be able to go and call this callback on every Epoch end so if we go and run that now this should allow us to compile our model so we're then grabbing our model we're running dot compile which is a typical standard python org which is a typical standard Keras graph call in order to compile our models so basically what we're saying is that we're going to be setting our Optimizer to an atom Optimizer with an initial learning rate of 0.0001 we're specifying our loss that's being defined as our CTC loss function which is what is defined over here so if we go and compile this no errors we're looking pretty good okay the next thing the next three things that we're doing uh we're just defining our callbacks so we've got one checkpoint callback which is going to save our model checkpoints so this is originally or we originally imported it right up here so we imported model checkpoint and learning rate scheduler we're now going to Define instances of these so model checkpoint is just defining where we're going to be saving our model so we should probably just create a folder for our model so I'm going to create a new folder call it model models and when our model trains we're going to save our example checkpoints to this particular folder so it's going to be saved inside of models it's going to be called checkpoint we're also going to monitor our loss and we're only going to save our weights which means we'll have to redefine our machine learning model in order to load up these weights we're then creating a scheduler this is effectively going to allow us to drop our learning rate each epoch so let's run our checkpoint callback and a schedule call back and then we're also defining our example callback which is going to make some predictions after each Epoch just to see how well our model is actually training so if we go and run that now all that's left to do is actually going ahead and fit our model so I'm going to bump up this number of epochs to 100 because the final model that I'm going to give you the weights for was as of epoch 96. 
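Roughly how the compile step and the three callbacks described above fit together. The ProduceExample internals (a batch of two samples, greedy ctc_decode over 75-frame outputs) are assumptions based on the shapes discussed earlier, it relies on the num_to_char lookup and test partition defined in previous cells, and the fit call at the end is spelled out just below.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler

class ProduceExample(tf.keras.callbacks.Callback):
    """Print a decoded prediction next to the ground truth after every epoch."""
    def __init__(self, dataset) -> None:
        self.dataset = dataset.as_numpy_iterator()

    def on_epoch_end(self, epoch, logs=None) -> None:
        data = self.dataset.next()
        yhat = self.model.predict(data[0])
        # Greedy CTC decode of both videos in the batch (75 frames each).
        decoded = tf.keras.backend.ctc_decode(yhat, [75, 75], greedy=True)[0][0].numpy()
        for x in range(len(yhat)):
            print('Original:  ', tf.strings.reduce_join(num_to_char(data[1][x])).numpy().decode('utf-8'))
            print('Prediction:', tf.strings.reduce_join(num_to_char(decoded[x])).numpy().decode('utf-8'))
            print('~' * 80)

model.compile(optimizer=Adam(learning_rate=0.0001), loss=CTCLoss)

checkpoint_callback = ModelCheckpoint(os.path.join('models', 'checkpoint'),
                                      monitor='loss', save_weights_only=True)
schedule_callback = LearningRateScheduler(scheduler)
example_callback = ProduceExample(test)  # test partition from the pipeline cell

# The fit call described just below.
model.fit(train, validation_data=test, epochs=100,
          callbacks=[checkpoint_callback, schedule_callback, example_callback])
```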
so the last line is model.fit to that we pass through our data if you wanted to have some validation data you could actually just go and pass through validation data specify validation data here I'm not going to use validation data that is not necessarily best practice but you definitely could if you wanted to so the instances of actually keeping this relatively there's probably going to be in our tutorial regardless but you definitely could go and pass through validation data there if you wanted to so model.fit we're passing through our data we're specifying epochs as a hundred and then we're going and passing through all of our callbacks so we'll specify callbacks that we're passing through our checkpoint callback which is this it's basically just saying that we're going to save our model every Epoch a schedule callback which is going to drop our learning rate after we get to Epoch 30 and our example callback which is going to Output our predictions after each Epoch so you can actually see how well or how terrible our machine learning model is actually making predictions so we're actually not going to run it for the 400 I'm actually going to run it for a couple you'll see that it's actually training and then we're going to be able to load up a couple of checkpoints so let's kick this off all things holding equal we should be able to see our model training also this is being trained on a RTX 2070 super so the speed that you're seeing there is effectively that so you can see it's taking around about four minutes four and a half or five minutes per Epoch so let's give this a coupler ebooks and you'll or let's at least give this to a box and then you'll be able to see what the predictions look like a few minutes later alrighty that is true epochs now done so you can see that this is Epoch one and this is Epoch II now I wasn't happy about the fact that I didn't have a training and a testing data set so I went back to the data Pipeline and I added those steps let me show you what I did there so if we scroll on back up what I did is I added three lines of code to be able to go and create this so first up I said that we don't want to reshuffle after each iteration so I added that to the data.shuffle line and then I added these two lines here so first up we're creating a training Partition by taking the first 450 samples and then our testing partition is going to be anything after that so we're running data.skip to grab everything and assigning that to our testing partition then inside of the model.fit method I've just gone and pass through our train data and I've set validation underscore data to test I couldn't live with myself if I didn't actually go and split these out so I went and did it to show you how to do it so that you effectively have best practice because that's what we're all about here getting that little bit better each and every time all right cool let's actually take a look at what's happened so this is Epoch one here so what you're seeing is first up this is the loss for our training data set down here you've also got the loss actually this is the it's the same roughly the same thing this is the validation loss over here so our training loss is 69.0659 our validation loss is 64.34 so not too bad and not too far off if we scroll on down to Epoch 2 our training loss is a 65.58 and our validation loss is 61.24 our learning rate is still at a one a 0.0001 and you can actually see some predictions here now when I was first developing this I was thinking hold on is this just 
What you'll actually see is that the closer you get to around 50, 60, 70 epochs, the better this begins to perform. In this line you can see the original transcription, 'place blue in B7 soon', and then what it predicted, which is kind of crap. Down here, again, kind of crap, even though this was the original annotation. But give it a chance: once you get to around 50 epochs the performance increases significantly and it actually starts performing very, very well. Which brings me to my next bit: I've made the model checkpoints available so you'll be able to leverage them yourself. That said, you'll now also have some checkpoints stored inside the models folder thanks to the checkpoint callback we've already created, so we'll be able to use those too. For now let's jump back over to our client and then we'll be wrapping this up. It's the final countdown; some might say we're in the end game now. Yep, time to use this model to make some predictions. Let's roll. Alright, we're in the final stages of this, so we are now going to make some predictions with a model that is not so crap. First things first, we're going to download the checkpoints. The checkpoints I've made available on Google Drive are from after 96 epochs, so again we're going to use gdown to download them, and it's going to download a file called checkpoints.zip into our root repository. If I run this it should start downloading; it's around about 90-odd megabytes in size. You can see it's downloading and it's going to extract into our models folder. Boom, that's looking promising: it's now saved into our models folder and it's overridden our existing checkpoints, which kind of sucked anyway, so that's perfectly okay. Then we load the checkpoints we just downloaded into our model: using model.load_weights we load up the checkpoint inside the models folder. Once that's loaded we can grab a section of our data, so let's grab test.as_numpy_iterator() and then grab a sample. This is a little bit slow, which is something I noticed with the TensorFlow data pipelines; I think it's because we're using the skip and take methods, and the skip method does take a little bit of time, but it's perfectly okay, just give it a sec and it'll give you a sample back, and then we'll be able to make a prediction. A few minutes later... alright, we've now got some data back (I just spent a couple of minutes scrolling through Twitter while I waited for that). This is looking pretty promising. If we now go and grab a sample out of that... oh, we've already got one on standby, so these two cells are redundant; we've already got this data. Alright, so this is our sample. We can then pass our sample to the model.predict method. Boom, that has made our prediction, and if we go and decode it now, take a look: these are our predictions, and you can see it's actually written 'lay red with L7 again' and 'lay green with G4 please'. Alright, drum roll please, let's see what the actual text was. Take a look: 'lay red with L7 again', 'lay green with G4 please'.
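Something like the following covers the checkpoint download and that first batch of predictions. The Google Drive file id is a placeholder, so grab the real link from the video description or the GitHub repo, and num_to_char is assumed from earlier in the tutorial.

import gdown
import tensorflow as tf

url = 'https://drive.google.com/uc?id=<CHECKPOINT_FILE_ID>'   # placeholder id
gdown.download(url, 'checkpoints.zip', quiet=False)
gdown.extractall('checkpoints.zip', 'models')                  # unzip into the models folder

model.load_weights('models/checkpoint')

sample = next(test.as_numpy_iterator())                        # (frames, labels) batch
yhat = model.predict(sample[0])
decoded = tf.keras.backend.ctc_decode(yhat, input_length=[75] * yhat.shape[0], greedy=True)[0][0].numpy()

print('REAL TEXT:  ', [tf.strings.reduce_join(num_to_char(s)).numpy().decode() for s in sample[1]])
print('PREDICTIONS:', [tf.strings.reduce_join(num_to_char(s)).numpy().decode() for s in decoded])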
It actually performed pretty well. This is the result of passing those frames through the model, and it's decoding relatively well. Now, if we wanted to do this with another video, we'd be able to load it up using the Streamlit app, and that is what I'll be doing in the Code That episode if you guys want it: comment below and give this video a like so it gets a little bit of airtime. Let's go and make another prediction. If we grab another sample and run this again, those are our predictions and this is the real text; we should really swap these around so the real text comes first and then the predictions, to make it make a bit more sense. So the original text was 'set green by J1 soon', and you can see we're grabbing the annotation out of our sample and over here we're grabbing the decoded set. Alright, let's run another sample. The original text was 'lay red at E3 soon', and the second sample was 'place green by R3 again'. Now if we run it through our decoder, let's see what our model predicts: take a look, 'lay red at T3 soon'. It said T3 soon and our actual text was E3 soon, so not too bad. What about over here? 'Place green by R3 soon' versus 'place green by R3 again'; it's actually performing relatively well. My personal thinking, though, is that this doesn't fully tie it together; I still think we need the app. So again, if you reckon we should build another tutorial for the app, let me know in the comments below and we'll build it up. But this is actually making valid predictions, so if we wanted to we could load up a new video and try to test it out. So let's use load_data and grab a video; this is completely unplanned, so let's see if this works. If I go into s1, grab the path to this particular file, pass it through as a relative path into our data folder, and wrap it in tf.convert_to_tensor... alright, let's see how that goes. So that is our dataset, and this should be the bbaf3s video, so let's go and find that clip. If we go into the data folder (let me put my headphones on so I can hear this)... bbaf3s in s1, so this is the video: 'bin blue at F3 soon'. So this is our data: we're going to have our sample and we're going to have our text. Let's add another section, 'Test on a video', and this is going to be outside of our existing data pipeline. We've now got a sample, and if we just call it sample we should be able to reuse the prediction code without much change. Grab element zero and... okay, what's happened there? That's probably because the model expects a batch, so if we run tf.expand_dims and pass axis equals zero, boom, that should make a prediction. Cool.
Then I'm going to copy this cell, paste it there, and paste it below again. Okay, let's minimize this. What have we got? We've got our sample, which is us custom-loading the specific video we just played, the one that sounds like 'bin blue at F3 soon', so we should expect this to print out 'bin blue at F3 soon'. If we now run this, it should make a set of predictions. If we take a look at yhat, that's our set of predictions, and .shape gives us 75 by 41. Sample one is going to get us our real text, so 'for sentence in sample'... what have we done there? Cannot iterate over a scalar tensor. If we just go and wrap sample[1] it should be fine; it doesn't like it because it isn't wrapped, so if I wrap it inside another set of arrays, okay, cool, that is our real text: 'bin blue at F3 soon'. Let's go and validate that: 'bin blue at F3 soon', beautiful. Then if we go and make our predictions, we need to run the sample through the model first, and I think we need to wrap this with tf.expand_dims... nope, no bueno, what's happening? It expects a vector of size four but the input is a vector of size three; this is only going to be 75 over here; what's yhat returning? A shape of 1 by 75 by 41, that should be okay. Okay, that worked: 'bin blue at J3 soon'. So on our own loaded video, run through this pipeline, what was actually said was 'bin blue at F3 soon' and it predicted 'bin blue at J3 soon'; not too far off. Let's try another one. What about this one, p-r-a-c... let's copy this path, so again we pass the relative path, which shows you can do this on a separate video without having to use the data pipeline. If we run this now, take a look: the video said 'place red at C6 now' and our model predicted 'place red at C6 now'. Oh my gosh, guys, how cool is this? That is bang on: it's able to load up a video, decode it, and use lip reading to actually transcribe it. Let's try another one, 'set blue in A2 please'. Let's copy this name and paste it in there, and let's just quickly play it again; keep in mind our model doesn't get any audio, it's just using that little GIF that I showed you to decode this. If I play it: 'set blue in A2 please', that's what we're expecting. So this is our annotation, 'set blue in A2 please', and the prediction comes back 'set blue in A2 please'. How absolutely amazing is that? Let's go find another one from up here, what about lbid4p? Again, you could try out a whole bunch of these videos. Let's play that video: 'lay blue in D4 please'. So this is the actual text, 'lay blue in D4 please'; we run it through our model (we've already made our predictions here, this cell should really be down there), and it predicts 'lay blue in D4 please'. Guys, how absolutely amazing is that? It's actually making valid predictions.
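Pulling that experiment together, a single-video test outside the data pipeline looks roughly like this. It assumes the load_data() helper and num_to_char lookup from earlier in the tutorial, and the file path is only an example.

import tensorflow as tf

# Load one clip directly: load_data is assumed to return (frames, alignment tokens).
sample = load_data(tf.convert_to_tensor('./data/s1/bbaf3s.mpg'))   # example path

print('REAL TEXT: ', tf.strings.reduce_join(num_to_char(sample[1])).numpy().decode())

# The model expects a batch dimension, so wrap the single clip before predicting.
yhat = model.predict(tf.expand_dims(sample[0], axis=0))
decoded = tf.keras.backend.ctc_decode(yhat, input_length=[75], greedy=True)[0][0].numpy()

print('PREDICTION:', tf.strings.reduce_join(num_to_char(decoded[0])).numpy().decode())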
Oh my gosh, I get ecstatic every time I see this. Alright, let's do another one: bras9a, 'bin red at S9 again'. Let's play that again: 'bin red at S9 again'. I really want to build this app now. Okay, keep in mind it's not using any audio. I run it through our model: 'bin red at S9 again'. How absolutely amazing is that? So that is the LipNet model now built. Hopefully you've enjoyed this tutorial; we've been through an absolute ton of stuff. Just to recap: we started out by installing and importing our dependencies, then built our data loading function, which we eventually tweaked a little so that we'd have a training and testing partition. We created our data pipeline, then built and designed our neural network, which roughly mimics what was originally in the paper with a few tweaks. We then trained it using a custom loss function, a custom callback and a learning rate scheduler, and last but not least we made some predictions, and I tested it out on a standalone video. So you've got the script to test this out on whatever video you want; ideally it's going to perform well on videos similar to what we trained on, but we could definitely go and fine-tune it. Let me know if you'd like that video. For now, thanks so much for tuning in, I'll catch you in the next one. Thanks so much for tuning in, guys, hopefully you've enjoyed this video. If you have, be sure to give it a big thumbs up, hit subscribe and hit that bell; it means the absolute world to me. But it doesn't stop here: we're going to be taking this to the next step if you want it. Should we learn how to replace the videos with videos of ourselves, to be able to lip read videos in general? Maybe we should convert this into a Code That episode and build a standalone app to use this out in the real world. Let me know in the comments below. Thanks so much for tuning in, guys.

3 Feb 2023 · CodeThat!

Building a machine learning model that's able to perform lip reading!
Get notified of the free Python course on the home page at https://www.coursesfromnick.com
Sign up for the Full Stack course here and use YOUTUBE50 to get 50% off: https://www.coursesfromnick.com/bundl...
Hopefully you enjoyed this video.
💼 Find AWESOME ML Jobs: https://www.jobsfromnick.com

Can an AI Learn Lip Reading?



ENTIRE TRANSCRIPT NO TIME  

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. When watching science fiction movies, we often encounter crazy devices and technologies that don’t really exist, or sometimes, ones that are not even possible to make. For instance, reconstructing sound from vibrations would be an excellent example of that, and could make a great novel with the secret service trying to catch dangerous criminals. Except that it has already been done in real life research. I think you can imagine how surprised I was when I saw this paper in 2014 that showcased a result where a camera looks at this bag of chips, and from these tiny-tiny vibrations, it could reconstruct the sounds in the room. Let’s listen. Yes, this indeed sounds like science fiction. But 2014 was a long-long time ago, and since then, we have a selection of powerful learning algorithms, and the question is, what’s the next idea that sounded completely impossible a few years ago, which is now possible? Well, what about looking at silent footage from a speaker and trying to guess what they were saying? Checkmark, that sounds absolutely impossible to me, yet, this new technique is able to produce the entirety of this speech after looking at the video footage of the lip movements. Let’s listen. Wow. So the first question is, of course, what was used as the training data? It used a dataset with lecture videos and chess commentary from 5 speakers, and make no mistake, it takes a ton of data from these speakers, about 20 hours from each, but it uses video that was shot in a natural setting, which is something that we have in abundance on Youtube and other places on the internet. Note that the neural network works on the same speakers it was trained on and was able to learn their gestures and lip movements remarkably well. However, this is not the first work attempting to do this, so let’s see how it compares to the competition. The new one is very close to the true spoken sentence. Let’s look at another one. Note that there are gestures, a reasonable amount of head movement and other factors at play and the algorithm does amazingly well. Potential applications of this could be video conferencing in zones where we have to be silent, giving a voice to people with the inability to speak due to aphonia or other conditions, or, potentially fixing a piece of video footage where parts of the speech signal are corrupted. In these cases, the gaps could be filled in with such a technique. Look! Now, let’s have a look under the hood. If we visualize the activations within this neural network, we see that it found out that it mainly looks at the mouth of the speaker. That is, of course, not surprising. However, what is surprising is that the other regions, for instance, around the forehead and eyebrows are also important to the attention mechanism. Perhaps this could mean that it also looks at the gestures of the speaker, and uses that information for the speech synthesis. I find this aspect of the work very intriguing and would love to see some additional analysis on that. There is so much more in the paper, for instance, I mentioned giving a voice to people with aphonia, which should not be possible because we are training these neural networks for a specific speaker, but with an additional speaker embedding step, it is possible to pair up any speaker with any voice. This is another amazing work that makes me feel like we are living in a science fiction world. 
I can only imagine what we will be able to do with this technique two more papers down the line. If you have any ideas, feel free to speculate in the comments section below. What a time to be alive! Thanks for watching and for your generous support, and I'll see you next time!

LIP READ

An end-to-end machine learning platform

Find solutions to accelerate machine learning tasks at every stage of your workflow.

TensorFlow in 100 Seconds

Hearing aids could read lips through masks

Conceptual illustration of the proposed lip-reading framework. The framework employs Wi-Fi and radar technologies as enablers of RF-sensing-based lip reading. A dataset comprising the vowels A, E, I, O, U and empty (static/closed lips) is collected using both technologies, with a face mask. The collected data is used to train ML and DL models.

Credit: Nature Communications (2022). DOI: 10.1038/s41467-022-32231-1.


A new system capable of reading lips with remarkable accuracy even when speakers are wearing face masks could help create a new generation of hearing aids.

An international team of engineers and computing scientists developed the technology, which pairs radio-frequency sensing with artificial intelligence for the first time to identify lip movements. The system, when integrated with conventional hearing aid technology, could help tackle the "cocktail party effect," a common shortcoming of traditional hearing aids.

Currently, hearing aids assist hearing-impaired people by amplifying all ambient sounds around them, which can be helpful in many aspects of everyday life. However, in noisy situations such as cocktail parties, hearing aids' broad spectrum of amplification can make it difficult for users to focus on specific sounds, like conversation with a particular person.

One potential solution to the cocktail party effect is to make "smart" hearing aids, which combine conventional audio amplification with a second device to collect additional data for improved performance.

While other researchers have had success in using cameras to aid with lip reading, collecting video footage of people without their explicit consent raises concerns for individual privacy. Cameras are also unable to read lips through masks, an everyday challenge for people who wear face coverings for cultural or religious purposes and a broader issue in the age of COVID-19.

In a new paper published in Nature Communications, the University of Glasgow-led team outline how they set out to harness cutting-edge sensing technology to read lips. Their system preserves privacy by collecting only radio-frequency data, with no accompanying video footage.

To develop the system, the researchers asked male and female volunteers to repeat the five vowel sounds (A, E, I, O, and U) first while unmasked and then while wearing a surgical mask.

As the volunteers repeated the vowel sounds, their faces were scanned using radio-frequency signals from both a dedicated radar sensor and a Wi-Fi transmitter. Their faces were also scanned while their lips remained still.

Then, the 3,600 samples of data collected during the scans were used to "teach" machine learning and deep learning algorithms how to recognize the characteristic lip and mouth movements associated with each vowel sound.

Because the radio-frequency signals can easily pass through the volunteers' masks, the algorithms could also learn to read masked users' vowel formation.

The system proved to be capable of correctly reading the volunteers' lips most of the time. Wi-Fi data was correctly interpreted by the learning algorithms up to 95% of the time for unmasked lips, and 80% for masked. Meanwhile, the radar data was interpreted correctly up to 91% of the time without a mask, and 83% of the time with a mask.
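For readers curious what "teaching" the algorithms looks like in practice, the sketch below shows the general shape of such a vowel classifier. The feature size, network and training settings are assumptions for illustration, not the architecture used in the paper.

import numpy as np
import tensorflow as tf

NUM_CLASSES = 6                                   # A, E, I, O, U and "empty" (static/closed lips)
X = np.random.rand(3600, 128).astype('float32')   # placeholder RF feature vectors
y = np.random.randint(0, NUM_CLASSES, size=3600)  # placeholder vowel labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(128,)),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Hold out part of the samples to estimate accuracy, analogous to the reported figures.
model.fit(X, y, validation_split=0.2, epochs=10)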

Dr. Qammer Abbasi, of the University of Glasgow's James Watt School of Engineering, is the paper's lead author. He said, "Around 5% of the world's population—about 430 million people—have some kind of hearing impairment.

"Hearing aids have provided transformative benefits for many hearing-impaired people. A new generation of technology which collects a wide spectrum of data to augment and enhance the amplification of sound could be another major step in improving hearing-impaired people's quality of life.

"With this research, we have shown that radio-frequency signals can be used to accurately read vowel sounds on people's lips, even when their mouths are covered. While the results of lip-reading with radar signals are slightly more accurate, the Wi-Fi signals also demonstrated impressive accuracy.

"Given the ubiquity and affordability of Wi-Fi technologies, the results are highly encouraging which suggests that this technique has value both as a standalone technology and as a component in future multimodal hearing aids."

Source: University of Glasgow

11.09.2022


ALL 5 STAR AI.IO PAGE STUDY


How AI & IoT Are Creating An Impact On Industries Today


Hello and welcome to our new site that shares with you the most powerful web platforms and tools available on the web today


Discover the ultimate collection of 5-star AI.io tools for growing your business in 2022/3. Improve your efficiency and productivity for free, or upgrade to Pro for additional benefits.

Unleash the power of artificial intelligence with our curated selection of platforms and tools. Take your business to new heights in 2022/3 with these game-changing solutions.

Elevate your business with the best AI.io tools available on the web. Get the competitive edge you need to succeed in 2022/3, whether you choose the free options or unlock advanced features with a Pro account.

Looking for cutting-edge web platforms? Look no further! Our curated list of AI.io tools guarantees a 5-star experience, empowering your business to thrive and succeed in 2022/3.

Embrace the future of business growth with our AI-powered web platforms. Rated 5 stars and equipped with advanced features, these tools will drive your success in 2022/3. Explore the possibilities today!

A Guide for AI-Enhancing YOUR Existing Business Application

A guide to improving your existing business application of artificial intelligence


What is Artificial Intelligence and how does it work? What are the 3 types of AI?

The 3 types of AI are:

General AI: AI that can perform all of the intellectual tasks a human can. Currently, no form of AI can think abstractly or develop creative ideas in the same ways as humans.

Narrow AI: Narrow AI commonly includes visual recognition and natural language processing (NLP) technologies. It is a powerful tool for completing routine jobs based on common knowledge, such as playing music on demand via a voice-enabled device.

Broad AI: Broad AI typically relies on exclusive data sets associated with the business in question. It is generally considered the most useful AI category for a business. Business leaders will integrate a broad AI solution with a specific business process where enterprise-specific knowledge is required.

How can artificial intelligence be used in business? AI is providing new ways for humans to engage with machines, transitioning personnel from pure digital experiences to human-like natural interactions. This is called cognitive engagement. AI is augmenting and improving how humans absorb and process information, often in real time. This is called cognitive insights and knowledge management. Beyond process automation, AI is facilitating knowledge-intensive business decisions, mimicking complex human intelligence. This is called cognitive automation.

What are the different artificial intelligence technologies in business? Machine learning, deep learning, robotics, computer vision, cognitive computing, artificial general intelligence, natural language processing, and knowledge reasoning are some of the most common business applications of AI.

What is the difference between artificial intelligence, machine learning, and deep learning? Artificial intelligence (AI) applies advanced analysis and logic-based techniques, including machine learning, to interpret events, support and automate decisions, and take actions. Machine learning is an application of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Deep learning is a subset of machine learning that uses networks capable of learning, unsupervised, from data that is unstructured or unlabeled.

What are the current and future capabilities of artificial intelligence? Current capabilities of AI include personal assistants (Siri, Alexa, Google Home), smart cars (Tesla), behavioral adaptation to improve the emotional intelligence of customer support representatives, machine learning and predictive algorithms that improve the customer's experience, transactional AI like that of Amazon, personalized content recommendations (Netflix), voice control, and learning thermostats. Future capabilities of AI will probably include fully autonomous cars, precision farming, future air traffic control, classrooms with ambient informatics, urban systems, smart cities, and so on.

To know more about the scope of artificial intelligence in your business, please connect with our expert.


Glossary of Terms


Application Programming Interface(API):

An API, or application programming interface, is a set of rules and protocols that allows different software programs to communicate and exchange information with each other. It acts as a kind of intermediary, enabling different programs to interact and work together, even if they are not built using the same programming languages or technologies. APIs provide a way for different software programs to talk to each other and share data, helping to create a more interconnected and seamless user experience.
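As a tiny illustration, calling a REST API from Python might look like this; the URL and parameters are placeholders, not a real service.

import requests

# Ask a (hypothetical) service for data; the API defines the URL and the parameters it accepts.
response = requests.get('https://api.example.com/v1/tools', params={'category': 'ai'})
response.raise_for_status()          # fail loudly if the request was rejected
print(response.json())               # structured data (JSON) returned by the API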

Artificial Intelligence(AI):

The intelligence displayed by machines in performing tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and language understanding. AI is achieved by developing algorithms and systems that can process, analyze, and understand large amounts of data and make decisions based on that data.

Compute Unified Device Architecture(CUDA):

CUDA is a way that computers can work on really hard and big problems by breaking them down into smaller pieces and solving them all at the same time. It helps the computer work faster and better by using special parts inside it called GPUs. It's like when you have lots of friends help you do a puzzle - it goes much faster than if you try to do it all by yourself.

The term "CUDA" is a trademark of NVIDIA Corporation, which developed and popularized the technology.

Data Processing:

The process of preparing raw data for use in a machine learning model, including tasks such as cleaning, transforming, and normalizing the data.
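A small example of what that preparation can look like in Python; the column names and rules here are made up for illustration.

import pandas as pd

df = pd.DataFrame({'age': [25, None, 40, 31],
                   'income': [40000, 52000, None, 61000]})

df = df.dropna()                                                           # cleaning: drop incomplete rows
df['income'] = (df['income'] - df['income'].mean()) / df['income'].std()  # normalizing a column
print(df)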

Deep Learning(DL):

A subfield of machine learning that uses deep neural networks with many layers to learn complex patterns from data.

Feature Engineering:

The process of selecting and creating new features from the raw data that can be used to improve the performance of a machine learning model.

Freemium:

You might see the term "Freemium" used often on this site. It simply means that the specific tool that you're looking at has both free and paid options. Typically there is very minimal, but unlimited, usage of the tool at a free tier with more access and features introduced in paid tiers.

Generative Art:

Generative art is a form of art that is created using a computer program or algorithm to generate visual or audio output. It often involves the use of randomness or mathematical rules to create unique, unpredictable, and sometimes chaotic results.

Generative Pre-trained Transformer(GPT):

GPT stands for Generative Pre-trained Transformer. It is a type of large language model developed by OpenAI.

GitHub:

GitHub is a platform for hosting and collaborating on software projects.


Google Colab:

Google Colab is an online platform that allows users to share and run Python scripts in the cloud.

Graphics Processing Unit(GPU):

A GPU, or graphics processing unit, is a special type of computer chip that is designed to handle the complex calculations needed to display images and video on a computer or other device. It's like the brain of your computer's graphics system, and it's really good at doing lots of math really fast. GPUs are used in many different types of devices, including computers, phones, and gaming consoles. They are especially useful for tasks that require a lot of processing power, like playing video games, rendering 3D graphics, or running machine learning algorithms.

Large Language Model(LLM):

A type of machine learning model that is trained on a very large amount of text data and is able to generate natural-sounding text.

Machine Learning(ML):

A method of teaching computers to learn from data, without being explicitly programmed.

Natural Language Processing(NLP):

A subfield of AI that focuses on teaching machines to understand, process, and generate human language.

Neural Networks:

A type of machine learning algorithm modeled on the structure and function of the brain.

Neural Radiance Fields(NeRF):

Neural Radiance Fields are a type of deep learning model that can be used for a variety of tasks, including image generation, object detection, and segmentation. NeRFs are inspired by the idea of using a neural network to model the radiance of an image, which is a measure of the amount of light that is emitted or reflected by an object.

OpenAI:

OpenAI is a research institute focused on developing and promoting artificial intelligence technologies that are safe, transparent, and beneficial to society.

Overfitting:

A common problem in machine learning, in which the model performs well on the training data but poorly on new, unseen data. It occurs when the model is too complex and has learned too many details from the training data, so it doesn't generalize well.
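One common way to notice and limit overfitting is to watch the validation loss and stop training when it stops improving. Here is an illustrative Keras sketch with placeholder data; the model is deliberately oversized for the toy problem.

import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(10,)),  # large enough to memorize
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stop once validation loss stops improving, and keep the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                              restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[early_stop])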

Prompt:

A prompt is a piece of text that is used to prime a large language model and guide its generation.

Python:

Python is a popular, high-level programming language known for its simplicity, readability, and flexibility (many AI tools use it).

Reinforcement Learning:

A type of machine learning in which the model learns by trial and error, receiving rewards or punishments for its actions and adjusting its behavior accordingly.

Spatial Computing:

Spatial computing is the use of technology to add digital information and experiences to the physical world. This can include things like augmented reality, where digital information is added to what you see in the real world, or virtual reality, where you can fully immerse yourself in a digital environment. It has many different uses, such as in education, entertainment, and design, and can change how we interact with the world and with each other.

Stable Diffusion:

Stable Diffusion generates complex artistic images based on text prompts. It’s an open source image synthesis AI model available to everyone. Stable Diffusion can be installed locally using code found on GitHub or there are several online user interfaces that also leverage Stable Diffusion models.

Supervised Learning:

A type of machine learning in which the training data is labeled and the model is trained to make predictions based on the relationships between the input data and the corresponding labels.

Unsupervised Learning:

A type of machine learning in which the training data is not labeled, and the model is trained to find patterns and relationships in the data on its own.
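The toy example below contrasts the two previous definitions: the same data is used once with labels (supervised) and once without labels (unsupervised). scikit-learn is used here purely for illustration, and the data is synthetic.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)        # labels exist, so this is supervised

clf = LogisticRegression().fit(X, y)           # learns the input-to-label mapping
print(clf.predict(X[:5]))

km = KMeans(n_clusters=2, n_init=10).fit(X)    # no labels: the model finds structure on its own
print(km.labels_[:5])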

Webhook:

A webhook is a way for one computer program to send a message or data to another program over the internet in real-time. It works by sending the message or data to a specific URL, which belongs to the other program. Webhooks are often used to automate processes and make it easier for different programs to communicate and work together. They are a useful tool for developers who want to build custom applications or create integrations between different software systems.
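A minimal webhook receiver, sketched with Flask; the route name and payload shape are assumptions. Another program POSTs JSON to this URL in real time and the code below reacts to it.

from flask import Flask, request

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    event = request.get_json(force=True)   # payload sent by the other program
    print('Received event:', event)
    return '', 204                         # acknowledge receipt

if __name__ == '__main__':
    app.run(port=5000)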



