The sounds of a language are its most basic elements: writing is no more than documented sound, grammar exists to organize it, and words exist to attach meaning to it. Quite simply, everything is built on the sounds, or phonology, of a language. The phonotactics are the next layer on top of the phonology, governing how sounds may combine with each other. Only once these two parts of a language are in place can you create vocabulary, grammar, any meaningful writing system, and any other system that interacts with your language.
There are languages that achieve this basic foundation without the use of sounds, though they never achieve it without a phonology. Examples would be sign languages, where the phonemes are not sounds but handshapes and motions, and any programming language, where at the lowest level the code is expressed with only two phonemes, the 1s and 0s of binary. Even these languages are built around their phonemes; in our case, those phonemes are going to be sounds. Sound is simply the best medium for humans to communicate with.
Motion is slow and tiring, and while it may work, you can really only communicate with one person at a time and you must be looking directly at each other. This is a huge obstacle, and it reduces the range of communication to maybe 50 meters at most, and only with absolutely no visual obstacles. It is also much more common for humans to develop problems with eyesight than with hearing.
Scent is clearly a widespread form of communication in the animal kingdom, but for human-level intelligence it simply falls short. A smell cannot be produced or changed quickly, and it lingers and mixes with every other smell around it, so it could never keep up with anything as complex as a sentence.
Touch sucks for all the same reasons motion does, and then some: not only do you have a very limited communication range, but you must also have a fundamental trust in the person you are communicating with, given that they are touching you.
Taste would simply be stupid.
And so we are left with sound. It can travel around and through barriers, it has the longest range, it is not affected by whether you are looking at the other person, humans for the most part keep their hearing their whole lives, and it takes much less effort to move your mouth and exhale than to use any of the other options. We have a winner.