06 - Strings

Strings (simple explanation to start with)

The C programming language is closer to the machine than a higher-level programming language like Python. In general this means your code can more directly control parts of the machine (memory, tasks, etc.), but it is more difficult for the programmer to write and manage. One large benefit of C over a language like Python is the ability to directly manipulate memory: you decide how much memory you are using and where it is stored. This allows you to be purposefully efficient and lets programs run faster. For example, if you reserve enough memory up front, a C program does not have to spend time re-allocating memory as its data grows, which is what Python does behind the scenes when a list gets too large for the space it currently has. This is just one example of how directly manipulating memory can improve speed and efficiency.

A string in C is actually an array (like a list in Python) of single characters.

We will briefly go into the manipulation of string arrays below, but go more in depth into arrays in a later lesson. At that time we will look at functions that can be used with arrays in general or strings specifically. At the moment we will only look at the very basics of a string in C, which will be: declaring a variable, taking in a string as an input, and printing a string as an output.

Because a string in C is an array (like a list in Python), it has a number of elements. This number of elements is given in square brackets [ ] after the variable name. However, you may leave the brackets empty if you are setting the string equal to a value in quotation marks at the same time; the compiler will count the number of elements for you. Note: if you write the number of elements yourself, include one extra element beyond the number of letters you count. This extra element holds the null character '\0', which signals that the string has ended. For example, the string "abc" actually contains four elements, numbered as follows: element 0 is 'a', element 1 is 'b', element 2 is 'c', and element 3 is the null character signaling the end of the string.

Declaring/Creating a String Variable

Since a string is really an array of characters, we create it like this:

char variable_name[4] = "Sam";

The data type is "char" for character.

Following the variable name are the square brackets containing the number of elements in the array.

Then you can set it equal to a value.

Example #1:

char my_first_string[100]="Hello World!";

// Notice that I can create a large array and use only part of it: "Hello World!" fills the first 13 slots (12 characters plus the null character)

Example #2:

char my_first_string[] = "Hello World!";

// Here I let the program count how many elements are needed

Example #3:

char my_first_string[100];

// Here I create a string and tell it how many slots to reserve in memory, but I do not set the string's value

Example #4: (ERROR!!!)

char my_first_string[];

// Here I create a string but leave its size blank.

// Since there is no value for the computer to count, it does not know how much memory to reserve and gives an error.

/*****************************************************************************************************************************

* Note that the examples #1 and #2 above are nearly equivalent (except I reserve much more memory in the first one)

* However, I could not run both of these in the same program.

* As you have already created a variable named "my_first_string", running both example #1 then example #2 would give you an error.

* If you want to later change the value of a variable, you would leave out the data type.

* Example of this below.

*****************************************************************************************************************************/

Bad example that would have an error (using an integer-type variable):

int number = 5;

int number = 6;

Good example that would correctly change the value of the variable:

int number = 5;

number = 6;

Calling on a String

You use "char" to declare a string because it is really just an array/list of single characters being stored as a group. However, after using "char" to declare the array, from then on it is a string for all intents and purposes. You use %s as the placeholder for inputs (scanf) and outputs (printf).

Example:

char my_first_string[] = "Hello World!";

printf("My first string is: %s \n", my_first_string);

Try it out yourself.

Activity 1

Create/declare five strings (for example: a name, a noun, a place, and two verbs in the past tense). Then print one or two complete sentences using all five of these strings.

Solution: Link

Taking in User Inputted String

As you start to take in user input there are a few things to take into consideration. Two main notes really come to mind. The first is that you cannot declare/create a string of unknown length. You can either declare a size yourself like string[20] or you can have the computer count it like string[]="word". Below is an example of code that will give you an error.

Example (ERROR!)

char my_first_string[]; //Note - This will give you an error as you must declare the size of the array/string/list you are creating.

If you want to take in a user inputted string, you need to create/declare a string that is large enough to hold it. For example, if you are expecting a word to be entered that might be around 4-10 letters long, then create a string of length 20 or 30 so you are sure you have enough space.

Example:

char my_string[20];

printf("Please enter a word.\n");

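scanf("%19s", my_string);

// Note - For primitive data types such as int, "variable" refers to its value and "&variable" refers to its memory location, so scanf needs the '&'.

// A string is an array, and the name of an array already stands for the memory location of its first element.

// As such, the '&' is unnecessary (and "&my_string" technically has the wrong type for %s, even though it often seems to work).

// The 19 in "%19s" keeps scanf from writing more than 19 characters (plus the null character) into the 20 slots of my_string.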

The second important note is that white space interrupts the "scanf" function. For example, if you enter your name as "Happy Gilmore", the %s placeholder stores "Happy" as your string and leaves " Gilmore" waiting in the input buffer. Even if you have "printf" calls before your next "scanf" call, that next "scanf" will not wait for the user; it fills itself with the leftover "Gilmore". Try it with a program that has two scanf calls and a printf prompt between them: enter an input with a space in it and watch it go right past the printf prompt and prefill the second scanf.

Activity 2

Create/declare a string of characters 20 characters long (this is more than you need, which is safe).

Ask the user to input a word (less than 20 characters long and no spaces), and save this word in your string.

Print your string.

Solution: Link

Activity 3

Create/declare three strings of characters 20 characters long.

Ask the user to input three words, one at a time.

Ask for a name, a noun, and a verb in past tense.

Print a sentence using these three user inputted words.

Solution: Link

Taking in String With Spaces

In general, scanf uses spaces to separate user inputs. The easiest way to scan an entire line is to tell the scanf function to keep grabbing characters until it encounters the new-line character ('\n'). This basically means it will grab everything up until the user hits the "enter" key. The line below does this.

scanf("%[^\n]", string1); // Missing a piece

There are two issues with this:

  1. If any scanf came before this one, a new-line character from that earlier entry is usually still sitting in the input buffer, so this line stops at that '\n' and reads nothing at all. Fix this by adding %*c to the end of the previous scanf so it removes the '\n' from the buffer.

  2. The line above also leaves its own new-line character in the input buffer after grabbing your string, so, as mentioned in #1, include %*c at the end to read and discard that character.

A common broken version scans for the entire line but leaves the new-line character '\n' in the input buffer, so the next line-reading scanf stops before it reads anything.

Here is the modified code that reads the new-line character and throws it away. This way, your next scanf command will read up to the new-line character at the end of the next line (rather than stopping at the first character).

scanf("%[^\n]%*c", string1); // By adding %*c at the end of each scanf command, you remove that new-line character from the input buffer


Accessing/Altering the Elements of a String/Array

Just as with a list in Python, you access an element of a string/array by indicating the index number of the item in the square brackets.

Remember three things:

    1. Indexes start at zero

    2. When you access an element of a string, you now have a character data type and will use %c instead of the %s that refers to an entire string.

    3. The last element of a string is the null character '\0', signifying the end of the string

Example:

char my_word[]="funny";

printf("The entire string is %s \n", my_word);

printf("The first letter of the word is %c \n", my_word[0]);

printf("The second letter of the word is %c \n", my_word[1]);


Activity 4

Create/declare a string of at least 10 characters (this is more than you need, which is safe).

Ask the user to input a five-letter word and save this word in your string.

On separate lines, print each letter of the five-letter word using complete sentences.

Example: The first letter of the word is: 'w'

Solution: Link

Using the String Header Functions (#include <string.h>)

Since a string is really an array, you cannot use the basic operators on a whole string the way you can with primitive data types. You can, however, set a single character of a string to a new value with the '=' operator, like this:

char string[]="happy"; // Set the string to be "happy"

string[1] = 'i'; // Now the string is "hippy"

This works because a single element of a string is a character, a primitive data type that can be set using the '=' operator. A whole string, however, is an array of values, so the '=' operator does not work on it. The command below would cause an error:

char string[]="happy";

string = "hippy"; // ERROR!! - Can't set all of the values of an array in this way.

As such, working with strings is a little bit harder than working with primitive data types. Wonderful programmers who came before you have built a library of useful functions and shared it with the world. The string header <string.h> has many of these functions in it, such as strcpy (copy a string), strcat (add one string to the end of another), strlen (count the characters in a string), and strcmp (compare two strings).

Common Mistakes

    1. Mixing up data types (%c instead of %s). Once you declare/create a string as an array of characters ( char variable[20] ), you refer to the variable as a string from then on. The "variable" is not a single character; you created a string (an array of characters). If you use %c instead of %s, you will get a nonsense character such as a question-mark box or a random letter/symbol that does not seem correct. This happens because printf receives the memory address where the string starts but is told to treat that number as a single character, so it prints whatever symbol that number happens to match, or a question mark if it matches none. (Strictly speaking, passing the wrong type to printf is undefined behavior in C, so the exact result can vary.)

    2. Declaring a string with a blank size, causing an error ( char variable[]; ). You cannot be ambiguous when programming. The compiler must know exactly how much memory a string needs. You can either set the size yourself ( char variable[20]; ) or tell the compiler to count how many slots are needed when you give the string a starting value ( char variable[] = "happy"; ). If you leave the size blank and do not set a value, you are essentially saying "Hey program, reserve some amount of memory for me." It asks "How much memory?" and you say "IDK." The compiler responds with red error messages, and the program will not run.

    3. Forgetting that, when scanning a string, white space ends the string. When using the scanf function with %s, spaces separate one string input from the next. If you ask for someone's name and scan for a string (%s), it will only read their first name. If the user enters two names such as Brendan Dilloughery, then "Brendan" is saved as the string input and " Dilloughery\n" is still in the input buffer. This causes issues because the next time you call the scanf function, it will take this leftover input rather than waiting for the user to enter an answer. This is discussed in more detail above.

    4. Trying to change the entire value of a string with the '=' operator. Once you create a string ( char name[] = "Sam" ), it is an array. Arrays in C are not as easy to deal with as lists in Python. You cannot use simple operators such as "=" or "+" to set or join whole strings, and "==" compiles but compares the memory locations of two arrays rather than their letters. These operations take many steps to do by hand, or you can use common functions that your predecessors created, such as strcpy, which copies each character of one string into the slots of another string. The strcpy function is how you will set/reset the value of a string. For example, if you create a string ( char name[100] = "Sam" ) and want to change it to something new, call the strcpy function with the string's variable name as the first input and the new value as the second input ( strcpy(name,"Joey") ).

    5. Declaring a small string and then using strcpy to put a longer string in its place. If you create a string with room for only three letters ( char name[] = "Sam" reserves four slots: three letters plus the null character ) and then use the strcpy function to set this variable to a longer string, there are problems. strcpy does not check the size of the array; it keeps writing past the reserved slots into memory that the program may be using for something else (this is called a buffer overflow, and its behavior is undefined in C). It may seem to work, but other parts of the program can write over those extra memory slots at any time, so though your string may look correct one moment, its second half may be altered or erased later, and the program may even crash. Because the outcome is so unpredictable, it is hard to give a concrete example that always produces an error. To avoid this issue, give the strings you create a generous amount of memory, for example ( char name[100] = "Sam" ). This allows you to modify/alter strings later on without worrying about running out of room.