Solutions

Congratulations on finishing the exercises! Let me walk you through the solutions.

(1-1) Among the seven options, I chose "title.basics.tsv" because it looked reshapeable: it has "tconst (string)," an alphanumeric unique identifier for each title, and "genres (string array)," which holds up to three genres per title. In other words, each title has between one and three genres, and those genres form a group of variables nested within each title (the unit of analysis) in a specific order (in this case, the first, second, and third genre tag of the given title).

As you will have learned from the cheat sheets, reshaping from wide to long and from long to wide both require this type of structure.

The left picture below shows the dataset in its original format, and the right pictures show the wide and long versions of the dataset that I am going to create.

(1-2) Among the other six options, I would pick "title.crew.tsv", which contains the director and writer information for all the titles in IMDb. In other words, director(s) (array of nconsts) and writer(s) (array of nconsts) are nested within the given title, which is identified by an alphanumeric unique identifier ("tconst").

(2) The dataset is not ready, for two reasons. A. The reshape command requires a numeric unique identifier, while the original version has an alphanumeric one (a string variable). B. The genres should be coded as multiple variables sharing a stubname. For now, a single "genres" variable holds all the information (e.g. "Adventure, Drama, Fantasy"), and we need three genre variables (e.g. "genres1" "genres2" "genres3") sharing a stubname (e.g. "genres").

To solve these issues, I would first create a numeric id and then split the "genres" variable into three.
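The two cleaning steps could be sketched roughly as follows. This is a minimal sketch rather than the exact cleaning script: it assumes "title.basics.tsv" sits in the current working directory, that missing values are coded as \N, and that quotation marks inside titles should not be treated as field delimiters. The name "genres_all" is a hypothetical placeholder introduced only to free up the stubname.

```stata
* Sketch only: assumes title.basics.tsv is in the current working directory.
* bindquote(nobind) is an assumption about how to handle stray quotes in titles.
import delimited using "title.basics.tsv", delimiters("\t") varnames(1) bindquote(nobind) clear

* Assumption: missing genres are coded as \N; treat them as empty strings.
replace genres = "" if genres == "\N"

* A. Numeric id: one integer per distinct tconst.
egen long id = group(tconst)

* B. Free the stubname, then split the comma-separated list into
*    genres1, genres2, genres3. "genres_all" is a hypothetical name.
rename genres genres_all
split genres_all, parse(",") generate(genres)
```

Renaming before splitting matters because reshape long later creates a variable called "genres"; if the original "genres" variable were still in the dataset, the names would collide.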

Click here to see the data cleaning process and its result.

(3) Restructure the dataset from wide to long and preserve it.

The basic syntax for reshaping from wide to long is:

  • reshape long stubnames, i(unit of analysis) j(order info for the group of variables sharing the stubname, e.g. year)

So the code would be:

  • reshape long genres, i(id) j(genres_123)

Type preserve so that we can return to this point after transforming the data structure again.

  • preserve

Let's check the result.

  • sort id old genres_123 genres
  • list id old genres_123 genres
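To see what the wide-to-long step does without loading the full IMDb file, here is a small self-contained sketch. The two titles and their genres are made-up toy values, not rows from the real dataset.

```stata
* Toy data standing in for the cleaned dataset (values are made up).
clear
input id str12 genres1 str12 genres2 str12 genres3
1 "Adventure" "Drama" "Fantasy"
2 "Comedy"    ""      ""
end

* Wide to long: one row per id-by-position pair.
reshape long genres, i(id) j(genres_123)

* Two titles times three genre slots gives six rows, including blank ones.
assert _N == 6
list, sepby(id)
```

Note that the title with a single genre still gets three rows in the long layout; its second and third rows simply carry an empty "genres" value.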

(4) Restructure the dataset again, from long to wide.

The command is almost the same, except that now we type "wide" instead of "long."

  • reshape wide genres, i(id) j(genres_123)
  • sort id old genres1 genres2 genres3
  • list id old genres1 genres2 genres3

(5) Lastly, let's try restore and see what happens.

  • restore
  • sort id old genres_123 genres
  • list id old genres_123 genres

Restore brings back the long data structure you obtained in (3). (You may also use the preserve command before the initial transformation in (3), in which case restore returns the original wide data.)
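The whole round trip can be tried on toy data as a quick check. This is a sketch under the same assumptions as the exercise (a numeric "id" and three "genres" variables); the values are made up.

```stata
* Toy data (made-up values) in the wide layout.
clear
input id str12 genres1 str12 genres2 str12 genres3
1 "Adventure" "Drama" "Fantasy"
2 "Comedy"    ""      ""
end

* Step (3): go long, then take a snapshot of the long layout.
reshape long genres, i(id) j(genres_123)
preserve

* Step (4): back to wide, one row per title.
reshape wide genres, i(id) j(genres_123)
assert _N == 2

* Step (5): restore returns the snapshot, i.e. the long layout with six rows.
restore
assert _N == 6
```

One thing to keep in mind: as far as I know, Stata restores preserved data automatically when a do-file finishes, so if you run the lines up to preserve as one selection and restore as a separate one, restore may report that there is nothing to restore. Running the block in one go, or typing the commands interactively, avoids this.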