Lecture 14

For today

1. Turn in Notebook 8
2. Work on your project

Today

1. Logarithmic algorithms
2. Project report style guide
3. Project time

For next time

1. Read Chapter 9 and do the reading quiz
2. Turn in your final report

Log time algorithms

I have asserted that Python's sort algorithm and FFT take n log n time.

And you might know about bisection search, which is available in

1) the bisect module,

2) NumPy as searchsorted, and

3) SciPy, inside scipy.interpolate.interp1d, which relies on a search of a sorted array to locate each query point.

In empiricaldist, I use interp1d to implement Cdf.forward and Cdf.inverse.

Bisection search takes log n time. Here's how it works.
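
As a minimal sketch of the idea (my own illustration, not the code in the bisect module or NumPy), here is a hand-written bisection search over a sorted list. The function name bisection_search and the example list are made up for this demonstration. Each pass through the loop throws away half of the remaining range.

```python
from bisect import bisect_left


def bisection_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items)
    # Invariant: the leftmost position where target could go lies in [low, high].
    while low < high:
        mid = (low + high) // 2
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be to the right of mid
        else:
            high = mid      # target is at mid or to the left of it
    if low < len(sorted_items) and sorted_items[low] == target:
        return low
    return -1


values = [2, 3, 5, 7, 11, 13, 17, 19]
print(bisection_search(values, 11))   # index of 11 in the list
print(bisection_search(values, 4))    # -1, because 4 is not in the list
print(bisect_left(values, 11))        # the standard library finds the same index
```

The input has to be sorted already; on unsorted data the function returns without error but the answer is meaningless.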

So how do we get logarithmic time?
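
One way to see it: an algorithm that cuts the problem in half at every step can only take as many steps as there are halvings, and the number of times you can halve n before reaching 1 is about log2(n). The short sketch below counts the halvings directly and prints them next to math.log2 for comparison. The helper name halving_steps is hypothetical, chosen just for this example.

```python
import math


def halving_steps(n):
    """Count how many times n can be halved (rounding down) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for n in [8, 1024, 1_000_000]:
    # The step count matches log2(n) rounded down.
    print(n, halving_steps(n), math.log2(n))
```

Going from a thousand elements to a million only doubles the number of steps, which is why logarithmic algorithms scale so well.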

Now let's talk about mergesort.

1) Open notebooks/mergesort.ipynb

2) Read through the text and code, and work on the exercises.

Then we'll do some order of growth analysis.
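
For reference during the analysis, here is a compact sketch of mergesort. It is an illustration under my own naming (merge, merge_sort), not the code from the notebook, so treat it as one possible version. The list is halved about log2(n) times, and each level of recursion does an amount of merging work proportional to n, which is where n log n comes from.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take the smaller front element; <= keeps equal items in order.
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these slices is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(items):
    """Return a new sorted list; the input is not modified."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    # Sort each half recursively, then merge the two sorted halves.
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 2, 9, 1, 5, 6]))
```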

Final reports

The goal, audience, content, and format of the final report should be the same as for the draft; see Lecture 11.

The logic of the report should be QMRI (question, methods, results, interpretation), not a chronological narrative of the project.

There should be links from the final report to

1) a static version of your notebook on NBViewer (not GitHub) and

2) a runnable version on Binder (link should go to the notebook, not the top level of the repository).

Let's review the style guide.

Reproducibility

How difficult was it for you to replicate the results from the papers you read?

What issues did you run into?

What could you do to make it easier for someone else to replicate what you did?

Start here: http://lorenabarba.com/gallery/reproducibility-pi-manifesto/

Report feedback from last time

Logical flow

The biggest issue is "logical flow", by which I mean that the paper holds up to the following test:

1) Start with a mental model of someone who knows something about complexity science, but nothing about your work or the paper you are replicating.

2) Read the first sentence. Does it make sense to your imaginary reader? Add this knowledge to the mental model.

3) After the first sentence, what is the primary question the imaginary reader wants answered? Does the second sentence answer it?

4) Repeat steps 2 and 3.

Some consequences of applying this test:

1) The most important unanswered question is almost always "Why?"

2) It is critical to define terms as close as possible to first use. Sometimes it's obvious which terms need definitions, but the sneaky ones are dangerous. What do you mean by "accurate", or "optimal"? When you say two things are qualitatively similar, which qualities are you talking about? Etc.

3) It is often useful to create new vocabulary as you go, which includes notation, like p for a probability or k for a node degree. But it also includes shorthand phrases: if you find yourself saying "the distributions of users' friends' friends'" over and over, you probably need to define a short phrase or abbreviation.

4) R before I. Before you interpret a result, explain it to me in neutral terms. Read me the axes, then describe the lines, then tell me what it means. Bonus points if you have already helped me anticipate what the results might look like.

5) Transitions from one experiment to the next are important. You've answered a question, so you hit a low energy point. How do you muster energy for the next point?

6) Use bulleted lists to make organization visible. If there are three regimes, give each one a bullet, a name, and a description.

Example of R before I:

1) Figure 1 shows results from simulations with 4 values of the parameter beta.

2) Each line shows the number of people infected as a function of time.

3) For all values of beta, the infected population rises quickly, peaks, and then falls to zero.

4) But for larger values of beta, the time to reach peak infection is shorter and the height of the peak is higher.

5) To see this relationship more clearly, we can plot these two metrics -- time until peak and peak infected population -- as a function of beta. Figures 2 and 3 show these results.

6) Figure 2 shows...

Figures

Most axis labels are too small. They should be about the same size as the main text, or slightly smaller.

In a blog post, it is not always necessary to number figures. If you include them in the text flow and refer to them immediately, it can work.

However, in almost every case I saw, unnumbered figures did not work. As a reader, I did not know what I was supposed to be looking at.

So for revisions of Project 1, I am going to strongly recommend numbered figures.

Example:

Figure 2 shows the time until peak infection as a function of the infection rate, beta. As beta increases, the disease spreads more quickly and the time until peak infection gets shorter.
