Introduction
Welcome to Jiangang's Python for the impatient. It is intended to give you a cookbook-style introduction to reading and writing astronomical data and to programming in Python. It is the result of several years of painful experience of my own. I hope it helps people get into the science quickly instead of being daunted by the "technical details".
Python is becoming popular, mainly because it is free and open source. By my measure, it started to mature around late 2009. There are many existing Python tutorials. However, most of them require you to spend a considerable amount of time before you get a big picture of the programming language. If you already have some programming experience, you may want to speed this up, i.e., get the big picture of the language first and then look at details as needed. This tutorial is written for that purpose.
Python is an interpreted (some call it interactive) language, meaning that you do not need to compile your code to a binary before using it. Python provides a working environment in which you can call functions interactively. Alternatively, you can write scripts in Python and make them executable in your operating system. The Python environment comes with some basic functions, but most of the functions you will use come from well-established, open-source packages. Do not panic: these packages are pretty stable these days.
Getting started
There are two version series of Python at this point: Python 2.x and Python 3.x. The 3.x series is NOT backward compatible with the 2.x series, though the two are similar in most respects. The 2.x series is the legacy version, and support/maintenance for it is scheduled to end around 2020. The 3.x series is a redesign on top of 2.x and is advocated as the future of Python. The catch is that many existing applications and packages are written for 2.x. So it is really up to you to decide.
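If you are unsure which series a given interpreter belongs to, you can ask it directly. The following is a small sketch using only the standard library; the variable names are illustrative.

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, ...)
major, minor = sys.version_info[0], sys.version_info[1]
series = "%d.x" % major

print("Running Python %d.%d (the %s series)" % (major, minor, series))
```

Running this at the top of a session is a cheap way to avoid being surprised by syntax differences between the two series.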
As I said, Python depends on many open-source packages developed by the community. I list some of the major ones in the following table. You can find more details by googling the package names.
Now, the first practical question is: what should I do to get started? Nowadays the Python community has made it easy to start using Python via the Anaconda distribution, which bundles most of the widely used packages mentioned above. You can download Anaconda from the following link:
https://www.continuum.io/downloads
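Once the installation finishes, a quick sanity check is to try importing a few packages. This is only a sketch: the helper name check_packages and the particular package list are illustrative choices, not something Anaconda provides.

```python
import importlib

def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

# Illustrative list; edit it to match the packages you care about.
status = check_packages(["numpy", "scipy", "matplotlib", "seaborn"])
for name in sorted(status):
    print("%-12s %s" % (name, "ok" if status[name] else "MISSING"))
```

Any package reported as MISSING can usually be added afterwards with the package manager that ships with your distribution.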
The most convenient interface for interactive data analysis in Python is the Jupyter Notebook. It provides a web interface for doing analysis in Python, and it can be deployed on your local computer or on cloud servers. Here is a free one for you to try out:
Python basics
0. Python script/module
A Python script is a text file with the extension .py. In the text file, you put Python code line by line. There are two typical things people put into a script file: the first is a collection of user-defined functions, and the second is a collection of statements to be executed. The following is an example Python script, named my_test.py:
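import numpy as np
import seaborn as sbn
import matplotlib.pyplot as plt

def generate_random(n):
    # Draw n samples from the standard normal distribution.
    x = np.random.randn(n)
    return x

if __name__ == "__main__":
    x = generate_random(100)
    y = generate_random(200)
    print(x)
    print(y)
    sbn.distplot(x)
    # Save through matplotlib directly; newer seaborn versions no longer expose sbn.plt.
    plt.savefig('test.png')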
A typical way to run the script from the command line (terminal) is as follows:
python my_test.py
1. Import libraries
At the beginning of a script, you need to tell Python which packages you are going to use. The names np and sbn are aliases that you choose arbitrarily to refer to the packages in your current session; you can pick whatever names you want. But numpy and seaborn cannot be changed, as they are the names of the packages themselves.
import numpy as np
import seaborn as sbn
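The same mechanism can be tried with a module from the standard library, so that it runs even before numpy or seaborn is installed. In this sketch the aliases m and whatever are arbitrary names chosen for illustration.

```python
import math
import math as m          # "m" is an alias chosen by us
import math as whatever   # any valid identifier works as an alias

# All three names refer to the same module object.
print(m is math, whatever is math)
print(m.sqrt(16.0))
```

The alias only changes the name you type; the package that gets loaded is determined by the name after the import keyword.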
2. Define functions
In Python, you can define your functions in the same script file as your main code, as in the previous example. You can also define functions in a separate script file and import them when needed. Here is an example of the latter. A file named rand.py contains the following definitions:
import numpy as np
import seaborn as sbn

def generate_random(n):
    x = np.random.randn(n)
    return x

def plot_random(x):
    sbn.distplot(x)
    return 0
There are multiple ways to load the functions defined in rand.py into a Python session:
A. Import the module
>>> import rand
>>> rand.generate_random(100)
B. If you intend to use a function often, you can assign it to a local name:
>>> gen_rand = rand.generate_random
>>> gen_rand(100)
C. Import specific functions from the script
>>> from rand import generate_random, plot_random
>>> generate_random(1000)
D. Import all functions from the script
>>> from rand import *
>>> generate_random(10)
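To try these import styles without installing anything, the sketch below writes a tiny stand-in module to a temporary directory and imports it in ways A, B and C. The module name rand_demo and its use of the standard random module are assumptions made for the demonstration; they are not the rand.py shown above.

```python
import os
import sys
import tempfile

# Write a tiny stand-in module (hypothetical name: rand_demo.py) to a temp dir.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "rand_demo.py"), "w") as f:
    f.write(
        "import random\n"
        "\n"
        "def generate_random(n):\n"
        "    return [random.gauss(0.0, 1.0) for _ in range(n)]\n"
    )

# Make the directory importable, as the current directory normally is.
sys.path.insert(0, module_dir)

# A. Import the module and use the qualified name.
import rand_demo
a = rand_demo.generate_random(5)

# B. Bind the function to a shorter local name.
gen_rand = rand_demo.generate_random
b = gen_rand(3)

# C. Import a specific name from the module.
from rand_demo import generate_random
c = generate_random(2)

print(len(a), len(b), len(c))
```

Style D (from module import *) is left out here on purpose: it works the same way but hides where each name comes from, so styles A to C are usually easier to debug.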
3. Flow Control
A. While
a, b = 0, 1
while b < 1000:
    print(b, end=' ')
    a, b = b, a+b
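The loop relies on tuple assignment: the whole right-hand side is evaluated before either name is rebound, so no temporary variable is needed. The sketch below collects the values in a list instead of printing them, which makes the result easy to inspect; the list name fib is illustrative.

```python
fib = []
a, b = 0, 1
while b < 1000:
    fib.append(b)
    # The right-hand side is evaluated first, then both names are rebound.
    a, b = b, a + b

print(fib)
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
```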
B. If...elif...else
x = int(input("Please enter an integer: "))
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
There can be zero or more elif parts, and the else part is optional. The keyword 'elif' is short for 'else if', and is useful for avoiding excessive indentation. An if ... elif ... elif ... sequence is a substitute for the switch or case statements found in other languages.
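When the branches need to be exercised without typing at a prompt, the same chain can be wrapped in a function. This is a sketch only; the function name classify is hypothetical, and it returns the message instead of printing it.

```python
def classify(x):
    """Return the message that the if/elif/else chain would print for x."""
    if x < 0:
        return 'Negative changed to zero'
    elif x == 0:
        return 'Zero'
    elif x == 1:
        return 'Single'
    else:
        return 'More'

for value in (-5, 0, 1, 42):
    print("%d -> %s" % (value, classify(value)))
```

Only the first branch whose condition is true runs; the remaining elif and else parts are skipped.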
C. For loop
a = ['cat', 'window', 'chair']
for x in a:
    print(x, len(x))
for i, x in enumerate(a):
    print(i, x)
D. Break and continue:
The break statement breaks out of the smallest enclosing for or while loop. The continue statement continues with the next iteration of the loop.
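A short sketch of both statements in one loop follows. The stopping rule (the first multiple of 7) and the list name odds are arbitrary choices for illustration.

```python
# Collect odd numbers, stopping at the first multiple of 7.
odds = []
for n in range(1, 20):
    if n % 7 == 0:
        break        # leave the loop entirely
    if n % 2 == 0:
        continue     # skip the rest of this iteration
    odds.append(n)

print(odds)
# [1, 3, 5]
```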
4. Error handling
Python provides a mechanism to handle errors. If you want to take action when errors of given types occur, you can use
try:
    .........
    .........
except (RuntimeError, TypeError, NameError):
    .........
If you do not know what kind of errors will show up, you can leave out the error types, which catches all errors:
try:
    .........
    .........
except:
    .........
If you only want to handle a given type of error (for example, NameError), name it in the except clause. The optional else block runs only when the try block raised no error:
try:
    .........
    .........
except NameError:
    .........
else:
    .........
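As a concrete instance of the try ... except ... else pattern, the sketch below converts text to a number and falls back to None when the conversion fails. The function name to_float is hypothetical and chosen only for illustration.

```python
def to_float(text):
    """Convert text to a float, or return None if that is not possible."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        # float(None) raises TypeError, float('abc') raises ValueError.
        return None
    else:
        # Runs only when the try block raised no error.
        return value

for item in ('3.5', 'abc', None):
    print(repr(item), '->', to_float(item))
```

Listing the expected error types, rather than using a bare except, keeps unrelated problems (a typo in a variable name, for instance) from being silently swallowed.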
Python tips
1. Turn on the interactive mode: in the IPython interface, if you want to see graphs interactively, you probably need to turn on the interactive mode of matplotlib.pyplot with:
import matplotlib.pyplot as pl
pl.ion()
2. To see the functions of each package (numpy, scipy, pylab, pyfits), you can use the tab key after the dot. For example,
import numpy as np
np.
Type this at the IPython prompt and then press the Tab key. IPython will list all the available functions. If you want to see the details of a given function, type the function name followed by a ? and press Enter. For example:
np.arange?
Then the details of the function will show up on the screen. To quit the help view, type
q
3. The interpolation operator % is used to substitute values into a string. It is usually more convenient than concatenating pieces with ' '+str(***)+' '+str(***).
In [33]: 'float is %f, %f, %f, %d' %(1,2,3,4)
Out[33]: 'float is 1.000000, 2.000000, 3.000000, 4'
In [34]: 'float is %s' %'good'
Out[34]: 'float is good'
In [35]: 'float is %f' %1
Out[35]: 'float is 1.000000'
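The format codes also accept a width and a precision, which is handy when writing aligned columns to a text file. A small sketch, with made-up values:

```python
name, n, mean = 'flux', 3, 2.0 / 3.0

# %s inserts a string, %d an integer, %.3f a float with 3 decimals.
line = '%s: n=%d, mean=%.3f' % (name, n, mean)

# %05d pads with zeros to width 5, %8.2f right-aligns in 8 characters,
# %-6s left-aligns in 6 characters.
padded = '%05d|%8.2f|%-6s|' % (42, 3.14159, 'ab')

print(line)
print(padded)
```

Note that several values must be passed as a tuple on the right-hand side of the % operator.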
4. Executable Python Scripts:
On BSD'ish Unix systems, Python scripts can be made directly executable, like shell scripts, by putting the line
#! /usr/bin/env python
at the beginning of the script and giving the file an executable mode. The "#!" must be the first two characters of the file. On some platforms, this first line must end with a Unix-style line ending ("\n"), not a Mac OS ("\r") or Windows ("\r\n") line ending. Note that the hash, or pound, character, "#", is used to start a comment in Python. The script can be given an executable mode, or permission, using the chmod command:
$ chmod +x myscript.py
By doing this, you can execute your script directly (for example, ./myscript.py) instead of using
python myscript.py
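Putting the pieces together, a minimal executable script might look like the sketch below. The greeting logic and the function name main are placeholders chosen for illustration; only the first line and the __main__ guard matter here.

```python
#!/usr/bin/env python
"""Minimal executable script: greets the name given on the command line."""
import sys

def main(argv):
    # argv[0] is the script name; an optional argv[1] is the name to greet.
    name = argv[1] if len(argv) > 1 else "world"
    return "Hello, %s!" % name

if __name__ == "__main__":
    print(main(sys.argv))
```

Keeping the work inside a function and calling it under the __main__ guard means the same file can still be imported as a module without running anything.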
5. Some important functions:
len(): returns the length of a string (or of any other sequence, such as a list)
range(): generates arithmetic progressions. In Python 3 it returns a lazy range object rather than a list, so wrap it in list() to see the values. Either way the result is not an array; to create an array, use numpy.arange():
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The given end point is never part of the generated sequence; range(10) generates 10 values, exactly the legal indices for items of a sequence of length 10. It is possible to let the range start at another number, or to specify a different increment (even negative; sometimes this is called the 'step'):
>>> list(range(5, 10))
[5, 6, 7, 8, 9]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
>>> list(range(-10, -100, -30))
[-10, -40, -70]
pass: This statement does nothing. It can be used when a statement is required syntactically but the program requires no action. For example:
while True:
    pass  # Busy-wait for keyboard interrupt
dir(): The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings.
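The four items above can be seen working together in the following sketch. It uses the standard math module as the example for dir(); the variable and function names are illustrative.

```python
import math

names = ['cat', 'window', 'chair']

# len() works on strings and on other sequences such as lists.
print(len('window'), len(names))

# range(len(names)) yields exactly the valid indices 0, 1, 2.
indices = list(range(len(names)))

# A negative step counts down; the end point is still excluded.
countdown = list(range(len(names) - 1, -1, -1))

def not_implemented_yet():
    pass  # placeholder body: syntactically required, does nothing

# dir() lists the names a module defines; keep only the public ones.
public = [n for n in dir(math) if not n.startswith('_')]

print(indices, countdown, 'sqrt' in public)
```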
Using R functions from Python
Many R functions are well written, and they can be called from within Python. This is done via the rpy2 package. Here is an example showing how to get this working.
First of all, you need to install the rpy2 package (for example, pip install rpy2). Then start an IPython session, import the relevant module, and activate it.
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
Next, we import importr, the function that loads an R package into the Python/IPython session:
from rpy2.robjects.packages import importr
Now, assign the imported R package to a variable. We will use the 'irr' package as an example.
irr = importr('irr')
What we are interested in is a function, kappa2, in the irr package. The kappa2 function calculates Cohen's kappa statistic. In Python, we can write a simple wrapper function around it to ensure we get the correct output.
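import numpy as np

def kappa_python(label1, label2, weight='square'):
    '''
    Calculate Cohen's kappa for two sets of labels via irr.kappa2.
    '''
    # Stack the two label lists and transpose: one row per subject, one column per rater.
    data = np.array([label1, label2])
    data = data.T
    res = irr.kappa2(data, weight)
    # Pick the kappa value out of the list returned by R.
    return res[4][0]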
Note that the purpose of the wrapper function is to ensure that the input and output data meet the requirements of the kappa2 function in R. Now you can test the function with:
x = [1,2,3,4,5]
y = [2,3,4,5,6]
kappa_python(x,y)