Unlike ray tracing, which uses only the World Coordinate System, the Model Coordinate System (for hierarchical models), and the Screen Coordinate System, rasterization uses a few more.
A model defined in the Model Coordinate System is transformed in the following order:
1. World Coordinate System Transformation
2. View Coordinate System Transformation - changes the coordinates to be relative to the camera (eye)
3. Projection Coordinate System Transformation
4. Normalized Device/Screen Coordinate System Transformation
5. Device/Screen Coordinate System Transformation
As programmers, we explicitly specify 1, 2, and 3. OpenGL implicitly performs 4 and 5.
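As a minimal sketch of steps 1-3 (assuming the GLM math library; the matrix values, field of view, and aspect ratio here are just placeholders):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 buildMVP()
{
    // 1. World (model) transform: place the model in the world.
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -5.0f));

    // 2. View transform: express world coordinates relative to the camera.
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 0.0f),   // eye
                                 glm::vec3(0.0f, 0.0f, -1.0f),  // look-at point
                                 glm::vec3(0.0f, 1.0f, 0.0f));  // up

    // 3. Projection transform: map the view frustum to clip space.
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), // vertical FOV
                                            16.0f / 9.0f,        // aspect ratio
                                            0.1f, 100.0f);       // near/far planes

    // Steps 4 and 5 (NDC and viewport mapping) are performed by OpenGL
    // after the vertex shader outputs clip-space positions.
    return projection * view * model;
}
```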
Q1: How do you allow the user to perform direct manipulation, such as dragging the model shown on screen so that it follows the mouse?
A1: You want to transform the screen-space mouse delta back to the World Coordinate System.
Otherwise, the model will move too slowly compared to the mouse if it is farther away than the screen plane, or too fast if it is closer than the screen plane.
One way to perform this transformation without knowing the matrices for 4 and 5 is to use similar triangles to find what the mouse movement on the screen plane (say P6 -> P3) corresponds to at the center of the sphere (P5):
movement at the center of the sphere = (Eye->P5)/(Eye->P6) * (P6 -> P3)
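A rough sketch of this scaling (assuming GLM; the names distEyeToScreen and distEyeToCenter stand in for Eye->P6 and Eye->P5, and the delta is assumed to already be expressed on the screen plane in view-space units rather than pixels):

```cpp
#include <glm/glm.hpp>

// Scale a mouse delta measured on the screen plane up to the depth of the
// sphere's center, using the similar-triangles ratio above.
glm::vec3 screenDeltaToObjectDelta(const glm::vec2& deltaOnScreenPlane, // P6 -> P3
                                   float distEyeToScreen,               // Eye -> P6
                                   float distEyeToCenter)               // Eye -> P5
{
    float scale = distEyeToCenter / distEyeToScreen;
    glm::vec2 delta = scale * deltaOnScreenPlane; // movement at the sphere's depth
    // The drag stays in the plane facing the camera, so depth does not change.
    // This delta is in view space; rotate it by the inverse view rotation if the
    // model's translation is stored in world space.
    return glm::vec3(delta, 0.0f);
}
```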
For 3D picking, it is important to realize that a mouse coordinate lying between P3 and P4 on the screen corresponds to a point lying between P1 and P2 on the sphere, assuming it also lies within the other cross-section view.
The pseudocode:
smaller x = (Eye->P5)/(Eye->P6) * X
Then apply the inverse transform to smaller x, so that it can be tested against the sphere centered at (0, 0, 0) with radius 1.
Finally, check whether the transformed point lies within the sphere. If it does, save its distance to the eye. Do this for all objects and pick the object with the smallest distance to the eye.
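A sketch of that picking loop (assuming GLM and a hypothetical Pickable struct whose modelView matrix maps a unit sphere at the origin onto the object's bounding sphere in view space; none of these names come from the engine):

```cpp
#include <glm/glm.hpp>
#include <limits>
#include <vector>
#include <cstddef>

struct Pickable {
    glm::mat4 modelView;   // maps the unit sphere at the origin to the object
    glm::vec3 centerView;  // sphere center (P5) in view space
};

// mouseOnScreenPlane is the mouse position on the screen plane in view space,
// where the eye sits at the origin. Returns the index of the closest picked
// object, or -1 if the mouse hits nothing.
int pick(const std::vector<Pickable>& objects, const glm::vec3& mouseOnScreenPlane)
{
    const float eyeToScreen = glm::length(mouseOnScreenPlane);   // Eye -> P6
    int best = -1;
    float bestDistance = std::numeric_limits<float>::max();

    for (std::size_t i = 0; i < objects.size(); ++i) {
        // "smaller x": scale the mouse point to the depth of this sphere's center.
        float eyeToCenter = glm::length(objects[i].centerView);  // Eye -> P5
        glm::vec3 scaled = (eyeToCenter / eyeToScreen) * mouseOnScreenPlane;

        // Apply the inverse transform so the test is against a unit sphere at the origin.
        glm::vec4 local = glm::inverse(objects[i].modelView) * glm::vec4(scaled, 1.0f);

        if (glm::length(glm::vec3(local)) <= 1.0f) {   // point lies within the sphere
            if (eyeToCenter < bestDistance) {          // keep the object closest to the eye
                bestDistance = eyeToCenter;
                best = static_cast<int>(i);
            }
        }
    }
    return best;
}
```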
Here is a video of 3D picking implemented in my rasterization engine.