The idea is to render the scene at least twice per frame: once from each light's perspective, and finally from the camera's perspective. My demo has a single light source, so there are two passes in total. I save the result of the first pass into a framebuffer with a depth attachment (a texture) and a stencil attachment (a renderbuffer); the stencil attachment is there to conform with the stencil UI option.
In the first pass, the depth of each fragment as seen from the light's perspective is stored in the texture. Then, in the second pass, if the depth of the current fragment transformed into light space is greater than the depth stored in the texture, a nearer surface blocks the light, so the fragment is in shadow.
Initialization:
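As a minimal sketch of that setup (the names depthMapFBO, depthMap, stencilRBO, SHADOW_W, and SHADOW_H are placeholders, not taken from the demo; a current GL context and a loader such as glad are assumed):

GLuint depthMapFBO, depthMap, stencilRBO;
const GLsizei SHADOW_W = 1024, SHADOW_H = 1024;

glGenFramebuffers(1, &depthMapFBO);
glBindFramebuffer(GL_FRAMEBUFFER, depthMapFBO);

// depth attachment: a texture, because the 2nd pass samples it
glGenTextures(1, &depthMap);
glBindTexture(GL_TEXTURE_2D, depthMap);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, SHADOW_W, SHADOW_H, 0, GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthMap, 0);

// stencil attachment: a renderbuffer, because it is never sampled
glGenRenderbuffers(1, &stencilRBO);
glBindRenderbuffer(GL_RENDERBUFFER, stencilRBO);
glRenderbufferStorage(GL_RENDERBUFFER, GL_STENCIL_INDEX8, SHADOW_W, SHADOW_H);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_STENCIL_ATTACHMENT, GL_RENDERBUFFER, stencilRBO);

// no colour buffer is needed for a depth-only pass
glDrawBuffer(GL_NONE);
glReadBuffer(GL_NONE);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // some drivers reject separate depth/stencil images; a combined
    // GL_DEPTH24_STENCIL8 attachment is the more portable choice
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);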
1st pass:
shader.vert:
gl_Position = lightSpaceMatrix * Position; // Position is the vertex position attribute; lightSpaceMatrix brings it into the light's view (clip) space
Nothing to note in shader.frag, because we only care about the depth value being written to the texture attached to the framebuffer.
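Host-side, the first pass might look like this (a sketch assuming GLM for the matrix math; lightPos, depthProgram, and the orthographic frustum extents are placeholders):

// assumes #include <glm/glm.hpp> and <glm/gtc/matrix_transform.hpp>
glm::mat4 lightProj = glm::ortho(-10.0f, 10.0f, -10.0f, 10.0f, 1.0f, 25.0f); // fit to the scene
glm::mat4 lightView = glm::lookAt(lightPos, glm::vec3(0.0f), glm::vec3(0.0f, 1.0f, 0.0f));
glm::mat4 lightSpace = lightProj * lightView;

glViewport(0, 0, SHADOW_W, SHADOW_H);
glBindFramebuffer(GL_FRAMEBUFFER, depthMapFBO);
glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
glUseProgram(depthProgram); // the 1st-pass shader pair
glUniformMatrix4fv(glGetUniformLocation(depthProgram, "lightSpaceMatrix"), 1, GL_FALSE, &lightSpace[0][0]);
// ...draw all scene geometry...
glBindFramebuffer(GL_FRAMEBUFFER, 0);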
2nd pass:
shader.vert:
vec4 lightviewGlPos = lightSpaceMatrix * Position; // bring the vertex position into light space
...// also pass glPos, lightviewGlPos, the viewer direction, the light direction, and the normal to the fragment shader, all in the world coordinate system
shader.frag:
out vec4 finalColour;
vec3 projCoords = lightviewGlPos.xyz / lightviewGlPos.w; // manual perspective divide into NDC
projCoords = projCoords * 0.5 + 0.5; // remap [-1,1] to [0,1] to match how depth is stored in the texture
if (projCoords.z > texture(depthMap, projCoords.xy).x) // a nearer surface occludes this fragment
    finalColour = vec4(0.0, 0.0, 0.0, 1.0); // in shadow (a small depth bias here helps avoid shadow acne)
else
    finalColour = vec4((diffuse + specular).xyz, 1.0);
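On the host side, the second pass is an ordinary render to the default framebuffer with the depth texture bound for sampling (texture unit 0, sceneProgram, and the window-size variables are my assumptions):

glViewport(0, 0, windowWidth, windowHeight);
glBindFramebuffer(GL_FRAMEBUFFER, 0); // back to the default framebuffer
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glUseProgram(sceneProgram); // the 2nd-pass shader pair
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, depthMap);
glUniform1i(glGetUniformLocation(sceneProgram, "depthMap"), 0);
// ...upload lightSpaceMatrix, camera matrices, light/viewer uniforms, then draw the scene...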
It's interesting that OpenGL does the perspective divide behind the scenes for built-in (implicit) fragment-shader inputs like gl_FragCoord, but not for user-defined inputs, which is why we divide lightviewGlPos by its own w manually. User-defined fragment-shader inputs are perspective-correct interpolated by default (the implicit smooth interpolation qualifier); this can be changed by switching the qualifier to flat, noperspective, etc.