Feature‑based Panoramic Image Stitching
This project implements a full panoramic image‑stitching pipeline similar to the panorama mode found in modern smartphone cameras. The system detects visual features across overlapping images, matches them, estimates geometric transformations, and blends the aligned frames into a seamless wide‑angle panorama.
The goal is to design a modular computer vision workflow capable of transforming raw image sequences into a unified panoramic scene using Python, OpenCV, and NumPy. The pipeline handles feature extraction, correspondence matching, homography estimation, and multi‑image blending to produce a smooth final output.
Figure 1. A source image captured for feature detection, matching, and panorama generation.
Python
OpenCV
NumPy
SciPy
Matplotlib
Jupyter Notebook
Computer Vision
Image Geometry
Feature Matching
The panoramic stitching pipeline follows a structured, multi‑stage computer vision process. Each component contributes to transforming raw, overlapping images into a unified panoramic scene.
1. Corner Detection
Identifies keypoints in each image that are distinctive and repeatable across views.
2. Adaptive Non‑Maximal Suppression (ANMS)
Selects a balanced subset of strong, well‑distributed corners to improve matching quality.
3. Feature Descriptor Generation
Extracts local appearance information around each keypoint to enable reliable correspondence.
4. Feature Matching
Pairs descriptors across images to establish point‑to‑point relationships between frames.
5. RANSAC Homography Estimation
Computes a robust geometric transformation while rejecting outlier matches.
6. Image Warping & Blending
Aligns images into a shared coordinate frame and blends overlapping regions smoothly.
7. Multi‑Image Panorama Stitching
Sequentially merges all frames into a final, seamless panoramic output.
Figure 2. Visual overview of the panoramic stitching pipeline applied to the input image sequence.
The first stage of the pipeline identifies strong, repeatable keypoints across the input images. These keypoints — often called corners — are locations in the image where intensity changes sharply in multiple directions. Corners are ideal for stitching because they are visually distinctive and easy to match across overlapping frames.
Both Harris and Shi‑Tomasi corner detectors were evaluated. Harris scores corners with a combined function of the structure‑tensor eigenvalues, while Shi‑Tomasi thresholds directly on the minimum eigenvalue; for this dataset, Shi‑Tomasi produced more reliable and evenly distributed features.
What Corner Detectors Do:
They find points in the image that are stable, unique, and easy to re‑identify in another photo.
Corners act as anchors for matching images together.
Good corner detection directly improves the accuracy of feature matching and homography estimation later in the pipeline.
Approach:
Convert each image to grayscale
Detect corners using OpenCV’s Shi‑Tomasi method
Visualize the detected interest points before and after ANMS
Figure 3. Detected Shi‑Tomasi corners and the refined keypoint set after Adaptive Non‑Maximal Suppression.
Python:
import cv2

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(
    gray_image,
    maxCorners=300,    # upper bound on the number of corners returned
    qualityLevel=0.1,  # minimum corner response, relative to the strongest corner
    minDistance=5,     # minimum pixel spacing between accepted corners
)
Result:
Figures 4 & 5. Shi‑Tomasi corner detections overlaid on the input image.
Reliable corner detection is the foundation of the entire stitching pipeline. Strong, well‑distributed keypoints lead to better feature descriptors, more accurate matches, and a cleaner final panorama.
After detecting an initial set of corners, many of them tend to cluster in high‑texture regions while leaving large areas of the image under‑represented. Adaptive Non‑Maximal Suppression (ANMS) addresses this by selecting a subset of keypoints that are both strong and spatially well‑distributed.
ANMS works by measuring how “suppressed” each corner is by stronger neighbors. Corners that are strong and far from other strong corners are kept, while redundant ones are removed. This ensures that the final keypoint set covers the entire image rather than concentrating in a few dense regions.
A KD‑Tree structure was used to accelerate nearest‑neighbor searches, significantly improving runtime compared to brute‑force distance checks.
Goal:
Keep the strongest corners while maintaining spatial diversity across the image.
Figures 6 & 7. The refined keypoint set after Adaptive Non‑Maximal Suppression: strong, evenly distributed features across the image.
Evenly distributed keypoints lead to more stable feature matching and more accurate homography estimation. Without ANMS, the stitching pipeline becomes sensitive to local texture patterns and may fail to align images cleanly.
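The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's implementation: it uses brute‑force distance checks for clarity (the project accelerates this search with a KD‑Tree), and the function name `anms` and the robustness factor `c_robust` are assumptions of the sketch.

```python
import numpy as np

def anms(corners, responses, num_keep, c_robust=0.9):
    """Keep the num_keep corners with the largest suppression radius:
    the distance from each corner to the nearest corner whose
    down-weighted response is still stronger."""
    order = np.argsort(-responses)        # strongest first
    pts = corners[order].astype(float)
    resp = responses[order]

    radii = np.full(len(pts), np.inf)     # the strongest corner is never suppressed
    for i in range(1, len(pts)):
        # corners ranked above i that are strong enough to suppress it
        stronger = pts[:i][resp[:i] * c_robust > resp[i]]
        if len(stronger):
            radii[i] = np.linalg.norm(stronger - pts[i], axis=1).min()

    keep = np.argsort(-radii)[:num_keep]  # largest suppression radii first
    return pts[keep]
```

Because selection is driven by the suppression radius rather than raw response, isolated corners in low-texture regions survive alongside the strongest cluster members.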
Once ANMS selects a refined set of keypoints, each point must be encoded in a way that allows it to be matched across different images. This is done by generating a feature descriptor — a compact numerical representation of the local image region around each keypoint.
Feature descriptors capture the texture and intensity patterns near a corner so that the same physical point can be recognized even under changes in lighting, orientation, or slight viewpoint shifts.
Process:
Extract a 40×40 patch centered on each keypoint
Apply Gaussian blur to reduce noise and emphasize structure
Downsample the patch to 8×8
Flatten into a 64-dimensional vector
Normalize for illumination invariance
This process produces descriptors suitable for cross-image matching.
Python:
patch = cv2.resize(blurred_patch, (8, 8))        # downsample the 40x40 patch to 8x8
descriptor = patch.flatten().astype(np.float32)  # 64-dimensional vector
descriptor = (descriptor - descriptor.mean()) / (descriptor.std() + 1e-8)  # illumination-normalize
Figures 8 & 9. Example feature descriptors generated from the refined keypoints, capturing local texture patterns for cross‑image matching.
Robust descriptors are essential for reliable feature matching. If two images share the same physical point, their descriptors should look similar—even if the images differ in brightness or slight perspective. Strong descriptors directly improve match quality, which leads to more accurate homography estimation and a cleaner final panorama.
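The five steps above can be sketched end to end. This is an illustrative version, not the project's code: it substitutes SciPy's `gaussian_filter` for `cv2.GaussianBlur` and block-averaging for `cv2.resize`, the function name `describe` is an assumption, and keypoints too close to the border are assumed to have been discarded earlier.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def describe(image, keypoint, patch_size=40, out_size=8):
    """Build a normalized descriptor for one keypoint.

    image:    2-D grayscale array
    keypoint: (x, y) position, assumed far enough from the border
    """
    x, y = int(keypoint[0]), int(keypoint[1])
    half = patch_size // 2
    patch = image[y - half:y + half, x - half:x + half].astype(np.float32)

    patch = gaussian_filter(patch, sigma=1.0)  # suppress noise before downsampling
    step = patch_size // out_size
    # block-average down to 8x8 (stand-in for cv2.resize)
    small = patch.reshape(out_size, step, out_size, step).mean(axis=(1, 3))

    desc = small.flatten()                               # 64-dimensional vector
    return (desc - desc.mean()) / (desc.std() + 1e-8)    # illumination-normalize
```

The mean/std normalization at the end is what buys illumination invariance: adding a constant brightness offset or scaling the contrast of the patch leaves the descriptor unchanged.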
With descriptors generated for each keypoint, the next step is to establish correspondences between images. This is done by comparing descriptors using Sum of Squared Differences (SSD), which measures how similar two local patches are. The goal is to identify pairs of keypoints that represent the same physical point across overlapping images.
To ensure reliability, a Lowe‑style ratio test is applied. Instead of accepting the single best match outright, the algorithm compares the best SSD score to the second‑best. A match is kept only if the best match is significantly better, filtering out ambiguous or unstable correspondences.
Matching Strategy:
Compute SSD between all descriptor pairs
Identify the best and second‑best matches for each keypoint
Apply a ratio threshold to reject weak or ambiguous matches
Visualize the resulting correspondences across image pairs
Python:
if best_ssd / second_best_ssd < 0.5:  # Lowe-style ratio test
    good_matches.append(match)
Result:
Figures 10 & 11. High‑confidence feature correspondences between two images, with lines indicating matched keypoints across frames.
Accurate feature matching is critical for estimating a correct homography. Poor matches lead to geometric distortions, misalignment, or complete stitching failure. The ratio test dramatically improves match quality, ensuring that only high‑confidence correspondences are passed to RANSAC.
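The matching strategy above can be sketched with a vectorized SSD table followed by the ratio test. This is a minimal sketch, not the project's implementation; the function name `match_descriptors` is an assumption, and the 0.5 ratio mirrors the threshold shown earlier.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.5):
    """Match rows of desc1 to rows of desc2 by SSD with a ratio test.

    desc1: (N, D), desc2: (M, D) arrays. Returns a list of (i, j) pairs.
    """
    # pairwise SSD via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d1 = (desc1 ** 2).sum(axis=1)[:, None]
    d2 = (desc2 ** 2).sum(axis=1)[None, :]
    ssd = d1 + d2 - 2.0 * desc1 @ desc2.T

    matches = []
    for i in range(len(desc1)):
        j_best, j_second = np.argsort(ssd[i])[:2]
        # keep only unambiguous matches: best clearly beats second-best
        if ssd[i, j_best] < ratio * ssd[i, j_second]:
            matches.append((i, int(j_best)))
    return matches
```

Computing the full SSD table with one matrix product keeps the comparison vectorized instead of looping over every descriptor pair.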
Even with ratio‑tested feature matches, some correspondences will inevitably be incorrect due to noise, repetitive textures, or local ambiguities. To ensure the transformation between images is stable, a custom RANSAC (Random Sample Consensus) algorithm was used to estimate a robust homography while rejecting outliers.
RANSAC repeatedly samples small subsets of matches, computes a candidate homography, and evaluates how well it aligns the remaining points. Only transformations supported by a large number of inliers are considered valid.
Method:
Randomly sample 4 matched keypoints
Compute a candidate homography
Project all points and compute SSD reprojection error
Mark points with error below a threshold as inliers
Keep the homography with the largest inlier set
Re‑estimate the final homography using only inliers
Python:
best_H = RANSAC(match_kp1, match_kp2, N=10000, t=1000.0, threshold=0.3)
Figure 12. Detailed visualization of the RANSAC process: random sampling, inlier selection, and homography estimation.
RANSAC protects the stitching pipeline from bad matches. Even if a significant portion of correspondences are incorrect, RANSAC isolates the consistent geometric structure and produces a reliable transformation. This step is essential for preventing warped, distorted, or misaligned panoramas.
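The loop above can be sketched in NumPy, using the Direct Linear Transform (DLT) for the four‑point homography fit. This is an illustrative version rather than the project's custom `RANSAC` function: the names `fit_homography`, `project`, and `ransac_homography`, the seed, and the 3‑pixel inlier threshold are all assumptions of the sketch.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: least-squares homography from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)      # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to (N, 2) points, normalizing by w."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iters=1000, thresh=3.0, seed=0):
    """Sample 4 pairs, score by inlier count, refit on the best inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)  # reprojection error
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The final refit on the full inlier set is the step that turns a good four‑point hypothesis into a stable least‑squares estimate.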
After running RANSAC on real image pairs, the system produced stable homography matrices that describe how one image should be warped to align with the next. These matrices encode the projective transformation between frames, allowing the stitching pipeline to place all images into a shared coordinate system.
First estimated matrix:
[[ 5.38576198e-01 -2.34639596e-01 9.60681637e+01]
[ 7.32633209e-03 4.73562806e-01 -9.53412762e+01]
[-4.41585762e-06 -1.07649918e-03 1.00000000e+00]]
Second estimated matrix:
[[ 1.25207035e+00 1.36210576e-01 -7.32331499e+01]
[ 4.24242072e-02 1.21663376e+00 -2.91451405e+02]
[ 2.24391727e-04 3.48851141e-04 1.00000000e+00]]
These transformations were derived from real feature correspondences and represent how each image must be projected to achieve geometric alignment.
Result:
Figure 13. Feature correspondences are used to compute the final homography, showing how matched keypoints guide the geometric alignment between images.
Homography estimation is the mathematical core of panoramic stitching. Once a reliable transformation is found, the pipeline can warp images into alignment and blend them into a seamless panorama.
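To make the projective mapping concrete, a point (x, y) is lifted to homogeneous coordinates, multiplied by H, and divided by the resulting scale w. A small sketch using the first estimated matrix above; the helper name `apply_homography` is illustrative, not from the project.

```python
import numpy as np

# first estimated homography from the results above
H = np.array([[ 5.38576198e-01, -2.34639596e-01,  9.60681637e+01],
              [ 7.32633209e-03,  4.73562806e-01, -9.53412762e+01],
              [-4.41585762e-06, -1.07649918e-03,  1.00000000e+00]])

def apply_homography(H, x, y):
    """Map an image point through H, dividing out the projective scale w."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# map a sample point from the source image into the reference frame
u, v = apply_homography(H, 100.0, 100.0)
```

The division by w is what distinguishes a projective transform from an affine one: points in different parts of the image are scaled by different amounts, which is how perspective change is modeled.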
Once a reliable homography is estimated, the next step is to warp one image into the coordinate space of the other. Using cv2.warpPerspective, the image is projected according to the homography matrix, aligning its geometry with the reference frame.
After alignment, the overlapping regions between images rarely match perfectly. Differences in exposure, lighting, and camera motion can create visible seams. To address this, the pipeline blends the warped image with the reference image to produce a smoother transition.
A weighted alpha blend was used for this implementation, with optional support for more advanced blending techniques such as Poisson blending for improved photometric consistency.
Challenges addressed:
Exposure mismatch between frames
Photometric differences across the scene
Visible seams and edge artifacts after warping
Python:
img1_reg = cv2.warpPerspective(img1, H, (w, h))             # warp into the reference frame
blended_img = cv2.addWeighted(img1_reg, 0.5, img2, 0.5, 0)  # 50/50 alpha blend of the overlap
Figure 14. Warped and blended images are combined into a single panoramic frame, showing geometric alignment and smooth transitions across overlapping regions.
Warping aligns the images geometrically, but blending makes them look like a single continuous photograph. Without proper blending, even a correct homography can produce harsh seams or abrupt transitions. This step is essential for producing a visually coherent panorama.
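A coverage‑weighted blend avoids one artifact of a flat 50/50 `addWeighted`: pixels covered by only one image are otherwise darkened by the zero contribution of the other. The sketch below is a simplified alternative under stated assumptions: the function name is illustrative, and all‑zero pixels are treated as "not covered" by that image.

```python
import numpy as np

def feather_blend(img1, img2):
    """Per-pixel weighted blend of two aligned images.

    Pixels covered by only one image are copied through unchanged;
    overlapping pixels are averaged by coverage weight.
    """
    img1 = img1.astype(np.float32)
    img2 = img2.astype(np.float32)
    w1 = (img1.sum(axis=-1) > 0).astype(np.float32)  # 1 where img1 has content
    w2 = (img2.sum(axis=-1) > 0).astype(np.float32)
    total = np.clip(w1 + w2, 1e-6, None)             # avoid divide-by-zero
    out = (img1 * w1[..., None] + img2 * w2[..., None]) / total[..., None]
    return out.astype(np.uint8)
```

Replacing the binary coverage masks with masks that ramp from 0 to 1 near each image's border turns this into a proper feathered blend with gradual seams.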
The final stage extends the pairwise stitching pipeline to handle an entire sequence of images. Instead of aligning just two frames, the system stitches each new image onto the growing panorama, updating the reference frame at every step. This transforms a raw set of overlapping photos into a single, wide‑angle composite.
Process:
Initialize the panorama with the first image
Detect features in the current panorama and the next input image
Apply ANMS to retain strong, well‑distributed keypoints
Generate normalized feature descriptors
Match descriptors using SSD and a ratio test
Estimate a robust homography with RANSAC
Warp the new image into the panorama’s coordinate space
Blend the images to maintain smooth transitions
Repeat for all remaining frames
Python:
imgs = [img1, img2, img3]
panorama = pano_imgs(imgs)  # sequentially stitch the sequence into one panorama
Step‑by‑step visualization of the multi‑image stitching pipeline, including corner detection, ANMS refinement, and feature matching across the image sequence.
Multi‑image stitching compounds all earlier challenges—feature consistency, geometric alignment, exposure differences—and requires the pipeline to remain stable across many iterations. A well‑designed system produces a coherent panorama even when the input images vary in lighting, perspective, or overlap.
Evaluated Harris vs. Shi‑Tomasi corner detection and selected the more stable, evenly distributed keypoints
Optimized ANMS using KD‑Tree acceleration to avoid brute‑force distance checks
Resolved descriptor‑alignment bugs that were degrading match accuracy
Reduced false correspondences through SSD matching with a Lowe‑style ratio test
Applied RANSAC to reject outliers and compute robust homographies
Successfully warped, blended, and stitched multiple images into a unified panoramic output
This project highlights practical, end‑to‑end computer vision engineering, including:
Designing a complete feature‑based stitching pipeline
Implementing corner detection and spatial keypoint filtering
Extracting and normalizing custom feature descriptors
Performing reliable feature correspondence matching
Estimating robust geometric transformations using RANSAC
Warping and blending images into a coherent panorama
Debugging complex vision algorithms and optimizing performance
Applying OpenCV and NumPy in a real‑world vision workflow