During my internship with Liro-Hill's Virtual Design and Construction (VDC) group, I had the opportunity to contribute to a significant strategic initiative: the creation of a proprietary, in-house product for advanced project management. The overarching goal of the VDC team was to develop a sophisticated viewer that would offer an immersive, panoramic walkthrough of construction sites, utilizing a timeline of data collected with a NAVVIS scanner.
My primary role within this initiative was to formulate and develop Fixture Finder, a specialized object segmentation model designed to enhance this platform's capabilities. The core task was to create a system that could automatically analyze the panoramic site imagery and accurately identify custom Mechanical, Electrical, and Plumbing (MEP) fixtures. To accomplish this, I developed a fine-tuned Vision Transformer (ViT) model, a cutting-edge approach to computer vision.
Upon completion of a site scan, the model processes the images and generates a detailed report cataloging the count and description of all identified fixtures. This automated inventory is invaluable for progress tracking, quality control, and as-built verification.
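As a rough illustration of this reporting step, the sketch below runs a segmentation model over every image in a completed scan and tallies detections into a simple report. It assumes the ultralytics YOLO API; the weights file, directory layout, and output schema are hypothetical stand-ins, not the production code.

```python
# Minimal sketch of the post-scan reporting step; paths and the weights
# filename are illustrative assumptions.
from collections import Counter
from pathlib import Path

from ultralytics import YOLO

def build_fixture_report(scan_dir: str, weights: str = "fixture_finder.pt") -> dict:
    """Run the fixture model over every scan image and tally detections."""
    model = YOLO(weights)                      # hypothetical fine-tuned weights
    counts: Counter = Counter()
    for image_path in sorted(Path(scan_dir).glob("*.jpg")):
        result = model(image_path)[0]          # one Results object per image
        for box in result.boxes:               # each detected fixture
            counts[model.names[int(box.cls)]] += 1
    return {"total": sum(counts.values()), "by_fixture": dict(counts)}

print(build_fixture_report("scans/floor_03"))
```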
The computational framework of the project is a multi-stage pipeline designed to integrate immersive visualization with artificial intelligence for site analysis. The system's inputs consist of scanned data from the site (including point clouds and raw imagery) and the user's real-time position and orientation. This information is processed by the Viewer module, which renders the user's current panoramic view, a corresponding 2D map, and depth maps. Concurrently, these panoramic images are fed into the Fixture Finder module. At its core is a vision model that analyzes the imagery to produce predictions, identifying the names, locations, and counts of MEP fixtures. The final outputs synthesize this information, presenting the user with an immersive panoramic view overlaid with interactive tags that identify detected objects, while also providing aggregated data such as the total count of fixtures on the current floor. This creates a seamless flow from raw site data to an interactive, intelligent project management environment.
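To make the data flow concrete, here is an illustrative set of data contracts for the pipeline's inputs and outputs. Every class and field name below is an assumption made for illustration; the actual interfaces between the Viewer and Fixture Finder modules are not documented in this write-up.

```python
# Illustrative data contracts for the pipeline described above; names and
# fields are assumptions, not the production schema.
from dataclasses import dataclass, field

@dataclass
class ScanInputs:
    point_cloud_path: str       # raw NAVVIS point cloud
    panorama_paths: list[str]   # panoramic imagery from the scan timeline

@dataclass
class UserPose:
    position: tuple[float, float, float]  # x, y, z in site coordinates
    heading_deg: float                    # viewing direction

@dataclass
class FixturePrediction:
    name: str                    # e.g. "fire extinguisher"
    location: tuple[int, int]    # pixel coordinates in the panorama
    confidence: float

@dataclass
class ViewerOutputs:
    panorama_with_tags: bytes    # rendered view with interactive tags
    floor_fixture_count: int     # aggregated count for the current floor
    predictions: list[FixturePrediction] = field(default_factory=list)
```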
The project explored three distinct computational workflows to achieve comprehensive object identification. The first workflow involved leveraging existing, pre-trained object detection models with frozen weights to identify common fixtures. For this approach, we considered objects that are typically well-represented in large-scale datasets, such as fire hydrants, toilets, and sinks. The second and third workflows focused on developing a more customized solution by either fine-tuning a single large vision model or training multiple smaller, specialized models. This advanced approach was necessary to accurately segment custom objects and site-specific MEP fixtures not found in standard datasets. The objects targeted for this custom segmentation included doors, windows, fire extinguishers, plugs, and sockets, necessitating the creation of a new, fine-tuned model to meet the project's specific needs.
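The first workflow requires no training at all, since fire hydrants, toilets, and sinks are among the 80 COCO classes. A minimal sketch, assuming a COCO-pretrained ultralytics YOLO model used with frozen weights (the class filter and file names are illustrative):

```python
# First workflow: a frozen, COCO-pretrained detector used as-is for
# common fixtures. No fine-tuning is performed.
from ultralytics import YOLO

COMMON_FIXTURES = {"fire hydrant", "toilet", "sink"}  # all present in COCO

model = YOLO("yolov8n.pt")  # pretrained weights, never updated

def detect_common_fixtures(image_path: str):
    result = model(image_path)[0]
    return [
        (model.names[int(box.cls)], float(box.conf))
        for box in result.boxes
        if model.names[int(box.cls)] in COMMON_FIXTURES
    ]

print(detect_common_fixtures("panorama_0001.jpg"))
```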
To develop the fire extinguisher segmentation model, a custom dataset was meticulously created. The process began by gathering 400 high-resolution (1920 x 1080) synthetic images of fire extinguishers. To ensure the model's robustness, these images were sourced from a variety of realistic contexts, including offices, laboratories, educational institutions, and public spaces. For each of these original images, a corresponding ground truth segmentation mask was manually created, precisely outlining the pixels belonging to the fire extinguisher.
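If those masks are later consumed by a YOLO-style trainer, each binary mask must be converted into normalized polygon labels. The following is a hedged sketch of that conversion using OpenCV contour extraction; the file layout and single-class assumption are illustrative, not the project's actual tooling.

```python
# Convert a binary ground-truth mask into a YOLO segmentation label:
# one line per polygon, "class_id x1 y1 x2 y2 ..." with coordinates
# normalized to [0, 1]. Paths and class id are illustrative.
from pathlib import Path
import cv2

def mask_to_yolo_label(mask_path: str, label_path: str, class_id: int = 0) -> None:
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    h, w = mask.shape
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lines = []
    for contour in contours:
        if len(contour) < 3:                  # need at least a triangle
            continue
        coords = []
        for (x, y) in contour.reshape(-1, 2):
            coords += [f"{x / w:.6f}", f"{y / h:.6f}"]
        lines.append(f"{class_id} " + " ".join(coords))
    Path(label_path).write_text("\n".join(lines))
```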
This curated dataset of image-mask pairs was then used to fine-tune a YOLO model. The model was trained for 400 epochs, allowing it to learn the specific features and shapes of fire extinguishers from the provided examples. The resulting fine-tuned model was capable of accurately identifying and segmenting fire extinguishers in environments similar to the training data. However, a key observation was that the model tended to overfit, effectively memorizing the dataset due to its limited size; performance on entirely new and unseen images could therefore be constrained without further data augmentation or a larger training set.
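For reference, a fine-tuning run like the one described could look roughly like this, assuming the ultralytics implementation of YOLO segmentation (the dataset YAML name is hypothetical):

```python
# Sketch of the fine-tuning run described above; the starting checkpoint
# and dataset YAML are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")       # start from pretrained segmentation weights
model.train(
    data="fire_extinguisher.yaml",   # points at the image/label splits
    epochs=400,                      # as in the project's training run
    imgsz=1920,                      # match the 1920x1080 source resolution
)
metrics = model.val()                # held-out metrics to gauge overfitting
```

Validating on a held-out split after training is one straightforward way to surface the memorization issue noted above.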
Building on the success of the fixture identification system, the project's scope was expanded to include a concrete crack detection module, transforming Fixture Finder into a more comprehensive tool for monitoring site integrity and safety. This extension demonstrated the model's adaptability and increased the overall value of the in-house management platform.
For the crack detection model, a novel synthetic dataset generation pipeline was established. Utilizing Blender, a parametric system was developed to render realistic cracks onto concrete textures, creating a scalable method for data creation. This process generated a dataset of 1000 RGB image-mask pairs, each at a resolution of 256x256. Each image depicted a concrete surface with a synthetically rendered crack, while the corresponding ground truth mask precisely outlined the crack's pixels. This dataset was then used to fine-tune a YOLO model over 50 training epochs. The resulting model demonstrated proficiency in identifying cracks on surfaces similar to the training data. However, when tested on real-world images, its performance was limited by the lack of variety in the synthetic dataset, highlighting an area for future improvement through more diverse data generation.
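The Blender pipeline itself is not reproduced here; as a simplified 2D stand-in, the sketch below captures the same idea of parametric crack generation producing aligned image-mask pairs at 256x256, using numpy and Pillow rather than Blender. All parameters, paths, and the random-walk crack shape are illustrative assumptions.

```python
# Simplified 2D stand-in for the Blender pipeline: draw a parametric
# random-walk crack onto a texture and onto a matching binary mask.
from pathlib import Path

import numpy as np
from PIL import Image, ImageDraw

def make_crack_pair(texture: Image.Image, seed: int):
    rng = np.random.default_rng(seed)
    image = texture.copy().convert("RGB")
    mask = Image.new("L", image.size, 0)
    draw_img, draw_mask = ImageDraw.Draw(image), ImageDraw.Draw(mask)

    # Random-walk polyline as a crude parametric crack.
    x, y = rng.integers(0, 256, size=2)
    points = [(int(x), int(y))]
    for _ in range(rng.integers(10, 30)):
        x = int(np.clip(x + rng.integers(-20, 21), 0, 255))
        y = int(np.clip(y + rng.integers(-20, 21), 0, 255))
        points.append((x, y))

    width = int(rng.integers(1, 4))
    draw_img.line(points, fill=(30, 30, 30), width=width)  # dark crack pixels
    draw_mask.line(points, fill=255, width=width)          # matching label
    return image, mask

Path("images").mkdir(exist_ok=True)
Path("masks").mkdir(exist_ok=True)
texture = Image.new("RGB", (256, 256), (128, 128, 128))    # stand-in concrete
for i in range(1000):
    img, msk = make_crack_pair(texture, seed=i)
    img.save(f"images/crack_{i:04d}.png")
    msk.save(f"masks/crack_{i:04d}.png")
```

Varying the texture source, crack geometry, lighting, and camera angle in the real renderer is exactly the kind of diversity the evaluation above suggests was missing.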