We evaluate the generated architectural images with both aesthetic and technical metrics, using the Image Quality Assessment models developed in the NIMA project. Specifically, we apply the pre-trained "aesthetic" and "technical" models, each of which returns a score from 1 to 10. The aesthetic model rates the visual appeal of an image, while the technical model evaluates its "clearness", that is, the absence of visual artifacts.
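As an illustration of how a single 1-to-10 score can arise, the published NIMA formulation predicts a probability distribution over the ten ratings and reports its mean. The sketch below assumes the pre-trained models used here follow that convention; the function name `nima_mean_score` and the input format are hypothetical and are not taken from the NIMA code base.

```python
from typing import Sequence


def nima_mean_score(probs: Sequence[float]) -> float:
    """Return the expected rating under a distribution over ratings 1..10.

    `probs` is assumed to hold one non-negative weight per rating,
    ordered from rating 1 to rating 10 (an assumption of this sketch).
    """
    if len(probs) != 10:
        raise ValueError("expected 10 probabilities, one per rating 1..10")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize defensively, then take the expectation over ratings 1..10.
    return sum(rating * p for rating, p in enumerate(probs, start=1)) / total
```

Under this reading, a distribution concentrated on high ratings yields a score close to 10, and a uniform distribution yields the midpoint of the scale.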
We used both scores to evaluate our approach and to show how each of its steps improves them. To isolate the contribution of each system component, we conducted a series of ablation studies comparing different pipeline configurations, averaging the scores over 1000 images produced by each pipeline version. The key observation from this evaluation is that every component makes a substantial contribution to the final result: the iterative refinement stages, from initial seed generation to final image refinement, yield higher aesthetic and technical scores than more straightforward, single-step approaches.
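The per-configuration averaging described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function name `summarize_ablation`, the dictionary layout, and the configuration labels in the usage note are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One (aesthetic, technical) score pair per generated image.
ScorePair = Tuple[float, float]


def summarize_ablation(
    scores: Dict[str, List[ScorePair]],
) -> Dict[str, Tuple[float, float]]:
    """Map each pipeline version to (mean aesthetic, mean technical)."""
    summary: Dict[str, Tuple[float, float]] = {}
    for version, pairs in scores.items():
        if not pairs:
            raise ValueError(f"no scores recorded for version {version!r}")
        aesthetic = mean(a for a, _ in pairs)
        technical = mean(t for _, t in pairs)
        summary[version] = (aesthetic, technical)
    return summary
```

In use, each key would name one pipeline version (for example, hypothetical labels such as `"single_step"` and `"full_pipeline"`), and each list would hold the 1000 score pairs for the images produced by that version.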