Sustainability has become a paramount concern in contemporary architecture and construction, driving the adoption of various frameworks to evaluate and enhance the environmental performance of buildings. Among these, the Building Research Establishment Environmental Assessment Method (BREEAM) stands out as a leading international certification system, particularly influential in the UK and Europe. This study leverages advanced data science techniques to scrape, process, and analyze approximately 40,000 BREEAM-certified assessments from the GreenBookLive database. The objective is to uncover temporal and spatial trends, sectoral distributions, and regional concentrations of certified buildings. By doing so, the research provides actionable insights for architects, developers, and policymakers, supporting informed decision-making aligned with the UK’s ambitious net-zero targets.
Key findings indicate a significant increase in high-level certifications (“Excellent” and “Outstanding”) over the past five years, with the UK and the Netherlands leading certification efforts. The geographic distribution of BREEAM certifications reveals these countries’ pivotal role in driving sustainability standards within Europe, while adoption is increasing in regions such as Asia (e.g., China) and the Middle East (Figure 1). Emerging clusters in the Netherlands reflect its strong commitment to sustainability, while the commercial and residential sectors dominate certifications across other regions. These insights underscore the importance of targeted policies and incentives to promote sustainable practices in underrepresented areas, contributing to the broader goal of environmental sustainability in the built environment.
Figure 1: Global map showing BREEAM-certified project locations, colour-coded by use class (e.g., Residential, Commercial).
The study employs a comprehensive methodology encompassing data extraction, cleaning, analysis, and visualization to achieve its objectives. The following sections outline the detailed processes involved:
The data extraction process involved sophisticated web scraping techniques with Python to retrieve BREEAM certification data from the GreenBookLive database. Given the website’s dynamic nature, a combination of HTTP requests and browser automation via Selenium was employed to effectively interact with and extract data from dynamically loaded content. Selenium is a web automation tool that allows programmatic interaction with web browsers, simulating user actions such as clicking and scrolling, which is essential for scraping dynamic websites.
The dataset comprises information on BREEAM-certified buildings, detailing certifications, ratings, locations, and other attributes. Below is the detailed schema of the dataset:
| Field Name | Data Type | Description |
|---|---|---|
| Building/Asset Name | String | Name of the certified building or asset |
| Client/Developer | String | Name of the client or developer |
| Scheme | String | Certification scheme applied to the building (e.g., In-Use International) |
| Rating/Score | String | Descriptive rating and score for the building (e.g., Very good 58.7%) |
| Rating | String | Final rating achieved (e.g., Very good, Good, Pass) |
| Score | Float | Numerical score achieved in the certification (e.g., 58.70%) |
| Stage/Valid Until | Date | Date until which the certification is valid |
| Certification Number | String | Unique certification number assigned to the building |
| Assessor/Auditor | String | Name of the assessor or auditor responsible for the certification |
| Town/Postcode/Zipcode | String | Town and postcode/zipcode where the building is located |
| Country | String | Country where the building is located |
| NSO | String | National Scheme Operator overseeing the certification |
| Other Information | String | Additional information regarding the asset or project |
| Project Type | String | Type of project (e.g., Offices, Retail, Industrial) |
| Rating (%) | Float | Certification rating expressed as a percentage |
| Latitude | Float | Geographical latitude of the building |
| Longitude | Float | Geographical longitude of the building |
Table 1: Data Schema of BREEAM-Certified Buildings Dataset.
Tools and Libraries:
- `requests` for handling HTTP requests and fetching static content.
- `lxml` for parsing HTML and extracting data fields.
- `Selenium` with the Chrome WebDriver for automating browser interactions and handling JavaScript-rendered content.
- `Pandas` for data cleaning, manipulation, and analysis.
- `Folium` and `GeoPandas` for geospatial analysis and visualization.
Implementation Steps:
Once the raw data was collected, extensive preprocessing was necessary to ensure accuracy and consistency, enabling reliable analysis.
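One preprocessing step implied by the schema in Table 1 is splitting the combined `Rating/Score` string into the separate `Rating` and `Score` fields. The following is a minimal sketch, assuming raw values look like the schema example (`Very good 58.7%`); the function name and the regular expression are illustrative and not taken from the project's code.

```python
import re

# Assumed layout: optional rating text, then a number, then an optional "%".
_RATING_SCORE = re.compile(
    r"^\s*(?P<rating>.*?)\s*(?P<score>\d+(?:\.\d+)?)\s*%?\s*$"
)


def split_rating_score(raw):
    """Split a combined 'Rating/Score' string into (rating, score).

    Returns (rating, None) when no numeric score is present and
    (None, None) for blank input.
    """
    match = _RATING_SCORE.match(raw)
    if match is None:
        # No trailing number found: treat the whole string as the rating.
        return (raw.strip() or None, None)
    rating = match.group("rating") or None
    return (rating, float(match.group("score")))


example = split_rating_score("Very good 58.7%")
```

Rows where the score cannot be parsed keep their rating text, so they can still be counted by rating level even if they are excluded from score-based analysis.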
Exploratory Data Analysis was conducted to uncover patterns, trends, and insights within the BREEAM certifications dataset.
Analysis: Time-series analysis tracked annual changes in BREEAM certification ratings over the past ten years, segmented by rating levels (Pass, Good, Very Good, Excellent, Outstanding).
Visualization: Pie charts for each year depict the proportion of certifications across different rating levels from 2013 to 2022. A noticeable trend is the steady increase in higher certification levels, particularly Outstanding and Excellent ratings. These higher-level certifications have grown significantly over time, reflecting an industry-wide shift toward achieving greater sustainability performance in buildings. Early years, such as 2013 and 2014, were dominated by “Very Good” and “Good” ratings, while by 2022, a larger portion of certifications had moved toward “Excellent” and “Outstanding” ratings (Figure 2).
Figure 2: Pie charts depicting the number of office BREEAM certifications in the UK for each year from 2013 to 2022, colour-coded by rating.
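The per-year proportions behind such pie charts can be computed by counting certifications per rating within each year and dividing by the yearly total. The sketch below uses only the standard library; the function name and the sample records are invented for illustration and do not come from the dataset. With pandas, `pd.crosstab(year, rating, normalize="index")` would give an equivalent table.

```python
from collections import Counter, defaultdict


def rating_shares_by_year(records):
    """Return {year: {rating: share}} from (year, rating) pairs."""
    counts = defaultdict(Counter)
    for year, rating in records:
        counts[year][rating] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {
            rating: n / total for rating, n in counts[year].items()
        }
    return shares


# Invented sample records, for illustration only.
sample = [
    (2013, "Very Good"), (2013, "Very Good"),
    (2013, "Good"), (2013, "Excellent"),
    (2022, "Excellent"), (2022, "Outstanding"),
]
shares = rating_shares_by_year(sample)
```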
The bar charts illustrate the number of “Outstanding” BREEAM certifications in the Office and Industrial sectors over time. The charts break down certifications by country, providing clear insights into the trends across different regions and building types.
Analysis:
The office sector shows a consistent increase in the number of Outstanding certifications from 2012 to 2024 (Figure 3). The United Kingdom dominates the chart, consistently maintaining the highest number of certified assessments each year. Countries such as the Netherlands, France, and Poland also show notable growth, contributing to the overall increase in certifications in recent years.
Findings:
A sharp rise in certifications is observed in 2021, with the total number reaching its highest in 2023. This could be linked to stronger sustainability regulations and corporate commitments in these regions. The chart also shows that certifications are expanding beyond just the UK, with countries like Romania and Russia achieving certifications by 2022 and 2023, respectively.
Implication:
The increase in certifications across multiple countries suggests that sustainability standards are becoming more prevalent across Europe. The UK’s sustained leadership in certifications points to ongoing governmental and corporate efforts to adhere to green building standards. Emerging countries on the list also indicate that BREEAM certifications are expanding geographically, showcasing broader adoption of sustainability in the office sector.
Figure 3: Bar chart showing the number of “Outstanding” certifications in the Office sector.
Analysis:
In the industrial sector, the Netherlands and the United Kingdom lead the certifications, with the Netherlands showing a clear dominance since 2019 (Figure 4). The UK follows closely behind, demonstrating substantial growth in the number of certified industrial buildings. Countries such as Slovakia, Sweden, and the Czech Republic also feature in the certifications, though in much smaller numbers than the top two.
Findings:
The number of “Outstanding” certifications peaked in 2023, reflecting the growing importance of sustainability in the industrial sector. The Netherlands leads the charge, contributing to the majority of certifications from 2019 to 2024. The rise in certifications in this sector suggests that industrial buildings, which were traditionally less focused on sustainability, are now being brought into the fold of sustainable building practices.
Implication:
The growing number of certifications in the industrial sector highlights the increasing focus on sustainability in traditionally energy-intensive industries. The leadership of the Netherlands in certifications reflects strong regulatory frameworks and industrial initiatives to reduce environmental impact. Other countries joining the trend may indicate a broader European commitment to sustainable industrial growth.
Figure 4: Bar chart showing the number of “Outstanding” certifications in the Industrial sector.
Overall Insights:
The data was segmented by building type and region to uncover sector-specific trends and geospatial patterns. The analysis focused on commercial, residential, industrial, and mixed-use buildings across different regions, allowing for detailed insights into how sustainability certifications vary by sector and geography.
Interactive maps were generated using Folium and GeoPandas to display the density and spatial distribution of BREEAM-certified buildings. Commercial buildings in London and the South East demonstrated particularly high certification rates.

Figure 5: Detailed map of London showing BREEAM-certified project locations, colour-coded by use class.
The distribution of certifications across the UK and Europe was further explored, revealing sector-specific trends. The maps showed that while London and the South East lead in commercial certifications (Figure 6), Scotland, the North West, and regions across Europe have seen increasing certifications in residential, mixed-use, and industrial projects.

Figure 6: Detailed map of Europe and the UK showing BREEAM-certified project locations, colour-coded by use class.
To gain deeper insights into the relationships between building type, certification levels, and geographic distribution, unsupervised machine learning techniques such as K-Means Clustering and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) were applied. These methods helped reveal patterns and form distinct groups within the dataset based on features such as certification level, location, and building type.
Figure 7: Clustering of BREEAM-Certified Projects Across Europe and the UK Using K-Means and DBSCAN, Colour-Coded by Use Class.
Objective:
The goal of K-Means clustering was to group buildings based on their certification rating, type, and geographic location. By identifying distinct clusters, the analysis aimed to reveal patterns in certification performance and geographic concentration.
Feature Engineering:
Key features used in the clustering model included:
These features were standardized to ensure uniformity across the dataset, ensuring that no single feature disproportionately influenced the clustering results.
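Standardization of this kind is commonly done with z-scores: each numeric column is shifted to zero mean and scaled to unit standard deviation. The sketch below is one way to do it with the standard library, assuming numeric columns such as score, latitude, and longitude. The source does not state which scaler was used; scikit-learn's `StandardScaler` applies the same population-standard-deviation formula.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return z-scores for a numeric column (population std, ddof=0)."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# Illustrative values only: three scores on a 0-100 scale.
z_scores = standardize([55.0, 70.0, 85.0])
```

Without this step, a feature measured on a wide scale (a percentage score from 0 to 100) would dominate Euclidean distances over a feature with a narrow range (latitude within a single country).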
Cluster Determination:
The optimal number of clusters (k) for K-Means was identified using several techniques:
Figure 8: Elbow Method for Determining Optimal Number of Clusters (k).
Figure 9: Silhouette Score indicating the cohesion and separation of clusters.
Figure 10: Calinski-Harabasz Index representing cluster compactness.
These methods helped ensure meaningful and distinct groupings of the buildings based on their certification, type, and location.
Figure 11: Davies-Bouldin Index measuring cluster separation.
Objective:
While K-Means is useful for partitioning data into a fixed number of clusters, DBSCAN was applied to identify clusters of varying density, particularly useful for spatial data. It allowed for the discovery of clusters without needing to predefine the number of clusters, making it well-suited for detecting patterns in geographic data.
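When DBSCAN is run on latitude and longitude, the distance measure matters: Euclidean distance on raw degrees distorts east-west distances away from the equator. A common alternative is the haversine (great-circle) distance, sketched below. Whether the study used it is not stated, so treat this as an assumption. scikit-learn's `DBSCAN` accepts `metric="haversine"` with coordinates given as `[latitude, longitude]` in radians, in which case `eps` is also in radians (kilometres divided by the Earth's radius).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates for London and Paris.
london_paris_km = haversine_km(51.5074, -0.1278, 48.8566, 2.3522)
```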
Parameter Tuning:
Figure 12: Noise Points vs eps for Different min_samples in DBSCAN.
Figure 13: Number of Clusters vs eps for Different min_samples in DBSCAN.
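The curves in Figures 12 and 13 come from sweeping `eps` for a fixed `min_samples` and recording the number of clusters and noise points at each setting. The sketch below shows the idea with a small pure-Python DBSCAN on invented 2D points using Euclidean distance. It illustrates the algorithm and is not the project's code, which would more plausibly rely on a library implementation.

```python
from collections import deque
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # A point counts as its own neighbour, as in scikit-learn.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep expanding
        cluster += 1
    return labels


def sweep_eps(points, eps_values, min_samples):
    """Return (eps, n_clusters, n_noise) for each eps value."""
    rows = []
    for eps in eps_values:
        labels = dbscan(points, eps, min_samples)
        rows.append((eps, len(set(labels) - {NOISE}), labels.count(NOISE)))
    return rows


# Invented points: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
results = sweep_eps(points, [0.5, 1.5, 100.0], min_samples=3)
```

A very small `eps` labels everything as noise and a very large one merges everything into a single cluster, which is why the useful range lies where both curves flatten.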
Inertia:
For a given k, K-Means minimizes inertia (the sum of squared distances from each sample to its closest cluster center). Lower inertia indicates tighter clusters, but because inertia generally keeps decreasing as k grows, it was evaluated in combination with the other metrics to avoid selecting too many clusters.
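As a concrete illustration of the definition, the sketch below computes inertia for fixed cluster centers on four invented points. The function name is illustrative; scikit-learn exposes the same quantity as the `inertia_` attribute of a fitted `KMeans` model.

```python
def inertia(points, centers):
    """Sum of squared Euclidean distances to the nearest center."""
    total = 0.0
    for point in points:
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        )
    return total


# Invented points forming two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
one_cluster = inertia(points, [(5, 1)])            # single center
two_clusters = inertia(points, [(0, 1), (10, 1)])  # one center per group
```

Going from one center to two drops the inertia sharply here. Adding further centers would lower it again, only by much less, which is the "elbow" the Elbow Method looks for.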
Insights:
The clustering analysis provided the following key insights:
High-performing commercial buildings were concentrated in central London, reflecting the city’s leadership in sustainability-driven development. These clusters consisted of projects with higher BREEAM ratings such as “Excellent” and “Outstanding.”
Emerging residential clusters were identified in northern regions, particularly in Manchester and Liverpool, where large-scale housing developments are undergoing certification. These clusters reflected the increasing focus on sustainable residential buildings.
Public infrastructure projects, including schools and hospitals, were predominantly clustered in Scotland and Wales. This reflects regional government initiatives promoting sustainability through public sector investment in green infrastructure.
DBSCAN Clusters: The DBSCAN algorithm detected geographic clusters of varying densities, with prominent clusters forming around major urban areas, while less dense regions like rural areas showed fewer certifications. It also effectively identified isolated projects in regions undergoing urban development without forcing them into arbitrary cluster boundaries, as can happen with K-Means.
In summary, the combination of K-Means and DBSCAN, alongside the use of multiple evaluation metrics such as the Elbow Method, Silhouette Scores, Calinski-Harabasz Index, Davies-Bouldin Index, and inertia, provided robust and insightful clustering results. These insights help identify regions and sectors leading in sustainability and reveal geographic trends in building certification that can inform future urban planning and policy development.
The project involved scraping structured data from a relatively simple website, but there were a few notable challenges related to data collection and inconsistencies, particularly for non-UK/European projects.
Dynamic Content Loading:
The website featured pagination and links to individual assessment pages for each project, which required merging the main page data with additional data from the detailed assessment pages. Selenium was used to handle this process, automating browser interactions to fully load dynamic content. Selenium’s WebDriver allowed interaction with JavaScript elements, ensuring that pagination and links were accessed correctly.
Data Collection:
The main challenge was the need to scrape multiple pages and combine data across assessments. The links to individual assessments contained additional information that was merged with the summary data on the primary pages. Using lxml and Requests, the data was parsed and stored efficiently. Pagination was handled by iterating over pages and collecting all necessary links to the individual project pages.
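The page-by-page collection and merge described here can be sketched as a loop that stops at the first empty listing page. In the sketch below, `fetch_listing` and `fetch_detail` are hypothetical callables standing in for the Requests, lxml, or Selenium code that downloads and parses a page. The `detail_url` key and the stop-on-empty-page rule are also assumptions, not details confirmed by the source.

```python
def collect_assessments(fetch_listing, fetch_detail, max_pages=1000):
    """Merge summary rows from listing pages with their detail pages.

    fetch_listing(page_number) -> list of dicts, each with a 'detail_url'
    fetch_detail(url)          -> dict of extra fields for one assessment
    """
    merged = []
    for page_number in range(1, max_pages + 1):
        rows = fetch_listing(page_number)
        if not rows:
            break  # assumed end-of-results signal
        for row in rows:
            detail = fetch_detail(row["detail_url"])
            # Detail fields are added to (and override) the summary fields.
            merged.append({**row, **detail})
    return merged


# Fake in-memory pages so the sketch runs without network access.
_listing = {
    1: [{"certificate": "A1", "detail_url": "/a1"}],
    2: [{"certificate": "B2", "detail_url": "/b2"}],
}
_details = {
    "/a1": {"project_type": "Offices"},
    "/b2": {"project_type": "Retail"},
}
rows = collect_assessments(
    lambda page: _listing.get(page, []),
    lambda url: _details[url],
)
```

Passing the fetch functions in as arguments keeps the merge logic testable offline and lets the same loop work with either a plain HTTP client or a browser-driven fetcher.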
Data Inconsistencies:
Many projects located outside of the UK/Europe had inconsistent or missing location data, especially latitude and longitude information. For over 40,000 locations, addresses were geocoded using geopy to retrieve approximate coordinates based on the available addresses. The accuracy of these coordinates depended on the quality of the address data, which was sometimes incomplete or ambiguous for certain international projects.
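Geocoding tens of thousands of addresses benefits from caching repeated addresses and pausing between lookups. The sketch below shows that pattern with an injected `geocode` callable, so it runs without network access. The function name, normalization rule, and one-second delay are assumptions. With geopy, the callable could wrap a geocoder's `geocode` method, which returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found.

```python
import time


def geocode_addresses(addresses, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Map each address to the geocoder's result, caching duplicates.

    geocode(address) should return a (lat, lon) tuple or None.
    """
    cache = {}
    results = {}
    for address in addresses:
        # Normalize whitespace and case so near-duplicates share one lookup.
        key = " ".join(address.split()).lower()
        if not key:
            results[address] = None  # nothing to look up
            continue
        if key not in cache:
            if cache:
                sleep(delay_seconds)  # pause between real lookups
            cache[key] = geocode(address)
        results[address] = cache[key]
    return results


# Fake geocoder for illustration: records calls and returns fixed coordinates.
calls = []


def fake_geocode(address):
    calls.append(address)
    return (51.5, -0.12)


coords = geocode_addresses(
    ["London, SW1A 1AA", "london,  sw1a 1aa", ""],
    fake_geocode,
    sleep=lambda seconds: None,  # skip real waiting in this example
)
```

Addresses that return `None` can be collected afterwards for manual review, which matches the observation that some international records were incomplete or ambiguous.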
Ethical Compliance:
The scraping process adhered to the terms set in GreenBookLive’s robots.txt file and terms of service. The scraping rate was throttled to avoid overloading the server, and no sensitive or non-public data was accessed. Data anonymization was applied where necessary.
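A robots.txt check and a request delay of this kind can be implemented with Python's standard `urllib.robotparser`. The sketch below parses an invented robots.txt body rather than the real GreenBookLive file. The user agent string, the `example.org` URLs, and the fallback delay are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "breeam-research-bot"  # hypothetical user agent
DEFAULT_DELAY_SECONDS = 1.0         # assumed fallback when no Crawl-delay is set


def load_rules(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_requests(parser, urls, user_agent=USER_AGENT):
    """Return (allowed_urls, delay_seconds) according to the rules."""
    delay = parser.crawl_delay(user_agent) or DEFAULT_DELAY_SECONDS
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return allowed, float(delay)


# Invented robots.txt content, for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rules = load_rules(SAMPLE_ROBOTS)
allowed, delay = plan_requests(
    rules,
    [
        "https://example.org/listing?page=1",
        "https://example.org/private/admin",
    ],
)
```

The scraper would then call `time.sleep(delay)` between requests to the allowed URLs.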
This study demonstrates the efficacy of data-driven approaches in understanding the dynamics of sustainable construction through the lens of BREEAM certifications. The significant growth in high-level certifications underscores the industry’s commitment to environmental sustainability, while regional and sectoral analyses highlight areas of strength and opportunities for targeted policy interventions. As the UK progresses toward its net-zero goals, these insights provide valuable guidance for architects, developers, and policymakers in fostering sustainable building practices. Future research directions include expanding the analysis to other global sustainability frameworks, assessing the economic and human-centric impacts of certifications, and integrating Geographic Information Systems (GIS) for more nuanced spatial analyses.