import pandas as pd
import datashader as ds
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import datashader.transfer_functions as tf
import seaborn as sns
from IPython.display import HTML, display
import pprint
from tqdm import tqdm
import matplotlib.pyplot as plt
from geolib import geohash as gh
from urllib.request import urlopen
import json
import plotly.express as px
import geopandas as gpd
import boto3
import numpy as np
import pyspark
import dask.dataframe as dd
from pyspark import SparkContext
import os
import dask
from dask.diagnostics import ProgressBar
ProgressBar().register()
warnings.filterwarnings('ignore')
pp = pprint.PrettyPrinter(indent=4, width=100)
HTML('''
<style>
.output_png {
display: table-cell;
text-align: center;
vertical-align: middle;
}
</style>
<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>
''')
Weather has a huge impact on a country’s economic progress. A study by Henseler and Schumacher showed that weather has a large impact on countries’ factors of production [1]. They also presented empirical evidence that high temperatures have a negative impact on economic growth and productivity. Thus, carefully examining weather data should be an imperative for key industries in order to provide value to their communities. Fortunately, there is an abundance of weather data available (Global Surface Summary of the Day – GSOD dataset) from the National Centers for Environmental Information (NCEI), a US government sub-agency that focuses on archiving environmental data. However, the data it provides can be overwhelming due to its sheer volume and the wide range of features it contains, so deriving insights from this dataset can be quite challenging. In this project, we ask the question: “How can we derive insights valuable to various industries from the GSOD dataset?” By utilizing Dask clusters and various Python visualization libraries, we dig deep into the GSOD dataset and produce visualizations that deliver value to their audience. To accomplish this task, we conducted the following steps:
After analyzing the results, we derived a wide range of insights enumerated below:
However, the analysis we provide is only descriptive in nature. Thus, the following recommendations can be implemented to enrich its value:
Weather has a large effect on a nation’s economy and its various industries [1]. Renewable energy developers may prefer to build new wind farms in places with consistently high wind speeds. Insurance companies may place higher premiums on areas that experience frequent hailstorms. Lastly, adverse weather conditions can affect agricultural yield, which inflates the prices of crops. These examples only scratch the surface of how weather affects our daily lives. Thus, carefully examining weather data should be an imperative for key industries in order to provide value to their communities. Fortunately, there is an abundance of weather data available (Global Surface Summary of the Day – GSOD dataset) from the National Centers for Environmental Information (NCEI), a US government sub-agency that focuses on archiving environmental data. However, the data it provides can be overwhelming due to its sheer volume and the wide range of features it contains. Thus, deriving insights from this dataset can be quite challenging.
In this project, we ask the question: “How can we derive insights valuable to various industries from the GSOD dataset?” By utilizing Dask clusters and various Python visualization libraries, we dig deep into the GSOD dataset and conduct an exploratory data analysis that provides valuable insights to its audience.
Weather has multiple implications for our society: weather conditions constrain the decision making of its key entities. Thus, deriving key insights from weather data would provide value to the following:
Government institutions would benefit from the insights gained from this exploratory data analysis. Weather conditions in a specific country can guide them in implementing policies and regulations that result in the best outcomes for their communities. For example, they could provide subsidies to farmers when weather conditions are unfavorable for growing crops.
Businesses gain insights on how to evaluate business risk based on weather data. For example, an insurance company can develop products that provide coverage against adverse weather conditions.
The dataset used for this study is available in the Registry of Open Data on AWS: a collection of daily weather measurements from more than 9,000 weather stations around the world. For this study, 57.7 million daily weather observations from the years 2000 to 2016 were covered, resulting in an input dataset of 8.2 GB. Table 1 describes the variables used from the dataset.
Table 1. Data Description
| Data Field | Description |
|---|---|
| ID | Unique ID of the weather station |
| Country_Code | Country code of the country where the weather station is located |
| Latitude | Latitude value of the station location |
| Longitude | Longitude value of the station location |
| Year | Year the observation was taken |
| Month | Month the observation was taken |
| Day | Day the observation was taken |
| Mean_Temp | Mean temperature for the day in degrees Fahrenheit to tenths. |
| Mean_Dewpoint | Mean dew point for the day in degrees Fahrenheit to tenths. |
| Mean_Visibility | Mean visibility for the day in miles to tenths. |
| Mean_Windspeed | Mean wind speed for the day in knots to tenths. |
| Max_Windspeed | Maximum sustained wind speed reported for the day in knots to tenths. |
| Max_Temp | Maximum temperature reported during the day in Fahrenheit to tenths. |
| Min_Temp | Minimum temperature reported during the day in Fahrenheit to tenths. |
| Fog | Indicator (1 = yes, 0 = no/not reported) for the occurrence of fog during the day |
| Rain_or_Drizzle | Indicator (1 = yes, 0 = no/not reported) for the occurrence of rain or drizzle during the day |
| Snow_or_Ice | Indicator (1 = yes, 0 = no/not reported) for the occurrence of snow or ice pellets during the day |
| Hail | Indicator (1 = yes, 0 = no/not reported) for the occurrence of hail during the day |
| Thunder | Indicator (1 = yes, 0 = no/not reported) for the occurrence of thunder during the day |
| Tornado | Indicator (1 = yes, 0 = no/not reported) for the occurrence of a tornado or funnel cloud during the day |
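Since the temperature fields are reported in degrees Fahrenheit and the wind-speed fields in knots, readers working in metric units may find quick conversions useful. A minimal sketch (the helper names are ours, not part of the dataset):

```python
def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0

def knots_to_kmh(speed_kt):
    """Convert knots to kilometres per hour (1 kt = 1.852 km/h)."""
    return speed_kt * 1.852

# A Mean_Temp of 77.0 °F corresponds to 25.0 °C
print(fahrenheit_to_celsius(77.0))  # 25.0
# A Mean_Windspeed of 10.0 kt corresponds to 18.52 km/h
print(knots_to_kmh(10.0))           # 18.52
```

These helpers can be applied element-wise to the numeric columns of the dataframe if a metric view of the data is preferred.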
To obtain insights from the weather data available in the GSOD dataset, we extract it from its public Amazon Web Services (AWS) S3 bucket. Then, we pre-process and clean the data to prepare it for our exploratory data analysis. In this section, we outline the methodology implemented to conduct the exploratory data analysis.
A subset of the huge GSOD dataset was extracted and processed from its AWS S3 bucket using a Dask cluster of 10 workers. The total size of the extracted data was 8.32 GB, spanning 17 years’ worth of Global Surface Summary of the Day observations. A total of 184,608 CSV files were read and then stored as a Dask dataframe to enable efficient, parallelized computing. A sample of the extracted data is displayed in Table 2.
# !aws s3 ls s3://aws-gsod/2015/ --no-sign-request --summarize --human-readable --recursive | grep Size
def get_size(bucket, path):
"""Return the size of the files in the path"""
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucket)
total_size = 0
for obj in my_bucket.objects.filter(Prefix=path):
total_size = total_size + obj.size
return total_size/1024/1024/1024
sizes = [get_size('aws-gsod', str(year)) for year in range(2000,2017)]
total_size = np.sum(sizes)
print("Input dataset size is {} GB.".format(total_size))
from dask.distributed import Client
client=Client('3.23.216.58:8786')
client
aggregate = ({'Mean_Temp' : 'mean',
'Mean_Dewpoint': 'mean',
'Mean_Visibility' : 'mean',
'Mean_Windspeed' : 'mean',
'Max_Windspeed' : 'max',
'Max_Temp' : 'max',
'Min_Temp' : 'min',
'Fog': 'sum',
'Rain_or_Drizzle': 'sum',
'Snow_or_Ice': 'sum',
'Hail': 'sum',
'Thunder': 'sum',
'Tornado': 'sum'})
@dask.delayed
def read_file(path):
"""Read from path and return dask dataframe"""
columns = ['ID', 'Country_Code', 'Latitude', 'Longitude','Year', 'Day',
'Month', 'Mean_Temp', 'Mean_Dewpoint', 'Mean_Visibility',
'Mean_Windspeed', 'Max_Windspeed', 'Max_Temp', 'Min_Temp',
'Fog', 'Rain_or_Drizzle', 'Snow_or_Ice', 'Hail',
'Thunder', 'Tornado']
dtypes = {
'ID' : 'object',
'Country_Code' : 'object',
'Latitude' : 'float32',
'Longitude' : 'float32',
'Year' : 'int16',
'Month' : 'int16',
'Day' : 'int16',
'Mean_Temp' : 'float32',
'Mean_Dewpoint' : 'float32',
'Mean_Visibility' : 'float32',
'Mean_Windspeed' : 'float32',
'Max_Windspeed' : 'float32',
'Max_Temp' : 'float32',
'Min_Temp' : 'float32',
'Fog' : 'uint8',
'Rain_or_Drizzle' : 'uint8',
'Snow_or_Ice' : 'uint8',
'Hail' : 'uint8',
'Thunder' : 'uint8',
'Tornado' : 'uint8'
}
return dd.read_csv(path,
usecols=columns,
assume_missing=True,
storage_options={'anon': True},
dtype=dtypes)
@dask.delayed
def aggregate_yearly(df):
"Return yearly aggregated weather observations"
aggregate = ({'Mean_Temp' : 'mean',
'Mean_Dewpoint': 'mean',
'Mean_Visibility' : 'mean',
'Mean_Windspeed' : 'mean',
'Max_Windspeed' : 'max',
'Max_Temp' : 'max',
'Min_Temp' : 'min',
'Fog': 'sum',
'Rain_or_Drizzle': 'sum',
'Snow_or_Ice': 'sum',
'Hail': 'sum',
'Thunder': 'sum',
'Tornado': 'sum'})
index = ['ID','Year','Country_Code', 'Latitude', 'Longitude']
agg_records = df.groupby(index).agg(aggregate).reset_index()
return agg_records
# Read data from S3 and parallelize the reads and yearly aggregation
all_data = []
df_agg = []
start=2000
end=2016
for i in range(start, end+1):
path = 's3://aws-gsod/'+str(i)+'/*'
data = read_file(path)
if i == 2000:
df_agg = aggregate_yearly(data)
all_data = data.copy()
else:
df_agg = df_agg.append(aggregate_yearly(data))
all_data = all_data.append(data)
# Convert dask-delayed to dask dataframe
df_yearly = df_agg.compute().persist()
Table 2. Sample Raw Data
df_yearly.head()
Data was cleaned by updating incorrect country codes and adding a new column corresponding to the 3-letter ISO code of the station's country. After cleaning, the data was saved in CSV format for easier retrieval in the future. Data cleaning was applied only to the yearly aggregated data (Table 3), as this is the scope of our analysis.
Table 3. Sample Cleaned Data
from urllib.request import urlopen
import json
# Retrieve ISO Codes Files to be used for conversion
url = ('https://gist.githubusercontent.com/tadast/8827699/raw/'
'f5cac3d42d16b78348610fc4ec301e9234f82821/'
'countries_codes_and_coordinates.csv')
with urlopen(url) as response:
iso = pd.read_csv(response,
usecols=['Country', 'Alpha-2 code', 'Alpha-3 code'])
iso['Alpha-2 code'] = iso['Alpha-2 code'].apply(lambda x: x[2:4])
iso['Alpha-3 code'] = iso['Alpha-3 code'].apply(lambda x: x[2:5])
iso = iso.drop_duplicates(subset=['Alpha-2 code', 'Alpha-3 code'])
def clean_data(df, iso):
"""Return cleaned pandas dataframe"""
columns = ['ID','Country', 'Alpha-3 code', 'Country_Code',
'Latitude', 'Longitude','Year',
'Mean_Temp', 'Mean_Dewpoint', 'Mean_Visibility',
'Mean_Windspeed', 'Max_Windspeed', 'Max_Temp', 'Min_Temp',
'Fog', 'Rain_or_Drizzle', 'Snow_or_Ice', 'Hail',
'Thunder', 'Tornado']
# Replace ISO code of miscoded countries
clean_country = {'RS' : 'RU', 'AS' : 'AU', 'JA' : 'JP',
'UK' : 'GB', 'SW' : 'IS', 'SF' : 'GN',
'MG' : 'MN', 'SU' : 'SD', 'AG' : 'DZ',
'OD' : 'SS', 'TX' : 'TM', 'TU' : 'TR',
'RP' : 'PH', 'WA' : 'NA', 'PD' : 'PP',
'BC' : 'BW', 'MA' : 'MG', 'ZI' : 'ZM',
}
df = df.replace({'Country_Code': clean_country})
    # Convert ISO-2 codes to ISO-3 and add the country name
df = (df.merge(iso, how='left', left_on='Country_Code',
right_on='Alpha-2 code').drop(columns=['Alpha-2 code']))
    # Rearrange columns
    df = df[columns]
return df.compute()
dfclean_yearly = clean_data(df_yearly, iso)
dfclean_yearly.head()
# Save yearly observations in one file for easier retrieval
dfclean_yearly.to_csv('yearlygsod2000-2016.csv')
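When returning to the analysis later, the saved file can be reloaded directly with pandas, reusing the dtype map from the S3 read to keep the station IDs as strings and the integer columns compact. A minimal sketch, here demonstrated on a small hypothetical two-row sample standing in for `yearlygsod2000-2016.csv`:

```python
import pandas as pd

# Hypothetical sample standing in for the saved yearly file
sample = pd.DataFrame({'ID': ['010010', '010014'],
                       'Country_Code': ['NO', 'NO'],
                       'Year': [2000, 2000],
                       'Mean_Temp': [29.8, 35.0]})
sample.to_csv('yearlygsod_sample.csv')

# Passing dtypes preserves leading zeros in IDs and keeps Year compact
dtypes = {'ID': 'object', 'Country_Code': 'object',
          'Year': 'int16', 'Mean_Temp': 'float32'}
reloaded = pd.read_csv('yearlygsod_sample.csv',
                       index_col=0, dtype=dtypes)
print(reloaded.dtypes['Year'])  # int16
```

Without the explicit `dtype` argument, `read_csv` would parse the station IDs as integers and drop their leading zeros.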
According to the WMO, there are over 10,000 crewed and automatic surface weather stations. Upper-air stations, ships, moored and drifting buoys, hundreds of weather radars, and specially equipped commercial aircraft are deployed to measure critical parameters of the atmosphere, land, and ocean surface every day. Figure 1 shows how the weather stations are distributed around the world. Based on the plot, we observe that weather stations are preferentially placed in coastal areas to measure sea-level pressure and other weather parameters. Another critical observation is that weather stations are positioned on remote islands to monitor conditions in isolated places; data from these stations can help forecast weather that may later affect other parts of the world. Lastly, some countries have a higher weather-station density than others, which may imply that they have more resources to invest in weather stations.
# Get unique locations of weather stations
dfloc = (dfclean_yearly[['ID','Country_Code',
'Country', 'Latitude', 'Longitude']]
.drop_duplicates())
fig = px.scatter_mapbox(dfloc, lat="Latitude", lon="Longitude",
color="Country_Code",
zoom=0)
fig.update_layout(mapbox_style='open-street-map',
mapbox_center = {"lat": 15, "lon": 0},
mapbox_zoom=0.55)
fig.update_layout(margin={'r':0,'t':0,'l':0,'b':0},
autosize=False,
width=1000,
height=500)
fig.show(renderer='notebook')
#Get top 10 countries with most number of weather stations
top10stations = (dfloc.groupby('Country')['ID'].count()
.sort_values(ascending=False)[:10].reset_index())
plt.figure(figsize=(10,5))
sns.barplot(x='ID',y='Country',data=top10stations)
plt.xlabel("Count of Weather Stations")
plt.title("Top 10 countries with most number of weather stations");
The countries with the most weather stations are displayed in Figure 2. The country with the most weather stations is the United States of America. Other nations with large landmasses, such as Canada, Russia, and Australia, are also near the top of the list. Based on the plot above, the number of weather stations appears to be driven primarily by a country's landmass and its level of development.
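The landmass claim can be probed by normalizing station counts with land area. A sketch on toy values: the station counts below are illustrative stand-ins for the `groupby` result above, and the land areas (in million km²) are rough public figures we supply for illustration only.

```python
import pandas as pd

# Illustrative station counts per country (stand-ins, not dataset values)
stations = pd.Series({'United States': 2600, 'Canada': 1450,
                      'Russian Federation': 1100, 'Australia': 900})
# Approximate land areas in million km^2 (illustrative values)
area_mkm2 = pd.Series({'United States': 9.8, 'Canada': 10.0,
                       'Russian Federation': 17.1, 'Australia': 7.7})

# Stations per million km^2; indices align automatically on country name
density = (stations / area_mkm2).sort_values(ascending=False)
print(density.round(1))
```

Under these toy numbers the United States stays on top even after normalization, which is consistent with development level mattering alongside landmass.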
# Get shapes of countries for the map plot
url = ('https://raw.githubusercontent.com/python-visualization/'
'folium/master/examples/data')
country_shapes = f'{url}/world-countries.json'
with urlopen(country_shapes) as response:
country = json.load(response)
# Get aggregates per country per year
yearly_country = (dfclean_yearly.groupby(['Year','Alpha-3 code'])
.agg(aggregate).reset_index())
fig = px.choropleth_mapbox(data_frame=yearly_country,
geojson=country,
featureidkey='id',
locations='Alpha-3 code',
color = 'Max_Temp',
color_continuous_scale = 'inferno',
opacity=0.95,
animation_frame = 'Year',
height = 700,
hover_name = 'Alpha-3 code',
)
fig.update_layout(mapbox_style='open-street-map',
mapbox_center = {"lat": 15, "lon": 0},
mapbox_zoom=0.55)
fig.update_layout(margin={'r':0,'t':0,'l':0,'b':0},
autosize=False,
width=1000,
height=500)
fig.show(renderer="notebook")
Exploring the maximum temperature recorded each year, we can identify countries with significantly high daily maximum temperatures. Figure 3 shows how the climate changed from 2000 to 2016. From the plot, the United States, Saudi Arabia, and South Sudan are notable for having consistently high maximum temperatures throughout the years. These countries should implement measures to mitigate the consequences of heat waves, such as providing proper healthcare, planting crops that are more resilient to heat, and establishing farm infrastructure that protects livestock from dangerous levels of heat [2].
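"Consistently high" can be made precise by requiring a country's yearly maximum temperature to clear a threshold in every year of the study window. A minimal sketch on synthetic data: both the threshold and the toy values are our assumptions, standing in for the `yearly_country` aggregate.

```python
import pandas as pd

# Toy yearly maxima (°F) standing in for the yearly_country aggregate
toy = pd.DataFrame({'Alpha-3 code': ['USA', 'USA', 'NOR', 'NOR'],
                    'Year':         [2000,  2001,  2000,  2001],
                    'Max_Temp':     [115.0, 118.0, 88.0,  95.0]})

THRESHOLD_F = 110.0  # illustrative cut-off, not an official definition
# A country is "consistently hot" if its *lowest* yearly maximum clears the bar
lowest_yearly_max = toy.groupby('Alpha-3 code')['Max_Temp'].min()
consistently_hot = lowest_yearly_max[lowest_yearly_max >= THRESHOLD_F]
print(list(consistently_hot.index))  # ['USA']
```

Taking the minimum of the yearly maxima guarantees the threshold was met in every year, not just on average.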
fig = px.choropleth_mapbox(data_frame=yearly_country,
geojson=country,
featureidkey='id',
locations='Alpha-3 code',
color = 'Max_Windspeed',
color_continuous_scale = 'emrld',
opacity=0.95,
animation_frame = 'Year',
height = 700,
hover_name = 'Alpha-3 code',
)
fig.update_layout(mapbox_style='open-street-map',
mapbox_center = {"lat": 15, "lon": 0},
mapbox_zoom=0.55)
fig.update_layout(margin={'r':0,'t':0,'l':0,'b':0},
autosize=False,
width=1000,
height=500)
fig.show(renderer="notebook")
Figure 4 shows the maximum wind speed recorded per year per country. We can observe that countries in the northern part of the globe generally have stronger winds. This presents great opportunities for developing wind power projects in those countries [3]. Governments may also encourage this by providing incentives for renewable energy projects.
dfph = dfclean_yearly[dfclean_yearly['Country_Code']=='PH']
fig = px.scatter_mapbox(data_frame=dfph,
lat= 'Latitude',
lon = 'Longitude',
size='Mean_Windspeed',
color = 'Max_Windspeed',
color_discrete_sequence = px.colors.qualitative.Vivid,
size_max = 20,
color_continuous_scale = 'viridis_r',
opacity=0.95,
animation_frame = 'Year',
hover_name = 'Country',
hover_data = {'Latitude': False,
'Longitude': False},
)
fig.update_layout(mapbox_style='open-street-map',
mapbox_center = {"lat": 12.9, "lon": 123},
mapbox_zoom=4.3)
fig.update_layout(margin={'r':0,'t':0,'l':0,'b':0},
autosize=False,
width=1000,
height=700)
fig.show()
Zooming in to the Philippines, wind speeds are not as strong as in other countries. However, some locations may be explored to drive the country toward more sustainable energy sources. Based on the plot in Figure 5, we can see that there are opportunities in the coastal areas. Developers can use this as baseline data before conducting feasibility studies in specific locations.
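One simple way to turn the Philippine map into a shortlist for feasibility studies is to keep only the stations whose long-run mean wind speed clears a candidate threshold. A sketch on toy values: the threshold and the station records are illustrative assumptions standing in for the `dfph` subset, not a siting criterion from the wind-energy literature.

```python
import pandas as pd

# Toy station records standing in for the dfph subset above
toy = pd.DataFrame({'ID': ['980010', '980020', '980030'],
                    'Mean_Windspeed': [4.2, 9.7, 11.3],  # knots
                    'Latitude': [14.5, 12.9, 18.2],
                    'Longitude': [121.0, 123.0, 120.6]})

MIN_MEAN_KT = 8.0  # illustrative screening threshold in knots
shortlist = toy[toy['Mean_Windspeed'] >= MIN_MEAN_KT]
print(shortlist['ID'].tolist())  # ['980020', '980030']
```

The surviving station coordinates could then seed on-the-ground measurement campaigns before any siting decision is made.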
sum_country = (dfclean_yearly.groupby(['Country', 'Year'])
               [['Hail', 'Thunder', 'Tornado']].sum().reset_index())
yearly_ws = (sum_country.groupby('Country')[['Hail', 'Thunder', 'Tornado']]
             .mean().reset_index())
fig, ax = plt.subplots(3,1, figsize=(10,18))
hail = (yearly_ws[['Country','Hail']].sort_values(ascending=False, by='Hail')[:10])
sns.barplot(data=hail, x='Hail', y='Country', ax=ax[0])
ax[0].set_xlabel("Yearly Average Number of Hail Occurrences")
ax[0].set_title("Top Countries - Hail Occurrences")
tornado = (yearly_ws[['Country','Tornado']].sort_values(ascending=False,
by='Tornado')[:10])
sns.barplot(data=tornado, x='Tornado', y='Country', ax=ax[1])
ax[1].set_xlabel("Yearly Average Number of Tornado Occurrences")
ax[1].set_title("Top Countries - Tornado Occurrences");
thunder = (yearly_ws[['Country','Thunder']].sort_values(ascending=False,
by='Thunder')[:10])
sns.barplot(data=thunder, x='Thunder', y='Country', ax=ax[2])
ax[2].set_xlabel("Yearly Average Number of Thunder Occurrences")
ax[2].set_title("Top Countries - Thunder Occurrences");
Now, we examine the adverse weather conditions experienced over the past 17 years (Figure 6). Russia has the most hail occurrences of any country, followed by the United Kingdom and Norway. Countries within the top 10 for hail occurrences should have insurance policies that cover risks caused by hailstorms. In terms of tornado occurrences, the United States leads all countries by a wide margin. Local governments in these countries should implement disaster risk-mitigation measures to lessen the impact of such adverse weather conditions. Again, the United States tops the list with the most thunder occurrences across all countries. This means that they should develop infrastructure that is resilient to thunderstorms; installing lightning rods on such infrastructure would lessen the risk of damage caused by thunderstorms.
In this project, we conducted an exploratory data analysis on the GSOD dataset to derive valuable insights, using a Dask cluster and data visualization libraries. The following insights were derived from our analysis:
We understand that the exploratory data analysis is limited by its descriptive nature, and further improvements to this study can be explored. Some recommendations that could extend this project are the following:
We hope that extensions of this project will enable policymakers and businesses to make decisions that promote sustainable growth.
[1] Henseler, M., & Schumacher, I. (2019). The impact of weather on economic growth and its production factors. Climatic Change, 154, 417–433. Retrieved from https://doi.org/10.1007/s10584-019-02441-6
[2] Ventimiglia, A. (2019). Heatwave. Future Earth. Retrieved from https://futureearth.org/publications/issue-briefs-2/heatwaves/
[3] Cetinay, H., Kuipers, K.A., & Guven, A.H. (2016). Optimal siting and sizing of wind farms. Elsevier. Retrieved from https://www.sciencedirect.com/science/article/pii/S0960148116307091