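Using Machine Learning to Predict Air Pollution with Dummy Data in Python
This post walks through a small demonstration of predicting air pollution with machine learning in Python, using dummy data. It fits a linear regression model first and then repeats the exercise with an XGBoost model.
First, we import the necessary libraries: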
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
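Next, we will generate some dummy data for our demonstration. In this example, we will use two features (traffic volume and industrial activity) to predict air pollution levels. The target is constructed as the sum of the two features plus Gaussian noise, so the true relationship is linear by design: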
# Generate dummy data
np.random.seed(0)
# Number of samples
n_samples = 1000
# Generate feature data
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
# Generate target data (air pollution levels)
pollution = traffic + industrial + np.random.normal(loc=0, scale=5, size=n_samples)
# Combine features and target into a single dataframe
data = pd.DataFrame({'traffic': traffic, 'industrial': industrial, 'pollution': pollution})
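Because the target is built as a sum of independent normal variables, its mean and variance can be worked out by hand and compared with the generated sample. The following is a minimal sketch of that sanity check. It regenerates the same arrays with the same seed; the names noise, expected_mean and expected_var are introduced here for illustration and do not appear in the original code.

```python
import numpy as np

np.random.seed(0)
n_samples = 1000

# Same draws, in the same order, as the data generation step
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
noise = np.random.normal(loc=0, scale=5, size=n_samples)
pollution = traffic + industrial + noise

# For independent terms, means add and variances add
expected_mean = 50 + 10 + 0          # 60
expected_var = 10**2 + 5**2 + 5**2   # 150

print(f"Sample mean: {pollution.mean():.2f} (expected about {expected_mean})")
print(f"Sample variance: {pollution.var(ddof=1):.2f} (expected about {expected_var})")
```

The sample values will not match the theoretical ones exactly, but with 1000 samples they should land close to them.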
Now that we have our dummy data, we can split it into training and testing sets:
# Split data into training and testing sets
X = data[['traffic', 'industrial']]
y = data['pollution']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
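To make the effect of test_size=0.2 concrete, the sketch below repeats the data generation and the split, then checks the resulting sizes and confirms that no row ends up in both sets. The variable overlap is a name chosen for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(0)
n_samples = 1000
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
pollution = traffic + industrial + np.random.normal(loc=0, scale=5, size=n_samples)
data = pd.DataFrame({'traffic': traffic, 'industrial': industrial, 'pollution': pollution})

X = data[['traffic', 'industrial']]
y = data['pollution']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 20% of 1000 rows go to the test set, the remaining 80% to training
print(X_train.shape, X_test.shape)  # (800, 2) (200, 2)

# The split partitions the rows, so the two index sets share nothing
overlap = X_train.index.intersection(X_test.index)
print(len(overlap))  # 0
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same 200 rows.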
Next, we will train a linear regression model on the training data:
# Train linear regression model
lr = LinearRegression()
lr.fit(X_train, y_train)
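Since the dummy target was generated as traffic + industrial + noise, a fitted linear model should recover coefficients close to 1 for both features and an intercept close to 0. The sketch below, which rebuilds the data so that it runs on its own, prints the fitted parameters for comparison with those known values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

np.random.seed(0)
n_samples = 1000
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
pollution = traffic + industrial + np.random.normal(loc=0, scale=5, size=n_samples)
data = pd.DataFrame({'traffic': traffic, 'industrial': industrial, 'pollution': pollution})

X = data[['traffic', 'industrial']]
y = data['pollution']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)

# coef_ holds one weight per feature, in the column order of X
for name, coef in zip(X.columns, lr.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {lr.intercept_:.3f}")
```

With real measurements the true coefficients are unknown, so this kind of check is only possible on synthetic data like the data used here.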
Now that our model is trained, we can use it to make predictions on the testing data:
# Make predictions on testing data
y_pred = lr.predict(X_test)
Finally, we can evaluate the performance of our model using metrics such as mean absolute error and mean squared error:
# Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
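The two error values are easier to interpret against the noise that was deliberately added to the target. With Gaussian noise of standard deviation 5, no model can be expected to do much better than an MSE of about 25 and an MAE of about 5 * sqrt(2 / pi), roughly 3.99. The sketch below computes those reference values alongside the metrics, and adds RMSE and R² as two further common summaries. The names noise_std, noise_mse and noise_mae are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

np.random.seed(0)
n_samples = 1000
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
pollution = traffic + industrial + np.random.normal(loc=0, scale=5, size=n_samples)
data = pd.DataFrame({'traffic': traffic, 'industrial': industrial, 'pollution': pollution})

X = data[['traffic', 'industrial']]
y = data['pollution']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)              # same units as the target
r2 = r2_score(y_test, y_pred)    # share of variance explained

# Reference values implied by the noise term in the dummy data
noise_std = 5
noise_mse = noise_std ** 2
noise_mae = noise_std * np.sqrt(2 / np.pi)

print(f"MAE:  {mae:.2f} (noise floor about {noise_mae:.2f})")
print(f"MSE:  {mse:.2f} (noise floor about {noise_mse:.2f})")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```

Errors close to the noise floor indicate that the remaining error is mostly the random noise itself rather than a shortcoming of the model.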
This is a simple demonstration of how machine learning can be used to predict air pollution levels using dummy data. A real-world project would need actual measurements, and may need more advanced techniques to improve the accuracy of the model.
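The same workflow can be repeated with XGBoost, a gradient-boosted tree model, in place of linear regression. We start again by importing the necessary libraries, this time with XGBRegressor from the xgboost package: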
import pandas as pd
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
The dummy data and the train/test split are generated exactly as in the linear regression example above (same seed, same features, same random_state), so data, X_train, X_test, y_train and y_test can be reused unchanged.
Next, we will train an XGBoost model on the training data:
# Train XGBoost model
xgb = XGBRegressor()
xgb.fit(X_train, y_train)
Now that our model is trained, we can use it to make predictions on the testing data:
# Make predictions on testing data
y_pred = xgb.predict(X_test)
Finally, we can evaluate the performance of our model using metrics such as mean absolute error and mean squared error:
# Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
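To compare a linear model and a boosted-tree model on equal footing, both can be fitted on the same split and scored on the same test rows. The sketch below does this using scikit-learn's GradientBoostingRegressor as a stand-in for XGBRegressor, purely so that it runs with scikit-learn alone; it is a different implementation of gradient boosting and its scores will not match XGBoost's exactly. The models and results dictionaries are names chosen for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

np.random.seed(0)
n_samples = 1000
traffic = np.random.normal(loc=50, scale=10, size=n_samples)
industrial = np.random.normal(loc=10, scale=5, size=n_samples)
pollution = traffic + industrial + np.random.normal(loc=0, scale=5, size=n_samples)
data = pd.DataFrame({'traffic': traffic, 'industrial': industrial, 'pollution': pollution})

X = data[['traffic', 'industrial']]
y = data['pollution']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in models; swap in XGBRegressor() if xgboost is installed
models = {
    'LinearRegression': LinearRegression(),
    'GradientBoostingRegressor': GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        'mae': mean_absolute_error(y_test, y_pred),
        'mse': mean_squared_error(y_test, y_pred),
    }
    print(f"{name}: MAE={results[name]['mae']:.2f}, MSE={results[name]['mse']:.2f}")
```

Because the dummy target is a linear function of the features plus noise, the linear model is already well matched to this data, so a tree-based model should not be expected to improve on it much here. On real data with non-linear effects the picture may differ.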
This is a simple demonstration of how an XGBoost model can be used to predict air pollution levels using dummy data. As with the linear regression example, a real-world project would need actual measurements and may need more advanced techniques to improve the accuracy of the model.