CSV File Analysis.txt

CSV File Analysis
city_weather.csv

Columns: city_id, date, hour, temp, wind_speed, description, precip, humidity, visibility, pressure, chanceofrain, chanceoffog, chanceofsnow, chanceofthunder
Purpose: Provides weather data for different cities at various times, which can be used to assess the impact of weather on truck delays.
drivers_table.csv

Columns: driver_id, name, gender, age, experience, driving_style, ratings, vehicle_no, average_speed_mph
Purpose: Contains information about drivers, which can help in understanding their impact on delay (e.g., driving style, experience).
routes_table.csv

Columns: route_id, origin_id, destination_id, distance, average_hours
Purpose: Contains route details, including distance and expected travel time, which are essential for predicting delays based on route characteristics.
routes_weather.csv

Columns: route_id, Date, temp, wind_speed, description, precip, humidity, visibility, pressure, chanceofrain, chanceoffog, chanceofsnow, chanceofthunder
Purpose: Provides weather data specific to routes, which is crucial for understanding route-specific weather impacts.
traffic_table.csv

Columns: route_id, date, hour, no_of_vehicles, accident
Purpose: Contains traffic data and accident information, which can influence delays and should be considered in the model.
truck_schedule_table.csv

Columns: truck_id, route_id, departure_date, estimated_arrival, delay
Purpose: The primary dataset with truck schedules and actual delays. This is your target variable for classification.
trucks_table.csv

Columns: truck_id, truck_age, load_capacity_pounds, mileage_mpg, fuel_type
Purpose: Provides details about the trucks, which may affect delay due to factors like truck age or load capacity.


Checklist for Data Cleaning Completion:
 Verified and handled missing values.
 Identified and addressed outliers.
 Ensured correct data types.
 Reviewed and validated categorical features.
 Saved the cleaned data.
 Documented the changes made.