CSV File Analysis
city_weather.csv
Columns: city_id, date, hour, temp, wind_speed, description, precip, humidity, visibility, pressure, chanceofrain, chanceoffog, chanceofsnow, chanceofthunder
Purpose: Provides hourly weather data for each city (keyed by city_id, date, and hour), which can be used to assess the impact of weather on truck delays.
drivers_table.csv
Columns: driver_id, name, gender, age, experience, driving_style, ratings, vehicle_no, average_speed_mph
Purpose: Contains information about drivers, which can help in understanding their impact on delay (e.g., driving style, experience).
routes_table.csv
Columns: route_id, origin_id, destination_id, distance, average_hours
Purpose: Contains route details, including distance and expected travel time, which are essential for predicting delays based on route characteristics.
routes_weather.csv
Columns: route_id, Date, temp, wind_speed, description, precip, humidity, visibility, pressure, chanceofrain, chanceoffog, chanceofsnow, chanceofthunder
Purpose: Provides weather data specific to routes, which is crucial for understanding route-specific weather impacts.
traffic_table.csv
Columns: route_id, date, hour, no_of_vehicles, accident
Purpose: Contains traffic data and accident information, which can influence delays and should be considered in the model.
truck_schedule_table.csv
Columns: truck_id, route_id, departure_date, estimated_arrival, delay
Purpose: The primary dataset, with truck schedules and actual delays. The delay column is the target variable for classification.
trucks_table.csv
Columns: truck_id, truck_age, load_capacity_pounds, mileage_mpg, fuel_type
Purpose: Provides details about the trucks, which may affect delay due to factors like truck age or load capacity.
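Because the files share key columns (route_id, truck_id), schedule rows can be enriched with route and truck attributes before modeling. The following is a minimal sketch of that join using only the Python standard library. The sample rows are invented for illustration, and the helper names (read_rows, index_by, join_schedule) are hypothetical, not part of the project.

```python
import csv
import io

# Tiny in-memory stand-ins for the real CSV files (all values are invented).
SCHEDULE_CSV = """truck_id,route_id,departure_date,estimated_arrival,delay
1,R1,2019-01-01 07:00,2019-01-01 13:00,0
2,R2,2019-01-01 08:00,2019-01-01 20:00,1
"""
ROUTES_CSV = """route_id,origin_id,destination_id,distance,average_hours
R1,C1,C2,300,6
R2,C2,C3,620,12
"""
TRUCKS_CSV = """truck_id,truck_age,load_capacity_pounds,mileage_mpg,fuel_type
1,4,10000,18,diesel
2,9,15000,14,gas
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows, key):
    """Build a lookup from a key column to its row (assumes unique keys)."""
    return {row[key]: row for row in rows}


def join_schedule(schedule, routes, trucks):
    """Left-join route and truck attributes onto each schedule row."""
    routes_by_id = index_by(routes, "route_id")
    trucks_by_id = index_by(trucks, "truck_id")
    joined = []
    for row in schedule:
        merged = dict(row)
        # An unmatched key adds no extra columns instead of raising.
        merged.update(routes_by_id.get(row["route_id"], {}))
        merged.update(trucks_by_id.get(row["truck_id"], {}))
        joined.append(merged)
    return joined


joined_rows = join_schedule(
    read_rows(SCHEDULE_CSV), read_rows(ROUTES_CSV), read_rows(TRUCKS_CSV)
)
```

With the real files, read_rows could instead wrap open(path, newline="") around csv.DictReader. The weather and traffic tables would need the same treatment with composite keys (for example route_id plus date).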
Checklist for Data Cleaning Completion:
Verified and handled missing values.
Identified and addressed outliers.
Ensured correct data types.
Reviewed and validated categorical features.
Saved the cleaned data.
Documented the changes made.
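The checklist steps can be sketched end to end on a small sample. This is an illustration under stated assumptions, not the cleaning procedure actually used: the sample rows are invented, median filling and IQR clipping are just one possible choice for missing values and outliers, and ALLOWED_FUEL_TYPES is an assumed category set.

```python
import csv
import io
import statistics

# Invented sample in the shape of trucks_table.csv, with deliberate problems.
RAW_CSV = """truck_id,truck_age,load_capacity_pounds,mileage_mpg,fuel_type
1,4,10000,18,diesel
2,,15000,14,Gas
3,6,12000,200,gas
4,5,11000,16,diesel
5,7,13000,15,gas
"""
NUMERIC_COLUMNS = ("truck_age", "load_capacity_pounds", "mileage_mpg")
ALLOWED_FUEL_TYPES = {"diesel", "gas"}  # assumption: the only valid categories
change_log = []  # step 6: every modification is recorded here

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Steps 1 and 3: convert numeric columns to float, filling gaps with the median.
for column in NUMERIC_COLUMNS:
    median = statistics.median(float(r[column]) for r in rows if r[column] != "")
    for r in rows:
        if r[column] == "":
            r[column] = median
            change_log.append(f"{column}: filled truck {r['truck_id']} with median {median}")
        else:
            r[column] = float(r[column])

# Step 2: clip mileage_mpg to the 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles([r["mileage_mpg"] for r in rows], n=4, method="inclusive")
lower, upper = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
for r in rows:
    clipped = min(max(r["mileage_mpg"], lower), upper)
    if clipped != r["mileage_mpg"]:
        change_log.append(f"mileage_mpg: clipped truck {r['truck_id']} to {clipped}")
        r["mileage_mpg"] = clipped

# Step 4: normalize category spelling and reject unknown values.
for r in rows:
    fuel = r["fuel_type"].strip().lower()
    if fuel not in ALLOWED_FUEL_TYPES:
        raise ValueError(f"unexpected fuel_type: {r['fuel_type']!r}")
    if fuel != r["fuel_type"]:
        change_log.append(f"fuel_type: normalized truck {r['truck_id']} to {fuel}")
        r["fuel_type"] = fuel

# Step 5: save the cleaned rows (a StringIO stands in for the output file).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
cleaned_csv = output.getvalue()
```

Keeping change_log alongside the cleaned output makes the last checklist item concrete: the log can be written next to the saved file so each fill, clip, and normalization is traceable.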