Airport Traffic Analyzer
A real-time flight delay monitoring and prediction system built with Python, R, and machine learning
What it does
The Airport Traffic Analyzer is a machine learning system that predicts flight departure delays using real flight and weather data. It combines data from the Aviation Stack API with weather information from Open-Meteo to train a Random Forest model that can predict delays with an RMSE of 8.4 minutes.
The interactive Shiny dashboard provides airline performance rankings, temporal risk patterns, and real-time delay predictions. You can compare airlines, identify high-risk departure times, and see how weather factors like wind speed correlate with delays across different hours and days of the week.
The system accumulates data incrementally: instead of fetching a fresh dataset on every run, it grows the database by adding only flights it has not stored yet. This gives the ML model more training data over time while reducing API costs by 99% after the initial setup.
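A minimal sketch of the accumulation idea, not the project's actual code. It assumes a SQLite store and hypothetical field names (`flight_iata`, `scheduled_departure`, `delay_minutes`); the real pipeline may identify flights differently.

```python
import sqlite3

def add_new_flights(conn, flights):
    """Insert only flights that are not stored yet; return how many were added.

    A flight is identified by (flight_iata, scheduled_departure). Both field
    names are assumptions made for this sketch.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS flights (
               flight_iata TEXT NOT NULL,
               scheduled_departure TEXT NOT NULL,
               delay_minutes REAL,
               PRIMARY KEY (flight_iata, scheduled_departure)
           )"""
    )
    before = conn.execute("SELECT COUNT(*) FROM flights").fetchone()[0]
    # INSERT OR IGNORE skips rows whose primary key already exists,
    # so historical flights are kept and duplicates are dropped.
    conn.executemany(
        "INSERT OR IGNORE INTO flights VALUES (?, ?, ?)",
        [
            (f["flight_iata"], f["scheduled_departure"], f.get("delay_minutes"))
            for f in flights
        ],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM flights").fetchone()[0]
    return after - before
```

Letting the primary key do the deduplication keeps the logic in one place: re-running the fetch on an overlapping time window is harmless, because rows that already exist are ignored.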
Key features
Interactive Dashboard: Five-tab Shiny interface with airline comparisons, temporal risk heatmaps, and real-time predictions with filtering capabilities.
Machine Learning Model: Random Forest regression trained on 1,800+ flights with 59 engineered features, with an RMSE of 8.4 minutes estimated by 5-fold cross-validation.
Data Pipeline: Automated Python scripts that fetch, clean, and merge flight data with weather information, handling duplicates and missing values.
Analytics Report: Comprehensive Quarto document with 50+ visualizations analyzing delay patterns, airline performance, and weather correlations.
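The merge step named under Data Pipeline above could look roughly like the sketch below. It joins each flight to the weather observation for the same airport and clock hour. The field names (`airport`, `hour`, `wind_speed`, `temperature`) are hypothetical, and the project's real join keys may differ.

```python
from datetime import datetime

def merge_weather(flights, weather):
    """Attach hourly weather to each flight; missing weather becomes None."""
    # Index weather rows by (airport, hour as datetime) for O(1) lookups.
    weather_by_key = {
        (w["airport"], datetime.fromisoformat(w["hour"])): w for w in weather
    }
    merged = []
    for flight in flights:
        departure = datetime.fromisoformat(flight["scheduled_departure"])
        # Truncate the departure time to the start of its hour.
        hour = departure.replace(minute=0, second=0, microsecond=0)
        match = weather_by_key.get((flight["airport"], hour), {})
        merged.append(
            {
                **flight,
                "wind_speed": match.get("wind_speed"),
                "temperature": match.get("temperature"),
            }
        )
    return merged
```

Flights with no matching weather row are kept with `None` values rather than dropped, so the decision about how to handle missing values stays explicit in a later step.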
Tech stack
tidymodels
Plotly
Aviation Stack API
What I learned
Real flight data is expensive and messy
I paid $50 for Aviation Stack API access and quickly learned that real-world flight data has missing fields, inconsistent formats, and duplicates. Cleaning took longer than expected, and I had to build robust validation pipelines to handle it.
Growing datasets are better than fresh ones
Instead of fetching fresh data every time, I built a system that accumulates new flights while keeping historical ones. This gives the ML model more training data and saves 99% of API calls after the first run.
Not all intuitive features work
I assumed wind speed would strongly correlate with delays. The data showed almost no relationship. Lesson: test assumptions with data first, and don't build features based on hunches alone.
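One quick way to test a hunch like this before building a feature is a Pearson correlation between the candidate variable and the delay. The function below is a generic sketch, not the analysis from the report, and it only captures linear relationships.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # r is in [-1, 1]; values near 0 mean no linear relationship.
    return covariance / math.sqrt(var_x * var_y)
```

Calling `pearson_r(wind_speeds, delays)` on two columns of the cleaned dataset gives a single number to check against the intuition. A value close to 0 matches the "almost no relationship" finding described above.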
Feature engineering beats raw features
The model performed much better with engineered features like hour-of-day and day-of-week rather than just timestamps. Turning continuous time into categorical buckets helped capture temporal patterns.
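As an illustration of this kind of feature engineering, the sketch below turns an ISO 8601 timestamp into hour-of-day and day-of-week features. The `part_of_day` buckets and their boundaries are assumptions made for the example, not the project's actual 59 features.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def time_features(scheduled_departure):
    """Derive categorical time features from an ISO 8601 timestamp string."""
    dt = datetime.fromisoformat(scheduled_departure)
    hour = dt.hour
    # Bucket boundaries are illustrative; tune them to the data.
    if 5 <= hour < 12:
        part_of_day = "morning"
    elif 12 <= hour < 17:
        part_of_day = "afternoon"
    elif 17 <= hour < 22:
        part_of_day = "evening"
    else:
        part_of_day = "night"
    return {
        "hour_of_day": hour,
        "day_of_week": DAY_NAMES[dt.weekday()],  # weekday(): Monday == 0
        "is_weekend": dt.weekday() >= 5,
        "part_of_day": part_of_day,
    }
```

For example, `time_features("2024-01-01T08:30:00")` yields hour 8 on a Monday in the morning bucket, which a tree-based model can split on directly.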
Cross-validation exposes overfitting
My first model had great training performance but terrible test results. Implementing 5-fold cross-validation showed the real performance and helped me tune hyperparameters properly.
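The project's model is built with tidymodels in R; the Python sketch below only illustrates the k-fold procedure itself. The `fit` and `predict` callables are placeholders for whatever model is being evaluated, and the seed and fold assignment are assumptions.

```python
import math
import random

def k_fold_indices(n_rows, k=5, seed=42):
    """Shuffle row indices and split them into k disjoint folds."""
    if n_rows < k:
        raise ValueError("need at least one row per fold")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def cross_val_rmse(features, targets, fit, predict, k=5, seed=42):
    """Average held-out RMSE over k folds.

    fit(train_features, train_targets) returns a model;
    predict(model, feature_row) returns one prediction.
    """
    scores = []
    for held_out in k_fold_indices(len(targets), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(targets)) if i not in held_out_set]
        model = fit([features[i] for i in train], [targets[i] for i in train])
        predictions = [predict(model, features[i]) for i in held_out]
        scores.append(rmse([targets[i] for i in held_out], predictions))
    return sum(scores) / len(scores)
```

Each row is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting. A large gap between training error and this score is the signal of overfitting described above.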
Shiny deployment has memory limits
ShinyApps.io free tier has strict RAM limits. I had to optimize data loading and use efficient data structures. Large datasets need to be pre-processed and cached rather than loaded fresh.
API rate limits need careful handling
Aviation Stack API has daily request limits. I built retry logic with exponential backoff and saved all responses locally to avoid hitting limits during development. Always cache external API calls.
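A rough sketch of retry-with-backoff combined with a local response cache. The `fetch` argument is a hypothetical callable that returns JSON-decoded data and raises `OSError` on a transient failure (for example, a thin wrapper around `urllib.request.urlopen`); the cache directory name and delay values are assumptions.

```python
import hashlib
import json
import time
from pathlib import Path

def cached_fetch(url, fetch, cache_dir="api_cache", max_retries=5,
                 base_delay=1.0, sleep=time.sleep):
    """Return the response for url, using a local cache and retrying failures."""
    # One cache file per URL, named by a hash so any URL is a valid filename.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json"
    cache_path = Path(cache_dir) / name
    if cache_path.exists():
        return json.loads(cache_path.read_text())

    for attempt in range(max_retries):
        try:
            payload = fetch(url)
            break
        except OSError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(payload))
    return payload
```

Because the cache is checked first, repeated runs during development never touch the API for a URL that has already been fetched. Passing `sleep` as a parameter makes the backoff schedule easy to test without waiting.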
Weather data improves predictions
Adding Open-Meteo weather data (temperature, wind, visibility) to the model improved accuracy. Even though individual weather factors were weak predictors, the combination added valuable signal.
Let's Connect
I’m always open to discussing math, computer science, or aviation logistics. Feel free to reach out if you’d like to collaborate.