Airport Traffic Analyzer

A real-time flight delay monitoring and prediction system built with Python, R, and machine learning

What it does

The Airport Traffic Analyzer is a machine learning system that predicts flight departure delays using real flight and weather data. It combines data from the Aviation Stack API with weather information from Open-Meteo to train a Random Forest model that can predict delays with an RMSE of 8.4 minutes.

The interactive Shiny dashboard provides airline performance rankings, temporal risk patterns, and real-time delay predictions. You can compare airlines, identify high-risk departure times, and see how weather factors like wind speed correlate with delays across different hours and days of the week.

The system accumulates data rather than refetching it: each run adds only new flights to the existing database instead of pulling a fresh dataset. This gives the ML model more training data over time and cuts API calls by 99% after the initial setup.

Key features

Interactive Dashboard: Five-tab Shiny interface with airline comparisons, temporal risk heatmaps, and real-time predictions with filtering capabilities.

Machine Learning Model: Random Forest regression trained on 1,800+ flights with 59 engineered features, reaching an RMSE of 8.4 minutes as measured by 5-fold cross-validation.

Data Pipeline: Automated Python scripts that fetch, clean, and merge flight data with weather information, handling duplicates and missing values.

Analytics Report: Comprehensive Quarto document with 50+ visualizations analyzing delay patterns, airline performance, and weather correlations.

Tech stack

Python

R Shiny

scikit-learn

tidymodels

Plotly

pandas

Aviation Stack API

ShinyApps.io

What I learned

Data Pipeline

Real flight data is expensive and messy

I paid $50 for Aviation Stack API access and quickly learned that real-world flight data has missing fields, inconsistent formats, and duplicates. Cleaning took longer than expected, and I had to build robust validation pipelines.
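
A minimal sketch of what such a validation step could look like with pandas. The column names (`flight_iata`, `scheduled_departure`, `delay_minutes`) and the sample rows are illustrative assumptions, not the project's actual schema or data.

```python
import pandas as pd


def clean_flights(raw: pd.DataFrame) -> pd.DataFrame:
    """Validate and normalize raw flight records (hypothetical schema)."""
    df = raw.copy()

    # Normalize inconsistent formats: trim and upper-case flight codes.
    df["flight_iata"] = df["flight_iata"].astype("string").str.strip().str.upper()

    # Parse timestamps; values that cannot be parsed become NaT.
    df["scheduled_departure"] = pd.to_datetime(
        df["scheduled_departure"], errors="coerce", utc=True
    )

    # Coerce delays to numbers; non-numeric values become NaN.
    df["delay_minutes"] = pd.to_numeric(df["delay_minutes"], errors="coerce")

    # Drop rows missing any field the model cannot do without.
    df = df.dropna(subset=["flight_iata", "scheduled_departure", "delay_minutes"])

    # Keep one row per flight and scheduled departure.
    df = df.drop_duplicates(subset=["flight_iata", "scheduled_departure"])
    return df.reset_index(drop=True)


# Made-up rows showing each kind of problem: formatting, duplicate, bad date, missing code.
raw = pd.DataFrame(
    {
        "flight_iata": [" aa100", "AA100", "BA7", None],
        "scheduled_departure": [
            "2024-05-01T08:00:00+00:00",
            "2024-05-01T08:00:00+00:00",
            "not a date",
            "2024-05-01T09:00:00+00:00",
        ],
        "delay_minutes": ["12", "12", "5", "3"],
    }
)
cleaned = clean_flights(raw)
```

Of the four sample rows, only one survives: the two `AA100` rows collapse into one after normalization, and the rows with an unparseable date or a missing flight code are dropped.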

Data Pipeline

Growing datasets are better than fresh ones

Instead of fetching fresh data every time, I built a system that accumulates new flights while keeping historical ones. This gives the ML model more training data and saves 99% of API calls after the first run.
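
The accumulation idea can be sketched in a few lines of pandas. This assumes a flight is uniquely identified by its flight code and scheduled departure; the `KEY_COLUMNS` names and the sample rows are hypothetical.

```python
import pandas as pd

# Assumed unique key for a flight (hypothetical column names).
KEY_COLUMNS = ["flight_iata", "scheduled_departure"]


def accumulate(history: pd.DataFrame, batch: pd.DataFrame) -> pd.DataFrame:
    """Append only the flights in `batch` that `history` does not already hold."""
    combined = pd.concat([history, batch], ignore_index=True)
    # keep="first" preserves the stored row when a flight is fetched again.
    deduped = combined.drop_duplicates(subset=KEY_COLUMNS, keep="first")
    return deduped.reset_index(drop=True)


history = pd.DataFrame(
    {
        "flight_iata": ["AA100", "BA7"],
        "scheduled_departure": ["2024-05-01T08:00", "2024-05-01T09:00"],
        "delay_minutes": [12, 5],
    }
)
batch = pd.DataFrame(
    {
        "flight_iata": ["BA7", "DL42"],
        "scheduled_departure": ["2024-05-01T09:00", "2024-05-01T10:00"],
        "delay_minutes": [5, 0],
    }
)
updated = accumulate(history, batch)
```

In a real pipeline the history would be read from and written back to disk between runs; that part is left out here.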

Machine Learning

Not all intuitive features work

I assumed wind speed would strongly correlate with delays. The data showed almost no relationship. Lesson: test assumptions against the data before building features on hunches.
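
One quick way to run that kind of check is a Pearson correlation between the candidate feature and the target. The sketch below uses only the standard library, and the numbers are toy values for illustration, not the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only (not taken from the project).
wind_speed_kmh = [5, 10, 15, 20, 25, 30]
delay_minutes = [12, 3, 20, 4, 18, 6]

r = pearson(wind_speed_kmh, delay_minutes)  # close to zero for these toy values
```

A coefficient near zero only rules out a linear relationship, so a scatter plot is still worth a look before discarding a feature.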

Machine Learning

Feature engineering beats raw features

The model performed much better with engineered features like hour-of-day and day-of-week than with raw timestamps alone. Turning continuous time into categorical buckets helped capture temporal patterns.
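
A small sketch of this kind of time bucketing with pandas. The column name `scheduled_departure`, the `is_weekend` flag, and the one-hot encoding are assumptions for illustration; the project's 59 features are not reproduced here.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame, ts_col: str = "scheduled_departure") -> pd.DataFrame:
    """Derive categorical time buckets from a timestamp column."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["hour"] = ts.dt.hour              # 0-23
    out["day_of_week"] = ts.dt.dayofweek  # 0 = Monday ... 6 = Sunday
    out["is_weekend"] = out["day_of_week"] >= 5
    # One-hot encode so each hour and weekday becomes its own bucket.
    return pd.get_dummies(out, columns=["hour", "day_of_week"], prefix=["hour", "dow"])


# 2024-05-03 is a Friday and 2024-05-04 a Saturday.
flights = pd.DataFrame({"scheduled_departure": ["2024-05-03 18:30", "2024-05-04 06:15"]})
features = add_time_features(flights)
```

Whether to one-hot encode is a judgment call: tree-based models can also split on the integer hour directly.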

Machine Learning

Cross-validation exposes overfitting

My first model had great training performance but terrible test results. Implementing 5-fold cross-validation showed the real performance and helped me tune hyperparameters properly.
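
The gap between training error and cross-validated error can be reproduced with scikit-learn. The sketch below uses synthetic stand-in data and arbitrary hyperparameters, so the numbers it produces say nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (not the project's flights).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 10 * X[:, 0] + rng.normal(scale=5, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Training error: the model is scored on rows it has already seen.
model.fit(X, y)
train_rmse = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))

# 5-fold CV error: every row is scored by a model that never saw it.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
cv_rmse = float(-scores.mean())

print(f"train RMSE: {train_rmse:.2f}, 5-fold CV RMSE: {cv_rmse:.2f}")
```

The training RMSE comes out noticeably lower than the cross-validated one, which is the optimistic bias described above.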

Deployment

Shiny deployment has memory limits

The ShinyApps.io free tier has strict RAM limits. I had to optimize data loading and use efficient data structures. Large datasets need to be pre-processed and cached rather than loaded fresh.

APIs

API rate limits need careful handling

The Aviation Stack API has daily request limits. I built retry logic with exponential backoff and saved all responses locally to avoid hitting limits during development. Always cache external API calls.
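
A minimal sketch of retry with exponential backoff plus a local cache, using only the standard library. The function name `fetch_with_cache`, its parameters, and the one-JSON-file-per-key layout are hypothetical; `fetch` stands in for whatever function performs the actual HTTP request.

```python
import json
import time
from pathlib import Path


def fetch_with_cache(key, fetch, cache_dir, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Return the cached response for `key`, or call `fetch()` with retries."""
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        # Cache hit: no API request is made at all.
        return json.loads(cache_file.read_text())

    for attempt in range(max_attempts):
        try:
            data = fetch()
        except Exception:  # real code should catch narrower errors
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the last error
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
        else:
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(json.dumps(data))
            return data
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Adding random jitter to the delay is a common refinement that is left out here.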

Machine Learning

Weather data improves predictions

Adding Open-Meteo weather data (temperature, wind, visibility) to the model improved accuracy. Even though individual weather factors were weak predictors, the combination added valuable signal.
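
Joining flights to hourly weather can be sketched as a merge on the departure hour. The column names and values below are made up for illustration, and the hourly table is only assumed to resemble what a weather API returns.

```python
import pandas as pd

flights = pd.DataFrame(
    {
        "flight_iata": ["AA100", "BA7"],
        "scheduled_departure": pd.to_datetime(["2024-05-01 08:20", "2024-05-01 09:45"]),
    }
)
# Hourly weather rows (values are made up).
weather = pd.DataFrame(
    {
        "time": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 09:00"]),
        "temperature_c": [14.0, 15.5],
        "wind_speed_kmh": [12.0, 20.0],
    }
)

# Match each flight to the weather reading for the hour it departs in.
flights["weather_hour"] = flights["scheduled_departure"].dt.floor("60min")
merged = flights.merge(weather, left_on="weather_hour", right_on="time", how="left")
merged = merged.drop(columns=["weather_hour", "time"])
```

The left join keeps flights that have no matching weather row, so gaps show up as missing values rather than silently dropped flights.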

Let's Connect

I’m always open to discussing math, computer science, or aviation logistics. Feel free to reach out if you’d like to collaborate.

hello@sabbasov.com | linkedin.sabbasov.com