This page first explains what XGBoost is as a forecasting technology, then documents the practical parameters and logic used in this Tartu public-safety prototype.
The model is not a neutral technical object. It is built from human decisions about what to predict, which algorithm to trust, which features to include, how to categorize places, how to weight errors, and how to translate predictions into operational risk.
Choosing XGBoost is already a design choice. Someone decided that a tree-based supervised model was the right framing, instead of a simpler baseline, a spatial model, a causal model, a simulation model, or no predictive model at all.
Defining the target and features is subjective work. Decisions such as predicting incident_count, using area-hour history, encoding weather, or including nightlife, student, vulnerability, and density markers shape what the model can learn.
Tuning, validation, thresholds, smoothing, and risk-band design are also human choices. Different people could reasonably choose different horizons, error metrics, feature sets, weights, and categories, producing different outputs from the same city.
All model outcomes inherit this human design work. The forecast is an accumulation of choices, assumptions, local know-how, constraints, and omissions. It can be done in very different ways, so its outputs should be read as situated estimates, not neutral facts.
XGBoost stands for Extreme Gradient Boosting. It is a supervised machine-learning method that builds many decision trees in sequence, where each new tree tries to correct the errors made by the previous ones.
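The core loop described above can be illustrated in a few lines. This is a deliberately minimal sketch of the boosting idea, using one-split "stumps" fitted to residuals; real XGBoost adds regularized objectives, second-order gradients, and far more sophisticated tree construction. The toy hourly data is invented for illustration.

```python
# Minimal sketch of gradient boosting: each new tree (here a one-split
# "stump") is fitted to the residual errors of the ensemble so far.
import numpy as np

def fit_stump(x, residual):
    """Find the single split on x that best reduces squared error."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def boost(x, y, n_trees=50, lr=0.1):
    """Build trees in sequence; each one corrects the previous ensemble."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)   # fit to current residuals
        pred = pred + lr * stump(x)      # shrink each tree's contribution
        stumps.append(stump)
    return lambda q: base + lr * sum(s(q) for s in stumps)

# Toy hourly pattern: incidents jump in the late evening.
hours = np.arange(24, dtype=float)
counts = np.where(hours >= 20, 8.0, 2.0)
model = boost(hours, counts)
```

After 50 rounds the ensemble has learned the evening step almost exactly, which is the behavior that makes boosting attractive for threshold effects in incident data.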
For forecasting work, XGBoost is often used when the input data contains mixed feature types, nonlinear relationships, threshold effects, and interactions between variables such as time, weather, land use, and neighborhood characteristics.
In a full production version, the model target would be incident_count for one
district at one hour. Candidate features would include:

- district identity and its overall historical incident baseline
- hour of day, weekday, and calendar structure
- area-hour incident history
- weather state
- nightlife, student, vulnerability, and density markers
- local events and seasonal context
The application layer would then convert predicted incident counts into patrol-planning signals, risk bands, and operational summaries.
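A training row for such a model might look like the sketch below. The function and every feature name are hypothetical, chosen only to mirror the markers listed above; this is not an existing schema.

```python
# Illustrative shape of one area-hour training row. All names
# (nightlife_index, district_profile keys, etc.) are hypothetical.
def make_feature_row(district, hour, weekday, weather_state,
                     recent_counts, district_profile):
    window = recent_counts[-24:]
    return {
        "district": district,
        "hour": hour,                    # 0-23
        "weekday": weekday,              # 0 = Monday
        "is_weekend_night": int(weekday >= 4 and (hour >= 22 or hour <= 3)),
        "weather_state": weather_state,  # e.g. "clear", "rain", "snow"
        "rolling_mean_24h": sum(window) / max(len(window), 1),
        "nightlife_index": district_profile["nightlife"],
        "student_density": district_profile["students"],
        "vulnerability_score": district_profile["vulnerability"],
        "population_density": district_profile["density"],
    }

row = make_feature_row(
    "Kesklinn", 23, 5, "clear",
    recent_counts=[2, 3, 4] * 8,
    district_profile={"nightlife": 0.9, "students": 0.8,
                      "vulnerability": 0.4, "density": 0.7},
)
```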
The current map uses a deterministic, readable forecast engine inspired by an ML workflow. It blends historical area-hour behavior, citywide temporal pulse, weather context, and a small motion wave.
| Component | Current setting | What it does |
|---|---|---|
| Forecast horizon | 48 hours | Builds one citywide hourly outlook for the next two days. |
| Area exact history weight | 0.55 | Uses the same district + weekday + hour pattern as the main anchor. |
| Area hour fallback weight | 0.18 | Falls back to district + hour behavior across all weekdays. |
| Area day fallback weight | 0.12 | Adds district + weekday behavior independent of exact hour. |
| Area mean weight | 0.10 | Keeps each district tied to its overall historical baseline. |
| City pulse bridge weight | 0.05 | Injects citywide temporal pressure into the district baseline. |
| City pulse clamp | 0.70 to 1.34 | Prevents citywide timing effects from becoming unrealistically large. |
| Weather factors | clear 1.04, rain 0.92, snow 0.82, storm 0.88, extremes 0.90 | Modulates hourly incident expectations by weather state. |
| Weather outlook logic | Mostly clear, with rain blocks lasting 3 to 6 hours | Keeps the short forecast visually stable and realistic. |
| Motion wave | 1 ± 0.05 sinusoidal variation | Adds slight hour-to-hour movement so adjacent hours are not identical. |
| Series smoothing | 0.22 previous, 0.56 current, 0.22 next | Smooths the hourly forecast to avoid harsh spikes between neighboring hours. |
| Risk score formula | 0.68 incident intensity + 0.32 per-capita pressure | Converts forecast counts into a 1 to 10 operational risk score. |
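The table above can be condensed into a short sketch. The weights, clamp bounds, weather factors, smoothing kernel, and risk formula are taken directly from the table; the history lookups are toy stand-ins, and the exact form of the city-pulse bridge (here multiplied against the area mean) is an assumption.

```python
# Compact sketch of the deterministic blending logic from the table.
import math

WEIGHTS = {"exact": 0.55, "hour": 0.18, "day": 0.12, "mean": 0.10, "pulse": 0.05}
WEATHER_FACTOR = {"clear": 1.04, "rain": 0.92, "snow": 0.82,
                  "storm": 0.88, "extreme": 0.90}

def forecast_hour(h, stats, city_pulse, weather):
    """Blend area history with citywide pulse, then apply weather and motion wave."""
    pulse = min(max(city_pulse, 0.70), 1.34)          # city pulse clamp
    base = (WEIGHTS["exact"] * stats["exact"]          # district+weekday+hour anchor
            + WEIGHTS["hour"] * stats["hour"]          # district+hour fallback
            + WEIGHTS["day"] * stats["day"]            # district+weekday fallback
            + WEIGHTS["mean"] * stats["mean"]          # district baseline
            + WEIGHTS["pulse"] * stats["mean"] * pulse)  # pulse bridge (assumed form)
    wave = 1 + 0.05 * math.sin(2 * math.pi * h / 24)   # slight hour-to-hour motion
    return base * WEATHER_FACTOR[weather] * wave

def smooth(series):
    """0.22 / 0.56 / 0.22 neighbor smoothing; edge hours are held as-is."""
    out = list(series)
    for i in range(1, len(series) - 1):
        out[i] = 0.22 * series[i - 1] + 0.56 * series[i] + 0.22 * series[i + 1]
    return out

def risk_score(count, max_count, per_capita, max_per_capita):
    """0.68 incident intensity + 0.32 per-capita pressure, scaled to 1-10."""
    intensity = count / max_count if max_count else 0.0
    pressure = per_capita / max_per_capita if max_per_capita else 0.0
    return round(1 + 9 * (0.68 * intensity + 0.32 * pressure), 1)
```

Note that the five blend weights sum to 1.00, as does the smoothing kernel, so neither step inflates or deflates the overall incident level.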
A stronger future version would go beyond the current district-level prototype and evolve into a richer operational forecasting service. The main upgrade path is to add more informative features, more granular spatial modeling, live API integrations, and a scalable product layer around the map.
The next model should ingest many more predictive features. Examples include incident category detail, such as whether the event was traffic-related, disturbance-related, or another specific public-order type. That would let the model distinguish between very different demand patterns instead of learning only one generic incident-count signal.
Instead of forecasting only by suburb, the city could be divided into a much finer GIS grid, for example a 100 x 100 meter lattice. That would produce a more realistic map of hot corridors, nightlife clusters, transport routes, and recurring micro-locations.
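Assigning events to such a lattice is straightforward. Below is a minimal sketch using standard meters-per-degree approximations; the grid origin near central Tartu is an illustrative assumption, not a chosen reference point.

```python
# Sketch: map a coordinate onto a 100 x 100 m grid. The origin point
# (near central Tartu) is an assumed anchor for illustration.
import math

ORIGIN_LAT, ORIGIN_LON = 58.3780, 26.7290
CELL_M = 100.0

def grid_cell(lat, lon):
    """Return (row, col) of the 100 m cell containing the point."""
    m_per_deg_lat = 111_320.0                                   # ~meters per degree latitude
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(ORIGIN_LAT))  # shrinks with latitude
    row = int((lat - ORIGIN_LAT) * m_per_deg_lat // CELL_M)
    col = int((lon - ORIGIN_LON) * m_per_deg_lon // CELL_M)
    return row, col
```

Aggregating historical incidents by `grid_cell` output instead of by suburb would yield the finer hot-corridor map described above, at the cost of sparser per-cell history.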
The application could communicate with weather services through APIs and use live weather feeds when computing each new hourly forecast. In a fuller public-sector ecosystem, the system could also ingest near-real-time incident counts from aggregated public APIs and refresh each area-hour prediction automatically.
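A refresh step might look like the sketch below. The endpoint URL and response shape are assumptions for illustration (no specific weather provider is implied); the mapping function shows how a feed's free-text condition could be folded into the discrete weather states the forecast already uses.

```python
# Sketch of a live-weather refresh step. The URL and payload shape are
# hypothetical; only the state mapping is exercised here.
import json
import urllib.request

def fetch_weather(url):
    """Fetch a JSON weather payload (network call, illustrative only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def map_state(condition):
    """Fold a provider's free-text condition into the model's weather states."""
    c = condition.lower()
    if "storm" in c or "thunder" in c:
        return "storm"
    if "snow" in c or "sleet" in c:
        return "snow"
    if "rain" in c or "drizzle" in c:
        return "rain"
    return "clear"
```

Each hourly recompute would call `fetch_weather`, map the condition, and look up the corresponding weather factor before blending, replacing the current synthetic weather outlook.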
The map application itself already points in the right direction: an interactive Tartu city map that shows the next 48 hours of estimated public-order load by area, based on historical patterns, weather, weekday structure, and events. The stronger version would keep the heatmap, the hourly slider, the numerical incident forecast, and the risk-score overlay, but drive them from a continuously updated production model.
In that setup, the user could inspect each area hour by hour, view risk scores directly on the map, and read clearer risk interpretations below the legend or in supporting panels.
The same logic could be scaled from one city to a county-level or state-level service. Tartu would act as the pilot project, and the technical stack could then be extended gradually to new territories without changing the basic forecasting concept.
One possible product concept is CityWatch.ai: a licensed forecasting and map plugin that a local government could embed into its own city website or run on a dedicated public-safety page.
In that scenario, the municipality would pay a service license fee, for example EUR 1000 per month, while also supplying the aggregated incident feed through cooperation with local police. Because the service would work with area-level aggregated counts rather than direct personal identifiers, the data model could remain relatively lightweight while still being operationally useful.
This would turn the prototype from a one-off demonstration into a reusable forecasting service with a clearer ownership model, maintenance path, and deployment story.