Design Rationale

The Communications Control Software with Wi-Fi/5G/LTE is a complex system composed of physical input, data processing, and network/physical output components. In this design, we analyze the architecture based on the functions triggered when a user makes a communication request: 1) processing the communication request after the user submits it, 2) distributing channels, 3) maintaining continuous service during calls, including automatic switching and fault recovery, and 4) recording detailed communication data for monitoring and auditing purposes. In addition, the architecture must effectively support both automatic and manual operation modes. The automatic mode focuses on low-latency connection establishment, dynamic routing, and automatic fault switching (FR-1, FR-3, FR-8; NFR-2, NFR-4, NFR-5), while the manual mode allows the Dispatch to override and make decisions under abnormal or unexpected conditions (FR-2, FR-4, FR-5, FR-7).

These functions map onto the five modules above and onto combinations of the functional and nonfunctional requirements.

Module 1 (Radio Firmware) contains the lowest-level Push-to-Talk function (press -> connect immediately -> talk -> transmit).
Module 2 (Android/iOS App) provides the mobile version, allowing users to join radio channels and communicate through Wi-Fi, 5G, or LTE.
Module 3 (Tower Base Stations) handles signal forwarding and management. It receives voice data from the Radio or App and, according to the Controller’s settings, sends the signal to the correct channel or group while monitoring signal quality and interference.
Module 4 (Communications Controller) determines which communication channel each user is assigned to and selects the most stable connection. When an abnormal condition occurs, it automatically switches to a backup route to prevent disconnection.
Module 5 (Dispatch) acts as the command center, where operators can view all communication statuses, user states, and channel loads, and can intervene manually when necessary (for example, to force channel switching, end calls, or broadcast messages).

Because this is a cross-platform, distributed emergency communication system, the architecture must ensure real-time responsiveness and low latency while maintaining high availability and fault tolerance across multiple network environments. Therefore, in our design approach, we prioritize an architectural style that balances responsiveness and stability rather than one that focuses solely on extensibility.

Decisions

Architecture Style

[Module 1] Radio Firmware: Event-driven and monolithic style

This module uses an event-driven, monolithic style because it operates at the lowest level, where timing and response speed are critical. The firmware does not need to communicate with multiple services, but it must respond immediately to button inputs and emergency events. An event-driven approach handles input events quickly, while a monolithic structure keeps all logic within a single program, which avoids delays from external calls and fits within the limited hardware resources of the radio.
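As a rough illustration of the event-driven idea, the Push-to-Talk flow (press -> connect -> talk -> release) can be sketched as a small state machine driven by a transition table. The state names, event names, and the `handle_event` and `run` helpers are assumptions made for this sketch, not part of the firmware design; real firmware would more likely be written in C, and Python is used here only for readability.

```python
from enum import Enum, auto


class PttState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    TALKING = auto()


class PttEvent(Enum):
    BUTTON_PRESSED = auto()
    CHANNEL_GRANTED = auto()
    BUTTON_RELEASED = auto()


# Transition table: (current state, event) -> next state.
# Hypothetical states and events, chosen for illustration only.
TRANSITIONS = {
    (PttState.IDLE, PttEvent.BUTTON_PRESSED): PttState.CONNECTING,
    (PttState.CONNECTING, PttEvent.CHANNEL_GRANTED): PttState.TALKING,
    (PttState.CONNECTING, PttEvent.BUTTON_RELEASED): PttState.IDLE,
    (PttState.TALKING, PttEvent.BUTTON_RELEASED): PttState.IDLE,
}


def handle_event(state, event):
    """Return the next state; events that do not apply are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=PttState.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = handle_event(state, event)
    return state
```

Keeping every transition in one table within a single program reflects the monolithic choice: handling a button press is a single lookup with no calls to external services.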

[Module 2] Android/iOS App: Layered and Event-driven

In this module, a layered and event-driven architecture is used because a mobile app typically includes three layers: presentation (buttons, channels, notifications), logic (PTT state, user authorization), and network communication (Wi-Fi/5G/LTE). The layered architecture clearly separates these responsibilities, making the system easier to maintain and expand in the future. At the same time, the event-driven approach allows the system to respond instantly to user actions (such as pressing the PTT button) and system events (such as network switching), ensuring low latency and high responsiveness. By combining both architectures, each layer can focus on its own event-handling tasks, giving the system strong maintainability, flexibility, and responsiveness.

[Module 3] Tower Base Stations: Layered and Distributed

In this module, a layered and distributed architecture is used because the tower works as a relay that passes signals between multiple channels and devices while connecting with the Controller, Radio, and App modules. The layered design separates the radio layer, communication layer, and monitoring layer to avoid interference and make management easier. The distributed design lets each tower run on its own, so even if one node fails, the others can keep working and prevent communication from being interrupted.

[Module 4] Communications Controller: Layered and Event-driven

In this module, a layered and event-driven architecture is used because the Controller acts as the brain of the entire system. It manages channel assignment, load balancing, and fault switching, while continuously monitoring events such as emergency calls and system status changes for real-time response. The layered design separates the message handling layer, decision layer, and data access layer to improve maintainability. The event-driven design allows the Controller to trigger actions based on status changes, such as switching channels or sending notifications to users.
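One way to picture how the Controller triggers actions on status changes is a small publish/subscribe dispatcher. This is a minimal sketch only: the `EventBus` class, the `channel_degraded` event name, and the two handlers are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe dispatcher (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to be called when event_type is published."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Call every handler registered for event_type and collect the results."""
        return [handler(payload) for handler in self._handlers[event_type]]


# Hypothetical decision-layer actions triggered by a status change.
def switch_channel(payload):
    return f"switch {payload['group']} to backup channel"


def notify_users(payload):
    return f"notify {payload['group']} of channel change"


bus = EventBus()
bus.subscribe("channel_degraded", switch_channel)
bus.subscribe("channel_degraded", notify_users)
```

In this sketch the message handling layer would call `publish`, while the decision layer owns the handlers, so the two layers stay separate.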

[Module 5] Dispatch: Layered, Event-driven, and Fault-tolerant

In this module, a layered, event-driven, and fault-tolerant architecture is used because the Dispatch module serves as the system’s command and monitoring center. It must display all users, channels, and system statuses in real time while handling multiple types of events, such as alerts from the Controller, emergency calls from users, and operator commands. The layered design separates the interface layer (UI), control layer, and data layer to improve maintainability. The event-driven design enables quick responses to changing events, such as tower failures or emergency alerts. The fault-tolerant mechanism ensures that even if the Controller or some towers fail, the Dispatch can manually or automatically connect to backup systems and continue monitoring operations.

Justification / Rationale

D-1. Uses a trunked control channel that assigns voice paths as needed, which raises channel efficiency and supports more simultaneous users than fixed channels; the shared pool lets talk groups obtain a path quickly during busy periods, reducing wait time when many users press push to talk.
D-2. Integrates LTE or 5G and Wi-Fi through gateways so coverage extends beyond the radio footprint and broadband data becomes available; the gateways bridge talk groups so users on phones and tablets can hear and speak with radio users indoors, in large buildings, and at the edge of RF coverage.
D-3. Provides an emergency priority override that guarantees real-time communication when timing is critical; a one-touch emergency elevates the call, preempts lower-priority traffic if needed, and alerts dispatch with clear indicators and recorded events.
D-4. Uses a Central Communications Controller to make management easier by coordinating legacy RF and modern IP communications; one place handles group setup, channel assignment, routing, health status, and logging, which keeps configuration consistent and simplifies operations.
D-5. Keeps human control during critical incidents with a command interface that automates routine actions yet allows manual decisions when conditions are unpredictable; operators can pause automation, reassign channels, and issue systemwide instructions as the situation evolves.
D-6. Extends Push-to-Talk to smartphones and tablets for non-radio users, which improves interoperability and accessibility; agencies can add temporary users or partner organizations without issuing new radios, while still using familiar talk-group workflows.
D-7. Maintains service when parts fail or signals are disrupted by providing dual paths across RF and IP with automatic switchover; if a tower, link, or controller instance is lost, traffic continues on the remaining path and users can keep talking.
D-8. Enables post-incident analysis and supports documentation standards by keeping detailed logs for at least one year; records include calls, alerts, group changes, and faults with timestamps so investigators can reconstruct events and teams can improve procedures.
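
To make D-1 and D-3 more concrete, the following sketch shows a shared channel pool that grants voice paths on demand and lets an emergency call preempt a lower-priority call when the pool is full. The `ChannelPool` class, the two priority levels, and the rule for choosing which call to preempt are assumptions for illustration; the design does not specify them.

```python
from dataclasses import dataclass, field

# Assumed convention for this sketch: lower number = higher priority.
EMERGENCY = 0
ROUTINE = 1


@dataclass
class ChannelPool:
    size: int
    active: dict = field(default_factory=dict)  # channel -> (group, priority)

    def request(self, group, priority=ROUTINE):
        """Grant a channel to a talk group, or return None if none is available."""
        for channel in range(self.size):
            if channel not in self.active:
                self.active[channel] = (group, priority)
                return channel
        if priority == EMERGENCY:
            # Pool is full: preempt the lowest-priority call in progress,
            # but never another emergency call.
            victim = max(self.active, key=lambda ch: self.active[ch][1])
            if self.active[victim][1] > EMERGENCY:
                self.active[victim] = (group, priority)
                return victim
        return None

    def release(self, channel):
        """Return a channel to the shared pool."""
        self.active.pop(channel, None)
```

A routine request on a full pool returns None, which a real controller would likely queue rather than drop; an emergency request on the same pool takes over a routine call's channel.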

Assumptions and constraints

Licensed RF spectrum is available, with coverage from the planned sites.
Public cellular networks may be congested during incidents, so RF remains the primary voice path.
Dispatch has trained operators on duty and a tested playbook.
Mobile devices are managed with MDM and authenticated with SSO.
Sites have backup power for at least 24 hours and periodic fuel resupply.
Privacy and retention rules allow one year of operational logging.

Operational scenarios

Routine operations: Most traffic stays on trunked RF. Phones and tablets join talk groups as needed, and the controller keeps groups, routing, and logs consistent.
Large building fire: Indoors, some radios fall back to Wi-Fi or LTE through the gateways, while command and attack groups remain on RF outside. Emergency overrides preempt noncritical calls and alert dispatch.
Rural search and rescue: Coverage gaps are bridged by LTE or deployable Wi-Fi. The controller keeps the team on one talk group, and logs capture movements, alerts, and failures for later review.

Failure modes and mitigations

Control channel lost: Radios switch to the secondary control channel, and the controller fails over to the standby instance.
Site power or backhaul failure: Traffic reroutes to surviving sites or to IP paths; hold-down timers prevent flapping until stability returns.
Carrier outage or heavy congestion: Phones fall back to RF where possible or to managed Wi-Fi; critical groups are pinned to RF.
Gateway fault: Health checks remove the faulty gateway from service, and alarms notify dispatch.
Authentication outage: Radios continue on cached credentials; phone access is restricted to pre-enrolled devices until the IdP recovers.
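
The hold-down timers mentioned for site failures could work roughly as follows: a recovered site is treated as usable again only after it has stayed healthy for a full hold-down period, so a site that flaps between up and down does not attract traffic. The `HoldDown` class and its parameters are hypothetical, and times are passed in explicitly so the sketch stays deterministic.

```python
class HoldDown:
    """Report a site as usable only after it has stayed healthy for hold_s seconds."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.healthy_since = None  # time of the first healthy check in the current run

    def update(self, healthy, now):
        """Record one health check at time `now` (seconds) and return usability."""
        if not healthy:
            # Any failed check restarts the hold-down period.
            self.healthy_since = None
            return False
        if self.healthy_since is None:
            self.healthy_since = now
        return now - self.healthy_since >= self.hold_s
```

With a 30-second hold-down, for example, a site that reports healthy at t=0 and t=20 but fails at t=25 must then stay healthy for a further 30 seconds before traffic returns to it.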

Network selection policy

Prefer RF for push-to-talk when signal margins are good, since RF provides the lowest delay. If RF quality drops below agreed thresholds or the control channel is saturated, move that user to LTE, 5G, or Wi-Fi through the gateways. When RF quality recovers for several checks in a row, return the user to RF. The controller applies the same policy for group calls so a talk group remains coherent.
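
The policy above is a form of hysteresis and can be sketched as a per-user selector. The threshold value, the number of consecutive good checks, the normalized quality score, and the `PathSelector` class are all assumptions for illustration; the design only refers to "agreed thresholds" and "several checks in a row".

```python
from dataclasses import dataclass

# Hypothetical values; the design does not fix these numbers.
RF_MIN_QUALITY = 0.6   # normalized RF quality score, 0.0 to 1.0
RECOVERY_CHECKS = 3    # consecutive good checks required before returning to RF


@dataclass
class PathSelector:
    path: str = "RF"
    good_streak: int = 0

    def check(self, rf_quality, control_saturated=False):
        """Apply one periodic quality check and return the selected path."""
        rf_ok = rf_quality >= RF_MIN_QUALITY and not control_saturated
        if self.path == "RF":
            if not rf_ok:
                self.path = "IP"  # LTE, 5G, or Wi-Fi through a gateway
                self.good_streak = 0
        else:
            # Count consecutive good checks; any bad check resets the streak.
            self.good_streak = self.good_streak + 1 if rf_ok else 0
            if self.good_streak >= RECOVERY_CHECKS:
                self.path = "RF"
                self.good_streak = 0
        return self.path
```

Leaving RF takes one bad check while returning takes several good ones, which is what keeps a user at the edge of coverage from bouncing between paths.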

Targets and verification

Call setup time: At least 95 percent of calls begin within 350 ms.
Mouth-to-ear delay: At least 95 percent of radio-to-radio talk bursts deliver audio in 250 ms or less; radio-to-phone in 350 ms or less.
Availability: Monthly service availability is at least 99.95 percent, with no single maintenance window longer than 15 minutes.
Emergency handling: At least 95 percent of emergency calls preempt within 1 second and alert dispatch immediately.
Logging: Retain at least 365 days; at least 95 percent of queries over the last 24 hours return in 2 seconds or less; zero data-loss objective.
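
A possible way to verify the percentile-style targets above is to check measured samples against each limit, and to convert the availability target into a downtime budget. The function names and the sample data used with them are hypothetical; only the limits (350 ms, 95 percent, 99.95 percent) come from the targets.

```python
def fraction_within(samples_ms, limit_ms):
    """Fraction of samples at or below the limit."""
    return sum(1 for s in samples_ms if s <= limit_ms) / len(samples_ms)


def meets_target(samples_ms, limit_ms, required=0.95):
    """True if at least `required` of the samples are within the limit."""
    return fraction_within(samples_ms, limit_ms) >= required


def downtime_budget_minutes(availability, days=30):
    """Allowed downtime over the period, in minutes."""
    return (1 - availability) * days * 24 * 60
```

For a 30-day month, 99.95 percent availability allows about 21.6 minutes of total downtime, so a single 15-minute maintenance window would consume most of that month's budget.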

Security and privacy

Use role-based access for dispatch and admin functions, enforce MFA for administrators and supervisors, secure gateways and apps with mutual TLS and certificate pinning, encrypt logs at rest, and restrict access to audit data based on duty role.

Glossary

Trunking: A shared pool of channels assigned on demand by a controller.
Push-to-Talk: Half-duplex voice where users press a button to transmit.
Talk group: A logical group of users who hear each other’s calls.
Emergency override: A privileged call that preempts other traffic.
Gateway: A device or service that bridges RF with LTE, 5G, or Wi-Fi.

Potential future work

Indoor location for mayday events, deployable cells for incident scenes, over-the-air configuration for radios and apps, and automated post-incident reports that summarize calls, alerts, and failures.

Design rationale summary

This design keeps real-time voice reliable by combining a trunked RF layer that assigns channels on demand with a Central Communications Controller that coordinates talk groups, routing, status, and logging, while the dispatch interface automates routine actions yet permits manual decisions when conditions change quickly. Trunking raises spectral efficiency and limits wait time during busy periods, so push to talk remains low latency in normal and degraded conditions. Emergency priority and preemption guarantee that urgent calls obtain a path immediately, and the console surfaces clear alarms while recording events for later review. Configuration and policy remain consistent because the controller is the single point for group setup, channel assignment, and telemetry, which simplifies operations without removing human oversight.

Coverage and accessibility expand because LTE, 5G, and Wi-Fi are bridged through gateways that carry talk-group audio to smartphones and tablets, which keeps procedures unchanged for radio users while allowing partner agencies to participate without new radios. Resilience comes from dual paths across RF and IP, together with a network selection policy that moves users to IP when RF quality falls below the agreed thresholds or the control channel saturates and returns them to RF after stability is confirmed, while health checks and hold-down timers prevent oscillation during partial failures. Uniform policies for admission, priority, and path choice apply across the system, and comprehensive logs of calls, alerts, configuration changes, and faults are retained for at least a year to enable investigations and compliance. Security controls, including role-based access, multi-factor authentication for administrators, mutual TLS for gateways and mobile apps, and encryption of audit data, protect the system without adding friction for field users, which makes the combined architecture dependable for daily work and for large-scale incidents.
