Here you will find all tactics for the quality characteristic Reliability.
Fault Tolerance
Error Handling
Detect, log, and handle errors through structured mechanisms
Availability
Deploy multiple instances of critical components or systems
Fault Tolerance
Use exceptions for signaling and handling error states
Maturity
Also supports: Availability
Process steps and requirements systematically
Availability
Also supports: Recoverability, Analyzability
Provide detailed instructions for processing tasks and incidents
Fault Tolerance, Recoverability
Record errors with structured formats, severity levels, and stack traces
Availability
Also supports: Analyzability
Collect metrics, configure alerts, and detect anomalies systematically
Availability
Also supports: Capacity, Operability
Track CPU, memory, disk, and network metrics for capacity decisions
Maturity
Also supports: Reusability, Replaceability
Use proven and mature technologies
Fault Tolerance, Recoverability
Also supports: Availability
Remain operational under adverse conditions or faults
Recoverability
Restore operations after disasters or major disruptions
Recoverability
Ensure employees are available to respond quickly to incidents
Availability
Also supports: Capacity
Distribute workload across multiple resources
Fault Tolerance, Recoverability
Introduce disruptions intentionally to test system resilience
Availability
Also supports: Time-behaviour, Capacity
Adjust resources automatically based on current load
Availability
Also supports: Time-behaviour, Capacity
Forecast and plan required resources based on growth predictions
Availability
Also supports: Time-behaviour, Capacity
Define expectations for software availability and performance
Availability, Maturity
Define measurable goals for system reliability and performance
Availability
Also supports: Time-behaviour
Track key metrics of software reliability and performance
Availability, Fault Tolerance
Create and synchronize copies of data across multiple systems
Availability, Fault Tolerance
Detect component failures and redirect operations to standby replacements
Availability
Also supports: Analyzability
Enable a system to monitor its own state and detect issues
Fault Tolerance, Availability
Develop mechanisms to isolate faulty components
Fault Tolerance
Also supports: Analyzability, Adaptability
Ensure consistency between development, test, and production environments
Availability
Inspect and maintain production infrastructure through scheduled checks
Availability
Also supports: Testability
Apply engineering principles for stable system operations
Availability, Fault Tolerance
Operate with reduced functionality during failures or overload
Fault Tolerance, Availability
Protect distributed systems from error cascades and overload
Fault Tolerance, Availability
Also supports: Capacity, Modularity
Divide a system into isolated areas to limit fault propagation
Fault Tolerance
Also supports: Analyzability
Enable components to check their own state and functionality
Availability
Send requests actively to a component to check its availability
Availability
Transmit a component's heartbeat regularly to a monitoring instance
Fault Tolerance
Also supports: Integrity
Group multiple operations into an atomic, consistent unit
Fault Tolerance
Retry failed operations to handle transient errors
Availability, Fault Tolerance
Detect and handle system errors or failures via a watchdog component
Fault Tolerance, Maturity
Perform basic tests to verify core functionality of a system
Availability
Forward requests continuously despite failures or errors
Recoverability
Also supports: Analyzability, Integrity
Add timestamps to data or events for temporal tracking
Availability
Also supports: Analyzability
Track service health, uptime, and component availability continuously
Availability, Fault Tolerance
Group servers into a cluster with shared storage and automatic primary takeover
Availability
Also supports: Capacity
Store data on multiple media or systems
Recoverability, Availability
Revert changes and return to a previous stable state
Recoverability, Fault Tolerance
Operate two parallel production environments to minimize downtime
Recoverability, Fault Tolerance
Also supports: Modifiability, Adaptability
Activate and deactivate features for flexible rollouts
Fault Tolerance
Also supports: Modifiability
Update servers or instances stepwise without full downtime
Availability
Also supports: Fault Tolerance, Maturity
Deploy new features in hidden form to a controlled subset of users
Availability, Fault Tolerance
Introduce changes gradually for a limited user group
Fault Tolerance
Also supports: Correctness
Employ data structures that remain operational despite errors
Fault Tolerance
Also supports: Modularity
Limit the impact of faults to a small part of the system
Fault Tolerance
Also supports: Maturity
Use error-correcting codes to detect and correct errors in data
Fault Tolerance
Also supports: Analyzability
Prioritize errors by impact, track resolution, and communicate outcomes
Fault Tolerance
Also supports: Integrity
Calculate checksums to detect data errors or changes
Fault Tolerance, Maturity
Use multiple different checksum algorithms
Maturity, Availability
Verify data integrity regularly during storage or transmission
Maturity, Fault Tolerance
Check inputs, data, or states for validity to detect errors early
Recoverability
Regularly back up the system state
Fault Tolerance
Also supports: Integrity
Verify integrity of system components, configurations, and data continuously
Maturity
Also supports: Time-behaviour, Capacity
Evaluate system performance and stability under high load
Recoverability
Back up data and system states regularly
Availability
Also supports: Modifiability
Handle disruptions and failures through a structured process
Fault Tolerance
Also supports: Analyzability
Systematically analyze the causes of failures
Availability
Design architectures for maximum availability and fault tolerance
Fault Tolerance
Also supports: Analyzability
Analyze recorded log data for patterns, trends, and recurring anomalies
Maturity
Also supports: Integrity, Correctness
Ensure data accuracy, consistency, and reliability
Availability
Also supports: Integrity, Fault Tolerance
Prevent reliability incidents caused by security vulnerabilities
Maturity
Also supports: Integrity, Modifiability
Apply software patches, security fixes, and version updates on a defined schedule
Maturity
Also supports: Modifiability, Integrity
Automate software integration, testing, and deployment continuously
Fault Tolerance
Also supports: Availability
Replace infrastructure components with new versions instead of modifying them
Maturity
Also supports: Testability
Verify functionality automatically at various test levels
Maturity, Availability
Quantify acceptable unreliability to balance feature velocity and reliability
Maturity
Also supports: Analyzability
Learn from incidents systematically, focusing on systemic improvements
Availability, Maturity
Expose standardized health check APIs for load balancers and orchestrators
Fault Tolerance, Availability
Define and enforce timeouts on all external calls to prevent indefinite blocking
Availability, Fault Tolerance
Control incoming request rates to prevent system overload during traffic spikes
Availability, Fault Tolerance
Drop low-priority requests under overload to preserve critical capacity
Fault Tolerance, Recoverability
Design operations to be safely retryable without unintended side effects
Fault Tolerance, Recoverability
Manage distributed transactions as sequences of local transactions with compensating actions
Fault Tolerance, Recoverability
Route failed messages to a dedicated queue for later reprocessing
Recoverability, Fault Tolerance
Record changes in a durable append-only log before applying them
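
The last tactic, recording changes in a durable append-only log before applying them, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of the catalog: the class `WalStore`, its `put` method, and the one-JSON-record-per-line log format are all hypothetical. The sketch appends each change to the log and forces it to disk before touching the in-memory state, then rebuilds that state on startup by replaying the log.

```python
import json
import os
import tempfile


class WalStore:
    """Hypothetical key-value store that logs every change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Binary append mode: each record is one JSON object followed by "\n".
        self._log = open(log_path, "ab")

    def _replay(self):
        """Rebuild in-memory state from the log, dropping a torn final record."""
        if not os.path.exists(self.log_path):
            return
        valid_bytes = 0
        with open(self.log_path, "rb") as log:
            for raw in log:
                if not raw.endswith(b"\n"):
                    break  # incomplete record, e.g. a crash in the middle of a write
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # unreadable record; stop replaying here
                self.data[record["key"]] = record["value"]
                valid_bytes += len(raw)
        if valid_bytes < os.path.getsize(self.log_path):
            # Cut off the damaged tail so later appends start on a clean line.
            with open(self.log_path, "r+b") as log:
                log.truncate(valid_bytes)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        self._log.write(record)
        self._log.flush()
        os.fsync(self._log.fileno())  # make the record durable first
        self.data[key] = value        # only then apply the change

    def close(self):
        self._log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.wal")
        store = WalStore(path)
        store.put("a", 1)
        store.put("b", 2)
        store.put("a", 3)
        store.close()

        # A fresh instance recovers the state purely from the log.
        recovered = WalStore(path)
        print(recovered.data)
        recovered.close()
```

Calling `os.fsync` on every write favors durability over throughput. A real system would more likely batch records, add checksums per record, and compact the log periodically, all of which this sketch leaves out.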