Server Monitors
What are Server Monitors?
Server monitors track the health and performance of your servers by collecting real-time system metrics including CPU usage, memory consumption, disk space, network activity, and system load. This enables you to proactively identify performance issues, capacity problems, and potential failures before they impact your services.
Why Server Monitoring is Essential
Performance Optimization
- Resource utilization: Track CPU, RAM, and disk usage patterns
- Bottleneck identification: Pinpoint performance constraints
- Capacity planning: Plan upgrades based on usage trends
- Cost optimization: Right-size your infrastructure
Proactive Problem Detection
- Early warnings: Detect issues before they cause outages
- Threshold alerts: Get notified when resources exceed safe limits
- Trend analysis: Identify gradual performance degradation
- Preventive maintenance: Schedule maintenance based on data
Troubleshooting and Diagnostics
- Historical data: Analyze past performance during incidents
- Correlation analysis: Connect performance issues to system events
- Root cause analysis: Identify the source of performance problems
- Impact assessment: Understand how issues affect overall system health
How to Access Server Monitors
Access server monitoring through:
- Main dashboard → Server Monitors section
- Sidebar navigation → Server Monitors
- Direct URL: /server-monitors
Monitored System Metrics
CPU Metrics
- CPU Usage Percentage: Overall processor utilization
- CPU Load Average: 1-, 5-, and 15-minute load averages
- CPU Cores: Number of processor cores available
- CPU Model: Processor type and specifications
- CPU Frequency: Current processor frequency
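
As an illustration of where the load figures come from on Linux, the sketch below parses a line in /proc/loadavg format and normalizes the 1-minute load by core count. The helper names parse_loadavg and load_per_core are hypothetical and are not taken from the provided agent script, which may collect these values differently.

```shell
# Sketch only: helper names are hypothetical, not part of the provided agent.

# Print the 1-, 5-, and 15-minute load averages from a line in
# /proc/loadavg format (the first three whitespace-separated fields).
parse_loadavg() {
    awk '{ printf "load1=%s load5=%s load15=%s\n", $1, $2, $3 }'
}

# Divide a load average by the core count so hosts of different sizes
# can be compared on the same scale.
load_per_core() {
    # $1 = load average, $2 = number of cores
    awk -v avg="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", avg / cores }'
}

# Sample input; on a Linux host you could run: parse_loadavg < /proc/loadavg
echo "0.52 0.48 0.45 1/389 12345" | parse_loadavg
load_per_core 0.52 4
```

A per-core value that stays well above 1.00 roughly indicates more demand than the cores can serve at once, which is why load is usually read together with the core count.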
Memory Metrics
- RAM Usage Percentage: Memory utilization percentage
- RAM Used: Amount of memory currently in use (MB/GB)
- RAM Total: Total installed system memory
- Memory Trends: Historical memory usage patterns
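
One way a usage percentage like this can be derived on Linux is from the MemTotal and MemAvailable fields of /proc/meminfo. The sketch below assumes a kernel that reports MemAvailable; mem_usage_percent is a hypothetical helper, and the provided agent may use a different formula.

```shell
# Sketch only: mem_usage_percent is a hypothetical helper.
# Reads /proc/meminfo-style text on stdin and prints used memory as a
# percentage, counting everything that is not MemAvailable as used.
mem_usage_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
        }
    '
}

# Sample values in kB; on a Linux host: mem_usage_percent < /proc/meminfo
mem_usage_percent <<'EOF'
MemTotal:       8000000 kB
MemFree:         500000 kB
MemAvailable:   2000000 kB
EOF
```

With the sample values this prints 75.0. Using MemAvailable rather than MemFree avoids counting reclaimable cache as used memory.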
Storage Metrics
- Disk Usage Percentage: Disk space utilization
- Disk Used: Amount of disk space consumed (MB/GB)
- Disk Total: Total disk capacity
- Disk Trends: Storage consumption over time
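
A usage percentage for a single mount point can be read from the Capacity column of `df -P`, as in the sketch below. disk_usage_percent is a hypothetical helper, and the sketch assumes mount points without spaces in their names.

```shell
# Sketch only: disk_usage_percent is a hypothetical helper.
# Extracts the Capacity column for one mount point from `df -P` output
# read on stdin and prints it without the percent sign.
disk_usage_percent() {
    # $1 = mount point, e.g. /
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Sample `df -P` output; on a real host: df -P | disk_usage_percent /
disk_usage_percent / <<'EOF'
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 30864552   8174764      80% /
tmpfs              4015932        0   4015932       0% /dev/shm
EOF
```

The `-P` flag requests the POSIX output format, which keeps each filesystem on one line and makes the columns safe to parse.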
Network Metrics
- Network Download: Current inbound data rate
- Network Upload: Current outbound data rate
- Total Download: Cumulative data downloaded
- Total Upload: Cumulative data uploaded
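
The current rates and the cumulative totals are related: a rate can be estimated from two readings of a cumulative counter taken a known interval apart. The sketch below shows that arithmetic using /proc/net/dev-style input. The helpers rx_bytes and net_rate are hypothetical, and the interface name eth0 is only an example.

```shell
# Sketch only: rx_bytes and net_rate are hypothetical helpers.

# Read the cumulative received-bytes counter for one interface from
# /proc/net/dev-style text on stdin.
rx_bytes() {
    # $1 = interface name, e.g. eth0
    awk -v ifc="$1" '{
        sub(/^ +/, "")
        split($0, p, /[: ]+/)
        if (p[1] == ifc) print p[2]
    }'
}

# Average rate in bytes per second between two counter readings.
net_rate() {
    # $1 = earlier counter, $2 = later counter, $3 = seconds between them
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.0f\n", (b - a) / s }'
}

# Two sample readings taken 60 seconds apart.
net_rate 1000000 1300000 60
```

With the sample readings this prints 5000 (bytes per second). A counter that resets, for example after a reboot, would yield a negative difference, so a real collector would need to handle that case.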
System Information
- Operating System: OS name and version
- Kernel Information: Kernel name, version, and release
- CPU Architecture: System architecture (x86_64, ARM, etc.)
- System Uptime: How long the system has been running
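
Most of this information is available from standard utilities. The sketch below prints the kernel name, release, and architecture with uname and formats an uptime given in seconds. format_uptime is a hypothetical helper, and the sample value stands in for the first field of /proc/uptime on Linux.

```shell
# Sketch only: format_uptime is a hypothetical helper.
# Converts a number of seconds into days, hours, and minutes.
format_uptime() {
    # $1 = uptime in seconds (fractional part is ignored)
    awk -v s="$1" 'BEGIN {
        s = int(s)
        printf "%dd %dh %dm\n", int(s / 86400), int((s % 86400) / 3600), int((s % 3600) / 60)
    }'
}

# Kernel name, release, and machine architecture.
uname -s -r -m

# Sample uptime; on Linux: format_uptime "$(cut -d' ' -f1 /proc/uptime)"
format_uptime 93784.12
```

For the sample value the last line printed is 1d 2h 3m.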
Setting Up Server Monitoring
Step 1: Create a Server Monitor
- Navigate to Server Monitors
- Click "Create Server Monitor"
- Fill in the monitor configuration form
Step 2: Basic Configuration
Monitor Details
- Name: Descriptive name for your server (e.g., "Production Web Server", "Database Server #1")
- Target: Unique identifier for your server (a hostname, an IP address, or a custom identifier)
- Description: Optional details about the server's purpose and role
Project Assignment
- Assign the server monitor to a project for organization
- Group related servers together (e.g., "Production Environment", "Client X Infrastructure")
- Helps with team access control and reporting
Step 3: Notification Configuration
Alert Thresholds
Set thresholds for when to receive alerts:
- CPU Usage: Alert when CPU exceeds specified percentage (e.g., 80%)
- Memory Usage: Alert when RAM usage exceeds threshold (e.g., 90%)
- Disk Usage: Alert when disk space exceeds limit (e.g., 85%)
- Load Average: Alert when the system load average exceeds a specified value
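
Conceptually, each threshold is a comparison between the latest reading and a configured limit. The sketch below illustrates that comparison for fractional values. exceeds_threshold is a hypothetical helper, and the product evaluates thresholds on its own side, so this is only a model of the behavior described above.

```shell
# Sketch only: exceeds_threshold is a hypothetical helper.
# Succeeds (exit status 0) when the value is strictly greater than the limit.
# awk is used so that fractional readings such as 86.5 compare numerically.
exceeds_threshold() {
    # $1 = current value, $2 = threshold
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

cpu=86.5      # sample CPU reading, in percent
if exceeds_threshold "$cpu" 80; then
    echo "ALERT: CPU at ${cpu}% exceeds 80%"
fi
```

With the sample reading this prints the alert line. A reading exactly equal to the threshold does not trigger in this sketch, matching the wording "exceeds".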
Notification Handlers
- Select which notification methods to use for alerts
- Choose different handlers for different severity levels
- Configure escalation procedures for critical alerts
Step 4: Install Monitoring Agent
Get Installation Script
- After creating the server monitor, click on its name
- Navigate to the "Installation" or "Code" section
- Copy the provided installation script
Installation Methods
Linux/Unix Servers (Bash Script)
The system provides a custom bash script that:
- Collects CPU, memory, disk, and network metrics
- Gathers system information (OS, kernel, hardware details)
- Sends data to your monitoring endpoint via API
- Runs automatically via cron job
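
To give a feel for the "sends data via API" step, the sketch below builds a small JSON payload and defines a curl call that would submit it. Everything specific here is an assumption for illustration: the URL, the bearer-token header, the MONITOR_URL and API_KEY variables, and the JSON field names. The script provided for your monitor defines the real endpoint and format.

```shell
# Sketch only: the URL, header, and field names are placeholders,
# not the real API of the monitoring service.
MONITOR_URL="${MONITOR_URL:-https://monitoring.example.com/api/metrics}"
API_KEY="${API_KEY:-replace-me}"

# Assemble a JSON document from three percentage readings.
build_payload() {
    # $1 = CPU %, $2 = RAM %, $3 = disk %
    printf '{"cpu_percent":%s,"ram_percent":%s,"disk_percent":%s}\n' "$1" "$2" "$3"
}

# POST the payload. --fail makes curl exit non-zero on HTTP errors,
# and --max-time keeps a hung connection from blocking later cron runs.
send_metrics() {
    # $1 = JSON payload
    curl --silent --show-error --fail --max-time 10 \
        -H "Authorization: Bearer $API_KEY" \
        -H "Content-Type: application/json" \
        -d "$1" "$MONITOR_URL"
}

payload=$(build_payload 42.5 63.1 80)
echo "$payload"
# send_metrics "$payload"   # left commented out: the endpoint is hypothetical
```

Separating payload construction from transmission makes it possible to inspect what would be sent before any network call is made.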
Installation Steps
- Copy the provided script to your server
- Save it as a file (e.g., server_monitor.sh)
- Make it executable: chmod +x server_monitor.sh
- Test the script: ./server_monitor.sh
- Add to cron for automatic execution
Cron Job Configuration
Set up automatic data collection:
# Edit crontab
crontab -e
# Add line for every 5 minutes (adjust as needed)
*/5 * * * * /path/to/server_monitor.sh
# Or every minute for high-frequency monitoring
* * * * * /path/to/server_monitor.sh
Understanding Server Monitor Data
Monitor Status
- 🟢 Active: Server is reporting data regularly
- 🔴 Inactive: No data received recently (check agent/connectivity)
- 🟡 Warning: Some metrics exceed warning thresholds
- 🔴 Critical: One or more metrics exceed critical thresholds
- ⚪ Paused: Monitoring is temporarily disabled
Data Freshness
- Last Log: When the last data was received
- Data Frequency: How often your server sends updates
- Expected Intervals: Based on your cron job configuration
- Missing Data Alerts: Notifications when data stops arriving
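
The relationship between the last log time and the expected interval can be modeled as a simple staleness check: data is considered missing once the gap since the last report exceeds some number of expected intervals. The sketch below uses a hypothetical is_stale helper and an assumed tolerance of two missed intervals; the service's actual rule may differ.

```shell
# Sketch only: is_stale is a hypothetical helper, and the tolerance of
# two missed intervals is an assumption.
# Succeeds when the last report is older than the allowed gap.
is_stale() {
    # $1 = last report (epoch seconds), $2 = now (epoch seconds),
    # $3 = expected interval in seconds, $4 = tolerated missed intervals
    [ $(( $2 - $1 )) -gt $(( $3 * $4 )) ]
}

last=1700000000
now=1700000700          # 700 seconds later; on a real host: now=$(date +%s)
if is_stale "$last" "$now" 300 2; then
    echo "stale: no data for $(( now - last ))s"
else
    echo "fresh"
fi
```

With a 5-minute interval and a tolerance of two intervals, the 700-second gap in the sample exceeds the 600-second limit, so this prints "stale: no data for 700s".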
Metric Trends and Graphs
- Real-time displays: Current resource utilization
- Historical trends: Performance over time
- Peak usage periods: Identify high-demand times
- Correlation analysis: How different metrics relate to each other
Managing Server Monitors
Viewing Detailed Metrics
- Click on any server monitor from your list
- View current real-time metrics
- Examine historical performance graphs
- Review system information and specifications
- Check alert history and threshold breaches
Customizing Alert Thresholds
- Go to server monitor details
- Click "Edit" or "Settings"
- Adjust threshold values based on your server's normal operation
- Set different thresholds for warning vs. critical alerts
- Save your changes
Managing Data Collection
Adjusting Collection Frequency
- High frequency (1 minute): Critical production servers
- Standard frequency (5 minutes): Normal production monitoring
- Low frequency (15-30 minutes): Development or stable servers
- Custom intervals: Based on specific requirements
Pausing Monitoring
- During maintenance: Pause to avoid false alerts
- Server decommissioning: Pause before removing servers
- Testing periods: Pause during load testing or migrations
Understanding Alert Conditions
CPU Alerts
- High CPU usage: Sustained CPU usage above threshold
- Load average spikes: System load exceeding normal levels
- CPU frequency changes: Processor throttling or speed changes
- Core utilization: Uneven load distribution across cores
Memory Alerts
- High memory usage: RAM utilization exceeding safe limits
- Memory leaks: Gradual increase in memory usage over time
- Available memory low: Risk of out-of-memory conditions
- Swap usage: System using swap space (performance impact)
Disk Alerts
- Disk space low: Storage approaching capacity limits
- Rapid disk growth: Unusual increase in disk usage
- Disk critically full: Risk of service failures when the disk runs out of space
- I/O performance: High disk activity affecting performance
Network Alerts
- High bandwidth usage: Network utilization exceeding normal levels
- Unusual traffic patterns: Unexpected network activity
- Connectivity issues: Problems reaching monitoring endpoints
- Data transfer anomalies: Unusual upload/download patterns
What to Expect
Initial Setup
- Install the monitoring script on your server
- First data appears within minutes of script execution
- Initial baselines are established after a few data points
- Alert thresholds can be fine-tuned based on normal operation
Ongoing Monitoring
- Continuous data collection according to your cron schedule
- Real-time metric updates on the dashboard
- Automatic alerts when thresholds are exceeded
- Historical data accumulation for trend analysis
Alert Notifications
- Threshold breaches: Immediate alerts when limits are exceeded
- Recovery notifications: Alerts when metrics return to normal
- Missing data alerts: Notifications when servers stop reporting
- System information changes: Alerts for hardware or OS changes
Common Issues and Troubleshooting
No Data Received
- Check script execution: Verify the monitoring script runs without errors
- Network connectivity: Ensure server can reach monitoring endpoints
- Authentication issues: Verify API key and server ID are correct
- Firewall restrictions: Check if outbound HTTPS is blocked
- Cron job setup: Confirm cron job is properly configured and running
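
For the cron check in particular, one quick test is whether the crontab contains an uncommented line that references the script. The sketch below assumes the file name server_monitor.sh from the installation steps; has_cron_entry is a hypothetical helper.

```shell
# Sketch only: has_cron_entry is a hypothetical helper.
# Succeeds if the crontab text on stdin has an uncommented line that
# mentions the given script name.
has_cron_entry() {
    # $1 = script name or path to look for
    grep -v '^[[:space:]]*#' | grep -q -F -- "$1"
}

# Sample crontab; on a real host: crontab -l | has_cron_entry server_monitor.sh
if printf '%s\n' '# monitoring' '*/5 * * * * /path/to/server_monitor.sh' \
    | has_cron_entry server_monitor.sh
then
    echo "cron entry found"
else
    echo "cron entry missing"
fi
```

If the entry is present but no data arrives, running the script by hand (./server_monitor.sh) and checking its exit status with `echo $?` helps separate scheduling problems from script errors.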
Inaccurate Metrics
- Script permissions: Ensure monitoring script has necessary permissions
- System commands: Verify required system utilities are available
- Virtualization issues: Some metrics may be limited in virtualized environments
- Container environments: Docker/container metrics may need special handling
False Alerts
- Threshold adjustment: Fine-tune alert thresholds based on normal operations
- Temporary spikes: Consider average values over time vs. instant readings
- Scheduled events: Account for regular maintenance or backup jobs
- Baseline establishment: Allow time for system to establish normal patterns
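
The point about temporary spikes can be made concrete with a small calculation: a single high reading looks very different once it is averaged with its neighbors. The sketch below uses a hypothetical average_of helper and invented sample readings.

```shell
# Sketch only: average_of is a hypothetical helper.
# Prints the mean of the numbers on stdin (one per line), to one decimal.
average_of() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Five one-minute CPU samples (percent) with a single spike to 98.
printf '%s\n' 35 40 98 38 39 | average_of
```

This prints 50.0. An 80% threshold applied to the instant reading of 98 would fire, while the same threshold applied to the five-minute average would not.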
Missing Alerts
- Notification handlers: Verify alert channels are properly configured
- Threshold values: Check if thresholds are set appropriately
- Alert frequency: Ensure alerts aren't being rate-limited
- System performance: Very high system load might delay script execution
Best Practices
Monitoring Strategy
- Critical servers first: Start with your most important servers
- Baseline establishment: Monitor for a week to understand normal patterns
- Threshold tuning: Adjust alerts based on actual server behavior
- Comprehensive coverage: Monitor all key server roles (web, database, etc.)
Alert Configuration
- Graduated thresholds: Warning at 80%, critical at 95%
- Context-aware alerts: Different thresholds for different server types
- Time-based considerations: Account for peak usage periods
- Alert fatigue prevention: Avoid too many low-priority alerts
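
Graduated thresholds amount to mapping each reading to one of three levels. The sketch below uses the 80% and 95% example values from above. classify is a hypothetical helper, and treating a reading exactly at a threshold as having reached that level is an assumption.

```shell
# Sketch only: classify is a hypothetical helper.
# Maps a reading to ok, warning, or critical. A reading equal to a
# threshold counts as having reached that level (an assumption).
classify() {
    # $1 = value, $2 = warning threshold, $3 = critical threshold
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v >= c)      print "critical"
        else if (v >= w) print "warning"
        else             print "ok"
    }'
}

classify 72 80 95
classify 83.4 80 95
classify 97 80 95
```

The three calls print ok, warning, and critical in that order. Checking the critical level first ensures that a value above both thresholds is reported at the higher severity only.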
Data Collection
- Appropriate frequency: Balance monitoring detail with system overhead
- Consistent scheduling: Use regular intervals for better trend analysis
- Error handling: Ensure monitoring script handles failures gracefully
- Log rotation: Manage local log files to prevent disk space issues
Team Organization
- Clear naming: Use descriptive names that identify server purpose
- Project grouping: Organize servers by environment, client, or function
- Role-based access: Configure appropriate team access for different server groups
- Documentation: Maintain records of server purposes and dependencies
Advanced Monitoring Techniques
Performance Baseline Establishment
- Normal operation patterns: Document typical resource usage
- Peak usage identification: Understand when servers are most loaded
- Seasonal variations: Account for business cycle impacts
- Growth trends: Track how resource usage increases over time
Capacity Planning
- Trend analysis: Project future resource needs based on growth
- Peak capacity planning: Ensure adequate resources for peak loads
- Scaling decisions: Use data to decide when to upgrade or scale out
- Cost optimization: Identify over-provisioned resources
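
As a simple example of projecting future needs from a trend, the sketch below estimates how many days remain before a disk fills, assuming growth continues at the average rate observed between two readings. days_until_full is a hypothetical helper, and a straight-line projection is only one possible model.

```shell
# Sketch only: days_until_full is a hypothetical helper that assumes
# disk usage keeps growing at a constant (linear) rate.
days_until_full() {
    # $1 = used GB at the earlier reading, $2 = used GB now,
    # $3 = days between the readings, $4 = total capacity in GB
    awk -v a="$1" -v b="$2" -v d="$3" -v t="$4" 'BEGIN {
        rate = (b - a) / d                  # GB per day
        if (rate <= 0) { print "no growth"; exit }
        printf "%d\n", int((t - b) / rate)  # whole days remaining
    }'
}

# 400 GB used a month ago, 430 GB now, 500 GB disk.
days_until_full 400 430 30 500
```

For the sample values the growth rate is 1 GB per day and 70 GB remain, so this prints 70. Real usage is rarely perfectly linear, so a projection like this is best treated as a rough planning figure.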
Correlation Analysis
- Multi-metric correlation: Understand how CPU, memory, and disk relate
- Application performance: Correlate server metrics with application performance
- External factors: Consider how external events affect server performance
- Dependency mapping: Understand how server performance affects other systems
Security and Compliance
Data Security
- Secure transmission: All data sent over HTTPS
- API authentication: Secure API keys for data submission
- Access control: Limit who can view server metrics
- Data retention: Understand how long metrics are stored
Compliance Considerations
- Monitoring logs: Maintain records of system performance for audits
- Change tracking: Document server configuration and performance changes
- Incident documentation: Use metrics data for incident reports
- Retention policies: Align data retention with compliance requirements
Integration with Other Monitoring
Holistic Monitoring Strategy
- Application monitoring: Combine server metrics with application performance
- Network monitoring: Include network performance alongside server metrics
- Log analysis: Correlate performance metrics with log data
- User experience: Connect server performance to user impact
Incident Response
- Performance context: Use server metrics during incident investigation
- Root cause analysis: Identify performance bottlenecks causing issues
- Recovery verification: Confirm performance returns to normal after fixes
- Post-incident analysis: Use data to improve monitoring and prevent recurrence
Tips for Success
- Start simple: Begin with basic CPU, memory, and disk monitoring
- Understand your baselines: Learn what "normal" looks like for each server
- Iterate on thresholds: Continuously adjust alerts based on experience
- Monitor the monitors: Ensure your monitoring infrastructure is reliable
- Document everything: Keep records of server purposes, thresholds, and changes
- Regular reviews: Periodically assess monitoring effectiveness and coverage
- Team training: Ensure team members understand how to interpret metrics
- Automation: Use monitoring data to trigger automated responses where appropriate
- Plan for growth: Consider how monitoring needs will change as you scale
- Stay proactive: Use trends to address issues before they become critical