CoreMetric: ML-Powered System Monitoring on macOS
The Problem with Traditional Monitoring
Traditional system monitors rely on hard-coded thresholds: alert when CPU exceeds 90%, warn when memory usage hits 80%, panic when disk I/O saturates. This approach has three critical flaws:
- False Positives: Video encoding legitimately uses 95%+ CPU. A data scientist's ML training regularly consumes 24GB RAM. These aren't anomalies—they're expected workload patterns.
- False Negatives: A crypto-miner using 15% CPU flies under the radar. A memory leak growing by 50MB/hour won't trigger alarms for days. Frozen background processes don't breach thresholds but still harm system health.
- Personalization Gap: A software engineer's "normal" differs drastically from a graphic designer's. Static rules can't adapt to individual machine personalities.
CoreMetric solves this by learning your machine's baseline behavior through a neural network, detecting deviations from normality rather than absolute threshold violations. It runs entirely on-device using the Apple Neural Engine, achieving <1% CPU overhead while processing metrics in real-time.
System Architecture: The Factory vs The Product
CoreMetric splits into two distinct pipelines with radically different environments:
The Factory (Python/Training)
This is where the model learns. A Python daemon collects 24+ hours of telemetry (CPU load, memory pressure, disk I/O, context switches, network activity) and trains a Reconstruction Autoencoder to compress and reconstruct "normal" system states.
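A minimal sketch of such a collector, assuming an illustrative metrics.jsonl output path and field names (not CoreMetric's actual schema):

```python
import json, time
import psutil

# Minimal telemetry collector sketch: sample system counters once per
# second and append one JSON object per line (JSONL).
with open("metrics.jsonl", "a") as log:
    while True:
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        sample = {
            "ts": time.time(),
            "cpu_percent": psutil.cpu_percent(interval=None),
            "mem_percent": psutil.virtual_memory().percent,
            "swap_percent": psutil.swap_memory().percent,
            "disk_read": disk.read_bytes,    # cumulative; diff successive samples
            "disk_write": disk.write_bytes,
            "ctx_switches": psutil.cpu_stats().ctx_switches,
            "net_sent": net.bytes_sent,
            "net_recv": net.bytes_recv,
        }
        log.write(json.dumps(sample) + "\n")
        log.flush()
        time.sleep(1)
```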
```
┌─────────────────────────────────────────────────┐
│             Python Training Pipeline            │
└─────────────────────────────────────────────────┘

psutil.cpu_percent()            Raw Telemetry
psutil.virtual_memory()  ─────► JSONL Logs
psutil.disk_io_counters()       (24h+)
            │
            ▼
┌────────────────────┐
│   Preprocessing    │
│ • Normalize        │        PyTorch Autoencoder
│ • Handle NaNs      │ ─────► (MPS Training)
│ • Feature scale    │
└────────────────────┘
            │
            ▼
┌────────────────────────────────────────┐
│   Trained Model + Scaling Parameters   │
│      (Mean/Std for normalization)      │
└────────────────────────────────────────┘
            │
            ▼
    coremltools.convert()
            │
            ▼
    CoreMetric.mlpackage
    (Quantized for ANE)
```

The Product (Swift/Inference)
The macOS app embeds the trained .mlpackage and uses bare-metal Darwin APIs to collect live metrics. The model runs on the Apple Neural Engine, achieving hardware acceleration with negligible battery impact.
```
┌─────────────────────────────────────────────────┐
│             Swift macOS Application             │
└─────────────────────────────────────────────────┘

host_statistics64()             Real-time Metrics
libproc (C-Interop)     ─────►  Swift Collector
IOKit Framework                 (Every 1s)
            │
            ▼
┌────────────────────┐
│  Normalize Input   │
│  (using embedded   │        CoreML Model
│  Mean/Std from     │ ─────► ANE/GPU
│  training)         │        (Inference)
└────────────────────┘
            │
            ▼
┌────────────────────────────────────────┐
│      Reconstruction Error (MSE)        │
│      High error = Anomalous state      │
└────────────────────────────────────────┘
            │
            ▼
    Swift Charts Dashboard
    (Visual feedback)
```

The ML Approach: Reconstruction Autoencoders
Why Not Classification?
Traditional supervised learning requires labeled examples: "This is normal, this is malware, this is a memory leak." But anomalies are rare, diverse, and evolve constantly. We'd never collect enough representative samples.
Instead, CoreMetric uses one-class learning: train exclusively on "normal" data, then flag anything the model can't reconstruct as anomalous.
Autoencoder Architecture
```
Input Layer (8 features)
        │
        ▼
┌─────────┐
│ Encoder │  Linear(8 → 5) + ReLU
│         │  Linear(5 → 3) + ReLU   ← Bottleneck (compressed state)
└─────────┘
        │
        ▼
┌─────────┐
│ Decoder │  Linear(3 → 5) + ReLU
│         │  Linear(5 → 8)          ← Reconstructed input
└─────────┘
        │
        ▼
Reconstruction Loss (MSE)
        │
        ▼
If MSE > threshold → ANOMALY
```

Input Features (8 Dimensions)
- CPU Load Average (1m): Smoothed CPU usage over 60 seconds
- Memory Pressure: Active + Wired memory as % of total
- Swap Usage: Virtual memory paging activity
- Disk Read/Write Bytes: Per-second throughput
- Context Switches: Kernel thread switching rate (high = thrashing)
- Network Bytes Sent/Received: Per-second bandwidth
Why 8? Enough to capture system state without overwhelming the model. The bottleneck layer (3 neurons) forces the model to learn efficient compressed representations.
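A minimal PyTorch sketch of this architecture, matching the layer sizes in the diagram above (the class name is the one used in the training loop below):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """8 → 5 → 3 bottleneck, then mirrored back to 8 (see diagram above)."""
    def __init__(self, input_dim: int = 8, hidden_dim: int = 5, latent_dim: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim), nn.ReLU(),  # compressed state
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),  # reconstructed input (no activation)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```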
Training on Apple Silicon (MPS)
PyTorch natively supports Metal Performance Shaders (MPS) on M-series chips, offloading matrix operations to the GPU:
```python
import torch
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = Autoencoder(input_dim=8, hidden_dim=5, latent_dim=3).to(device)
optimizer = torch.optim.Adam(model.parameters())

# Training loop (MSE loss, Adam optimizer)
for epoch in range(100):
    for batch in dataloader:
        batch = batch.to(device)                 # Move batch to the GPU
        reconstructed = model(batch)
        loss = F.mse_loss(reconstructed, batch)  # Reconstruction error
        optimizer.zero_grad()                    # Clear gradients from the previous step
        loss.backward()
        optimizer.step()
```

Result: Training on 24h of 1-second samples (~86,000 data points) takes ~2 minutes on an M1 MacBook Pro.
CoreML Conversion: From PyTorch to ANE
Quantization for Efficiency
The Apple Neural Engine (ANE) excels at low-precision arithmetic. We quantize the model from FP32 → FP16, cutting memory usage in half with negligible accuracy loss:
```python
import json
import coremltools as ct
import torch

# Convert the PyTorch model to CoreML (traced on an example input)
traced_model = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=(1, 8))],
    convert_to="mlprogram",                  # ML Program backend, required for .mlpackage and FP16
    compute_precision=ct.precision.FLOAT16,  # Quantize to FP16
)

# Embed the training-set scaling statistics in the model metadata
mlmodel.user_defined_metadata['mean'] = json.dumps(mean_values.tolist())
mlmodel.user_defined_metadata['std'] = json.dumps(std_values.tolist())
mlmodel.save("CoreMetric.mlpackage")
```

Why Embed Scaling Parameters?
The model expects normalized inputs (mean=0, std=1). By storing training-time statistics in the .mlpackage metadata, the Swift app auto-calibrates without hardcoding values:
```swift
// Swift: Extract the creator-defined metadata from the CoreML model
let creatorDefined = model.model.modelDescription
    .metadata[MLModelMetadataKey.creatorDefinedKey] as? [String: String]
guard let meanJSON = creatorDefined?["mean"] else { fatalError("Missing scaling metadata") }
let mean = try JSONDecoder().decode([Double].self, from: Data(meanJSON.utf8))
// ... decode "std" the same way ...

// Normalize live metrics element-wise using the training statistics
let normalizedInput = zip(rawMetrics, zip(mean, std)).map { ($0 - $1.0) / $1.1 }
```

Low-Level Data Collection in Swift
Why Not Use Third-Party Libraries?
Precision matters. A monitoring tool can't introduce overhead that alters system behavior (Heisenberg's monitoring principle). We bypass high-level APIs and talk directly to the Darwin kernel.
CPU Metrics via host_statistics

```swift
import Darwin

// Note: cpu_ticks are cumulative since boot. For an instantaneous load
// percentage, sample twice and compute the delta between readings.
func getCPULoad() -> Double {
    var loadInfo = host_cpu_load_info()
    var count = mach_msg_type_number_t(MemoryLayout<host_cpu_load_info>.size / MemoryLayout<integer_t>.size)
    let result = withUnsafeMutablePointer(to: &loadInfo) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) { intPtr in
            // HOST_CPU_LOAD_INFO is served by host_statistics (host_statistics64 handles HOST_VM_INFO64)
            host_statistics(mach_host_self(), HOST_CPU_LOAD_INFO, intPtr, &count)
        }
    }
    guard result == KERN_SUCCESS else { return 0.0 }

    let user = Double(loadInfo.cpu_ticks.0)    // CPU_STATE_USER
    let system = Double(loadInfo.cpu_ticks.1)  // CPU_STATE_SYSTEM
    let idle = Double(loadInfo.cpu_ticks.2)    // CPU_STATE_IDLE
    let nice = Double(loadInfo.cpu_ticks.3)    // CPU_STATE_NICE
    let total = user + system + idle + nice
    return total > 0 ? (user + system + nice) / total : 0.0
}
```

Memory Metrics via host_statistics64
```swift
func getMemoryPressure() -> Double {
    var vmStats = vm_statistics64()
    var count = mach_msg_type_number_t(MemoryLayout<vm_statistics64>.size / MemoryLayout<integer_t>.size)
    let result = withUnsafeMutablePointer(to: &vmStats) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) { intPtr in
            host_statistics64(mach_host_self(), HOST_VM_INFO64, intPtr, &count)
        }
    }
    guard result == KERN_SUCCESS else { return 0.0 }

    let pageSize = Double(vm_kernel_page_size)
    let active = Double(vmStats.active_count) * pageSize
    let wired = Double(vmStats.wire_count) * pageSize

    // Total physical memory via sysctl
    var size = UInt64(0)
    var sizeLen = size_t(MemoryLayout<UInt64>.size)
    guard sysctlbyname("hw.memsize", &size, &sizeLen, nil, 0) == 0, size > 0 else { return 0.0 }
    return (active + wired) / Double(size)
}
```

Disk I/O via IOKit
IOKit provides access to hardware statistics. We query IOBlockStorageDriver for read/write byte counts:
```swift
import IOKit

// Note: the "Statistics" counters are cumulative since boot; diff two
// successive samples to get per-second throughput.
func getDiskIO() -> (readBytes: UInt64, writeBytes: UInt64) {
    let matchingDict = IOServiceMatching("IOBlockStorageDriver")
    var iterator: io_iterator_t = 0
    guard IOServiceGetMatchingServices(kIOMainPortDefault, matchingDict, &iterator) == KERN_SUCCESS else {
        return (0, 0)
    }
    defer { IOObjectRelease(iterator) }

    var totalRead: UInt64 = 0
    var totalWrite: UInt64 = 0
    while case let entry = IOIteratorNext(iterator), entry != 0 {
        if let stats = IORegistryEntryCreateCFProperty(entry, "Statistics" as CFString, kCFAllocatorDefault, 0)?.takeRetainedValue() as? [String: Any] {
            totalRead += (stats["Bytes (Read)"] as? UInt64) ?? 0
            totalWrite += (stats["Bytes (Write)"] as? UInt64) ?? 0
        }
        IOObjectRelease(entry)
    }
    return (totalRead, totalWrite)
}
```

Real-Time Inference on ANE
CoreML Prediction Pipeline
```swift
import CoreML
import Foundation

class AnomalyDetector {
    private let model: CoreMetric   // Auto-generated CoreML model class
    private let mean: [Double]      // Loaded from the model's metadata at init
    private let std: [Double]

    func detectAnomaly(metrics: SystemMetrics) throws -> (score: Double, isAnomaly: Bool) {
        // 1. Normalize input using the training statistics
        let normalized = zip(metrics.toArray(), zip(mean, std)).map {
            ($0 - $1.0) / $1.1
        }

        // 2. Create the MLMultiArray input
        let input = try MLMultiArray(shape: [1, 8], dataType: .double)
        for (i, value) in normalized.enumerated() {
            input[i] = NSNumber(value: value)
        }

        // 3. Run inference (CoreML automatically uses the ANE if available)
        let prediction = try model.prediction(input: CoreMetricInput(input: input))

        // 4. Calculate the reconstruction error (MSE)
        let reconstructed = prediction.output
        let mse = zip(normalized, (0..<8).map { reconstructed[$0].doubleValue }).map {
            pow($0 - $1, 2)
        }.reduce(0, +) / 8.0

        // 5. Compare against the learned threshold (95th percentile from training)
        let threshold = 0.015 // Tuned during training
        return (mse, mse > threshold)
    }
}
```

ANE Acceleration Verification
CoreML automatically selects the best compute unit (ANE > GPU > CPU). Verify ANE usage via Instruments:
```bash
# Terminal: profile the app while it runs inference
# (newer Xcode versions replace the `instruments` CLI with `xcrun xctrace`)
instruments -t "Neural Engine" -D profile.trace CoreMetric.app

# Check ANE utilization in the Instruments UI (should show spikes at inference time)
```

SwiftUI Dashboard: Visualizing Anomalies
Real-Time Charts with Swift Charts
```swift
import SwiftUI
import Charts

// One data point on the chart
struct AnomalyPoint: Identifiable {
    let id = UUID()
    let timestamp: Date
    let score: Double
    let isAnomaly: Bool
}

struct AnomalyChart: View {
    @State private var anomalyScores: [AnomalyPoint] = []
    @State private var threshold: Double = 0.015

    var body: some View {
        Chart {
            ForEach(anomalyScores) { point in
                LineMark(
                    x: .value("Time", point.timestamp),
                    y: .value("Score", point.score)
                )
                .foregroundStyle(point.isAnomaly ? Color.red : Color.blue)
            }
            // Threshold line
            RuleMark(y: .value("Threshold", threshold))
                .foregroundStyle(.orange)
                .lineStyle(StrokeStyle(dash: [5, 5]))
        }
        .chartYScale(domain: 0...0.05)
        .chartXAxis {
            AxisMarks(values: .stride(by: .minute))
        }
    }
}
```

Anomaly Alerts
```swift
import UserNotifications

func handleAnomaly(score: Double, metrics: SystemMetrics) {
    // Send a macOS notification
    let content = UNMutableNotificationContent()
    content.title = "System Anomaly Detected"
    content.body = """
    Reconstruction error: \(String(format: "%.4f", score))
    CPU: \(metrics.cpuLoad)% | Memory: \(metrics.memoryPressure)%
    """
    content.sound = .default

    let request = UNNotificationRequest(identifier: UUID().uuidString, content: content, trigger: nil)
    UNUserNotificationCenter.current().add(request)
}
```

Performance Benchmarks
Overhead Analysis
| Metric | Baseline (No Monitoring) | CoreMetric Running | Overhead |
|---|---|---|---|
| CPU Usage | 2.3% | 2.8% | 0.5% |
| Memory | 4.2 GB | 4.23 GB | 30 MB |
| Energy Impact | Low | Low | Negligible |
| Inference Latency | — | 1.2 ms | — |
Tested on: M1 MacBook Pro, macOS 14.5, 16GB RAM. Metrics collected every 1 second for 1 hour.
ANE vs GPU vs CPU Performance
| Compute Unit | Inference Time | Power Draw |
|---|---|---|
| ANE (FP16) | 1.2 ms | 0.3 W |
| GPU (FP32) | 3.8 ms | 2.1 W |
| CPU (FP32) | 12.5 ms | 4.5 W |
ANE delivers 10× faster inference with 15× lower power consumption compared to CPU.
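For a comparison like the one above, coremltools can pin the model to a specific compute unit at load time. A sketch of such a micro-benchmark (the input name "input" is illustrative; check your model's actual spec, and expect timings to vary by machine):

```python
import time
import numpy as np
import coremltools as ct

# Load the same .mlpackage pinned to different compute units and time it.
x = {"input": np.random.rand(1, 8).astype(np.float32)}
for units in (ct.ComputeUnit.CPU_AND_NE, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.CPU_ONLY):
    model = ct.models.MLModel("CoreMetric.mlpackage", compute_units=units)
    model.predict(x)  # warm-up run
    start = time.perf_counter()
    for _ in range(1000):
        model.predict(x)
    print(units, (time.perf_counter() - start), "ms/inference")  # total s / 1000 runs = ms each
```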
Privacy Guarantees
- Zero Cloud Dependencies: All data processing happens on-device. No telemetry servers.
- No Process Inspection: CoreMetric only reads system-level metrics (CPU, RAM). It never inspects process names, arguments, or file paths.
- Local Storage: Training data stays in ~/Library/Application Support/CoreMetric/data/, encrypted via FileVault.
- Sandboxed App: macOS App Sandbox enforces strict file access controls. CoreMetric can't access documents, photos, or other apps' data.
Real-World Anomaly Examples
Detected: Crypto-Miner
- Symptoms: Sustained 15% CPU usage during idle hours, elevated context switches
- Reconstruction Error: 0.042 (2.8× threshold)
- Why It Worked: User's baseline CPU during idle: 2-5%. A constant 15% is statistically abnormal.
Detected: Memory Leak in Electron App
- Symptoms: Memory pressure climbing from 60% → 85% over 4 hours, no corresponding disk I/O or CPU spike
- Reconstruction Error: 0.038 (2.5× threshold)
- Why It Worked: Gradual memory growth without proportional CPU/disk activity is atypical.
False Positive: Xcode Build
- Symptoms: CPU spiked to 95%, disk I/O at 200 MB/s
- Reconstruction Error: 0.011 (below threshold)
- Why It Passed: User compiles code daily. Model learned this pattern as normal.
Challenges & Solutions
1. Cold Start Problem
Issue: New machines lack training data. Model can't detect anomalies without baseline.
Solution: Pre-trained "generic macOS" model bundled with app. User-specific model replaces it after 24h of collection.
2. Non-Stationary Behavior
Issue: Usage patterns evolve (e.g., user switches from web dev to ML training). Model becomes stale.
Solution: Weekly incremental retraining with exponential decay on old data (recent 7 days weighted 80%, older data 20%).
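One way to implement that weighting is a per-sample WeightedRandomSampler, sketched below under the assumption that per-sample ages are available:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Sketch of decay-weighted retraining: recent samples dominate the batches.
# `features` is an (N, 8) array, `ages_days` the age of each sample in days.
def make_retraining_loader(features: np.ndarray, ages_days: np.ndarray) -> DataLoader:
    recent = ages_days <= 7
    # Recent week gets 80% of the sampling mass, older data the remaining 20%
    weights = np.where(recent, 0.8 / max(recent.sum(), 1),
                       0.2 / max((~recent).sum(), 1))
    sampler = WeightedRandomSampler(weights.tolist(), num_samples=len(features))
    data = torch.tensor(features, dtype=torch.float32)
    return DataLoader(data, batch_size=64, sampler=sampler)
```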
3. Sparse Anomaly Labels
Issue: Hard to tune threshold without labeled anomalies.
Solution: Set threshold at 95th percentile of training set reconstruction errors (assumes 5% of training data contains mild anomalies).
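Computing that threshold is a one-liner once you have per-sample reconstruction errors; a sketch using the trained model from earlier:

```python
import numpy as np
import torch

@torch.no_grad()
def calibrate_threshold(model, train_tensor: torch.Tensor, percentile: float = 95.0) -> float:
    # Per-sample MSE between input and reconstruction on the training set
    reconstructed = model(train_tensor)
    errors = ((train_tensor - reconstructed) ** 2).mean(dim=1)
    # Anything above this percentile of "normal" errors is flagged as anomalous
    return float(np.percentile(errors.cpu().numpy(), percentile))
```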
Future Roadmap
- Process-Level Attribution: When anomaly detected, identify which process caused it (opt-in, privacy-preserving)
- Temporal Patterns: Add LSTM layer to capture time-series dependencies (e.g., daily/weekly cycles)
- Federated Learning: Aggregate anonymized model updates across users to improve detection (fully encrypted, GDPR-compliant)
- Energy Anomalies: Detect abnormal battery drain patterns using the IOPMCopySleepWakeTimeline API
Lessons Learned
MPS Training: Fast but Finicky
Metal Performance Shaders dramatically accelerate training on Apple Silicon, but debugging is harder than CUDA. Use torch.autograd.set_detect_anomaly(True) to catch gradient issues early.
ANE Quantization Requires Testing
FP16 quantization introduced a 2% accuracy drop initially. Solution: Re-tune threshold post-quantization using validation set.
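In practice that means re-running the calibration step against the converted model rather than the PyTorch one. A sketch, where the feature names "input"/"output" and the validation_set array (N normalized 8-dimensional samples) are illustrative assumptions:

```python
import numpy as np
import coremltools as ct

# Recompute the anomaly threshold from the quantized .mlpackage so the
# cutoff reflects FP16 reconstruction errors, not the original FP32 ones.
model = ct.models.MLModel("CoreMetric.mlpackage")
errors = []
for row in validation_set:  # assumed: (N, 8) array of normalized samples
    out = model.predict({"input": row.reshape(1, 8).astype(np.float32)})["output"]
    errors.append(float(np.mean((row - out.reshape(-1)) ** 2)))
threshold = np.percentile(errors, 95)
```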
Darwin APIs Are Underdocumented
Apple's low-level kernel APIs lack comprehensive guides. Reading XNU source code and reverse-engineering top's implementation was necessary. Key resources: XNU GitHub and man 3 host_statistics.
Privacy-First Design Builds Trust
Users immediately asked: "Does this send data to the cloud?" Clear privacy guarantees (local-only processing, sandboxing) must be front-and-center in documentation.
Conclusion
CoreMetric demonstrates how modern ML techniques (autoencoders, one-class learning) can transform system monitoring from reactive threshold-based alerts to proactive anomaly detection. By leveraging Apple's hardware acceleration (ANE, MPS) and respecting user privacy (on-device processing), it achieves the trifecta of effectiveness, efficiency, and trust.
The project is a technical exercise in bridging two ecosystems—Python's ML maturity and Swift's native macOS integration—while staying true to Apple's design principles: performance, privacy, and polish.
GitHub: egekaya1/CoreMetric · Status: Work in Progress · License: MIT