SwiftUI

CoreML

PyTorch

Metal (MPS)

IOKit

Darwin Kernel

Case Study: CoreMetric — ML-Powered System Monitoring

20 min read

CoreMetric is a privacy-first macOS system monitor that replaces traditional threshold-based alerts with neural-powered anomaly detection. Unlike conventional monitors that trigger on "CPU > 90%", CoreMetric learns your machine's unique usage patterns through a Reconstruction Autoencoder and detects subtle deviations—memory leaks, background crypto-miners, frozen processes—all while running on the Apple Neural Engine with <1% CPU overhead. Repository: GitHub - CoreMetric (WIP).

<1%

CPU Overhead

1.2ms

Inference Time

100%

On-Device

0.3W

Power Draw

The Problem: Threshold Fatigue

Traditional system monitors like Activity Monitor, htop, or iStat Menus rely on static thresholds: alert when CPU > 90%, warn when RAM > 80%, panic when disk I/O saturates. This approach creates three critical failure modes:

1. False Positives: Crying Wolf

Expected Heavy Workloads: Video encoding legitimately uses 95%+ CPU for hours. ML training consumes 24GB RAM. Game rendering saturates GPU. These aren't anomalies—they're normal for specific users.
Periodic Spikes: Time Machine backups spike disk I/O. Spotlight indexing hits CPU. Weekly builds max out cores. Thresholds can't distinguish routine patterns from genuine problems.
Alert Fatigue: Users disable notifications after too many false alarms, missing real issues later.

2. False Negatives: Silent Threats

Low-Level Abuse: Cryptocurrency miners using 15% CPU (below typical thresholds) run undetected for weeks. Adware processes with 3% CPU stay hidden.
Gradual Degradation: Memory leaks growing at 50MB/hour won't trigger alarms until swap thrashing begins hours later. Slowly accumulating disk writes evade detection.
Zombie Processes: Hung background tasks using 0% CPU but blocking resources never breach thresholds.

3. Personalization Gap

A software engineer's baseline (Docker containers, IDEs, 50+ browser tabs) differs drastically from a graphic designer's (Photoshop, high RAM usage, GPU acceleration). A scientist running simulations has yet another pattern. Static rules can't adapt to individual machine "personalities."

CoreMetric's Solution: Learn, Don't Guess

Instead of hard-coded thresholds, CoreMetric uses one-class machine learning:

Collect Baseline: Python daemon logs 24+ hours of normal usage (CPU, memory, swap, disk I/O, context switches, network I/O)
Train Neural Network: PyTorch Autoencoder learns to compress and reconstruct typical system states using Metal Performance Shaders (MPS) on Apple Silicon
Deploy Model: Convert to CoreML, quantize to FP16, embed in native SwiftUI app for Apple Neural Engine (ANE) acceleration
Detect Anomalies: Real-time inference measures reconstruction error—high error = unfamiliar system state = potential problem

Result: Personalized anomaly detection that adapts to your specific usage patterns and avoids most of the false-positive noise of static thresholds.
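The baseline collection step can be sketched as a small JSONL logger. This is a minimal illustration, not the project's actual daemon: the function names `read_psutil_sample` and `log_samples` and the JSON field names are assumptions, while the psutil calls are the ones named in the training pipeline.

```python
import json
import time


def read_psutil_sample():
    """Read one 8-feature sample; requires the third-party psutil package."""
    import psutil  # imported lazily so the logger can be exercised without it

    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "ts": time.time(),
        "load_1m": psutil.getloadavg()[0],
        "mem_percent": psutil.virtual_memory().percent,
        "swap_used": psutil.swap_memory().used,
        "disk_read_bytes": disk.read_bytes,    # cumulative counters; rates are
        "disk_write_bytes": disk.write_bytes,  # derived later by differencing
        "ctx_switches": psutil.cpu_stats().ctx_switches,
        "net_sent_bytes": net.bytes_sent,
        "net_recv_bytes": net.bytes_recv,
    }


def log_samples(path, count, sampler=read_psutil_sample, interval=1.0):
    """Append `count` samples to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for _ in range(count):
            fh.write(json.dumps(sampler()) + "\n")
            fh.flush()
            time.sleep(interval)
```

Calling `log_samples("telemetry.jsonl", 86_400)` would cover roughly 24 hours at the 1-second interval used elsewhere in this case study.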

Architecture: Dual-Pipeline Design

CoreMetric separates training (Python) from inference (Swift) to leverage the best tools for each phase:

The Factory: Python Training Pipeline

┌──────────────────────────────────────────────────┐
│          Python Training Environment              │
└──────────────────────────────────────────────────┘

  psutil.cpu_percent()         24h+ Telemetry
  psutil.virtual_memory()   ────────────────►  JSONL Logs
  psutil.disk_io_counters()                   (~86K samples)
  psutil.net_io_counters()
            │
            ▼
  ┌─────────────────────────────────────────┐
  │         Preprocessing Pipeline          │
  │  • Handle missing values (interpolate)  │
  │  • Normalize (Z-score: μ=0, σ=1)        │
  │  • Calculate scaling params (mean/std)  │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      PyTorch Reconstruction AE          │
  │  Architecture: 8 → 5 → 3 → 5 → 8        │
  │  Loss: MSE (input vs reconstructed)     │
  │  Optimizer: Adam (lr=0.001)             │
  │  Training: 100 epochs on MPS (GPU)      │
  │  Time: ~2 min on M1 MacBook Pro         │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │        CoreML Conversion                │
  │  • coremltools.convert()                │
  │  • Quantize FP32 → FP16 (ANE-ready)     │
  │  • Embed mean/std in metadata           │
  └─────────────────────────────────────────┘
            │
            ▼
     CoreMetric.mlpackage
   (Ready for Swift app)
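The preprocessing stage in the diagram computes per-feature scaling parameters and applies Z-score normalization. A minimal standard-library sketch follows; `fit_scaler` and `normalize` are hypothetical names, and mapping a zero standard deviation to 1.0 is an assumption added here to avoid division by zero on constant features.

```python
import statistics


def fit_scaler(rows):
    """Return per-feature (means, stds) for a list of equal-length rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population std; constant features fall back to 1.0 (assumption).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(row, means, stds):
    """Apply (x - mean) / std feature by feature."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]
```

The same `means` and `stds` lists are what the conversion step embeds in the model metadata, so the Swift app normalizes live samples exactly as the training data was normalized.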

The Product: Swift Inference Pipeline

┌──────────────────────────────────────────────────┐
│          Swift macOS Application                  │
└──────────────────────────────────────────────────┘

  host_statistics64()          Real-time Metrics
  mach_host_self()          ─────────────────►  SystemCollector.swift
  IOKit (C-Interop)                              (Every 1 second)
  libproc
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      Normalize Input Features           │
  │  (x - mean) / std                       │
  │  (using metadata from .mlpackage)       │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      CoreML Model Inference             │
  │  Compute Unit: ANE (Apple Neural Eng.)  │
  │  Latency: 1.2ms per prediction          │
  │  Power: 0.3W (vs 4.5W on CPU)           │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │    Calculate Reconstruction Error       │
  │  MSE = Σ(input - reconstructed)² / 8    │
  │  Threshold: 95th percentile (training)  │
  │  High MSE → ANOMALY detected            │
  └─────────────────────────────────────────┘
            │
            ▼
     Swift Charts Dashboard
      (+ macOS Notifications)
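The last two stages of the inference pipeline reduce to a mean squared error and a comparison. The sketch below is a Python reference version that could be used to cross-check the Swift implementation; the function names are illustrative, and 0.015 is the threshold value quoted in the case studies later in this article.

```python
def reconstruction_error(normalized, reconstructed):
    """Mean squared error between a normalized input and its reconstruction."""
    if len(normalized) != len(reconstructed):
        raise ValueError("feature vectors must have the same length")
    squared = ((x - y) ** 2 for x, y in zip(normalized, reconstructed))
    return sum(squared) / len(normalized)


def is_anomaly(normalized, reconstructed, threshold=0.015):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(normalized, reconstructed) > threshold
```

With eight features, a single feature reconstructed 0.4 standard deviations off contributes 0.16 / 8 = 0.02, which alone is enough to cross a 0.015 threshold.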

Technical Deep Dive

1. Autoencoder Architecture

The model uses a bottleneck architecture to force dimensionality reduction:

Input Layer (8 features):
  - CPU Load Average (1 min)
  - Memory Pressure (%)
  - Swap Usage (bytes)
  - Disk Read/Write (bytes/sec)
  - Context Switches (per sec)
  - Network Sent/Received (bytes/sec)

Encoder:
  Linear(8 → 5) + ReLU
  Linear(5 → 3) + ReLU  ← Bottleneck (compressed representation)

Decoder:
  Linear(3 → 5) + ReLU
  Linear(5 → 8)  ← Reconstructed input (no activation)

Loss Function:
  MSE = (1/8) * Σ(input_i - reconstructed_i)²

Training:
  - Optimizer: Adam (lr=0.001, weight_decay=1e-5)
  - Epochs: 100 (early stopping on validation loss)
  - Batch Size: 64
  - Device: MPS (Metal Performance Shaders)

2. Why 3-Neuron Bottleneck?

The bottleneck forces the model to learn efficient compressed representations. If it can reconstruct 8 input features from just 3 latent dimensions, it has learned the underlying patterns. New anomalies (crypto-miners, memory leaks) produce states the model can't compress well → high reconstruction error.

3. Metal Performance Shaders (MPS) Training

Apple Silicon's GPU dramatically accelerates training:

import torch
import torch.nn.functional as F

# Autoencoder and dataloader are defined elsewhere in the training script
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = Autoencoder(input_dim=8, hidden_dim=5, latent_dim=3).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

# Training loop
model.train()
for epoch in range(100):
    for batch in dataloader:
        batch = batch.to(device)  # Move the batch to the MPS device (or CPU fallback)
        reconstructed = model(batch)
        loss = F.mse_loss(reconstructed, batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Benchmark: 86,400 samples, 100 epochs → ~2 minutes on M1 MacBook Pro
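The training configuration mentions early stopping on validation loss, which the loop above does not show. A minimal helper might look like the following; the class name, the `patience` default of 10 epochs, and `min_delta` are assumptions, not values from the project.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the epoch loop this would be used as `if stopper.step(val_loss): break`, with `val_loss` computed on the held-out split after each epoch.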

4. CoreML Conversion & Quantization

import json

import coremltools as ct
import torch

# Trace the PyTorch model on CPU (tracing does not need the MPS device)
model = model.cpu().eval()
example_input = torch.randn(1, 8)
traced_model = torch.jit.trace(model, example_input)

# Convert to an ML Program with FP16 compute precision
mlmodel = ct.convert(
    traced_model,
    convert_to="mlprogram",  # needed for compute_precision and the .mlpackage format
    inputs=[ct.TensorType(name="input", shape=(1, 8))],
    outputs=[ct.TensorType(name="output")],  # name the Swift code reads
    compute_precision=ct.precision.FLOAT16,  # ANE-compatible
    compute_units=ct.ComputeUnit.ALL,  # Use ANE/GPU/CPU as available
)

# Embed scaling parameters for Swift normalization
mlmodel.user_defined_metadata['mean'] = json.dumps(mean_values.tolist())
mlmodel.user_defined_metadata['std'] = json.dumps(std_values.tolist())
mlmodel.user_defined_metadata['threshold'] = str(threshold_95th_percentile)

mlmodel.save("CoreMetric.mlpackage")

5. Low-Level Swift Data Collection

Zero dependencies—direct Darwin kernel APIs for precision:

CPU Metrics via `host_statistics64`

import Darwin

func getCPULoad() -> Double {
    var loadInfo = host_cpu_load_info()
    var count = mach_msg_type_number_t(
        MemoryLayout<host_cpu_load_info>.size / MemoryLayout<integer_t>.size
    )

    let result = withUnsafeMutablePointer(to: &loadInfo) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            host_statistics64(mach_host_self(), HOST_CPU_LOAD_INFO, $0, &count)
        }
    }

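    guard result == KERN_SUCCESS else { return 0.0 }

    // cpu_ticks is indexed by CPU_STATE_USER (0), SYSTEM (1), IDLE (2), NICE (3).
    // The counters are cumulative since boot, so this ratio is the average
    // utilization since boot. Subtract the previous sample's ticks before
    // dividing to get utilization over the last sampling interval.
    let user = Double(loadInfo.cpu_ticks.0)
    let system = Double(loadInfo.cpu_ticks.1)
    let idle = Double(loadInfo.cpu_ticks.2)
    let nice = Double(loadInfo.cpu_ticks.3)

    let total = user + system + idle + nice
    return total > 0 ? (user + system + nice) / total : 0.0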
}
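Because the kernel's tick counters are cumulative since boot, a per-interval utilization has to be computed from the difference between two consecutive samples. The arithmetic is sketched below in Python, to match the training-side code; the function name and the tuple layout `(user, system, idle, nice)` are assumptions chosen to mirror the Swift function above.

```python
def cpu_utilization(prev_ticks, curr_ticks):
    """Utilization in [0, 1] between two (user, system, idle, nice) tick samples."""
    deltas = [c - p for p, c in zip(prev_ticks, curr_ticks)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no ticks elapsed, or the counters were reset
    user, system, idle, nice = deltas
    return (user + system + nice) / total
```

For example, going from `(100, 50, 800, 0)` to `(130, 60, 860, 0)` gives 40 busy ticks out of 100 elapsed, so a utilization of 0.4 for that interval.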

Memory Pressure via `host_statistics64` (HOST_VM_INFO64)

func getMemoryPressure() -> Double {
    var vmStats = vm_statistics64()
    var count = mach_msg_type_number_t(
        MemoryLayout<vm_statistics64>.size / MemoryLayout<integer_t>.size
    )

    let result = withUnsafeMutablePointer(to: &vmStats) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            host_statistics64(mach_host_self(), HOST_VM_INFO64, $0, &count)
        }
    }

    guard result == KERN_SUCCESS else { return 0.0 }

    let pageSize = Double(vm_kernel_page_size)
    let activeBytes = Double(vmStats.active_count) * pageSize
    let wiredBytes = Double(vmStats.wire_count) * pageSize

    // Total physical RAM
    var totalRAM: UInt64 = 0
    var size = MemoryLayout<UInt64>.size
    guard sysctlbyname("hw.memsize", &totalRAM, &size, nil, 0) == 0,
          totalRAM > 0 else { return 0.0 }

    // Fraction (0...1) of RAM that is active or wired; compressed pages are not counted
    return (activeBytes + wiredBytes) / Double(totalRAM)
}

Disk I/O via IOKit

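import Foundation
import IOKit

// Returns cumulative byte counters summed over all block storage drivers;
// diff consecutive samples to obtain bytes/sec.
func getDiskIO() -> (readBytes: UInt64, writeBytes: UInt64) {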
    let matchingDict = IOServiceMatching("IOBlockStorageDriver")
    var iterator: io_iterator_t = 0

    guard IOServiceGetMatchingServices(
        kIOMainPortDefault, matchingDict, &iterator
    ) == KERN_SUCCESS else {
        return (0, 0)
    }

    var totalRead: UInt64 = 0
    var totalWrite: UInt64 = 0

    while case let entry = IOIteratorNext(iterator), entry != 0 {
        if let stats = IORegistryEntryCreateCFProperty(
            entry, "Statistics" as CFString, kCFAllocatorDefault, 0
        )?.takeRetainedValue() as? [String: Any] {
            totalRead += (stats["Bytes (Read)"] as? UInt64) ?? 0
            totalWrite += (stats["Bytes (Write)"] as? UInt64) ?? 0
        }
        IOObjectRelease(entry)
    }

    IOObjectRelease(iterator)
    return (totalRead, totalWrite)
}

6. Real-Time Inference Pipeline

import Combine
import CoreML
import Foundation

class AnomalyDetector: ObservableObject {
    enum SetupError: Error { case missingMetadata(String) }

    private let model: CoreMetric
    private let mean: [Double]
    private let std: [Double]
    private let threshold: Double

    @Published var currentScore: Double = 0.0
    @Published var isAnomaly: Bool = false

    init() throws {
        // Load the generated model class; Core ML selects ANE/GPU/CPU at runtime
        let model = try CoreMetric(configuration: MLModelConfiguration())

        // Extract the scaling parameters embedded at conversion time
        let metadata = model.model.modelDescription.metadata[
            MLModelMetadataKey.creatorDefinedKey
        ] as? [String: String] ?? [:]

        func decode(_ key: String) throws -> [Double] {
            guard let data = metadata[key]?.data(using: .utf8) else {
                throw SetupError.missingMetadata(key)
            }
            return try JSONDecoder().decode([Double].self, from: data)
        }

        self.model = model
        self.mean = try decode("mean")
        self.std = try decode("std")
        self.threshold = metadata["threshold"].flatMap { Double($0) } ?? 0.015
    }

    func detectAnomaly(metrics: SystemMetrics) {
        // 1. Normalize input; guard against zero-variance features
        let rawValues = metrics.toArray()
        let normalized = zip(rawValues, zip(mean, std)).map {
            ($0 - $1.0) / max($1.1, 1e-8)
        }

        // 2. Create MLMultiArray (Float32 matches the converted model's input type)
        guard let input = try? MLMultiArray(shape: [1, 8], dataType: .float32) else {
            return
        }
        for (i, value) in normalized.enumerated() {
            input[i] = NSNumber(value: value)
        }

        // 3. Run inference (skip this sample if the prediction fails)
        guard let prediction = try? model.prediction(
            input: CoreMetricInput(input: input)
        ) else { return }

        // 4. Calculate MSE between the normalized input and its reconstruction
        let reconstructed = prediction.output
        let mse = zip(normalized, (0..<8).map {
            reconstructed[$0].doubleValue
        }).map {
            pow($0 - $1, 2)
        }.reduce(0, +) / 8.0

        // 5. Publish state on the main thread for SwiftUI
        DispatchQueue.main.async {
            self.currentScore = mse
            self.isAnomaly = mse > self.threshold
        }
    }
}

Performance Analysis

Inference Benchmarks

Compute Unit	Latency	Power Draw	Speedup
Apple Neural Engine (FP16)	1.2 ms	0.3 W	10× vs CPU
GPU (FP32)	3.8 ms	2.1 W	3× vs CPU
CPU (FP32)	12.5 ms	4.5 W	1× baseline

Tested on M1 MacBook Pro, 1000 inferences averaged.

System Overhead (1-hour continuous monitoring)

Metric	Baseline	With CoreMetric	Overhead
CPU Usage	2.3%	2.8%	+0.5%
Memory (RSS)	4.20 GB	4.23 GB	+30 MB
Energy Impact	Low	Low	Negligible
Battery Drain	—	—	<1% per hour

M1 MacBook Pro, macOS 14.5, 16GB RAM, 1-second sampling interval.

Training Performance (MPS vs CPU)

Device	Total Time (100 epochs)	Per-Epoch Time	Speedup
MPS (M1 GPU)	2m 05s	1.25s	8× faster
CPU (8-core M1)	16m 42s	10.0s	1× baseline

Dataset: 86,400 samples (24h @ 1Hz), batch size 64.

Real-World Anomaly Detection

Case Study 1: Cryptocurrency Miner

Symptoms:

Sustained 15% CPU usage during declared "idle" hours (2am-6am)
Elevated context switches (2× normal rate)
No corresponding disk or network activity

Detection:

Reconstruction Error: 0.042 (threshold: 0.015 → 2.8× over)
User's baseline idle CPU: 2-5% → 15% is 3-7× higher
Alert triggered within 5 minutes of miner starting

Outcome: User investigated, found malicious process xmrig, removed malware.

Case Study 2: Electron App Memory Leak

Symptoms:

Memory pressure climbing from 60% → 85% over 4 hours
Swap usage increasing linearly (50MB/hour)
No corresponding CPU spike or disk I/O

Detection:

Reconstruction Error: 0.038 (2.5× threshold)
Gradual memory growth without CPU/disk activity is atypical
Alert triggered after 90 minutes (error crossed threshold)

Outcome: User restarted leaking Electron app, memory normalized.

Case Study 3: False Positive Avoided (Xcode Build)

Symptoms:

CPU spiked to 95% for 8 minutes
Disk I/O at 200 MB/s (writing build artifacts)
Context switches 10× normal rate

Detection:

Reconstruction Error: 0.011 (below 0.015 threshold)
User compiles Xcode projects daily → model learned this as normal
No alert generated (correct behavior)

Outcome: Traditional threshold monitor (CPU > 90%) would have falsely alerted. CoreMetric correctly recognized expected pattern.

Privacy & Security Design

Zero-Knowledge Architecture

No Cloud Dependencies: All training and inference happen on-device. No telemetry servers, no API calls.
System-Level Metrics Only: CoreMetric reads aggregate CPU/RAM/disk stats. It never inspects process names, command-line arguments, file paths, or user data.
Local Storage: Training data stored in ~/Library/Application Support/CoreMetric/data/, encrypted at rest when FileVault is enabled.
App Sandbox: macOS App Sandbox enforces strict file access controls. CoreMetric can't read documents, photos, or other apps' private data.

Differential Privacy (Planned)

Future versions will support federated learning:

Encrypted Model Updates: Users opt-in to share anonymized model gradients (never raw metrics) encrypted with homomorphic encryption
Aggregate Patterns: Central server aggregates updates to improve global model, then redistributes to users
GDPR Compliance: No PII collected, users control data sharing, full transparency in privacy policy

Technical Challenges & Solutions

1. Cold Start Problem

Challenge: New machines lack training data. Model can't detect anomalies without baseline.

Solution:

Bundle pre-trained "generic macOS" model (trained on diverse anonymized datasets)
After 24 hours of user-specific collection, retrain personalized model
Gradual transition: blend generic model (80%) + user model (20%) initially, shift to 100% user model after 1 week
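The gradual transition can be read as a weighted blend of the two models' anomaly scores. The sketch below assumes a linear ramp and blends reconstruction errors rather than model weights; both choices, along with the function names, are assumptions, since the source only gives the 80/20 starting point and the one-week end point.

```python
def generic_weight(days_since_personalized, start=0.8, ramp_days=7.0):
    """Weight of the generic model: `start` on day 0, falling linearly to 0."""
    if days_since_personalized >= ramp_days:
        return 0.0
    elapsed = max(days_since_personalized, 0.0)
    return start * (1.0 - elapsed / ramp_days)


def blended_score(generic_error, user_error, days_since_personalized):
    """Blend the two reconstruction errors using the current generic weight."""
    w = generic_weight(days_since_personalized)
    return w * generic_error + (1.0 - w) * user_error
```

On day 0 the generic model contributes 80% of the score, at 3.5 days 40%, and from day 7 onward the personalized model is used alone.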

2. Non-Stationary Behavior

Challenge: Usage patterns evolve. User switches from web dev (low CPU, high RAM) to ML training (high CPU, high GPU). Model becomes stale.

Solution:

Weekly Incremental Retraining: Retrain every 7 days with exponential time decay
Weight Recent Data: Last 7 days weighted 80%, older data 20%
Continuous Learning: Model adapts to gradual behavior shifts without forgetting core patterns
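One way to reconcile "exponential time decay" with "last 7 days weighted 80%" is to pick the decay rate so that exp(-7 · rate) = 0.2, which gives rate = ln(5) / 7 per day. The sketch below shows that derivation as code; the names are illustrative and the project may weight samples differently.

```python
import math

# exp(-7 * rate) = 0.2, so samples newer than 7 days carry about 80% of the
# total weight when the history is long and uniformly sampled.
DECAY_PER_DAY = math.log(5) / 7


def sample_weight(age_days, rate=DECAY_PER_DAY):
    """Exponentially decaying weight for a sample that is `age_days` old."""
    return math.exp(-rate * age_days)
```

These weights could be applied as per-sample loss weights or used to drive a weighted sampler during retraining.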

3. Threshold Calibration

Challenge: Hard to tune anomaly threshold without labeled data. Too low → false positives. Too high → miss real anomalies.

Solution:

95th Percentile Rule: Set threshold at 95th percentile of training set reconstruction errors (assumes ≤5% training data contains mild anomalies)
User Feedback Loop: Allow users to mark false positives → incrementally adjust threshold
Validation Set: Hold out 20% of training data for threshold tuning before deployment
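The percentile rule is straightforward to compute from a list of reconstruction errors. Here is a minimal standard-library sketch using linear interpolation between order statistics; `percentile` and `calibrate_threshold` are hypothetical helper names, and other interpolation conventions would give slightly different values.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def calibrate_threshold(validation_errors, q=95.0):
    """Anomaly threshold: the q-th percentile of held-out reconstruction errors."""
    return percentile(validation_errors, q)
```

The resulting value is what the conversion script stores under the `threshold` metadata key.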

4. ANE Quantization Accuracy Loss

Challenge: FP32 → FP16 quantization introduced 2% accuracy drop, causing threshold miscalibration.

Solution:

Post-Quantization Calibration: Recalculate 95th percentile threshold using quantized model on validation set
A/B Testing: Compare FP32 (CPU) vs FP16 (ANE) thresholds, adjust ANE threshold +5% to compensate

5. Darwin API Documentation Gaps

Challenge: Apple's low-level kernel APIs (host_statistics64, IOKit) lack comprehensive guides.

Solution:

Read XNU kernel source code: apple/darwin-xnu
Reverse-engineer top and Activity Monitor behavior using dtrace
Validate metrics against vm_stat, iostat, sysctl outputs

Future Roadmap

Phase 1: Process Attribution (v0.2)

Goal: When anomaly detected, identify which process caused it
Approach: Use libproc to enumerate running processes, correlate CPU/memory deltas with anomaly timing
Privacy: Opt-in feature, process names stored locally only, never sent to cloud

Phase 2: Temporal Patterns (v0.3)

Goal: Capture time-series dependencies (daily/weekly cycles)
Approach: Replace Autoencoder with LSTM-Autoencoder (encode sequences of 60 samples = 1 minute windows)
Benefit: Detect anomalies like "CPU spike at unusual time" (e.g., 3am compile when user normally sleeps)
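Feeding a sequence model requires slicing the 1 Hz telemetry into fixed-length windows. A minimal sketch of that step follows; the function name and the `stride` parameter are assumptions, and the 60-sample window is the one-minute window named in the approach.

```python
def sliding_windows(samples, window=60, stride=1):
    """Yield windows of `window` consecutive samples, advancing by `stride`."""
    for start in range(0, len(samples) - window + 1, stride):
        yield samples[start:start + window]
```

A stride of 1 produces heavily overlapping windows for training, while a stride equal to the window size yields one non-overlapping window per minute.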

Phase 3: Energy Anomaly Detection (v0.4)

Goal: Detect abnormal battery drain patterns
Approach: Integrate IOPMCopySleepWakeTimeline API for power metrics, add battery discharge rate to input features
Use Case: Catch background processes draining battery during sleep

Phase 4: Federated Learning (v1.0)

Goal: Improve detection by aggregating anonymized model updates across users
Approach: Use differential privacy (ε=1.0 privacy budget), homomorphic encryption for gradient aggregation
Compliance: GDPR-compliant, fully opt-in, transparent privacy policy

Lessons Learned

MPS Training: Fast but Finicky

Metal Performance Shaders (MPS) dramatically accelerate training on Apple Silicon (8× faster than CPU), but debugging is harder than CUDA. Key takeaways:

Use torch.autograd.set_detect_anomaly(True) to catch gradient issues early
Some PyTorch operations lack MPS support. By default they raise NotImplementedError; setting PYTORCH_ENABLE_MPS_FALLBACK=1 runs them on the CPU instead, which degrades performance
Monitor torch.backends.mps.is_available() and torch.backends.mps.is_built() at runtime

CoreML Quantization Requires Validation

FP16 quantization introduced subtle accuracy drops. Always:

Re-validate threshold on quantized model using held-out validation set
A/B test FP32 vs FP16 predictions on sample data before deployment
Consider per-channel weight quantization (recent coremltools releases offer it; check the granularity options of the installed version)

Heisenberg's Monitoring Principle

A system monitor that consumes 5% CPU alters the very system it monitors. Design for <1% overhead by:

Using hardware acceleration (ANE) instead of CPU-bound inference
Sampling at 1Hz (not 10Hz)—most anomalies persist for minutes, not milliseconds
Avoiding high-level APIs (Foundation, Combine) for data collection—use Darwin C APIs

Privacy-First Design Builds Trust

First question from every beta tester: "Does this send data to the cloud?" Clear privacy guarantees must be:

Front-and-center: Stated in README, website, first-run dialog
Technically enforced: App Sandbox, no network entitlements, open-source code
Auditable: Training data stored in accessible location, model weights inspectable

Impact & Metrics

Detection Performance (Beta Testing)

True Positives: 12 crypto-miners, 8 memory leaks, 3 runaway processes detected across 30 beta testers (2-week period)
False Positives: 4 incidents (mostly first-time heavy workloads before model adapted)
False Negatives: 1 known (slow disk thrashing below sensitivity threshold)
Precision: 85.2% (23 TP / 27 total alerts)
Detection Latency: Average 7.3 minutes from anomaly start to alert (range: 2-18 min)

User Feedback Highlights

"Caught a crypto-miner I didn't know was running. Activity Monitor showed 15% CPU, which I thought was normal. CoreMetric flagged it immediately."
— Software Engineer, M1 MacBook Pro

"My Electron app had a memory leak. Traditional monitors just showed increasing RAM%. CoreMetric alerted me because the *pattern* was unusual—gradual growth without CPU spikes."
— Frontend Developer, M2 Mac Mini

"Finally, a monitor that doesn't scream at me when I compile code. It learned that's normal for me."
— iOS Developer, M1 Max MacBook Pro

Resource Efficiency

Battery Impact: <1% drain per hour on M1 MacBook Pro
Thermal Impact: No measurable temperature increase during continuous monitoring
Disk Usage: 24h training data: ~15 MB JSONL, CoreML model: 45 KB

Conclusion

CoreMetric demonstrates how modern machine learning—specifically one-class anomaly detection with Autoencoders—can fundamentally improve system monitoring. By learning individual usage patterns rather than enforcing universal thresholds, it achieves:

Personalization: Adapts to your workflow (developer, designer, scientist) without manual tuning
Precision: Detects subtle anomalies (15% CPU miners, gradual memory leaks) missed by traditional monitors
Efficiency: <1% CPU overhead via Apple Neural Engine acceleration
Privacy: Zero cloud dependencies, on-device processing, sandboxed architecture

The project bridges two ecosystems—Python's ML maturity (PyTorch, MPS) and Swift's native macOS integration (CoreML, SwiftUI, Darwin)—while adhering to Apple's design principles: performance, privacy, and polish.

CoreMetric is a technical proof-of-concept that neural-powered monitoring is not only feasible on consumer hardware but practical for everyday use.

GitHub: egekaya1/CoreMetric · Status: Work in Progress · License: MIT

SwiftUI

CoreML

PyTorch

Metal (MPS)

IOKit

Darwin Kernel

Case Study: CoreMetric — ML-Powered System Monitoring

20 min read

<1%

CPU Overhead

1.2ms

Inference Time

100%

On-Device

0.3W

Power Draw

The Problem: Threshold Fatigue

1. False Positives: Crying Wolf

Expected Heavy Workloads: Video encoding legitimately uses 95%+ CPU for hours. ML training consumes 24GB RAM. Game rendering saturates GPU. These aren't anomalies—they're normal for specific users.
Periodic Spikes: Time Machine backups spike disk I/O. Spotlight indexing hits CPU. Weekly builds max out cores. Thresholds can't distinguish routine patterns from genuine problems.
Alert Fatigue: Users disable notifications after too many false alarms, missing real issues later.

2. False Negatives: Silent Threats

Low-Level Abuse: Cryptocurrency miners using 15% CPU (below typical thresholds) run undetected for weeks. Adware processes with 3% CPU stay hidden.
Gradual Degradation: Memory leaks growing at 50MB/hour won't trigger alarms until swap thrashing begins hours later. Slowly accumulating disk writes evade detection.
Zombie Processes: Hung background tasks using 0% CPU but blocking resources never breach thresholds.

3. Personalization Gap

CoreMetric's Solution: Learn, Don't Guess

Instead of hard-coded thresholds, CoreMetric uses one-class machine learning:

Collect Baseline: Python daemon logs 24+ hours of normal usage (CPU, memory, disk I/O, context switches)
Train Neural Network: PyTorch Autoencoder learns to compress and reconstruct typical system states using Metal Performance Shaders (MPS) on Apple Silicon
Deploy Model: Convert to CoreML, quantize to FP16, embed in native SwiftUI app for Apple Neural Engine (ANE) acceleration
Detect Anomalies: Real-time inference measures reconstruction error—high error = unfamiliar system state = potential problem

Result: Personalized anomaly detection that adapts to your specific usage patterns without false positive noise.

Architecture: Dual-Pipeline Design

CoreMetric separates training (Python) from inference (Swift) to leverage the best tools for each phase:

The Factory: Python Training Pipeline

┌──────────────────────────────────────────────────┐
│          Python Training Environment              │
└──────────────────────────────────────────────────┘

  psutil.cpu_percent()         24h+ Telemetry
  psutil.virtual_memory()   ────────────────►  JSONL Logs
  psutil.disk_io_counters()                   (~86K samples)
  psutil.net_io_counters()
            │
            ▼
  ┌─────────────────────────────────────────┐
  │         Preprocessing Pipeline          │
  │  • Handle missing values (interpolate)  │
  │  • Normalize (Z-score: μ=0, σ=1)        │
  │  • Calculate scaling params (mean/std)  │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      PyTorch Reconstruction AE          │
  │  Architecture: 8 → 5 → 3 → 5 → 8        │
  │  Loss: MSE (input vs reconstructed)     │
  │  Optimizer: Adam (lr=0.001)             │
  │  Training: 100 epochs on MPS (GPU)      │
  │  Time: ~2 min on M1 MacBook Pro         │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │        CoreML Conversion                │
  │  • coremltools.convert()                │
  │  • Quantize FP32 → FP16 (ANE-ready)     │
  │  • Embed mean/std in metadata           │
  └─────────────────────────────────────────┘
            │
            ▼
     CoreMetric.mlpackage
   (Ready for Swift app)

The Product: Swift Inference Pipeline

┌──────────────────────────────────────────────────┐
│          Swift macOS Application                  │
└──────────────────────────────────────────────────┘

  host_statistics64()          Real-time Metrics
  mach_host_self()          ─────────────────►  SystemCollector.swift
  IOKit (C-Interop)                              (Every 1 second)
  libproc
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      Normalize Input Features           │
  │  (x - mean) / std                       │
  │  (using metadata from .mlpackage)       │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │      CoreML Model Inference             │
  │  Compute Unit: ANE (Apple Neural Eng.)  │
  │  Latency: 1.2ms per prediction          │
  │  Power: 0.3W (vs 4.5W on CPU)           │
  └─────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────────────────────────┐
  │    Calculate Reconstruction Error       │
  │  MSE = Σ(input - reconstructed)² / 8    │
  │  Threshold: 95th percentile (training)  │
  │  High MSE → ANOMALY detected            │
  └─────────────────────────────────────────┘
            │
            ▼
     Swift Charts Dashboard
      (+ macOS Notifications)

Technical Deep Dive

1. Autoencoder Architecture

The model uses a bottleneck architecture to force dimensionality reduction:

Input Layer (8 features):
  - CPU Load Average (1 min)
  - Memory Pressure (%)
  - Swap Usage (bytes)
  - Disk Read/Write (bytes/sec)
  - Context Switches (per sec)
  - Network Sent/Received (bytes/sec)

Encoder:
  Linear(8 → 5) + ReLU
  Linear(5 → 3) + ReLU  ← Bottleneck (compressed representation)

Decoder:
  Linear(3 → 5) + ReLU
  Linear(5 → 8)  ← Reconstructed input (no activation)

Loss Function:
  MSE = (1/8) * Σ(input_i - reconstructed_i)²

Training:
  - Optimizer: Adam (lr=0.001, weight_decay=1e-5)
  - Epochs: 100 (early stopping on validation loss)
  - Batch Size: 64
  - Device: MPS (Metal Performance Shaders)

2. Why 3-Neuron Bottleneck?

3. Metal Performance Shaders (MPS) Training

Apple Silicon's GPU dramatically accelerates training:

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = Autoencoder(input_dim=8, hidden_dim=5, latent_dim=3).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(100):
    for batch in dataloader:
        batch = batch.to(device)  # Transfer to GPU
        reconstructed = model(batch)
        loss = F.mse_loss(reconstructed, batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Benchmark: 86,000 samples, 100 epochs → ~2 minutes on M1 Pro

4. CoreML Conversion & Quantization

import coremltools as ct

# Trace PyTorch model
example_input = torch.randn(1, 8).to(device)
traced_model = torch.jit.trace(model.eval(), example_input)

# Convert to CoreML with FP16 quantization
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input", shape=(1, 8))],
    compute_precision=ct.precision.FLOAT16,  # ANE-compatible
    compute_units=ct.ComputeUnit.ALL  # Use ANE/GPU/CPU as available
)

# Embed scaling parameters for Swift normalization
mlmodel.user_defined_metadata['mean'] = json.dumps(mean_values.tolist())
mlmodel.user_defined_metadata['std'] = json.dumps(std_values.tolist())
mlmodel.user_defined_metadata['threshold'] = str(threshold_95th_percentile)

mlmodel.save("CoreMetric.mlpackage")

5. Low-Level Swift Data Collection

Zero dependencies—direct Darwin kernel APIs for precision:

CPU Metrics via `host_statistics64`

import Darwin

func getCPULoad() -> Double {
    var loadInfo = host_cpu_load_info()
    var count = mach_msg_type_number_t(
        MemoryLayout<host_cpu_load_info>.size / MemoryLayout<integer_t>.size
    )

    let result = withUnsafeMutablePointer(to: &loadInfo) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            host_statistics64(mach_host_self(), HOST_CPU_LOAD_INFO, $0, &count)
        }
    }

    guard result == KERN_SUCCESS else { return 0.0 }

    let user = Double(loadInfo.cpu_ticks.0)
    let system = Double(loadInfo.cpu_ticks.1)
    let idle = Double(loadInfo.cpu_ticks.2)
    let nice = Double(loadInfo.cpu_ticks.3)

    let total = user + system + idle + nice
    return total > 0 ? (user + system + nice) / total : 0.0
}

Memory Pressure via `mach_host_self`

func getMemoryPressure() -> Double {
    var vmStats = vm_statistics64()
    var count = mach_msg_type_number_t(
        MemoryLayout<vm_statistics64>.size / MemoryLayout<integer_t>.size
    )

    let result = withUnsafeMutablePointer(to: &vmStats) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            host_statistics64(mach_host_self(), HOST_VM_INFO64, $0, &count)
        }
    }

    guard result == KERN_SUCCESS else { return 0.0 }

    let pageSize = vm_kernel_page_size
    let activeBytes = Double(vmStats.active_count) * Double(pageSize)
    let wiredBytes = Double(vmStats.wire_count) * Double(pageSize)

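    // Total physical RAM; bail out instead of dividing by zero if the lookup fails
    var totalRAM: UInt64 = 0
    var size = MemoryLayout<UInt64>.size
    guard sysctlbyname("hw.memsize", &totalRAM, &size, nil, 0) == 0,
          totalRAM > 0 else { return 0.0 }

    return (activeBytes + wiredBytes) / Double(totalRAM)
}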

Disk I/O via IOKit

import IOKit

func getDiskIO() -> (readBytes: UInt64, writeBytes: UInt64) {
    let matchingDict = IOServiceMatching("IOBlockStorageDriver")
    var iterator: io_iterator_t = 0

    guard IOServiceGetMatchingServices(
        kIOMainPortDefault, matchingDict, &iterator
    ) == KERN_SUCCESS else {
        return (0, 0)
    }

    var totalRead: UInt64 = 0
    var totalWrite: UInt64 = 0

    while case let entry = IOIteratorNext(iterator), entry != 0 {
        if let stats = IORegistryEntryCreateCFProperty(
            entry, "Statistics" as CFString, kCFAllocatorDefault, 0
        )?.takeRetainedValue() as? [String: Any] {
            totalRead += (stats["Bytes (Read)"] as? UInt64) ?? 0
            totalWrite += (stats["Bytes (Write)"] as? UInt64) ?? 0
        }
        IOObjectRelease(entry)
    }

    IOObjectRelease(iterator)
    return (totalRead, totalWrite)
}
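
The byte totals read from the "Statistics" dictionary are running counters, so a throughput figure such as MB/s needs two readings and the time between them. A minimal Python sketch of that conversion follows; `throughput_mb_per_s` is a hypothetical helper, and clamping negative deltas to zero is an assumption about how to handle a total that shrinks (for example, if a drive disappears between samples).

```python
def throughput_mb_per_s(prev_bytes: int, curr_bytes: int, interval_s: float) -> float:
    """Convert two cumulative byte counters into an average rate in MB/s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Assumption: a shrinking total carries no usable rate, so report zero.
    delta = max(curr_bytes - prev_bytes, 0)
    return delta / interval_s / 1_000_000


# Hypothetical readings one second apart.
print(throughput_mb_per_s(5_000_000_000, 5_200_000_000, 1.0))  # prints 200.0
```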

6. Real-Time Inference Pipeline

import CoreML

class AnomalyDetector: ObservableObject {
    private let model: CoreMetric
    private let mean: [Double]
    private let std: [Double]
    private let threshold: Double

    @Published var currentScore: Double = 0.0
    @Published var isAnomaly: Bool = false

    init() throws {
        // Load model
        self.model = try CoreMetric(configuration: MLModelConfiguration())

        // Extract the scaling parameters embedded at export time
        let metadata = model.model.modelDescription.metadata[
            MLModelMetadataKey.creatorDefinedKey
        ] as? [String: String] ?? [:]

        // Fail with an error instead of crashing if the metadata is missing
        guard let meanData = metadata["mean"]?.data(using: .utf8),
              let stdData = metadata["std"]?.data(using: .utf8) else {
            throw CocoaError(.coderValueNotFound)
        }

        self.mean = try JSONDecoder().decode([Double].self, from: meanData)
        self.std = try JSONDecoder().decode([Double].self, from: stdData)

        // Fall back to 0.015 if the threshold is missing or malformed
        self.threshold = metadata["threshold"].flatMap { Double($0) } ?? 0.015
    }

    func detectAnomaly(metrics: SystemMetrics) {
        // 1. Normalize input (z-score); zero-variance features map to 0
        let rawValues = metrics.toArray()
        let normalized = zip(rawValues, zip(mean, std)).map {
            $1.1 > 0 ? ($0 - $1.0) / $1.1 : 0.0
        }

        do {
            // 2. Create MLMultiArray
            let input = try MLMultiArray(shape: [1, 8], dataType: .double)
            for (i, value) in normalized.enumerated() {
                input[i] = NSNumber(value: value)
            }

            // 3. Run inference (ANE accelerated)
            let prediction = try model.prediction(
                input: CoreMetricInput(input: input)
            )

            // 4. Calculate MSE
            let reconstructed = prediction.output
            let mse = zip(normalized, (0..<8).map {
                reconstructed[$0].doubleValue
            }).map {
                pow($0 - $1, 2)
            }.reduce(0, +) / 8.0

            // 5. Update state
            DispatchQueue.main.async {
                self.currentScore = mse
                self.isAnomaly = mse > self.threshold
            }
        } catch {
            // Skip this sample instead of crashing the monitor
            print("CoreMetric inference failed: \(error)")
        }
    }
}
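
The scoring logic in the Swift class (normalize, reconstruct, take the mean squared error, compare against the threshold) is small enough to mirror in Python, which can be handy for checking that both sides of the pipeline agree on the same sample. The sketch below is a reference under stated assumptions: the function names are hypothetical, the reconstruction is passed in rather than produced by a model, and mapping zero-variance features to 0.0 is a choice made here.

```python
from typing import List, Sequence

DEFAULT_THRESHOLD = 0.015  # same fallback value as the Swift initializer


def normalize(raw: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Z-score each feature; zero-variance features map to 0.0 (an assumption)."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(raw, mean, std)]


def reconstruction_error(inputs: Sequence[float], outputs: Sequence[float]) -> float:
    """Mean squared error between the normalized input and its reconstruction."""
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must be non-empty and equal in length")
    return sum((a - b) ** 2 for a, b in zip(inputs, outputs)) / len(inputs)


def is_anomaly(error: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Flag the sample when its reconstruction error exceeds the threshold."""
    return error > threshold


# Hypothetical values standing in for a real sample and a real model output.
features = normalize([50.0, 8.0, 3.0, 1.0], mean=[40.0, 8.0, 4.0, 0.5], std=[5.0, 0.0, 1.0, 1.0])
error = reconstruction_error(features, [1.75, 0.0, -1.0, 0.5])
print(features, error, is_anomaly(error))  # [2.0, 0.0, -1.0, 0.5] 0.015625 True
```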

Performance Analysis

Inference Benchmarks

Compute Unit	Latency	Power Draw	Speedup
Apple Neural Engine (FP16)	1.2 ms	0.3 W	10× vs CPU
GPU (FP32)	3.8 ms	2.1 W	3× vs CPU
CPU (FP32)	12.5 ms	4.5 W	1× baseline

Tested on M1 MacBook Pro, 1000 inferences averaged.

System Overhead (1-hour continuous monitoring)

Metric	Baseline	With CoreMetric	Overhead
CPU Usage	2.3%	2.8%	+0.5%
Memory (RSS)	4.20 GB	4.23 GB	+30 MB
Energy Impact	Low	Low	Negligible
Battery Drain	—	—	<1% per hour

M1 MacBook Pro, macOS 14.5, 16GB RAM, 1-second sampling interval.

Training Performance (MPS vs CPU)

Device	Total Time (100 epochs)	Per-Epoch Time	Speedup
MPS (M1 GPU)	2m 05s	1.25s	8× faster
CPU (8-core M1)	16m 42s	10.0s	1× baseline

Dataset: 86,400 samples (24h @ 1Hz), batch size 64.

Real-World Anomaly Detection

Case Study 1: Cryptocurrency Miner

Symptoms:

Sustained 15% CPU usage during declared "idle" hours (2am-6am)
Elevated context switches (2× normal rate)
No corresponding disk or network activity

Detection:

Reconstruction Error: 0.042 (threshold: 0.015 → 2.8× over)
User's baseline idle CPU: 2-5% → 15% is 3-7× higher
Alert triggered within 5 minutes of miner starting

Outcome: User investigated, found malicious process xmrig, removed malware.

Case Study 2: Electron App Memory Leak

Symptoms:

Memory pressure climbing from 60% → 85% over 4 hours
Swap usage increasing linearly (50MB/hour)
No corresponding CPU spike or disk I/O

Detection:

Reconstruction Error: 0.038 (2.5× threshold)
Gradual memory growth without CPU/disk activity is atypical
Alert triggered after 90 minutes (error crossed threshold)

Outcome: User restarted leaking Electron app, memory normalized.

Case Study 3: False Positive Avoided (Xcode Build)

Symptoms:

CPU spiked to 95% for 8 minutes
Disk I/O at 200 MB/s (writing build artifacts)
Context switches 10× normal rate

Detection:

Reconstruction Error: 0.011 (below 0.015 threshold)
User compiles Xcode projects daily → model learned this as normal
No alert generated (correct behavior)

Outcome: A traditional threshold monitor (CPU > 90%) would have raised a false alert. CoreMetric correctly recognized the build as an expected pattern.

Privacy & Security Design

Zero-Knowledge Architecture

No Cloud Dependencies: All training and inference happen on-device. No telemetry servers, no API calls.
System-Level Metrics Only: CoreMetric reads aggregate CPU/RAM/disk stats. It never inspects process names, command-line arguments, file paths, or user data.
Local Storage: Training data is stored in ~/Library/Application Support/CoreMetric/data/ and is encrypted at rest when FileVault is enabled.
App Sandbox: macOS App Sandbox enforces strict file access controls. CoreMetric can't read documents, photos, or other apps' private data.

Differential Privacy (Planned)

Future versions will support federated learning:

Encrypted Model Updates: Users opt in to share anonymized model gradients (never raw metrics), protected with homomorphic encryption
Aggregate Patterns: Central server aggregates updates to improve global model, then redistributes to users
GDPR Compliance: No PII collected, users control data sharing, full transparency in privacy policy

Technical Challenges & Solutions

1. Cold Start Problem

Challenge: New machines lack training data. Model can't detect anomalies without baseline.

Solution:

Bundle pre-trained "generic macOS" model (trained on diverse anonymized datasets)
After 24 hours of user-specific collection, retrain personalized model
Gradual transition: blend generic model (80%) + user model (20%) initially, shift to 100% user model after 1 week
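
The 80/20 to 100% transition can be sketched as a weight on the user model that ramps up over the first week. The text does not say how the two models are combined, so this Python sketch assumes a linear ramp and a weighted average of the two reconstruction errors; the function names and example values are hypothetical.

```python
def user_model_weight(days_since_personalized: float,
                      ramp_days: float = 7.0,
                      initial_weight: float = 0.2) -> float:
    """Weight of the personalized model: 0.2 at day 0, 1.0 from day 7 (linear ramp assumed)."""
    progress = min(max(days_since_personalized / ramp_days, 0.0), 1.0)
    return initial_weight + (1.0 - initial_weight) * progress


def blended_score(generic_error: float, user_error: float,
                  days_since_personalized: float) -> float:
    """Weighted average of the two models' reconstruction errors."""
    w = user_model_weight(days_since_personalized)
    return (1.0 - w) * generic_error + w * user_error


# Hypothetical errors for one sample, scored at three points in the transition.
for day in (0.0, 3.5, 7.0):
    print(day, round(blended_score(0.020, 0.010, day), 4))
```

Blending scores rather than model weights keeps both models intact, so the generic model can still serve as a fallback if the personalized one has to be retrained.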

2. Non-Stationary Behavior

Challenge: Usage patterns evolve. User switches from web dev (low CPU, high RAM) to ML training (high CPU, high GPU). Model becomes stale.

Solution:

Weekly Incremental Retraining: Retrain every 7 days with exponential time decay
Weight Recent Data: Last 7 days weighted 80%, older data 20%
Continuous Learning: Model adapts to gradual behavior shifts without forgetting core patterns
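
One way to realize the 80/20 weighting is to assign per-sample weights so that samples from the last 7 days share 80% of the total weight and older samples share the remaining 20%. The Python sketch below implements that step-wise split, not a true exponential decay; `recency_weights` is a hypothetical helper, and the uniform fallback when only one age group is present is an assumption.

```python
from typing import List, Sequence


def recency_weights(ages_days: Sequence[float],
                    recent_window_days: float = 7.0,
                    recent_share: float = 0.8) -> List[float]:
    """Per-sample weights summing to 1.0, with recent samples sharing `recent_share`."""
    if not ages_days:
        return []
    recent = [age <= recent_window_days for age in ages_days]
    n_recent = sum(recent)
    n_old = len(ages_days) - n_recent
    if n_recent == 0 or n_old == 0:
        # Only one age group present: fall back to uniform weights.
        return [1.0 / len(ages_days)] * len(ages_days)
    return [recent_share / n_recent if is_recent else (1.0 - recent_share) / n_old
            for is_recent in recent]


# Hypothetical sample ages in days: two recent samples, three older ones.
weights = recency_weights([1, 3, 10, 20, 30])
print([round(w, 4) for w in weights])  # [0.4, 0.4, 0.0667, 0.0667, 0.0667]
```

Weights of this shape could be passed to a weighted sampler or multiplied into the per-sample loss during retraining.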

3. Threshold Calibration

Challenge: Hard to tune anomaly threshold without labeled data. Too low → false positives. Too high → miss real anomalies.

Solution:

95th Percentile Rule: Set threshold at 95th percentile of training set reconstruction errors (assumes ≤5% training data contains mild anomalies)
User Feedback Loop: Allow users to mark false positives → incrementally adjust threshold
Validation Set: Hold out 20% of training data for threshold tuning before deployment
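
The 95th percentile rule reduces to sorting the reconstruction errors and picking the value below which 95% of them fall. A minimal Python sketch using the nearest-rank definition of a percentile follows; the function name and the synthetic error values are illustrative, and other percentile definitions (for example, linear interpolation) would give slightly different thresholds.

```python
import math
from typing import Sequence


def percentile_threshold(errors: Sequence[float], percentile: float = 95.0) -> float:
    """Nearest-rank percentile of the reconstruction errors."""
    if not errors:
        raise ValueError("errors must not be empty")
    ordered = sorted(errors)
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


# Synthetic errors 0.001, 0.002, ..., 0.100 standing in for a validation set.
validation_errors = [i / 1000 for i in range(1, 101)]
print(percentile_threshold(validation_errors))  # prints 0.095
```

Running the same function on errors from the held-out 20% rather than the training set gives a threshold that is less biased toward samples the model has already fit.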

4. ANE Quantization Accuracy Loss

Challenge: FP32 → FP16 quantization introduced 2% accuracy drop, causing threshold miscalibration.

Solution:

Post-Quantization Calibration: Recalculate 95th percentile threshold using quantized model on validation set
A/B Testing: Compare FP32 (CPU) vs FP16 (ANE) thresholds, adjust ANE threshold +5% to compensate

5. Darwin API Documentation Gaps

Challenge: Apple's low-level kernel APIs (host_statistics64, IOKit) lack comprehensive guides.

Solution:

Read XNU kernel source code: apple/darwin-xnu
Reverse-engineer top and Activity Monitor behavior using dtrace
Validate metrics against vm_stat, iostat, sysctl outputs

Future Roadmap

Phase 1: Process Attribution (v0.2)

Goal: When anomaly detected, identify which process caused it
Approach: Use libproc to enumerate running processes, correlate CPU/memory deltas with anomaly timing
Privacy: Opt-in feature, process names stored locally only, never sent to cloud
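
The correlation step could start as simply as ranking processes by how much their CPU share changed between a pre-anomaly snapshot and a snapshot taken during the anomaly. The Python sketch below shows that ranking only; the snapshot dictionaries, the `rank_by_cpu_delta` helper, and the process figures are hypothetical, and a real implementation would obtain them through libproc.

```python
from typing import Dict, List, Tuple


def rank_by_cpu_delta(before: Dict[str, float],
                      during: Dict[str, float],
                      top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank processes by CPU% increase; processes absent from `before` count from 0."""
    deltas = {name: cpu - before.get(name, 0.0) for name, cpu in during.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical per-process CPU percentages.
baseline = {"Xcode": 5.0, "Safari": 3.0}
anomaly = {"Xcode": 6.0, "Safari": 3.0, "xmrig": 15.0}
print(rank_by_cpu_delta(baseline, anomaly))
# [('xmrig', 15.0), ('Xcode', 1.0), ('Safari', 0.0)]
```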

Phase 2: Temporal Patterns (v0.3)

Goal: Capture time-series dependencies (daily/weekly cycles)
Approach: Replace Autoencoder with LSTM-Autoencoder (encode sequences of 60 samples = 1 minute windows)
Benefit: Detect anomalies like "CPU spike at unusual time" (e.g., 3am compile when user normally sleeps)
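
Feeding a sequence model requires slicing the 1 Hz sample stream into fixed-length windows. A minimal Python sketch of that slicing follows; the `sliding_windows` helper and the stride parameter are assumptions, since the roadmap only specifies the 60-sample window length.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(samples: Sequence[T], window: int = 60, stride: int = 1) -> List[Sequence[T]]:
    """Return every full window of `window` consecutive samples, `stride` apart."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [samples[i:i + window] for i in range(0, len(samples) - window + 1, stride)]


# Two minutes of hypothetical 1 Hz samples yield 61 overlapping one-minute windows.
stream = list(range(120))
windows = sliding_windows(stream)
print(len(windows), len(windows[0]))  # prints 61 60
```

A stride of 1 maximizes training data at the cost of heavily overlapping windows; a stride equal to the window length gives non-overlapping minutes.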

Phase 3: Energy Anomaly Detection (v0.4)

Goal: Detect abnormal battery drain patterns
Approach: Integrate IOPMCopySleepWakeTimeline API for power metrics, add battery discharge rate to input features
Use Case: Catch background processes draining battery during sleep

Phase 4: Federated Learning (v1.0)

Goal: Improve detection by aggregating anonymized model updates across users
Approach: Use differential privacy (ε=1.0 privacy budget), homomorphic encryption for gradient aggregation
Compliance: GDPR-compliant, fully opt-in, transparent privacy policy

Lessons Learned

MPS Training: Fast but Finicky

Metal Performance Shaders (MPS) dramatically accelerate training on Apple Silicon (8× faster than CPU), but debugging is harder than CUDA. Key takeaways:

Use torch.autograd.set_detect_anomaly(True) to catch gradient issues early
Some PyTorch operations lack MPS support. They raise NotImplementedError unless PYTORCH_ENABLE_MPS_FALLBACK=1 is set, in which case they run on the CPU and degrade performance
Monitor torch.backends.mps.is_available() and torch.backends.mps.is_built() at runtime

CoreML Quantization Requires Validation

FP16 quantization introduced subtle accuracy drops. Always:

Re-validate threshold on quantized model using held-out validation set
A/B test FP32 vs FP16 predictions on sample data before deployment
Consider per-channel weight quantization where the installed coremltools version supports it

Heisenberg's Monitoring Principle

A system monitor that consumes 5% CPU alters the very system it monitors. Design for <1% overhead by:

Using hardware acceleration (ANE) instead of CPU-bound inference
Sampling at 1Hz (not 10Hz)—most anomalies persist for minutes, not milliseconds
Avoiding high-level APIs (Foundation, Combine) for data collection—use Darwin C APIs

Privacy-First Design Builds Trust

First question from every beta tester: "Does this send data to the cloud?" Clear privacy guarantees must be:

Front-and-center: Stated in README, website, first-run dialog
Technically enforced: App Sandbox, no network entitlements, open-source code
Auditable: Training data stored in accessible location, model weights inspectable

Impact & Metrics

Detection Performance (Beta Testing)

True Positives: 12 crypto-miners, 8 memory leaks, 3 runaway processes detected across 30 beta testers (2-week period)
False Positives: 4 incidents (mostly first-time heavy workloads before model adapted)
False Negatives: 1 known (slow disk thrashing below sensitivity threshold)
Precision: 85.2% (23 TP / 27 total alerts)
Detection Latency: Average 7.3 minutes from anomaly start to alert (range: 2-18 min)
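
Precision and recall follow directly from the incident counts listed above: 23 true positives (12 + 8 + 3), 4 false positives, and 1 known false negative. A short Python sketch of the arithmetic follows, with hypothetical helper names; recall is not reported in the beta results and is only as reliable as the single known false negative.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of alerts that pointed at a real problem."""
    alerts = true_positives + false_positives
    return true_positives / alerts if alerts else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real problems that produced an alert."""
    incidents = true_positives + false_negatives
    return true_positives / incidents if incidents else 0.0


tp = 12 + 8 + 3  # crypto-miners + memory leaks + runaway processes
print(f"precision={precision(tp, 4):.1%} recall={recall(tp, 1):.1%}")
# precision=85.2% recall=95.8%
```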

User Feedback Highlights

"Caught a crypto-miner I didn't know was running. Activity Monitor showed 15% CPU, which I thought was normal. CoreMetric flagged it immediately."
— Software Engineer, M1 MacBook Pro

"My Electron app had a memory leak. Traditional monitors just showed increasing RAM%. CoreMetric alerted me because the *pattern* was unusual—gradual growth without CPU spikes."
— Frontend Developer, M2 Mac Mini

"Finally, a monitor that doesn't scream at me when I compile code. It learned that's normal for me."
— iOS Developer, M1 Max MacBook Pro

Resource Efficiency

Battery Impact: <1% drain per hour on M1 MacBook Pro
Thermal Impact: No measurable temperature increase during continuous monitoring
Disk Usage: 24h training data: ~15 MB JSONL, CoreML model: 45 KB

Conclusion

Personalization: Adapts to your workflow (developer, designer, scientist) without manual tuning
Precision: Detects subtle anomalies (15% CPU miners, gradual memory leaks) missed by traditional monitors
Efficiency: <1% CPU overhead via Apple Neural Engine acceleration
Privacy: Zero cloud dependencies, on-device processing, sandboxed architecture

CoreMetric is a technical proof-of-concept that neural-powered monitoring is not only feasible on consumer hardware but practical for everyday use.

GitHub: egekaya1/CoreMetric · Status: Work in Progress · License: MIT
