CoreMetric: ML-Powered System Monitoring on macOS
The Problem with Traditional Monitoring
Traditional system monitors rely on hard-coded thresholds: alert when CPU exceeds 90%, warn when memory usage hits 80%, panic when disk I/O saturates. This approach has three critical flaws:
- False Positives: Video encoding legitimately uses 95%+ CPU. A data scientist's ML training regularly consumes 24GB RAM. These aren't anomalies—they're expected workload patterns.
- False Negatives: A crypto-miner using 15% CPU flies under the radar. A memory leak growing by 50MB/hour won't trigger alarms for days. Frozen background processes don't breach thresholds but still harm system health.
- Personalization Gap: A software engineer's "normal" differs drastically from a graphic designer's. Static rules can't adapt to individual machine personalities.
CoreMetric solves this by learning your machine's baseline behavior through a neural network, detecting deviations from normality rather than absolute threshold violations. It runs entirely on-device using the Apple Neural Engine, achieving <1% CPU overhead while processing metrics in real-time.
System Architecture: The Factory vs The Product
CoreMetric splits into two distinct pipelines with radically different environments:
The Factory (Python/Training)
This is where the model learns. A Python daemon collects 24+ hours of telemetry (CPU load, memory pressure, disk I/O, context switches, network activity) and trains a Reconstruction Autoencoder to compress and reconstruct "normal" system states.
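A minimal sketch of such a collector, assuming an illustrative metrics.jsonl output path and field names (not CoreMetric's actual schema):

```python
import json, time
import psutil

# Minimal telemetry collector sketch: sample system counters once per
# second and append one JSON object per line (JSONL).
with open("metrics.jsonl", "a") as log:
    while True:
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        sample = {
            "ts": time.time(),
            "cpu_percent": psutil.cpu_percent(interval=None),
            "mem_percent": psutil.virtual_memory().percent,
            "swap_percent": psutil.swap_memory().percent,
            "disk_read": disk.read_bytes,    # cumulative; diff successive samples
            "disk_write": disk.write_bytes,
            "ctx_switches": psutil.cpu_stats().ctx_switches,
            "net_sent": net.bytes_sent,
            "net_recv": net.bytes_recv,
        }
        log.write(json.dumps(sample) + "\n")
        log.flush()
        time.sleep(1)
```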
```
┌─────────────────────────────────────────────────┐
│             Python Training Pipeline            │
└─────────────────────────────────────────────────┘

psutil.cpu_percent()            Raw Telemetry
psutil.virtual_memory()  ─────► JSONL Logs
psutil.disk_io_counters()       (24h+)
            │
            ▼
┌────────────────────┐
│   Preprocessing    │
│ • Normalize        │        PyTorch Autoencoder
│ • Handle NaNs      │ ─────► (MPS Training)
│ • Feature scale    │
└────────────────────┘
            │
            ▼
┌────────────────────────────────────────┐
│   Trained Model + Scaling Parameters   │
│      (Mean/Std for normalization)      │
└────────────────────────────────────────┘
            │
            ▼
    coremltools.convert()
            │
            ▼
    CoreMetric.mlpackage
    (Quantized for ANE)
```

The Product (Swift/Inference)
The macOS app embeds the trained .mlpackage and uses bare-metal Darwin APIs to collect live metrics. The model runs on the Apple Neural Engine, achieving hardware acceleration with negligible battery impact.
```
┌─────────────────────────────────────────────────┐
│             Swift macOS Application             │
└─────────────────────────────────────────────────┘

host_statistics64()             Real-time Metrics
libproc (C-Interop)     ─────►  Swift Collector
IOKit Framework                 (Every 1s)
            │
            ▼
┌────────────────────┐
│  Normalize Input   │
│  (using embedded   │        CoreML Model
│  Mean/Std from     │ ─────► ANE/GPU
│  training)         │        (Inference)
└────────────────────┘
            │
            ▼
┌────────────────────────────────────────┐
│      Reconstruction Error (MSE)        │
│      High error = Anomalous state      │
└────────────────────────────────────────┘
            │
            ▼
    Swift Charts Dashboard
    (Visual feedback)
```

The ML Approach: Reconstruction Autoencoders
Why Not Classification?
Traditional supervised learning requires labeled examples: "This is normal, this is malware, this is a memory leak." But anomalies are rare, diverse, and evolve constantly. We'd never collect enough representative samples.
Instead, CoreMetric uses one-class learning: train exclusively on "normal" data, then flag anything the model can't reconstruct as anomalous.
Autoencoder Architecture
```
Input Layer (8 features)
        │
        ▼
┌─────────┐
│ Encoder │  Linear(8 → 5) + ReLU
│         │  Linear(5 → 3) + ReLU   ← Bottleneck (compressed state)
└─────────┘
        │
        ▼
┌─────────┐
│ Decoder │  Linear(3 → 5) + ReLU
│         │  Linear(5 → 8)          ← Reconstructed input
└─────────┘
        │
        ▼
Reconstruction Loss (MSE)
        │
        ▼
If MSE > threshold → ANOMALY
```

Input Features (8 Dimensions)
- CPU Load Average (1m): Smoothed CPU usage over 60 seconds
- Memory Pressure: Active + Wired memory as % of total
- Swap Usage: Virtual memory paging activity
- Disk Read/Write Bytes: Per-second throughput
- Context Switches: Kernel thread switching rate (high = thrashing)
- Network Bytes Sent/Received: Per-second bandwidth
Why 8? Enough to capture system state without overwhelming the model. The bottleneck layer (3 neurons) forces the model to learn efficient compressed representations.
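A minimal PyTorch sketch of this architecture, matching the layer sizes in the diagram above (the class name is the one used in the training loop below):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """8 → 5 → 3 bottleneck, then mirrored back to 8 (see diagram above)."""
    def __init__(self, input_dim: int = 8, hidden_dim: int = 5, latent_dim: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim), nn.ReLU(),  # compressed state
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),  # reconstructed input (no activation)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```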
Training on Apple Silicon (MPS)
PyTorch natively supports Metal Performance Shaders (MPS) on M-series chips, offloading matrix operations to the GPU:
```python
import torch
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = Autoencoder(input_dim=8, hidden_dim=5, latent_dim=3).to(device)
optimizer = torch.optim.Adam(model.parameters())

# Training loop (MSE loss, Adam optimizer)
for epoch in range(100):
    for batch in dataloader:
        batch = batch.to(device)                 # Move batch to the GPU
        reconstructed = model(batch)
        loss = F.mse_loss(reconstructed, batch)  # Reconstruction error
        optimizer.zero_grad()                    # Clear gradients from the previous step
        loss.backward()
        optimizer.step()
```

Result: Training on 24h of 1-second samples (~86,000 data points) takes ~2 minutes on an M1 MacBook Pro.
CoreML Conversion: From PyTorch to ANE
Quantization for Efficiency
The Apple Neural Engine (ANE) excels at low-precision arithmetic. We quantize the model from FP32 → FP16, cutting memory usage in half with negligible accuracy loss:
```python
import json
import coremltools as ct
import torch

# Convert the PyTorch model to CoreML (traced on an example input)
traced_model = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=(1, 8))],
    convert_to="mlprogram",                  # ML Program backend, required for .mlpackage and FP16
    compute_precision=ct.precision.FLOAT16,  # Quantize to FP16
)

# Embed the training-set scaling statistics in the model metadata
mlmodel.user_defined_metadata['mean'] = json.dumps(mean_values.tolist())
mlmodel.user_defined_metadata['std'] = json.dumps(std_values.tolist())
mlmodel.save("CoreMetric.mlpackage")
```

Why Embed Scaling Parameters?
The model expects normalized inputs (mean=0, std=1). By storing training-time statistics in the .mlpackage metadata, the Swift app auto-calibrates without hardcoding values:
```swift
// Swift: Extract the creator-defined metadata from the CoreML model
let creatorDefined = model.model.modelDescription
    .metadata[MLModelMetadataKey.creatorDefinedKey] as? [String: String]
guard let meanJSON = creatorDefined?["mean"] else { fatalError("Missing scaling metadata") }
let mean = try JSONDecoder().decode([Double].self, from: Data(meanJSON.utf8))
// ... decode "std" the same way ...

// Normalize live metrics element-wise using the training statistics
let normalizedInput = zip(rawMetrics, zip(mean, std)).map { ($0 - $1.0) / $1.1 }
```

Low-Level Data Collection in Swift
Why Not Use Third-Party Libraries?
Precision matters. A monitoring tool can't introduce overhead that alters system behavior (Heisenberg's monitoring principle). We bypass high-level APIs and talk directly to the Darwin kernel.
CPU Metrics via host_statistics

```swift
import Darwin

// Note: cpu_ticks are cumulative since boot. For an instantaneous load
// percentage, sample twice and compute the delta between readings.
func getCPULoad() -> Double {
    var loadInfo = host_cpu_load_info()
    var count = mach_msg_type_number_t(MemoryLayout<host_cpu_load_info>.size / MemoryLayout<integer_t>.size)
    let result = withUnsafeMutablePointer(to: &loadInfo) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) { intPtr in
            // HOST_CPU_LOAD_INFO is served by host_statistics (host_statistics64 handles HOST_VM_INFO64)
            host_statistics(mach_host_self(), HOST_CPU_LOAD_INFO, intPtr, &count)
        }
    }
    guard result == KERN_SUCCESS else { return 0.0 }

    let user = Double(loadInfo.cpu_ticks.0)    // CPU_STATE_USER
    let system = Double(loadInfo.cpu_ticks.1)  // CPU_STATE_SYSTEM
    let idle = Double(loadInfo.cpu_ticks.2)    // CPU_STATE_IDLE
    let nice = Double(loadInfo.cpu_ticks.3)    // CPU_STATE_NICE
    let total = user + system + idle + nice
    return total > 0 ? (user + system + nice) / total : 0.0
}
```

Memory Metrics via host_statistics64
```swift
func getMemoryPressure() -> Double {
    var vmStats = vm_statistics64()
    var count = mach_msg_type_number_t(MemoryLayout<vm_statistics64>.size / MemoryLayout<integer_t>.size)
    let result = withUnsafeMutablePointer(to: &vmStats) { pointer in
        pointer.withMemoryRebound(to: integer_t.self, capacity: Int(count)) { intPtr in
            host_statistics64(mach_host_self(), HOST_VM_INFO64, intPtr, &count)
        }
    }
    guard result == KERN_SUCCESS else { return 0.0 }

    let pageSize = Double(vm_kernel_page_size)
    let active = Double(vmStats.active_count) * pageSize
    let wired = Double(vmStats.wire_count) * pageSize

    // Total physical memory via sysctl
    var size = UInt64(0)
    var sizeLen = size_t(MemoryLayout<UInt64>.size)
    guard sysctlbyname("hw.memsize", &size, &sizeLen, nil, 0) == 0, size > 0 else { return 0.0 }
    return (active + wired) / Double(size)
}
```

Disk I/O via IOKit
IOKit provides access to hardware statistics. We query IOBlockStorageDriver for read/write byte counts:
```swift
import IOKit

// Note: the "Statistics" counters are cumulative since boot; diff two
// successive samples to get per-second throughput.
func getDiskIO() -> (readBytes: UInt64, writeBytes: UInt64) {
    let matchingDict = IOServiceMatching("IOBlockStorageDriver")
    var iterator: io_iterator_t = 0
    guard IOServiceGetMatchingServices(kIOMainPortDefault, matchingDict, &iterator) == KERN_SUCCESS else {
        return (0, 0)
    }
    defer { IOObjectRelease(iterator) }

    var totalRead: UInt64 = 0
    var totalWrite: UInt64 = 0
    while case let entry = IOIteratorNext(iterator), entry != 0 {
        if let stats = IORegistryEntryCreateCFProperty(entry, "Statistics" as CFString, kCFAllocatorDefault, 0)?.takeRetainedValue() as? [String: Any] {
            totalRead += (stats["Bytes (Read)"] as? UInt64) ?? 0
            totalWrite += (stats["Bytes (Write)"] as? UInt64) ?? 0
        }
        IOObjectRelease(entry)
    }
    return (totalRead, totalWrite)
}
```

Real-Time Inference on ANE
CoreML Prediction Pipeline
```swift
import CoreML
import Foundation

class AnomalyDetector {
    private let model: CoreMetric   // Auto-generated CoreML model class
    private let mean: [Double]      // Loaded from the model's metadata at init
    private let std: [Double]

    func detectAnomaly(metrics: SystemMetrics) throws -> (score: Double, isAnomaly: Bool) {
        // 1. Normalize input using the training statistics
        let normalized = zip(metrics.toArray(), zip(mean, std)).map {
            ($0 - $1.0) / $1.1
        }

        // 2. Create the MLMultiArray input
        let input = try MLMultiArray(shape: [1, 8], dataType: .double)
        for (i, value) in normalized.enumerated() {
            input[i] = NSNumber(value: value)
        }

        // 3. Run inference (CoreML automatically uses the ANE if available)
        let prediction = try model.prediction(input: CoreMetricInput(input: input))

        // 4. Calculate the reconstruction error (MSE)
        let reconstructed = prediction.output
        let mse = zip(normalized, (0..<8).map { reconstructed[$0].doubleValue }).map {
            pow($0 - $1, 2)
        }.reduce(0, +) / 8.0

        // 5. Compare against the learned threshold (95th percentile from training)
        let threshold = 0.015 // Tuned during training
        return (mse, mse > threshold)
    }
}
```

ANE Acceleration Verification
CoreML automatically selects the best compute unit (ANE > GPU > CPU). Verify ANE usage via Instruments:
```bash
# Terminal: profile the app while it runs inference
# (newer Xcode versions replace the `instruments` CLI with `xcrun xctrace`)
instruments -t "Neural Engine" -D profile.trace CoreMetric.app

# Check ANE utilization in the Instruments UI (should show spikes at inference time)
```

SwiftUI Dashboard: Visualizing Anomalies
Real-Time Charts with Swift Charts
```swift
import SwiftUI
import Charts

// One data point on the chart
struct AnomalyPoint: Identifiable {
    let id = UUID()
    let timestamp: Date
    let score: Double
    let isAnomaly: Bool
}

struct AnomalyChart: View {
    @State private var anomalyScores: [AnomalyPoint] = []
    @State private var threshold: Double = 0.015

    var body: some View {
        Chart {
            ForEach(anomalyScores) { point in
                LineMark(
                    x: .value("Time", point.timestamp),
                    y: .value("Score", point.score)
                )
                .foregroundStyle(point.isAnomaly ? Color.red : Color.blue)
            }
            // Threshold line
            RuleMark(y: .value("Threshold", threshold))
                .foregroundStyle(.orange)
                .lineStyle(StrokeStyle(dash: [5, 5]))
        }
        .chartYScale(domain: 0...0.05)
        .chartXAxis {
            AxisMarks(values: .stride(by: .minute))
        }
    }
}
```

Anomaly Alerts
```swift
import UserNotifications

func handleAnomaly(score: Double, metrics: SystemMetrics) {
    // Send a macOS notification
    let content = UNMutableNotificationContent()
    content.title = "System Anomaly Detected"
    content.body = """
    Reconstruction error: \(String(format: "%.4f", score))
    CPU: \(metrics.cpuLoad)% | Memory: \(metrics.memoryPressure)%
    """
    content.sound = .default

    let request = UNNotificationRequest(identifier: UUID().uuidString, content: content, trigger: nil)
    UNUserNotificationCenter.current().add(request)
}
```

Performance Benchmarks
Overhead Analysis
| Metric | Baseline (No Monitoring) | CoreMetric Running | Overhead |
|---|---|---|---|
| CPU Usage | 2.3% | 2.8% | 0.5% |
| Memory | 4.2 GB | 4.23 GB | 30 MB |
| Energy Impact | Low | Low | Negligible |
| Inference Latency | — | 1.2 ms | — |
Tested on: M1 MacBook Pro, macOS 14.5, 16GB RAM. Metrics collected every 1 second for 1 hour.
ANE vs GPU vs CPU Performance
| Compute Unit | Inference Time | Power Draw |
|---|---|---|
| ANE (FP16) | 1.2 ms | 0.3 W |
| GPU (FP32) | 3.8 ms | 2.1 W |
| CPU (FP32) | 12.5 ms | 4.5 W |
ANE delivers 10× faster inference with 15× lower power consumption compared to CPU.
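For a comparison like the one above, coremltools can pin the model to a specific compute unit at load time. A sketch of such a micro-benchmark (the input name "input" is illustrative; check your model's actual spec, and expect timings to vary by machine):

```python
import time
import numpy as np
import coremltools as ct

# Load the same .mlpackage pinned to different compute units and time it.
x = {"input": np.random.rand(1, 8).astype(np.float32)}
for units in (ct.ComputeUnit.CPU_AND_NE, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.CPU_ONLY):
    model = ct.models.MLModel("CoreMetric.mlpackage", compute_units=units)
    model.predict(x)  # warm-up run
    start = time.perf_counter()
    for _ in range(1000):
        model.predict(x)
    print(units, (time.perf_counter() - start), "ms/inference")  # total s / 1000 runs = ms each
```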
Privacy Guarantees
- Zero Cloud Dependencies: All data processing happens on-device. No telemetry servers.
- No Process Inspection: CoreMetric only reads system-level metrics (CPU, RAM). It never inspects process names, arguments, or file paths.
- Local Storage: Training data stays in ~/Library/Application Support/CoreMetric/data/, encrypted via FileVault.
- Sandboxed App: macOS App Sandbox enforces strict file access controls. CoreMetric can't access documents, photos, or other apps' data.
Real-World Anomaly Examples
Detected: Crypto-Miner
- Symptoms: Sustained 15% CPU usage during idle hours, elevated context switches
- Reconstruction Error: 0.042 (2.8× threshold)
- Why It Worked: User's baseline CPU during idle: 2-5%. A constant 15% is statistically abnormal.
Detected: Memory Leak in Electron App
- Symptoms: Memory pressure climbing from 60% → 85% over 4 hours, no corresponding disk I/O or CPU spike
- Reconstruction Error: 0.038 (2.5× threshold)
- Why It Worked: Gradual memory growth without proportional CPU/disk activity is atypical.
False Positive: Xcode Build
- Symptoms: CPU spiked to 95%, disk I/O at 200 MB/s
- Reconstruction Error: 0.011 (below threshold)
- Why It Passed: User compiles code daily. Model learned this pattern as normal.
Challenges & Solutions
1. Cold Start Problem
Issue: New machines lack training data. Model can't detect anomalies without baseline.
Solution: Pre-trained "generic macOS" model bundled with app. User-specific model replaces it after 24h of collection.
2. Non-Stationary Behavior
Issue: Usage patterns evolve (e.g., user switches from web dev to ML training). Model becomes stale.
Solution: Weekly incremental retraining with exponential decay on old data (recent 7 days weighted 80%, older data 20%).
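One way to implement that weighting is a per-sample WeightedRandomSampler, sketched below under the assumption that per-sample ages are available:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Sketch of decay-weighted retraining: recent samples dominate the batches.
# `features` is an (N, 8) array, `ages_days` the age of each sample in days.
def make_retraining_loader(features: np.ndarray, ages_days: np.ndarray) -> DataLoader:
    recent = ages_days <= 7
    # Recent week gets 80% of the sampling mass, older data the remaining 20%
    weights = np.where(recent, 0.8 / max(recent.sum(), 1),
                       0.2 / max((~recent).sum(), 1))
    sampler = WeightedRandomSampler(weights.tolist(), num_samples=len(features))
    data = torch.tensor(features, dtype=torch.float32)
    return DataLoader(data, batch_size=64, sampler=sampler)
```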
3. Sparse Anomaly Labels
Issue: Hard to tune threshold without labeled anomalies.
Solution: Set threshold at 95th percentile of training set reconstruction errors (assumes 5% of training data contains mild anomalies).
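Computing that threshold is a one-liner once you have per-sample reconstruction errors; a sketch using the trained model from earlier:

```python
import numpy as np
import torch

@torch.no_grad()
def calibrate_threshold(model, train_tensor: torch.Tensor, percentile: float = 95.0) -> float:
    # Per-sample MSE between input and reconstruction on the training set
    reconstructed = model(train_tensor)
    errors = ((train_tensor - reconstructed) ** 2).mean(dim=1)
    # Anything above this percentile of "normal" errors is flagged as anomalous
    return float(np.percentile(errors.cpu().numpy(), percentile))
```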
Future Roadmap
- Process-Level Attribution: When anomaly detected, identify which process caused it (opt-in, privacy-preserving)
- Temporal Patterns: Add LSTM layer to capture time-series dependencies (e.g., daily/weekly cycles)
- Federated Learning: Aggregate anonymized model updates across users to improve detection (fully encrypted, GDPR-compliant)
- Energy Anomalies: Detect abnormal battery drain patterns using the IOPMCopySleepWakeTimeline API
Lessons Learned
MPS Training: Fast but Finicky
Metal Performance Shaders dramatically accelerate training on Apple Silicon, but debugging is harder than CUDA. Use torch.autograd.set_detect_anomaly(True) to catch gradient issues early.
ANE Quantization Requires Testing
FP16 quantization introduced a 2% accuracy drop initially. Solution: Re-tune threshold post-quantization using validation set.
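In practice that means re-running the calibration step against the converted model rather than the PyTorch one. A sketch, where the feature names "input"/"output" and the validation_set array (N normalized 8-dimensional samples) are illustrative assumptions:

```python
import numpy as np
import coremltools as ct

# Recompute the anomaly threshold from the quantized .mlpackage so the
# cutoff reflects FP16 reconstruction errors, not the original FP32 ones.
model = ct.models.MLModel("CoreMetric.mlpackage")
errors = []
for row in validation_set:  # assumed: (N, 8) array of normalized samples
    out = model.predict({"input": row.reshape(1, 8).astype(np.float32)})["output"]
    errors.append(float(np.mean((row - out.reshape(-1)) ** 2)))
threshold = np.percentile(errors, 95)
```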
Darwin APIs Are Underdocumented
Apple's low-level kernel APIs lack comprehensive guides. Reading XNU source code and reverse-engineering top's implementation was necessary. Key resources: XNU GitHub and man 3 host_statistics.
Privacy-First Design Builds Trust
Users immediately asked: "Does this send data to the cloud?" Clear privacy guarantees (local-only processing, sandboxing) must be front-and-center in documentation.
Conclusion
CoreMetric demonstrates how modern ML techniques (autoencoders, one-class learning) can transform system monitoring from reactive threshold-based alerts to proactive anomaly detection. By leveraging Apple's hardware acceleration (ANE, MPS) and respecting user privacy (on-device processing), it achieves the trifecta of effectiveness, efficiency, and trust.
The project is a technical exercise in bridging two ecosystems—Python's ML maturity and Swift's native macOS integration—while staying true to Apple's design principles: performance, privacy, and polish.
GitHub: egekaya1/CoreMetric · Status: Work in Progress · License: MIT