Refactoring ImSwitch File Storage Path Management

by Alex Johnson

Overview

The current file storage path configuration in ImSwitch is facing challenges due to multiple conflicting entry points and a lack of a unified approach. This article proposes a comprehensive refactoring to establish a single source of truth for managing file storage, accommodating both static and dynamic storage locations such as USB drives and SD cards. The current method of loading configurations via entry_point.sh in Docker differs from how main.py loads them, creating inconsistencies in global storage paths (e.g., /home/pi/dataset vs. /datasets).

Current Problems

The existing system suffers from several issues that need addressing to improve reliability and ease of use.

1. Multiple Configuration Sources

One of the primary issues is that storage paths can be configured in more than one place: through command-line arguments and directly within main.py. No precedence is defined between the two, so it is unclear which value wins when a path is set in both. The same duplication blurs the distinction between absolute and relative paths, which makes it easy to specify a storage location incorrectly. The entry_point.sh script used for Docker adds a further source. It is complex and sparsely documented, which makes it hard to understand or modify and complicates containerized deployments. Addressing these issues requires a unified configuration system that provides a single source of truth, simplifies path management, and offers clear guidelines for developers and users.

2. Limited Dynamic Storage Support

Support for dynamic storage is limited, which is a significant drawback in environments where USB drives or SD cards are frequently used. Four gaps stand out. First, the system cannot detect external storage devices at runtime, so a newly mounted USB drive or SD card goes unnoticed and must be configured manually, which is inconvenient and error-prone. Second, there are no API endpoints for querying the available storage locations, so users and applications cannot determine programmatically where data can be stored, and automated storage management tools cannot be built. Third, the system cannot switch automatically to an external device when it is mounted; the storage path has to be reconfigured by hand. Fourth, storage preferences are not persisted across reboots, so the settings must be re-entered after every restart. Closing these gaps requires runtime detection of external devices, API endpoints for querying storage locations, automatic switching, and persistence of storage preferences.

3. Docker-Specific Challenges

Running ImSwitch within Docker containers introduces additional challenges for file storage path management. External drives mounted into a container typically appear under /media or /Volumes, and detecting new mounts at runtime from inside the container is problematic: the system needs to recognize when a drive appears under one of these directories before it can switch to it. Configuration paths also differ between environments. Native setups commonly use paths like /home/pi/dataset, while Docker containers often use /datasets, so users must know their deployment environment and adjust the configuration accordingly. The entry_point.sh script is intended to handle these differences, but its complexity and lack of documentation make it difficult to use and maintain. Resolving these challenges requires a unified configuration system that abstracts away the differences between native and containerized environments, detects external drives mounted within containers, and adjusts the storage paths accordingly. Simplifying and documenting entry_point.sh would further improve the Docker deployment experience and make ImSwitch easier to deploy across environments.

Proposed Solution

To address these challenges, a phased approach is proposed to refactor the file storage path management system.

Phase 1: Unified Configuration System

The goal of this phase is to create a centralized configuration manager that simplifies and standardizes path management.

1.1 Single Source of Truth

To address the problem of multiple configuration sources, a centralized configuration manager will serve as the single source of truth for all storage path configurations. It will handle absolute and relative paths consistently and apply a fixed precedence: command-line arguments override environment variables, which override settings in a config file, which override built-in defaults. This lets users override defaults where needed while keeping the result predictable. Before accepting a path, the manager will validate that it exists and is writable, which prevents errors caused by invalid or inaccessible storage locations. The manager will also maintain backward compatibility with existing setups so that users can migrate without disrupting their current workflows.
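
The precedence rule can be illustrated with a small sketch. This is not ImSwitch code: the function names, the `DEFAULTS` table, and the mapping of a setting such as `data_path` to an `IMSWITCH_`-prefixed environment variable are assumptions made for illustration (the prefix follows the `IMSWITCH_DATA_PATH` variable used later in this article).

```python
import os
from typing import Mapping, Optional

# Hypothetical built-in default; /datasets is the Docker path named in this article.
DEFAULTS = {"data_path": "/datasets"}


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Optional[str]],
    env: Mapping[str, str],
    file_config: Mapping[str, str],
    env_prefix: str = "IMSWITCH_",
) -> str:
    """Return the first value found in: CLI, environment, config file, defaults."""
    cli_value = cli_args.get(key)
    if cli_value is not None:
        return cli_value
    env_value = env.get(env_prefix + key.upper())
    if env_value:
        return env_value
    if key in file_config:
        return file_config[key]
    return DEFAULTS[key]


def normalize_path(path: str, base_dir: str) -> str:
    """Resolve a relative path against base_dir so later code only sees absolute paths."""
    expanded = os.path.expanduser(path)
    if not os.path.isabs(expanded):
        expanded = os.path.join(base_dir, expanded)
    return os.path.normpath(expanded)
```

Passing the environment and parsed arguments in as plain mappings, rather than reading `os.environ` or `sys.argv` inside the function, keeps the precedence logic easy to test. Which base directory relative paths should resolve against is a design decision the proposal leaves open.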

1.2 Configuration Structure

A well-defined configuration structure is essential for managing storage paths effectively. The StorageConfiguration class will encapsulate all relevant storage settings, providing a clear and organized way to access and modify them. The default_data_path attribute specifies the primary data storage location. This path can be either absolute or relative, offering flexibility in configuring the storage location. The config_path attribute defines the location of the configuration file. Like the data path, this can also be absolute or relative. The enable_external_scanning attribute enables or disables the scanning of external mount points. When enabled, the system will automatically detect and manage external storage devices. The external_mount_paths attribute lists the directories to monitor for external storage devices (e.g., ["/media", "/Volumes"]). This allows the system to detect when new storage devices are mounted. The active_data_path attribute represents the current active storage path at runtime. This is the path that the system is currently using for data storage. The fallback_data_path attribute specifies the path to use when external storage is unavailable. This ensures that data can still be stored even if the preferred storage location is not accessible. The persist_storage_preferences attribute determines whether storage preferences should be persisted across reboots. When enabled, the system will remember the last used storage location and automatically switch to it when available. By encapsulating these settings in a StorageConfiguration class, the system provides a clear and organized way to manage storage paths. This structure enhances the maintainability and usability of the system, making it easier to configure and deploy.

from dataclasses import dataclass
from typing import List


@dataclass
class StorageConfiguration:
    # Primary data storage location
    default_data_path: str  # Can be absolute or relative

    # Configuration file location
    config_path: str  # Can be absolute or relative

    # External mount point scanning
    enable_external_scanning: bool
    external_mount_paths: List[str]  # e.g., ["/media", "/Volumes"]

    # Current active storage (runtime state)
    active_data_path: str

    # Fallback when external storage unavailable
    fallback_data_path: str

    # Persistence
    persist_storage_preferences: bool

Phase 2: External Storage Detection & Management

This phase focuses on improving the system's ability to detect and manage external storage devices dynamically.

2.1 Storage Monitor Service

To enhance the system's ability to manage external storage, a background service will be implemented. This service will continuously monitor the configured mount point directories, such as /media and /Volumes, to detect when storage devices are mounted or unmounted. When a new device appears, the service will validate it, checking that it is writable and has sufficient free space. The service will emit events via WebSocket to notify the frontend of any change in storage status, so the user interface can update in real time. In Docker environments, the service will use Python standard-library tools such as shutil (for example, shutil.disk_usage to read free space) to interact with the file system, so that it can operate inside containers.

2.2 Path Watching Strategies

To effectively monitor external storage devices, two complementary approaches will be used: active polling and filesystem watching.

Option A: Active Polling

The active polling strategy involves periodically scanning mount directories for new storage devices. This approach provides a reliable way to detect changes in storage status, even in environments where filesystem events are not supported. The StorageScanner class will implement this strategy. The scan_external_mounts method will periodically scan the specified base paths for new storage devices. This method will filter out system volumes (e.g., "Macintosh HD", "System Volume Information") to avoid false positives. It will also check the writability and available space of each detected storage device to ensure it is suitable for data storage. By using active polling, the system can reliably detect external storage devices and provide users with up-to-date information about available storage.

from typing import List


class StorageScanner:
    def scan_external_mounts(self, base_paths: List[str]) -> List["ExternalStorage"]:
        """
        Scan mount directories for new storage devices (called periodically).
        Filter out system volumes (e.g., "Macintosh HD", "System Volume Information").
        Check writability and available space.
        """
        raise NotImplementedError  # interface stub

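A single scan pass might look like the following sketch. The `ExternalStorage` fields are hypothetical (chosen to match the REST responses described in Phase 3), and skipping hidden directories is an added assumption; only `os.scandir`, `os.access`, and `shutil.disk_usage` from the standard library are used.

```python
import os
import shutil
from dataclasses import dataclass
from typing import List

# Volume names to ignore; both entries come from the StorageScanner docstring.
SYSTEM_VOLUMES = {"Macintosh HD", "System Volume Information"}


@dataclass
class ExternalStorage:
    # Hypothetical fields, mirroring the drive objects in the Phase 3 responses.
    path: str
    label: str
    writable: bool
    free_space_gb: float


def scan_external_mounts(base_paths: List[str]) -> List[ExternalStorage]:
    """Run one scan pass; a caller would repeat this on a timer."""
    found: List[ExternalStorage] = []
    for base in base_paths:
        if not os.path.isdir(base):
            continue  # e.g. /Volumes does not exist on Linux
        with os.scandir(base) as entries:
            for entry in entries:
                if not entry.is_dir() or entry.name in SYSTEM_VOLUMES:
                    continue
                if entry.name.startswith("."):
                    continue  # assumption: hidden directories are not drives
                try:
                    free_bytes = shutil.disk_usage(entry.path).free
                except OSError:
                    continue  # device vanished or is unreadable
                found.append(
                    ExternalStorage(
                        path=entry.path,
                        label=entry.name,
                        writable=os.access(entry.path, os.W_OK | os.X_OK),
                        free_space_gb=round(free_bytes / 1024**3, 1),
                    )
                )
    return sorted(found, key=lambda drive: drive.path)
```

This sketch treats every subdirectory of a base path as a candidate drive. On many Linux desktops, removable media are mounted one level deeper (for example under /media/<user>/<label>), so a real scanner would probably need to descend further or confirm each candidate with `os.path.ismount`.
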
Option B: Filesystem Watching

The filesystem watching strategy uses operating-system-specific APIs to react to mount and unmount events in real time, which is more efficient than active polling. The StorageWatcher class will implement this strategy. On Linux, it will use inotify to monitor filesystem events. On macOS, it will use FSEvents. If neither inotify nor FSEvents is available, it will fall back to polling. The watch_mount_points method will monitor the specified base paths and invoke a callback function when a mount or unmount event occurs, so the system can react immediately to changes in storage status.

from typing import Callable, List


class StorageWatcher:
    def watch_mount_points(self, base_paths: List[str], callback: Callable) -> None:
        """
        Use inotify (Linux), FSEvents (macOS), or polling as fallback.
        React to mount/unmount events in real time.
        """
        raise NotImplementedError  # interface stub

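The polling fallback is the simplest of the three mechanisms to illustrate. The sketch below compares successive directory snapshots and reports the differences through the callback. `PollingWatcher`, `list_mount_entries`, and the `callback(event_name, path)` signature are hypothetical names; the inotify and FSEvents paths are not shown because they depend on third-party bindings.

```python
import os
from typing import Callable, List, Set


def list_mount_entries(base_paths: List[str]) -> Set[str]:
    """Return the full paths of all directories directly under the base paths."""
    entries: Set[str] = set()
    for base in base_paths:
        if not os.path.isdir(base):
            continue
        for name in os.listdir(base):
            full_path = os.path.join(base, name)
            if os.path.isdir(full_path):
                entries.add(full_path)
    return entries


class PollingWatcher:
    """Polling fallback: compares each snapshot with the previous one."""

    def __init__(self, base_paths: List[str], callback: Callable[[str, str], None]):
        self.base_paths = base_paths
        self.callback = callback  # called as callback(event_name, path)
        self._known = list_mount_entries(base_paths)

    def poll_once(self) -> None:
        """Take a new snapshot and report what appeared or disappeared."""
        current = list_mount_entries(self.base_paths)
        for path in sorted(current - self._known):
            self.callback("mounted", path)
        for path in sorted(self._known - current):
            self.callback("unmounted", path)
        self._known = current
```

A background thread or timer would call `poll_once()` at a fixed interval. Drives that are already present when the watcher is created are treated as known and do not trigger a "mounted" callback.
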
2.3 Automatic Fallback Logic

To ensure data is always saved reliably, automatic fallback logic will be implemented. The system prefers external storage when it is available and otherwise falls back to the default internal path, so data can always be stored even if the preferred location is not accessible. The data path is validated before every save, and every storage transition is logged to help diagnose storage-related issues. The steps are:

1. Check if preferred external storage is available
2. If yes → use external storage
3. If no → fall back to default internal path
4. Always ensure data path is valid before saving
5. Log storage transitions for debugging
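
The five steps above can be sketched as one function. The names `is_usable` and `select_data_path`, the logger name, and the choice to create the fallback directory when it is missing are assumptions rather than part of the proposal.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger("imswitch.storage")  # hypothetical logger name


def is_usable(path: Optional[str]) -> bool:
    """A path is usable if it is an existing directory we can write into."""
    return bool(path) and os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def select_data_path(
    preferred_external: Optional[str],
    fallback: str,
    current: Optional[str] = None,
) -> str:
    """Prefer external storage, otherwise fall back; validate and log the result."""
    if is_usable(preferred_external):
        chosen = preferred_external
    else:
        os.makedirs(fallback, exist_ok=True)  # assumption: create the fallback if missing
        if not is_usable(fallback):
            raise OSError(f"Fallback data path is not writable: {fallback}")
        chosen = fallback
    if chosen != current:
        logger.info("Storage path changed: %s -> %s", current, chosen)
    return chosen
```

Calling `select_data_path` immediately before each save, with the previously active path passed as `current`, covers step 4 and yields a log entry only when the path actually changes.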

Phase 3: REST API Endpoints

To allow external applications and the frontend to interact with the storage management system, a set of REST API endpoints will be implemented.

3.1 Storage Status Endpoint

This endpoint provides information about the current storage status, including the active path, fallback path, available external drives, and scan settings. The response includes the active_path, which indicates the current storage location, and the fallback_path, which specifies the path used when external storage is unavailable. The available_external_drives array lists the available external storage devices, including their paths, labels, writability status, free space, and active status. The scan_enabled flag indicates whether external storage scanning is enabled, and the mount_paths array lists the directories being monitored for external storage devices.

GET /api/storage/status
Response:
{
  "active_path": "/media/usb-drive-1/datasets",
  "fallback_path": "/datasets",
  "available_external_drives": [
    {
      "path": "/media/usb-drive-1",
      "label": "USB_DRIVE",
      "writable": true,
      "free_space_gb": 128.5,
      "is_active": true
    }
  ],
  "scan_enabled": true,
  "mount_paths": ["/media", "/Volumes"]
}

3.2 List External Drives

This endpoint provides a list of all available external drives, including their paths, labels, writability status, free space, total space, and filesystem type.

GET /api/storage/external-drives
Response:
{
  "drives": [
    {
      "path": "/media/usb-drive-1",
      "label": "USB_DRIVE",
      "writable": true,
      "free_space_gb": 128.5,
      "total_space_gb": 256.0,
      "filesystem": "ext4"
    }
  ]
}

3.3 Set Active Storage Path

This endpoint allows users to set the active storage path. The request body includes the desired path and a flag indicating whether the change should be persisted across reboots. The response indicates whether the operation was successful and includes the new active path and persistence status.

POST /api/storage/set-active-path
Request:
{
  "path": "/media/usb-drive-1/datasets",
  "persist": true
}
Response:
{
  "success": true,
  "active_path": "/media/usb-drive-1/datasets",
  "persisted": true
}
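
The server-side handling of this request could be sketched independently of any web framework. The function below is hypothetical: the error responses, the 400 and 422 status codes, and the `state` dictionary are illustrative assumptions, since the proposal only specifies the success response.

```python
import os
from typing import Any, Dict, Tuple


def handle_set_active_path(
    body: Dict[str, Any], state: Dict[str, Any]
) -> Tuple[int, Dict[str, Any]]:
    """Validate the request body and return (HTTP status code, response body)."""
    path = body.get("path")
    persist = body.get("persist", False)
    if not isinstance(path, str) or not os.path.isabs(path):
        return 400, {"success": False, "error": "path must be an absolute path"}
    if not isinstance(persist, bool):
        return 400, {"success": False, "error": "persist must be a boolean"}
    if not (os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)):
        return 422, {"success": False, "error": "path is missing or not writable"}
    state["active_path"] = path
    if persist:
        # A real service would also write the preference to disk here.
        state["preferred_external"] = path
    return 200, {"success": True, "active_path": path, "persisted": persist}
```

Validating before mutating `state` means a rejected request leaves the active path untouched.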

3.4 Get Configuration Paths

This endpoint provides the current configuration paths, including the config path, data path, and active data path.

GET /api/storage/config-paths
Response:
{
  "config_path": "/home/user/ImSwitchConfig",
  "data_path": "/datasets",
  "active_data_path": "/media/usb-drive-1/datasets"
}

3.5 Update Configuration Paths

This endpoint allows users to update the configuration paths. The request body includes the new config path and data path, as well as a flag indicating whether the changes should be persisted.

POST /api/storage/update-config-path
Request:
{
  "config_path": "/custom/config/path",
  "data_path": "/custom/data/path",
  "persist": true
}

Phase 4: WebSocket Event System

To provide real-time updates to the frontend, a WebSocket event system will be implemented. This system will emit events when storage-related changes occur.

4.1 Real-time Notifications

The WebSocket event system will provide real-time notifications for various storage-related events. The storage:device-mounted event is emitted when a new external drive is detected. The storage:device-unmounted event is emitted when an external drive is removed. The storage:path-changed event is emitted when the active storage path is changed. The storage:low-space-warning event is emitted when the available space falls below a specified threshold. Each event payload includes a timestamp and data specific to the event type.

Example event payload:

{
  "event": "storage:device-mounted",
  "timestamp": "2025-11-12T10:30:00Z",
  "data": {
    "path": "/media/usb-drive-1",
    "label": "USB_DRIVE",
    "free_space_gb": 128.5,
    "auto_switched": false
  }
}
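
A payload of this shape could be produced by a small helper such as the one below. `build_storage_event` is a hypothetical name, and rejecting unknown event names is an added assumption; the four event names are the ones listed in section 4.1.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Event names listed in section 4.1.
STORAGE_EVENTS = {
    "storage:device-mounted",
    "storage:device-unmounted",
    "storage:path-changed",
    "storage:low-space-warning",
}


def build_storage_event(
    event: str, data: Dict[str, Any], now: Optional[datetime] = None
) -> str:
    """Serialize one event as JSON with a UTC timestamp ending in 'Z'."""
    if event not in STORAGE_EVENTS:
        raise ValueError(f"Unknown storage event: {event}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"event": event, "timestamp": timestamp, "data": data})
```

The optional `now` argument exists so that tests can supply a fixed, timezone-aware timestamp; the returned string can be sent over whichever WebSocket layer the backend uses.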

Phase 5: Frontend Integration (not part of this issue, but to be prepared for later steps)

5.1 Storage Management UI Component

To provide users with a way to manage storage, a UI component will be created. This component will display the current active storage path, show available external drives, allow users to switch storage locations, indicate storage status (available space, health), and receive WebSocket notifications for real-time updates.

5.2 User Workflow

The user workflow will be as follows:

1. The user starts ImSwitch (Docker or native) with no USB stick connected. Data saves to the default path (/datasets).
2. The user inserts a USB stick. The frontend receives a WebSocket notification: "New storage device detected."
3. The UI shows a notification: "USB_DRIVE (128 GB free) available. Switch to it?"
4. The user clicks "Switch," and a POST request is sent to /api/storage/set-active-path with persist=true.
5. All new data saves to the USB stick.
6. The user removes the USB stick. The system automatically falls back to /datasets.
7. After a reboot, the system remembers the USB preference and auto-switches when the drive is available.

Implementation Details

Directory Structure Refactoring

Current Code (dirtools.py) Problems:

  • Module-level conditionals make path logic hard to follow
  • No clear separation between config and data paths
  • External drive scanning is partial and unclear

Proposed Refactoring

class StoragePathManager:
    """
    Single source of truth for all storage path management
    """
    def __init__(self, config: StorageConfiguration):
        self.config = config
        self.scanner = StorageScanner()
        self.watcher = StorageWatcher()
        
    def get_active_data_path(self) -> str:
        """Returns current active data storage path"""
        
    def get_config_path(self) -> str:
        """Returns configuration file path"""
        
    def set_data_path(self, path: str, persist: bool = False) -> bool:
        """Set new data path with validation"""
        
    def scan_external_drives(self) -> List[ExternalStorage]:
        """Scan for available external storage"""
        
    def is_path_valid(self, path: str) -> bool:
        """Validate path exists and is writable"""

Configuration File Updates

Current: main.py hardcoded arguments
Proposed: Configuration file + CLI override

# config/storage_config.json
{
  "data_paths": {
    "default": "/datasets",
    "fallback": "/home/user/ImSwitchConfig/data",
    "preferred_external": null,
    "persist_preference": true
  },
  "config_paths": {
    "default": "/home/user/ImSwitchConfig"
  },
  "external_scanning": {
    "enabled": true,
    "mount_points": ["/media", "/Volumes"],
    "scan_interval_seconds": 10,
    "auto_switch": false
  }
}
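
Loading such a file while tolerating missing keys could be done by merging it over a set of defaults, as in the sketch below. The helper names are hypothetical, and the defaults simply repeat the example values above. The sketch assumes the file contains only the JSON object, because standard JSON has no comment syntax and `json.loads` would reject a leading `#` line.

```python
import copy
import json
from typing import Any, Dict

# Defaults copied from the example storage_config.json above.
DEFAULT_STORAGE_CONFIG: Dict[str, Any] = {
    "data_paths": {
        "default": "/datasets",
        "fallback": "/home/user/ImSwitchConfig/data",
        "preferred_external": None,
        "persist_preference": True,
    },
    "config_paths": {"default": "/home/user/ImSwitchConfig"},
    "external_scanning": {
        "enabled": True,
        "mount_points": ["/media", "/Volumes"],
        "scan_interval_seconds": 10,
        "auto_switch": False,
    },
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay overrides on a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_storage_config(json_text: str) -> Dict[str, Any]:
    """Parse JSON text and fill in missing keys from the defaults."""
    return merge_config(DEFAULT_STORAGE_CONFIG, json.loads(json_text))
```

With this approach a user file only needs to contain the keys it changes, for example `{"external_scanning": {"auto_switch": true}}`, which also helps existing installations migrate gradually.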

Docker-Specific Considerations

docker-compose.yml additions:

volumes:
  - /media:/media:rw  # External drive mount point
  - /Volumes:/Volumes:rw  # macOS external drives
  - ./datasets:/datasets:rw  # Default data location
  - ./config:/config:rw  # Configuration location

environment:
  - IMSWITCH_DATA_PATH=/datasets
  - IMSWITCH_CONFIG_PATH=/config
  - IMSWITCH_SCAN_EXTERNAL=true
  - IMSWITCH_EXTERNAL_MOUNT_PATHS=/media,/Volumes
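
On the Python side, these variables have to be converted from strings into typed settings. A minimal sketch, assuming that values such as "true", "1", "yes", and "on" enable a flag and that mount paths are separated by commas as in the compose file; the function names and the returned keys (borrowed from StorageConfiguration) are illustrative.

```python
from typing import Any, Dict, List, Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


def parse_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in TRUE_VALUES


def parse_path_list(value: str) -> List[str]:
    """Split a comma-separated list and drop empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def read_storage_env(env: Mapping[str, str]) -> Dict[str, Any]:
    """Return only the settings whose variables are present in env."""
    settings: Dict[str, Any] = {}
    if "IMSWITCH_DATA_PATH" in env:
        settings["default_data_path"] = env["IMSWITCH_DATA_PATH"]
    if "IMSWITCH_CONFIG_PATH" in env:
        settings["config_path"] = env["IMSWITCH_CONFIG_PATH"]
    if "IMSWITCH_SCAN_EXTERNAL" in env:
        settings["enable_external_scanning"] = parse_bool(env["IMSWITCH_SCAN_EXTERNAL"])
    if "IMSWITCH_EXTERNAL_MOUNT_PATHS" in env:
        settings["external_mount_paths"] = parse_path_list(
            env["IMSWITCH_EXTERNAL_MOUNT_PATHS"]
        )
    return settings
```

Returning only the variables that are set lets the caller tell "not configured" apart from "configured as false", which matters when environment values are layered over a config file. In production the function would be called with `os.environ`.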

Migration Plan

Phase 1: Backward Compatibility Layer

  • Keep existing arguments functional
  • Add deprecation warnings
  • Provide migration guide

Phase 2: New API Implementation

  • Implement StoragePathManager
  • Add REST endpoints
  • Add WebSocket events

Phase 3: Frontend Integration

  • Create storage management UI
  • Test with real hardware

Phase 4: Documentation & Rollout

  • Update README
  • Create Docker deployment guide
  • Provide configuration examples

Related Files to Modify

Backend

  • imswitch/__init__.py - Global configuration constants
  • imswitch/__main__.py - CLI argument handling
  • imswitch/imcommon/model/dirtools.py - Path management logic
  • imswitch/imcontrol/model/configfiletools.py - Config file handling
  • main.py - Entry point configuration
  • /docker/entrypoint.sh - Entry point for docker

New Files Needed

  • imswitch/imcommon/model/storage_manager.py - Core storage management
  • imswitch/imcommon/model/storage_scanner.py - External drive detection
  • imswitch/imcontrol/controller/StorageController.py - API endpoints
  • imswitch/config/storage_config.json - Default configuration

Frontend (microscope-app) (later, prepare description for frontend team)

  • src/backendapi/storageAPI.js - Storage API client
  • src/components/StorageManager.jsx - Storage UI component
  • src/state/slices/storageSlice.js - Redux storage state

References

  • Current implementation: imswitch/imcommon/model/dirtools.py
  • Configuration: imswitch/imcontrol/model/configfiletools.py
  • Entry point: main.py
