System Maintenance


I am a Student, who finds beauty in simple things. I like to teach sometimes.

Essential Utilities

Effective system administration hinges on maintaining system health and performance. This involves regular cleanup, ensuring data recoverability, and actively monitoring operational parameters. Several tools are fundamental to these tasks.

Disk Cleanup Tools

Disk cleanup utilities free storage space on a computer's drives by identifying and removing files that neither the system nor the user still needs. Left to accumulate, such files can consume considerable disk space and, in some cases, slightly degrade performance by adding file system management overhead.

Disk cleanup tools typically scan the storage for predefined categories of dispensable files. These categories often include:

  • Temporary Internet Files: Cached web pages, images, and other media from browser activity.

  • Downloaded Program Files: Installers or ActiveX controls that are not needed after initial use.

  • Recycle Bin/Trash: Files deleted by the user but not yet permanently removed from the system.

  • Temporary System Files: Files created by the operating system or applications for transient purposes.

  • System Error Memory Dump Files: Files created when system errors occur, used for debugging but often large.

  • Previous Windows installations/Old OS versions: Files retained after an operating system upgrade that allow rollback but consume significant space.

  • Log Files: Application and system logs that can grow extensively over time.

  • Thumbnails: Cached image previews.

  • Delivery Optimization Files: Files used by peer-to-peer update mechanisms in some operating systems.

Upon completion of a scan, these utilities present a report, usually quantifying the space that can be reclaimed from each category. The user can then select which categories of files to delete. Modern operating systems like Windows include built-in tools such as Disk Cleanup (cleanmgr.exe) and, more recently, Storage Sense, which can automate some of these cleanup processes based on user-defined schedules or when disk space is low. Storage Sense offers more automated management, such as automatically clearing the Recycle Bin after a certain period or deleting temporary files that are no longer in use.
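The scan-and-report step described above can be sketched in a few lines of Python. This is only an illustration, not how cleanmgr.exe or Storage Sense work internally: the function name reclaimable_bytes, the category names, and the demo directories are all assumptions, and the demo runs against a throwaway temporary directory so nothing real is scanned or deleted.

```python
import os
import tempfile


def reclaimable_bytes(categories):
    """Return {category name: total file size in bytes} for each directory tree.

    `categories` maps a label (hypothetical, chosen by the caller) to a
    directory that holds files of that category.
    """
    report = {}
    for name, root in categories.items():
        total = 0
        # os.walk yields nothing for a missing directory, so the total stays 0.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, filename))
                except OSError:
                    # The file vanished or is unreadable; skip it.
                    continue
        report[name] = total
    return report


# Demo: build two small fake categories in a temporary directory and scan them.
with tempfile.TemporaryDirectory() as tmp:
    thumbs = os.path.join(tmp, "thumbnails")
    logs = os.path.join(tmp, "logs")
    os.makedirs(thumbs)
    os.makedirs(logs)
    with open(os.path.join(thumbs, "a.png"), "wb") as f:
        f.write(b"x" * 300)
    with open(os.path.join(logs, "app.log"), "wb") as f:
        f.write(b"y" * 50)
    report = reclaimable_bytes({"Thumbnails": thumbs, "Log Files": logs})

for name, size in report.items():
    print(f"{name}: {size} bytes")
```

A real tool would present this report and delete only the categories the user confirms; the sketch deliberately stops at reporting.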

Backups and Restore

Backup and restore mechanisms are critical for data protection and business continuity. A backup is a copy of data stored on a separate medium, intended for recovery in case the original data is lost or corrupted due to hardware failure, software issues, human error, or malicious attacks. The restore process involves retrieving data from these backup copies and returning it to its original location or an alternate system.

Several backup strategies are employed, each with distinct characteristics regarding backup time, storage requirements, and restoration complexity:

  • Full Backup: This method copies all selected data. While it is the most straightforward for restoration (as only one backup set is needed), it is also the most time-consuming and requires the largest amount of storage space. Full backups often serve as a baseline for other backup types.

  • Incremental Backup: An incremental backup copies only the data that has changed since the last backup, regardless of whether the last backup was full or incremental. This results in smaller backup sizes and faster backup operations. However, restoration can be more complex as it requires the last full backup and all subsequent incremental backups in sequence.

  • Differential Backup: This type copies all data that has changed since the last full backup. Differential backups are quicker to perform than full backups and require less storage. Restoration is simpler than with incremental backups, needing only the last full backup and the latest differential backup. Subsequent differential backups will grow in size until the next full backup is performed.
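The difference between the three types comes down to which reference point each one compares against. The sketch below assumes change detection by modification time, which is only one possible approach (real backup software may use archive bits, snapshots, or block-level tracking). The function name files_to_copy and the sample file names and times are made up for illustration.

```python
def files_to_copy(mtimes, kind, last_full, last_backup):
    """Select the files a backup of the given kind would copy.

    mtimes      -- {path: modification time}; times are plain numbers (e.g. hours)
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        # A full backup copies everything, regardless of change times.
        return sorted(mtimes)
    if kind == "incremental":
        cutoff = last_backup   # changes since the last backup of any kind
    elif kind == "differential":
        cutoff = last_full     # changes since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind!r}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)


# Hypothetical state: full backup at t=3, an incremental at t=7, now t=10.
mtimes = {"a.txt": 1, "b.txt": 5, "c.txt": 9}
print(files_to_copy(mtimes, "full", last_full=3, last_backup=7))
print(files_to_copy(mtimes, "incremental", last_full=3, last_backup=7))
print(files_to_copy(mtimes, "differential", last_full=3, last_backup=7))
```

Here the incremental run picks up only c.txt, while the differential run also includes b.txt because it changed after the full backup. This is why differentials keep growing until the next full backup.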

Two key metrics guide backup strategy formulation:

  • Recovery Time Objective (RTO): The maximum acceptable duration for which a system or application can be offline after a failure or disaster. This objective dictates how quickly the restoration process must be completed.

  • Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured in time, from the point of failure. This objective determines the minimum frequency of backups. For instance, an RPO of one hour means that backups must be performed at least every hour, and in the event of a failure, no more than one hour's worth of data would be lost.
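Both objectives reduce to simple comparisons once a schedule and a restore-time estimate are known. The following is a minimal sketch; the function names are hypothetical, and the restore-time estimate is assumed to come from an actual restore test rather than from the code.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst case, a failure strikes just before the next backup runs,
    so up to one full interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """The system must be back online within the RTO."""
    return estimated_restore_time <= rto


# An RPO of one hour is satisfied by 30-minute backups but not by 4-hour ones.
print(meets_rpo(timedelta(minutes=30), timedelta(hours=1)))
print(meets_rpo(timedelta(hours=4), timedelta(hours=1)))
# A restore estimated at 3 hours does not satisfy an RTO of 2 hours.
print(meets_rto(timedelta(hours=3), timedelta(hours=2)))
```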

The restoration process involves identifying the appropriate backup set (based on the RPO and the nature of the data loss), accessing the backup media, and using backup software or system tools to copy the data back to the desired location, overwriting existing corrupted files or filling in missing ones.
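Identifying the appropriate backup set can be illustrated with a small function that works out which backups must be applied, and in what order, to reach a given restore point. This is a sketch under assumptions, not any particular product's logic: backups are represented as hypothetical (time, kind) tuples, and an incremental taken after a differential is assumed to build on that differential.

```python
def restore_chain(backups, restore_point):
    """Return the backups to apply, in order, to restore the latest state
    at or before `restore_point`.

    backups -- iterable of (time, kind) with kind in
               {"full", "incremental", "differential"}
    """
    candidates = sorted(b for b in backups if b[0] <= restore_point)

    # Locate the most recent full backup; everything earlier is irrelevant.
    full_index = None
    for i, (_time, kind) in enumerate(candidates):
        if kind == "full":
            full_index = i
    if full_index is None:
        raise ValueError("no full backup at or before the restore point")

    chain = [candidates[full_index]]
    for backup in candidates[full_index + 1:]:
        if backup[1] == "differential":
            # A differential contains every change since the full backup,
            # so it replaces all intermediate steps collected so far.
            chain = [chain[0], backup]
        else:
            # Incrementals must be replayed one after another.
            chain.append(backup)
    return chain


incremental_schedule = [(0, "full"), (1, "incremental"),
                        (2, "incremental"), (3, "incremental")]
differential_schedule = [(0, "full"), (1, "differential"), (2, "differential")]

print(restore_chain(incremental_schedule, restore_point=2))
print(restore_chain(differential_schedule, restore_point=2))
```

The incremental schedule needs three sets to reach time 2, while the differential schedule needs only two, which mirrors the restoration trade-off between the two strategies.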

Monitoring: uptime, vmstat, iostat

Continuous monitoring of system parameters provides insights into performance and stability, enabling proactive issue resolution. Several command-line utilities are indispensable for this in Unix-like operating systems.

uptime

The uptime command provides a concise summary of how long the system has been running, the number of currently logged-in users, and the system load averages for the past 1, 5, and 15 minutes.

A typical output looks like:

10:00:01 up 35 days, 18:02, 2 users, load average: 0.08, 0.15, 0.12

  • 10:00:01: The current system time.

  • up 35 days, 18:02: The duration the system has been operational since the last boot.

  • 2 users: The number of users currently logged into the system.

  • load average: 0.08, 0.15, 0.12: The average number of processes that were runnable (running or waiting for CPU time) or in uninterruptible sleep (typically waiting for I/O) over the last 1, 5, and 15 minutes, respectively. On a single-core system, a load average of 1.00 means the CPU is fully utilized; on a system with N cores, a load average of N corresponds to full utilization, so the figures are best read relative to the core count. Because processes blocked on I/O are also counted, a high load average does not always indicate CPU saturation.

The system's uptime information can also be read directly from the /proc/uptime pseudo-file. This file contains two numbers: the total number of seconds the system has been up, and the total number of seconds the system has spent in an idle state (this second value is cumulative across all CPU cores).
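As a rough illustration, both sources can be parsed with a few lines of Python. The helper names below are hypothetical. The /proc/uptime sample is made up (its first value matches 35 days, 18:02, and the idle figure is invented), and the core count of 4 is an assumption. The load-average parser also assumes the period-as-decimal format shown above.

```python
def parse_proc_uptime(text):
    """Split /proc/uptime content into (uptime seconds, cumulative idle seconds)."""
    up_seconds, idle_seconds = (float(value) for value in text.split()[:2])
    return up_seconds, idle_seconds


def idle_fraction(up_seconds, idle_seconds, cpu_count):
    """Idle time is summed over all cores, so normalize by uptime * cores."""
    return idle_seconds / (up_seconds * cpu_count)


def load_per_core(uptime_line, cpu_count):
    """Extract the 1, 5 and 15 minute load averages and divide by core count."""
    loads = uptime_line.rsplit("load average:", 1)[1]
    return [float(value) / cpu_count for value in loads.split(",")]


# Illustrative input only; on a live Linux system one would read /proc/uptime.
sample_proc_uptime = "3088920.00 12000000.00"
up, idle = parse_proc_uptime(sample_proc_uptime)
print(f"idle fraction on 4 cores: {idle_fraction(up, idle, 4):.2f}")

line = "10:00:01 up 35 days, 18:02, 2 users, load average: 0.08, 0.15, 0.12"
print(load_per_core(line, cpu_count=4))
```

A per-core load well below 1.0, as here, suggests the system has plenty of headroom.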

vmstat

The vmstat (virtual memory statistics) command reports information about processes, memory, paging, block I/O, traps, disk, and CPU activity. It is useful for identifying system bottlenecks. vmstat can provide a single report or continuous reports at specified intervals. The command vmstat [delay [count]] allows specifying an interval (delay) in seconds between updates and the number of updates (count).

Key fields in vmstat output include:

  • Procs:

    • r: The number of runnable processes (running or waiting for run time).

    • b: The number of processes in uninterruptible sleep (usually waiting for I/O).

  • Memory:

    • swpd: The amount of swap space in use (in kilobytes, unless otherwise specified).

    • free: The amount of idle memory (KB).

    • buff: The amount of memory used as buffers (KB).

    • cache: The amount of memory used as page cache (KB).

  • Swap:

    • si: Amount of memory swapped in from disk (KB/s).

    • so: Amount of memory swapped out to disk (KB/s).

  • IO:

    • bi: Blocks received from a block device (blocks/s).

    • bo: Blocks sent to a block device (blocks/s).

  • System:

    • in: The number of interrupts per second, including the clock.

    • cs: The number of context switches per second.

  • CPU: (Percentages of total CPU time)

    • us: Time spent running non-kernel code (user time, including nice time).

    • sy: Time spent running kernel code (system time).

    • id: Time spent idle. Prior to Linux 2.5.41, this includes I/O-wait time.

    • wa: Time spent waiting for I/O. Prior to Linux 2.5.41, shown as 0.

    • st: Time stolen from a virtual machine (by the hypervisor).
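To show how these fields might be consumed in a script, the sketch below parses vmstat text by its header row and applies a few rule-of-thumb checks. The sample output is illustrative rather than captured from a real machine, the function names are hypothetical, and the thresholds (run queue above core count, any swap traffic, wa of 20% or more) are assumptions, not official guidance.

```python
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  10240 524288    0    0     5    12  150  300  3  1 95  1  0
"""


def parse_vmstat(output):
    """Turn vmstat text into a list of {field name: integer value} rows."""
    lines = output.strip().splitlines()
    # The field-name header is the line whose first token is "r".
    header_index = next(
        i for i, line in enumerate(lines) if line.split()[:1] == ["r"]
    )
    header = lines[header_index].split()
    return [
        dict(zip(header, (int(value) for value in line.split())))
        for line in lines[header_index + 1:]
    ]


def flag_bottlenecks(row, cpu_count, wa_threshold=20):
    """Return rough hints about where pressure may be (heuristics only)."""
    flags = []
    if row["r"] > cpu_count:
        flags.append("cpu")    # more runnable processes than cores
    if row["si"] > 0 or row["so"] > 0:
        flags.append("swap")   # memory is being swapped in or out
    if row["wa"] >= wa_threshold:
        flags.append("io")     # CPUs spend much of their time waiting on I/O
    return flags


rows = parse_vmstat(SAMPLE_VMSTAT)
print(rows[0]["free"], flag_bottlenecks(rows[0], cpu_count=4))
```

Keying on the header row rather than fixed column positions keeps the parser tolerant of vmstat versions that print additional columns.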

iostat

The iostat (input/output statistics) command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates. It can report CPU utilization statistics and device I/O statistics. The command iostat [options] [interval [count]] allows periodic reporting.

The CPU utilization report from iostat typically includes:

  • %user: Percentage of CPU utilization that occurred while executing at the user level (application).

  • %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.

  • %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).

  • %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

  • %steal: Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

  • %idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

The device utilization report provides metrics for each block device or partition:

  • Device: The device or partition name (e.g., sda, dm-0).

  • tps: Transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device.

  • Blk_read/s or kB_read/s or MB_read/s: Amount of data read from the device expressed in blocks, kilobytes, or megabytes per second.

  • Blk_wrtn/s or kB_wrtn/s or MB_wrtn/s: Amount of data written to the device expressed in blocks, kilobytes, or megabytes per second.

  • Blk_read or kB_read or MB_read: Total blocks/kilobytes/megabytes read from this device since boot (or since last report if interval is used).

  • Blk_wrtn or kB_wrtn or MB_wrtn: Total blocks/kilobytes/megabytes written to this device since boot (or since last report if interval is used).

Options such as -x provide extended statistics with more detailed per-device performance data, for example average queue length, average wait times, service times, and %util, the percentage of elapsed time during which I/O requests were issued to the device. A %util value close to 100% suggests saturation for devices that serve requests serially, although devices that handle requests in parallel (such as RAID arrays and modern SSDs) may still have headroom. The -k and -m options display statistics in kilobytes and megabytes per second, respectively, which can be easier to read than blocks.
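One way to build intuition for %util is to compute it by hand as device busy time divided by elapsed wall-clock time between two samples. The sketch below assumes a cumulative "milliseconds spent doing I/O" counter for the device, such as the one Linux exposes per device in /proc/diskstats. The function name and the sample numbers are made up, and this is not a description of how iostat itself is implemented.

```python
def util_percent(busy_ms_before, busy_ms_after, elapsed_ms):
    """Percentage of the sampling interval during which the device was busy.

    busy_ms_before / busy_ms_after -- cumulative busy-time counter (ms) at the
                                      start and end of the interval
    elapsed_ms                     -- wall-clock length of the interval (ms)
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return 100.0 * (busy_ms_after - busy_ms_before) / elapsed_ms


# Hypothetical samples taken one second apart: the device was busy for 250 ms.
print(util_percent(busy_ms_before=1000, busy_ms_after=1250, elapsed_ms=1000))  # 25.0
```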

By consistently applying these tools and strategies, system administrators can maintain efficient, reliable, and recoverable computing environments.


Aman Pathak
