Describe new zone resource management features in the Solaris 10 OS 8/07 release
New Zones Features
On September 4, 2007, Solaris 10 8/07 became available.
This update to Solaris 10 has many new features. Of those, many enhance Solaris Containers either directly or indirectly. This update brings the most important changes to Containers since they were introduced in March of 2005. A brief introduction to them seems appropriate, but first a review of the previous update.
Solaris 10 11/06 added four features to Containers. One of them is called "configurable privileges" and allows the platform administrator to tailor the abilities of a Container to the needs of its application.
At least as important as that feature was the new ability to move (also called 'migrate') a Container from one Solaris 10 computer to another. This uses the 'detach' and 'attach' sub-commands of zoneadm(1M).
Other, minor new features included:
- rename a zone (i.e. Container)
- move a zone to a different place in the file system on the same computer
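The migrate, rename and move operations from that earlier update can be sketched as follows. This is an illustration rather than a tested procedure: the zone names (myzone, webzone) and the paths (/zones/myzone, /zones2/webzone) are assumptions, and the commands must be run as root in the global zone of a Solaris 10 11/06 or later system.

```shell
# --- Migrate a Container to another computer ---
# On the source host: halt the Container, then detach it.
zoneadm -z myzone halt
zoneadm -z myzone detach
# Copy the zonepath (assumed here to be /zones/myzone) to the target host
# by whatever means you prefer. Then, on the target host, create the
# configuration from the detached zonepath and attach the Container:
zonecfg -z myzone 'create -a /zones/myzone'
zoneadm -z myzone attach

# --- Rename a halted Container (hypothetical new name: webzone) ---
zonecfg -z myzone 'set zonename=webzone'

# --- Move a halted Container to a new zonepath on the same host ---
zoneadm -z webzone move /zones2/webzone
```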
New Features in Solaris 10 8/07 that Enhance Containers
New Resource Management Features
Solaris 10 8/07 has improved the resource management features of Containers. Some of these are new resource management features and some are improvements to the user interface. First I will describe three new "RM" features.
Earlier releases of Solaris 10 included the Resource Capping Daemon, rcapd. This tool enabled you to place a 'soft cap' on the amount of RAM (physical memory) that an application, user or group of users could use. When rcapd detected excess usage, physical memory pages owned by that entity would be paged out until the memory usage decreased below the cap.
Although it was possible to apply this tool to a zone, it was cumbersome and required cooperation from the administrator of the Container. In other words, the root user of a capped Container could change the cap. This made it inappropriate for potentially hostile environments, including service providers.
Solaris 10 8/07 enables the platform administrator to set a physical memory cap on a Container using an enhanced version of rcapd. Cooperation of the Container's administrator is not necessary - only the platform administrator can enable or disable this service or modify the caps. Further, usage has been greatly simplified to the following syntax:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
The next time the Container boots, this cap (500MB of RAM) will be applied to it. The cap can also be modified while the Container is running, with:
global# rcapadm -z myzone -m 600m
Because this cap does not reserve RAM, you can over-subscribe RAM usage. The only drawback is the possibility of paging.
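To make the over-subscription point concrete, here is a small sketch in shell arithmetic. The four cap values and the 2048 MB of RAM are made-up figures, not measurements from a real system.

```shell
#!/bin/sh
# Hypothetical per-Container physical memory caps (MB) and total RAM (MB).
caps_mb="500 600 800 700"
ram_mb=2048

# total_caps_mb "LIST": print the sum of a space-separated list of caps.
total_caps_mb() {
    sum=0
    for cap in $1; do
        sum=$((sum + cap))
    done
    echo "$sum"
}

total=$(total_caps_mb "$caps_mb")
# The caps are soft and reserve nothing, so their sum may exceed RAM.
if [ "$total" -gt "$ram_mb" ]; then
    echo "caps total ${total} MB > ${ram_mb} MB RAM: over-subscribed, paging is possible"
else
    echo "caps total ${total} MB fits within ${ram_mb} MB RAM"
fi
```

With these figures the caps add up to 2600 MB, more than the 2048 MB of RAM. That is a legal configuration, but if every Container approaches its cap at the same time, the system will page.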
For more details, see the online documentation.
Virtual memory (i.e. swap space) can also be capped. This is a 'hard cap.' In a Container which has a swap cap, an attempt by a process to allocate more VM than is allowed will fail. (If you are familiar with C programming: malloc() will return NULL with errno set to ENOMEM.)
The syntax is very similar to the physical memory cap:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set swap=1g
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
This limit can also be changed for a running Container:
global# prctl -n zone.max-swap -v 2g -t privileged -r -e deny -i zone myzone
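Just as with the physical memory cap, if you want to change the setting both for a running Container and for the next time it boots, you must use zonecfg for the stored configuration and also prctl (for the swap cap) or rcapadm (for the physical memory cap) for the running Container.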
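A sketch of changing a cap in both places, using the non-interactive form of zonecfg. It assumes that myzone already has a capped-memory resource (as configured in the examples above) and that the commands are run as root in the global zone; the 2g and 600m values are only examples.

```shell
# Swap cap: update the stored configuration (used at the next boot) ...
zonecfg -z myzone 'select capped-memory; set swap=2g; end'
# ... and the running Container.
prctl -n zone.max-swap -v 2g -t privileged -r -e deny -i zone myzone

# Physical memory cap: stored configuration, then the running Container.
zonecfg -z myzone 'select capped-memory; set physical=600m; end'
rcapadm -z myzone -m 600m
```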
The third new memory cap is locked memory. This is the amount of physical memory that a Container can lock down, i.e. prevent from being paged out. By default a Container now has the proc_lock_memory privilege, so it is wise to set this cap for all Containers.
Here is an example:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set locked=100m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
Simplified Resource Management Features
Dedicated CPUs
Many existing resource management features have a new, simplified user interface. For example, the "dedicated-cpu" resource re-uses the existing Dynamic Resource Pools feature. But instead of needing many commands to configure it, configuration can be as simple as:
global# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=1-3
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> exit
After using that command, when that Container boots, Solaris:
- removes a CPU from the default pool
- assigns that CPU to a newly created temporary pool
- associates that Container with that pool, i.e. only schedules that Container's processes on that CPU
Further, if the load on that CPU exceeds a default threshold and another CPU can be moved from another pool, Solaris will do that, up to the maximum configured amount of three CPUs. Finally, when the Container is stopped, the temporary pool is destroyed and its CPU(s) are placed back in the default pool.
Also, four existing project resource controls were applied to Containers:
global# zonecfg -z myzone
zonecfg:myzone> set max-shm-memory=100m
zonecfg:myzone> set max-shm-ids=100
zonecfg:myzone> set max-msg-ids=100
zonecfg:myzone> set max-sem-ids=100
zonecfg:myzone> exit
Fair Share Scheduler
A commonly used method to prevent "CPU hogs" from impacting other workloads is to assign a number of CPU shares to each workload, or to each zone. The relative number of shares assigned per zone guarantees a relative minimum amount of CPU power. This is less wasteful than dedicating a CPU to a Container that will not completely utilize the dedicated CPU(s).
Several steps were needed to configure this in the past. Solaris 10 8/07 simplifies this greatly: now just two steps are needed. First, the system must use FSS as the default scheduler. This command tells the system to use FSS as the default scheduler the next time it boots:
global# dispadmin -d FSS
Second, the Container must be assigned some shares:
global# zonecfg -z myzone
zonecfg:myzone> set cpu-shares=100
zonecfg:myzone> exit
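The relative nature of shares can be illustrated with a little arithmetic. This sketch assumes three hypothetical zones that are all busy at the same time; the zone names and share counts are invented, and any shares held by other active zones (including the global zone) would enlarge the total.

```shell
#!/bin/sh
# share_pct SHARES TOTAL: integer percentage of CPU power that a zone
# holding SHARES is guaranteed when zones holding TOTAL shares compete.
share_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical share assignments: myzone=100, dbzone=200, webzone=100.
total=$((100 + 200 + 100))
echo "myzone:  at least $(share_pct 100 "$total")% of CPU under contention"
echo "dbzone:  at least $(share_pct 200 "$total")% of CPU under contention"
echo "webzone: at least $(share_pct 100 "$total")% of CPU under contention"
```

Only the ratio matters: assigning 1, 2 and 1 shares would give the same minimums. A zone may use more than its minimum whenever the other zones are idle.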
Shared Memory Accounting
One feature simplification is not a reduced number of commands, but reduced complexity in resource monitoring. Prior to Solaris 10 8/07, the accounting of shared memory pages had an unfortunate subtlety. If two processes in a Container shared some memory, per-Container summaries counted the shared memory usage once for every process that was sharing the memory. It would appear that a Container was using more memory than it really was.
This was changed in 8/07. Now, in the per-Container usage section of prstat and similar tools, shared memory pages are only counted once per Container.
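A worked example of the difference, using invented numbers: two processes in one Container, each with 50 MB of private memory, both attached to the same 200 MB shared segment. The function names are illustrative only and do not correspond to any Solaris tool.

```shell
#!/bin/sh
# old_total NPROCS PRIVATE_MB SHARED_MB
# Pre-8/07 summary: the shared segment is counted once per process.
old_total() {
    echo $(( $1 * ($2 + $3) ))
}

# new_total NPROCS PRIVATE_MB SHARED_MB
# 8/07 summary: the shared segment is counted once per Container.
new_total() {
    echo $(( $1 * $2 + $3 ))
}

echo "before 8/07: $(old_total 2 50 200) MB reported"
echo "8/07:        $(new_total 2 50 200) MB reported"
```

The older accounting would report 500 MB for this Container, while the 8/07 accounting reports 300 MB, which matches the memory actually in use.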
Global Zone Resource Management
Solaris 10 8/07 adds the ability to persistently assign resource controls to the global zone and its processes. These controls can be applied:
- pool
- cpu-shares
- capped-memory: physical, swap, locked
- dedicated-cpu: ncpus, importance
Example:
global# zonecfg -z global
zonecfg:global> set cpu-shares=100
zonecfg:global> set scheduling-class=FSS
zonecfg:global> exit
Use those features with caution. For example, assigning a physical memory cap of 100MB to the global zone will surely cause problems...
New Boot Arguments
The following boot arguments can now be used:
Argument or Option | Meaning |
-s | Boot to the single-user milestone |
-m | Boot to the specified milestone |
-i | Boot the specified program as 'init'. This is only useful with branded zones. |
Allowed syntaxes include:
global# zoneadm -z myzone boot -- -s
global# zoneadm -z yourzone reboot -- -i /sbin/myinit
ozone# reboot -- -m verbose
In addition, these boot arguments can be stored with zonecfg, for later boots.
global# zonecfg -z myzone
zonecfg:myzone> set bootargs="-m verbose"
zonecfg:myzone> exit
Configurable Privileges
Of the existing three DTrace privileges, dtrace_proc and dtrace_user can now be assigned to a Container. This allows the use of DTrace from within a Container. Of course, even the root user in a Container is still not allowed to view or modify kernel data, but DTrace can be used in a Container to look at system call information and profiling data for user processes.
Also, the privilege proc_priocntl can be added to a Container to enable the root user of that Container to change the scheduling class of its processes.
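These privileges are assigned through the limitpriv property of the zone configuration. A minimal sketch, assuming a Container named myzone and run as root in the global zone; the new privilege set takes effect the next time the Container boots.

```shell
# Keep the default privilege set and add the two DTrace privileges
# plus proc_priocntl.
zonecfg -z myzone 'set limitpriv="default,dtrace_proc,dtrace_user,proc_priocntl"'

# Reboot the Container so that the new limit set is applied.
zoneadm -z myzone reboot
```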
IP Instances
This is a new feature that allows a Container to have exclusive access to one or more network interfaces. No other zone, not even the global zone, can send or receive packets on those interfaces.
This also allows a Container to control its own network configuration, including routing, IP Filter, the ability to be a DHCP client, and others. The syntax is simple:
global# zonecfg -z myzone
zonecfg:myzone> set ip-type=exclusive
zonecfg:myzone> add net
zonecfg:myzone:net> set physical=bge1
zonecfg:myzone:net> end
zonecfg:myzone> exit
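Once the Container has booted with an exclusive IP instance, its own root user configures the interface. The following is a sketch of what that might look like from inside the Container (not from the global zone); the static address shown in the comment is a made-up example.

```shell
# Run inside the exclusive-IP Container.
# Plumb the interface that was assigned with zonecfg.
ifconfig bge1 plumb

# Obtain an address as a DHCP client ...
ifconfig bge1 dhcp start

# ... or, alternatively, assign a static address (hypothetical values):
# ifconfig bge1 192.168.1.10 netmask 255.255.255.0 up
```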
IP Filter Improvements
Some network architectures call for two systems to communicate via a firewall box or other piece of network equipment. It is often desirable to create two Containers that communicate via an external device, for similar reasons. Unfortunately, prior to Solaris 10 8/07 that was not possible. In 8/07 the global zone administrator can configure such a network architecture with the existing IP Filter commands.
Upgrading and Patching Containers with Live Upgrade
Solaris 10 8/07 adds the ability to use Live Upgrade tools on a system with Containers. This makes it possible to apply an update to a zoned system, e.g. updating from Solaris 10 11/06 to Solaris 10 8/07. It also drastically reduces the downtime necessary to apply some patches.
The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant.
Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system.
An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated.
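The patching workflow described above might look like the following sketch. The boot environment name (patched_be), the disk slice, the patch directory and the patch ID are all placeholders, and the exact lucreate options depend on your disk layout.

```shell
# Create the ABE on a spare slice (hypothetical device) while the OBE
# keeps running its Containers.
lucreate -n patched_be -m /:/dev/dsk/c0t1d0s0:ufs

# Apply a patch (placeholder ID) from a patch directory to the ABE.
luupgrade -t -n patched_be -s /var/tmp/patches 123456-01

# Make the ABE the boot environment for the next boot, then reboot.
# Use init or shutdown here, not reboot, so the activation completes.
luactivate patched_be
init 6
```

To fall back, activate the original boot environment by name with luactivate and run init 6 again.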
Branded Zones (Branded Containers)
Sometimes it would be useful to run an application in a Container, but the application is not yet available for Solaris, or is not available for the version of Solaris that is being run. To run an application like that, perhaps a special Solaris environment could be created that only runs applications for that version of Solaris, or for that operating system.
Solaris 10 8/07 contains a new framework called Branded Zones. This framework enables the creation and installation of Containers that are not the default 'native' type of Containers, but have been tailored to run 'non-native' applications.
Solaris Containers for Linux Applications
The first brand to be integrated into Solaris 10 is the brand called 'lx'. This brand is intended for x86 applications which run well on CentOS 3 or Red Hat Enterprise Linux 3. This brand is specific to x86 computers. The name of this feature is Solaris Containers for Linux Applications.
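Creating an lx branded Container follows the same pattern as a native one, with a different template and an installation image. In this sketch the zone name, the zonepath and the location of the CentOS image archive are assumptions; run the commands as root in the global zone of an x86 system.

```shell
# Configure the Container from the lx template.
zonecfg -z lxzone 'create -t SUNWlx; set zonepath=/zones/lxzone'

# Install it from a Linux file system image (hypothetical path).
zoneadm -z lxzone install -d /var/tmp/centos_fs_image.tar.bz2

# Boot the Container and check what the Linux environment reports.
zoneadm -z lxzone boot
zlogin lxzone uname -a
```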