Tuesday, May 3, 2011

Solaris Zones Partitioning Technology



1.0 Introduction
The Solaris Zones[1][2] feature in the Solaris Operating System is a partitioning technology used to virtualize operating system services and provide an isolated and secure environment for hosting and running applications. A zone is a virtualized operating system environment created within a single instance of the Solaris Operating System. Two types of zone exist: global and non-global.
A global zone contains a fully functional installation of the Solaris OS that is bootable by the system hardware. An installation of the Solaris OS becomes the global zone when it is booted by the system hardware. Only one global zone runs on a system. The global zone administrator creates non-global zones with zonecfg(1M) and zoneadm(1M). The global zone controls the installation, maintenance, operation, and destruction of all non-global zones.
The Solaris Zones feature provides service virtualization and namespace isolation to processes running in a non-global zone. Processes in a non-global zone are isolated from processes in other zones. This isolation prevents processes running in a non-global zone from monitoring or affecting processes running in other zones. Even processes running with superuser credentials in a non-global zone cannot view or affect activity in other zones.
A zone also provides an abstract layer separating applications from the physical attributes of the machine on which they are deployed. Examples of these attributes include physical device paths and network interface names.
Zones can be used on any machine that is supported on the Solaris 10 release. The upper limit for the number of zones on a single physical server is 8192. The number of zones that can effectively be hosted on a single physical server is dependent upon the total resource requirements of applications running in all of the zones combined.
1.1 Solaris Zones Feature: A Component of Solaris Containers
A Solaris container is a virtualized runtime environment that places established limits on the system resources, such as CPU, that a workload can consume. Solaris Containers use Solaris OS resource-management features together with Solaris Zones to deliver a virtualized environment that has fixed resource boundaries for workloads.
A workload is an aggregation of all the processes of an application or group of applications. In addition to the process entity, the Solaris OS adds two facilities to identify workloads: the project and the task. If Solaris resource-management features are not used, the Solaris OS responds to workload demands by giving all activity on the system equal access to resources. Resource-management features in the Solaris OS enable system administrators to treat workloads individually and to allocate the quantity of resources that a workload receives. With these resource-management features, a system administrator can do the following:
  • Restrict access to a specific resource
  • Offer resources to workloads on a preferential basis
  • Isolate workloads from each other
The workload abstractions have been extended to work with Solaris Zones. Each zone has its own project(4) database. Zone-wide limits are added to the resource controls. Working together, Solaris Zones and Solaris resource-management features allow the system administrator to create a virtualized operating environment, a zone, which has its specific resource boundaries.
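As a minimal sketch of these facilities (the project name and the resource-control value here are hypothetical), a global administrator could create a project, start a task in it, and cap the number of LWPs available to that task:

global# projadd -c "sample workload" myproject
global# newtask -p myproject
global# prctl -n task.max-lwps -v 100 -r -i task $$

prctl(1) invoked with only -n and -i reports the current values of the named resource control instead of setting them.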

2.0 Benefits of Solaris Zones Software
The major benefit of Solaris Zones is lowering total cost of ownership (TCO) by using server consolidation. Solaris Zones software offers the following possibilities:
  • It is possible to create many virtualized operating environments -- zones -- on a single instance of the operating system running on a single physical server.
  • Each non-global zone has its own virtualized identity, file systems, devices, networking, operating system resources, and security.
  • Each non-global zone is isolated from every other zone except the global zone.
  • Application failure is isolated and contained within a non-global zone.
  • Inter-zone communication occurs only through networking.
  • No application porting is required as applications still use the Solaris ABI/API.
  • Each non-global zone can be individually rebooted and shut down without affecting other zones.
  • System resources such as CPU and network bandwidth can be partitioned for each zone. An administrator can use Solaris resource-management mechanisms to give finer granularity of resource control.
  • Administration of an application environment can be delegated to a non-global zone administrator.

3.0 Zone Creation and Bring-Up
3.1 Zone Configuration and Installation
The global zone administrator uses zonecfg(1M), zoneadm(1M), and zlogin(1M) to manage zones. zonecfg(1M) creates a zone configuration file, /etc/zones/my-zone.xml, for each non-global zone; my-zone.xml describes the my-zone configuration. zoneadm(1M) takes my-zone.xml as input and uses the live_upgrade(5) mechanism to create a boot environment (BE) at the location specified in the zonepath field of /etc/zones/my-zone.xml. After a non-global zone BE is created, zoneadm(1M) also launches a zoneadmd(1M) daemon process to manage the zone's state transitions.
A non-global zone can be in one of these states:
  • CONFIGURED
  • INCOMPLETE
  • INSTALLED
  • READY
  • RUNNING
  • SHUTTING_DOWN
  • DOWN
During a normal non-global zone bring-up process, a zone goes through these states: CONFIGURED -> INSTALLED -> READY -> RUNNING (see Figure 1).
Figure 1: Normal Non-Global Zone Bring-Up Process States
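The state of each configured zone can be checked from the global zone with zoneadm(1M); the output below is illustrative (the zone name and path match the sample in Section 3.3):

global# zoneadm list -cv
  ID NAME     STATUS     PATH
   0 global   running    /
   - my-zone  installed  /export/home/my-zone

A zone shown as installed transitions through ready to running when it is booted.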
3.2 Zone Bring-Up Using zoneadmd(1M)
zoneadmd(1M) is a system daemon for creating the non-global zone virtual platform and managing state transition of the virtual platform. The virtual platform components include network interfaces, devices, zoneadmd(1M) daemon, and zone console. One zoneadmd(1M) process exists for each non-global zone on the system. The functions of zoneadmd(1M) are:
  • To implement a door server for clients to request zone state changes. The global administrator uses zoneadm(1M), zonecfg(1M), and zlogin(1M) to manage zones. These commands communicate with zoneadmd(1M) through Solaris libdoor(3LIB).
  • To interface with zoneadm(1M), zonecfg(1M), and zlogin(1M) to create, bring up, and tear down the non-global zone virtual platform. During a typical non-global zone bring-up, zoneadmd(1M) performs the following actions:
    • Creates and initializes kernel zone structure and hooks.
    • Creates /dev directories and files.
    • Mounts file systems. The inherited-pkg-dir directories in my-zone.xml and /dev are mounted as loopback file systems. The other file systems listed in vfstab, /proc, /system/contract, and swap are mounted as usual.
    • Communicates with devfsadmd(1M) to lay out devices for the zone.
    • Creates and configures the logical network interface for the zone.
    • Instantiates the zone console device. One instance of the zcons(7D) driver exists for each non-global zone; each instance of the driver represents a global-zone/non-global-zone pair.
    • Configures process runtime attributes such as resource controls, pool bindings, and fine-grained privileges.
    • Launches the zone's init(1M) process.
    • Creates zsched (the kernel dummy process for a non-global zone). During non-global zone boot, zsched starts init(1M).
3.3 Sample Zone Configuration and Bring-Up
Here is a quick sample zone configuration where the zone name is my-zone and the IPv4 address is 10.0.0.1:
global# zonecfg -z my-zone
zonecfg:my-zone> create
/* default is sparse-root model; see Section 3.4 for details */
zonecfg:my-zone> set zonepath=/export/home/my-zone
zonecfg:my-zone> add net
zonecfg:my-zone:net> set address=10.0.0.1
zonecfg:my-zone:net> set physical=eri0
zonecfg:my-zone:net> end
zonecfg:my-zone> verify
zonecfg:my-zone> commit
zonecfg:my-zone> ^D
At this point, a zone configuration file, /etc/zones/my-zone.xml, has been created containing the above parameters and several inherited-pkg-dir fields for loopback-mounted file systems. Once a zone configuration file is established, the global zone administrator uses zoneadm(1M) to install the zone configuration:
global# zoneadm -z my-zone install
At the completion of the zoneadm(1M) install command, a boot environment is created with the live_upgrade(5) facilities. Zone boot is similar to booting a regular Solaris environment, except that zoneadm(1M) is used to create the zone runtime:
global# zoneadm -z my-zone boot 
This boots the zone. The appropriate file systems are mounted inside the zone, zoneadmd(1M) is started, and so on. When a zone is booted for the first time after installation, it has no internal configuration for naming schemes, no locale or time zone, no root password, and so on. It is necessary to access the zone's console to answer the prompts and set these up. This should be done using the zlogin(1M) command:
# zlogin -C my-zone
[connected to zone my-zone console]
3.4 Zone Root File System
A non-global zone's root file system can be configured in two ways: the whole-root model and the sparse-root model.
The whole-root model provides the maximum configurability by installing all of the required and any selected optional Solaris software packages into the private file systems of the zone. The advantages of this model include the ability for zone administrators to customize their zone's file-system layout (for example, creating a /usr/local) and to add arbitrary unbundled or third-party packages. The disadvantages of this model include the loss of sharing of text segments from executables and shared libraries by the virtual memory system, and a much heavier disk footprint -- approximately an additional 2 Gbyte -- for each non-global zone configured this way. The global zone administrator uses the create -b sub-command of zonecfg(1M) to create a zone with the whole-root model (or, alternatively, removes the inherited-pkg-dir directories in my-zone.xml).
The sparse-root model optimizes the sharing of objects by installing only a subset of the root packages (those with the pkginfo(4) parameter SUNW_PKGTYPE set to root) and using read-only loopback file systems to gain access to other files. This is similar to the way a diskless client is configured, where /usr and other file systems are mounted over the network with NFS. By default with this model, the directories /lib, /platform, /sbin, and /usr are mounted as loopback file systems. The advantages of this model are greater performance, due to the efficient sharing of executables and shared libraries, and a much smaller disk footprint for the zone itself. The sparse-root model requires only approximately 100 Mbyte of file system space for the zone itself.
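As a sketch (the zonepath is illustrative), a whole-root zone can be configured with the create -b sub-command, which starts from a blank configuration containing no inherited-pkg-dir entries, so all packages are installed into the zone's private file systems:

global# zonecfg -z my-zone
zonecfg:my-zone> create -b
zonecfg:my-zone> set zonepath=/export/home/my-zone
zonecfg:my-zone> commit
zonecfg:my-zone> exit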

4.0 Zone Security
Each non-global zone has a security boundary around it. The security boundary is maintained by:
  • Adopting Solaris 10 Process Rights Management (privileges(5))
  • Name-space isolation (for example, /proc and /dev), and
  • Inter-zone communication using only the network (looped back inside IP)
4.1 Process Rights Management
The traditional UNIX privilege model associates all privileges with the effective uid 0 (root). This all-or-nothing approach has a number of shortcomings:
  • It is not possible to extend an ordinary user's capabilities with a restricted set of privileges.
  • Each privileged process has complete control of the system. A privileged process can be leveraged to gain full access to the system.
The Solaris 10 OS addresses these shortcomings with the implementation of the principle of Process Rights Management[3], which restricts a user to no more privilege than necessary to perform a job. Process Rights Management extends the Solaris process model with privilege sets. Each privilege set contains zero or more privileges. Each process has four sets of privileges. One of the privilege sets, the Effective privilege set, determines whether a process can use a particular privilege.
These four privilege sets are:
  • Effective privilege set -- the set of privileges that a program uses at the time of execution. For a privilege to be effective, that privilege must also be in the permitted set.
  • Permitted privilege set -- the set of privileges that is available for use. Privileges can be available to a program from inheritance or from assignment. The permitted set is a subset of the inheritable set. Privileges can be removed from the permitted set, but privileges cannot be added to the set. A privilege-aware program removes the privileges that a program never uses from the program's permitted set. In this way, the program is protected from using an incorrectly assigned or inherited privilege.
  • Inheritable privilege set -- the set of privileges that a process can inherit from a parent process. Which privileges a child process actually inherits are controlled by how the process was started, and by the permitted set on the child process. For users, the inheritable set includes a basic set of privileges. Programs that are started with a call to fork(2) inherit all privileges from the parent process and can add new privileges to the process. Programs that are started with a call to exec(2) inherit all the privileges from the parent process. However, such programs cannot add any new privileges. That is, the program's permitted set equals its inheritable set. The inheritable set is limited by the value of the limit set.
  • Limit privilege set -- the upper bound of privileges that a process and its offspring can inherit. By default, the limit set is all privileges. Thus, if a user is assigned a profile that includes a program that has been assigned privilege, the user can run that program because the assigned privileges are within the user's limit set. All privileges in the permitted set might not be used at the time of execution. The limit set is only enforced at exec(2) time, allowing a process to drop privileges on exec(2), while still using them until that point in time. The permitted set is a subset of the inheritable set. The inheritable set is limited by the value of the limit set.
The exec(2) privilege set transformation rules are as follows (Formula 1):

    C'.I = C.L ∩ C.I
    C'.P = C'.E = C'.I
    C'.L = C.L

where
C.E -- the Effective privilege set of the parent process
C.P -- the Permitted privilege set of the parent process
C.I -- the Inheritable privilege set of the parent process
C.L -- the Limit privilege set of the parent process
C'.E -- the Effective privilege set of the child process
C'.P -- the Permitted privilege set of the child process
C'.I -- the Inheritable privilege set of the child process
C'.L -- the Limit privilege set of the child process
All kernel security policy checks are performed using privileges only.
Formula 2: an operation that requires privilege priv succeeds only if priv ∈ C.E (the effective set of the calling process).
The kernel provides a basic set of privileges that a user requires in order to use the system. At login, each user inherits the basic set. The basic set can be modified with ppriv(1). The limit set is typically the full set of privileges. Currently, 48 privileges are defined for a privilege set. privileges(5) lists these privileges and their definitions. If a process is privilege-aware, its effective privilege set determines the behavior of the process.
The privilege model can be ignored by a process if the Privilege Aware State (PAS) of the process is not privilege aware (NPA). The privilege state of a process is extended with a PAS which can take the values:
  • Privilege aware (PA) -- completely ignores the effective UID
  • Not privilege aware (NPA) -- behaves almost exactly like a traditional process
A process can attempt to become NPA using setpflags(2). The kernel attempts to drop PA on exec(2) if PA is not inherited.
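The four privilege sets of a running process can be inspected with ppriv(1). For an ordinary login shell, the output looks approximately like this (the process ID is illustrative):

$ ppriv $$
1234:   -sh
flags = <none>
        E: basic
        I: basic
        P: basic
        L: all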
4.2 Zone Process Privileges
All processes running in a non-global zone are privilege-aware. That means all processes in a non-global zone are constrained by the privilege sets that are assigned to them when the process is created. When the system creates a non-global zone, a kernel dummy process, zsched, is created as the root process of the zone. All processes in a non-global zone are descendants of zsched. The inheritable privilege set of zsched determines the effective privilege set of processes in the zone.
A list follows showing the privileges that processes in a non-global zone have. Because of the restricted privileges of a process in a non-global zone, certain system calls return errors. In most cases, EPERM is returned for a process that does not possess the required privilege. Some system calls that check PRIV_CPC_CPU or PRIV_NET_RAWACCESS may return EACCES. Section 6.0 summarizes the system calls, library functions, and commands that can return errors when they are called in a non-global zone.
All Privileges              Zone Privileges
=========================================================
PRIV_CONTRACT_EVENT         PRIV_CONTRACT_EVENT
PRIV_CONTRACT_OBSERVER      PRIV_CONTRACT_OBSERVER
PRIV_CPC_CPU
PRIV_DTRACE_PROC
PRIV_DTRACE_USER
PRIV_DTRACE_KERNEL
PRIV_FILE_CHOWN             PRIV_FILE_CHOWN
PRIV_FILE_CHOWN_SELF        PRIV_FILE_CHOWN_SELF
PRIV_FILE_DAC_EXECUTE       PRIV_FILE_DAC_EXECUTE
PRIV_FILE_DAC_READ          PRIV_FILE_DAC_READ
PRIV_FILE_DAC_SEARCH        PRIV_FILE_DAC_SEARCH
PRIV_FILE_DAC_WRITE         PRIV_FILE_DAC_WRITE
PRIV_FILE_LINK_ANY          PRIV_FILE_LINK_ANY
PRIV_FILE_OWNER             PRIV_FILE_OWNER
PRIV_FILE_SETID             PRIV_FILE_SETID
PRIV_IPC_DAC_READ           PRIV_IPC_DAC_READ
PRIV_IPC_DAC_WRITE          PRIV_IPC_DAC_WRITE
PRIV_IPC_OWNER              PRIV_IPC_OWNER
PRIV_NET_ICMPACCESS         PRIV_NET_ICMPACCESS
PRIV_NET_PRIVADDR           PRIV_NET_PRIVADDR
PRIV_NET_RAWACCESS
PRIV_PROC_AUDIT             PRIV_PROC_AUDIT
PRIV_PROC_CHROOT            PRIV_PROC_CHROOT
PRIV_PROC_CLOCK_HIGHRES
PRIV_PROC_EXEC              PRIV_PROC_EXEC
PRIV_PROC_FORK              PRIV_PROC_FORK
PRIV_PROC_INFO              PRIV_PROC_INFO
PRIV_PROC_LOCK_MEMORY
PRIV_PROC_OWNER             PRIV_PROC_OWNER
PRIV_PROC_PRIOCNTL
PRIV_PROC_SESSION           PRIV_PROC_SESSION
PRIV_PROC_SETID             PRIV_PROC_SETID
PRIV_PROC_TASKID            PRIV_PROC_TASKID
PRIV_PROC_ZONE
PRIV_SYS_ACCT               PRIV_SYS_ACCT
PRIV_SYS_ADMIN              PRIV_SYS_ADMIN
PRIV_SYS_AUDIT              PRIV_SYS_AUDIT
PRIV_SYS_CONFIG
PRIV_SYS_DEVICES
PRIV_SYS_IPC_CONFIG
PRIV_SYS_LINKDIR
PRIV_SYS_MOUNT              PRIV_SYS_MOUNT
PRIV_SYS_NET_CONFIG
PRIV_SYS_NFS                PRIV_SYS_NFS
PRIV_SYS_RESOURCE           PRIV_SYS_RESOURCE
PRIV_SYS_SUSER_COMPAT
PRIV_SYS_TIME

5.0 Zone Resources and Service Virtualization
Solaris Zones provide a robust partitioning solution to virtualize the machine operating environment to a process. Processes in a non-global zone have access to resources and services that they require for an operating environment while their activities are isolated from the processes in other zones. These include:
  • Networking interfaces
  • File systems
  • Interprocess communication (IPC)
  • Devices
  • Processes
  • Resource management facilities
  • Packaging database
5.1 Networking
Each non-global zone has its own logical network and loopback interface. Bindings between upper layer streams and logical interfaces are restricted such that a stream may only establish bindings to logical interfaces in the same zone. Likewise, packets from a logical interface can only be passed to upper layer streams in the same zone as the logical interface.
Bindings to the loopback address are kept within a zone, with one exception: a stream in one zone can access the IP address of an interface in another zone, in which case the traffic is looped back inside IP.
In addition, the following networking virtualization and restrictions apply to a non-global zone:
  • Controlling the bandwidth that a zone uses is possible. This can be done using the bundled IPQoS functionality and configuring bandwidth parameters for each of the IP addresses that are configured for a particular zone.
  • IPQoS and IPsec can be configured only in the global zone. A zone-specific configuration can be created by specifying a zone's IP address in the configuration.
  • Raw access to layers below the transport layer (for example, IP, ARP, and DLPI to the link layer) is not allowed in a non-global zone. Thus, using DLPI to directly communicate with a link layer (NIC device driver) results in an error. snoop(1M) does not work in a non-global zone because it uses DLPI to directly access interface drivers.
  • The following networking features remain as system-wide features that can only be configured by a global administrator:
    • Routing
    • IP multipathing (IPMP)
    • Mobile IP
    • DHCP client
    • Network Cache and Accelerator (NCA)
    • Network tuning using /etc/system and ndd(1M)
    • IP filter
5.2 File Systems
Each non-global zone has its own file system name space, although a file system can be shared among zones. The global zone file systems are loopback-mounted into a zone using lofs(7FS). In addition to lofs, the autofs, tmpfs, mntfs, ctfs, procfs, and NFS client file systems can be mounted locally in a non-global zone.
To hide a global-zone directory, for example /usr/local, from non-global zones, a global-zone administrator can create an empty directory in the global zone and configure a loopback mount for the non-global zone on top of the directory in question:
global# zonecfg -z my-zone
zonecfg:my-zone> add fs
zonecfg:my-zone:fs> set dir=/usr/local
zonecfg:my-zone:fs> set special=/empty
zonecfg:my-zone:fs> add options ro
zonecfg:my-zone:fs> set type=lofs
zonecfg:my-zone:fs> end
Several ways are available to add a file system to a non-global zone:
  • Use an LOFS mount:
  • global# newfs /dev/rdsk/c1t0d0s0
    global# mount /dev/dsk/c1t0d0s0 /mystuff
    global# zonecfg -z my-zone
    zonecfg:my-zone> add fs
    zonecfg:my-zone:fs> set dir=/usr/mystuff
    zonecfg:my-zone:fs> set special=/mystuff
    zonecfg:my-zone:fs> set type=lofs
    zonecfg:my-zone:fs> end
  • Use a UFS mount:
  • global# newfs /dev/rdsk/c1t0d0s0
    global# zonecfg -z my-zone
    zonecfg:my-zone> add fs
    zonecfg:my-zone:fs> set dir=/usr/mystuff
    zonecfg:my-zone:fs> set special=/dev/dsk/c1t0d0s0
    zonecfg:my-zone:fs> set raw=/dev/rdsk/c1t0d0s0
    zonecfg:my-zone:fs> set type=ufs
    zonecfg:my-zone:fs> end
  • Export the device node and mount from the non-global zone:
  • global# zonecfg -z my-zone
    zonecfg:my-zone> add device
    zonecfg:my-zone:device> set match=/dev/rdsk/c1t0d0s0
    zonecfg:my-zone:device> end
    zonecfg:my-zone> add device
    zonecfg:my-zone:device> set match=/dev/dsk/c1t0d0s0
    zonecfg:my-zone:device> end
    my-zone# newfs /dev/rdsk/c1t0d0s0
    my-zone# mount /dev/dsk/c1t0d0s0 /usr/mystuff
  • Mount UFS directly from the global zone:
    # mount /dev/dsk/c1t0d0s0 /export/home/my-zone/root/usr/mystuff
    
  • Add LOFI to the mix:
  • global# newfs /dev/rdsk/c1t0d0s0
    global# mount /dev/dsk/c1t0d0s0 /mystuff
    global# mkfile 1g /mystuff/myfile
    global# lofiadm -a /mystuff/myfile
    global# zonecfg -z my-zone
    zonecfg:my-zone> add device
    zonecfg:my-zone:device> set match=/dev/rlofi/1
    zonecfg:my-zone:device> end
    zonecfg:my-zone> add device
    zonecfg:my-zone:device> set match=/dev/lofi/1
    zonecfg:my-zone:device> end
5.3 Interprocess Communication (IPC)
The basic design principle of Solaris Zones is that a process in a non-global zone is only able to use IPC to communicate with other processes in the same zone. For file-system-based IPC, such as pipes (using fifofs), STREAMS (using namefs), UNIX domain sockets (using sockfs) and POSIX IPC, the unique file system name space of a zone ensures that IPC communication is within a zone.
Other IPCs, such as Solaris doors and System V IPC, have attached a zone ID to the communication objects so processes running in a non-global zone are able to access or control objects associated only with the same zone.
5.4 Devices
Devices, in general, are shared resources in a system. To make devices available in a non-global zone, therefore, requires some restrictions so that system security is not compromised.
  • Devices that expose system data are available only in the global zone. Examples of such devices are dtrace(7D), kmem(7D), ksyms(7D), kmdb(7D), trapstat(1M), lockstat(7D), and so on.
  • The /dev name space consists of symbolic links (logical paths) to the physical paths in /devices. The /devices name space, which is available only in the global zone, reflects the current state of attached device instances created by the driver. Only the logical path /dev is visible in a non-global zone.
  • During a non-global zone bring-up, zoneadmd(1M) creates a zone-specific /dev and then loopback-mounts the /dev directory under the non-global zone root. The global zone administrator uses zonecfg(1M) to specify the devices to appear in a particular zone. The number of /dev entries in a non-global zone is significantly smaller than the number in a global zone.
  • The zone administrator can change device permissions, but not create new entries.
  • Device numbers are a system-wide property. System calls that create a special file mapped to a particular device number (for example, mknod(2)) return an error.
  • Solaris Volume Manager meta devices cannot be configured in a non-global zone. However, the global zone administrator can export a meta device to a local zone.
  • The global zone administrator uses the add device sub-command of zonecfg(1M) to include additional devices in a non-global zone. For example, to add /dev/dsk/c1t1d0s0 device node to the non-global zone, the administrator adds the following lines:
  • zonecfg:my-zone> add device
    zonecfg:my-zone:device> set match=/dev/dsk/c1t1d0s0
    zonecfg:my-zone:device> end
  • Utilities that configure hardware or change the /dev entries do not work in a non-global zone. These utilities include:
    • add_drv(1M)/rem_drv(1M)
    • modload(1M)/modunload(1M)
    • autopush(1M)
    • cfgadm(1M)
    • devfsadm(1M), drvconfig(1M), disks(1M), tapes(1M), ports(1M), and devlinks(1M)
  • Zone Console -- The zone console, /dev/zconsole, is implemented by the zcons(7D) driver. Each instance of the zcons driver represents a global-zone/local-zone pair. The driver channels I/O from the global zone to the non-global zone, and back. /dev/console, /dev/msglog, /dev/syscon, /dev/sysmsg, and /dev/systty are all symbolic links to /dev/zconsole.
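For example, because device numbers are a system-wide property, an attempt to create a device node from inside a non-global zone fails with EPERM (the device name and the major/minor numbers here are illustrative):

my-zone# mknod /dev/mynode c 100 0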
5.5 Processes
One of the basic principles of the zone design is that processes within a non-global zone must not be able to affect or see the activity of processes running in other zones. Each process is associated with a zone. Process visibility is restricted by limiting the process IDs exposed through proc(4) accesses and the associated utilities, proc(1). In a non-global zone, the proc(4) file system exposes only the processes of that zone. The proc file system in the global zone shows all processes running on the system, including the processes of all non-global zones. Attempts to signal or control (for example, using /proc) processes in other zones result in an error code of ESRCH (or ENOENT for proc(4) accesses) rather than EPERM. Only a process in the global zone that holds the PRIV_PROC_ZONE privilege can signal or control processes in other zones.
5.6 Resource Management
Solaris resource management allows a system administrator to control the resource utilization of a workload. A workload in the Solaris OS is associated with the project, task, and process entities. The traditional per-process resource control has been extended to the task and project entities. The project, like user and group, is a network-wide administrative identifier for related work. The task collects a group of processes into a manageable entity that represents a workload component. The system administrator uses prctl(1) to get or set the resource controls for running processes, tasks, and projects. rctladm(1M) used without options gives a list of the system resource controls that can be manipulated by prctl(1). A persistent configuration mechanism for processor sets has been introduced with the resource pool abstraction. poolbind(1M) can be used to bind processes, tasks, or projects to resource pools. These resource-management abstractions have been extended to work with zones, and each zone has its own project(4) database.
Zone-wide limits are added to the resource controls to prevent processes in a zone from monopolizing the system. The global zone administrator is able to specify global limits with rctladm(1M). The non-global zone administrator can use rctladm(1M) to specify zone-wide limits. A one-zone, one-resource-pool rule applies to non-global zones: a non-global zone is bound to a pool in its entirety, and all resources in the pool are shared among all processes of the zone. Any attempt to bind individual processes, tasks, or projects in a non-global zone to another pool fails.
5.7 Package and Patch Database
Each zone maintains its own package and patch database. A package or a patch can be installed individually into a non-global zone or to all zones from the global zone. The behavior of packaging in a zone environment varies according to the following factors:
  • Use of the -G option in pkgadd(1M)
  • Setting of the SUNW_PKG_ALLZONES and SUNW_PKG_HOLLOW variables in the pkginfo file (see pkginfo(4) for details)
  • Type of zone, global or non-global, in which pkgadd(1M) is invoked
If a package's pkginfo(4) SUNW_PKG_ALLZONES attribute is set to "true", the package can be installed and removed only from the global zone: the package is installed in all non-global zones, and it is installed into a zone when the zone is installed. The SUNW_PKG_HOLLOW attribute of pkginfo(4) can also affect the package's visibility and behavior in a non-global zone. The following list specifies the interaction between the -G option of patchadd(1M) and the SUNW_PKG_ALLZONES variable (see pkginfo(4)) when adding a patch in global and non-global zones using patchadd(1M).
  • Global zone, -G specified
    • If any packages have SUNW_PKG_ALLZONES set to true: patchadd(1M) returns an error; nothing changes.
    • If no packages have SUNW_PKG_ALLZONES set to true: Apply the patch to the package(s) in the global zone only.
  • Global zone, -G not specified
    • If any packages have SUNW_PKG_ALLZONES set to true: Apply the patch to the appropriate package(s) in all zones.
    • If no packages have SUNW_PKG_ALLZONES set to true: Apply the patch to the appropriate package(s) in all zones.
  • Non-global zone, -G specified or not specified
    • If any packages have SUNW_PKG_ALLZONES set to true: patchadd(1M) returns an error; nothing changes.
    • If no packages have SUNW_PKG_ALLZONES set to true: Apply the patch to the package(s) in the local zone only.
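For example (the patch ID is hypothetical), the following applies a patch to the global zone's packages only, provided that none of the patched packages set SUNW_PKG_ALLZONES:

global# patchadd -G 123456-01

Without -G, the same command would apply the patch to the appropriate packages in all zones.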
6.0 Zone Limitations
Because of the security, process isolation, and resource partitioning constraints described in the previous sections, some limitations on system calls, device availability, and networking usage are imposed on processes running in a non-global zone. These limitations are enforced by:
  • Process privileges
  • zoneadmd(1M) during zone creation (for example, /dev/kmem, /dev/dtrace)
  • Kernel (for example, /proc)
The goal of the following sections[4] is to list the interfaces (APIs, CLIs, device files, and so on) that could return an error if used in a non-global zone.
6.1 System Calls
    adjtime(2) -- correct the time to allow synchronization of the system clock
    ioctl(2) -- device control
    -> ioctl(2) with I_POP and STREAMS anchors in place
    link(2)/unlink(2) -- link to a directory
    memcntl(2) -- memory management control
    -> with MC_LOCKMC_LOCKASMC_UNLOCK or MC_UNLOCKAS
    mknod(2) -- make a directory, a special file, or a regular file
    ->with S_IFCHR and S_IFBLK as file type
    msgctl(2) -- message control operations
    -> with IPC_SET and raising msg_qbytes
    ntp_adjtime(2) -- adjust local clock parameters
    p_online(2) -- return or change processor operational status
    ->P_ONLINEP_OFFLINEP_NOINTRP_FAULTEDP_SPARE, and P_FORCED
    priocntl(2) -- process scheduler control
    -> with PC_SETPARMSPC_SETXPARMS, and PC_ADMIN
    priocntlset(2) -- generalized process scheduler control
    -> with PC_SETPARMSPC_SETXPARMS, and PC_ADMIN
    pset_bind(2) -- bind LWPs to a set of processors
    pset_create(2)/pset_destroy(2)/pset_assign(2) -- manage sets of processors
    pset_setattr(2) -- set processor set attributes
    shmctl(2) -- shared memory control operations
    -> with SHM_LOCK and SHM_UNLOCK
    socket(2) -- create an endpoint for communication
    -> with SOCK_RAW
    stime(2) -- set system time and date
    swapctl(2) -- manage swap space
    -> with SC_ADD and SC_REMOVE swapping resources
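Inside a non-global zone, the calls above typically fail with EPERM even when the caller is root. A script can treat any such failure uniformly; the helper below is a hypothetical sketch (the name try_priv is invented for illustration):

```shell
# try_priv: run a command, discarding its output and errors, and
# report whether it succeeded. In a non-global zone, operations that
# depend on the restricted system calls above are denied even for root.
try_priv() {
    if "$@" >/dev/null 2>&1; then
        echo "allowed"
    else
        echo "denied"
    fi
}
```

For example, `try_priv mknod /tmp/fakedev c 13 2` exercises mknod(2) with S_IFCHR, one of the restricted calls (the path and device numbers here are arbitrary).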
    6.2 Library Functions

    clock_settime(3RT) -- set high-resolution clock operations
    cpc_bind_cpu(3CPC) -- bind request sets to hardware counters
    libdevinfo(3LIB) -- device information library
    libcfgadm(3LIB) -- configuration administration library
    libpool(3LIB) -- pool configuration manipulation library
    libkvm(3LIB) -- Kernel Virtual Memory access library
    libtnfctl(3LIB) -- TNF probe control library
    mlock(3C)/munlock(3C) -- lock or unlock pages in memory
    mlockall(3C)/munlockall(3C) -- lock or unlock address space
    plock(3C) -- lock into memory, or unlock, process, text, or data
    timer_create(3RT) -- create a timer
    -> with CLOCK_HIGHRES
    t_open(3NSL) -- establish a transport endpoint
    -> with /dev/rawip
    settimeofday(3C) -- get or set the date and time
    6.3 Commands
    The commands listed below might not have full functionality in a non-global zone. For example, you can use arp -a in a non-global zone to display all of the current ARP entries of the system. However, other options of arp(1M) return errors if they are used in a non-global zone to manipulate the ARP table.

    add_drv(1M)/rem_drv(1M) -- add/remove a device driver to/from the system
    arp(1M) -- address resolution display and control
    autopush(1M) -- configure lists of automatically pushed STREAMS modules
    cfgadm(1M) -- configuration administration
    cpustat(1M) -- monitor system behavior using CPU performance counters
    devfsadm(1M) -- administration command for /dev
    devlinks(1M) -- add /dev entries for miscellaneous devices and pseudo-devices
    dispadmin(1M) -- process scheduler administration
    disks(1M) -- create /dev entries for hard disks attached to the system
    drvconfig(1M) -- apply permission and ownership changes to devices
    dtrace(1M) -- DTrace dynamic tracing compiler and tracing utility
    intrstat(1M) -- report interrupt statistics
    ipf(1M) and related IP Filter commands -- alter packet filtering lists for IP packet input and output
    modload(1M)/modunload(1M) -- load/unload a kernel module
    plockstat(1M) -- report user-level lock statistics
    pooladm(1M) -- activate and deactivate the resource pools facility
    poolcfg(1M) -- create and modify resource pool configuration files
    poolbind(1M) -- bind processes, tasks, or projects, or query binding of processes to resource pools
    ports(1M) -- create /dev entries and inittab entries for serial lines
    prtconf(1M) -- print system configuration
    prtdiag(1M) -- display system diagnostic information
    psrset(1M) -- create and manage processor sets
    route(1M) -- manually manipulate the routing tables
    share(1M) -- make a local resource available for mounting by remote systems
    snoop(1M) -- capture and inspect network packets
    tapes(1M) -- create /dev entries for tape drives attached to the system
    trapstat(1M) -- report trap statistics
    date(1) -- write the date and time
    nca(1) -- the Solaris Network Cache and Accelerator (NCA)
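Because the commands above lose some or all of their functionality outside the global zone, administration scripts commonly branch on the output of zonename(1). The sketch below is illustrative (the function name in_global_zone is invented); it takes the zone name as an argument so the logic is self-contained, and a real script would pass it `$(zonename)`:

```shell
# in_global_zone: decide whether zone-wide administration commands
# (routing table changes, snoop, psrset, and so on) should be
# attempted. $1 is the output of zonename(1): "global" for the
# global zone, or the zone's name for a non-global zone.
in_global_zone() {
    if [ "$1" = "global" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

A typical guard would then read `[ "$(in_global_zone "$(zonename)")" = "yes" ] && snoop ...` (the snoop invocation is only an example).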
    6.4 Device and Interface Special Files

    uscsi(7I) -- user SCSI command interface
    mem(7D)/kmem(7D)/allkmem(7D) -- physical or virtual memory access
    kmdb(7D) -- in situ kernel debugger
    ksyms(7D) -- kernel symbols
    dtrace(7D) -- DTrace dynamic tracing facility
    lockstat(7D) -- DTrace kernel lock instrumentation provider
    cpuid(7D) -- CPU identification driver
    fcip(7D) -- IP/ARP over Fibre Channel datagram encapsulation driver
    NIC device nodes that support the DLPI programming interface are not accessible in a non-global zone. Examples of such device nodes are: hme(7D), ce(7D), ge(7D), eri(7D), bge(7D), dmfe(7D), dnet(7D), e1000g(7D), elxl(7D), iprb(7D), pcelx(7D), pcn(7D), qfe(7D), rtls(7D), sk98sol(7D), skfp(7D), and spwr(7D).
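Many of these restrictions are visible simply as missing device nodes: the corresponding /dev entries are not created in a non-global zone. A script can therefore probe for a node before trying to use it; the helper below is a sketch (the function name dev_present is invented):

```shell
# dev_present: report whether a device special file exists at the
# given path. In a non-global zone, nodes such as /dev/kmem or the
# DTrace device files are simply absent.
dev_present() {
    if [ -e "$1" ]; then
        echo "present"
    else
        echo "absent"
    fi
}
```

For example, `dev_present /dev/kmem` can be used to skip kernel-memory tools when the node is not there.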
