Originally Posted by dhirend_6d
What Is a Kernel?
The UNIX kernel is the software that manages a user program's access to the system's hardware and software resources. These resources include CPU time, memory, disk drives, network connections, and the terminal or GUI. The kernel makes this all possible by controlling and providing access to memory, processor, input/output devices, disk files, and special services to user programs.
The basic UNIX kernel can be broken into four main subsystems:
* Process Management
* Memory Management
* Filesystem Management
* I/O Management
These subsystems should be viewed as separate entities that work in concert to provide services to a program that enable it to do meaningful work. These management subsystems make it possible for a user to access a database via a Web interface, print a report, or do something as complex as managing a 911 emergency system. At any moment in the system, numerous programs may request services from these subsystems. It is the kernel's responsibility to schedule work and, if the process is authorized, grant access to these subsystems. In short, programs interact with the subsystems via software libraries and the system call interface. We'll start by looking at how the UNIX kernel comes to life by way of the system initialization process.
System initialization (booting) is the first step toward bringing your system into an operational state. The system goes through a number of machine-dependent and machine-independent steps before it is ready to begin servicing users. At system startup, there is nothing running on the Central Processing Unit (CPU). The kernel is a complex program that must have its binary image loaded at a specific address from some type of storage device, usually a disk drive. The boot disk maintains a small restricted area called the boot sector that contains a boot program that loads and initializes the kernel. You'll find that this is a vendor-specific procedure that reflects the architectural hardware differences between the various UNIX vendor platforms. When this step is completed, the CPU must jump to a specific memory address and start executing the code at that location. Once the kernel is loaded, it goes through its own hardware and software initialization.
The operating system, or kernel, runs in a privileged manner known as kernel mode. This mode of operation allows the kernel to run without being interfered with by other programs currently in the system. The microprocessor enforces this line of demarcation between user and kernel level mode. Because the kernel operates in its own protected address space, it can maintain the integrity of its own data structures and those of other processes. (That's not to say that a privileged process could not inadvertently cause corruption within the kernel.) These data structures are used by the kernel to manage and control itself and any other programs that may be running in the system. If any of these data structures were allowed to be accidentally or intentionally altered, the system could quickly crash. Now that we have learned what a UNIX kernel is and how it is loaded into the system, we are ready to take a look at the four UNIX subsystems: Process Management, Memory Management, Filesystem Management, and I/O Management.
The Process Management subsystem controls the creation, termination, accounting, and scheduling of processes. It also oversees process state transitions and the switching between privileged and nonprivileged modes of execution. The Process Management subsystem also facilitates and manages the complex task of the creation of child processes.
A simple definition of a process is that it is an executing program. It is an entity that requires system resources, and it has a finite lifetime. It has the capability to create other processes via the system call interface. In short, it is an electronic representation of a user's or programmer's desire to accomplish some useful piece of work. A process may appear to the user as if it is the only job running in the machine. This "sleight of hand" is only an illusion. At any one time a processor is only executing a single process.
A process has a definite structure (see Figure 19.1). The kernel views this string of bits as the process image. This binary image consists of both a user and system address space as well as registers that store the process's data during its execution. The user address space is also known as the user image. This is the code that is written by a programmer and compiled into a ".o" object file. An object file contains machine language code and data in a format that the linker program can use to create an executable program.
Diagram of process areas.
The user address space consists of five separate areas: text, data, bss, stack, and user area.
The first area of a process is its text segment. This area contains the executable program code for the process. This area is shared by other processes that execute the program. It is therefore fixed and unchangeable and is usually swapped out to disk by the system when memory gets too tight.
The data area contains both the global and static variables used by the program. For example, a programmer may know in advance that a certain data variable needs to be set to a certain value. In the C programming language, it would look like:
int x = 15; // x is an integer variable with an initial value of 15.
If you were to look at the data segment when the program was loaded, you would see that the variable x was an integer type with an initial value of 15.
The bss area, like the data area, holds information for the program's variables. The difference is that the bss area maintains variables that will have their data values assigned to them during the program's execution. For example, a programmer may know that she needs variables to hold certain data that will be input by a user during the execution of the program:
int a, b, c; // a, b, and c are variables that hold integer values.
char *ptr; // ptr is an uninitialized character pointer.
The program code can also make calls to library routines like malloc to obtain a chunk of memory and assign it to a variable like the one declared above.
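The three declarations discussed above can be put side by side in a small sketch. The helper name fill_ptr is hypothetical and exists only for illustration; the placement of each variable in the data or bss area is the conventional layout, though the exact arrangement depends on your compiler and linker.

```c
#include <stdlib.h>
#include <string.h>

int x = 15;      /* initialized global: conventionally stored in the data area */
int a, b, c;     /* uninitialized globals: conventionally bss, zero at startup */
char *ptr;       /* uninitialized pointer: also bss, starts out as NULL */

/* Hypothetical helper: obtain a chunk of heap memory with malloc, point
 * ptr at it, and copy text into it. Returns 0 on success, -1 on failure. */
int fill_ptr(const char *text)
{
    size_t n = strlen(text) + 1;   /* include the terminating '\0' */

    ptr = malloc(n);
    if (ptr == NULL)
        return -1;
    memcpy(ptr, text, n);
    return 0;
}
```

Before fill_ptr is called, ptr holds NULL; afterward it points at heap memory that the caller should eventually release with free(ptr).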
The stack area maintains the process's local variables, parameters used in functions, and values returned by functions. For example, a program may contain code that calls another block of code (possibly written by someone else). The calling block of code passes data to the receiving block of code by way of the stack. The called block of code then processes the data and returns data back to the calling code. The stack plays an important role in allowing a process to work with temporary data.
The user area maintains data that is used by the kernel while the process is running. The user area contains the real and effective user identifiers, real and effective group identifiers, current directory, and a list of open files. It also records the sizes of the text, data, and stack areas, as well as pointers to process data structures. Other areas that can be considered part of the process's address space are the heap, private shared library data, shared libraries, and shared memory. During initial startup and execution of the program, the kernel allocates the memory and creates the necessary structures to maintain these areas.
The user area is used by the kernel to manage the process. This area maintains the majority of the accounting information for a process. It is part of the process address space and is only used by the kernel while the process is executing (see Figure 19.2). When the process is not executing, its user area may be swapped out to disk by the Memory Manager. In most versions of UNIX, the user area is mapped to a fixed virtual memory address. Under HP-UX 10.X, this virtual address is 0x7FFE6000. When the kernel performs a context switch (starts executing a different process) to a new process, it will always map the process's physical address to this virtual address. Since the kernel already has a pointer fixed to this location in memory, it is a simple matter of referencing the current u pointer to be able to begin managing the newly switched in process. The file /usr/include/sys/user.h contains the user area's structure definition for your version of UNIX.
Diagram of kernel address space.
The process table is another important structure used by the kernel to manage the processes in the system. The process table is an array of process structures that the kernel uses to manage the execution of programs. Each table entry defines a process that the kernel has created. The process table is always resident in the computer's memory. This is because the kernel is repeatedly querying and updating this table as it switches processes in and out of the CPU. For those processes that are not currently executing, their process table structures are being updated by the kernel for scheduling purposes. The process structures for your system are defined in /usr/include/sys/proc.h.
The kernel provides each process with the tools to duplicate itself for the purpose of creating a new process. This new entity is termed a child process. The fork() system call is invoked by an existing process (termed the parent process) and creates a replica of the parent process. While a process will have one parent, it can spawn many children. The new child process inherits certain attributes from its parent.
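A minimal sketch of the parent/child relationship described above, assuming a POSIX system. The function name run_child is hypothetical. The parent forks, the child exits right away with a chosen status, and the parent collects that status with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with the given
 * status (0-255). The parent waits for the child and returns the child's
 * exit status, or -1 if fork/waitpid fails or the child did not exit
 * normally. */
int run_child(int status)
{
    int wstatus;
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed: no child was created */
    if (pid == 0)
        _exit(status);      /* child: a replica of the parent, exits at once */

    /* parent: fork() returned the child's process ID */
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

A call such as run_child(7) should return 7 in the parent. Between the child's exit and the parent's waitpid(), the child remains in the process table as a terminated entry waiting to be collected.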
Process Run States
A process moves between several states during its lifetime, although a process can only be in one state at any one time. Certain events, such as system interrupts, blocking of resources, or software traps will cause a process to change its run state. The kernel maintains queues in memory and assigns each process to a queue based upon that process's state. It keeps track of the process by its process ID.
UNIX version System V Release 4 (SVR4) recognizes the following process run states:
- SIDL This is the state right after a process has issued a fork() system call. A process image has yet to be copied into memory.
- SRUN The process is ready to run and is waiting to be executed by the CPU.
- SONPROC The process is currently being executed by the CPU.
- SSLEEP The process is blocking on an event or resource.
- SZOMB The process has terminated and is waiting on either its parent or the init process to allow it to completely exit.
- SXBRK The process has been switched out so that another process can be executed.
- SSTOP The process is stopped.
When a process first starts, the kernel allocates it a slot in the process table and places the process in the SIDL state. Once the process has the resources it needs to run, the kernel places it onto the run queue. The process is now in the SRUN state awaiting its turn in the CPU. Once its turn comes for the process to be switched into the CPU, the kernel will tag it as being in the SONPROC state. In this state, the process will execute in either user or kernel mode. User mode is where the process is executing nonprivileged code from the user's compiled program. Kernel mode is where kernel code is being executed from the kernel's privileged address space via a system call.
At some point the process is switched out of the CPU because it has either been signaled to do so (for instance, the user issues a stop signal--SSTOP state) or the process has exceeded its quota of allowable CPU time and the kernel needs the CPU to do some work for another process. The act of switching the focus of the CPU from one process to another is called a context switch. When this occurs, the process enters what is known as the SXBRK state. If the process still needs to run and is waiting for another system resource, such as disk services, it will enter the SSLEEP state until the resource is available and the kernel wakes the process up and places it on the SRUN queue. When the process has finally completed its work and is ready to terminate, it enters the SZOMB state. We have seen the fundamentals of what states a process can exist in and how it moves through them. Let's now learn how a kernel schedules a process to run.
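The transitions described above can be summarized as a toy state machine. This is only a sketch of the narrative in this section, not kernel code: the event names are hypothetical, the enum values are arbitrary, and the transitions out of SXBRK and SSTOP back to SRUN are assumptions, since the text does not spell them out.

```c
/* Run states as named in the text; the numeric values are arbitrary here. */
enum run_state { SIDL, SRUN, SONPROC, SSLEEP, SZOMB, SXBRK, SSTOP };

/* Hypothetical events that cause a state change. */
enum event {
    EV_READY,     /* process has the resources it needs to run        */
    EV_DISPATCH,  /* scheduler switches the process onto the CPU      */
    EV_PREEMPT,   /* time quota exceeded, CPU needed elsewhere        */
    EV_WAIT,      /* process must wait for a resource such as a disk  */
    EV_WAKEUP,    /* the awaited resource is available                */
    EV_STOP,      /* the user issues a stop signal                    */
    EV_CONT,      /* the stopped process is continued (assumption)    */
    EV_EXIT       /* process has completed its work                   */
};

/* Return the state that follows s when event e occurs. Events that do not
 * apply to the current state leave it unchanged. */
enum run_state next_state(enum run_state s, enum event e)
{
    switch (s) {
    case SIDL:
        if (e == EV_READY) return SRUN;
        break;
    case SRUN:
        if (e == EV_DISPATCH) return SONPROC;
        break;
    case SONPROC:
        if (e == EV_PREEMPT) return SXBRK;
        if (e == EV_WAIT)    return SSLEEP;
        if (e == EV_STOP)    return SSTOP;
        if (e == EV_EXIT)    return SZOMB;
        break;
    case SXBRK:
        if (e == EV_READY) return SRUN;    /* assumption: back to run queue */
        break;
    case SSLEEP:
        if (e == EV_WAKEUP) return SRUN;
        break;
    case SSTOP:
        if (e == EV_CONT) return SRUN;     /* assumption */
        break;
    case SZOMB:
        break;                             /* terminal state in this model */
    }
    return s;
}
```

For example, next_state(SSLEEP, EV_WAKEUP) yields SRUN, matching the description of the kernel waking a sleeping process and placing it on the run queue.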
Most modern versions of UNIX (for instance, SVR4 and Solaris 2.x) are classified as preemptive operating systems. They are capable of interrupting an executing process and "freezing" it so that the CPU can service a different process. This obviously has the advantage of fairly allocating the system's resources to all the processes in the system. This is one goal of the many systems architects and programmers who design and write schedulers. The disadvantages are that not all processes are equal and that complex algorithms must be designed and implemented as kernel code in order to maintain the illusion that each user process is running as if it were the only job in the system. The kernel maintains this balance by placing processes in the various priority queues or run queues and apportioning each process's CPU time-slice based on its priority class (Real-Time versus Timeshare).
Random access memory (RAM) is a very critical component in any computer system. It's the one component that always seems to be in short supply on most systems. Unfortunately, most organizations' budgets don't allow for the purchase of all the memory that their technical staff feel is necessary to support all their projects. Luckily, UNIX allows us to execute all sorts of programs without what appears, at first glance, to be enough physical memory. This comes in very handy when the system is required to support a user community that needs to execute an organization's custom and commercial software to gain access to its data.
Memory chips are high-speed electronic devices that plug directly into your computer. Main memory is also called core memory by some technicians. Ever heard of a core dump? (Writing out main memory to a storage device for post-dump analysis.) Usually it is caused by a program or system crash or failure. An important aspect of memory chips is that they can store data at specific locations called addresses. This makes it quite convenient for another hardware device called the central processing unit (CPU) to access these locations to run your programs. The kernel uses a paging and segmentation arrangement to organize process memory. This is where the memory management subsystem plays a significant role. Memory management can be defined as the efficient managing and sharing of the system's memory resources by the kernel and user processes.
Memory management follows certain rules that manage both physical and virtual memory. Since we already have an idea of what a physical memory chip or card is, we will provide a definition of virtual memory. Virtual memory is where the addressable memory locations that a process can be mapped into are independent of the physical address space of the CPU. Generally speaking, a process can exceed the physical address space/size of main memory and still load and execute.
The systems administrator should be aware that just because she has a fixed amount of physical memory, she should not expect it all to be available to execute user programs. The kernel is always resident in main memory, and depending upon the kernel's configuration (tunables such as kernel tables, daemons, device drivers loaded, and so on), the amount left over can be classified as available memory. It is important for the systems administrator to know how much available memory the system has to work with when supporting her environment. Most systems display memory statistics during boot time. If your kernel is larger than it needs to be to support your environment, consider reconfiguring a smaller kernel to free up resources.
We learned before that a process has a well-defined structure and has certain specific control data structures that the kernel uses to manage the process during its system lifetime. One of the more important data structures that the kernel uses is the virtual address space (vas in HP-UX and as in SVR4). For a more detailed description of the layout of these structures, look at the vas.h or as.h header files under /usr/include on your system.
A virtual address space exists for each process and is used by the process to keep track of process logical segments or regions that point to specific segments of the process's text (code), data, u_area, user, and kernel stacks; shared memory; shared library; and memory mapped file segments. Per-process regions protect and maintain the number of pages mapped into the segments. Each segment has a virtual address space segment as well. Multiple programs can share the process's text segment. The data segment holds the process's initialized and uninitialized (BSS) data. These areas can change size as the program executes. The u_area and kernel stack contain information used by the kernel, and are a fixed size. The user stack is contained in the u_area; however, its size will fluctuate during its execution. Memory mapped files allow programmers to bring files into memory and work with them while in memory. Obviously, there is a limit to the size of the file you can load into memory (check your system documentation). Shared memory segments are usually set up and used by a process to share data with other processes. For example, a programmer may want to be able to pass messages to other programs by writing to a shared memory segment and having the receiving programs attach to that specific shared memory segment and read the message. Shared libraries allow programs to link to commonly used code at runtime. Shared libraries reduce the amount of memory needed by executing programs because only one copy of the code is required to be in memory. Each program will access the code at that memory location when necessary.
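As a rough illustration of working with a memory mapped file, the sketch below maps a temporary file with the POSIX mmap() call, writes a message through the mapping, and then reads the message back through the ordinary file descriptor. The function name mmap_roundtrip is hypothetical, and the use of tmpfile() for scratch storage is just a convenience for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a temporary file through a shared
 * memory mapping, then read it back with read() into out.
 * Returns 0 on success, -1 on any failure. */
int mmap_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg) + 1;          /* include the terminating '\0' */
    FILE *fp;
    int fd;
    char *map;
    ssize_t got;

    if (len > outlen)
        return -1;
    fp = tmpfile();                        /* scratch file, removed on close */
    if (fp == NULL)
        return -1;
    fd = fileno(fp);

    /* The file must be at least as large as the region we map. */
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(fp);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        fclose(fp);
        return -1;
    }

    memcpy(map, msg, len);                 /* plain memory write updates the file */
    msync(map, len, MS_SYNC);              /* flush the mapped pages to the file */
    munmap(map, len);

    if (lseek(fd, 0, SEEK_SET) < 0) {
        fclose(fp);
        return -1;
    }
    got = read(fd, out, len);              /* ordinary file I/O sees the data */
    fclose(fp);
    return got == (ssize_t)len ? 0 : -1;
}
```

Because the mapping is created with MAP_SHARED, other processes that map the same file would see the same bytes, which is the basis for the message-passing arrangement described above.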
When a programmer writes and compiles a program, the compiler generates the object file from the source code. The linker program (ld) links the object file with the appropriate libraries and, if necessary, other object files to generate the executable program. The executable program contains virtual addresses that are converted into physical memory addresses when the program is run. This address translation must occur before the CPU can reference the actual code.
When the program starts to run, the kernel sets up its data structures (proc, virtual address space, per-process region) and begins to execute the process in user mode. Eventually, the process will access a page that's not in main memory (for instance, the pages in its working set are not in main memory). This is called a page fault. When this occurs, the kernel puts the process to sleep, switches from user mode to kernel mode, and attempts to load the page that the process was requesting to be loaded. The kernel searches for the page by locating the per-process region where the virtual address is located. It then goes to the segment's (text, data, or other) per-process region to find the actual region that contains the information necessary to read in the page.
The kernel must now find a free page in which to load the process's requested page. If there are no free pages, the kernel must either page or swap out pages to make room for the new page request. Once there is some free space, the kernel pages in a block of pages from disk. This block contains the requested page plus additional pages that may be used by the process. Finally the kernel establishes the permissions and sets the protections for the newly loaded pages. The kernel wakes the process and switches back to user mode so the process can begin executing using the requested page. Pages are not brought into memory until the process requests them for execution. This is why the system is referred to as a demand paging system.
The memory management unit is a hardware component that handles the translation of virtual addresses to physical memory addresses. The memory management unit also prevents a process from accessing another process's address space unless it is permitted to do so; an unauthorized access raises a protection fault. Memory is thus protected at the page level. The Translation Lookaside Buffer (TLB) is a hardware cache that maintains the most recently used virtual-to-physical address translations. It is controlled by the memory management unit to reduce the number of address translations that occur on the system.
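The arithmetic behind a virtual-to-physical translation can be sketched in a few lines. Everything here is a toy model: the 4096-byte page size, the eight-entry page table, and the names translate, page_table, and NOT_PRESENT are assumptions made for illustration, and a real memory management unit does this in hardware with multi-level tables.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u        /* assumed page size in bytes */
#define TOY_NUM_PAGES 8u           /* size of the toy page table */
#define NOT_PRESENT   UINT32_MAX   /* marks a page that is not in memory */

/* Toy page table: index = virtual page number, value = physical frame. */
static const uint32_t page_table[TOY_NUM_PAGES] = {
    5, NOT_PRESENT, 2, 7,
    NOT_PRESENT, NOT_PRESENT, NOT_PRESENT, NOT_PRESENT
};

/* Translate a virtual address into a physical address.
 * Returns 0 and stores the result in *paddr on success. Returns -1 when
 * the page is not resident or out of range, which is the point where a
 * real kernel would take a page fault and load the page from disk. */
int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr / TOY_PAGE_SIZE;      /* virtual page number */
    uint32_t offset = vaddr % TOY_PAGE_SIZE;   /* byte offset within the page */

    if (vpn >= TOY_NUM_PAGES || page_table[vpn] == NOT_PRESENT)
        return -1;
    *paddr = page_table[vpn] * TOY_PAGE_SIZE + offset;
    return 0;
}
```

With this table, virtual address 0x0123 lies in virtual page 0, which maps to frame 5, so it translates to 0x5123. Virtual address 0x1000 lies in page 1, which is not resident, so translate() reports a fault.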
Input and Output Management
The simplest definition of input/output is the control of data between hardware devices and software. A systems administrator is concerned with I/O at two separate levels. The first level is concerned with I/O between user address space and kernel address space; the second level is concerned with I/O between kernel address space and physical hardware devices. When data is written to disk, the first level of the I/O subsystem copies the data from user space to kernel space. Data is then passed from the kernel address space to the second level of the I/O subsystem. This is when the physical hardware device activates its own I/O subsystems, which determine the best location for the data on the available disks.
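From a program's point of view, the two levels can be seen in a pair of POSIX calls. In the sketch below, write() hands the data from a user-space buffer to the kernel, and fsync() asks the kernel to push its buffered copy out to the device. The function name write_through is hypothetical, and a temporary file stands in for a real data file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a string to a temporary file and force it
 * out to the device. Returns the number of bytes written, or -1 on error. */
long write_through(const char *data)
{
    FILE *fp = tmpfile();                /* scratch file, removed on close */
    size_t len = strlen(data);
    ssize_t n;
    int fd;

    if (fp == NULL)
        return -1;
    fd = fileno(fp);

    /* First level: copy from the user address space into kernel buffers. */
    n = write(fd, data, len);

    /* Second level: ask the kernel to move its buffered data to the device. */
    if (n < 0 || fsync(fd) != 0) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (long)n;
}
```

Without the fsync() call, write() can return as soon as the data is in kernel space, and the transfer to the physical device happens later at the kernel's discretion.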
The OEM (Original Equipment Manufacturer) UNIX configuration is satisfactory for many work environments, but does not take into consideration the network traffic or the behavior of specific applications on your system. Systems administrators find that they need to reconfigure the system's I/O to meet the expectations of the users and the demands of their applications. You should use the default configuration as a starting point and, as experience is gained with the demands on the system resources, tune the system to achieve peak I/O performance.
UNIX comes with a wide variety of tools that monitor system performance. Learning to use these tools will help you determine whether a performance problem is hardware or software related, and whether it stems from poor user training, application tuning, system maintenance, or system configuration. sar, iostat, and monitor are some of your best basic I/O performance monitoring tools.
The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. The following list is a breakdown of those activity counters that sar accumulates.
* File access
* Buffer usage
* System call activity
* Disk and tape input/output activity
* Free memory and swap space
* Kernel Memory Allocation (KMA)
* Interprocess communication
* Queue Activity
* Central Processing Unit (CPU)
* Kernel tables
* Terminal device activity
The iostat command reports CPU statistics and input/output statistics for TTY devices, disks, and CD-ROMs.
The monitor command is like the sar command, but with a visual representation of the computer state.
The memory subsystem comes into effect when programs start requesting access to more physical RAM than is installed on your system. Once this point is reached, UNIX will start I/O processes called paging and swapping. This is when kernel procedures start moving pages of stored memory out to the paging or swap areas defined on your hard drives. (This procedure resembles how swap files work in Microsoft Windows on a PC.) All UNIX systems use these procedures to free physical memory for reuse by other programs. The drawback to this is that once paging and swapping have started, system performance decreases rapidly. The system will continue using these techniques until demands for physical RAM drop to the amount that is installed on your system. There are only two physical states for memory performance on your system: either you have enough RAM, or you don't and performance drops through the floor.
Memory performance problems are simple to diagnose; either you have enough memory or your system is thrashing.
Computer systems start thrashing when more resources are dedicated to moving memory (paging and swapping) from RAM to the hard drives than to doing useful work. Performance decreases as the CPUs and all subsystems become dedicated to trying to free physical RAM for themselves and other processes.
This summary doesn't do justice, however, to the complexity of memory management nor does it help you to deal with problems as they arise. To provide the background to understand these problems, we need to discuss virtual memory activity in more detail.
We have been discussing two memory processes: paging and swapping. These two processes help UNIX fulfill memory requirements for all processes. UNIX systems employ both paging and swapping to reduce I/O traffic and exercise better control over the system's total aggregate memory. Keep in mind that paging and swapping are temporary measures; they cannot fix the underlying problem of low physical RAM.
Swapping moves entire idle processes to disk for reclamation of memory, and is a normal procedure for the UNIX operating system. When the idle process is called by the system again, the system copies the memory image from the disk swap area back into RAM.
On systems performing paging and swapping, swapping occurs in two separate situations. Swapping is often a part of normal housekeeping. Jobs that sleep for more than 20 seconds are considered idle and may be swapped out at any time. Swapping is also an emergency technique used to combat extreme memory shortages. Remember our definition of thrashing; this is when a system is in trouble. Some system administrators sum this up very well by calling it "desperation swapping."
Paging, on the other hand, moves individual pages (or pieces) of processes to disk and reclaims the freed memory, with most of the process remaining loaded in memory. Paging employs an algorithm to monitor usage of the pages, to leave recently accessed pages in physical memory, and to move idle pages into disk storage. This allows for optimum performance of I/O and reduces the amount of I/O traffic that swapping would normally require.
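To make the idea of keeping recently accessed pages in memory concrete, here is a toy simulation of least-recently-used (LRU) page replacement. LRU is chosen here as an illustration; the text does not say which algorithm a given UNIX kernel uses, and real kernels typically rely on cheaper approximations. The function name lru_faults and the MAX_FRAMES limit are hypothetical.

```c
#include <stddef.h>

#define MAX_FRAMES 16   /* upper bound on frames in this toy model */

/* Count the page faults produced by the reference string refs[0..nrefs-1]
 * when nframes physical frames are managed with LRU replacement.
 * Returns -1 if nframes is 0 or larger than MAX_FRAMES. */
int lru_faults(const int *refs, size_t nrefs, size_t nframes)
{
    int frame_page[MAX_FRAMES];     /* page currently held by each frame */
    size_t last_used[MAX_FRAMES];   /* time of the most recent access */
    size_t used = 0;                /* number of frames filled so far */
    int faults = 0;
    size_t t, i;

    if (nframes == 0 || nframes > MAX_FRAMES)
        return -1;

    for (t = 0; t < nrefs; t++) {
        size_t hit = used;          /* "used" means not found */
        size_t victim;

        for (i = 0; i < used; i++) {
            if (frame_page[i] == refs[t]) {
                hit = i;
                break;
            }
        }
        if (hit < used) {           /* page is resident: just note the access */
            last_used[hit] = t;
            continue;
        }

        faults++;                   /* page fault: the page must be brought in */
        if (used < nframes) {
            victim = used++;        /* a free frame is still available */
        } else {
            victim = 0;             /* evict the least recently used page */
            for (i = 1; i < used; i++)
                if (last_used[i] < last_used[victim])
                    victim = i;
        }
        frame_page[victim] = refs[t];
        last_used[victim] = t;
    }
    return faults;
}
```

For the reference string 1, 2, 3, 1, 4, 2 with three frames, the first three accesses fault, the second access to page 1 hits, page 4 evicts page 2 (the least recently used), and the final access to page 2 faults again, for five faults in total.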
NOTE: Monitoring what the system is doing is easy with the ps command. ps is a "process status" command on all UNIX systems and typically shows many idle and swapped-out jobs. This command has a rich set of options to show you what the computer is doing.
I/O performance management, like all administrative tasks, is a continual process. Generating performance statistics on a routine basis will assist in identifying and correcting potential problems before they have an impact on your system or, worst case, your users. UNIX offers basic system usage statistics packages that will assist you in automatically collecting and examining usage statistics.
You will find the load on the system will increase rapidly as new jobs are submitted and resources are not freed quickly enough. Performance drops as the disks become I/O bound trying to satisfy paging and swapping calls. Memory overload quickly forces a system to become I/O and CPU bound.
A filesystem is the collection place on disk device(s) for files. Visualize the filesystem as consisting of a single node at the highest level (ROOT) and all other nodes descending from the root node in a tree-like fashion (see Figure 19.5). The second meaning will be used for this discussion, and Hewlett Packard's High-performance Filesystem will be used for technical reference purposes.
Diagram of a UNIX hierarchical filesystem.
The superblock is the key to maintaining the filesystem. It's an 8 KB block of disk space that maintains the current status of the filesystem. Because of its importance, a copy is maintained in memory and at each cylinder group within the filesystem. The copy in main memory is updated as events transpire. The update daemon is the actual process that calls on the kernel to flush the cached superblocks, modified inodes, and cached data blocks to disk. The superblock maintains the following static and dynamic information about the