Sankar Kasala: November 2012

Friday, November 30, 2012

How to manage Redhat Cluster in RHEL 5

Check cluster status

[root@ncs-db-1 ~]# clustat
Cluster Status for ncs_dbcluster @ Wed Sep 1 15:22:08 2009
Member Status: Quorate

Member Name                             ID   Status
------ ----                             ---- ------
ncs-db-n1                                   1 Online, Local, rgmanager
ncs-db-n2                                   2 Online, rgmanager

Service Name                   Owner (Last)                   State
------- ----                   ----- ------                   -----
service:clusvc                 ncs-db-n1                      started

Disable cluster status

After disable, resource group shutdown and will not affect by server reboot and/or failover.

[root@ncs-db-1 ~]# clusvcadm -d clusvc
Local machine disabling service:clusvc...Success

Enable cluster status

[root@ncs-db-1 ~]# clusvcadm -e clusvc
Local machine trying to enable service:clusvc...Success
service:clusvc is now running on ncs-db-n1

Freeze cluster resource group

After freeze resource group, it will not being monitored by the cluster manager. But current resource group state will not be affected by this command.

[root@ncs-db-1 ~]# clusvcadm -Z clusvc
Local machine freezing service:clusvc...Success

Unfreeze cluster resource group

[root@ncs-db-1 ~]# clusvcadm -U clusvc
Local machine unfreezing service:clusvc...Success

Stop cluster resource group
Resource group can fail over if current node due to fencing.

[root@ncs-db-1 ~]# clusvcadm -s clusvc
Local machine stopping service:clusvc...Success

Restart cluster resource group
Resource group can failover if current node due to fencing.

[root@ncs-db-1 ~]# clusvcadm -s clusvc
Local machine stopping service:clusvc...Success

Relocate resource to another member in the failover domain

[root@ncs-db-1 ~]# clusvcadm -r clusvc -m ncs-db-n2
Trying to relocate service:clusvc to ncs-db-n2...Success
service:clusvc is now running on ncs-db-n2

Other useful commands
[root@ncs-db-1 ~]# ccs_tool lsnode

Cluster name: ncs_dbcluster, config_version: 21

Nodename                        Votes Nodeid Fencetype
ncs-db-n1                          1    1    ncs-db-1-ilo1
ncs-db-n2                          1    2    ncs-db-2-ilo2

[root@ncs-db-1 ~]# cman_tool status
Version: 6.1.0
Config Version: 21
Cluster Name: ncs_cluster
Cluster Id: 27444
Cluster Member: Yes
Cluster Generation: 1796
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 9
Flags: 2node Dirty
Ports Bound: 0 11 177
Node name: ncs-db-n1
Node ID: 1
Multicast addresses: xxx.xxx.xxx.xxx
Node addresses: xxx.xxx.xxx.xxx

Thursday, November 29, 2012

Local Yum repository to Red Hat Enterprise Linux.

1) Mount the dvd in /mnt/dvd path.

2) Install the createrepo RPM from the RHEL 5.2 DVD ( which I have mounted to /mnt/dvd ).

# rpm -ivh /mnt/dvd/Server/createrepo-0.4.11-3.el5.noarch.rpm

# cd /mnt

# createrepo .

3) Create the YUM repository pointing back at the newly created repository metadata.

# vim /etc/yum.repos.d/dvdrhel.repo
[MailRepo]
name=MailRepo
baseurl=file:///mnt/
enabled=1
gpgcheck=0

4) Update YUM to have it pick up the new repository.

# yum clean

5) Test by listing the available packages.

# yum list

Wednesday, November 28, 2012

Patch Management in Linux using YUM

Pre-requisites:

    Installation of Red Hat Enterprise Linux 5.
    Premium/Standard License for RHEL 5 32-bit or 64-bit servers (Note: RHEL 32-bit YUM Server will let    you apply patches on RHEL 32-bit OS. For 64-bit OS, you required YUM Server on RHEL 64-bit).
    Createrepo, yum-downloadonly, httpd packages to be installed on the server.
    Installation and configuration of apache web server.
    Copying the rpm’s from RHEL CD to the defined DocumentRoot Path mentioned in httpd configuration file.

Installation and Configuration of YUM Server:

Step 1: Creating a Repository using apache.

      Installation of apache web server.
# rpm –ivh httpd

      Modify httpd configuration file as mentioned below.
# vi /etc/httpd/conf/httpd.conf

ServerAdmin root@192.168.0.5
ServerName 192.168.0.5:80
DocumentRoot "/var/www/html"

      Create folders as mentioned below.
# cd /var/www/html
# mkdir Server
# mkdir VT
# mkdir images

      Copy all RHEL 5 RPM’s from CD to the Server, VT & images folders on the server.

Step 2: Create Database of RPM’s

      Run createrepo command to create database of the rpm’s.
# cd /var/www/html/Server
# createrepo .
# cd /var/www/html/VT
# createrepo .
# cd /var/www/html/images
# createrepo .

       Create group of RPM’s for installing group of packages.
# createrepo -g /var/www/html/Server/repodata/comps-rhel5-server-core.xml
# createrepo -g /var/www/html/VT/repodata/comps-rhel5-VT-core.xml

Step 3: Register your YUM Server with Red hat Network.

    Ensure that the following entries have been added in host file and the URLs are accessible from the server.
# vi /etc/hosts
209.132.183.44 xmlrpc.rhn.redhat.com
209.132.183.43 satellite.rhn.redhat.com
209.132.183.42 rhn.redhat.com
    Run rhn_register command and follow the instructions as printed on screen, create a system profile. (Note: You should have a valid subscription key)
    De-select Location aware updates from RHN website of the registered machine. (Note: You should have a valid RHN login id.)

Step 4: Download required updates & hot fixes from Red hat

    Run below command to download RPM’s to configured repository.
# yum update -y --downloadonly --downloaddir=/var/www/html/Server/

          Re-run below command after downloading any new packages into repository.
# cd /var/www/html/Server
# createrepo –update .

Note : If any patch is released, First download it via download command as mentioned in step 4 and then only install on YUM Server using yum update command. Else, you won’t be able to download same patch again.

Configuration of YUM Client:

Step 1: Creation of Repo file for pointing the client to server for updates.
# vi /etc/yum.repos.d/Server.repo
[rhel-i386-server-5]
name=rhel-i386-server-5
baseurl=http://192.168.0.5/Server
enabled=1
gpgcheck=0

[rhel-i386-server-vt-5]
name=rhel-i386-server-vt-5
baseurl=http://192.168.0.5/VT
enabled=1
gpgcheck=0

Step 2: Configure Mail alerts for pending patches on client.
Note: You have to enable SMTP relay on the server.
# yum check-update | mailx -s "PATCHES PENDING on $HOSTNAME" abc@tcs.com

Step 3: Run Yum update command on quarterly basis to make your system up2date with latest patches and hot fixes.
# yum update

Note: Before updating the system, have a proper backup for the same.

Rollback Package updates/Installation on YUM Server and Client:

Step 1: To configure yum to save rollback information, add the line tsflags=repackage to /etc/yum.conf.

Step 2: To configure command-line rpm to do the same thing, add the line %_repackage_all_erasures 1 to /etc/rpm/macros. If /etc/rpm/macros. Does not exist, just create it.

Step 3: You can now install, erase and update packages with yum and/or rpm, and they will save roll back information.

Step 4: When you want to roll back, use rpm to do so.
You do this by specifying the --rollback switch and a date/time, like the examples below:
rpm -Uhv --rollback '19:00'
rpm -Uhv --rollback '8 hours ago'
rpm -Uhv --rollback 'december 31'
rpm -Uhv --rollback 'yesterday'

Friday, November 16, 2012

Apache restrict access based on IP address to selected directories

Apache web server allows server access based upon various conditions. For example you just want to restrict access to url http://payroll.nixcraft.in/ (mapped to /var/www/sub/payroll directory) from 192.168.1.0/24 network (within intranet).

Apache provides access control based on client host name, IP address, or other characteristics of the client request using mod_access module.

Open your httpd.conf file:

# vi /etc/httpd/conf/httpd.conf Locate directory section (for example/var/www/sub/payroll) and set it as follows:
<Directory /var/www/sub/payroll/>
Order allow,deny
Allow from 192.168.1.0/24
Allow from 127
</Directory>

Order allow,deny: The Order directive controls the default access state and the order in which Allow and Deny directives are evaluated. The (allow,deny) Allow directives are evaluated before the Deny directives. Access is denied by default. Any client which does not match an Allow directive or does match a Deny directive will be denied access to the server.

Allow from192.168.1.0/24: The Allow directive affects which hosts can access an area of the server (i.e. /var/www/sub/payroll/). Access is only allowed from network 192.168.1.0/24 and localhost (127.0.0.1).

Save file and restart apache web server:

# /etc/init.d/httpd restart

Thursday, November 15, 2012

What is the difference between a daemon and a server process?

A 'daemon' is a software process that runs in the background (continuously) and provides the service to client upon request.

For example named is a daemon. When requested it will provide DNS service. Other examples are:

xinetd (it is a super-daemon, it is responsible for invoking other Internet servers when they are needed)
inetd (same as xinetd, but with limited configuration options)
sendmail/postfix (to send/route email)
Apache/httpd (web server)

Wednesday, November 7, 2012

What is the difference between varchar and char?

CHAR Data Type is a Fixed Length Data Type. For example, if you declare a variable/column of CHAR (10) data type, then it will always take 10 bytes irrespective of whether you are storing 1 character or 10 character in this variable or column. And in this example, as we have declared this variable/column as CHAR(10), so we can store max 10 characters in this column.

On the other hand, VARCHAR is a variable length Data Type. For example, if you declare a variable/column of VARCHAR (10) data type, it will take the number of bytes equal to the number of characters stored in this column. So, in this variable/column, if you are storing only one character, then it will take only one byte and if we are storing 10 characters, then it will take 10 bytes. And in this example, as we have declared this variable/column as VARCHAR (10), so we can store max 10 characters in this column.

Monday, November 5, 2012

How to Split and Merge Multiple Linux Files

Considering that you have 50MB of single file. Your friend needs it urgently. The situation shows that the ONLY means of transferring this large file between you and your friend is thru email exchanges. The problem is both your ISP limits the size of email attachments to 5MB file attachments.

How to transfer 50MB large file by email?
How to split and merge large chunks of file by email?
How to send big file by email?
How to split big file and rejoin them later from the other end?

Splitting and Merging Multiple Linux Files

Split Linux commands chops file into multiple chunks of file. Split command divides a single file into multiple files regardless of file types. Split command is fantastic on handling and splitting binary files, compressed and archived files, text files and any other file types. Split is part of coreutils package.

Here’s how to accomplish our objective of splitting large files into pieces and merging them back.

As an example, I have prepared a binary rpm file named testfile.rpm with 18MB filesize. Our objective is to split tesfile.rpm into multiple file pieces and merge them back into the same binary file state from the other end.

# ls -la testfile.rpm
-rw-r--r-- 1 root root 18835725 2007-11-23 11:07 testfile.rpm

Spliting the file into multiple file chunks with 4MB as maximum filesize

# split -b4000000 testfile.rpm
# ls -la
-rw-r--r-- 1 root root 18835725 2007-11-23 11:07 testfile.rpm
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xaa
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xab
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xac
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xad
-rw-r--r-- 1 root root 2835725 2007-11-23 12:27 xae

Notice that the file testfile.rpm as splitted into 5 file chunks, with 4 files having 4MB filesize and 1 file having more than 2MB of filesize. Filename conventions are written alphabetically xaa, xab, xac, xad and xae and so on for larger file.

Since each filesize is less than the 5MB email attachment limit, you can now attach these multiple chunks of file into your email message and send them successfully into the receiving end, bypassing email attachment restrictions of 5MB.

Now, we need to merge them back from the other receiving end. Here’s how to do it.

How to merge splitted files into a single big file?

Simply issue these simple commands:

# cat xaa xab xac xad xae > newfile.rpm
# ls -la
-rwxr-xr-x 1 root root 18835725 2007-11-23 12:35 newfile.rpm
-rwxr-xr-x 1 root root 18835725 2007-11-23 11:07 testfile.rpm
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xaa
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xab
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xac
-rw-r--r-- 1 root root 4000000 2007-11-23 12:27 xad
-rw-r--r-- 1 root root 2835725 2007-11-23 12:27 xae

The receiving end now has the actual file with simple file splitting and file merging operations.

Enjoy

Linux Troubleshooting High Load

What do you do when you get an alert that your system load is high? Tracking down the cause of high load just takes some time, some experience and a few Linux tools.

This column is the first in a series of columns dedicated to one of my favorite subjects: troubleshooting. I'm a systems administrator during the day, and although I enjoy many aspects of my job, it's hard to beat the adrenaline rush of tracking down a complex server problem when downtime is being measured in dollars. Although it's true that there are about as many different reasons for downtime as there are Linux text editors, and just as many approaches to troubleshooting, over the years, I've found I perform the same sorts of steps to isolate a problem. Because my column is generally aimed more at tips and tricks and less on philosophy and design, I'm not going to talk much about overall approaches to problem solving. Instead, in this series I describe some general classes of problems you might find on a Linux system, and then I discuss how to use common tools, most of which probably are already on your system, to isolate and resolve each class of problem.

For this first column, I start with one of the most common problems you will run into on a Linux system. No, it's not getting printing to work. I'm talking about a sluggish server that might have high load. Before I explain how to diagnose and fix high load though, let's take a step back and discuss what load means on a Linux machine and how to know when it's high.
Uptime and Load

When administrators mention high load, generally they are talking about the load average. When I diagnose why a server is slow, the first command I run when I log in to the system is uptime:

$ uptime
18:30:35 up 365 days, 5:29, 2 users, load average: 1.37, 10.15, 8.10

As you can see, it's my server's uptime birthday today. You also can see that my load average is 1.37, 10.15, 8.10. These numbers represent my average system load during the last 1, 5 and 15 minutes, respectively. Technically speaking, the load average represents the average number of processes that have to wait for CPU time during the last 1, 5 or 15 minutes. For instance, if I have a current load of 0, the system is completely idle. If I have a load of 1, the CPU is busy enough that one process is having to wait for CPU time. If I do have a load of 1 and then spawn another process that normally would tie up a CPU, my load should go to 2. With a load average, the system will give you a good idea of how consistently busy it has been over the past 1, 5 and 10 minutes.

Another important thing to keep in mind when you look at a load average is that it isn't normalized according to the number of CPUs on your system. Generally speaking, a consistent load of 1 means one CPU on the system is tied up. In simplified terms, this means that a single-CPU system with a load of 1 is roughly as busy as a four-CPU system with a load of 4. So in my above example, let's assume that I have a single-CPU system. If I were to log in and see the above load average, I'd probably assume that the server had pretty high load (8.10) during the last 15 minutes that spiked around 5 minutes ago (10.15), but recently, at least during the last 1 minute, the load has dropped significantly. If I saw this, I might even assume that the real cause of the load has subsided. On the other hand, if the load averages were 20.68, 5.01, 1.03, I would conclude that the high load had likely started in the last 5 minutes and was getting worse.
How High Is High?

After you understand what load average means, the next logical question is “What load average is good and what is bad?” The answer to that is “It depends.” You see, a lot of different things can cause load to be high, each of which affects performance differently. One server might have a load of 50 and still be pretty responsive, while another server might have a load of 10 and take forever to log in to. I've had servers with load averages in the hundreds that were certainly slow, but didn't crash, and I had one server that consistently had a load of 50 that was still pretty responsive and stayed up for years.

What really matters when you troubleshoot a system with high load is why the load is high. When you start to diagnose high load, you find that most load seems to fall into three categories: CPU-bound load, load caused by out of memory issues and I/O-bound load. I explain each of these categories in detail below and how to use tools like top and iostat to isolate the root cause.
top

If the first tool I use when I log in to a sluggish system is uptime, the second tool I use is top. The great thing about top is that it's available for all major Linux systems, and it provides a lot of useful information in a single screen. top is a quite complex tool with many options that could warrant its own article. For this column, I stick to how to interpret its output to diagnose high load.

To use top, simply type top on the command line. By default, top will run in interactive mode and update its output every few seconds. Listing 1 shows sample top output from a terminal.

Listing 1. Sample top Output

top - 14:08:25 up 38 days, 8:02, 1 user, load average: 1.70, 1.77, 1.68
Tasks: 107 total,   3 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.4%us, 29.6%sy, 0.0%ni, 58.3%id, .7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   1024176k total,   997408k used,    26768k free,    85520k buffers
Swap: 1004052k total,     4360k used,   999692k free,   286040k cached

PID USER    PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
9463 mysql   16   0 686m 111m 3328 S   53 5.5 569:17.64 mysqld
18749 nagios 16   0 140m 134m 1868 S   12 6.6   1345:01 nagios2db_status
24636 nagios 17   0 34660 10m 712 S    8 0.5   1195:15 nagios
22442 nagios 24   0 6048 2024 1452 S    8 0.1   0:00.04 check_time.pl

As you can see, there's a lot of information in only a few lines. The first line mirrors the information you would get from the uptime command and will update every few seconds with the latest load averages. In this case, you can see my system is busy, but not what I would call heavily loaded. All the same, this output breaks down well into our different load categories. When I troubleshoot a sluggish system, I generally will rule out CPU-bound load, then RAM issues, then finally I/O issues in that order, so let's start with CPU-bound load.
CPU-Bound Load

CPU-bound load is load caused when you have too many CPU-intensive processes running at once. Because each process needs CPU resources, they all must wait their turn. To check whether load is CPU-bound, check the CPU line in the top output:

Cpu(s): 11.4%us, 29.6%sy, 0.0%ni, 58.3%id, .7%wa, 0.0%hi, 0.0%si, 0.0%st

Each of these percentages are a percentage of the CPU time tied up doing a particular task. Again, you could spend an entire column on all of the output from top, so here's a few of these values and how to read them:

    us: user CPU time. More often than not, when you have CPU-bound load, it's due to a process run by a user on the system, such as Apache, MySQL or maybe a shell script. If this percentage is high, a user process such as those is a likely cause of the load.

    sy: system CPU time. The system CPU time is the percentage of the CPU tied up by kernel and other system processes. CPU-bound load should manifest either as a high percentage of user or high system CPU time.

    id: CPU idle time. This is the percentage of the time that the CPU spends idle. The higher the number here the better! In fact, if you see really high CPU idle time, it's a good indication that any high load is not CPU-bound.

    wa: I/O wait. The I/O wait value tells the percentage of time the CPU is spending waiting on I/O (typically disk I/O). If you have high load and this value is high, it's likely the load is not CPU-bound but is due to either RAM issues or high disk I/O.

Track Down CPU-Bound Load

If you do see a high percentage in the user or system columns, there's a good chance your load is CPU-bound. To track down the root cause, skip down a few lines to where top displays a list of current processes running on the system. By default, top will sort these based on the percentage of CPU used with the processes using the most on top (Listing 2).

Listing 2. Current Processes Example

PID USER   PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
9463 mysql 16 0 686m 111m 3328 S   53 5.5 569:17.64 mysqld
18749 nagios 16 0 140m 134m 1868 S   12 6.6   1345:01 nagios2db_status
24636 nagios 17 0 34660 10m 712 S    8 0.5   1195:15 nagios
22442 nagios 24 0 6048 2024 1452 S    8 0.1   0:00.04 check_time.pl

The %CPU column tells you just how much CPU each process is taking up. In this case, you can see that MySQL is taking up 53% of my CPU. As you look at this output during CPU-bound load, you probably will see one of two things: either you will have a single process tying up 99% of your CPU, or you will see a number of smaller processes all fighting for a percentage of CPU time. In either case, it's relatively simple to see the processes that are causing the problem. There's one final note I want to add on CPU-bound load: I've seen systems get incredibly high load simply because a multithreaded program spawned a huge number of threads on a system without many CPUs. If you spawn 20 threads on a single-CPU system, you might see a high load average, even though there are no particular processes that seem to tie up CPU time.

Out of RAM Issues

The next cause for high load is a system that has run out of available RAM and has started to go into swap. Because swap space is usually on a hard drive that is much slower than RAM, when you use up available RAM and go into swap, each process slows down dramatically as the disk gets used. Usually this causes a downward spiral as processes that have been swapped run slower, take longer to respond and cause more processes to stack up until the system either runs out of RAM or slows down to an absolute crawl. What's tricky about swap issues is that because they hit the disk so hard, it's easy to misdiagnose them as I/O-bound load. After all, if your disk is being used as RAM, any processes that actually want to access files on the disk are going to have to wait in line. So, if I see high I/O wait in the CPU row in top, I check RAM next and rule it out before I troubleshoot any other I/O issues.

When I want to diagnose out of memory issues, the first place I look is the next couple of lines in the top output:

Mem: 1024176k total, 997408k used, 26768k free, 85520k buffers
Swap: 1004052k total, 4360k used, 999692k free, 286040k cached

These lines tell you the total amount of RAM and swap along with how much is used and free; however, look carefully, as these numbers can be misleading. I've seen many new and even experienced administrators who would look at the above output and conclude the system was almost out of RAM because there was only 26768k free. Although that does show how much RAM is currently unused, it doesn't tell the full story.
The Linux File Cache

When you access a file and the Linux kernel loads it into RAM, the kernel doesn't necessarily unload the file when you no longer need it. If there is enough free RAM available, the kernel tries to cache as many files as it can into RAM. That way, if you access the file a second time, the kernel can retrieve it from RAM instead of the disk and give much better performance. As a system stays running, you will find the free RAM actually will appear to get rather small. If a process needs more RAM though, the kernel simply uses some of its file cache. In fact, I see a lot of the overclocking crowd who want to improve performance and create a ramdisk to store their files. What they don't realize is that more often than not, if they just let the kernel do the work for them, they'd probably see much better results and make more efficient use of their RAM.

To get a more accurate amount of free RAM, you need to combine the values from the free column with the cached column. In my example, I would have 26768k + 286040k, or over 300Mb of free RAM. In this case, I could safely assume my system was not experiencing an out of RAM issue. Of course, even a system that has very little free RAM may not have gone into swap. That's why you also must check the Swap: line and see if a high proportion of your swap is being used.
Track Down High RAM Usage

If you do find you are low on free RAM, go back to the same process output from top, only this time, look in the %MEM column. By default, top will sort by the %CPU column, so simply type M and it will re-sort to show you which processes are using the highest percentage of RAM. In the output in Listing 3, I sorted the same processes by RAM, and you can see that the nagios2db_status process is using the most at 6.6%.

Listing 3. Processes Sorted by RAM

PID USER   PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
18749 nagios 16 0 140m 134m 1868 S   12 6.6   1345:01 nagios2db_status
9463 mysql 16 0 686m 111m 3328 S   53 5.5 569:17.64 mysqld
24636 nagios 17 0 34660 10m 712 S    8 0.5   1195:15 nagios
22442 nagios 24 0 6048 2024 1452 S    8 0.1   0:00.04 check_time.pl

I/O-Bound Load

I/O-bound load can be tricky to track down sometimes. As I mentioned earlier, if your system is swapping, it can make the load appear to be I/O-bound. Once you rule out swapping though, if you do have a high I/O wait, the next step is to attempt to track down which disk and partition is getting the bulk of the I/O traffic. To do this, you need a tool like iostat.

The iostat tool, like top, is a complicated and full-featured tool that could fill up its own article. Unlike top, although it should be available for your distribution, it may not be installed on your system by default, so you need to track down which package provides it. Under Red Hat and Debian-based systems, you can get it in the sysstat package. Once it's installed, simply run iostat with no arguments to get a good overall view of your disk I/O statistics:

Linux 2.6.24-19-server (hostname)     01/31/2009

avg-cpu: %user   %nice %system %iowait %steal   %idle
           5.73    0.07    2.03    0.53    0.00   91.64

Device:    tps Blk_read/s Blk_wrtn/s   Blk_read   Blk_wrtn
sda       9.82       417.96        27.53   30227262    1990625
sda1      6.55       219.10         7.12   15845129     515216
sda2      0.04         0.74         3.31      53506     239328
sda3      3.24       198.12        17.09   14328323    1236081

Like with top, iostat gives you the CPU percentage output. Below that, it provides a breakdown of each drive and partition on your system and statistics for each:

    tps: transactions per second.

    Blk_read/s: blocks read per second.

    Blk_wrtn/s: blocks written per second.

    Blk_read: total blocks read.

    Blk_wrtn: total blocks written.

By looking at these different values and comparing them to each other, ideally you will be able to find out first, which partition (or partitions) is getting the bulk of the I/O traffic, and second, whether the majority of that traffic is reads (Blk_read/s) or writes (Blk_wrtn/s). As I said, tracking down the cause of I/O issues can be tricky, but hopefully, those values will help you isolate what processes might be causing the load.

For instance, if you have an I/O-bound load and you suspect that your remote backup job might be the culprit, compare the read and write statistics. Because you know that a remote backup job is primarily going to read from your disk, if you see that the majority of the disk I/O is writes, you reasonably can assume it's not from the backup job. If, on the other hand, you do see a heavy amount of read I/O on a particular partition, you might run the lsof command and grep for that backup process and see whether it does in fact have some open file handles on that partition.

As you can see, tracking down I/O issues with iostat is not straightforward. Even with no arguments, it can take some time and experience to make sense of the output. That said, iostat does have a number of arguments you can use to get more information about different types of I/O, including modes to find details about NFS shares. Check out the man page for iostat if you want to know more.

Up until recently, tools like iostat were about the limit systems administrators had in their toolboxes for tracking down I/O issues, but due to recent developments in the kernel, it has become easier to find the causes of I/O on a per-process level. If you have a relatively new system, check out the iotop tool. Like with iostat, it may not be installed by default, but as the name implies, it essentially acts like top, only for disk I/O. In Listing 4, you can see that an rsync process on this machine is using the most I/O (in this case, read I/O).

Listing 4. Example iotop Tool Output

Total DISK READ: 189.52 K/s | Total DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN     IO>    COMMAND
8169 be/4 root 189.52 K/s   0.00 B/s 0.00 % 0.00 % rsync --server --se
4243 be/4 kyle    0.00 B/s   3.79 K/s 0.00 % 0.00 % cli /usr/lib/gnome-
4244 be/4 kyle    0.00 B/s   3.79 K/s 0.00 % 0.00 % cli /usr/lib/gnome-
    1 be/4 root    0.00 B/s   0.00 B/s 0.00 % 0.00 % init

Once You Track Down the Culprit

How you deal with these load-causing processes is up to you and depends on a lot of factors. In some cases, you might have a script that has gone out of control and is something you can easily kill. In other situations, such as in the case of a database process, it might not be safe simply to kill the process, because it could leave corrupted data behind. Plus, it could just be that your service is running out of capacity, and the real solution is either to add more resources to your current server or add more servers to share the load. It might even be load from a one-time job that is running on the machine and shouldn't impact load in the future, so you just can let the process complete. Because so many different things can cause processes to tie up server resources, it's hard to list them all here, but hopefully, being able to identify the causes of your high load will put you on the right track the next time you get an alert that a machine is slow.

Thursday, November 1, 2012

Check Linux Server connected switch port no

Using below command we can check switch name and switch port.

tcpdump -nn -v -i eth1 -s 1500 -c 1 'ether[20:2] == 0x2000'