Root Filesystem Full – No Space Left on Device due to open files

Here’s an interesting scenario that I was asked to look into recently: the root file system on an Oracle database server was filling up. Normally, cleaning up disk space is straightforward: find the large and/or old files and delete them. In this case, however, df and du reported different space usage, and the find utility could not locate any file over 1G in size.
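For reference, the usual "find the large files" search can be sketched as below. This is a minimal sketch, not the exact command used at the time; the function name big_files is hypothetical, and the G size suffix assumes GNU find.

```shell
# Hypothetical helper: list regular files larger than a threshold,
# without crossing into other mounted file systems (-xdev).
#   $1 = directory to start from
#   $2 = threshold in find's -size units (for example 1k, or 1G on GNU find)
big_files() {
    # Permission errors are discarded so only matching paths are printed.
    find "$1" -xdev -type f -size +"$2" 2>/dev/null
}
```

Running big_files / 1G as root lists every file over 1G on the root file system. A file that has already been deleted has no directory entry, so it never shows up in this search.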

Here’s the status of the root file system which was causing the “No Space Left on Device” error message.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
                       30G   30G     0 100% /

After deleting around 2G of old files and logs, the error went away, but the output of df -h showed the root file system slowly filling up again. The directory sizes below hardly changed at all, differing only by megabytes. Listed from the “/” directory, these are all the directories that are on the “/” file system, as confirmed with df -h $dir.

# du -sh *
7.7M     bin
67M     boot
3.5M     dev
8.5M     etc
1.2G     home
440M     lib
28M     lib64
16K     lost+found
4.0K     media
1.2G     mnt
6.8G     opt
1.1G     root
41M     sbin
4.0K     selinux
4.0K     srv
0     sys
12M     tmp
3.0G     usr
260M     var

Notice here that the sum of these directories only adds up to around 15G, leaving the rest of the used space unaccounted for, and the file system used space was still increasing.
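The comparison between df and du can also be done numerically instead of adding up the listing by eye. The helpers below are a rough sketch; the function names are hypothetical, and du needs to run as root to read every directory.

```shell
# Hypothetical helpers for comparing df and du on one file system.

# Used space in KB according to df (POSIX output format, 1K blocks).
df_used_kb() {
    df -Pk "$1" | awk 'NR==2 {print $3}'
}

# Used space in KB according to du, staying on one file system (-x).
du_used_kb() {
    du -sxk "$1" 2>/dev/null | awk '{print $1}'
}

# Space that df counts but du cannot find by walking the directory tree.
#   $1 = used KB from df, $2 = used KB from du
unaccounted_kb() {
    echo $(( $1 - $2 ))
}
```

For example, unaccounted_kb "$(df_used_kb /)" "$(du_used_kb /)" prints the gap in KB. A gap of many gigabytes, as in this case, suggests space held by something du cannot see, such as files that were deleted while still open.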

The next step was to look at open files. It is worth mentioning here that even if a file is deleted, its space is not reclaimed while the process that created it, or any other process that still has it open, is still running. The lsof (list open files) utility shows these files.

# lsof | grep deleted
expdp      7271  oracle    1w      REG              253,0 16784060416    2475867 /home/oracle/nohup.out (deleted)
expdp      7271  oracle    2w      REG              253,0 16784060416    2475867 /home/oracle/nohup.out (deleted)
# ps -ef | grep 7271
oracle    7271     1 99 May31 ?        3-10:43:36 expdp               directory=DP_DIR dumpfile=exp_schema.dmp logfile=exp_schema.log schemas=schema

The above shows an export Data Pump job (pid = 7271) whose process was still running at the OS level, although it was not running in the database. The job was probably canceled for some reason but never cleaned up, and its nohup.out file was deleted while the process still had it open. The background process kept running at the OS level, and the deleted nohup.out file, about 16G in size, was taking up the space and filling up the “/” partition. It is worth mentioning here that the use of nohup is NOT desired with Data Pump. The Data Pump utilities are server-side processes; if you kick off a job and then lose your terminal for whatever reason, the Data Pump job is still running.
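To see how much space deleted-but-open files are holding in total, the lsof output can be summed. This is a sketch that assumes the nine-column layout shown in the lsof output above (size in column 7, inode in column 8); the function name deleted_bytes is hypothetical, and other lsof versions or options may add columns.

```shell
# Hypothetical helper: read lsof output on stdin and print the total size
# in bytes of files marked "(deleted)".
# Assumes columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
deleted_bytes() {
    awk '/\(deleted\)$/ {
        # The same file can be open on several descriptors (1w and 2w
        # above), so count each device/inode pair only once.
        key = $6 ":" $8
        if (!(key in seen)) { seen[key] = 1; total += $7 }
    }
    END { printf "%.0f\n", total + 0 }'
}
```

It can be used as lsof | deleted_bytes. For the two lines shown above, which are the same file on descriptors 1 and 2, the result is 16784060416 bytes.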

Once the expdp process 7271 was killed at the OS level, the space was reclaimed.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
                       30G   13G   16G  45% /

HugePages Configuration and Monitoring for Oracle

Here is a blog post I wrote last year while trying to get HugePages configured on a server that was running Oracle 10g.

Implementing HugePages has become common practice with Oracle 11g and is fairly well documented in MOS Note 361468.1.
The basic steps are as follows:

* Set the memlock ulimit for the oracle user.
* Disable Automatic Memory Management if necessary, as it is incompatible with HugePages.
* Run the Oracle supplied script to calculate the recommended value for the vm.nr_hugepages kernel parameter.
* Edit /etc/sysctl.conf and set vm.nr_hugepages to the recommended value.
* Reboot the server.
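The arithmetic behind the vm.nr_hugepages value can be sketched as follows. This is not the Oracle supplied script, only a rough illustration that assumes the SGA size is already known; the function names are hypothetical, and the script from the MOS note remains the source for the real recommendation.

```shell
# Hypothetical helper, NOT the Oracle supplied script: number of huge pages
# needed to cover an SGA of a given size.
#   $1 = SGA size in bytes (assumed to be known already)
#   $2 = huge page size in KB (Hugepagesize in /proc/meminfo, often 2048)
hugepages_needed() {
    sga_bytes=$1
    page_bytes=$(( $2 * 1024 ))
    # Round up so that a partial page still gets a whole huge page.
    echo $(( (sga_bytes + page_bytes - 1) / page_bytes ))
}

# Print the corresponding line for /etc/sysctl.conf.
sysctl_line() {
    echo "vm.nr_hugepages = $(hugepages_needed "$1" "$2")"
}
```

For an 8 GiB SGA with 2048 KB huge pages, sysctl_line 8589934592 2048 prints vm.nr_hugepages = 4096.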

Unfortunately, the database we were working with was 10g. As it turns out, there are some differences between
Oracle 10g and 11g, mainly that there is no HugePages logging in the alert log on version 10. See MOS Note 1392543.1.
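Without alert log entries, one way to check whether huge pages are actually in use is to read the counters in /proc/meminfo. The helper below is a minimal sketch with a hypothetical function name; it only subtracts HugePages_Free from HugePages_Total and does not account for reserved pages (HugePages_Rsvd).

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# how many huge pages are currently in use (total minus free).
hugepages_in_use() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         END { print total - free }'
}
```

It can be run as hugepages_in_use < /proc/meminfo after the instance has started. A result of 0 while the database is up suggests the SGA did not get allocated from huge pages.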
