

Computer info


Our computers

We have the following computers.


Our disks

We have the following disks. A summary of each disk is given below.

/disk0 (glass.local, 26 TB (1.1 TB used)) (misc, backup)

/disk0/data/photonics (154 GB)
/disk0/data/spline_table
/disk0/data/IceCube/DataBranch (175 GB)
/disk0/data/IceCube/dominant_data (2.5 GB)
/disk0/data/IceCube/JULIeT (47 GB)
/disk0/data/IceCube/MCdata/9strings (8.8 GB)
/disk0/data/IceCube/MCdata/IceCube++ (14 GB)
/disk0/data/IceCube/MCdata/standard-candle (11 GB)
/disk0/work (18 GB)
/disk0/users (684 GB)

/disk1 (fridge.local, 26 TB (23 TB used))

/disk1/data/IceCube/RealData/40strings
/disk1/data/IceCube/RealData/79strings
/disk1/data/IceCube/RealData/86strings
/disk1/data/IceCube/RealData/beacon

/disk2 (attached to rum.local, 1.4 TB, only accessible from rum) (Real data 2007 (level0))

/disk2/oldusers (21 GB)
/disk2/data/IceCube/RealData/2007 (922 GB)

/disk3 (attached to grappa.local, 917 GB (744 GB used))

/disk3/data/IceCube/MCdata/22strings/V02-00-14/juliet

/disk4 (attached to cognac.local, 917 GB (723 GB used))

/misc/disk4/data/IceCube/MCdata/22strings/V02-00-14/corsika
/misc/disk4/data/IceCube/MCdata/22strings/V02-00-14/juliet

/disk5 (attached to caolila.local, 917 GB (33 GB used))

/disk5/data/IceCube/RealData/2009/testdata (33 GB)

/disk7 (attached to limoncello.local, 917 GB (771 GB used))

/disk7/data/IceCube/IceCube/MCdata/40strings/V02-02-13/juliet

/disk8 (attached to limoncello.local, 917 GB (342 GB used))

/disk8/data/IceCube/MCdata/40strings/
/disk8/data/IceCube/MCdata/80strings/
/disk8/data/IceCube/MCdata/9strings/

/disk9 (attached to limoncello.local, 917 GB (517 GB used))

/disk9/data/IceCube/RealData/22strings
/disk9/data/IceCube/MCdata/80strings
/disk9/data/IceCube/MCdata/DOMdata

/disk10 (freezer.local, 9.1 TB (8.5 TB used))

/disk10/data/IceCube/Analysis
/disk10/data/IceCube/MCdata/79strings
/disk10/data/IceCube/MCdata/80strings
/disk10/data/IceCube/MCdata/86strings
/disk10/data/IceCube/RealData/79strings

/disk11 (2.8 TB (2.6 TB used)) (IC22 CORSIKA data)

/disk11/icecube/data/MCData/22strings/V02-00-01/corsika/SIBYLL/proton (488 GB)
/disk11/icecube/data/MCData/22strings/V02-00-01/corsika/SIBYLL/iron (913 GB)
/disk11/icecube/data/MCData/22strings/V02-00-01/corsika/QGSII/proton (336 GB)
/disk11/icecube/data/MCData/22strings/V02-00-01/corsika/QGSII/iron (861 GB)

/disk12 (4.6 TB (4.6 TB used))

/disk12/data/IceCube/MCdata/79strings
/disk12/data/IceCube/MCdata/80strings
/disk12/data/IceCube/MCdata/dcorsika6900
/disk12/data/IceCube/RealData/40strings
/disk12/data/ARA/calibration
/disk12/data/ARA/opticalfiber

/disk13 (4.6 TB (4.2 TB used))

/disk13/data/IceCube/MCdata/80strings
/disk13/data/IceCube/RealData/40strings

/disk14 (2.8 TB (2.5 TB used))

/disk14/data/IceCube/MCdata/79strings
/disk14/data/IceCube/RealData/40strings

/disk15 (20 TB (13 TB used)) (scratch)

/disk15/data/IceCube/MCdata/79strings
/disk15/data/IceCube/MCdata/80strings
/disk15/data/IceCube/MCdata/86strings
/disk15/data/IceCube/MCdata/SCdata
/disk15/data/IceCube/MCdata/dcorsika6900
/disk15/data/IceCube/RealData/40strings
/disk15/scratch

/disk16 (19 TB (19 TB used))

/disk16/data/ARA/opticalFiber
/disk16/data/ARA/DTM*
/disk16/data/IceCube/MCdata/40strings
/disk16/data/IceCube/MCdata/59strings
/disk16/data/IceCube/MCdata/79strings
/disk16/data/IceCube/MCdata/80strings
/disk16/data/IceCube/MCdata/86strings
/disk16/data/IceCube/MCdata/dcorsika6900
/disk16/data/IceCube/MCdata/dcorsika6980
/disk16/data/IceCube/RealData/40strings
/disk16/data/IceCube/RealData/59strings

Network settings

ATM Network (s.chiba-u.ac.jp)

Gateway: 133.82.130.254
Subnet Mask: 255.255.255.0
DNS: 133.82.130.1

Giga Network (phys.chiba-u.jp)

Gateway: 10.25.254.254
Subnet Mask: 255.255.0.0
DNS: 10.245.1.95
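
For reference, on a Red Hat style machine a static address on the Giga network would be configured roughly like this (the device name and the IPADDR value are placeholders; the gateway and netmask are the values listed above):

/etc/sysconfig/network-scripts/ifcfg-eth0:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.25.x.x
NETMASK=255.255.0.0
GATEWAY=10.25.254.254

and the DNS server goes into /etc/resolv.conf:
nameserver 10.245.1.95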

Grappa administration

Instructions on how to create a new user can be found here:
 /home/icecube/instructions/HOW_TO_MAKE_A_NEW_USER 
Some instructions regarding software installations can be found in:
/home/icecube/instructions

How to use Condor system

Condor is a batch system that manages jobs.

Detailed information is available on the Condor home page and in the Condor manual.

The manual is worth a look, but you normally do not need to read it just to use condor. Some useful commands, and how to use them, are given below.

Condor commands
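
The commands you will use most often are listed below (the file name myjob.submit and the job id 123.0 are just placeholders):

condor_submit myjob.submit     # submit the jobs described in myjob.submit
condor_q                       # list your jobs in the queue
condor_q -analyze 123.0        # explain why job 123.0 is not running
condor_rm 123.0                # remove job 123.0
condor_status                  # show the slots of all machines in the pool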

The condor steering file

More details can be found in the Condor manual.

Here is the short course.
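
As a sketch (the executable name, the file names and the memory request are placeholders, not an actual analysis script), a steering file that submits 10 jobs could look like this:

universe       = vanilla
executable     = run_analysis.sh
arguments      = $(Process)
output         = log/job_$(Process).out
error          = log/job_$(Process).err
log            = log/job.log
request_memory = 2000
queue 10

Save it as, e.g., myjob.submit and submit it with condor_submit myjob.submit; condor_q then shows the 10 queued jobs.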


Condor administration

The condor system running on grappa can be customized by modifying condor configuration files. These are located here:

/home/condor/condor/etc/condor_config
/home/condor/condor/etc/*.local
/home/condor/condor/etc/config.d

Occasionally the negotiator or the scheduler hangs, with the result that only a subset of the nodes accepts jobs. They can be restarted using these commands:

condor_restart -subsystem negotiator
condor_restart -subsystem schedd
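
Afterwards, a quick check that things recovered (these are standard condor commands):

condor_status    # all nodes should be listed with their slots again
condor_q         # idle jobs should start matching and running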

Customization of slots on individual machines

By default condor creates one slot for each CPU on a machine, and the memory is divided evenly among the slots. If we want more high-memory slots, we can customize them in the node's condor config, e.g.:
/home/condor/condor/etc/machinename.local
Customized slots can be configured, for example, like this:
##--------------------------------------------------------------------
##  condor_startd
##--------------------------------------------------------------------
SLOT_TYPE_1 = cpus=2, ram=8000MB
SLOT_TYPE_2 = cpus=2, ram=5800MB
NUM_SLOTS_TYPE_1 = 5
NUM_SLOTS_TYPE_2 = 1
The official HTCondor documentation describes slot configuration in detail.
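
After editing a machinename.local file, the startd on that node has to pick up the change. One way to do that and to verify the result (machinename.local is a placeholder) is:

condor_restart -subsystem startd machinename.local
condor_status machinename.local    # should show the new slot layout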

Dealing with jobs overusing their requested resources

By default condor does not handle jobs that overuse their requested resources in any special way, so it is possible for a job to starve a machine's memory. Usually the Linux out-of-memory (OOM) killer handles this by scanning the list of tasks to decide which one to kill to free up memory. This scan may get stuck before a suitable task is found, putting the whole machine into a bad state. The out-of-memory killer behavior can be adjusted in this file:
 /etc/sysctl.conf 
The default value for the oom-killer task choice is
 vm.oom_kill_allocating_task = 0 
By setting
 vm.oom_kill_allocating_task = 1 
the oom-killer will skip the task scan and kill the process that is going over the memory limit. (A change in this file only becomes active after rebooting the machine.) Another possibility is to configure condor to put jobs on HOLD if they overuse their requested resources. On grappa this is handled by this file:
 /home/condor/condor/etc/config.d/60_startd_hold 
This should put jobs on HOLD if they use more memory than they requested (with a small grace value), instead of quietly killing them. The idea is that this prevents the machines from dropping out of condor and also makes bookkeeping easier for users, because it is more obvious which jobs were killed or held.
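
The exact contents of 60_startd_hold are specific to grappa, but the standard recipe from the HTCondor documentation for such a policy looks roughly like this (the 10% grace value is a placeholder):

MEMORY_EXCEEDED  = (MemoryUsage =!= UNDEFINED && MemoryUsage > 1.1 * Memory)
PREEMPT          = $(MEMORY_EXCEEDED)
WANT_HOLD        = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = "job used more memory than it requested"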

How to use the Intel compiler

We have the Intel compiler (version 12.1) on our cluster. It is xx% (to be evaluated) faster than gcc 4.4.

It's installed at /opt/intel/bin.

To use icc (the Intel compiler), you can alias gcc to icc as shown below.

(You can write this in your .cshrc if you use tcsh. You also have to add /opt/intel/bin to your path; see the sketch after the aliases.)

alias gcc       'icc -static -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE'
alias cc        'icc -static -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE'
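
One way to add the path in .cshrc (tcsh syntax) is:

set path = (/opt/intel/bin $path)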

How to shut down and restart the grappa system

Note that you need administrator privileges to shut down the system. Ask the administrator (Keiichi or Shigeru) if necessary.

How to shut down

  • Execute /usr/local/sbin/HALT at icebox.

How to restart

  • Restart the file servers manually by pressing the power button.
  • Restart grappa manually.
  • Execute /root/sbin/WOL at grappa. This command restarts all HPC computers.
  • Computers other than the HPC ones (caolila and cognac) have to be restarted manually.
  • If the system is being restarted after a power shutdown, make sure to turn the air conditioning in the server room back on.

Several things to do when a new machine is installed (a command sketch follows the list):

    1. Update /etc/hosts on the computers that are not HPC ones.
    2. Make a link to the new RAID disk at "/".
    3. Copy /data/IceCube from grappa.
    4. Set up condor (configure /home/condor/condor and copy the condor commands into /usr/local/bin).
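
    A rough sketch of these steps as shell commands (the disk number NN, the IP address, the host name and the location of the condor binaries are placeholders or assumptions):

    # 1. on the non-HPC computers, append the new machine to /etc/hosts
    echo "10.25.x.x   newmachine.local newmachine" >> /etc/hosts
    # 2. make a link to the new RAID disk at "/"
    ln -s /misc/diskNN /diskNN
    # 3. copy /data/IceCube from grappa
    rsync -a grappa:/data/IceCube/ /data/IceCube/
    # 4. set up condor: configure /home/condor/condor and copy the condor commands
    cp /home/condor/condor/bin/condor_* /usr/local/bin/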

    Manual of ICRR Computer (Outdated. Sorry.)

    The user's guide and the tutorial materials are collected on the ICRR computing pages. The points that are important for using IceTray are summarized below.
    
    

    1. Login

    From outside ICRR, log in to one of the following: icrlogin1.icrr.u-tokyo.ac.jp or icrlogin2.icrr.u-tokyo.ac.jp. To submit jobs and so on, log in from there to icrhome6.
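
    For example (the user name "someone" is a placeholder):
    [someone@local ~]$ ssh someone@icrlogin1.icrr.u-tokyo.ac.jp
    [someone@icrlogin1 ~]$ ssh icrhome6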

    2. Passwords

    The passwords for icrlogin1, icrlogin2 and icrhome6 are managed separately; changing the password in one place does not change it anywhere else. On icrhome6, use chpasswd instead of passwd.

    3. Batch jobs

    3-1 Basics

    Moab: the software at ICRR corresponding to condor at Chiba is Moab. It is used almost exactly like PBS, but the submission command is msub. For details, read the manual (PDF / HTML versions).

    Submitting: prepare the shell script (***.sh) you want to run on the cluster, then
    [someone@icrhome6 somewhere]$ msub ***.sh

    Checking: unlike at Chiba the number of jobs is large, so grep is convenient:
    [someone@icrhome6 somewhere]$ showq | grep someone

    Deleting: use the number shown in the leftmost column of showq (the JOBID):
    [someone@icrhome6 somewhere]$ canceljob JOBID
    To cancel all of your own jobs, execute
    [someone@icrhome6 somewhere]$ canceljob ALL

    3-2 Submitting IceTray jobs

    Unlike condor, you cannot submit many jobs in one go, so you prepare three files: a job-submission shell script, a batch script and a Python script.

    Job-submission shell script: there are two points. First, add the -V option so that the IceTray environment variables you have currently set are also reflected on the machine where the job runs. Second, if you submit like
    [someone@icrhome6 somewhere]$ msub ***.sh 100
    the argument "100" is not recognized inside ***.sh via argv etc., so put the numbers or strings you want to pass into environment variables and read them from there. Example: JulietwzCmcGr_nue_40str_msub.sh

    Batch script: write the Moab settings on the lines that start with #PBS. To make the correspondence between the job log file and the produced i3 file clear, this example uses the job number ($PBS_JOBID). Example: JulietwzCmcGr_nue_40str.sh

    Python script: the usual Python script; the example generates nu_e for IC40. Example: JulietwzCmcGr_nue_40str.py
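
    As a sketch of the first two files (the variable name RUNNUM and the way the Python script takes its arguments are assumptions for illustration, not the actual contents of the examples named above), the submission script (tcsh) could be:

    setenv RUNNUM 100                       # the value you would otherwise pass as an argument
    msub -V JulietwzCmcGr_nue_40str.sh      # -V copies the current environment to the job

    and the batch script could be:

    #!/bin/sh
    #PBS -N nue_40str
    # the job number ties the log file and the produced i3 file together
    python JulietwzCmcGr_nue_40str.py $RUNNUM $PBS_JOBID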

    4. Environment variables and photonics tables

    Everything you need is under /icrr/work/icecube/tools. Concretely, it looks like this:
    setenv I3_PORTS /icrr/work/icecube/tools/i3_ports
    setenv JAVA_HOME /icrr/work/icecube/tools/java/jdk1.6.0_04
    setenv ROOTSYS /icrr/work/icecube/tools/i3_ports/root-v5.16.00
    tables -> /icrr/work/icecube/tools/photonics_table/AHAmodel/

    Keiichi Mase
    Mio Ono
    Last modified: 2009-05-18 19:08:55