Development servers
===================

VDQM servers:

lxb8294       (dedicated PC)
lxcastordev04 (virtual PC)

Disk servers:

lxc2disk07
lxc2disk08

Current tape servers:

tpsrv202 (ibmlib1)
tpsrv203 (ibmlib1)

Past tape servers (do not use):

tpsrv250 (ibmlib3)
tpsrv222 (ibmlib3)
tpsrv233 (ibmlib1)
tpsrv234 (ibmlib1)

Tape pools:

vdqm2_test
vdqm2_special


How to find the CASTOR 64-bit SLC4 rpms
=======================================

Here are 3 methods to find the CASTOR 64-bit SLC4 rpms:

Method 1
--------
ssh lxservb04
cd /var/www/html/swrep/x86_64_slc4

Method 2
--------
lynx http://swrepsrv.cern.ch/swrep/x86_64_slc4/

Method 3
--------
cd /afs/cern.ch/project/castor/www/DIST/CERN/savannah/CASTOR.pkg/2.1.9-3/SL4/x86_64


How to update an installed rpm
==============================

ssh root@c2itdcsrv102
rpm -Uvh --force /castor-rtcopyclient-server-2.1.8-7.x86_64.rpm


How to increase the maximum number of simultaneous XE sessions and processes
============================================================================

alter system set sessions=250 scope=spfile;
alter system set processes=200 scope=spfile;


How to disable all of the tapes in a pool
=========================================

vmgrlisttape -P tape_dev | awk '{print "sudo -u gordon vmgrmodifytape -V " $1 " --st DISABLED";};' > doit.sh
sh doit.sh
vmgrlisttape -P tape_dev
I10547 I10547 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081103 DISABLED
I10548 I10548 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081024 DISABLED
I10549 I10549 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081013 DISABLED
I10550 I10550 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081028 DISABLED
I10551 I10551 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000 DISABLED
I10552 I10552 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000 DISABLED
I10553 I10553 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000 DISABLED
I10554 I10554 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000 DISABLED


How to determine the columns of a multi-column unique constraint
================================================================

[root@lxcastordev04 CASTOR2]# sqlplus castor@castor

SQL*Plus: Release 10.2.0.3.0 - Production on Tue Nov 11 13:41:37 2008

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

Enter password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options

SQL> column column_name format a11
SQL> select index_name from user_constraints where constraint_name = 'TAPESEG';

INDEX_NAME
------------------------------
TAPESEG

SQL> select table_name, column_name, column_position from user_ind_columns where index_name = 'TAPESEG';

TABLE_NAME                     COLUMN_NAME COLUMN_POSITION
------------------------------ ----------- ---------------
CNS_SEG_METADATA               VID                       1
CNS_SEG_METADATA               SIDE                      2
CNS_SEG_METADATA               FSEQ                      3

SQL>


How to modify posovl to log its command-line arguments
======================================================

RCS file: /local/reps/castor/CASTOR2/tape/posovl.c,v
retrieving revision 1.37
diff -r1.37 posovl.c
103a104,135
>   {
>     int bufLen = 1024;
>     char buf[bufLen];
>     char *arg = NULL;
>     int resultLen = 0;
>     int argLen = 0;
>     int i=0;
>
>     for(i=0; i<argc; i++) {
>       // tplogit(func, "argv[%d]=%s\n", i, argv[i]);
>
>       arg = argv[i];
>       argLen = strlen(arg);
>
>       if(
>         (resultLen + argLen + 3) > /* 3 = leading space plus 2 quotes */
>         (bufLen - 3) /* 3 = carriage return (max 2 chars) plus end of string*/
>       ) {
>         tplogit(func, "posovl command-line is too long");
>       } else {
>         strncpy(&(buf[resultLen]), " '", 2);
>         resultLen += 2;
>         strncpy(&(buf[resultLen]), arg, argLen);
>         resultLen += argLen;
>         strncpy(&(buf[resultLen]), "'", 1);
>         resultLen += 1;
>       }
>     }
>     strcpy(&(buf[resultLen]), "\n");
>     tplogit(func, buf);
>   }
>


How to attach to posovl
=======================

`ps -ef | grep posovl | grep -v grep | awk '{print "gdb attach " $2;}'`


How to list the contents of an rpm in the repository
====================================================
rpm -qlp http://swrepsrv.cern.ch/swrep/x86_64_slc4/castor-vdqm-server-2.1.7-4.x86_64.rpm


How to find the test resources
==============================

Tape drives:

35923004@tpsrv222
35923005@tpsrv250

Tapes:

# vmgrlisttape -P vdqm2_test
I07754 I07754 IBM_LIB3 700GC nl  vdqm2_test 700.00GiB 20081006
I07762 I07762 IBM_LIB3 700GC nl  vdqm2_test 700.00GiB 20081006
I07764 I07764 IBM_LIB3 700GC nl  vdqm2_test 700.00GiB 20081006
I07766 I07766 IBM_LIB3 700GC nl  vdqm2_test 700.00GiB 20081006
I10486 I10486 IBM_LIB3 700GC aul vdqm2_test 698.68GiB 20081006
I10487 I10487 IBM_LIB3 700GC aul vdqm2_test 700.00GiB 20081006
I10488 I10488 IBM_LIB3 700GC aul vdqm2_test 700.00GiB 20081006
I10489 I10489 IBM_LIB3 700GC aul vdqm2_test 700.00GiB 20081006

# vmgrlisttape -P vdqm2_special
T16624 T16624 SL8500_1 500GC aul vdqm2_special 500.00GiB 00000000
T20676 T20676 SL8500_1 500GC aul vdqm2_special 500.00GiB 20080211
T20815 T20815 SL8600_0 500GC aul vdqm2_special 500.00GiB 00000000
T20851 T20851 SL8600_0 500GC aul vdqm2_special 500.00GiB 00000000

# vmgrlisttape -P tape_dev
I10547 I10547 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081102
I10548 I10548 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081024
I10549 I10549 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081013
I10550 I10550 IBM_LIB1 700GC aul tape_dev 700.00GiB 20081028
I10551 I10551 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000
I10552 I10552 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000
I10553 I10553 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000
I10554 I10554 IBM_LIB1 700GC aul tape_dev 700.00GiB 00000000


Umbrello service
================

pcitadc19


Database resources
==================

stage_dev04@castor64
vdqm_dev04@castor64
vdqm_dev04_writer@castor64
# vdqm@castordev64
# vdqm_writer@castordev64
# vdqm@c2vdqmdb
# vdqm@stevelap
# vdqm_writer@stevelap


How to give access rights to create a new tape pool
===================================================

Cupvadd --user nbessone --group cs --src lxcastordev08.cern.ch --tgt lxcastordev04.cern.ch --priv ADMIN


How to add a tape to a pool
===========================

vmgrmodifytape -V I50068 -P vdqm2_special


How to set up VDQM permissions
==============================

All stagers, administration PCs and tpread/tpwrite client PCs should have the
TP_OPER privilege in order for the VDQM to allow them to delete volume
requests.

Please note that the tpread and tpwrite command-line tools will and should
delete volume requests when they receive Ctrl-C.  This is why tpread/tpwrite
client PCs should have the TP_OPER privilege.

Example:

Cupvadd --user stage --group st --src lxcastordev04.cern.ch --tgt lxcastorsrv102.cern.ch --priv TP_OPER

Where:

lxcastordev04.cern.ch is a stager and a tpread/tpwrite client PC
lxcastorsrv102.cern.ch is the VDQM server


How to set up VMGR permissions
==============================

Tape servers require the CUPV TP_SYSTEM privilege in order for the VMGR to
allow them to mount tapes.

Example:

Cupvadd --user root --group root --src '^tpsrv.*.cern.ch' --tgt lxcastorsrv102.cern.ch --priv TP_SYSTEM

Where:

'^tpsrv.*.cern.ch' identifies the tape servers
lxcastorsrv102.cern.ch is the VMGR server

The TP_OPER privilege is required to use the following command-line tools:

* vmgrdeltag
* vmgrentertape
* vmgrmodifytape
* vmgrsettag

Example:

Cupvadd --user developer --group developers --src 'devbox.cern.ch' --tgt lxcastorsrv102.cern.ch --priv TP_OPER

The ADMIN privilege is required to use the following command-line tools:

* vmgrdeletedenmap
* vmgrdeletedgnmap
* vmgrdeletelibrary
* vmgrdeletemodel
* vmgrdeletepool
* vmgrdeletetape
* vmgrenterdenmap
* vmgrenterdgnmap
* vmgrenterlibrary
* vmgrentermodel
* vmgrenterpool
* vmgrmodifylibrary
* vmgrmodifypool
* reclaim

Example:

Cupvadd --user root --group root --src 'localhost.localdomain' --tgt 'lxcastordev04.cern.ch' --priv ADMIN


How to populate a new VMGR database
===================================

vmgrentermodel --mo 3592 --ml J --mc 250

vmgrenterdenmap -d 1000GC --ml J --mo 3592 --nc 931G
vmgrenterdenmap -d 700GC --ml J --mo 3592 --nc 651G
vmgrenterdenmap -d 500GC --ml J --mo 3592 --nc 465G

vmgrenterlibrary --name IBMLIB1A --capacity 6600
vmgrenterlibrary --name IBMLIB1B --capacity 6600
vmgrenterlibrary --name IBMLIB2A --capacity 6600
vmgrenterlibrary --name IBMLIB2B --capacity 6600
vmgrenterlibrary --name IBMLIB3A --capacity 6600
vmgrenterlibrary --name IBMLIB3B --capacity 6600

vmgrenterdgnmap -g 359B1A --mo 3592 --library IBMLIB1A
vmgrenterdgnmap -g 359B1B --mo 3592 --library IBMLIB1B
vmgrenterdgnmap -g 359B2A --mo 3592 --library IBMLIB2A
vmgrenterdgnmap -g 359B2B --mo 3592 --library IBMLIB2B
vmgrenterdgnmap -g 359B3A --mo 3592 --library IBMLIB3A
vmgrenterdgnmap -g 359B3B --mo 3592 --library IBMLIB3B

vmgrenterpool --group st --name aggregator_dev --user stage
vmgrenterpool --group st --name bad_dev --user stage
vmgrenterpool --group st --name german_dev --user stage
vmgrenterpool --group st --name giulia_dev --user stage
vmgrenterpool --group st --name nbessone_dev --user stage
vmgrenterpool --group st --name repack_src --user stage
vmgrenterpool --group st --name repack_dst --user stage
vmgrenterpool --group st --name repack_spare --user stage
vmgrenterpool --group st --name spare_dev --user stage
vmgrenterpool --group st --name stager_cert1 --user stage
vmgrenterpool --group st --name stager_cert2 --user stage
vmgrenterpool --group st --name stager_cert3 --user stage
vmgrenterpool --group st --name stager_cert4 --user stage
vmgrenterpool --group st --name stager_dev01 --user stage
vmgrenterpool --group st --name stager_dev02 --user stage
vmgrenterpool --group st --name stager_dev03 --user stage
vmgrenterpool --group st --name stager_dev04 --user stage
vmgrenterpool --group st --name stager_dev05 --user stage
vmgrenterpool --group st --name stager_dev06 --user stage
vmgrenterpool --group st --name stager_dev07 --user stage
vmgrenterpool --group st --name stager_dev08 --user stage
vmgrenterpool --group st --name tprw_dev --user stage
vmgrenterpool --group st --name tpsrv249_dev --user stage

vmgrentertape -V I02000 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po aggregator_dev
vmgrentertape -V I05236 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po bad_dev
vmgrentertape -V I12307 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po bad_dev
vmgrentertape -V I18020 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po bad_dev
vmgrentertape -V I02001 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po giulia_dev
vmgrentertape -V I02008 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po giulia_dev
vmgrentertape -V I02011 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po nbessone_dev
vmgrentertape -V I10551 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po stager_dev01
vmgrentertape -V I10552 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po stager_dev01
vmgrentertape -V I10553 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po spare_dev
vmgrentertape -V I02017 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_cert1
vmgrentertape -V I02018 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_cert2
vmgrentertape -V I02020 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_cert3
vmgrentertape -V I02021 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_cert4
vmgrentertape -V I10554 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po tprw_dev
vmgrentertape -V I00075 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev02
vmgrentertape -V I02005 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev02
vmgrentertape -V I02022 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev03
vmgrentertape -V I02023 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev03
vmgrentertape -V I02024 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev04
vmgrentertape -V I02025 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev04
vmgrentertape -V I02026 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev05
vmgrentertape -V I02034 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev05
vmgrentertape -V I02036 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev06
vmgrentertape -V I02039 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev06
vmgrentertape -V I02040 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev07
vmgrentertape -V I02043 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev07
vmgrentertape -V I02053 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev08
vmgrentertape -V I02056 --mo 3592 --ml J --li IBMLIB1B -d 1000GC -l aul --po stager_dev08
vmgrentertape -V I02061 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po german_dev
vmgrentertape -V I02062 --mo 3592 --ml J --li IBMLIB1A -d 1000GC -l aul --po german_dev


How to modify the density of a tape
===================================

TAPES='I10551 I10552 I10553 I10554'; for TAPE in $TAPES;do echo "Checking $TAPE"; vmgrmodifytape -V $TAPE -d 1000GC; done


VMGR permissions
================

P_ADMIN         vmgr_srv_reclaim
P_TAPE_OPERATOR vmgr_srv_modifytape


How to reclaim a tape
=====================

Make sure the tape has no file entries in the name server:

nslisttape -V I07754
nsrm the_full_path_name

Set the status of the tape to FULL:

su gordon -c 'vmgrmodifytape -V I07754 --st FULL'

Code generation scripts:

vmgrlisttape -P vdqm2_test | awk '{print "nslisttape -V "$1;}'
vmgrlisttape -P vdqm2_test | awk "{print \"su gordon -c 'vmgrmodifytape -V \" \$1 \" --st FULL'\";}"
vmgrlisttape -P vdqm2_test | awk "{print \"su gordon -c 'reclaim -V \" \$1 \"'\";}"


How to determine tape server status
===================================

tpstat
mt status


How to bring a tape drive up and down
=====================================

Either:

tpmaint --start
tpmaint --stop

or:

network
tpconfig 35923005 up

network
tpconfig 35923005 down


How to modify a quattor template
================================

[lxadm02] /afs/cern.ch/user/m/murrayc3 > cdbop
quattor CDB CLI: Version 2.1.14
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID
Type 'help' for more info
get profiles/profile_lxb1366
[INFO] 'profiles/profile_lxb1366.tpl': received
get profiles/profile_lxb
Display all 4070 possibilities? (y or n)
!pico profiles/profile_lxb1366.tpl
up profiles/profile_lxb1366.tpl
[INFO] '/profiles/profile_lxb1366': scheduled to be updated
com
[INFO] '/profiles/profile_lxb1366': will be updated
please confirm [yes]:
[INFO] please wait...
[INFO] commit OK

[root@lxb1366 root]# ccm-fetch
[root@lxb1366 root]# ncm-ncd --co castorconf

Commands for setting up the environment of a stager client:

export STAGE_HOST=lxb1366.cern.ch
export RFIO_USE_CASTOR_V2=YES
export STAGE_SVCCLASS=vdqm2_test

Command for setting up the environment of a VDQM client:

export VDQM_HOST=lxb8294.cern.ch

Command used to display the environment of a stager client:

env | egrep "STAGE_HOST|RFIO_USE_CASTOR_V2|STAGE_SVCCLASS"


How to upgrade castordev04
==========================

Stop all daemons:

/etc/init.d/dlfserver stop
/etc/init.d/expertd stop
/etc/init.d/transfermanagerd stop
/etc/init.d/rhd stop
/etc/init.d/stagerd stop

STAGER DB

drop:   http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/CASTOR2/castor/db/castor_oracle_drop.sqlplus?revision=1.14&root=castor
create: http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/CASTOR2/castor/db/castor_oracle_create.sqlplus?revision=1.87&root=castor

ssh murrayc3@lxadm
cdbop
get prod/cluster/castordev/instance/dev04
!vi prod/cluster/castordev/instance/dev04.tpl
up prod/cluster/castordev/instance/dev04.tpl
com
exit
exit

ssh root@lxcastordev04
spma_wrapper.sh

Start all daemons:

/etc/init.d/dlfserver start
/etc/init.d/expertd start
/etc/init.d/transfermanagerd start
/etc/init.d/rhd start
/etc/init.d/stagerd start

for cname in `nslistclass | grep NAME | awk '{print $2}'` ; do enterFileClass --Name $cname --GetFromCns ; done

enterSvcClass --Name default --DiskPools default
enterSvcClass --Name dev --DiskPools extra
enterSvcClass --Name diskonly --DiskPools extra --DiskOnlyBehavior yes --ForcedFileClass temp

moveDiskServer default lxc2disk07.cern.ch
moveDiskServer extra lxc2disk08.cern.ch

rmAdminNode -n lxc2disk07.cern.ch -r -R
rmAdminNode -n lxc2disk08.cern.ch -r -R


How to create a new service class
=================================

enterSvcClass --Name vdqm2_test --NbDrives 2 --TapePools vdqm2_test --DiskPools default
modifySvcClass --Name vdqm2_test --RemoveDiskPool default
modifySvcClass --Name vdqm2_test --AddDiskPool vdqm2_test


How to add a tape pool to and enable a stream for a service class
=================================================================

modifySvcClass --Name default --AddTapePools vdqm2_test
modifySvcClass --Name default --NbDrives 1


How to restart the CASTOR servers within a stager
=================================================

/etc/init.d/rechandlerd restart
/etc/init.d/stagerd restart
/etc/init.d/rtcpclientd restart
/etc/init.d/transfermanagerd restart
/etc/init.d/mighunterd restart default


How to create files full of random data in a castor-disk-server mount-point
===========================================================================

sudo -u stage sh
dd if=/dev/urandom of=/srv/castor/01/00/oneGFile bs=1M count=1024
dd if=/dev/urandom of=/srv/castor/01/00/twoGFile bs=1M count=2048


How to write a file to the end of a tape using tpwrite
======================================================

sudo -u stage tpwrite -V I10547 -l aul -F F -q n1 lxc2disk07:/srv/castor/01/00/oneGFile
sudo -u stage tpwrite -V I10548 -l aul -F F -q n1 lxc2disk07:/srv/castor/01/00/oneGFile
sudo -u stage tpwrite -V I10549 -l aul -F F -q n1 lxc2disk07:/srv/castor/01/00/oneGFile
sudo -u stage tpwrite -V I10550 -l aul -F F -q n1 lxc2disk07:/srv/castor/01/00/oneGFile
sudo -u stage tpwrite -V I10551 -l aul -F F -q n1 lxc2disk07:/srv/castor/01/00/oneGFile


How to test the drive down with more requests for other tapes
=============================================================

ssh tpsrv202 tpmaint -stop
sudo -u stage tpread -V I10547 -q 1 -FF lxc2disk07:/dev/null &
sudo -u stage tpread -V I10548 -q 1 -FF lxc2disk07:/dev/null &
ssh tpsrv202 tpmaint -start
sleep 5
ssh tpsrv202 tpmaint -stop


How to test the drive up and down status messages from two drives
=================================================================

i=0; while true; do let i=i+1; echo -n "$i: "; date; echo "$i: Bringing drive up"; ssh tpsrv202 tpconfig 35921001 up; echo "$i: Deleting drive"; vdqm_admin -deldrv -dgn 3592B1 -drive 35921001 -server tpsrv202 ; echo "$i: Bringing drive down"; ssh tpsrv202 tpconfig 35921001 down; echo "$i: Deleting drive"; vdqm_admin -deldrv -dgn 3592B1 -drive 35921001 -server tpsrv202 ; done


How to execute the same command on all of the tape servers of a DGN
===================================================================

wassh -c tapeserver/3592b2 -l root 'grep -c "Error in smcmount" /var/spool/tape/log' |sort


How to read a segment from tape using tpread
============================================

sudo -u stage tpread -V I10547 -q 1 -FF lxc2disk07:/dev/null
sudo -u stage tpread -V I10548 -q 1 -FF lxc2disk07:/dev/null


How to write 1000 2G files to the disk cache
============================================

i=1; while test $i -lt 500; do filename=/castor/cern.ch/user/m/murrayc3/vdqm_test2/test$i; echo $filename; rfcp twoGFile $filename; let i=i+1; done;


How to write the same file many times to several tapes
======================================================

export VDQM_HOST=lxcastordev04; tapes=`vmgrlisttape -P vdqm2_test | grep ul| egrep -v "READ-ONLY|DISABLED" | awk '{print $1;}'` ; for tape in $tapes ; do i=0 ; while test $i -lt 10 ; do file=/srv/castor/01/00/oneGFile; cmd="sudo -u stage tpwrite -V $tape -F F -q n1 $file" ; echo $cmd; sh -c "$cmd &" ; let i=i+1 ; done ; done


How to read several times the first segment from each of several tapes
=======================================================================

export VDQM_HOST=lxcastordev04; dest=lxc2disk07; tapes=`vmgrlisttape -P vdqm2_test | egrep -v "READ-ONLY|DISABLED" | awk '{print $1;}'`; for tape in $tapes; do i=0; while test $i -lt 10; do cmd="sudo -u stage tpread -V $tape -q 1 -FF ${dest}:/dev/null"; sh -c "$cmd &"; let i=i+1; done; done

export VDQM_HOST=lxcastordev04; dest=lxc2disk07; tapes=`vmgrlisttape -P tape_dev | egrep -v "READ-ONLY|DISABLED" | awk '{print $1;}'`; for tape in $tapes; do i=0; while test $i -lt 10; do cmd="sudo -u stage tpread -V $tape -q 1 -FF ${dest}:/dev/null"; sh -c "$cmd &"; let i=i+1; done; done

export VDQM_HOST=lxcastordev04; dest=lxc2disk08; tapes=`vmgrlisttape -P tape_dev | egrep -v "READ-ONLY|DISABLED" | awk '{print $1;}'`; for tape in $tapes; do i=0; while test $i -lt 10; do cmd="sudo -u stage tpread -V $tape -q 1 -FF ${dest}:/dev/null"; sh -c "$cmd &"; let i=i+1; done; done


How to read twice, read and then write, write twice, write and then read
========================================================================

# Read twice
export TAPE=I10554; export DISKFILE=lxc2disk07:/srv/castor/01/00/test; ssh root@lxcastordev04 'echo -e "\n\n"`date`" : read read test\n\n" >> /var/spool/vdqm/log'; sh -c "sudo -u stage tpread -V $TAPE -q 1 -FF $DISKFILE &"; sleep 5; sh -c "sudo -u stage tpread -V $TAPE -q 1 -FF $DISKFILE &"

# Read then write
export TAPE=I10554; export DISKFILE=lxc2disk07:/srv/castor/01/00/test; ssh root@lxcastordev04 'echo -e "\n\n"`date`" : read write test\n\n" >> /var/spool/vdqm/log'; sh -c "sudo -u stage tpread -V $TAPE -q 1 -FF $DISKFILE &"; sleep 5; sh -c "sudo -u stage tpwrite -V $TAPE -F F -q n1 $DISKFILE &"

# Write twice
export TAPE=I10554; export DISKFILE=lxc2disk07:/srv/castor/01/00/test; ssh root@lxcastordev04 'echo -e "\n\n"`date`" : write write test\n\n" >> /var/spool/vdqm/log'; sh -c "sudo -u stage tpwrite -V $TAPE -F F -q n1 $DISKFILE &" ; sleep 5; sh -c "sudo -u stage tpwrite -V $TAPE -F F -q n1 $DISKFILE &"

# Write then read
export TAPE=I10554; export DISKFILE=lxc2disk07:/srv/castor/01/00/test; ssh root@lxcastordev04 'echo -e "\n\n"`date`" : write read test\n\n" >> /var/spool/vdqm/log'; sh -c "sudo -u stage tpwrite -V $TAPE -F F -q n1 $DISKFILE &" ; sleep 5; sh -c "sudo -u stage tpread -V $TAPE -q 1 -FF $DISKFILE &"


How to remove all drives from the VDQM
======================================

echo "export VDQM_HOST=lxcastordev04" > tmp.sh
showqueues -D -x | sed 's/@/ /' | awk '{print "echo vdqm_admin -deldrv -dgn " $2 " -drive " $3 " -server " $4 "\nvdqm_admin -deldrv -dgn " $2 " -drive " $3 " -server " $4}' >> tmp.sh
sh tmp.sh
rm -f tmp.sh


How to remove all requests from the VDQM
========================================

echo "export VDQM_HOST=lxcastordev04" > tmp.sh
showqueues -x | egrep "^Q" | awk '{print "vdqm_admin -delvol -reqid " $5;}' >> tmp.sh
sh tmp.sh
rm -f tmp.sh


How to write 1000 files as one file to a tape and then read them back using dd
==============================================================================

# Create the files
i=0; padded=000; while test $i -lt 1000; do if test $i -lt 10; then padded=00$i; elif test $i -lt 100; then padded=0$i; else padded=$i; fi; echo value=${padded} > file${padded}.txt; let i=i+1; done

# Tpwrite the files
su stage
bash
`echo -n "tpwrite -V I10487 -F F -q 1-"; i=0; padded=000; while test $i -lt 1000; do if test $i -lt 10; then padded=00$i; elif test $i -lt 100; then padded=0$i; else padded=$i; fi; echo -n " lxc2disk07:/srv/castor/01/00/file${padded}.txt" ; let i=i+1; done; echo`

# Mount the tape
smc -m -D 5 -h ibmlib3rmc -V I10487

# Read all blocks up to and including the two consecutive tape marks at the end
# of the tape.  First block is block 0
mt rewind; i=0; while test $i -lt 12; do echo "READING BLOCK $i"; dd if=/dev/nst0 count=1 ibs=256k; let i=i+1; done


How to see the status of a tape robot
=====================================

ssh root@tpsrv250
smc -h tpsrv250 -q D


How to mount, unload and then dismount a tape
=============================================

[root@lxc2dev4d2 ~]# smc -h localhost -q D
0 1 free
1 2 free
[root@lxc2dev4d2 ~]# smc -h localhost -q V
V42001 1024 slot
V42002 1025 slot
V42003 1026 slot
V42004 1027 slot
V42005 1028 slot
[root@lxc2dev4d2 ~]# smc -m -D 0 -h localhost -V V42001
[root@lxc2dev4d2 ~]# smc -h localhost -q D
0 1 loaded V42001
1 2 free
[root@lxc2dev4d2 ~]# smc -h localhost -q V
V42001 1 drive
V42002 1025 slot
V42003 1026 slot
V42004 1027 slot
V42005 1028 slot
[root@lxc2dev4d2 ~]# smc -d -D 0 -h localhost -V V42001
[root@lxc2dev4d2 ~]# smc -h localhost -q D
0 1 free
1 2 free
[root@lxc2dev4d2 ~]# smc -h localhost -q V
V42001 1024 slot
V42002 1025 slot
V42003 1026 slot
V42004 1027 slot
V42005 1028 slot
[root@lxc2dev4d2 ~]#


How to set the read priority of a tape for a single-mount
=========================================================

[root@castorsrv203 ~]# vdqmsetpriority -v I02522 -a read -l singleMount -p 1000
[root@castorsrv203 ~]# vdqmsetpriority -v I02926 -a read -l singleMount -p 1000
[root@castorsrv203 ~]# vdqmlistrequest | egrep "I02522|I02926"
3889467 3592B1 I02926 read Nov 12 15:56:24 1000
3889473 3592B1 I02522 read Nov 12 15:56:25 1000


How to force recalls
====================

stager_rm -S '*' -M /castor....
stager_get -M /castor.....
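
To force the recall of several files, the stager_rm / stager_get pair above can be generated in the same way as the code generation scripts used elsewhere in these notes.  The following is only a sketch: the function name gen_recall_cmds and the input file files.txt (one full CASTOR path per line) are assumptions made for illustration, not part of CASTOR.  The function only prints commands, so the result can be reviewed before it is run.

```shell
# Hypothetical helper (not a CASTOR tool): read CASTOR paths from standard
# input, one per line, and print a stager_rm / stager_get pair for each.
# Blank lines are skipped.  Nothing is executed here.
gen_recall_cmds() {
  awk -v q="'" 'NF {
    print "stager_rm -S " q "*" q " -M " $1
    print "stager_get -M " $1
  }'
}

# Possible usage, assuming files.txt lists the full paths to recall:
#   gen_recall_cmds < files.txt > doit.sh
#   sh doit.sh
```

Writing the commands to doit.sh first, rather than piping them straight into sh, follows the pattern used in the tape disabling recipe and leaves a record of what was run.
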

How to test dedications
=======================

# Hold all requests
vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match host=pippo; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match host=pippo

# Launch read requests
unset DISPLAY; tapes="I10550 I10554"; i=0; while test $i -lt 10; do let i=i+1; for tape in $tapes; do cmd="ssh root@lxc2disk07 sudo -u stage tpread -V $tape -q 1 -FF lxc2disk07:/srv/castor/01/00/test1"; sh -c "$cmd &"; cmd="ssh root@lxc2disk08 sudo -u stage tpread -V $tape -q 1 -FF lxc2disk08:/srv/castor/01/00/test2"; sh -c "$cmd &"; done; done

# Initial VID
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=I10550 tpsrv203=I10554\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match vid='I10550'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match vid='I10554'

# Swapped VID
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=I10554 tpsrv203=I10550\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match vid='I10554'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match vid='I10550'

# Initial host
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=lxc2disk07 tpsrv203=lxc2disk08\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match host=lxc2disk07; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match host=lxc2disk08

# Swapped host
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=lxc2disk08 tpsrv203=lxc2disk07\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match host=lxc2disk08; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match host=lxc2disk07

# Delete all requests so we can now launch a mixture of read and write requests
ssh root@lxc2disk07 killall -2 tpread; ssh root@lxc2disk08 killall -2 tpread

# Launch read and write requests
unset DISPLAY; tapes="I10553 I10554"; i=0; while test $i -lt 10; do let i=i+1; for tape in $tapes; do cmd="ssh root@lxc2disk07 sudo -u stage tpread -V $tape -q 1 -FF /dev/null"; sh -c "$cmd &"; cmd="ssh root@lxc2disk08 sudo -u stage tpread -V $tape -q 1 -FF /dev/null"; sh -c "$cmd &"; cmd="ssh root@lxc2disk07 sudo -u stage tpwrite -V $tape -l aul -F F -q n1 /srv/castor/01/00/oneGFile"; sh -c "$cmd &"; cmd="ssh root@lxc2disk08 sudo -u stage tpwrite -V $tape -l aul -F F -q n1 /srv/castor/01/00/oneGFile"; sh -c "$cmd &"; done; done

# Initial access modes
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=0 tpsrv203=1\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match mode=0; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match mode=1

# Swap access modes
ssh root@lxcastordev04 'echo -e "\n\n"`date`" : dedicate test tpsrv202=1 tpsrv203=0\n\n" >> /var/spool/vdqm/log'; vdqm_admin -dedicate -dgn 3592B1 -drive 35921001 -server tpsrv202 -match mode=1; vdqm_admin -dedicate -dgn 3592B1 -drive 35921002 -server tpsrv203 -match mode=0


How to generate the database and conversion code
================================================

ssh murrayc3@seblap.cern.ch
export CVS_RSH=ssh
export CVSROOT=:ext:isscvs.cern.ch:/local/reps/castor
cvs co CASTOR2
cd CASTOR2

# Use umbrello to modify the UML model
# Note that DB classes should inherit from IObject and IPersistent
umbrello codeGeneration/VDQM.xmi

gencastor codeGeneration/VDQM.xmi
mv castor/db/oracleSchema.sql castor/vdqm/

# scp files that were upgraded or created
# Note that .h and CInt.cpp files can be ignored
DEST=root@lxb8294:/usr/local/src/CASTOR2
scp castor/db/cnv/DbVolumePriorityCnv.hpp $DEST/castor/db/cnv
scp castor/db/cnv/DbVolumePriorityCnv.cpp $DEST/castor/db/cnv
scp castor/vdqm/oracleSchema.sql $DEST/castor/vdqm
scp castor/vdqm/VolumePriority.hpp $DEST/castor/vdqm
scp castor/vdqm/VolumePriority.cpp $DEST/castor/vdqm
scp codeGeneration/VDQM.xmi $DEST/codeGeneration

# Update the list of database identifier types in Constants.hpp by adding
# any new object type ids to the end of the ObjectsIds enumeration and by
# updating the macro OBJECT_IDS_NB appropriately.
vi castor/Constants.hpp

# Update the list of database identifier types in Constants.cpp by adding
# any new object type id names to the end of the string array
# castor::ObjectsIdStrings.
vi castor/Constants.cpp


How to run gencastor on the VDQM model
======================================

cd $CASTOR_CVS
gencastor codeGeneration/VDQM.xmi


How to find the disabling of a given tape
=========================================

Determine the library in which the tape is located:

vmgrlisttape -V T06162

In this example the output of vmgrlisttape was:

T06162 T06162 SL8500_1 500GC aul na48_new1 0B 20080327 FULL

This means the library of tape T06162 was:

SL8500_1

Determine the DGN of the library:

vmgrlistdgnmap | grep SL8500_1

In this example the output of vmgrlistdgnmap was:

T10KR1 T10000 SL8500_1

This means the DGN of the library was:

T10KR1

Determine the tape servers within the DGN:

showqueues -x | egrep "^D.*T10KR1" | awk '{print $3}' | sed 's/^[^@]*@//' | sort

Scan the Lemon log file /var/log/edg-fmon-agent.log on each of the tape
servers for tape disabled messages:

tpsrvs=`showqueues -x | egrep "^D.*T10KR1" | awk '{print $3}' | sed 's/^[^@]*@//' | sort`
for tpsrv in $tpsrvs; do
  ssh root@$tpsrv 'egrep -H -n "Tape.*has been disabled" /var/log/edg-fmon-agent.log' 2>&1 | sed "s/^/$tpsrv: /" | grep T06162
done


How to delete a tape drive
==========================

$CASTOR_CVS/vdqm/vdqm_admin -deldrv -dgn 3592B3 -server tpsrv248 -drive 35923002
$CASTOR_CVS/vdqm/vdqm_admin -deldrv -dgn 3592B3 -server tpsrv249 -drive 35923003


How to dedicate a tape drive
============================

$CASTOR_CVS/vdqm/vdqm_admin -dedicate -dgn 3592B3 -server tpsrv248 -drive 35923002 -match 'vid=I10489'
$CASTOR_CVS/vdqm/vdqm_admin -dedicate -dgn 3592B3 -server tpsrv249 -drive 35923003 -match 'vid=I10489' How to measure network bandwidth ================================ #!/usr/bin/perl -w use strict; my $query = "cat /proc/net/dev | grep eth0"; my $dev_before = 0; my $dev_after = 0; my $delta_t = 5; my $bytes_before = 0; my $bytes_after = 0; my $delta_bytes = 0; my $bytes_per_sec = 0; my $m_bytes_per_sec = 0; my $date = ""; while(1) { $dev_before = `$query`; sleep($delta_t); $dev_after = `$query`; if($dev_before =~ m/eth0:(\d+)/) { $bytes_before = $1; if($dev_after =~ m/eth0:(\d+)/) { $bytes_after = $1; $delta_bytes = $bytes_after - $bytes_before; $bytes_per_sec = $delta_bytes / $delta_t; $m_bytes_per_sec = $bytes_per_sec / 1024 / 1024; $date = `date`; chomp($date); # print("$date: bytes_before = $bytes_before\n"); # print("$date: bytes_after = $bytes_after\n"); # print("$date: delta_bytes = $delta_bytes\n"); # print("$date: bytes_per_sec = $bytes_per_sec\n"); print("$date: m_bytes_per_sec = $m_bytes_per_sec\n"); } } } How to install iperf on a 64-bit PC with SLC 4 ============================================== rpm -ivh http://swrepsrv.cern.ch/swrep/x86_64_slc4/iperf-2.0.2-2.el4.rf.x86_64.rpm How to run iperf ================ SERVER: [root@tpsrv250 ~]# iperf -s -V -p 12345 CLIENT: [root@lxfsra1008 ~]# iperf -c tpsrv250 -V -p 12345 -f M -i 5 -t 60 How to test the performance of the disk responsible for /tmp ============================================================ [root@lxfsra1008 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 15G 3.6G 11G 26% / /dev/sda1 1012M 63M 898M 7% /boot none 1004M 0 1004M 0% /dev/shm /dev/sda2 15G 171M 14G 2% /tmp /dev/sda6 2.0G 83M 1.8G 5% /usr/vice/cache /dev/sda5 111G 757M 104G 1% /var AFS 8.6G 0 8.6G 0% /afs /dev/sdb1 1.5T 2.1G 1.5T 1% /srv/castor/01 /dev/sdc1 188G 2.1G 186G 2% /srv/castor/02 /dev/sdd1 188G 2.1G 186G 2% /srv/castor/03 [root@lxfsra1008 ~]# hdparm -t /dev/sda2 How to concatenate 100 x 100M disk files 
into a single 10G tape file ==================================================================== `echo -n "./tpwrite -V I10487 -F F -q 1"; i=0; while test $i -lt 100; do echo -n " lxfsra1008:/srv/castor/01/murrayc3/100M" ; let i=i+1; done; echo` How to list the disk servers of repack running with the itdc stager =================================================================== sqlplus castor_stager@c2itdcstgdb with pool as ( select id from diskpool where name = 'repack' ) select unique name from diskserver inner join filesystem on diskserver.id = filesystem.diskserver inner join pool on filesystem.diskpool=pool.id order by name; NAME -------------------------------------------------------------------------------- lxfsra1008.cern.ch lxfsrd1206.cern.ch lxfsrd1208.cern.ch lxfsre0905.cern.ch lxfsrk3901.cern.ch lxfsrk4106.cern.ch 6 rows selected. SQL> How to generate a series of random content files ================================================ #!/bin/sh MAX_BS=100000 SIZE=1 BS=1 COUNT=1 CMD="NONE" while test $SIZE -le 10000000000; do if test $SIZE -lt $MAX_BS; then let BS=$SIZE else let BS=$MAX_BS fi let COUNT=$SIZE/BS CMD="dd if=/dev/urandom of=$SIZE bs=$BS count=$COUNT" echo $CMD `$CMD` let SIZE=SIZE*10 done How to create a single 10G tape file using, 1000, 100, 10 small disk files ========================================================================== #!/usr/bin/perl -w use strict; my $srvs_file = "~murrayc3/public/castor/repack/disksrvs.txt"; my @disk_srvs = `cat $srvs_file`; my $disk_file_dir = "/srv/castor/01/murrayc3"; my $tape_file_size = 10000000000; my $tpwrite = "$ENV{'CASTOR_CVS'}/rtcopy/tpwrite"; my $vid = "I10487"; my $log_file = "results.log"; chomp(@disk_srvs); system("echo -n > $log_file"); my $disk_file_size = 10000000; my $nb_files = 0; my $file_index = 0; my $srv_index = 0; my $srv = ""; my $cmd = ""; while($disk_file_size <= $tape_file_size) { $nb_files = $tape_file_size / $disk_file_size; $file_index = 0; $srv_index = 0; $cmd = "$tpwrite 
-V $vid -F F -q 1";

  while($file_index < $nb_files) {
    $srv = $disk_srvs[$srv_index];
    $cmd = "$cmd $srv:$disk_file_dir/$disk_file_size";

    $file_index = $file_index + 1;
    $srv_index = $srv_index + 1;
    if($srv_index == @disk_srvs) {
      $srv_index = 0;
    }
  }

  $cmd = "2>&1 $cmd | tee -a $log_file";

  print("nb_files=$nb_files\n");
  print("$cmd\n");

  open(LOG, ">> $log_file") or die("Cannot open $log_file: $!\n");
  print(LOG "$cmd\n");
  close(LOG);

  system($cmd);

  $disk_file_size = $disk_file_size * 10;
}


How to configure rtcpd
======================

Add the following entries to /etc/castor/castor.conf:

RTCOPYD BUFSZ 5242880
RTCOPYD MOUNT_TIME 900
RTCOPYD NB_BUFS 573
RTCOPYD SELF_MONITOR YES


How to turn on the logging of rtcpd DEBUG messages
==================================================

Add the following entries to /etc/castor/castor.conf:

RTCOPY DEBUG 1
RTCOPY LOGLEVEL 7


How to turn on and off the compression of an IBM 3592 drive
===========================================================

Turn compression on and check that it is on:

mt -f /dev/nst0 compression 1
# DCE = Data Compression Enable
sdparm -q --get=DCE --long /dev/nst0

Turn compression off and check that it is off:

mt -f /dev/nst0 compression 0
# DCE = Data Compression Enable
sdparm -q --get=DCE --long /dev/nst0


How to write to tape at full speed
==================================

# Mount the tape
smc -h ibmlib3rmc -m -D 5 -V I10487

# Turn off compression
mt -f /dev/nst0 compression 0

# Check compression is off (DCE = Data Compression Enable)
sdparm -q --get=DCE --long /dev/nst0

# Write 10G of data using the device's default block size (40960 * 262144 = 10G)
dd if=/dev/zero of=/dev/nst0 bs=262144 count=40960


How to create a virtual file of any size independent of disk space
==================================================================

Example 1:

Console A:
rm -f test; mkfifo test; dd if=/dev/zero of=test bs=65536 count=163840

Console B:
time cat test > /dev/null

Example 2:

Console A:
cd /var/tmp/murrayc3
rm -f test; mkfifo
test; dd if=/dev/zero of=test bs=65536 count=163840

Console B:
cd /var/tmp/murrayc3
rfcp test test_file

Example output with final file size error:

[lxcastordev04] /var/tmp/murrayc3 > rfcp test test_file
10737418240 bytes in 275 seconds through local (in) and local (out) (38130 KB/sec)
10737418240 bytes in remote file
System error : got 10737418240 bytes instead of 0 bytes


How to add a new project to the CASTOR make system
==================================================

Assuming the following:

* One wants to build a new project located at CASTOR2/castor/tape/aggregator
  and CASTOR2/castor/tape exists and has an Imakefile already
* One doesn't want to build any manual pages

1. Add the new directory to the SUBDIRS macro of
   CASTOR2/castor/tape/Imakefile:

   aggregator

2. Make sure that the new code is compiled only when needed: use
   ClientProgramTarget for executables that should be built when only the
   client part is built, and TapeProgramTarget for executables that should be
   built when only the tape part is built.


How to label a tape
===================

Log on to a tape server and run tplabel. Note that the -f option forces the
labelling to take place even if the tape already has data on it:

[root@tpsrv250 ~]# tplabel -D 35923005 -d 700GC -g 3592B3 -l aul -V I10487 -v I10487 -f


How to run valgrind with the vdqmd
==================================

valgrind -v --leak-check=full --log-file=vdqm_valgrind.log --suppressions=/afs/cern.ch/user/m/murrayc3/public/castor/valgrind/vdqm2.supp /usr/local/src/CASTOR2/castor/vdqm/vdqmd -f -c /root/castor.conf

valgrind -v --leak-check=full --log-file=vdqm_valgrind.log --suppressions=/afs/cern.ch/user/m/murrayc3/public/castor/valgrind/vdqm2.supp vdqmd -f -c /root/castor.conf


How to spot memory leaks
========================

Look at the lost block counts after grepping the valgrind log for them:

grep -n "lost in" vdqm_valgrind.log.15729


How to print a memory leak with a thread-safe time stamp
========================================================

if(ptr_tapeDrive->tape() != NULL) {
  time_t ltime;
  char buf[50];

  time(&ltime);
  std::cout << "LEAK!!!!!: " << ctime_r(&ltime, buf);
}


How to create a valgrind suppressions file for the VDQM2
========================================================

Use a here-document to create the suppressions file:

cat << 'HERE' > vdqm2.supp.test
{
   vdqm1
   Memcheck:Param
   write(buf)
   obj:/lib64/tls/libpthread-2.3.4.so
   fun:snttwrite
   fun:nttwr
   fun:nsntwrn
   fun:nspsend
   fun:nsdofls
   fun:nsdo
   fun:nsdosend
   fun:nioqrc
   fun:ttcdrv
   fun:nioqwa
   fun:upirtrc
}
{
   Oracle's library, use of uninitialized value
   Memcheck:Value4
   obj:*/libclntsh.so*
}
{
   Oracle's library, use of uninitialized value size 8 in clntsh
   Memcheck:Value8
   obj:*/libclntsh.so*
}
{
   Oracle's library, use of uninitialized value size 8 in nnz10
   Memcheck:Value8
   obj:*/libnnz10.so*
}
{
   Oracle's library, jump depending on uninitialized value in clntsh
   Memcheck:Cond
   obj:*/libclntsh.so*
}
{
   Oracle's library, jump depending on uninitialized value in nnz10
   Memcheck:Cond
   obj:*/libnnz10.so*
}
{
   Oracle's library, Invalid read of size 8 in nnz10
   Memcheck:Addr8
   obj:*libnnz10.so
}
{
   Oracle's call to times
   Memcheck:Param
   times(buf)
   fun:times
   fun:kghinp
}
{
   Oracle's call to pthread
   core:PThread
   fun:pthread_error
   fun:pthread_mutex_destroy
   fun:snsbittrm_ts
}
{
   Oracle's call to pthread(2)
   core:PThread
   fun:pthread_error
   fun:pthread_mutex_destroy
   fun:sltsmxd
}
{
   Oracle's call to free
   Memcheck:Free
   obj:*/libclntsh.so*
}
{
   Oracle's mismatch free() / delete / delete
   Memcheck:Free
   fun:_ZdlPv
   fun:_ZN6oracle4occi14ConnectionImplD0Ev
   fun:_ZN6oracle4occi15EnvironmentImpl19terminateConnectionEPNS0_10ConnectionE
}
{
   libc Invalid free() / delete / delete[]
   Memcheck:Free
   fun:free
   obj:/lib/tls/libc-2.3.2.so
   fun:__libc_freeres
   fun:_vgw__freeres
}
{
   Oracle leak when creating connection
   Memcheck:Leak
   fun:malloc
   obj:/lib/tls/libc-2.3.2.so
   fun:__nss_database_lookup
}
HERE


How to test an aggregator VDQM message
====================================== Use a here-document to create the Makefile: cat << 'HERE' > Makefile CASTOR_SRC = /usr/local/src/CASTOR2 MAJOR_CASTOR_VERSION = __MAJOR_CASTOR_VERSION__ MINOR_CASTOR_VERSION = __MINOR_CASTOR_VERSION__ THREADFLAGS = -pthread -DCTHREAD_POSIX -D_THREAD_SAFE -D_REENTRANT CPPFLAGS = -g -pedantic -Wall -Werror -Wno-long-long default: testaggregator testaggregator: testaggregator.o g++ $(CPPFLAGS) $(THREADFLAGS) -o $@ $^ -L$(CASTOR_SRC)/shlib -lshift testaggregator.o: testaggregator.cpp g++ $(CPPFLAGS) $(THREADFLAGS) -o $@ $^ -c -I /usr/local/src/CASTOR2/h clean: rm -f testaggregator testaggregator.o HERE Use a here-document to create the source file: cat << 'HERE' > testaggregator.cpp #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include int main(int argc, char **argv) { int rc, sock, port, VolReqID; socklen_t len = 0; char *dgn, *VID; struct sockaddr_in sin ; /* Internet address */ if ( argc < 3 ) { fprintf(stderr,"Usage: %s dgn vid\n",argv[0]); exit(2); } dgn = argv[1]; VID = argv[2]; sock = socket(AF_INET,SOCK_STREAM,0); if ( sock == -1 ) { fprintf(stderr,"socket(): %s\n",strerror(errno)); exit(1); } /* * Bind to an arbitrary port */ port = 0; sin.sin_addr.s_addr = htonl(INADDR_ANY); sin.sin_family = AF_INET; sin.sin_port = htons((u_short)port); rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin)); if ( rc == -1 ) { fprintf(stderr,"bind(): %s\n",strerror(errno)); exit(1); } rc = listen(sock,100); if ( rc == -1 ) { fprintf(stderr,"listen(): %s\n",strerror(errno)); exit(1); } /* * Get the port number */ len = sizeof(sin); rc = getsockname(sock,(struct sockaddr *)&sin,&len); if ( rc == -1 ) { fprintf(stderr,"getsockname(): %s\n",sstrerror(errno)); exit(1); } port = ntohs(sin.sin_port); /* * Send the request to read a tape */ rc = vdqm_SendAggregatorVolReq(NULL,&VolReqID,VID,dgn, NULL,NULL, WRITE_DISABLE,port); if ( rc == -1 ) { 
    fprintf(stderr,"vdqm_SendAggregatorVolReq(): %s\n",sstrerror(serrno));
    exit(1);
  }

  /*
   * Wait for connection from a tape server
   */
  len = sizeof(sin);
  rc = accept(sock,(struct sockaddr *)&sin,&len);

  /*
   * Start the tape request ...
   */

  exit(0);
}
HERE


How to use the testaggregator command
=====================================

Log in as root on your development PC and run the testaggregator shell script:

ssh root@lxcastordev04
~murrayc3/public/castor/test_aggregator/testaggregator.sh


How to list a user's database jobs
==================================

select job_name from user_scheduler_jobs order by job_name;


How to replace (delete then create) a database job
==================================================

/**
 * Table used to monitor the progress of the drive scheduler job.
 */
CREATE TABLE SchedulerMonitor(
  nbJobCalls INTEGER CONSTRAINT NN_SchedulerMonitor_nbJobCall NOT NULL,
  nbSchedulerCalls INTEGER CONSTRAINT NN_SchedulerMonitor_nbSchdCall NOT NULL,
  nbAllocations INTEGER CONSTRAINT NN_SchedulerMonitor_nbAlloc NOT NULL,
  nbNoMatches INTEGER CONSTRAINT NN_SchedulerMonitor_nbNoMatch NOT NULL,
  nbConflicts INTEGER CONSTRAINT NN_SchedulerMonitor_nbConflict NOT NULL,
  nbExceptions INTEGER CONSTRAINT NN_SchedulerMonitor_nbExcept NOT NULL
);

/**
 * Set the initial value of all the counters in the drive scheduler monitor
 * table to 0.
 */
INSERT INTO SchedulerMonitor(nbJobCalls, nbSchedulerCalls, nbAllocations,
  nbNoMatches, nbConflicts, nbExceptions)
VALUES(0, 0, 0, 0, 0, 0);
COMMIT;

/**
 * Create the drive scheduler job.
 */
BEGIN
  BEGIN
    DBMS_SCHEDULER.DROP_JOB('scheduler', TRUE);
  EXCEPTION WHEN OTHERS THEN
    NULL;
  END;

  DBMS_SCHEDULER.CREATE_JOB (
    JOB_NAME => 'scheduler',
    JOB_TYPE => 'PLSQL_BLOCK',
    JOB_ACTION => '
      DECLARE
        returnVar NUMBER;
        tapeDriveIdVar NUMBER;
        tapeDriveNameVar VARCHAR2(255);
        tapeRequestIdVar NUMBER;
        tapeRequestVidVar VARCHAR2(255);
      BEGIN
        UPDATE SchedulerMonitor SET nbJobCalls = nbJobCalls + 1;
        LOOP
          UPDATE SchedulerMonitor SET nbSchedulerCalls = nbSchedulerCalls + 1;
          castorVdqm.allocateDrive(
            returnVar,
            tapeDriveIdVar,
            tapeDriveNameVar,
            tapeRequestIdVar,
            tapeRequestVidVar);
          COMMIT;
          IF returnVar = 1 THEN
            UPDATE SchedulerMonitor SET nbAllocations = nbAllocations + 1;
          ELSIF returnVar = 0 THEN
            UPDATE SchedulerMonitor SET nbNoMatches = nbNoMatches + 1;
          ELSIF returnVar = -1 THEN
            UPDATE SchedulerMonitor SET nbConflicts = nbConflicts + 1;
          END IF;
          EXIT WHEN returnVar = 0;
        END LOOP;
      EXCEPTION WHEN OTHERS THEN
        UPDATE SchedulerMonitor SET nbExceptions = nbExceptions + 1;
      END;',
    START_DATE => SYSDATE,
    REPEAT_INTERVAL => 'SYSDATE + 1/24/60/12', -- 5 seconds
    ENABLED => TRUE,
    COMMENTS => 'Allocates free drives to volume requests');
END;

/**
 * Table used to monitor the progress of the volume priority cleaner job.
 */
CREATE TABLE PrioCleanerMonitor(
  nbJobCalls INTEGER CONSTRAINT NN_PrioCleanerMon_nbJobCall NOT NULL,
  nbDeletes INTEGER CONSTRAINT NN_PrioCleanerMon_nbDelete NOT NULL,
  nbExceptions INTEGER CONSTRAINT NN_PrioCleanerMon_nbExcept NOT NULL
);

/**
 * Set the initial value of all the counters in the volume priority cleaner
 * monitor table to 0.
 */
INSERT INTO PrioCleanerMonitor(nbJobCalls, nbDeletes, nbExceptions)
VALUES(0, 0, 0);
COMMIT;

/**
 * Create the volume priority cleaner job.
*/ BEGIN BEGIN DBMS_SCHEDULER.DROP_JOB('priocleaner', TRUE); EXCEPTION WHEN OTHERS THEN NULL; END; DBMS_SCHEDULER.CREATE_JOB ( JOB_NAME => 'priocleaner', JOB_TYPE => 'PLSQL_BLOCK', JOB_ACTION => ' DECLARE maxAgeVar NUMBER := 86400; prioritiesDeletedVar NUMBER := 0; BEGIN UPDATE PrioCleanerMonitor SET nbJobCalls = nbJobCalls + 1; castorVdqm.deleteOldVolPriorities(maxAgeVar, prioritiesDeletedVar); IF prioritiesDeletedVar > 0 THEN UPDATE PrioCleanerMonitor SET nbDeletes = nbDeletes + prioritiesDeletedVar; END IF; EXCEPTION WHEN OTHERS THEN UPDATE PrioCleanerMonitor SET nbExceptions = nbExceptions + 1; END;', START_DATE => SYSDATE, REPEAT_INTERVAL => 'SYSDATE + 1/24', -- 1 hour ENABLED => TRUE, COMMENTS => 'Deletes old volume priorities'); END; How to determine which castor servers/daemons listen on which ports =================================================================== find -name "*.h*" -exec grep -H PORT {} \; | grep define | egrep "[0-9]" | egrep -v 'HPP|MAX|IMPORT|EXPORT|VALID_PORT|EMONBASEOFF' > ports.txt ./castor/client/BaseClient.hpp:#define CSP_RHSERVER_PORT 9002 ./castor/client/BaseClient.hpp:#define CSP_RHSERVER_SEC_PORT 9007 ./castor/rh/Server.hpp:#define CSP_RHSERVER_PORT 9002 ./castor/rh/Server.hpp:#define CSP_RHSERVER_SEC_PORT 9007 ./castor/rh/Server.hpp:#define CSP_NOTIFICATION_PORT 9001 ./castor/Constants.hpp:#define RHSERVER_PORT 9002 ./h/vdqm_constants.h:#define SVDQM_PORT (5512) ./h/vdqm_constants.h:#define VDQM_PORT (5012) ./h/stage_constants.h:#define STAGE_PORT 5007 ./h/tape_aggregator_constants.h:#define TAPE_AGGREGATOR_PORT (5070) ./h/Cns_constants.h:#define CNS_SEC_PORT 5510 ./h/Cns_constants.h:#define CNS_PORT 5010 ./h/rmc_constants.h:#define RMC_PORT 5014 ./h/stager_client_api_common.hpp:#define DEFAULT_PORT 9002 ./h/stager_client_api_common.hpp:#define DEFAULT_SEC_PORT 9007 ./h/stager_constants.h:#define STAGER_DEFAULT_SECURE_PORT 5515 /* Default secure port number */ ./h/stager_constants.h:#define STAGER_DEFAULT_PORT 5015 /* Default 
port number */
./h/stager_constants.h:#define STAGER_DEFAULT_NOTIFY_PORT 55015 /* Default notify port number */
./h/expert_constants.h:#define EXPERT_PORT 5045
./h/rfio_constants.h:#define RFIO_PORT 5001
./h/rfio_constants.h:#define SRFIO_PORT 5501
./h/Cupv_constants.h:#define SCUPV_PORT 5520
./h/Cupv_constants.h:#define CUPV_PORT 56013
./h/vmgr_constants.h:#define SVMGR_PORT 5513
./h/vmgr_constants.h:#define VMGR_PORT 5013
./h/rtcpcld_constants.h:#define RTCPCLD_NOTIFY_PORT (5050) /* rtcpclientd notification (UDP) port */
./dlf/server.h:#define DEFAULT_SERVER_PORT 5036 /**< the default port to listen on */


How to restart the aggregatord daemons quickly
==============================================

stopallaggregators.sh; make -C /usr/local/src/CASTOR2/castor/tape/aggregator; packageaggregator.sh; waitallaggregatordsstopped.sh; distributeaggregator.sh; restartaggregator.sh


How to restart the aggregatord daemons completely
=================================================

stopallaggregators.sh; make -C /usr/local/src/CASTOR2/castor/tape/aggregator; packageaggregator.sh; waitallaggregatordsstopped.sh; distributeallaggregator.sh; restartaggregator.sh


How to use strace to monitor network access
===========================================

Original command:

sudo -u stage tpread -V I10547 -q 1 -FF lxc2disk07:/dev/null

Command with strace:

sudo -u stage strace -e trace=network -o /tmp/thetrace.txt tpread -V I10547 -q 1 -FF lxc2disk07:/dev/null


How to determine which process is using which TCP/IP port
=========================================================

[root@lxcastordev04 vdqm]# lsof -i | grep 5013
vmgrdaemo 4628 root 2u IPv4 7624051 TCP *:5013 (LISTEN)


How to enable access to RFIOD /dev on a disk server
===================================================

Add or modify the following line in /etc/castor/castor.conf to include /dev:

RFIOD PathWhiteList /tmp/murrayc3 /dev


How to move the VDQM state of a tape drive from UNKNOWN to FREE
===============================================================

ssh tpsrv202 tpmaint -stop; ssh tpsrv202 tpmaint -start


How to install dot from the graphviz package on SLC4
====================================================

rpm -h -i http://swrepsrv.cern.ch/swrep/x86_64_slc4/graphviz-2.6-1.fc4.x86_64.rpm


How to install UMLGraph sequence.pic
====================================

# Install UMLGraph
UMLGRAPH_DIR=/usr/UMLGraph
mkdir $UMLGRAPH_DIR
cd $UMLGRAPH_DIR
wget 'http://www.umlgraph.org/UMLGraph-5.2.tar.gz'
tar -xvzf UMLGraph-5.2.tar.gz

# Create a symbolic link to the UMLGraph pic2plot macros file for creating UML
# sequence diagrams
CASTOR_CVS=/usr/local/src/CASTOR2
ln -s $UMLGRAPH_DIR/UMLGraph-5.2/src/sequence.pic $CASTOR_CVS/castor/tape/doc/sequence_diagrams/sequence.pic


How to create a sequence diagram
================================

Install the pic2plot command of the GNU plotutils package from
http://www.gnu.org/software/plotutils

Install UMLGraph sequence.pic from http://www.umlgraph.org (see "How to
install UMLGraph sequence.pic").

Create a sequence diagram pic2plot source file, for example:

CASTOR_CVS=/usr/local/src/CASTOR2
cd $CASTOR_CVS/castor/tape/doc/sequence_diagrams
cat << HERE > test.pic
.PS

copy "sequence.pic";

# Define the objects
object(V,"vdqmd");
object(A,"aggregatord");
object(R,"rtcpd");
step();

# Message sequences
active(V);
message(V,A,"connect");
active(A);
message(V,A,"RcpJob");
message(A,R,"connect");
active(R);
message(A,R,"RcpJob");
message(R,A,"RcpJobReply");
message(A,V,"RcpJobReply");
inactive(A);
inactive(V);
message(R,A,"connect");
active(A);
message(A,R,"RtcpTapeRequest");
message(R,A,"RtcpAcknowledge");
message(R,A,"RtcpTapeRequest");
message(A,R,"RtcpAcknowledge");

complete(V);
complete(A);
complete(R);

.PE
HERE

Generate the sequence diagram, for example:

pic2plot -T ps test.pic > out.ps ; gv -scale=4 out.ps


How to create a finite state transition network
===============================================

cat << HERE > vdqm_drive.dot
digraph
finite_state_machine {
  rankdir=LR;
  size="8,5"
  node [shape = doublecircle]; START;
  node [shape = circle];
  START -> FREE [ label = "drive up" ];
  FREE -> RUNNING [ label = "drive assigned" ];
  RUNNING -> RELEASE [ label = "drive released" ];
  RELEASE -> FREE [ label = "drive up" ];
  FREE -> DOWN [ label = "drive down" ];
  DOWN -> FREE [ label = "drive up" ];
  DOWN -> DOWN [ label = "job submission succeeded" ];
  DOWN -> UNKNOWN [ label = "job submission failed" ];
  UNKNOWN -> FREE [ label = "drive up" ];
}
HERE

dot -T ps vdqm_drive.dot -o out.ps ; gv out.ps


How to see the doxygen output from processing the CASTOR source code
====================================================================

Open the following web page:

https://savannah.cern.ch/files/?group=castor

Click on the CASTOR2 link of the version to be displayed
Then click on doc/
Then click on html/


How to run a locally compiled version of RTCPD
==============================================

cat << 'HERE' > rtcpd.sh
#!/bin/sh

CASTOR_CVS=/root/rtcpd/CASTOR2

export PATH=$CASTOR_CVS/rtcopy

export LD_LIBRARY_PATH="\
$CASTOR_CVS/common:\
$CASTOR_CVS/castor:\
$CASTOR_CVS/castor/db/cnv:\
$CASTOR_CVS/dlf:\
$CASTOR_CVS/expert:\
$CASTOR_CVS/ns:\
$CASTOR_CVS/rfio:\
$CASTOR_CVS/security:\
$CASTOR_CVS/castor/db/newora:\
$CASTOR_CVS/tape:\
$CASTOR_CVS/upv:\
$CASTOR_CVS/vdqm:\
$CASTOR_CVS/vmgr\
"

$CASTOR_CVS/rtcopy/rtcpd
HERE


How to list the file names that are within a binary's debug information
=======================================================================

Try:

objdump -t -j .debug_line ./rtcpd | egrep '\.c$' | awk '{print $6;}'

Or:

[root@tpsrv202 rtcopy]# readelf -wl libcastorrtcopy.so | grep '\.c'
1 0 0 0 rtcpc_SendRecv.c
1 0 0 0 rtcp_InitNW.c
1 0 0 0 rtcpc_log.c
1 0 0 0 rtcpc_CallVMGR.c
1 0 0 0 rtcp_Listen.c
1 0 0 0 rtcpapi.c
1 0 0 0 rtcpc_BuildReq.c
1 0 0 0 rtcp_RetvalSHIFT.c
1 0 0 0 rtcpc_CheckReq.c

[root@tpsrv202 vdqm]# readelf -wl libcastorvdqm.so | grep '\.c'
1 0 0 0 vdqmapi.c
1 0 0 0 vdqmc_SendRecv.c

Strange:
[root@tpsrv202 rtcopy]# objdump -t -j .debug_line ./rtcpd | egrep '\.c$' | awk '{print $6;}' | egrep '_CallVMGR\.c|_CheckReq\.c|_SendRecv\.c|_log\.c' rtcp_CallVMGR.c rtcp_CheckReq.c rtcp_SendRecv.c rtcp_log.c [root@tpsrv202 rtcopy]# objdump -t -j .debug_line ./libcastorrtcopy.so | egrep '\.c$' | awk '{print $6;}' | egrep '_CallVMGR\.c|_CheckReq\.c|_SendRecv\.c|_log\.c' rtcpc_SendRecv.c rtcpc_log.c rtcpc_CallVMGR.c rtcpc_CheckReq.c How to give root the right to vdqm_admin on the CASTOR common services server ============================================================================= Cupvadd --user root --group root --src '^castorsrv[1-2]0[1-3]$' --tgt '^castorsrv[1-2]0[1-3]$' --priv ADMIN How to generate and migrate a test file ======================================= Create an executable script with the following contents and then run it: ------------------------ START OF SCRIPT ------------------------------------ #!/bin/sh if test e$STAGE_HOST = e; then echo -n "ERROR: " echo "The environment variable STAGE_HOST is not set" exit -1 fi if test e$TSTMIG_USER = e; then echo -n "ERROR: " echo "The environment variable TSTMIG_USER is not set" exit -1 fi export RFIO_USE_CASTOR_V2=YES LETTER_DIR=`echo $TSTMIG_USER | cut -c -1` SOURCEFILENAME=test_`date | sed 's/ /_/g;s/:/_/g'`.txt SOURCEPATH=/tmp/$SOURCEFILENAME DESTINATIONPATH=/castor/cern.ch/user/$LETTER_DIR/$TSTMIG_USER/$SOURCEFILENAME echo "Creating source file $SOURCEPATH as $TSTMIG_USER" sudo -u $TSTMIG_USER echo "Filename: $SOURCEPATH" > $SOURCEPATH echo "Migrating $SOURCEPATH to $DESTINATIONPATH as $TSTMIG_USER" sudo -u $TSTMIG_USER rfcp $SOURCEPATH $DESTINATIONPATH CMD="stager_qry -M $DESTINATIONPATH" echo $CMD $CMD CMD="/etc/init.d/mighunter restart default" echo $CMD $CMD -------------------------- END OF SCRIPT ------------------------------------ How to setup a tape server for VMGR, VDQM and aggregatord testing ================================================================= Make sure the following three 
lines are in /etc/castor/castor.conf:

VDQM HOST lxcastordev04
VMGR HOST lxcastordev04
aggregatord LOGSTANDARD file:///var/spool/tape/aggregatord/log x-dlf://lxcastordev04


How to setup a central services server for VMGR testing
=======================================================

Make sure the following file exists with one line giving the Oracle connection
string:

/etc/castor/VMGRCONFIG


How to compile GNU plotutils on SLC4
====================================

# Download and install GNU plotutils
PLOTUTILS_DIR=/usr/plotutils
mkdir $PLOTUTILS_DIR
cd $PLOTUTILS_DIR
wget ftp://sunsite.cnlab-switch.ch/mirror/gnu/plotutils/plotutils-2.5.tar.gz
tar -xvzf plotutils-2.5.tar.gz
cd plotutils-2.5
./configure --enable-libplotter
make

# Create a symbolic link in /usr/bin to pic2plot
ln -s $PLOTUTILS_DIR/plotutils-2.5/pic2plot/pic2plot /usr/bin/pic2plot


How to install and run an aggregator for development on a tape server
=====================================================================

Create a new file called reinstallAggregator.sh with your favourite text
editor and enter the following code. Once finished, make sure the file is
executable and then run it.

The code:

#!/bin/sh

NB_ARGS=$#

# Function to print usage message
function usage {
  echo
  echo "Usage: reinstallAggregator.sh tapeServer [tapeServer]..."
}

# Check the number of command line arguments
if test $NB_ARGS -lt 1; then
  echo
  echo -n "ERROR: "
  echo "Invalid number of arguments"
  usage
  echo
  exit -1
fi

TPSRVS=$@

if test "x$CASTOR_CVS" = x; then
  echo "Error: The environment variable CASTOR_CVS is not set"
  echo
  echo "CASTOR_CVS should be the full path to the CASTOR source code up to and"
  echo "including CASTOR2, e.g."
echo echo " export CASTOR_CVS=/usr/local/src/CASTOR2" echo exit -1 fi cat > /tmp/aggregatord.sh << HERE #!/bin/sh export LD_LIBRARY_PATH=/aggregator ulimit -c unlimited /aggregator/aggregatord HERE chmod +x /tmp/aggregatord.sh for TPSRV in $TPSRVS; do echo echo "Reinstalling $TPSRV" echo ssh $TPSRV killall aggregatord ssh $TPSRV rm -rf /aggregator ssh $TPSRV mkdir /aggregator scp `find $CASTOR_CVS -name '*.so*' | xargs echo` $TPSRV:/aggregator scp $CASTOR_CVS/castor/tape/aggregator/aggregatord $TPSRV:/aggregator scp /tmp/aggregatord.sh $TPSRV:/aggregator ssh $TPSRV /aggregator/aggregatord.sh ssh $TPSRV ps -ef | grep aggregatord | sed "s/^/$TPSRV: /" done rm /tmp/aggregatord.sh How to install an rtcpd for development on a tape server ======================================================== Create a new file called reinstallRtcpd.sh with your favourite text editor and enter the following code. Once finished make sure the file is executable and then run it. The code: NB_ARGS=$# # Function to print usage message function usage { echo echo "Usage: reinstallRtcpd.sh tapeServer" } # Check the number of command line arguments if test $NB_ARGS -ne 1; then echo echo -n "ERROR: " echo "Invalid number of arguments" usage echo exit -1 fi TPSRV=$1 if test "x$CASTOR_CVS" = x; then echo "Error: The environment variable CASTOR_CVS is not set" echo echo "CASTOR_CVS should be the full path to the CASTOR source code up to and" echo "including CASTOR2, e.g." 
echo echo " export CASTOR_CVS=/usr/local/src/CASTOR2" echo exit -1 fi ssh $TPSRV killall rtcpd ssh $TPSRV rm -rf /rtcpd ssh $TPSRV mkdir /rtcpd scp `find $CASTOR_CVS -name '*.so*' | xargs echo` $TPSRV:/rtcpd scp $CASTOR_CVS/rtcopy/rtcpd $TPSRV:/rtcpd cat > /tmp/rtcpd.sh << HERE #!/bin/sh export LD_LIBRARY_PATH=/rtcpd ulimit -c unlimited /rtcpd/rtcpd HERE chmod +x /tmp/rtcpd.sh scp /tmp/rtcpd.sh $TPSRV:/rtcpd ssh $TPSRV /rtcpd/rtcpd.sh ssh $TPSRV ps -ef | grep rtcpd rm /tmp/rtcpd.sh How to re-label and re-populate a tape with one thousand 100MB files ==================================================================== [root@lxcastordev04 tpcp]# vmgrmodifytape -V I10554 --st FULL [root@lxcastordev04 tpcp]# reclaim -V I10554 [root@lxcastordev04 tpcp]# ssh tpsrv202 tplabel -D 35921001 -d 1000GC -g 359B1A -l aul -V I10554 -v I10554 -f ******************************************************************************* * http://cern.ch/ComputingRules : Govern the use of CERN computing facilities * * ******************************************************************************* tplabel: uid=0 gid=0 vid=I10554 side=0 dgn=359B1A den=1000GC vsn=I10554 lbltype=aul mounttape: TP062 - tape I10554 to be prelabelled has vsn I10554 tplabel: tape labeled. [root@lxcastordev04 tpcp]# ssh lxc2disk07 dd if=/dev/urandom of=/srv/castor/01/00/f bs=1M count=100 [root@lxcastordev04 tpcp]# `(NB_FILES=1000; echo -n "sudo -u stage tpwrite -V I10554 -l aul -F F -q n$NB_FILES"; x=0; while test $x -lt $NB_FILES; do echo -n " lxc2disk07:/srv/castor/01/00/f"; let x=x+1; done; echo)` Jun 27 15:20:15 tpwrite[12164]: selecting tape server ... Jun 27 15:20:15 tpwrite[12164]: * tpsrv202 is a possible tape server. Jun 27 15:20:15 tpwrite[12164]: ! selected tape server is tpsrv202. 
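The backquoted one-liner above builds its tpwrite command line by repeating the same disk file path NB_FILES times. The following is a minimal sketch of that construction as a small shell function. The function name build_tpwrite_cmd is hypothetical and not part of CASTOR; the tpwrite options are copied from the command above, and the function only prints the command so that it can be inspected before being run.

```shell
#!/bin/sh

# Hypothetical helper (not part of CASTOR): print, but do not run, a tpwrite
# command line that writes the same disk file NB_FILES times to tape VID.
# The tpwrite options are the ones used in the one-liner above.
build_tpwrite_cmd() {
  vid=$1
  nb_files=$2
  disk_file=$3

  cmd="tpwrite -V $vid -l aul -F F -q n$nb_files"

  # Append the disk file path once per tape file
  i=0
  while test "$i" -lt "$nb_files"; do
    cmd="$cmd $disk_file"
    i=$((i + 1))
  done

  echo "$cmd"
}

# Example with 3 files instead of 1000 to keep the output short
build_tpwrite_cmd I10554 3 lxc2disk07:/srv/castor/01/00/f
```

Printing the command first makes it easy to check the number of file arguments against the -q value before prefixing the result with "sudo -u stage" and executing it.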
How to trace the RTCP messages sent and received by tpread, tpwrite and dumptape
================================================================================

With your favourite text editor create an executable shell script with the
name trace_rtcp.sh and enter the following contents. Once created, run the
script with an entire tpread, tpwrite or dumptape command as the script's
command-line arguments. For example
"trace_rtcp.sh tpwrite -V I02025 -l aul -F F -q n3 lxc2disk07:/srv/castor/01/00/test"
or "trace_rtcp.sh dumptape -V I02025". The script will create a trace file
with a name of the form "/tmp/trace_PROGRAM.DATE.output", where PROGRAM is
either tpread, tpwrite or dumptape and DATE is the current date accurate to
the nearest second.

Please note that the script relies on the fact that rtcpc_SendRecv.c:400 is
the last line of the RTCP client function rtcp_Transfer().

#!/bin/sh

PARAMS=$@
NB_PARAMS=$#

if test "x$CASTOR_CVS" = x; then
  echo "Error: The environment variable CASTOR_CVS is not set"
  echo
  echo "CASTOR_CVS should be the full path to the CASTOR source code up to and"
  echo "including CASTOR2, e.g."
  echo
  echo " export CASTOR_CVS=/usr/local/src/CASTOR2"
  echo
  exit -1
fi

if test ! -d $CASTOR_CVS; then
  echo "Error: The directory specified by CASTOR_CVS does not exist"
  echo
  echo "CASTOR_CVS=\"$CASTOR_CVS\""
  echo
  exit -1
fi

if test $NB_PARAMS -lt 1; then
  echo "Error: Wrong number of command-line arguments"
  echo
  echo "There should at least be 1 corresponding to tpread, tpwrite or dumptape"
  echo
  exit -1
fi

if test $1 != "tpread" -a $1 != "tpwrite" -a $1 != "dumptape"; then
  echo "Error: Invalid first command-line argument"
  echo
  echo "The first command-line argument should be either tpread, tpwrite or"
  echo "dumptape"
  echo
  exit -1
fi

RTCP_PROGRAM="$CASTOR_CVS/rtcopy/$1"

if test !
-x $RTCP_PROGRAM; then
  echo "Error: The RTCP program does not exist"
  echo
  echo "RTCP_PROGRAM=\"$RTCP_PROGRAM\""
  echo
  exit -1
fi

DATE=`date | sed 's/ /_/g;s/:/_/g'`
GDB_COMMAND_FILE=/tmp/trace_$1.gdb_command_file
GDB_LOG_FILE="/tmp/trace_$1.$DATE.output"

echo "=======================================================" > $GDB_LOG_FILE
echo "Running" >> $GDB_LOG_FILE
echo "$PARAMS" >> $GDB_LOG_FILE
echo "=======================================================" >> $GDB_LOG_FILE
echo >> $GDB_LOG_FILE
chmod 666 $GDB_LOG_FILE

cat > $GDB_COMMAND_FILE << HERE
set height 0
set breakpoint pending on
set follow-fork-mode child
set logging file $GDB_LOG_FILE
set logging redirect on
set logging on
directory $CASTOR_CVS/rtcopy
b rtcpc_SendRecv.c:400
commands
silent
if hdr != 0 && filereq != 0 && (hdr->reqtype == 0x2102 || hdr->reqtype == 0x2105)
  printf "\n"
  printf "\n"
  printf "==================================\n"
  if whereto == SendTo
    printf "Sending "
  else
    printf "Receiving "
  end
  if hdr->reqtype == 0x2102
    printf "RTCP_FILE_REQ message\n"
  else
    printf "RTCP_FILEERR_REQ message\n"
  end
  printf "==================================\n"
  printf "\n"
  printf "file_path=%s\n", filereq->file_path
  printf "tape_path=%s\n", filereq->tape_path
  printf "recfm=%s\n", filereq->recfm
  printf "fid=%s\n", filereq->fid
  printf "ifce=%s\n", filereq->ifce
  printf "stageID=%s\n", filereq->stageID
  printf "VolReqID=%d\n", filereq->VolReqID
  printf "jobID=%d\n", filereq->jobID
  printf "stageSubreqID=%d\n", filereq->stageSubreqID
  printf "umask=%d\n", filereq->umask
  printf "position_method=%d ", filereq->position_method
  if filereq->position_method == 0
    printf "TPPOSIT_FSEQ\n"
  else
    if filereq->position_method == 1
      printf "TPPOSIT_FID\n"
    else
      if filereq->position_method == 2
        printf "TPPOSIT_EOI\n"
      else
        if filereq->position_method == 3
          printf "TPPOSIT_BLKID\n"
        else
          printf "UNKNOWN\n"
        end
      end
    end
  end
  printf "tape_fseq=%d\n", filereq->tape_fseq
  printf "disk_fseq=%d\n", filereq->disk_fseq
  printf "blocksize=%d\n", filereq->blocksize
  printf "recordlength=%d\n", filereq->recordlength
  printf "retention=%d\n", filereq->retention
  printf "def_alloc=%d\n", filereq->def_alloc
  printf "rtcp_err_action=%d\n", filereq->rtcp_err_action
  printf "tp_err_action=%d\n", filereq->tp_err_action
  printf "convert=%d\n", filereq->convert
  printf "check_fid=%d\n", filereq->check_fid
  printf "concat=%d\n", filereq->concat
  printf "proc_status=%d ", filereq->proc_status
  if filereq->proc_status == 0x1
    printf "RTCP_WAITING\n"
  else
    if filereq->proc_status == 0x2
      printf "RTCP_POSITIONED\n"
    else
      if filereq->proc_status == 0x3
        printf "RTCP_PARTIALLY_FINISHED\n"
      else
        if filereq->proc_status == 0x4
          printf "RTCP_FINISHED\n"
        else
          if filereq->proc_status == 0x5
            printf "RTCP_EOV_HIT\n"
          else
            if filereq->proc_status == 0x6
              printf "RTCP_UNREACHABLE\n"
            else
              if filereq->proc_status == 0x7
                printf "RTCP_REQUEST_MORE_WORK\n"
              else
                printf "UNKNOWN\n"
              end
            end
          end
        end
      end
    end
  end
  printf "cprc=%d\n", filereq->cprc
  printf "TStartPosition=%d\n", filereq->TStartPosition
  printf "TEndPosition=%d\n", filereq->TEndPosition
  printf "TStartTransferDisk=%d\n", filereq->TStartTransferDisk
  printf "TEndTransferDisk=%d\n", filereq->TEndTransferDisk
  printf "TStartTransferTape=%d\n", filereq->TStartTransferTape
  printf "TEndTransferTape=%d\n", filereq->TEndTransferTape
  printf "blockid[0]=%d\n", filereq->blockid[0]
  printf "blockid[1]=%d\n", filereq->blockid[1]
  printf "blockid[2]=%d\n", filereq->blockid[2]
  printf "blockid[3]=%d\n", filereq->blockid[3]
  printf "offset=%ld\n", filereq->offset
  printf "bytes_in=%ld\n", filereq->bytes_in
  printf "bytes_out=%ld\n", filereq->bytes_out
  printf "host_bytes=%ld\n", filereq->host_bytes
  printf "nbrecs=%ld\n", filereq->nbrecs
  printf "maxnbrec=%ld\n", filereq->maxnbrec
  printf "maxsize=%ld\n", filereq->maxsize
  printf "startsize=%ld\n", filereq->startsize
  printf "castorSegAttr.nameServerHostName=%s\n", filereq->castorSegAttr.nameServerHostName
  printf "castorSegAttr.segmCksumAlgorithm=%s\n", filereq->castorSegAttr.segmCksumAlgorithm
  printf "castorSegAttr.segmCksum=%d\n", filereq->castorSegAttr.segmCksum
  printf "castorSegAttr.castorFileId=%ld\n", filereq->castorSegAttr.castorFileId
  printf "stgReqId->time_low=%d\n", filereq->stgReqId.time_low
  printf "stgReqId->time_mid=%d\n", filereq->stgReqId.time_mid
  printf "stgReqId->time_hi_and_version=%d\n", filereq->stgReqId.time_hi_and_version
  printf "stgReqId->clock_seq_hi_and_reserved=%d\n", filereq->stgReqId.clock_seq_hi_and_reserved
  printf "stgReqId->clock_seq_low=%d\n", filereq->stgReqId.clock_seq_low
  printf "stgReqId->node[0]=%d\n", filereq->stgReqId.node[0]
  printf "stgReqId->node[1]=%d\n", filereq->stgReqId.node[1]
  printf "stgReqId->node[2]=%d\n", filereq->stgReqId.node[2]
  printf "stgReqId->node[3]=%d\n", filereq->stgReqId.node[3]
  printf "stgReqId->node[4]=%d\n", filereq->stgReqId.node[4]
  printf "stgReqId->node[5]=%d\n", filereq->stgReqId.node[5]
  printf "err.errmsgtxt=%s\n", filereq->err.errmsgtxt
  printf "err.severity=%d\n", filereq->err.severity
  printf "err.errorcode=%d\n", filereq->err.errorcode
  printf "err.max_tpretry=%d\n", filereq->err.max_tpretry
  printf "err.max_cpretry=%d\n", filereq->err.max_cpretry
end
if hdr != 0 && tapereq != 0 && (hdr->reqtype == 0x2101 || hdr->reqtype == 0x2104)
  printf "\n"
  printf "\n"
  printf "==================================\n"
  if whereto == SendTo
    printf "Sending "
  else
    printf "Receiving "
  end
  if hdr->reqtype == 0x2101
    printf "RTCP_TAPE_REQ message\n"
  else
    printf "RTCP_TAPEERR_REQ message\n"
  end
  printf "==================================\n"
  printf "\n"
  printf "vid=%s\n", tapereq->vid
  printf "vsn=%s\n", tapereq->vsn
  printf "label=%s\n", tapereq->label
  printf "devtype=%s\n", tapereq->devtype
  printf "density=%s\n", tapereq->density
  printf "server=%s\n", tapereq->server
  printf "unit=%s\n", tapereq->unit
  printf "VolReqID=%d\n", tapereq->VolReqID
  printf "jobID=%d\n", tapereq->jobID
  printf "mode=%d\n", tapereq->mode
  printf "start_file=%d\n", tapereq->start_file
  printf "end_file=%d\n", tapereq->end_file
  printf "side=%d\n", tapereq->side
  printf "tprc=%d\n", tapereq->tprc
  printf "TStartRequest=%d\n", tapereq->TStartRequest
  printf "TEndRequest=%d\n", tapereq->TEndRequest
  printf "TStartRtcpd=%d\n", tapereq->TStartRtcpd
  printf "TStartMount=%d\n", tapereq->TStartMount
  printf "TEndMount=%d\n", tapereq->TEndMount
  printf "TStartUnmount=%d\n", tapereq->TStartUnmount
  printf "TEndUnmount=%d\n", tapereq->TEndUnmount
  printf "rtcpReqId->time_low=%d\n", tapereq->rtcpReqId.time_low
  printf "rtcpReqId->time_mid=%d\n", tapereq->rtcpReqId.time_mid
  printf "rtcpReqId->time_hi_and_version=%d\n", tapereq->rtcpReqId.time_hi_and_version
  printf "rtcpReqId->clock_seq_hi_and_reserved=%d\n", tapereq->rtcpReqId.clock_seq_hi_and_reserved
  printf "rtcpReqId->clock_seq_low=%d\n", tapereq->rtcpReqId.clock_seq_low
  printf "rtcpReqId->node[0]=%d\n", tapereq->rtcpReqId.node[0]
  printf "rtcpReqId->node[1]=%d\n", tapereq->rtcpReqId.node[1]
  printf "rtcpReqId->node[2]=%d\n", tapereq->rtcpReqId.node[2]
  printf "rtcpReqId->node[3]=%d\n", tapereq->rtcpReqId.node[3]
  printf "rtcpReqId->node[4]=%d\n", tapereq->rtcpReqId.node[4]
  printf "rtcpReqId->node[5]=%d\n", tapereq->rtcpReqId.node[5]
  printf "err.errmsgtxt=%s\n", tapereq->err.errmsgtxt
  printf "err.severity=%d\n", tapereq->err.severity
  printf "err.errorcode=%d\n", tapereq->err.errorcode
  printf "err.max_tpretry=%d\n", tapereq->err.max_tpretry
  printf "err.max_cpretry=%d\n", tapereq->err.max_cpretry
end
c
end
b rtcp_SendTpDump
commands
silent
if dumpreq != 0
  printf "\n"
  printf "\n"
  printf "=================================\n"
  printf "Sending RTCP_DUMPTAPE_REQ message\n"
  printf "=================================\n"
  printf "\n"
  printf " dumpreq->maxbyte=%d\n", dumpreq->maxbyte
  printf " dumpreq->blocksize=%d\n", dumpreq->blocksize
  printf " dumpreq->convert=%d\n", dumpreq->convert
  printf " dumpreq->tp_err_action=%d\n", dumpreq->tp_err_action
  printf " dumpreq->startfile=%d\n", dumpreq->startfile
  printf " dumpreq->maxfile=%d\n", dumpreq->maxfile
  printf " dumpreq->fromblock=%d\n", dumpreq->fromblock
  printf " dumpreq->toblock=%d\n", dumpreq->toblock
end
c
end
run
quit
HERE

CMD="export LD_LIBRARY_PATH=`find $CASTOR_CVS -name '*.so*' | sed 's|/[^/]*$||' | sort | uniq | xargs | sed 's/ /:/g'`; gdb -x $GDB_COMMAND_FILE --args $CASTOR_CVS/rtcopy/$PARAMS"

if test $USER = root; then
  sudo -u stage sh -c "$CMD"
else
  sh -c "$CMD"
fi


How to checkout CASTOR from SVN
===============================

svn co svn+ssh://username@svn.cern.ch/reps/CASTOR/CASTOR2/trunk CASTOR2


How to modify an SVN commit comment
===================================

svn propedit svn:log --revprop -r 18620


How to extract the white-list privilege rules from a stager database
====================================================================

select 'insert into whitelist values ('
    || nvl2(svcclass, '''' || svcclass || '''','NULL') || ', '
    || nvl(to_char(euid), 'NULL') || ', '
    || nvl(to_char(egid), 'NULL') || ','
    || nvl(to_char(reqtype), 'NULL') || ');'
  from whitelist;


How to add 2 new tapes in a new library together with a new tape server to
==========================================================================
the tape-development setup
==========================

The following two tapes are to be added to the tape-development setup:

T00033
T00121

In production the tape operations team have DISABLED the two tapes and put
them in the tape_dev pool. To see this one must be logged onto a machine with
a unix account that sees the production system.
In this example such a login is murrayc3@lxplus:

[root@lxcastordev04 doc]# ssh murrayc3@lxplus vmgrlisttape -V T00033
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
T00033 T00033 SL8600_1 1000GC aul tape_dev 1000.00GB 20090714 DISABLED
[root@lxcastordev04 doc]# ssh murrayc3@lxplus vmgrlisttape -V T00121
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
T00121 T00121 SL8600_1 1000GC aul tape_dev 1000.00GB 20090616 DISABLED
[root@lxcastordev04 doc]#

The tape-development setup requires a new library definition for the two
tapes. This definition should have a name similar to the production one in
order not to cause too much confusion. We do not use exactly the same name
because we usually split each production tape library into at least two
libraries (an 'A' and a 'B') in the tape-development setup, so that we can
better control which tapes go to which drives.

From the above queries in the production system about our two tapes we see
that the name of their production library is the same and that it is the
following:

SL8600_1

The tape-development setup will have two libraries derived from this one, and
they will be named as follows:

SL86001A
SL86001B

The above two libraries need to be added to the tape-development setup before
the two tapes they contain. To do this one must be logged onto a machine with
a unix account that sees the tape-development setup.
In this example such a login is root@lxcastordev04:

[root@lxcastordev04 doc]# vmgrenterlibrary --name SL86001A --capacity 6600
[root@lxcastordev04 doc]# vmgrenterlibrary --name SL86001B --capacity 6600
[root@lxcastordev04 doc]# vmgrlistlibrary | grep SL86001
SL86001A CAPACITY 6600 FREE 6600 (100.0%) ONLINE
SL86001B CAPACITY 6600 FREE 6600 (100.0%) ONLINE
[root@lxcastordev04 doc]#

The models for the two tapes need to be copied from production into the
tape-development setup before tapes can be added. The models of the two tapes
can be determined as follows:

[root@lxcastordev04 tape]# ssh murrayc3@lxplus vmgrlisttape -x -V T00033 | awk '{ print "VID=" $1 " MODEL=" $6; }'
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
VID=T00033 MODEL=T10000
[root@lxcastordev04 tape]# ssh murrayc3@lxplus vmgrlisttape -x -V T00121 | awk '{ print "VID=" $1 " MODEL=" $6; }'
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
VID=T00121 MODEL=T10000
[root@lxcastordev04 tape]#

The above output shows that both tapes are of the same model and that model
is:

T10000

In order to enter the model into the tape-development setup we need to get its
media letter and media cost:

[root@lxcastordev04 tape]# ssh murrayc3@lxplus vmgrlistmodel --mo T10000
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
T10000 T 200
[root@lxcastordev04 tape]#

The model can now be entered into the tape-development setup:

[root@lxcastordev04 tape]# vmgrentermodel --mo T10000 --ml T --mc 200
[root@lxcastordev04 tape]# vmgrlistmodel --mo T10000
T10000 T 200

The density mapping between the model and the density of the two tapes needs
to be entered into the tape-development setup before the two tapes are
entered. The name given to the density is 1000GC and the native (actual)
capacity is 931G:

[root@lxcastordev04 tape]# vmgrenterdenmap -d 1000GC --mo T10000 --ml T --nc 931G
[root@lxcastordev04 tape]# vmgrlistdenmap | grep T10000
T10000 T 1000GC 931.00Gi

The DGN mapping between the new library and tape model needs to be entered
into the tape-development setup before the two tapes can be entered. To avoid
confusion the DGN should look like what exists in production, but it should
not be identical. This protects the two systems from interfering with each
other if one system accidentally talks to the VDQM of the other: when the two
systems use different DGNs, the VDQM of one system refuses requests from the
other because it does not recognise their DGNs. The following shows how to get
the DGN used in the production system:

[root@lxcastordev04 tape]# ssh murrayc3@lxplus vmgrlistdgnmap | grep SL8600
*******************************************************************************
* The LXPLUS Public Login Unix Service
*
*
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities
*
*
*******************************************************************************
T10KB6 T10000 SL8600_1
[root@lxcastordev04 tape]#

As with the libraries, we will create at least two DGNs in the
tape-development setup (A and B).
The "A" DGN will be mapped to the "A" library and likewise the "B" DGN will be
mapped to the "B" library:

[root@lxcastordev04 tape]# vmgrenterdgnmap -g T10B6A --library SL86001A --mo T10000
[root@lxcastordev04 tape]# vmgrenterdgnmap -g T10B6B --library SL86001B --mo T10000
[root@lxcastordev04 tape]# vmgrlistdgnmap | grep T10B6
T10B6A T10000 SL86001A
T10B6B T10000 SL86001B

The VDQM should now be synchronised with the VMGR:

[root@lxcastorsrv102 ~]# vdqmDBInit
Try to update DeviceGroupName table in db...
New DeviceGroupName row inserted into db: dgName = T10B6A, libraryName = SL86001A
New DeviceGroupName row inserted into db: dgName = T10B6B, libraryName = SL86001B
Try to update TapeAccessSpecification table in db...
New TapeAccessSpecification row inserted into db: accessMode = 0, density = 1000GC, tapeModel = T10000
New TapeAccessSpecification row inserted into db: accessMode = 1, density = 1000GC, tapeModel = T10000
[root@lxcastorsrv102 ~]#

The two tapes can now be added to the tape-development setup.
We will put them both into the SL86001B library and the repack_spare pool:

[root@lxcastordev04 tape]# vmgrentertape -V T00033 --mo T10000 --ml T --li SL86001B -d 1000GC -l aul --po repack_spare
[root@lxcastordev04 tape]# vmgrentertape -V T00121 --mo T10000 --ml T --li SL86001B -d 1000GC -l aul --po repack_spare
[root@lxcastordev04 tape]# vmgrlisttape | grep T00
T00033 T00033 SL86001B 1000GC aul repack_spare 931.00GiB 00000000
T00121 T00121 SL86001B 1000GC aul repack_spare 931.00GiB 00000000


How to get a backtrace of all the threads of aggregatord
========================================================

[root@tpsrv203 ~]# PID=`ps -ef | egrep 'aggregatord$' | awk '{print $2;}'`; gdb attach $PID << HERE
set pagination off
set logging file /tmp/aggregator_full_stack_trace.log
set logging overwrite on
set logging on
thread apply all bt full
set logging off
HERE


How to add a DGN directly to the VDQM database
==============================================

Only do this if you really know what you are doing. This information is
usually added to the VDQM database using the vdqmDBInit command-line tool.

DECLARE
  dgnIdVar       NUMBER := 0;
  dgNameVar      VARCHAR2(2048) := '359B3B';
  libraryNameVar VARCHAR2(2048) := 'IBMLIB3B';
BEGIN
  INSERT INTO DeviceGroupName(id, dgName, libraryName)
    VALUES(ids_seq.nextval, dgNameVar, libraryNameVar)
    RETURNING id INTO dgnIdVar;
  INSERT INTO Id2Type (id, type)
    VALUES(dgnIdVar, 88); -- OBJ_DeviceGroupName = 88
END;
/
commit;


How to add a tape access specification directly to the VDQM database
====================================================================

Only do this if you really know what you are doing. This information is
usually added to the VDQM database using the vdqmDBInit command-line tool.
DECLARE
  specIdVar     NUMBER := 0;
  accessModeVar NUMBER := 1;
  densityVar    VARCHAR2(2048) := '1000GC';
  tapeModelVar  VARCHAR2(2048) := '3592';
BEGIN
  INSERT INTO TapeAccessSpecification(id, accessMode, density, tapeModel)
    VALUES(ids_seq.nextval, accessModeVar, densityVar, tapeModelVar)
    RETURNING id INTO specIdVar;
  INSERT INTO Id2Type (id, type)
    VALUES(specIdVar, 91); -- OBJ_TapeAccessSpecification = 91
END;
/


How to add a tape server directly to the VDQM database
======================================================

Only do this if you really know what you are doing. This information is
usually added by a tape server starting or stopping for the first time.

DECLARE
  serverIdVar   NUMBER := 0;
  serverNameVar VARCHAR2(2048) := 'tpsrv250';
  actingModeVar NUMBER := 0;
BEGIN
  INSERT INTO TapeServer(id, serverName, actingMode)
    VALUES(ids_seq.nextval, serverNameVar, actingModeVar)
    RETURNING id INTO serverIdVar;
  INSERT INTO Id2Type (id, type)
    VALUES(serverIdVar, 86); -- OBJ_TapeServer = 86
END;
/
commit;


How to add a tape drive directly to the VDQM database
=====================================================

Only do this if you really know what you are doing. This information is
usually added by a tape server starting or stopping for the first time.
DECLARE
  driveIdVar           NUMBER := 0;
  jobIdVar             NUMBER := 0;
  modificationTimeVar  NUMBER := 0;
  resetTimeVar         NUMBER := 0;
  useCountVar          NUMBER := 0;
  errCountVar          NUMBER := 0;
  transferredMBVar     NUMBER := 0;
  totalMBVar           NUMBER := 0;
  driveNameVar         VARCHAR2(2048) := '35923005';
  tapeVar              NUMBER := NULL;
  runningTapeReqVar    NUMBER := NULL;
  deviceGroupNameVar   VARCHAR2(2048) := '359B3B';
  deviceGroupNameIdVar NUMBER := NULL;
  statusVar            NUMBER := 0; -- UNIT_UP = 0
  tapeServerNameVar    VARCHAR2(2048) := 'tpsrv250';
  tapeServerIdVar      NUMBER := NULL;
BEGIN
  SELECT id INTO deviceGroupNameIdVar
    FROM DeviceGroupName
   WHERE dgName = deviceGroupNameVar;
  SELECT id INTO tapeServerIdVar
    FROM TapeServer
   WHERE serverName = tapeServerNameVar;
  INSERT INTO TapeDrive(id, jobId, modificationTime, resetTime, useCount,
                        errCount, transferredMB, totalMB, driveName, tape,
                        runningTapeReq, deviceGroupName, status, tapeServer)
    VALUES(ids_seq.nextval, jobIdVar, modificationTimeVar, resetTimeVar,
           useCountVar, errCountVar, transferredMBVar, totalMBVar,
           driveNameVar, tapeVar, runningTapeReqVar, deviceGroupNameIdVar,
           statusVar, tapeServerIdVar)
    RETURNING id INTO driveIdVar;
  INSERT INTO Id2Type (id, type)
    VALUES(driveIdVar, 87); -- OBJ_TapeDrive = 87
END;
/
commit;


How to determine the path, mount point and disk server of a castor file
=======================================================================

The easy way:

SQL> set linesize 1000
SQL> column diskpool format a8
SQL> column location format a70
SQL> column creationtime format a20
SQL> SELECT * FROM TABLE (getdcs(399131859));

        ID DISKPOOL LOCATION                                                               A     STATUS CREATIONTIME           GCWEIGHT
---------- -------- ---------------------------------------------------------------------- - ---------- -------------------- ----------
1010785562 default  lxfsre4304.cern.ch:/srv/castor/05/59/399131859@castorns.1010785562     Y          0 21-DEC-2009 09:57:35 1261389898

SQL>

The not-so-easy way:

SQL> select path from diskcopy where castorfile = (select id from castorfile where fileid=399131859);

PATH
--------------------------------------------------------------------------------
59/399131859@castorns.1010785562

SQL> select mountpoint from filesystem where id = (select filesystem from diskcopy where castorfile = (select id from castorfile where fileid=399131859));

MOUNTPOINT
--------------------------------------------------------------------------------
/srv/castor/05/

SQL> select name from diskserver where id = (select diskserver from filesystem where id = (select filesystem from diskcopy where castorfile = (select id from castorfile where fileid=399131859)));

NAME
--------------------------------------------------------------------------------
lxfsre4304.cern.ch

SQL>


How to display the checksum extended file system attribute of a disk file
=========================================================================

[root@lxfsre4304 59]# getfattr -d 399131859@castorns.1010785562
# file: 399131859@castorns.1010785562
user.castor.checksum.type="ADLER32"
user.castor.checksum.value="99653f32"

[root@lxfsre4304 59]#


How to find the VDQM production server
======================================

The VDQM production server runs on the following two nodes:

castorsrv203
castorsrv303


How to install, upgrade and use repack
======================================

The instructions on how to install, upgrade and use repack can be found at the
following web url:

https://twiki.cern.ch/twiki/bin/view/FIOgroup/CastorV2HowToUseRepack


How to create and reset a group of tape copies ready for testing the mighunter
==============================================================================

With a stager database containing no tape copies, stop the rtcpclientd daemon:

/etc/init.d/rtcpclientd stop

Create a relatively small source file on a local disk which will be copied
into CASTOR 100 times to create 100 tape copies. The bs and count values below
give the 100 MB size that the file name suggests; without a count, dd would
keep reading from /dev/urandom until the disk is full:

dd if=/dev/urandom of=/tmp/100M bs=1M count=100

Create a destination directory in your CASTOR namespace:

nsmkdir /castor/cern.ch/dev/m/murrayc3/mighuntertest

Execute 100 rfcp commands to create 100 tape copies in the stager database:

for I in `seq 100`; do DEST=/castor/cern.ch/dev/m/murrayc3/mighuntertest/100M_$I; echo $DEST; rfcp /tmp/100M $DEST; done

The database is now ready for testing the mighunter.

After a test is complete the database can be "reset" by deleting all of the
rows in the stream2tapecopy table, then deleting all of the rows in the stream
table together with the associated rows in the id2type table, and then
resetting the status column of all the rows in the tapecopy table to 0
(TAPECOPY_CREATED).

delete from stream2tapecopy;
delete from id2type where id in (select id from stream);
delete from stream;
update tapecopy set status = 0;


How to make stager_oracle_create.sql work on Oracle Express Edition
===================================================================

Remove all PARTITION and LOCAL clauses.
Remove all JOB_CLASS job attributes.


How to make a castor rpm
========================

Go into the CASTOR debian directory:

cd trunk/debian

Create the following files as needed:

hyphenated-rpm-name.dirs
hyphenated-rpm-name.install.perm
hyphenated-rpm-name.logrotate
hyphenated-rpm-name.manpages
hyphenated-rpm-name.postinst
hyphenated-rpm-name.postun
hyphenated-rpm-name.pre
hyphenated-rpm-name.preun

Add the name, description and dependencies of the new rpm to the following
file:

trunk/debian/control


How to install a test version of the tapegateway on c2itdc
==========================================================

c2itdc has two head nodes. The rtcpclientd daemon runs on the following head
node:

c2itdcsrv102

This is where the tapegateway daemon will run when it replaces rtcpclientd.

The software packages installed on the c2itdc headnodes and "worker" disk
servers are managed by the quattor system.
Quattor needs to be told the following:

* The CASTOR version of the test tapegateway software
* Any new rpms required by the tapegateway software
* Any rpms that must be uninstalled

The quattor system will not be able to install the test version rpms because
they will not be in the CERN software repository. However, quattor must agree
with their presence on c2itdcsrv102, otherwise it will remove them.

The c2itdcsrv102 profile template must be modified so that it includes the
following two new rpms:

castor-lib-tape-W.X.Y-Z.x86_64.rpm
castor-tapegateway-server-W.X.Y-Z.x86_64.rpm

The c2itdcsrv102 profile template must be modified so that it does not include
the following rpm:

castor-lib-policy-W.X.Y-Z.x86_64.rpm

The castoritdc service template must be modified so that the version of CASTOR
to be installed on c2itdcsrv102 is the test version, for example 2.1.9-5.

Do the following in order to modify the c2itdcsrv102 profile and castoritdc
service templates:

Log on to a machine in the lxadm cluster:

ssh lxadm

Make a directory to work in, for example:

mkdir cdbop

Enter the directory and run cdbop to open its console:

[lxadm05] /afs/cern.ch/user/m/murrayc3 > cd cdbop
[lxadm05] /afs/cern.ch/user/m/murrayc3/cdbop > cdbop
quattor CDB CLI: Version 2.2.0
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID
Type 'help' for more info

Add the two new rpms and remove the unwanted rpm by adding the following 4
lines to the c2itdcsrv102 profile template:

# Tape Gateway Software
"/software/packages" = pkg_add("castor-lib-tape", castorversion, ELFMS_ARCH);
"/software/packages" = pkg_add("castor-tapegateway-server", castorversion, ELFMS_ARCH);
"/software/packages" = pkg_del("castor-lib-policy");

For example enter the following in the cdbop console:

get -f profiles/profile_c2itdcsrv102
[INFO] 'profiles/profile_c2itdcsrv102.tpl': received
!vi profiles/profile_c2itdcsrv102.tpl

Replace the castorversion variable in the castoritdc service template with the
following 9 lines ("worker" means disk server):

variable castorversion = {
  if (role == "worker") {
    "2.1.9-4";
  } else if (value("/system/network/hostname") == "c2itdcsrv102") {
    "2.1.9-5";
  } else {
    "2.1.9-4";
  };
};

For example enter the following in the cdbop console:

get -f prod/services/castor/service/castoritdc
[INFO] 'prod/services/castor/service/castoritdc.tpl': received
!vi prod/services/castor/service/castoritdc.tpl

Upload the two modified templates. For example enter the following in the
cdbop console:

up prod/services/castor/service/castoritdc.tpl profiles/profile_c2itdcsrv102.tpl
[INFO] '/prod/services/castor/service/castoritdc': scheduled to be updated
[INFO] '/profiles/profile_c2itdcsrv102': scheduled to be updated

Commit the changes and quit. For example enter the following in the cdbop
console:

com -f
[INFO] '/prod/services/castor/service/castoritdc': will be updated
[INFO] '/profiles/profile_c2itdcsrv102': will be updated
Comment: Upgrade c2itdcsrv102 to 2.1.9-5
[INFO] please wait...
[INFO] commit OK
cdbop@cdbserv.cern.ch: ~/cdbop> quit

Run the spma_wrapper.sh command on c2itdcsrv102. Warning: this is expected to
display many errors, because the test rpms do not exist in the CERN software
repository:

[root@c2itdcsrv102 ~]# spma_wrapper.sh
...
[INFO] Please be patient...
31 operation(s) to verify/execute.
[WARN] rpmt STDERR output produced:
[WARN] IOError with package http://lxc1rk25/swrep/x86_64_slc4//castor-config-2.1.9-5.x86_64.rpm : HTTP Error 404: Not Found
[ERROR] rpmt failed to run, exit status: 256
[ERROR] SPMA finished. with exit status 1
[root@c2itdcsrv102 ~]#

Still logged into c2itdcsrv102, manually install the two new rpms:

rpm -Uvh --force /afs/cern.ch/user/m/murrayc3/public/castor/RPMS/castor-lib-tape-2.1.9-5.x86_64.rpm
rpm -Uvh --force /afs/cern.ch/user/m/murrayc3/public/castor/RPMS/castor-tapegateway-server-2.1.9-5.x86_64.rpm

Still logged into c2itdcsrv102, manually update all of the already installed
CASTOR rpms:

[root@c2itdcsrv102 ~]# rpm -qa | grep castor | grep 2.1.9-4 | grep -v castor-lib-policy | sed 's/2.1.9-4/2.1.9-5/' | awk '{ print "/afs/cern.ch/user/m/murrayc3/public/castor/RPMS/"$1".x86_64.rpm" }' | xargs rpm -Uvh --force

Run spma_wrapper to check everything is OK. The spma_wrapper console output
should report that it erased the castor-lib-policy rpm.

Here are the db queries that I am using to see the migration situation.


How to see the status of the tape copies in the stager database
===============================================================

select status,count(*) from tapecopy group by status;

status:
0-1 => not picked by the mighunter
2   => attached to a stream
7   => the mighunter is analyzing them
3   => detached from the stream and sent to the aggregator


How to see the status of the streams in the stager database
===========================================================

select parent,count(*) from stream2tapecopy group by parent;

The parent is the stream id, so the count shows how many tape copies can be
migrated from each stream.
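
The status values listed for the tapecopy query can be attached to the query
output with a small shell filter. The following is only a sketch and is not
part of CASTOR: the function name describe_tapecopy_status is hypothetical,
and it assumes the query output has already been reduced to one
"status count" pair per line. Only the status values documented in these
notes are decoded, anything else is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper (not a CASTOR tool): reads "status count" pairs, one
# per line, on stdin and appends the meaning documented in these notes.
describe_tapecopy_status() {
  awk '
    NF < 2             { next }  # skip blank or incomplete lines
    $1 == 0 || $1 == 1 { print $1, $2, "not picked by the mighunter"; next }
    $1 == 2            { print $1, $2, "attached to a stream"; next }
    $1 == 7            { print $1, $2, "the mighunter is analyzing them"; next }
    $1 == 3            { print $1, $2, "detached from the stream and sent to the aggregator"; next }
                       { print $1, $2, "unknown status" }
  '
}

# Example with made-up counts:
printf '0 12\n2 40\n3 5\n' | describe_tapecopy_status
```

The filter leaves the numeric status and count in place so that the annotated
lines can still be sorted or summed with the usual tools.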


How to locate a castor file in the disk cache using the stager database
=======================================================================

SQL> select diskserver.name "server", filesystem.mountpoint, diskcopy.path, diskcopy.diskcopysize from diskcopy inner join filesystem on diskcopy.filesystem = filesystem.id inner join diskserver on filesystem.diskserver = diskserver.id where castorfile=7168148768;

server               MOUNTPOINT           PATH                                     DISKCOPYSIZE
-------------------- -------------------- ---------------------------------------- ------------
lxfsre1907.cern.ch   /srv/castor/02/      64/397637364@castorns.7168148769            609671751

SQL>


How to build CASTOR rpms for SLC5 from the trunk of the svn repository
======================================================================

1. Log on to an SLC5 build machine using your developer account, for example
   lxc2build04.cern.ch

2. Create a .rpmmacros file in your home directory with the following content:

%_topdir /tmp/murrayc3_build
%_tmppath /tmp/murrayc3_build
%_buildroot %{_tmppath}/%{name}-build

3. Create the rpm build directory structure with:

BUILD_DIR=/tmp/murrayc3_build
mkdir -p $BUILD_DIR/RPMS/i386
mkdir -p $BUILD_DIR/RPMS/x86_64
mkdir -p $BUILD_DIR/RPMS/noarch
mkdir -p $BUILD_DIR/SRPMS
mkdir -p $BUILD_DIR/SPECS
mkdir -p $BUILD_DIR/BUILD
mkdir -p $BUILD_DIR/SOURCES

4. Check out the castor source

cd $BUILD_DIR
svn co svn+ssh://murrayc3@svn.cern.ch/reps/CASTOR/CASTOR2/trunk

5. Set environment variables for using Oracle and makedepend

# Set environment variables for using Oracle
export ORACLE_HOME="/afs/cern.ch/project/oracle/@sys/10203"
export ORACLE_LIB="${ORACLE_HOME}/lib"
export ORACLE_BIN="${ORACLE_HOME}/bin"
export PATH="${ORACLE_BIN}:${PATH}"
export LD_LIBRARY_PATH="${ORACLE_LIB}:${LD_LIBRARY_PATH}"

# Set environment variables for using makedepend which is in /usr/bin/X11
export PATH="/usr/bin/X11:${PATH}"

6. Set environment variables for the version of CASTOR

export MAJOR_CASTOR_VERSION="2.1"
export MINOR_CASTOR_VERSION="9.5"

7. Build the rpms

cd $BUILD_DIR/trunk
./makerpm.sh


How to avoid the insufficient user privileges error when using rfcp
===================================================================

If you encounter an error of the following form on your development box:

[root@lxcastordev08 ~]# sudo -u nbessone rfcp /tmp/nbessone/100MFile /castor/cern.ch/dev/n/nbessone/
stage_put: Insufficient user privileges to make a request of type StagePutRequest in service class 'default'
/castor/cern.ch/dev/n/nbessone/ : Permission denied

Then as a developer you can avoid this permissions problem by adding your
group id to the AdminUsers table of the stager database. You could add both
your user id and group id if you really want to be precise. Here's how to add
your group id:

[lxcastordev04] /var/murrayc3 > id nbessone
uid=8980(nbessone) gid=1022(cs) groups=1022(cs)
[root@lxcastordev04 ~]# sqlplus stager_dev04@c2castordevdb
...
SQL> insert into AdminUsers(egid) values(1022);

1 row created.

SQL> commit;

Commit complete.

SQL>


How to switch from using rtcpclientd to using tapegatewayd
==========================================================

[root@lxcastordev04 db]# ACTION=stop; for DAEMON in stagerd rechandlerd rtcpclientd mighunterd; do CMD="/etc/init.d/$DAEMON $ACTION"; $CMD; done
Stopping stagerd: [ OK ]
Stopping rechandlerd: [ OK ]
Stopping rtcpclientd: [ OK ]
Stopping mighunter: [ OK ]
[root@lxcastordev04 db]#

[root@lxcastordev04 ~]# cd /tmp/checkout/v2_1_9Version/castor/db
[root@lxcastordev04 db]# sqlplus stager_dev04@c2castordevdb @switchToTapegatewayd.sql

SQL*Plus: Release 10.2.0.3.0 - Production on Mon May 3 17:09:25 2010

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

Enter password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options

SP2-0310: unable to open file "?/sqlplus/admin/glogin.sql"

PL/SQL procedure successfully completed.

1 row updated.

Commit complete.

PL/SQL procedure successfully completed.

Trigger created.

Trigger created.

PL/SQL procedure successfully completed.

1 row updated.

Commit complete.

SQL> quit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options
[root@lxcastordev04 db]#

[root@lxcastordev04 db]# ACTION=start; for DAEMON in stagerd rechandlerd mighunterd tapegatewayd; do CMD="/etc/init.d/$DAEMON $ACTION"; $CMD; done
Starting stagerd: [ OK ]
Starting rechandlerd: [ OK ]
Starting mighunter: For svcclass: default [ OK ]
Starting tapegatewayd: [ OK ]
[root@lxcastordev04 db]#


How to log into the c2itdc stager hosts
=======================================

There are two c2itdc stager hosts:

c2itdcsrv101
c2itdcsrv102

The c2itdcsrv102 host is the one used by the tape-related daemons of a stager
host installation.


How to rename a database table constraint
=========================================

SQL> ALTER TABLE TapeGatewayRequest RENAME CONSTRAINT fk_TGSubrequest_sm to fk_TapeGatewayRequest_sm;

Table altered.

SQL> ALTER TABLE TapeGatewayRequest RENAME CONSTRAINT fk_TGSubrequest_tr to fk_TapeGatewayRequest_tr;

Table altered.

SQL>


How to browse DLF via the web
=============================

Open the web page at the following url:

http://c2adm.cern.ch/dlf


How to compile all INVALID functions and procedures in your schema
==================================================================

BEGIN
  FOR obj in (
    SELECT *
      FROM user_objects
     WHERE object_type IN ('FUNCTION', 'PROCEDURE')
       AND status = 'INVALID')
  LOOP
    EXECUTE IMMEDIATE
      'ALTER ' || obj.object_type || ' ' || obj.object_name || ' COMPILE';
  END LOOP;
END;
/


How to restart recalls which are stuck due to repack
====================================================

A recall can get stuck because repack moved its tape segments to one or more
other tapes. To reset a set of such stuck recalls do the following.

Create a table of the castor files that are involved. In the following example
tape T01537 was repacked. Please put OPS_ at the beginning of the table name
so that automated scripts looking for schema irregularities ignore the table.

SQL> create table ops_steve_T01537_castorfile as select * from castorfile where id in (select tapecopy.castorfile from tapecopy where tapecopy.status = 4 and id in (select distinct copy from segment where tape in (select id from tape where vid = 'T01537' and tpmode = 0)));

Table created.

SQL>

Run the following PL/SQL block, modified to use your table instead of the one
in this example. The block deletes the out-of-date tape copies and their
associated segments. It fails the associated recall disk-copies and restarts
the associated recall sub-requests.

BEGIN
  FOR cf in (SELECT * from ops_steve_T01537_castorfile) LOOP
    deleteTapeCopies(cf.id);
    UPDATE DiskCopy
       SET status = 4       -- DISKCOPY_FAILED
     WHERE status = 2       -- DISKCOPY_WAITTAPERECALL
       AND castorFile = cf.id;
    UPDATE SubRequest
       SET status = 0       -- SUBREQUEST_START
     WHERE castorFile = cf.id
       AND status = 4;      -- SUBREQUEST_WAITTAPERECALL
  END LOOP;
END;
/

You're done.
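
The table-creation statement in the repack example only varies in the tape VID
and in the name used in the table name, so it can be generated rather than
edited by hand. The following is a minimal sketch under stated assumptions:
stuck_recall_table_sql is a hypothetical shell function (not a CASTOR tool)
that takes a name and a VID, and it only prints the statement so that it can
be reviewed before being pasted into sqlplus.

```shell
#!/bin/sh
# Hypothetical helper (not a CASTOR tool): print the "create table" statement
# from the repack example for a given name and tape VID.
stuck_recall_table_sql() {
  owner=$1
  vid=$2
  # Accept only letters and digits so that the generated SQL stays harmless.
  for arg in "$owner" "$vid"; do
    case "$arg" in
      ''|*[!A-Za-z0-9]*)
        echo "usage: stuck_recall_table_sql NAME VID" >&2
        return 1
        ;;
    esac
  done
  cat <<EOF
create table ops_${owner}_${vid}_castorfile as
  select * from castorfile where id in
    (select tapecopy.castorfile from tapecopy
      where tapecopy.status = 4 and id in
        (select distinct copy from segment where tape in
          (select id from tape where vid = '${vid}' and tpmode = 0)));
EOF
}

# Reproduces the statement used in the T01537 example:
stuck_recall_table_sql steve T01537
```

The ops_ prefix is kept in the generated name so that the automated schema
checks mentioned in the repack section continue to ignore the table.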
Another way to identify stuck recalls is to create a table of the involved
castor files based on recalls that have taken far too long to complete, e.g.
over 1 week:

CREATE TABLE OPS_STEVE_STUCK_RECALL_CFS AS
  SELECT DISTINCT
    CastorFile.id, CastorFile.FileId, CastorFile.nsHost, CastorFile.lastKnownFilename
  FROM TapeCopy
  INNER JOIN DiskCopy ON (TapeCopy.castorFile = DiskCopy.castorFile)
  INNER JOIN CASTORFILE ON (TapeCopy.castorFile = CastorFile.id)
  WHERE TapeCopy.status = 4 -- TAPECOPY_TOBERECALLED
    AND DiskCopy.status = 2 -- DISKCOPY_WAITTAPERECALL
    AND DiskCopy.creationtime < getTime() - (7 * 24 * 3600);

Setup VDQM on a devbox using a scratchDB
========================================

1) Prepare the VDQM DB
   Drop the DB schema (if present)
   Create the VDQM schema
   SQL> @trunk/castor/vdqm/vdqm_oracle_create.sql

2) Create/Modify the ORAVDQMCONFIG file of the devBox to point to the scratch DB
   # vi /etc/castor/ORAVDQMCONFIG

3) Modify the VDQM HOST in the castor.conf to point to the devBox
   # vi /etc/castor/castor.conf
   ....
   VDQM HOST
   ....

4) Install the VDQM server on your devbox (NOTE: the castor version must match)
   # rpm -Uvh http://swrepsrv.cern.ch/swrep/x86_64_slc4/castor-vdqm2-server-2.1.9-5.x86_64.rpm
   # vdqmDBInit
   # /etc/init.d/vdqmd start

5) Choose the tape server(s) to be used by the new VDQM
   # ssh tpsrvABC

6) Put DOWN the drive
   # tpmaint stop

7) Modify the VDQM HOST and add VDQM host to ADMIN HOSTS in the castor.conf
   # vi /etc/castor/castor.conf
   ...
   ADMIN HOSTS
   ...
   ....
   VDQM HOST
   ....

8) Restart the drive
   # tpmaint start
   (just in case)
   # /etc/init.d/taped restart
   # /etc/init.d/rtcpd restart
   # /etc/init.d/rsyslog restart

9) Check that it worked
   [root@tpsrvABC ~]# showqueues
   35921005@tpsrvABC (0 MB) FREE vid: last update May 26 17:24:14

How to restore the rpms of your development box using afs
=========================================================

With an SLC4 development box running version 2.1.9-5 of CASTOR:

[root@lxcastordev04 mighunter]# cd /afs/cern.ch/project/castor/www/DIST/CERN/savannah/CASTOR.pkg/2.1.9-*/2.1.9-5/SL4/x86_64/
[root@lxcastordev04 x86_64]# rpm -qa | egrep ^castor- | awk '{print "/afs/cern.ch/project/castor/www/DIST/CERN/savannah/CASTOR.pkg/2.1.9-*/2.1.9-5/SL4/x86_64/" $1 ".x86_64.rpm";}' | xargs rpm -Uvh --force
Preparing... ########################################### [100%]
   1:castor-lib ########################################### [  3%]
   2:castor-lib-oracle ########################################### [  7%]
   3:castor-lib-monitor ########################################### [ 10%]
   4:castor-lib-tape ########################################### [ 14%]
   5:castor-rmmaster-client ########################################### [ 17%]
   6:castor-rechandler-serve########################################### [ 21%]
   7:castor-rh-server ########################################### [ 24%]
   8:castor-hsmtools ########################################### [ 28%]
   9:castor-csec ########################################### [ 31%]
  10:castor-rmmaster-server ########################################### [ 34%]
  11:castor-stager-server ########################################### [ 38%]
  12:castor-expert-server ########################################### [ 41%]
  13:castor-rtcopy-clientser########################################### [ 45%]
  14:castor-config ########################################### [ 48%]
  15:castor-logprocessor-ser########################################### [ 52%]
  16:castor-vdqm2-lib-oracle########################################### [ 55%]
  17:castor-tape-client ########################################### [ 59%]
  18:castor-vdqm2-client ########################################### [ 62%]
  19:castor-dbtools ########################################### [ 66%]
  20:castor-vmgr-client ########################################### [ 69%]
  21:castor-ns-client ########################################### [ 72%]
  22:castor-transfermanager-########################################### [ 76%]
  23:castor-policies ########################################### [ 79%]
  24:castor-mighunter-server########################################### [ 83%]
  25:castor-lsf-plugin ########################################### [ 86%]
  26:castor-stager-client ########################################### [ 90%]
  27:castor-upv-client ########################################### [ 93%]
  28:castor-rfio-client ########################################### [ 97%]
  29:castor-tapegateway-serv########################################### [100%]
[root@lxcastordev04 x86_64]#

How to add a new rpm to a development box
=========================================

The following example shows the addition of the screen rpm to development box
lxcastordev04:

-bash-3.2$ ssh lxadm
Warning: Permanently added the RSA host key for IP address '137.138.4.44' to the list of known hosts.
Scientific Linux CERN SLC release 4.8 (Beryllium)
Last login: Wed Jun 9 09:27:32 2010 from lxcastordev04.cern.ch
*******************************************************************************
* This system is ACTIVELY MONITORED * * * * * WARNING: This system MUST NOT be used for PRIVATE PURPOSES * * * Users activity, including commands, is logged * * * *
*******************************************************************************
[lxadm04] /afs/cern.ch/user/m/murrayc3 > cdbop
quattor CDB CLI: Version 2.2.0
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID
Type 'help' for more info
get prod/cluster/castordev/customization/headnode/dev04
[INFO] 'prod/cluster/castordev/customization/headnode/dev04.tpl': received
!vi prod/cluster/castordev/customization/headnode/dev04.tpl
up prod/cluster/castordev/customization/headnode/dev04.tpl
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': scheduled to be updated
com
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': will be updated
please confirm [yes]: yes
Comment: Addeed^?
[INFO] please wait...
[INFO] commit OK
exit
[lxadm04] /afs/cern.ch/user/m/murrayc3 > ssh root@lxcastordev04
Last login: Wed Jun 16 10:51:01 2010 from lxcastordev04.cern.ch
*******************************************************************************
* * * Reminder: You have committed to obey the Computing Rules * * * http://cern.ch/ComputingRules * * * *
*******************************************************************************
[root@lxcastordev04 ~]# spma_wrapper.sh
[INFO] NCM-NCD version 1.2.22 started by root at: Wed Jun 16 10:56:18 2010
[INFO] executing configure on components....
[INFO] running component: spma
---------------------------------------------------------
[OK] updated SPMA configuration file /etc/spma.conf
[OK] updated SPMA target configuration file in /var/lib/spma-target.cf
[INFO] configure on component spma executed, 0 errors, 0 warnings
=========================================================
[OK] 0 errors, 0 warnings executing configure
[INFO] SPMA version 1.11.5 started by root at: Wed Jun 16 10:56:29 2010
[INFO] using local package cache in: /var/spma-cache/
[INFO] proxy server not activated
[WARN] /bin/rpm--version produced STDERR output:
[INFO] examining local installations..
[INFO] reading target configuration ..
[INFO] executing operations..
[INFO] The following package operations are required:
replace ncm-chkconfig 1.1.7 1 noarch with http://swrep/swrep/x86_64_slc5/ ncm-chkconfig 1.1.9 1 noarch
install http://swrep/swrep/x86_64_slc5/ screen 4.0.3 1.el5_4.1 x86_64
[INFO] Please be patient... 2 operation(s) to verify/execute.
[OK] SPMA finished successfully.
[INFO] NCM-NCD version 1.2.22 started by root at: Wed Jun 16 10:56:34 2010
[INFO] executing configure on components....
[INFO] running component: grub
---------------------------------------------------------
[INFO] correct kernel (2.6.18-194.el5xen) already configured
[OK] Updated boot kernel version to /boot/vmlinuz-2.6.18-194.el5xen
[INFO] configure on component grub executed, 0 errors, 0 warnings
=========================================================
[OK] 0 errors, 0 warnings executing configure
[root@lxcastordev04 ~]#

How to reset the stager database of a development box
=====================================================

1. Determine the CASTOR version of your development box.

   bash-3.2$ castor -v
   2.1.9-6
   bash-3.2$

2. Checkout the database directory of the CASTOR source tree.

   bash-3.2$ cd /var/murrayc3/checkout
   bash-3.2$ svn co svn+ssh://murrayc3@svn.cern.ch/reps/CASTOR/CASTOR2/tags/v2_1_9_6/castor/db castor_db_v2_1_9_6

3. Drop the existing stager schema.

   bash-3.2$ sqlplus stager_dev04@c2castordevdb
   ...
   SQL> @/var/murrayc3/checkout/castor_db_v2_1_9_6/drop_oracle_schema.sql

4. Determine the user id and group id of user stage:

   -bash-3.2$ id stage
   uid=14029(stage) gid=1474(st) groups=1474(st) context=user_u:system_r:unconfined_t
   -bash-3.2$

5. Determine the group id of group c3:

   -bash-3.2$ grep c3 /etc/group
   c3:x:1028:
   -bash-3.2$

6. Create a new empty stager schema

   SQL> @/var/murrayc3/checkout/castor_db_v2_1_9_6/stager_oracle_create.sql
   ...
   Enter the stage user id: 14029
   Enter the st group id: 1474
   ...
   List of admins: 1028
   ...

7. Check the diskservers are registered in the database

   bash-3.2$ rmGetNodes | egrep 'name:'
   name: lxc2disk07.cern.ch
   name: lxc2disk08.cern.ch
   bash-3.2$

8. As root on the head node, enter the file classes of the name server into
   the stager database

   [root@lxcastordev04 ~]# for cname in `nslistclass | grep NAME | awk '{print $2}'` ; do enterFileClass --Name $cname --GetFromCns ; done

9. As root on the headnode, enter the default, dev and diskonly service
   classes into the stager database

   [root@lxcastordev04 ~]# enterSvcClass --Name default --DiskPools default --DefaultFileSize 10485760 --FailJobsWhenNoSpace yes --NbDrives 1 --TapePool stager_dev04
   ...
   [root@lxcastordev04 ~]# enterSvcClass --Name dev --DiskPools extra --DefaultFileSize 10485760 --FailJobsWhenNoSpace yes
   ...
   [root@lxcastordev04 ~]# enterSvcClass --Name diskonly --DiskPools extra --ForcedFileClass temp --DefaultFileSize 10485760 --Disk1Behavior yes --FailJobsWhenNoSpace yes

10. As root on the headnode, move the file systems of the disk servers into
    the default and extra disk pools

   [root@lxcastordev04 ~]# moveDiskServer default lxc2disk07.cern.ch
   [root@lxcastordev04 ~]# moveDiskServer extra lxc2disk08.cern.ch
   [root@lxcastordev04 ~]#

11.
As root release the disk servers into production

   [root@lxcastordev04 ~]# rmAdminNode -r -R -n lxc2disk07.cern.ch
   [root@lxcastordev04 ~]# rmAdminNode -r -R -n lxc2disk08.cern.ch
   [root@lxcastordev04 ~]#

How to upgrade a development box
================================

Log in as yourself on lxadm, run cdbop and download the following template
file:

    /prod/cluster/castordev/instance/dev04

-bash-3.2$ ssh murrayc3@lxadm
Scientific Linux CERN SLC release 4.8 (Beryllium)
Last login: Fri Jun 18 18:00:27 2010 from pb-d-128-141-48-106.cern.ch
*******************************************************************************
* This system is ACTIVELY MONITORED * * * * WARNING: This system MUST NOT be used for PRIVATE PURPOSES * * Users activity, including commands, is logged * * *
*******************************************************************************
[lxadm04] /afs/cern.ch/user/m/murrayc3 > cdbop
quattor CDB CLI: Version 2.2.0
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID
Type 'help' for more info
get -f prod/cluster/castordev/instance/dev04
[INFO] 'prod/cluster/castordev/instance/dev04.tpl': received

Modify the CASTOR_VERSION variable as needed, then upload the template and
commit.

up prod/cluster/castordev/instance/dev04.tpl
[INFO] '/prod/cluster/castordev/instance/dev04': scheduled to be updated
commit
[INFO] '/prod/cluster/castordev/instance/dev04': will be updated
please confirm [yes]: yes
Comment: Upgrading to 2.1.9-7
[INFO] please wait...
[INFO] commit OK
!vi prod/cluster/castordev/instance/dev04.tpl

Upgrade the stager database and then run spma_ncm_wrapper.sh as root.  Don't
forget to update your disk servers by running spma_ncm_wrapper.sh on them as
well.

How to modify the Quattor templates of a development box
========================================================

The Quattor templates for a development box are included in the following
order:

1. prod/cluster/castordev/config <- NEVER CHANGE THIS
2. prod/cluster/castordev/instance/dev04 <- Override variables here
3. prod/cluster/castordev/role/headnode <- This is where the CASTOR_[CNS|VMGR|VDQM|...] variables are used - NEVER CHANGE THIS
4. prod/cluster/castordev/customization/headnode/dev04 <- Add rpms here

How to build and install rlwrap on a development box
====================================================

-bash-3.2$ ssh murrayc3@lxadm
Scientific Linux CERN SLC release 4.8 (Beryllium)
Last login: Thu Jun 24 14:04:54 2010 from lxcastordev04.cern.ch
*******************************************************************************
* This system is ACTIVELY MONITORED * * * * WARNING: This system MUST NOT be used for PRIVATE PURPOSES * * Users activity, including commands, is logged * * *
*******************************************************************************
[lxadm04] /afs/cern.ch/user/m/murrayc3 > cdbop
quattor CDB CLI: Version 2.2.0
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID <2UdGA9ILSI>
Type 'help' for more info
get -f prod/cluster/castordev/customization/headnode/dev04
!vi prod/cluster/castordev/customization/headnode/dev04.tpl

In vi add the following 3 rpms in order:

    pkg_repl("libtermcap-devel");
    pkg_repl("ncurses-devel");
    pkg_repl("readline-devel");

up prod/cluster/castordev/customization/headnode/dev04.tpl
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': scheduled to be updated
commit
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': will be updated
please confirm [yes]: yes
Comment: Adding libtermcap-devel, ncurses-devel and readline-devel so I build rlwrap which the slc5 rpm version of does not work
[INFO] please wait...
[INFO] commit OK
exit
[lxadm04] /afs/cern.ch/user/m/murrayc3 >
[lxadm04] /afs/cern.ch/user/m/murrayc3 > exit
logout
Connection to lxadm closed.
-bash-3.2$ ssh root@lxcastordev04
Last login: Thu Jun 24 18:47:55 2010 from lxcastordev04.cern.ch
*******************************************************************************
* * * Reminder: You have committed to obey the Computing Rules * * http://cern.ch/ComputingRules * * *
*******************************************************************************
[root@lxcastordev04 ~]# spma_wrapper.sh
...
[root@lxcastordev04 ~]# cd /var/murrayc3
[root@lxcastordev04 murrayc3]# mkdir rlwrap
[root@lxcastordev04 murrayc3]# cd rlwrap
[root@lxcastordev04 rlwrap]# wget http://utopia.knoware.nl/~hlub/uck/rlwrap/rlwrap-0.37.tar.gz
--2010-06-25 09:42:18-- http://utopia.knoware.nl/~hlub/uck/rlwrap/rlwrap-0.37.tar.gz
Resolving utopia.knoware.nl... 213.197.30.29
Connecting to utopia.knoware.nl|213.197.30.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 251438 (246K) [application/x-gzip]
Saving to: `rlwrap-0.37.tar.gz'

100%[==================================================================================================>] 251,438 1.53M/s in 0.2s

2010-06-25 09:42:18 (1.53 MB/s) - `rlwrap-0.37.tar.gz' saved [251438/251438]

[root@lxcastordev04 rlwrap]# tar -xzf rlwrap-0.37.tar.gz
[root@lxcastordev04 rlwrap]# cd rlwrap-0.37
[root@lxcastordev04 rlwrap-0.37]# ./configure
...
[root@lxcastordev04 rlwrap-0.37]# make
...
[root@lxcastordev04 rlwrap-0.37]# make install
...
[root@lxcastordev04 rlwrap-0.37]#

How to install repack client and server on a development box
============================================================

In CDB add:

    include { 'cluster/' + ELFMS_SVCCLASS + '/roles/repacknode' };

to:

    profiles/profile_lxcastordev04

Reconfigure the headnode:

    [root@lxcastordev04 ~]# spma_wrapper.sh
    ....
    [root@lxcastordev04 ~]# ncm-ncd --co castorconf
    ....

Setup the password file:

    [root@lxcastordev04 castor]# cp ORAREPACKCONFIG.example ORAREPACKCONFIG
    [root@lxcastordev04 castor]# nano ORAREPACKCONFIG
    [root@lxcastordev04 castor]# chown root:st ORAREPACKCONFIG

Start the repackd daemon.

How to get useful information about the migration streams
=========================================================

SQL> select stream.id stream, stream.status, tape, tape.vid, stats.nbtapecopies,
            tapepool.name tapepool, svcclass.name svcclass, svcclass.id svcclass_id
       from stream
       left outer join (select parent, count(*) nbtapecopies
                          from stream2tapecopy group by parent) stats
                    on (stream.id = stats.parent)
       left outer join tapepool on (stream.tapepool = tapepool.id)
      inner join svcclass2tapepool on (tapepool.id = svcclass2tapepool.child)
      inner join svcclass on (svcclass2tapepool.parent = svcclass.id)
       left outer join tape on (stream.tape = tape.id)
      order by tapepool.name;

    STREAM     STATUS       TAPE VID     NBTAPECOPIES TAPEPOOL             SVCCLASS SVCCLASS_ID
---------- ---------- ---------- ------- ------------ -------------------- -------- -----------
5209843158          0                            2352 gatewayendtape       default        11640
5209843157          3 5209879812 I12034          2352 gatewayendtape       default        11640

SQL>

How to get trace information from rfcp
======================================

export RFIO_TRACE=3

How to fix an installation of the globus rpms
=============================================

When the globus rpms have installed themselves incorrectly the castor build
will fail with error messages of the form:

In file included from /opt/globus/include/gcc64dbgpthr/globus_common.h:58,
                 from /opt/globus/include/gcc64dbgpthr/gssapi.h:44,
                 from /opt/globus/include/gcc64dbgpthr/globus_gss_assist.h:38,
                 from /var/CASTOR_SVN_CO/trunk/security/Csec_plugin_GSS.c:51:
/opt/globus/include/gcc64dbgpthr/globus_common_include.h:19:27: error: globus_config.h: No such file or directory

Run the following as root in order to fix the installation of the globus rpms:

rpm -qa | grep -E 'VDT|globus|GSI' | xargs rpm -e --nodeps;
spma_wrapper.sh

How to install plotutils to get the pic2plot command-line on an slc5 dev box
============================================================================

[root@lxcastordev04 ~]# ssh murrayc3@lxadm
Warning: Permanently added the RSA host key for IP address '137.138.4.173' to the list of known hosts.
*******************************************************************************
* http://cern.ch/ComputingRules : Govern the use of CERN computing facilities *
*******************************************************************************
Last login: Mon Jun 14 20:04:42 2010 from lxadm05.cern.ch
*******************************************************************************
* This system is ACTIVELY MONITORED * * * * WARNING: This system MUST NOT be used for PRIVATE PURPOSES * * Users activity, including commands, is logged * * *
*******************************************************************************
[murrayc3@lxadm06 ~]$ cdbop
quattor CDB CLI: Version 2.2.0
Connecting to https://cdbserv.cern.ch...
Welcome to CDB Command Line Interface
Opening session...
[INFO] session opened with ID
Type 'help' for more info
get -f prod/cluster/castordev/customization/headnode/dev04
[INFO] 'prod/cluster/castordev/customization/headnode/dev04.tpl': received
!vi prod/cluster/castordev/customization/headnode/dev04.tpl

Add the plotutils rpm using pkg_repl("plotutils"), for example:

#
# template cluster/castordev/customization/headnode/dev04
#
template cluster/castordev/customization/headnode/dev04;

"/software/packages" = {
  pkg_repl("ghostscript");
  pkg_repl("ghostscript-fonts");
  pkg_repl("graphviz");
  pkg_repl("libcroco");
  pkg_repl("libgsf");
  pkg_repl("librsvg2");
  pkg_repl("screen");
  pkg_repl("urw-fonts");
  pkg_repl("libtermcap-devel");
  pkg_repl("ncurses-devel");
  pkg_repl("readline-devel");
  pkg_repl("plotutils");
};

up prod/cluster/castordev/customization/headnode/dev04.tpl
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': scheduled to be updated
commit
[INFO] '/prod/cluster/castordev/customization/headnode/dev04': will be updated
please confirm [yes]: yes
Comment: Adding plotutils package in order to get the pic2plot command-line program
[INFO] please wait...
[INFO] commit OK
exit
[murrayc3@lxadm06 ~]$ exit
logout
Connection to lxadm closed.
[root@lxcastordev04 ~]# spma_wrapper.sh
[INFO] NCM-NCD version 1.2.22 started by root at: Mon Oct 18 11:37:18 2010
[INFO] executing configure on components....
[INFO] running component: spma
---------------------------------------------------------
[OK] updated SPMA configuration file /etc/spma.conf
[OK] updated SPMA target configuration file in /var/lib/spma-target.cf
[INFO] configure on component spma executed, 0 errors, 0 warnings
=========================================================
[OK] 0 errors, 0 warnings executing configure
[INFO] SPMA version 1.11.6 started by root at: Mon Oct 18 11:37:28 2010
[INFO] using local package cache in: /var/spma-cache/
[INFO] proxy server not activated
[WARN] /bin/rpm--version produced STDERR output:
[INFO] examining local installations..
[INFO] reading target configuration ..
[INFO] executing operations..
[INFO] The following package operations are required:
install http://swrep/swrep/x86_64_slc5/ plotutils 2.5 5.el5 x86_64
[INFO] Please be patient... 1 operation(s) to verify/execute.
[OK] SPMA finished successfully.
[INFO] NCM-NCD version 1.2.22 started by root at: Mon Oct 18 11:37:33 2010
[INFO] executing configure on components....
[INFO] running component: grub
---------------------------------------------------------
[INFO] correct kernel (2.6.18-194.17.1.el5xen) already configured
[OK] Updated boot kernel version to /boot/vmlinuz-2.6.18-194.17.1.el5xen
[INFO] configure on component grub executed, 0 errors, 0 warnings
=========================================================
[OK] 0 errors, 0 warnings executing configure
[root@lxcastordev04 ~]#

How do I find the source files of the CASTOR web-site?
======================================================

The CASTOR web-site has the following URL:

    http://castor.web.cern.ch/castor/

The source files of the CASTOR web-site are located in the following afs
directory:

    /afs/cern.ch/project/cndoc/wwwds/HSM/CASTOR/

How to find the locks existing in the DB?
=========================================

-- Find locks in the DB, age, SQL.
-- This relies on prev_sql_id, which might or might not point to the sql that took the lock
select unique s.sid, s.program, s.process, l.type, l.id1, l.id2, l.ctime, l.block, do.object_name, sq.SQL_TEXT
  from v$session s
 inner join v$lock l on l.sid = s.sid
  left outer join v$sql sq on sq.sql_id = s.prev_sql_id
  left outer join dba_objects do on do.object_id = l.id1
 where s.SCHEMANAME = 'STAGER_DEV03'
 order by OBJECT_NAME;

How to find the blocked - blocker relationships in the DB?
==========================================================

-- Find blocker, blocked and what they do or did.
-- This relies on prev_sql_id, which might or might not point to the sql that took the lock
select s.sid, s.process, s.program, s_sql.sql_text, blocker.sid "blocker sid", b_sql.sql_text "blocker prev sql"
  from v$session s
  left outer join v$sql s_sql on s_sql.sql_id = s.sql_id
  left outer join v$session blocker on blocker.sid = s.blocking_session
  left outer join v$sql b_sql on b_sql.sql_id = blocker.prev_sql_id
 where s.username = 'STAGER_DEV03'
   and s.blocking_session is not null;

How to move to subversion 1.5.7 on a dev box
============================================

In template prod/cluster/castordev/customization/headnode/dev03.tpl, add:

"/software/packages" = pkg_repl("subversion", "1.5.7-1", ELFMS_ARCH);
"/software/packages" = pkg_add("neon", DEF, ELFMS_ARCH, "multi");
"/software/packages" = pkg_add("neon", "0.27.2-1", ELFMS_ARCH, "multi");

(neon is a dependency).

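Once the template change has been applied to the box, it can be worth
confirming which subversion client is actually installed.  The following is
a small sketch under stated assumptions: version_at_least and check_svn are
hypothetical helper names, the comparison only understands dotted numeric
versions such as 1.5.7, and "svn --version --quiet" is used because it
prints just the version number.

```shell
#!/bin/sh
# version_at_least HAVE WANT  (hypothetical helper)
# Succeeds when the dotted numeric version HAVE is greater than or equal to
# WANT.  Missing components are treated as 0, so 1.6 compares as 1.6.0.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, ".")
    m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = h[i] + 0
      b = w[i] + 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# check_svn WANT  (hypothetical helper)
# Reports whether the installed svn client is at least version WANT.
check_svn() {
  if command -v svn >/dev/null 2>&1; then
    have=$(svn --version --quiet)
    if version_at_least "$have" "$1"; then
      echo "svn $have is at least $1"
    else
      echo "svn $have is older than $1"
    fi
  else
    echo "svn is not installed"
  fi
}

check_svn 1.5.7
```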
How to make a tape-server point at a different vdqm and vmgr
============================================================

ssh user@lxadm
cdbop
get -f profiles/profile_tpsrv001
[INFO] 'profiles/profile_tpsrv001.tpl': received
!vi profiles/profile_tpsrv001.tpl

Add the following lines:

#
# use development server of Steven Murray to provide VMGR and VDQM2
#
"/software/components/castorconf/VDQM/HOST" = "lxcastordev04";
"/software/components/castorconf/VMGR/HOST" = "lxcastordev04";
"/software/components/castorconf/ADMIN/HOSTS" = "lxcastordev04 localhost";

up profiles/profile_tpsrv001.tpl
[INFO] '/profiles/profile_tpsrv001': scheduled to be updated
commit

How to populate a new cupv database
===================================

The following command-line will extract the permissions from an existing cupv
database:

Cupvlist | tail -n +2 | awk "{print \"Cupvadd --user \" \$1 \" --group \" \$2 \" --src '\" \$3 \"' --tgt '\" \$4 \"' --priv '\" \$5 \"'\";}"

The following commands will populate a new cupv database.
They should be run as root on the cupv server:

Cupvadd --user root --group root --src '^tpsrv016.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_SYSTEM'
Cupvadd --user root --group root --src '^tpsrv029.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_SYSTEM'
Cupvadd --user root --group root --src '^tpsrv971.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_SYSTEM'
Cupvadd --user root --group root --src '^tpsrv982.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_SYSTEM'
Cupvadd --user stage --group st --src '^lxcastordev04.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_OPER'
Cupvadd --user murrayc3 --group c3 --src '^lxcastordev04.cern.ch$' --tgt '^lxcastordev04.cern.ch$' --priv 'TP_OPER|UPV_ADMIN|ADMIN'

How to overcome "remote access attempted" errors reported by the vdqm
=====================================================================

If the vdqm is not configured correctly you may see error messages in the vdqm
log file of the following format:

2011-10-04T21:19:51.327907+02:00 lxcastordev04 vdqmd[16538]: LVL=Error TID=16554 MSG="Unable to read Request from socket" REQID=af4ebba0-9b1f-475b-a5e7-f471997720d2 Message="OldProtocolInterpreter::readProtocol(): remote access attempted, host = tpsrv674.cern.ch"

To correct this problem please make sure you have the correct entries in the
following CASTOR configuration file that is used to determine which IP
addresses should be considered as local to the vdqm's network:

    /etc/castor/castor.localhosts

How to create a virtual tape, load it into the MAP and transfer it to a slot
============================================================================

mktape -m V00001 -s 1 -t data -d T10KC
vtlcmd 10 open map
vtlcmd 10 load map V00001
vtlcmd 10 close map
mtx -f /dev/sg4 status
mtx -f /dev/sg4 eepos 0 transfer 40 24
mtx -f /dev/sg4 status

How to label a set of virtual tapes
===================================

tplabel -D VD41STK0 -d 20G -g V41STK -l aul -V V41001 -v V41001 -f
tplabel -D VD41STK0 -d 20G -g V41STK -l aul -V V41002 -v V41002 -f
tplabel -D VD41STK0 -d 20G -g V41STK -l aul -V V41003 -v V41003 -f
tplabel -D VD41STK0 -d 20G -g V41STK -l aul -V V41004 -v V41004 -f
tplabel -D VD41STK0 -d 20G -g V41STK -l aul -V V41005 -v V41005 -f

How to set the repeat interval of the tape mount database jobs
==============================================================

Log in to the stager database with sqlplus and then enter the following lines
at the sqlplus prompt:

exec dbms_scheduler.disable('MIGRATIONMOUNTSJOB');
exec dbms_scheduler.set_attribute(name => 'MIGRATIONMOUNTSJOB', attribute => 'repeat_interval', value => 'FREQ=secondly; INTERVAL=10');
exec dbms_scheduler.enable('MIGRATIONMOUNTSJOB');
exec dbms_scheduler.disable('RECALLMOUNTSJOB');
exec dbms_scheduler.set_attribute(name => 'RECALLMOUNTSJOB', attribute => 'repeat_interval', value => 'FREQ=secondly; INTERVAL=10');
exec dbms_scheduler.enable('RECALLMOUNTSJOB');

How to know what the rtcpd daemon is doing
==========================================

The best way to know what the rtcpd daemon is doing is to know how it uses
threads.  The rtcpd daemon uses 5 types of thread: main, client listening,
self monitoring, disk I/O and tape I/O.  An rtcpd daemon with the default
configuration runs with a total of 7 threads: the main thread, a single client
listening thread, a single self monitoring thread, three disk I/O threads and
a single tape I/O thread.

PSEUDO CODE OF THE MAIN THREAD

 1. Get the full request from client
 2. Create the single client listening thread to service client pings and
    abort requests
 3. If dumping a tape then
 4.   Dump the tape
 5.   Join with the client listening thread
 6. Else
 7.   Create the pool of disk I/O threads (3 threads by default)
 8.   Create the single self monitoring thread
 9.   Create the single tape I/O thread
10.   Loop while not finished
11.     If writing to tape
12.       Get more work from the client
13.     Else
14.       Wait for tape I/O thread to request more work
15.     End if writing to tape
16.     If more work then
17.       Assign work to pool of disk I/O threads
18.     End if more work
19.   End loop while not finished
20.   Join with tape I/O thread
21.   Join with client listening thread
22. End if dumping a tape

PSEUDO CODE OF THE CLIENT LISTENING THREAD

 1. Loop while not finished
 2.   Listen with a time out for data on the client connection
 3.   If data is available
 4.     Read in message from client
 5.     Handle ABORT, ENDOF, KILLJID, RSLCT or PING message
 6.   End if data is available
 7. End loop while not finished

PSEUDO CODE OF THE SELF MONITORING THREAD

 1. Loop forever
 2.   Sleep 30 seconds
 3.   If the rtcpd daemon has hung then
 4.     Take action
 5.   End if the rtcpd daemon has hung
 6. End loop forever

PSEUDO CODE OF THE DISK I/O THREAD

 1. Open disk file
 2. If writing to tape then
 3.   Write disk to memory
 4.   Close disk file
 5. Else
 6.   Write memory to disk
 7.   Close disk file
 8. End if writing to tape

PSEUDO CODE OF THE TAPE I/O THREAD

 1. Mount tape
 2. Loop while not finished
 3.   If reading from tape then
 4.     Get more work from the client
 5.   Else
 6.     Wait for more work
 7.   End if reading from tape
 8.   If more work
 9.     Position tape
10.     If writing to tape then
11.       Write memory to tape
12.     Else
13.       Write tape to memory
14.     End if writing to tape
15.   End if more work
16.   End loop while not finished
17. Unmount tape
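
To relate a running daemon to the thread model described above, its threads
can be listed from /proc.  The following is a minimal sketch, assuming a
Linux host with /proc mounted: list_threads and count_threads are
hypothetical helper names, and the sketch reports only each thread id and its
scheduler state, not which of the five thread types it is.

```shell
#!/bin/sh
# list_threads PID  (hypothetical helper)
# Prints one line per thread of the process: the thread id followed by the
# State field of /proc/PID/task/TID/status, e.g. "16554 S (sleeping)".
list_threads() {
  pid=$1
  for task in /proc/"$pid"/task/*; do
    [ -r "$task/status" ] || continue
    state=$(sed -n 's/^State:[[:space:]]*//p' "$task/status")
    printf '%s %s\n' "${task##*/}" "$state"
  done
}

# count_threads PID  (hypothetical helper)
# Prints the number of threads of the process.
count_threads() {
  list_threads "$1" | wc -l | tr -d ' '
}

# Example: show the threads of the current shell.
list_threads $$
```

Assuming the daemon's process is named rtcpd, something like
"for pid in $(pgrep -x rtcpd); do count_threads $pid; done" would print the
thread count of each rtcpd process.  A process handling a request with the
default configuration would be expected to show the 7 threads described
above, but whether a given process matches depends on what it is doing at
that moment.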