NexentaStor notice report via alert mail

FAULT: **********************************************************************

FAULT: Appliance   : nssan (OS v3.0.4, NMS v3.0.4 (r8917))

FAULT: Machine SIG : 32HFABADA

FAULT: Primary MAC : 18:a9:5:6e:a1:db

FAULT: Time        : Sun Jan 16 00:00:32 2011

FAULT: Trigger     : runners-check

FAULT: Fault Type  : ALARM

FAULT: Fault ID    : 20

FAULT: Fault Count : 2

FAULT: Severity    : NOTICE

FAULT: Action      : Administrative action required to clear the original

FAULT:             : fault that has caused 'nms-check' to go into

FAULT:             : maintenance. Once cleared, run 'setup trigger nms-check

FAULT:             : clear-faults' to clear the faults and re-enable

FAULT:             : 'nms-check'. If the problem does not appear to be an

FAULT:             : actual fault condition, use 'setup trigger nms-check' to

FAULT:             : tune-up the fault trigger's properties. See NexentaStor

FAULT:             : User Guide at http://www.nexenta.com/docs for more

FAULT:             : information.

FAULT: Description : Runner nms-check went into maintenance state

FAULT: **********************************************************************

 

!

! For more details on this trigger click on link below:

! http://10.10.207.45:2000/data/runners?selected_runner=runners-check

!

 

Runner nms-check (description: "Track NMS connectivity failures and internal errors") went into maintenance state
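For the record, the cleanup the alert asks for happens in the NexentaStor management console (NMC). A minimal sketch, using only the commands named in the alert text above:

setup trigger nms-check clear-faults   # clear the recorded faults and re-enable the runner
setup trigger nms-check                # or review and tune the trigger's properties instead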

 


NexentaStor ZFS-based SAN: console and web GUI errors

I constantly have problems accessing the NexentaStor console and web GUI.

NMV trouble:

After a long timeout I see:

Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /data/services/.

Reason: Error reading from remote server

Apache/2.2.8 Ubuntu DAV/2 mod_ssl/2.2.8 OpenSSL/0.9.8k Server at x.x.x.x Port 2000
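This error means the Apache front end on port 2000 is alive but the management server behind it is not answering. A couple of generic checks from the console (standard Solaris/SMF commands; a sketch, since the exact Nexenta service names can differ per release):

svcs -xv                   # explain any SMF services that are degraded or in maintenance
netstat -an | grep 2000    # confirm the web front end is still listening on its port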

After restarting the NMS service (svcadm restart nms) I can access the web GUI again, but still not through Internet Explorer.
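A quick way to verify the restart actually took, assuming the service is registered under SMF as nms (which the svcadm invocation above implies):

svcs -l nms    # confirm the state is back to "online" and list its dependencies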

Wondering what the issue could be, I added the host URL to the trusted sites list, but I'm still unable to access the web GUI via IE.

Chrome, Firefox & Opera rock...

At times I pat myself on the back: good thing I wasn't putting this SAN (ZFS-based NexentaStor) into production straight away after test-running it at home.

Phew... it's better to get your hands burnt a few times, tame the thing, and get to know the pros and cons of any new technology before adopting it for production...



Exploring the ZFS ARC...

ARC – adaptive replacement cache …

Well, there is no dedicated Readzilla (L2ARC) or Logzilla (ZIL) device in my test SAN (NexentaStor).

I was curious to see the breakdown of the ARC contents and ended up with this script:

admin@nssan:~$ ./arc_summary.pl
System Memory:
         Physical RAM:  8173 MB
         Free Memory :  770 MB
         LotsFree:      127 MB

ZFS Tunables (/etc/system):

ARC Size:
         Current Size:             5613 MB (arcsize)
         Target Size (Adaptive):   5613 MB (c)
         Min Size (Hard Limit):    893 MB (zfs_arc_min)
         Max Size (Hard Limit):    7149 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:          44%    2497 MB (p)
         Most Frequently Used Cache Size:        55%    3116 MB (c-p)

ARC Efficency:
         Cache Access Total:             5292249
         Cache Hit Ratio:      96%       5110509        [Defined State for buffer]
         Cache Miss Ratio:      3%       181740         [Undefined State for Buffer]
         REAL Hit Ratio:       93%       4939551        [MRU/MFU Hits Only]

         Data Demand   Efficiency:    97%
         Data Prefetch Efficiency:    62%

        CACHE HITS BY CACHE LIST:
          Anon:                        1%        97516                  [ New Customer, First Cache Hit ]
          Most Recently Used:         14%        716979 (mru)           [ Return Customer ]
          Most Frequently Used:       82%        4222572 (mfu)          [ Frequent Customer ]
          Most Recently Used Ghost:    0%        11400 (mru_ghost)      [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:  1%        62042 (mfu_ghost)      [ Frequent Customer Evicted, Now Back ]
        CACHE HITS BY DATA TYPE:
          Demand Data:                52%        2706672
          Prefetch Data:               1%        95218
          Demand Metadata:            43%        2198797
          Prefetch Metadata:           2%        109822
        CACHE MISSES BY DATA TYPE:
          Demand Data:                31%        57731
          Prefetch Data:              31%        57053
          Demand Metadata:            33%        60038
          Prefetch Metadata:           3%        6918
———————————————
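The script is just summarizing kernel statistics, so the same raw counters can also be read directly with kstat (standard on Solaris/NexentaStor):

kstat -p zfs:0:arcstats:size      # current ARC size in bytes ("arcsize" above)
kstat -p zfs:0:arcstats:c         # adaptive target size (c)
kstat -p zfs:0:arcstats:p         # MRU target size (p)
kstat -p zfs:0:arcstats:hits      # raw hit counter behind the hit-ratio figures
kstat -p zfs:0:arcstats:misses    # raw miss counter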

Test SAN is up & running... now I hope it doesn't fall over again...

Well, after grueling hours in which every attempt proved futile (this is what happens when you don't have knowledge of the platform/OS/technology), I googled for some Solaris networking commands.

Having collected some basic Solaris network configuration commands (for the time being I don't want to elaborate on what happened), I found that the following commands brought back to life the network that had been killed, or whatever it was that had happened:

ifconfig ntxn3 unplumb                          # detach the IP stack from the interface
ifconfig ntxn3 plumb                            # re-attach it, starting from a clean state
ifconfig ntxn3 x.x.x.x netmask 255.255.255.0    # reassign the address and netmask
route add default x.x.x.x                       # restore the default gateway

ntxn3 is the network port as identified by NexentaStor (OpenSolaris).
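If such a fix needs to survive a reboot, the traditional Solaris way is to persist it in config files. This is only a sketch, and hedged: NexentaStor normally manages network settings itself through NMS, so manual edits like these may be overwritten or unsupported:

echo "x.x.x.x netmask 255.255.255.0 up" > /etc/hostname.ntxn3   # interface address, plumbed at boot
echo "x.x.x.x" > /etc/defaultrouter                             # default gateway applied at boot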

All I was worried about was the pool data. The pool is healthy and the data intact. Though it is a test environment and the data is only test data, recreating the same environment when something goes wrong is still cumbersome (part of the IT field). Non-technical people/managers think it's like the vegetable business: when it rots, throw it away. But in IT it has to be fixed, as long as it is recoverable.
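Checking that the pool really is healthy is a one-liner (standard ZFS commands):

zpool status -x    # prints "all pools are healthy" when nothing is wrong
zpool status -v    # full per-device detail, including any known data errors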

Date change on NexentaStor rendered it unreachable...

I am now testing NexentaStor on an HP ML370 G6 tower. Here are the specs:

Dual socket, quad core, HT-enabled Intel(R) Xeon(R) CPU E5540 @ 2.53GHz; 8 GB RAM (4 GB populated per processor); 8 MB L3 cache

2 x 73 GB SAS (15k) RAID-1 = system OS

6 x 300 GB SAS (10k), RAID-0 single disk in each array to expose the disks to NexentaStor = storage (RAID 10 = approx. 838 GB pool)

P410i RAID card, 256 MB cache

Quad-port multifunction 1GbE card (identified as ntxn0..3 in Solaris (NexentaStor))

The NexentaStor 3.0.4 installation went fine. I created a 200 GB volume, NFS-attached it to the test server (XenServer 5.6), copied an existing VM from the XenServer local storage to the NFS storage, and configured an SMTP server for fault alerts.
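For reference, that attach can be sanity-checked and scripted from the XenServer side. A sketch under assumptions: x.x.x.x stands for the SAN's IP, and /volumes/vol1 is a placeholder for the actual exported folder:

showmount -e x.x.x.x    # list the NFS exports the SAN is publishing
xe sr-create type=nfs content-type=user shared=true name-label=nssan-nfs \
   device-config:server=x.x.x.x device-config:serverpath=/volumes/vol1    # create the NFS storage repository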

Everything was working fine.

After about 15+ hours of uptime, I received the following fault alert email notification:

Subject: [NMS Report] NOTICE: host nssan

FAULT: **********************************************************************

FAULT: Appliance   : nssan (OS v3.0.4, NMS v3.0.4 (r8917))

FAULT: Machine SIG : 32HFABADA

FAULT: Primary MAC : 18:a9:5:6e:a1:db

FAULT: Time        : Thu Sep 30 00:00:34 2010

FAULT: Trigger     : runners-check

FAULT: Fault Type  : ALARM

FAULT: Fault ID    : 20

FAULT: Fault Count : 2

FAULT: Severity    : NOTICE

FAULT: Action      : Administrative action required to clear the original

FAULT:             : fault that has caused 'nms-check' to go into

FAULT:             : maintenance. Once cleared, run 'setup trigger nms-check

FAULT:             : clear-faults' to clear the faults and re-enable

FAULT:             : 'nms-check'. If the problem does not appear to be an

FAULT:             : actual fault condition, use 'setup trigger nms-check' to

FAULT:             : tune-up the fault trigger's properties. See NexentaStor

FAULT:             : User Guide at http://www.nexenta.com/docs for more

FAULT:             : information.

FAULT: Description : Runner nms-check went into maintenance state

FAULT: **********************************************************************

 

!

! For more details on this trigger click on link below:

! http://x.x.x.x:2000/data/runners?selected_runner=runners-check

!

Runner nms-check (description: "Track NMS connectivity failures and internal errors") went into maintenance state

Before I could follow the suggestion in the alert, I noticed (from the report) that the date on the server was old. So I changed the date from the console in the recommended format (date -s "20 dec 2010 00:00:00"); after pressing Enter, the prompt took a pretty long time to return, and the web console stopped responding...
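In hindsight, a gentler way to fix a stale clock is a one-shot NTP sync rather than a big manual jump, since large time steps can upset running daemons. A sketch; the server name is only an example:

ntpdate pool.ntp.org    # step the clock once from a public NTP server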

Issue: I can ping the host IP locally from the console, but not the gateway.

1. Suspecting the switch port: I connected a laptop (Windows XP) to the same port and tested it. There does not seem to be any issue with the switch port, since I was able to ping the gateway.

2. Suspecting the network card: I rebooted the server with an Ubuntu 9.10 x64 live CD and configured the network settings. Here too there seems to be no issue with the Ethernet card.

Some info: in Linux the Ethernet driver loaded is netxen_nic; under NexentaStor, ifconfig -a shows ntxn0 to ntxn3.
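With the switch port and the NIC hardware ruled out, the remaining suspects are on the Solaris side. A few standard commands that would narrow this down (a sketch of what I could have checked):

dladm show-link    # link state of each NIC as the kernel sees it
netstat -rn        # routing table: is the default route still present?
arp -a             # has the gateway's MAC address ever been learned?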

The grueling hours of attempts all proved futile. I don't have networking knowledge of the Solaris platform; on Linux, whether an RPM- or Debian-based distro, I am quite comfortable...

Will update when it's fixed...

SAN Machine ready

NexentaStor 3.0.4 on an HP ML370 G6 tower.

Here are the specs:

Dual socket, quad core, HT-enabled Intel(R) Xeon(R) CPU E5540 @ 2.53GHz; 8 GB RAM (4 GB populated per processor)

8 MB L3 cache

2 x 73 GB SAS (15k) RAID-1 = system OS

6 x 300 GB SAS (10k), RAID-0 single disk in each array to expose the disks to NexentaStor = storage (RAID 10 = approx. 838 GB pool; see the sanity check after this list)

P410i RAID card, 256 MB cache

Quad-port multifunction 1GbE card
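A quick sanity check on that pool figure, from my own arithmetic: RAID 10 across six disks means three mirrored pairs striped together, so usable raw capacity is 3 x 300 GB = 900 GB in decimal units, which is roughly 838 GiB in the binary units the pool reports (900e9 / 2^30 ≈ 838).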

 

The above server, turned into SAN storage, is ready to be deployed on the network to test & assess its features.

Well, the server is already on the network. I configured the IP, and here I go: http://x.x.x.x:2000

The first screenshot...