The 5GinFIRE Healthcheck service (HCS)
Introduction and Usage of the 5GinFIRE Healthcheck service (HCS)
The Healthcheck Service (HCS) is a simple service that displays the status of various services, VIMs, connectivity, components and processes of the 5GinFIRE infrastructure.
It is deployed at http://status.5ginfire.eu/
How it works
Components are polled (ACTIVE mode) or report their status (PASSIVE mode).
If the HCS fails to learn the status of a Component within a defined Threshold then the component is marked as FAIL/DOWN.
We have defined the following Component Types:
- SERVICE: a generic service, e.g. PORTAL, OSM
- PROCESS: an automated process that either succeeds or fails, e.g. PING_PING_INSTANTIATION_TEST, executed by Jenkins every night
- VIM: a facility, e.g. BRISTOL
- CONNECTIVITY: a connection between components, e.g. PORTAL-OSM, OSM-ITAV, OSM-BRISTOL
There are two modes: ACTIVE and PASSIVE.
In ACTIVE mode the HCS polls the Component via a URL. If the URL returns 200 OK, the Component is UP.
A constraint here is that the HCS must have connectivity to the target component.
Example: the HCS polls the PORTAL component.
Component name = PORTAL
In this example the HCS regularly checks the URL https://5ginfire.portal.eu. If it fails to get a 200 OK, it tries again in a few minutes. If it fails to get a 200 OK within 10 minutes (600 seconds), the service is marked as DOWN and an Issue is raised automatically.
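The ACTIVE-mode check can be sketched in shell as follows. This is only an illustration and not the actual HCS implementation: the function names `http_status` and `component_state` are hypothetical, and the sketch assumes `curl` is installed.

```shell
#!/bin/sh
# Hypothetical sketch of an ACTIVE-mode check; not the actual HCS code.

# Print the HTTP status code returned by a URL (000 if the request fails).
http_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Decide the component state from the time of the last 200 OK.
# $1 = epoch seconds of the last 200 OK
# $2 = current epoch seconds
# $3 = failoverThreshold in seconds
component_state() {
  if [ $(( $2 - $1 )) -le "$3" ]; then
    echo UP
  else
    echo DOWN
  fi
}

# Example usage (commented out because it needs network access):
# last_ok=$(date +%s)
# [ "$(http_status https://5ginfire.portal.eu)" = "200" ] && last_ok=$(date +%s)
# component_state "$last_ok" "$(date +%s)" 600
```

Keeping the threshold decision separate from the HTTP call means a single failed poll does not mark the component as DOWN; only the time since the last 200 OK matters.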
PASSIVE mode can be used by components that cannot accept connections from the HCS but have an Internet connection (e.g. a VIM).
The component reports that it is alive through a GET request to the HCS. The URL format is:
http://status.5ginfire.eu/hcs/services/api/admin/components/{componentname}/{apikey}
componentname is the unique name of the component, e.g. PORTAL, OSM, BRISTOL
apikey is a unique string for each component (apikeys are given out by the HCS admins)
Again, if the component does not report its status within failoverThreshold, it is marked as DOWN. If it was previously UP, an Issue is raised automatically via Bugzilla to the respective owners of the Component.
Example 1: VIM is alive
We have defined that the failoverThreshold for the BRISTOL VIM is 10 minutes.
BRISTOL then needs to report that it is alive more frequently than that, e.g. every 5 minutes, by sending the GET request described above (for example from a cron job that runs a script).
If BRISTOL fails to send this request within 10 minutes, it is marked as DOWN and, if it was previously UP, an issue is raised in Bugzilla for BRISTOL.
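A minimal sketch of such a report script for Example 1 follows. The base URL is the one used in the PORTAL-OSM example of this document. The function names, the `HCS_APIKEY` variable and the script path are assumptions for illustration only, and the real APIKEY is issued by the HCS admins.

```shell
#!/bin/sh
# Hypothetical heartbeat script for a PASSIVE component; names are illustrative.

HCS_BASE="http://status.5ginfire.eu/hcs/services/api/admin/components"

# Build the report URL from a component name ($1) and its APIKEY ($2).
report_url() {
  echo "$HCS_BASE/$1/$2"
}

# Send the "I am alive" GET request; -f makes curl fail on HTTP errors.
report_alive() {
  curl -fsS -o /dev/null "$(report_url "$1" "$2")"
}

# Example usage (HCS_APIKEY must hold the key issued by the HCS admins):
# report_alive BRISTOL "$HCS_APIKEY"
#
# Crontab entry to report every 5 minutes, assuming the script is saved as
# /home/ubuntu/reportBristol.sh (hypothetical path):
# */5 * * * * /home/ubuntu/reportBristol.sh
```

Reporting every 5 minutes against a 10-minute threshold leaves room for one missed report before the component is marked as DOWN.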
Example 2: PORTAL-OSM link is alive
The Portal needs to report that the link to the OSM host machine is alive (note: not the OSM service).
The Portal has a VPN connection with OSM at 5TONIC.
If a PING from the Portal to the OSM host machine is successful, the Portal reports that the PORTAL-OSM link is UP.
We create a file, pingOSMlink.sh:
#!/bin/sh
# Ping the OSM host once; report the link as alive only if the ping succeeds.
ping -c1 10.4.16.15
if [ $? -eq 0 ]
then
  curl http://status.5ginfire.eu/hcs/services/api/admin/components/PORTAL-OSM/12345678-1234-1234-e830-22e37fa6a000
  exit 0
fi
Then, in crontab, to run the check every minute:
*/1 * * * * /home/ubuntu/pingOSMlink.sh
A Component can be in the following states:
- UP: is alive
- DOWN: is not alive
- PASS: used for the PROCESS type; the process passed its test within the threshold
- FAIL: used for the PROCESS type; the process failed its test within the threshold
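The state names above can be summarized with a small helper. This is a hypothetical sketch rather than HCS code: the function name `state_name` and the "ok" argument are assumptions, and it assumes that every type other than PROCESS uses UP/DOWN.

```shell
#!/bin/sh
# Hypothetical helper mapping a check result to the state names listed above.
# $1 = component type (SERVICE, PROCESS, VIM, CONNECTIVITY)
# $2 = "ok" if the component reported or passed within its threshold
state_name() {
  case "$1" in
    PROCESS)
      # PROCESS components use PASS/FAIL.
      if [ "$2" = "ok" ]; then echo PASS; else echo FAIL; fi ;;
    *)
      # All other types are assumed to use UP/DOWN.
      if [ "$2" = "ok" ]; then echo UP; else echo DOWN; fi ;;
  esac
}
```

For example, `state_name PROCESS ok` prints PASS, while `state_name VIM timeout` prints DOWN.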
Component description model
Each component has a defined model internally in HCS. For example, PORTAL Web:
{
  "longitude": "21.234456",
  "name": "PORTAL",
  "locationName": "Patras,GR",
  "mode": "ACTIVE",
  "apikey": "*******",
  "checkURL": "https://5ginfire.portal.eu",
  "type": "SERVICE",
  "latitude": "38.220220",
  "onIssueNotificationProduct": "Platform",
  "onIssueNotificationComponent": "Portal",
  "description": "Portal Component",
  "failoverThreshold": 600
}
onIssueNotificationProduct: the Bugzilla Product to report to on ERROR
onIssueNotificationComponent: the Bugzilla Component to report to on ERROR
failoverThreshold: in seconds
checkURL: the URL to call if mode is ACTIVE
Registering to the 5GinFIRE HCS, providing Name and APIKEY
The HCS admins will create the component and assign it a unique Name and APIKEY. They will also agree with you on the mode, the Bugzilla product and component, etc.
If you would like to include your component in the HCS, please raise an issue in Bugzilla under 5GinFIRE Operations/Operations Support (https://portal.5ginfire.eu/bugzilla/enter_bug.cgi?product=5GinFIRE%20Operations).
We will then provide you with an APIKEY.
HCS Historical data
All component activity is stored in an Elasticsearch cluster for up-time analysis, component reliability and historical data, as the next figures display:
The source code is available for download and contributions at https://github.com/5GinFIRE/eu.5ginfire.healthcheck