
INSTANCES: versions, REST hosts, clusters, DataBases

Stefano Belforte edited this page Jan 19, 2022 · 12 revisions

Summary

  • CRAB is a DataBase-centric service: its various processes communicate via an Oracle DB and a File Cache service
  • We call SERVICE_INSTANCE one "CRAB service", i.e. a set of REST, DB and FileCache
  • The REST server offers access to the Oracle DB and points to the FileCache server in use for this SERVICE_INSTANCE
  • Multiple REST servers run behind a CMSWEB FrontEnd, which provides load balancing and high availability under a common DNS alias identifying one specific CMSWEB cluster.
  • CRABClient, TaskWorker and Publisher are clients of this SERVICE_INSTANCE. They must all use the same instance, so that the TW and Publisher process the tasks submitted by that client. The relevant configuration parameters are:
Service      Configuration parameter
CRABClient   config.General.instance
TaskWorker   config.TaskWorker.instance
Publisher    config.General.instance
  • CRABClient, TaskWorker and Publisher will allow the REST endpoint used to access the CRAB DataBase to be indicated as one of a small set of predefined SERVICE_INSTANCES, defined in CRABServer/src/python/ServerUtilities.py:
instance   REST host                DB instance
prod       cmsweb.cern.ch           prod
preprod    cmsweb-testbed.cern.ch   preprod
dev        cmsweb-test2.cern.ch     dev
other      none                     none

When SERVICE_INSTANCE is set to 'other', the configuration must also provide a pair of strings: the REST host FQDN and the DB instance (one of prod, preprod, dev). This allows full flexibility in connecting the pieces and moving server instances around. The parameters to use with instance='other' are:

Service      REST host name               Database name
CRABClient   config.General.restHost      config.General.dbInstance
TaskWorker   config.TaskWorker.restHost   config.TaskWorker.dbInstance
Publisher    config.General.restHost      config.General.dbInstance

NOTE: CRABClient users do not need to specify the CRAB service instance; if they do not, it defaults to "prod".

  • instance is instead a mandatory parameter for TaskWorker and Publisher
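The lookup rules summarized above can be sketched in Python as follows. This is only an illustration, not the code in ServerUtilities.py: the names SERVICE_INSTANCES and resolve_service_instance are hypothetical, and the table holds just the entries listed above.

```python
# Hypothetical sketch: turn a SERVICE_INSTANCE name into a (restHost, dbInstance)
# pair. The real table lives in CRABServer/src/python/ServerUtilities.py.
SERVICE_INSTANCES = {
    'prod': {'restHost': 'cmsweb.cern.ch', 'dbInstance': 'prod'},
    'preprod': {'restHost': 'cmsweb-testbed.cern.ch', 'dbInstance': 'preprod'},
    'dev': {'restHost': 'cmsweb-test2.cern.ch', 'dbInstance': 'dev'},
    'other': {'restHost': None, 'dbInstance': None},
}

VALID_DB_INSTANCES = ('prod', 'preprod', 'dev')


def resolve_service_instance(instance='prod', restHost=None, dbInstance=None):
    """Return (restHost, dbInstance) for a service instance name.

    instance defaults to 'prod', as for CRABClient users; restHost and
    dbInstance are only used (and then both required) when instance is 'other'.
    """
    if instance not in SERVICE_INSTANCES:
        raise ValueError("unknown instance %r, valid values are %s"
                         % (instance, sorted(SERVICE_INSTANCES)))
    if instance != 'other':
        # predefined instance: both values come from the table
        entry = SERVICE_INSTANCES[instance]
        return entry['restHost'], entry['dbInstance']
    # instance == 'other': the pair of strings must be given explicitly
    if not restHost or not dbInstance:
        raise ValueError("instance='other' requires both restHost and dbInstance")
    if dbInstance not in VALID_DB_INSTANCES:
        raise ValueError("dbInstance must be one of %s" % (VALID_DB_INSTANCES,))
    return restHost, dbInstance
```

For example, resolve_service_instance('other', restHost='stefanovm.cern.ch', dbInstance='dev') returns ('stefanovm.cern.ch', 'dev'), while calling it with no arguments gives the production pair.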

Concepts and definitions

  • the CRABServer service is a REST interface to the CRAB Oracle DataBase
  • A given REST service is usually a DNS alias for a set of actual hosts which implement load balancing and high availability. External users only see the DNS alias, which we call restHost.
  • CRABServer runs inside CMSWEB framework, so it is part of a given CMSWEB cluster
    • Numerous CMSWEB clusters exist
      • cmsweb.cern.ch aka main production one
      • cmsweb-testbed.cern.ch aka testbed
      • cmsweb-k8s-testbed.cern.ch supposedly identical to cmsweb-testbed
      • cmsweb-test.cern.ch K8s development (Valentin's playground)
      • cmsweb-test[1-6].cern.ch test (developers') clusters for application developers
        • cmsweb-test2.cern.ch is reserved for CRAB usage
      • private VMs like stefanovm.cern.ch or stefanovm2
  • One CMSWEB cluster is the interface to a particular service instance of CRAB, i.e. to a full set of services which make it possible to submit, track, execute and bookkeep a CRAB task.
  • The REST server holds and serves information about itself, the CRAB File Cache server, the HTCondor pool to use for submissions, who is allowed to use credentials uploaded by users to myproxy, etc. Much of this information can change frequently and is therefore stored in a remote, web-accessible file divided into sections, one for each cluster.
  • Oracle Data Base has several instances, meaning "different data bases"
    • Production on CMS Production Oracle cluster cmsr
    • Preprod on devdb11 username: cmsweb_analysis_preprod
    • Dev on devdb11 username: cmsweb_analysis_dev
    • private ones, like Stefano's or Diego's private DBs on devdb11
  • note that Oracle DBAs usually also refer to cmsr or devdb11 as "instances"
  • CRABClient allows submitting to a given 'CRAB instance', which means a given DataBase instance: global (i.e. production), preprod, dev, etc.

History and evolving requirements

  • CRAB was developed at a time when it was easy to get multiple DB instances, but almost inconceivable to have more than two cmsweb clusters (cmsweb.cern.ch and cmsweb-testbed.cern.ch); therefore
    • one CRABServer REST instance is capable of connecting to multiple DataBases, i.e. it supports multiple DB instances
  • So the DataBase instance (prod/preprod/dev) could not be part of the CRABServer REST configuration; instead it is specified by the client (the clients of the CRABServer REST are CRABClient and CRABTaskWorker) in the URL of the API call, which is constructed as hostname/crabserver/dbinstance/API
  • in the initial design the view was that the CRABClient (i.e. the submitter) only needs to decide whether to submit to the production or preproduction DataBase (or some private test instance), so the CRABClient configuration file accepts the parameter config.General.instance and "CRAB" figures out everything else
  • in the migration to K8s we have several cmsweb clusters, i.e. multiple REST instances which may, e.g., all connect to the same DB instance, and we want to be able to connect explicitly to one or another of these clusters in order to test specific REST instances.

Implementation details and configurations as of April 2020

CRABCache i.e. FileCache

CRABClient and TaskWorker communicate via the CRAB Oracle DataBase (so they need a REST hostname and a DB instance name), but the CRABClient also needs to upload a sandbox to be used in job submission. For this it uses a dedicated CRABCache service, which also has a REST interface. The URL of the CRABCache service is obtained by querying the CRABServer REST: each CRABServer REST instance knows which CRABCache service should be used and communicates this to both CRABClient and CRAB TaskWorker via the query

https://<restHost>/crabserver/<dbInstance>/info?subresource=backendurls

e.g.

https://cmsweb.cern.ch/crabserver/prod/info?subresource=backendurls
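A minimal Python sketch of how a client could build this URL and run the query. The helper names crabserver_url and get_backend_urls are hypothetical. It assumes that CMSWEB authenticates the client with an X.509 proxy file holding both certificate and key, and that caPath points to a directory of grid CA certificates able to verify the CERN server certificate; since the layout of the reply is not described here, the decoded JSON is returned as-is.

```python
import json
import ssl
import urllib.parse
import urllib.request


def crabserver_url(restHost, dbInstance, api, **params):
    """Build https://<restHost>/crabserver/<dbInstance>/<api>, plus an optional query string."""
    url = "https://%s/crabserver/%s/%s" % (restHost, dbInstance, api)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_backend_urls(restHost, dbInstance, proxyFile, caPath=None):
    """Ask the CRABServer REST which backend services (e.g. CRABCache) to use.

    ASSUMPTIONS: the client authenticates with an X.509 proxy (cert and key in
    one file) and caPath lets us verify the server certificate.
    """
    url = crabserver_url(restHost, dbInstance, 'info', subresource='backendurls')
    context = ssl.create_default_context(capath=caPath)
    context.load_cert_chain(certfile=proxyFile)
    with urllib.request.urlopen(url, context=context) as response:
        return json.loads(response.read().decode('utf-8'))
```

For instance, crabserver_url('cmsweb.cern.ch', 'prod', 'info', subresource='backendurls') reproduces the example URL above.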

CRABServer

  • each CRABServer host has a configuration file /data/srv/current/config/crabserver/config.py which, among other things, indicates which "service cluster" this crabserver process belongs to (remember, there are multiple processes running on multiple hosts in the same service cluster). This is done via the parameter data.mode, which points to one particular section of the data.extconfigurl file, where information is kept about the CRABCache service to be used, the ASO configuration, and the HTCondor resources to be used.
  • the DataBase instances it can connect to are specified in the file /data/srv/current/auth/crabserver/CRABServerAuth.py, which is not part of the CRAB source code in this GitHub repository; in principle it is written ad hoc for every machine where CRABServer is installed (see https://twiki.cern.ch/twiki/bin/view/CMSPublic/CMSCrabRESTInterface#Authentication_with_CERN_Oracle ). For example, the CRAB REST production instance in cmsweb.cern.ch uses this (passwords have been removed):
import cx_Oracle as DB
import socket
fqdn = socket.getfqdn().lower()
# one entry per DB instance; the key is the <dbInstance> part of the REST URL
dbconfig = {'preprod': {'.title': 'Pre-production',
                        '.order': 1,
                        # '*': the same account is used for every HTTP method
                        '*': {'clientid': 'cmsweb-preprod@%s' % (fqdn),
                              'dsn': 'devdb11',
                              'liveness': 'select sysdate from dual',
                              'password': '*****',
                              'schema': 'cmsweb_analysis_preprod',
                              'timeout': 300,
                              'trace': True,
                              'type': DB,
                              'user': 'cmsweb_analysis_preprod'}},
            'prod': {'.title': 'Production',
                     '.order': 0,
                     # GET requests use the read-only account
                     'GET': {'clientid': 'cmsweb-prod-r@%s' % (fqdn),
                             'dsn': 'cmsr',
                             'liveness': 'select sysdate from dual',
                             'password': '*****',
                             'schema': 'cms_analysis_reqmgr_r',
                             'timeout': 300,
                             'trace': True,
                             'type': DB,
                             'user': 'cms_analysis_reqmgr_r'},
                     # all other methods ('*') use the read-write account
                     '*':  {'clientid': 'cmsweb-prod-w@%s' % (fqdn),
                            'dsn': 'cmsr',
                            'liveness': 'select sysdate from dual',
                            'password': '******',
                            'schema': 'cms_analysis_reqmgr_w',
                            'timeout': 300,
                            'trace': True,
                            'type': DB,
                            'user': 'cms_analysis_reqmgr_w'}}}

Since this CRABServerAuth.py file contains passwords, it is not kept in publicly available repositories.

  • the CRABServer REST API machinery detects the Data Base instance from the URL in the HTTP request and selects the appropriate Oracle connection instance.
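The selection step can be illustrated with a small Python sketch. This is a hypothetical illustration, not the WMCore REST code CRABServer actually uses: select_db_config is an invented name, and it assumes a dbconfig dictionary shaped like the CRABServerAuth.py example, where a method-specific entry such as 'GET' takes precedence over the '*' default.

```python
# Hypothetical illustration, NOT the actual REST implementation: pick the
# Oracle account settings for a request from its URL path and HTTP method.
from urllib.parse import urlparse


def select_db_config(dbconfig, url, method):
    """Return (dbInstance, connection settings) for a /crabserver/<dbInstance>/<api> URL."""
    parts = urlparse(url).path.strip('/').split('/')
    # expected path layout: crabserver/<dbInstance>/<api>
    if len(parts) < 3 or parts[0] != 'crabserver':
        raise ValueError("not a crabserver API URL: %s" % url)
    dbInstance = parts[1]
    if dbInstance not in dbconfig:
        raise KeyError("DB instance %r is not configured on this host" % dbInstance)
    instanceConfig = dbconfig[dbInstance]
    # a method-specific entry (e.g. 'GET') wins, otherwise fall back to '*'
    return dbInstance, instanceConfig.get(method.upper(), instanceConfig['*'])
```

With the production configuration above, a GET on .../crabserver/prod/info would get the read-only cms_analysis_reqmgr_r settings, while a POST to the same instance would get the read-write ones.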

CRABClient

  • the CRABClient configuration file accepts the parameter config.General.instance, which can also be passed as a command-line option; e.g. crab submit --help lists it as:
 --instance=INSTANCE   Running instance of CRAB service. Valid values are
                        ['test1', 'test3', 'test2', 'prod', 'preprod', 'test',
                        'k8s'].

This shows how in January we added some K8s clusters by overloading the parameter "instance" to indicate a particular REST instance instead of the DB instance.

  • this was justified since config.General.instance was already used to indicate a particular REST host, in order to support submission to private developer VMs via things like General.instance = 'stefanovm2.cern.ch'
  • this requires a 1:1 mapping between DataBase instance and REST host instance, so that the CRABClient can derive both (which are needed to build the HTTP queries) from a single parameter.
  • the code which maps the General.instance parameter into a REST hostname and a DataBase instance is in https://github.com/dmwm/CRABClient/blob/301de634b1fe16bf11696d975133487cd0094d37/src/python/CRABClient/ClientUtilities.py#L195

CRABTaskWorker

As a user of the CRAB DataBase, each TaskWorker instance needs to identify one REST host to talk to and the DB instance to use.

  • there is a set of predefined host/instance pairs in the code, and each TW instance can pick one of them via the configuration parameter config.TaskWorker.mode in the TaskWorkerConfig.py file. The relevant code is in MasterWorker.py, where the dictionary mapping each value of this parameter to a host/instance pair is called MODEURL:
MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
           'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
           'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
           'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
           'test': {'host': None, 'instance': 'preprod'},
           'private': {'host': None, 'instance': 'dev'},
          }
  • if mode is set to 'test' or 'private', the REST host name must be specified in the TaskWorkerConfig.py configuration file via the (badly named) parameter config.TaskWorker.resturl, e.g.:
config.TaskWorker.resturl = 'stefanovm.cern.ch'
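A minimal sketch of how the mode and resturl parameters combine, written in Python. It is not the MasterWorker.py code: resolve_tw_endpoint is a hypothetical name, and a SimpleNamespace stands in for the config.TaskWorker section.

```python
# Hypothetical sketch (not the MasterWorker.py code): resolve the REST host and
# DB instance from config.TaskWorker.mode, falling back to
# config.TaskWorker.resturl when the mode has no hard-coded host.
from types import SimpleNamespace

MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
           'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
           'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
           'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
           'test': {'host': None, 'instance': 'preprod'},
           'private': {'host': None, 'instance': 'dev'},
          }


def resolve_tw_endpoint(twConfig):
    """Return (restHost, dbInstance) for a TaskWorker configuration section."""
    mode = twConfig.mode
    if mode not in MODEURL:
        raise ValueError("unknown TaskWorker mode %r" % mode)
    host = MODEURL[mode]['host']
    if host is None:
        # 'test' and 'private' take the REST host from the (badly named) resturl
        host = getattr(twConfig, 'resturl', None)
        if not host:
            raise ValueError("mode %r requires config.TaskWorker.resturl" % mode)
    return host, MODEURL[mode]['instance']


# example: a private TaskWorker talking to a developer VM
twConfig = SimpleNamespace(mode='private', resturl='stefanovm.cern.ch')
print(resolve_tw_endpoint(twConfig))  # ('stefanovm.cern.ch', 'dev')
```

Note that with this scheme the DB instance is always fixed by the mode, which is exactly the coupling the changes proposed below aim to remove.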

Changes to better manage current situation

1. CRAB Client

Modify CRAB Client so that the submitter can select REST host and Data Base instance independently

  • be backward compatible with pre-2020 use (it is OK to break compatibility for K8s clusters)
  • introduce instance='other' as a switch to allow specifying restHost and dbInstance
  • get rid of the old instance='private', which in the end was a confusing way to indicate restHost while forcing the DataBase instance to dev

2. CRAB Task Worker

The TaskWorker should follow the same approach as the CRABClient, while taking advantage of the greater freedom we have with its configuration file: keep a smaller set of nicknames (MODEURLs) where both REST host and DB instance are hard-coded, and also support MODEURL 'other', which absorbs the old test/private modes and requires both instance and URL to be specified: