Documentation for the Teacher services application developers
NOTE: For additional info about the infrastructure, see SharePoint: Teacher services infrastructure
- There is an assumption that you have been given a CIP account. For BYOD users, please make sure to request a digitalauth account.
- It is recommended to set up 2FA with the Microsoft authenticator app as it provides useful notifications. Also, set up a backup 2FA using your phone for text messages.
- BYOD users will have two 2FA challenges: one for digitalauth, one for DfE Platform Identity
- The technical lead of your team will then add you to the AD group of your area. For example if you work on a BAT service, you will be added to "s189 BAT delivery team". You will now be able to:
- Access (read-only) the s189 subscriptions in the Azure portal
- Access (read-write) your test Kubernetes namespaces and Azure resource groups in the test subscription
- Elevate your permissions via PIM to temporarily access (read-write) the production Kubernetes namespaces and Azure resource groups
- Approve other developers' PIM requests
- BYOD users should try to identify which 2FA is failing (digitalauth or DfE Platform Identity), as the team needs to know which one to reset
- Raise a ServiceNow CIP request and choose "Azure Cloud General Request"
Microsoft Entra Privileged Identity Management (PIM) allows you to gain temporary (up to 8 hours) user permissions to access production resources. This is sometimes required to access the Kubernetes cluster and troubleshoot the application or database.
- Use PIM for groups to elevate your access. You should see the PIM group of your area. For example if you work on a BAT service, you should see: "s189 BAT production PIM".
- Click "Activate", select the time and give a brief justification, which is important to gain approval and audit purpose.
- The other members of the team will receive an email with a link to PIM so they can review and approve your request.
- After a few minutes, your access will be active. It may require logging out and in again.
- When accessing the Azure portal, make sure you switch to your digitalauth account and switch directory to DfE Platform Identity.
The infra team maintains several AKS clusters. Two are usable by developers to deploy their services:
Used for all your non-production environments: review, development, qa, staging...
- Name: `s189t01-tsc-test-aks`
- Resource group: `s189t01-tsc-ts-rg`
- Subscription: `s189-teacher-services-cloud-test`
Used for all your production and production-like environments, especially if they contain production data: production, pre-production, production-data...
- Name: `s189p01-tsc-production-aks`
- Resource group: `s189p01-tsc-pd-rg`
- Subscription: `s189-teacher-services-cloud-production`
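When running ad-hoc `az` commands, you can check and change which of these subscriptions the CLI is targeting. The commands below are a small illustrative sketch using the test subscription name shown above:

```shell
# Show the subscription the Azure CLI is currently using
az account show --query name -o tsv

# Switch to the test subscription (use the production name when needed)
az account set --subscription s189-teacher-services-cloud-test
```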
- If not present in your repository, set up the `get-cluster-credentials` make command from the template Makefile.
- If the environment is production, raise a PIM request
- Log in to the Azure CLI using `az login` or `az login --use-device-code`
- Run `make <environment> get-cluster-credentials`
- This configures the `kubectl` context so you can run commands against this cluster
- NOTE: If you have problems running `az login`, make sure you have accessed the Azure portal as above first. Also run `az logout` before running `az login`.
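Putting these steps together, a typical session for a non-production environment might look like the sketch below. The environment name and namespace are illustrative; substitute your own.

```shell
# Log out first to avoid a stale session, then log in
az logout
az login   # or: az login --use-device-code

# Fetch kubectl credentials for the cluster behind this environment
make development get-cluster-credentials

# Verify the kubectl context and that you can reach your namespace
kubectl config current-context
kubectl -n tra-development get pods
```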
Namespaces are a way to logically partition and isolate resources within a Kubernetes cluster. Each namespace has its own set of isolated resources like pods, services, deployments etc. By default, a Kubernetes cluster will have a few initial namespaces created like "default", "kube-system", "kube-public" etc. We have created specific namespaces per area, such as "BAT" or "TRA". For instance, you will see:
- tra-development and tra-staging on the test cluster
- tra-production on the production cluster
Here is the full list of namespaces in the test cluster and in the production cluster.
kubectl commands run in a particular namespace using `-n <namespace>`.
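For example, using the TRA namespaces mentioned above (and assuming you already have credentials for the relevant cluster):

```shell
# List pods in the tra-development namespace on the test cluster
kubectl -n tra-development get pods

# The equivalent against the production cluster
kubectl -n tra-production get pods
```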
First get access to the desired cluster. Then you can run commands using kubectl against different Kubernetes resources.
A deployment allows you to specify the desired state of your application. It allows you to deploy multiple pods and services and manage them as a single entity, and it supports rolling updates and rollbacks.
Examples of kubectl deployment usage:
- List deployments in a namespace:
kubectl -n <namespace> get deployments
- Get configuration and status:
kubectl -n <namespace> describe deployment <deployment-name>
- Scale deployment horizontally:
kubectl -n <namespace> scale deployment <deployment-name> --replicas=3
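The rolling updates and rollbacks mentioned above can be inspected and, if necessary, reversed with `kubectl rollout`. A few illustrative commands follow; note that anything changed this way may be overwritten by your next pipeline deployment.

```shell
# Watch a rolling update progress until it completes
kubectl -n <namespace> rollout status deployment/<deployment-name>

# Show the revision history of a deployment
kubectl -n <namespace> rollout history deployment/<deployment-name>

# Roll back to the previous revision if a release misbehaves
kubectl -n <namespace> rollout undo deployment/<deployment-name>
```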
Each deployment runs one or more instances of the application to scale horizontally. Each instance runs in a pod, which is ephemeral and can be deleted or recreated at any time. Deployments keep pods running and provide a way to update them when needed.
Examples of kubectl pod usage:
- List pods in a namespace:
kubectl -n <namespace> get pods
- Get pod configuration and status:
kubectl -n <namespace> describe pod <pod-name>
- Get pod logs:
kubectl -n <namespace> logs <pod-name>
- Get logs from the first pod in the deployment:
kubectl -n <namespace> logs deployment/<deployment-name>
- Stream logs from all pods in the deployment:
kubectl -n <namespace> logs -l app=<deployment-name> -f
- Display CPU and memory usage:
kubectl -n <namespace> top pods
- Execute a command inside a pod:
kubectl -n <namespace> exec <pod-name> -- <command>
- Execute a command inside the first pod in the deployment:
kubectl -n <namespace> exec deployment/<deployment-name> -- <command>
- Open an interactive shell inside a pod:
kubectl -n <namespace> exec -ti <pod-name> -- sh
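As a worked example, a common troubleshooting sequence for a failing pod combines the commands above with the `--previous` flag, which returns the logs of the last terminated container. Names are placeholders:

```shell
# Find the failing pod and inspect its recent events
kubectl -n <namespace> get pods
kubectl -n <namespace> describe pod <pod-name>

# If the container has restarted, read the logs from the previous run
kubectl -n <namespace> logs <pod-name> --previous

# Open a shell inside the pod to investigate further
kubectl -n <namespace> exec -ti <pod-name> -- sh
```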
All HTTP requests enter the cluster via the ingress controller, which routes them to the relevant pods. We can observe the HTTP traffic to a particular deployment.
- Deployment filter: `<namespace>-<deployment-name>-80`, e.g. `bat-qa-register-qa-80`
- Stream logs from all ingress controllers and filter on the deployment:
kubectl logs -l app.kubernetes.io/name=ingress-nginx -f --max-log-requests 20 | grep <deployment-filter>
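For example, combining the deployment filter and the command above for the `bat-qa-register-qa-80` upstream:

```shell
# Stream ingress controller logs and keep only traffic for this deployment
kubectl logs -l app.kubernetes.io/name=ingress-nginx -f --max-log-requests 20 | grep bat-qa-register-qa-80
```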
Application logs are sent to Logit.io. They are ingested, decoded and stored. Users can then query the logs, create dashboards, create alerts...
Request access using the Logit ServiceNow form. Choose the Teacher services UK organisation, and all stacks.
There are 2 important stacks: one for the test cluster, and one for the production cluster. Choose the one corresponding to your application environment and click LAUNCH LOGS. This opens Kibana.
- Filter by `kubernetes.deployment.name` to select your application logs
- The HTTP logs are under the ingress-nginx-controller deployment name. Filter again by `ingress.proxy.upstream.name` to select your web application's HTTP logs.
You can add more filters, use the full DQL language, refine the time range, select fields to display and more. Explore, and create visualizations and dashboards.
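For illustration, two DQL queries following the filters above. The deployment name `register-qa` is a hypothetical example (derived from the `bat-qa-register-qa-80` upstream shown earlier); use your own deployment's name.

```
kubernetes.deployment.name: "register-qa"
ingress.proxy.upstream.name: "bat-qa-register-qa-80"
```

The first selects the application's own logs; the second selects the HTTP logs recorded by the ingress controller for the same application.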
Application logs should be sent as JSON, as they are decoded automatically so that individual fields can be used in filters and queries. All app fields are stored under the `app` field.
- Developers must take care not to send any personal data; it must be removed or obfuscated.
- The stacks are shared by all services. It is important not to send too many logs as this may exceed our quota
- In case of issues, contact the infra team, who can add temporary fixes using Logstash
The main monitoring tools used are Grafana and Alertmanager. For further reading about the monitoring setup in the cluster, click here.
Grafana can be accessed via the URL for the environment of interest. The URLs corresponding to each environment are as below:
- Test | https://grafana.test.teacherservices.cloud
- Production | https://grafana.teacherservices.cloud/
The default access to the Grafana interface is view only, which does not require authentication. To make changes, for example adding new dashboards or editing existing ones, request admin credentials in the #teacher-services-infra Slack channel.
Grafana allows you to export your dashboard as a JSON file, which can be version controlled and shared with others. This can be achieved by following these steps:
- Open your dashboard in Grafana
- Click on the "Share" button(icon) in the top left corner
- In the "Export" tab, select "Export for sharing externally"
- Click "Save to file" to download the JSON file of your dashboard
The following steps are required for creating or editing dashboards. Please click for more extensive details.
- Ensure you are logged in as an admin
- Identify the purpose of your dashboard: what insights it will provide and what messages it conveys
- Plan and design how the dashboard will look when completed, paying attention to the placement of panels, alignment, spacing, colour and organisation
- Select the appropriate data source to visualise in the dashboard (currently Prometheus is the only data source available to select)
- Click on the "Explore" view, select the data source (Prometheus), then browse and search using the "Metric" dropdown
- Create a panel for each metric, choosing the right visualisation (for example graph, gauge, table or heatmap), configure the panel settings (e.g. the query, data transformations and display options) and add a concise title for clarity
- Any changes made to the dashboard in the UI will be overwritten by the next deployment unless they are added to the codebase and merged via a pull request
- To ensure the new dashboard is permanent and not deleted by a subsequent deployment, add a JSON file to the dashboards directory here (pasting the content of the JSON file exported from the dashboard), add an entry to the grafana_dashboards kubernetes_config_map resource in the grafana.tf file and raise a PR to merge the change
- Log in to Grafana as admin
- Navigate to the dashboard import page: click the "+" icon in the left sidebar to open the dashboard menu, then select "Import" from the dropdown menu
- Import the JSON file by either clicking "Upload JSON file" and selecting the file from your computer, or pasting the JSON content into the text area provided
- Click the "Import" button to initiate the dashboard import
The Alertmanager URLs corresponding to the various environments are:
- Test | https://alertmanager.test.teacherservices.cloud/
- Production | https://alertmanager.teacherservices.cloud/
Authentication details are usually required; these are stored in the key vault. Please ask in the #teacher-services-infra channel for more details.
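If the infra team tells you which key vault and secret hold the credentials, one way to read them with the Azure CLI is sketched below; the vault and secret names are placeholders, not real values.

```shell
# Read the Alertmanager credentials from the key vault (placeholder names)
az keyvault secret show --vault-name <keyvault-name> --name <secret-name> --query value -o tsv
```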