Migration
Migration from Scholarsphere, version 3, to version 4.
Drawing on our previous experience with migrating to Fedora 4, and with other migrations, the basic strategy is:
- perform an inventory of works and collections in Scholarsphere 3
- migrate data by pushing from the Scholarsphere 3 application into Scholarsphere 4, most likely via an API
- check the inventory against the content in Scholarsphere 4 to verify all content has been transferred
Local testing can be done by running instances of both SS v.3 and SS v.4 locally and creating sample records in one to be migrated over to the other. Once the initial steps are complete, and migration functions in a local context, we can move into the QA environment and use duplicated content from production. The procedures developed in the QA environment will then be replicated in Scholarsphere production, with version 4 resources that will ultimately become the version 4 production instance.
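The final verification step in the strategy above amounts to a set comparison between the two inventories. A minimal sketch in plain Ruby, assuming both sides can be reduced to lists of identifiers; `inventory_diff` and the sample IDs are hypothetical and not part of either application:

```ruby
require 'set'

# Hypothetical helper: compare the IDs inventoried in version 3 with the
# IDs found in version 4 after the migration has run.
def inventory_diff(source_ids, migrated_ids)
  source = source_ids.to_set
  migrated = migrated_ids.to_set
  {
    missing: (source - migrated).to_a.sort,    # inventoried but never arrived
    unexpected: (migrated - source).to_a.sort  # arrived but not in the inventory
  }
end

# Sample identifiers are made up for illustration.
diff = inventory_diff(%w[abc123 def456 ghi789], %w[abc123 ghi789])
puts "Missing from version 4: #{diff[:missing].join(', ')}" unless diff[:missing].empty?
```

An empty `missing` list is the condition the strategy calls for; `unexpected` is included only because it is cheap to compute and may flag leftover test data.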
Log in to the production jobs server and update the scholarsphere-client.yml file.
vim /scholarsphere/config_prod_new/scholarsphere/scholarsphere-client.yml
Verify the current default settings:
SHRINE_CACHE_PREFIX: "cache"
S3_ENDPOINT: "https://s3.amazonaws.com"
SS_CLIENT_SSL: "false"
SS4_ENDPOINT: "https://scholarsphere.k8s.libraries.psu.edu/api/v1"
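One way to check these defaults without reading the file by eye is a short Ruby script. This is only a sketch: it assumes the file is a flat set of key/value pairs, and `default_mismatches` is a hypothetical helper, not part of the Scholarsphere client.

```ruby
require 'yaml'

# Defaults as listed above. The flat key/value layout is an assumption.
EXPECTED_DEFAULTS = {
  'SHRINE_CACHE_PREFIX' => 'cache',
  'S3_ENDPOINT' => 'https://s3.amazonaws.com',
  'SS_CLIENT_SSL' => 'false',
  'SS4_ENDPOINT' => 'https://scholarsphere.k8s.libraries.psu.edu/api/v1'
}.freeze

# Returns { key => [expected, actual] } for every setting that differs.
def default_mismatches(yaml_text, expected = EXPECTED_DEFAULTS)
  settings = YAML.safe_load(yaml_text) || {}
  expected.each_with_object({}) do |(key, value), diff|
    actual = settings[key]
    diff[key] = [value, actual] unless actual.to_s == value
  end
end

# Sample content with one deliberately wrong value (SS_CLIENT_SSL).
sample = <<~CONFIG
  SHRINE_CACHE_PREFIX: "cache"
  S3_ENDPOINT: "https://s3.amazonaws.com"
  SS_CLIENT_SSL: "true"
  SS4_ENDPOINT: "https://scholarsphere.k8s.libraries.psu.edu/api/v1"
CONFIG

p default_mismatches(sample)
```

On the jobs server the same helper could be given `File.read` of the path opened in vim above. An empty hash means all four defaults are in place.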
From another terminal session, login to the current production version 4 deployment
KUBECONFIG=~/.kube/config-oidc-prod && export KUBECONFIG
kubens scholarsphere
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
From the pod's shell, start a Rails console session
bundle exec rails c
Print out the AWS environment variables that correspond to the configuration settings in the client yaml file:
ENV['AWS_REGION']
ENV['AWS_SECRET_ACCESS_KEY']
ENV['AWS_ACCESS_KEY_ID']
ENV['AWS_BUCKET']
Verify that the SS_CLIENT_KEY in the yaml file matches the token in the version 4 application. If they do not match, update the application's token with the one from the yaml file. Do not change the client key in the yaml file.
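Because the yaml file and the version 4 pod are on different machines, comparing the AWS settings and the client key means reading secrets across two terminals. One possible approach, sketched here, is to print a short SHA-256 fingerprint of each value on both sides and compare those instead. The `fingerprint` and `fingerprints` helpers are hypothetical; the key names come from the steps above.

```ruby
require 'digest'

# Short, non-reversible fingerprint of a secret, so two terminals can be
# compared by eye without displaying the secret itself.
def fingerprint(value)
  return '(unset)' if value.nil? || value.empty?

  Digest::SHA256.hexdigest(value)[0, 12]
end

# Builds { key => fingerprint } for the given keys. `settings` can be ENV
# or any Hash, for example the parsed yaml file.
def fingerprints(settings, keys)
  keys.to_h { |key| [key, fingerprint(settings[key])] }
end

KEYS_TO_COMPARE = %w[
  AWS_REGION AWS_SECRET_ACCESS_KEY AWS_ACCESS_KEY_ID AWS_BUCKET SS_CLIENT_KEY
].freeze

fingerprints(ENV, KEYS_TO_COMPARE).each do |key, digest|
  puts "#{key}: #{digest}"
end
```

Matching fingerprints on both sides indicate matching values. A truncated digest is meant only for visual comparison, not as a security measure.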
To update the application's token, from the Rails console of the version 4 pod:
token = [paste client key from yaml file]
ExternalApp.find_by(name: 'Scholarsphere 4 Migration').api_tokens[0].update(token: token)
Write out any changes to scholarsphere-client.yml. If the file has changed, you will need to restart Resque; if not, you can proceed to deployment.
To restart the Resque process on the jobs server, first verify that no jobs are currently running in any queue: https://scholarsphere.psu.edu/admin/queues/overview
Exit out of any current Rails console sessions on the server, and get root access to restart the process:
sudo su -
systemctl restart resque
Verify the processes have successfully restarted:
ps -aux | grep resque
You should see multiple processes, one for each worker configured in every queue.
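The `grep resque` command can also match its own process, so the raw output may overstate the worker count by one line. A small sketch that filters it out, assuming the `ps` output is passed in as text; `resque_worker_lines` is a hypothetical helper and the sample process lines are simplified and invented:

```ruby
# Keeps lines that mention resque but drops the grep command itself.
def resque_worker_lines(ps_output)
  ps_output.lines.map(&:strip).select do |line|
    line.include?('resque') && !line.include?('grep')
  end
end

# Invented, simplified sample; real `ps -aux` output has more columns.
sample = <<~PS
  deploy  1201  resque worker waiting for queue_a
  deploy  1202  resque worker waiting for queue_b
  deploy  1300  grep resque
PS

workers = resque_worker_lines(sample)
puts "#{workers.size} Resque process(es) found"
```

The count can then be compared with the number of workers configured across the queues.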
If needed, tag and deploy the latest code from the main branch of the Scholarsphere 4 repo by following Production Deployment. Use the SHA of the commit or use a 4.0.0.betaN tag, where N is any number greater than zero.
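The beta tag format can be checked mechanically before tagging. A sketch, assuming N is a positive integer written without leading zeros; `valid_beta_tag?` is a hypothetical helper:

```ruby
# Matches 4.0.0.betaN where N is an integer greater than zero.
BETA_TAG = /\A4\.0\.0\.beta([1-9]\d*)\z/.freeze

def valid_beta_tag?(tag)
  BETA_TAG.match?(tag)
end

%w[4.0.0.beta1 4.0.0.beta0 4.0.0].each do |tag|
  puts "#{tag}: #{valid_beta_tag?(tag) ? 'ok' : 'not a beta tag'}"
end
```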
Run any tests, including test migrations, by following the steps below under Migration.
During normal business hours, tag and deploy version 4.0.0 to production according to Production Deployment.
Perform some minimal tests, including migrating a few sample records to ensure everything is operating correctly.
Now wait until 7 pm that evening
Open a new terminal window or tmux session and obtain a list of production pods:
KUBECONFIG=~/.kube/config-oidc-prod && export KUBECONFIG
kubens scholarsphere
kubectl get pods
Login to one of the two Rails pods:
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
Remove any database data from any previous tests. This will completely delete all data from the database. Run the task from the pod's shell, not from a Rails console:
DISABLE_DATABASE_ENVIRONMENT_CHECK=1 bundle exec rake db:schema:load
Start a new Rails console session:
bundle exec rails c
From the console, remove any Solr index data:
Blacklight.default_index.connection.delete_by_query('*:*')
Blacklight.default_index.connection.commit
Exit out of both the Rails console and pod shell, returning to your local shell, and restart the application by deleting both of the running Rails pods:
kubectl delete po -l app.kubernetes.io/name=scholarsphere
Verify that the pods have been restarted:
kubectl get pods
You should see two new pods in the list with a recent value for AGE.
Login to one of the newly restarted Rails pods:
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
Create a new Rails console session:
bundle exec rails c
Leave the Rails session open.
Verify there are no currently running jobs in any queue https://scholarsphere.psu.edu/admin/queues/overview
Put Scholarsphere 3 into "read-only" mode: https://github.com/psu-stewardship/scholarsphere/wiki/Read-Only-Mode
Log in to the psu-access terminal, and begin a new tmux session to log in to the Scholarsphere 3 production jobs server.
Start a new Rails console session on the jobs server:
sudo su - deploy
cd scholarsphere/current
bundle exec rails c production
Verify the Scholarsphere client configuration:
ENV['S3_ENDPOINT']
ENV['AWS_REGION']
ENV['SS4_ENDPOINT']
ENV['AWS_SECRET_ACCESS_KEY']
ENV['AWS_ACCESS_KEY_ID']
ENV['AWS_BUCKET']
Take note of the client key:
ENV['SS_CLIENT_KEY']
Return to the Rails console for the version 4 instance
Create an external application record using the token you copied from the version 3 console:
token = [paste token here]
ExternalApp.find_or_create_by(name: 'Scholarsphere 4 Migration') do |app|
app.api_tokens.build(token: token)
app.contact_email = '[email protected]'
end
Return to the Rails console for the version 3 instance, and update the list of works and collections to be migrated:
ActiveFedora::SolrService.query('{!terms f=has_model_ssim}GenericWork,Collection', rows: 20_000, fl: ['id', 'has_model_ssim']).map do |hit|
Scholarsphere::Migration::Resource.find_or_create_by(pid: hit.id, model: hit.model)
end
Clear out the results from any previous runs:
Scholarsphere::Migration::Resource.update_all(client_status: nil, client_message: nil, exception: nil, started_at: nil, completed_at: nil)
Queue up the jobs, with works first, then collections:
Scholarsphere::Migration::Resource.where(model: 'GenericWork').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
Scholarsphere::Migration::Resource.where(model: 'Collection').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
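The two loops above can also be expressed as a single pass that enforces the ordering. The sketch below uses a plain `Struct` as a stand-in for `Scholarsphere::Migration::Resource` so that it runs outside Rails; `QueuedResource`, `in_migration_order`, and the sample pids are hypothetical:

```ruby
# Stand-in for the migration resource record (only the fields used here).
QueuedResource = Struct.new(:pid, :model)

# Works are queued before collections, as described above.
MIGRATION_ORDER = %w[GenericWork Collection].freeze

# Returns resources grouped by model in MIGRATION_ORDER, preserving the
# original order within each model. Other models are left out.
def in_migration_order(resources)
  grouped = resources.group_by(&:model)
  MIGRATION_ORDER.flat_map { |model| grouped.fetch(model, []) }
end

resources = [
  QueuedResource.new('c1', 'Collection'),
  QueuedResource.new('w1', 'GenericWork'),
  QueuedResource.new('w2', 'GenericWork')
]

in_migration_order(resources).each do |resource|
  puts "would enqueue #{resource.model} #{resource.pid}"
end
```

In the version 3 console, each resource in the ordered list would be passed to `Scholarsphere::Migration::Job.perform_later` instead of being printed.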
Open a web page and visit the version 4 URL, but don't log in. You should begin to see works being migrated.
Visit https://scholarsphere.k8s.libraries.psu.edu/dashboard/profile/edit and make sure admin mode is enabled.
Visit https://scholarsphere.k8s.libraries.psu.edu/admin/queues/overview and verify the jobs are queued and are completing successfully.
At this point, you can stop and wait for the migration to finish, which will take a couple of hours.
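While waiting, progress can be summarized from the same columns that were reset earlier (`client_status`, `completed_at`). A sketch with a `Struct` stand-in for the resource records; `TrackedResource` and `migration_progress` are hypothetical, and the status values shown are invented, since the source does not list them:

```ruby
# Stand-in for the migration resource record (only the fields used here).
TrackedResource = Struct.new(:pid, :client_status, :completed_at)

# Summarizes how many resources have finished and how they are distributed
# across status values. Records without a status are counted separately.
def migration_progress(resources)
  by_status = resources.each_with_object(Hash.new(0)) do |resource, counts|
    counts[resource.client_status || 'no status yet'] += 1
  end
  {
    total: resources.size,
    unfinished: resources.count { |resource| resource.completed_at.nil? },
    by_status: by_status
  }
end

# Invented sample data.
records = [
  TrackedResource.new('a1', 'success', Time.now),
  TrackedResource.new('b2', 'failed', Time.now),
  TrackedResource.new('c3', nil, nil)
]

summary = migration_progress(records)
puts "#{summary[:unfinished]} of #{summary[:total]} resources unfinished"
```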
Timing: From 17 Nov. at 7 am until TBD
Change DEFAULT_URL_HOST to scholarsphere.psu.edu. During the migration we use the name scholarsphere.k8s.libraries.psu.edu.
vault kv patch secret/app/scholarsphere/prod DEFAULT_URL_HOST=scholarsphere.psu.edu
Delete the pods one at a time to pick up the new config
kubectl get pods -l app.kubernetes.io/name=scholarsphere
kubectl delete pod {{ hash }}
# wait for it to become healthy
kubectl delete pod {{ hash 2 }}
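The one-at-a-time restart above follows a simple pattern: delete a pod, wait until the application is healthy again, then move on to the next. A sketch of that loop, with the delete and health-check actions injected as callables so it can be exercised without a cluster; `rolling_restart`, its parameters, and the pod names are hypothetical:

```ruby
# Deletes pods one at a time, waiting for the health check to pass before
# moving on. Raises if a replacement never becomes healthy.
def rolling_restart(pods, delete:, healthy:, poll_interval: 5, max_polls: 60)
  pods.each do |pod|
    delete.call(pod)
    polls = 0
    until healthy.call
      polls += 1
      raise "not healthy after deleting #{pod} (#{polls} checks)" if polls >= max_polls

      sleep poll_interval
    end
  end
end

# Dry run with stubbed actions: the first health check fails, the rest pass.
events = []
health_results = [false, true, true]

rolling_restart(
  %w[pod-a pod-b],
  delete: ->(pod) { events << "delete #{pod}" },
  healthy: -> { events << 'check'; health_results.shift },
  poll_interval: 0
)

puts events.join(' -> ')
```

Against a real cluster, `delete` could shell out to `kubectl delete pod`, and `healthy` to whatever readiness check the team trusts. Injecting both keeps the ordering logic testable on its own.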
TODO: after Thurs Nov 5, change the sas records to libraries.psu.edu records that we control. Keep one for legacy and one for Scholarsphere 4.
- update DNS for scholarsphere.psu.edu from:
scholarsphere-lb.libraries.psu.edu IN A 146.186.106.173
to:
scholarsphere-lb.libraries.psu.edu IN CNAME ingress-prod.vmhost.psu.edu
See https://github.com/psu-stewardship/scholarsphere-4/issues/670
From the v.3 source application's console:
> Scholarsphere::Migration::Statistics.call
Copy the resulting csv file to the v.4 instance and run
bundle exec rake migration:statistics[statistics.csv]