Migration
Migration from Scholarsphere, version 3, to version 4.
Drawing on our previous experience migrating to Fedora 4, as well as other migrations, the basic strategy is:
- perform an inventory of works and collections in Scholarsphere 3
- migrate data by pushing from the Scholarsphere 3 application into Scholarsphere 4, most likely via an API
- check the inventory against the content in Scholarsphere 4 to verify all content has been transferred
Local testing can be done by running instances of both SS v.3 and SS v.4 locally and creating sample records in one to be migrated over to the other. Once the initial steps are complete and migration works in a local context, we can move into the QA environment and use duplicated content from production. The procedures developed in the QA environment will then be replicated against Scholarsphere production, using the version 4 resources that will ultimately become the version 4 production instance.
Log in to the production jobs server and update the scholarsphere-client.yml file.
vim /scholarsphere/config_prod_new/scholarsphere/scholarsphere-client.yml
Verify the current default settings:
SHRINE_CACHE_PREFIX: "cache"
S3_ENDPOINT: "https://s3.amazonaws.com"
SS_CLIENT_SSL: "false"
SS4_ENDPOINT: "https://scholarsphere.k8s.libraries.psu.edu/api/v1"
From another terminal session, log in to the current production version 4 deployment
KUBECONFIG=~/.kube/config-oidc-prod && export KUBECONFIG
kubens scholarsphere
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
From the pod's shell, start a Rails console session
bundle exec rails c
Print out the AWS environment variables that correspond to the configuration settings in the client yaml file:
ENV['AWS_REGION']
ENV['AWS_SECRET_ACCESS_KEY']
ENV['AWS_ACCESS_KEY_ID']
ENV['AWS_BUCKET']
Verify that the SS_CLIENT_KEY in the yaml file matches the token in the version 4 application. If they do not match, update the application's token with the one from the yaml file. Do not change the client key in the yaml file.
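If the external application record already exists in version 4, one way to compare is from its Rails console (a quick sketch; the record name matches the one used later in these instructions):
# List the tokens currently registered for the migration app and compare against SS_CLIENT_KEY
ExternalApp.find_by(name: 'Scholarsphere 4 Migration').api_tokens.map(&:token)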
To update the application's token, from the Rails console of the version 4 pod:
token = [paste client key from yaml file]
ExternalApp.find_by(name: 'Scholarsphere 4 Migration').api_tokens[0].update(token: token)
Write out any changes to scholarsphere-client.yml. If the file has changed, you will need to restart Resque; if not, you can proceed to deployment.
To restart the Resque process on the jobs server, first verify there are no currently running jobs in any queue https://scholarsphere.psu.edu/admin/queues/overview
Exit out of any current Rails console sessions on the server, and get root access to restart the process:
sudo su -
systemctl restart resque
Verify the processes have successfully restarted:
ps aux | grep resque
You should see multiple processes, one for each worker configured in every queue.
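Alternatively, a quick sketch from a Rails console on the jobs server, assuming the Resque gem is loaded in that console:
# Registered Resque workers and the queues they serve; the worker count should match the configuration
Resque.workers.size
Resque.queues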
If needed, tag and deploy the latest code from the main branch of the Scholarsphere 4 repo by following Production Deployment. Use the SHA of the commit or a 4.0.0.betaN tag, where N is any number greater than zero.
Run any tests, including test migrations, by following the steps below under Migration.
During normal business hours, tag and deploy version 4.0.0 to production according to Production Deployment.
Perform some minimal tests, including migrating a few sample records to ensure everything is operating correctly.
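A minimal sketch of such a spot-check from the version 3 console, assuming migration resource records already exist from an earlier inventory run:
# Queue a single work, then check its status once the job has finished
resource = Scholarsphere::Migration::Resource.where(model: 'GenericWork').first
Scholarsphere::Migration::Job.perform_later(resource)
resource.reload.client_status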
Now wait until 7 pm that evening
Open a new terminal window or tmux session and obtain a list of production pods:
KUBECONFIG=~/.kube/config-oidc-prod && export KUBECONFIG
kubens scholarsphere
kubectl get pods
Login to the Rails pod:
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
Remove any database data from any previous tests. This will completely delete all data from the database
DISABLE_DATABASE_ENVIRONMENT_CHECK=1 bundle exec rake db:schema:load
Re-seed the database with the required groups
bundle exec rake db:seed
Start a new Rails console session:
bundle exec rails c
Verify the groups were created
Group.all
From the console, remove any Solr index data:
Blacklight.default_index.connection.delete_by_query('*:*')
Blacklight.default_index.connection.commit
Verify there are no records
Blacklight.default_index.connection.get 'select', :params => {:q => '*:*'}
Exit out of both the Rails console and the pod shell, returning to your local shell, and restart the application by deleting both of the running Rails pods:
kubectl delete po -l app.kubernetes.io/name=scholarsphere
Verify that the pods have been restarted:
kubectl get pods
You should see two new pods in the list with a recent value for AGE.
Log back into the newly restarted Rails pod:
kubectl exec -it deployment/scholarsphere -- /app/bin/vaultshell
Create a new Rails console session:
bundle exec rails c
Leave the Rails session open.
- Log in to Penn State's AWS portal: http://login.aws.psu.edu/
- Go to Storage > S3
- Select the edu.psu.libraries.scholarsphere.prod bucket with the radio button
- Choose "Empty"
Make sure the empty operation completes before adding data into the bucket. Click into the bucket and make sure it's empty. Click "show versions" and make sure the versions are gone, too.
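As an optional double-check from the version 4 Rails console you left open (a sketch, assuming the aws-sdk-s3 client is available to the application):
# Both counts should be zero once the empty operation has finished
resp = Aws::S3::Client.new.list_object_versions(bucket: ENV['AWS_BUCKET'])
[resp.versions.count, resp.delete_markers.count]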
Verify there are no currently running jobs in any queue https://scholarsphere.psu.edu/admin/queues/overview
Put Scholarsphere 3 into "read-only" mode: https://github.com/psu-stewardship/scholarsphere/wiki/Read-Only-Mode
Log in to the psu-access terminal, and begin a new tmux session to log in to the Scholarsphere 3 production jobs server.
Check the swap status. If the full amount is not available, reset it:
sudo su -
swapoff -a
swapon -a
Start a new Rails console session on the jobs server:
sudo su - deploy
cd scholarsphere/current
bundle exec rails c production
Verify the Scholarsphere client configuration:
ENV['S3_ENDPOINT']
ENV['AWS_REGION']
ENV['SS4_ENDPOINT']
ENV['AWS_SECRET_ACCESS_KEY']
ENV['AWS_ACCESS_KEY_ID']
ENV['AWS_BUCKET']
Take note of the client key:
ENV['SS_CLIENT_KEY']
Return to the Rails console for the version 4 instance
Create an external application record using the token you copied from the version 3 console:
token = [paste token here]
ExternalApp.find_or_create_by(name: 'Scholarsphere 4 Migration') do |app|
app.api_tokens.build(token: token)
app.contact_email = '[email protected]'
end
Return to the Rails console for the version 3 instance, and update the list of works and collections to be migrated. Note: File sets are also included in this list even though their files are migrated with the works. They are added as records to the database so that we can verify their checksums at a later date.
ActiveFedora::SolrService.query('{!terms f=has_model_ssim}GenericWork,Collection,FileSet', rows: 1_000_000, fl: ['id', 'has_model_ssim']).map do |hit|
Scholarsphere::Migration::Resource.find_or_create_by(pid: hit.id, model: hit.model)
end
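As a sanity check, the inventory counts by model can be compared against the totals reported by Solr:
# Counts of inventoried resources, grouped by model (GenericWork, Collection, FileSet)
Scholarsphere::Migration::Resource.group(:model).count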
Clear out the results from any previous runs:
Scholarsphere::Migration::Resource.update_all(client_status: nil, client_message: nil, exception: nil, started_at: nil, completed_at: nil)
Queue up the jobs, with works first, then collections:
Scholarsphere::Migration::Resource.where(model: 'GenericWork').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
Scholarsphere::Migration::Resource.where(model: 'Collection').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
Open a web page and visit the version 4 URL, but don't log in. You should begin to see works being migrated.
Verify the migration jobs are running on the version 3 instance https://scholarsphere.psu.edu/admin/queues/overview
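To follow progress from the version 3 console, a quick sketch grouping the inventory by client status:
# Resources move from a nil client_status to 200, 201, or 303 as they migrate; other codes indicate errors
Scholarsphere::Migration::Resource.group(:client_status).count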
Stop. Wait until the maintenance window the following day
Estimated time to complete the migration is 5 hours.
Verify the migration completed by checking the resque queue on the version 3 production jobs server. There should be no currently running jobs. Take note of any failures. We'll reprocess these later.
Log in to the production jobs server for version 3 and start a new Rails console session. Alternatively, you can reopen the tmux session from last night.
Get a listing of the status responses from the version 4 client:
Scholarsphere::Migration::Resource.select(:client_status).distinct.map(&:client_status)
Look at the messages for any failed migrations: those where the client status is nil, or anything other than 200, 201, or 303.
Scholarsphere::Migration::Resource.where(client_status: nil).where.not(model: 'FileSet').map(&:exception)
Scholarsphere::Migration::Resource.where(client_status: 500).where.not(model: 'FileSet').map(&:message).uniq
Scholarsphere::Migration::Resource.where(client_status: 422).where.not(model: 'FileSet').map(&:message)
Nil statuses are usually RSolr errors, and these can simply be rerun:
Scholarsphere::Migration::Resource.where(client_status: nil).where.not(model: 'FileSet').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
Any remaining nil statuses should be Ldp::Gone errors and can be ignored.
Re-run the 500 errors as well, but this will not fix all of them:
Scholarsphere::Migration::Resource.where(client_status: 500).where.not(model: 'FileSet').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
The remaining 500 errors should be ActionController::UrlGenerationError errors and will need to be fixed post-migration. As of our most recent test migration, there are 16 of these.
422 errors are usually collections that are missing some of the works they need. First, try rerunning the jobs:
Scholarsphere::Migration::Resource.where(client_status: 422).where.not(model: 'FileSet').map do |resource|
Scholarsphere::Migration::Job.perform_later(resource)
end
These should all correspond to the 500 errors above. You can generate a report of the remaining 422 errors:
Scholarsphere::Migration::Resource.where(client_status: 422).where.not(model: 'FileSet').map do |resource|
{ model: resource.model, pid: resource.pid, message: resource.message }
end
File.write('422-report.json', _.to_json)
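Note: _ in the Rails console refers to the return value of the previous command, i.e. the array of hashes built above.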
Using the existing featured works in version 3, create a list of featured works from the version 4 console:
LegacyIdentifier.where(old_id: ['j3t945s668', '41n79h518r', '6dj52w505v']).map do |id|
FeaturedResource.create(resource_uuid: id.resource.uuid, resource: id.resource)
end
Reindex the works in Solr:
bundle exec rake solr:reindex_works
Update DNS
scholarsphere.psu.edu IN CNAME ingress-prod.vmhost.psu.edu
Change DEFAULT_URL_HOST to the public-facing DNS name scholarsphere.psu.edu
vault kv patch secret/app/scholarsphere/prod DEFAULT_URL_HOST=scholarsphere.psu.edu
Perform a rolling restart to have the pods pick up the new config
kubectl rollout restart deployment/scholarsphere
kubectl rollout status deployment/scholarsphere
Verify https://scholarsphere.psu.edu is the new version 4.0.0 instance. Note: This process can take up to 5 minutes
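If you'd like to confirm from a console rather than a browser, a sketch using Ruby's standard library (run from any machine with Ruby; results may lag due to DNS caching):
require 'resolv'
require 'net/http'
# The CNAME should now resolve to ingress-prod.vmhost.psu.edu
Resolv::DNS.open { |dns| dns.getresource('scholarsphere.psu.edu', Resolv::DNS::Resource::IN::CNAME).name.to_s }
# Check that the site responds; expect a 200 (or a redirect) once the rollout completes
Net::HTTP.get_response(URI('https://scholarsphere.psu.edu')).code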
On all three production VMs (web1, web2, jobs1), change service_instance to be scholarsphere-3.libraries.psu.edu
vim /opt/heracles/deploy/scholarsphere/current/config/application.yml
Restart apache on the web VMs
sudo su -
systemctl restart httpd
Restart Resque on the jobs VM
systemctl restart resque
Using Capistrano, we can clear out the crontabs on our version 3 hosts.
bundle exec cap prod whenever:clear_crontab
Send out appropriate emails and messages. TODO: More to come here.
At this point, 4.0.0 is officially released and is available to the public.
See https://github.com/psu-stewardship/scholarsphere-4/issues/670
From the v.3 source application's console:
Scholarsphere::Migration::Statistics.call
This will take several minutes. Afterwards, copy the file to your local account and gzip it:
cp statistics.csv /tmp
exit
cp /tmp/statistics.csv .
gzip statistics.csv
You should now have a file named statistics.csv.gz in your local account. Copy that to your laptop via scp, then copy it up to one of the scholarsphere pods:
kubectl get pods
kubectl cp ~/Downloads/statistics.csv.gz scholarsphere-xxxxxxxx-yyyyy:/app
Log into the pod:
kubectl exec -it scholarsphere-xxxxxxxx-yyyyy -- /bin/bash
Run the rake task to import the statistics:
source /vault/secrets/config
gunzip statistics.csv.gz
bundle exec rake migration:statistics[statistics.csv]
This will enqueue several thousand jobs to update each statistic. You can follow along from the Sidekiq queue
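If you prefer to check from the version 4 Rails console instead of the web UI, a sketch using Sidekiq's API (the queue name below is an assumption, not taken from the application config):
require 'sidekiq/api'
# Total jobs currently enqueued across all queues
Sidekiq::Stats.new.enqueued
# Size of a specific queue; 'default' is assumed here
Sidekiq::Queue.new('default').size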
Migrated files were verified using the etags calculated by Amazon's S3 service upon upload. These were either MD5 checksums or multipart checksums. The details for the verification can be found here
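For reference, a minimal sketch of the single-part case, where the S3 etag is simply the file's MD5 hex digest (multipart etags are instead an MD5 over the concatenated part digests with a -N part-count suffix, so they cannot be compared this way):
require 'digest'
# For a file uploaded in a single part, this should equal the object's S3 etag (minus the surrounding quotes)
Digest::MD5.file('path/to/local/file').hexdigest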