- Re-running takes time
- What if a file is no longer there?
- What if we need packages?
- Only download if necessary:
  - file is missing
  - file has changed
Download only if we want to download
global redownload 0
if $redownload == 1 {
// what to do when file does NOT exist
copy "https://datahub.io/core/country-codes/r/country-codes.csv" "data/raw/country-codes.csv", replace
}
Automatically download the file again if it is not there.
global redownload 0
capture confirm file "data/raw/country-codes.csv"
if _rc != 0 {
global redownload 1
}
if $redownload == 1 {
...
What if the file has changed?
if $redownload == 1 {
copy "https://datahub.io/core/country-codes/r/country-codes.csv" "data/raw/country-codes.csv", replace
// create checksum of file
// Aug 2023 version: 2295658388
global countrycksum 2295658388
checksum "data/raw/country-codes.csv", save replace
assert $countrycksum == r(checksum)
// This will fail if the files are not identical
// Provide a verbose message if we get past this point
disp in green "Country codes file downloaded successfully"
}
Be informative!
...
}
else {
// what to do when file does exist
disp in green "Country codes file already exists"
}
Step 3 (with robust download code)
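Putting the pieces from the previous slides together, the complete download block might look roughly like this. It is a sketch assembled from the snippets above, not necessarily the exact file used in the course. The two `capture mkdir` lines are an addition (so `copy` does not fail on a fresh checkout), and the checksum is the Aug 2023 value quoted above, so the `assert` will stop the script if the upstream file has changed since then.

```stata
// 01_download_data.do: sketch assembled from the snippets above
// Set to 1 to force a fresh download even if the file already exists
global redownload 0

// Re-download automatically if the file is missing
capture confirm file "data/raw/country-codes.csv"
if _rc != 0 {
    global redownload 1
}

if $redownload == 1 {
    // added: make sure the target folders exist (no error if they do)
    capture mkdir "data"
    capture mkdir "data/raw"
    copy "https://datahub.io/core/country-codes/r/country-codes.csv" ///
         "data/raw/country-codes.csv", replace
    // Aug 2023 version: 2295658388
    global countrycksum 2295658388
    // "replace" lets checksum overwrite an existing .sum file on re-runs
    checksum "data/raw/country-codes.csv", save replace
    // stops with an error if the file differs from the expected version
    assert $countrycksum == r(checksum)
    disp in green "Country codes file downloaded successfully"
}
else {
    // what to do when the file already exists
    disp in green "Country codes file already exists"
}
```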
- Avoid instructions that require manual steps, such as:
  - "Change the parameter to 0.2, then run the code again"
  - "Compute the percentages for Table 2 by hand"
- Instead:
  - Use functions, ado files, programs, macros, subroutines
  - Use loops, parameters, parameter files to call those subroutines
  - Use placeholders (globals, macros, libnames, etc.) for common locations ($CONFDATA, $TABLES, $CODE)
  - Compute all numbers in the package
  - No manual calculation of numbers
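As a small illustration of these points, instead of asking a replicator to change a parameter and recompute a percentage by hand, the parameter can be passed to a program and the results produced in a loop. This sketch uses Stata's bundled `auto` dataset; the program name `share_above` and the cutoff values are hypothetical.

```stata
// Hypothetical program: percent of non-missing observations with var > cutoff
capture program drop share_above
program define share_above, rclass
    syntax varname, cutoff(real)
    quietly count if `varlist' > `cutoff' & !missing(`varlist')
    local above = r(N)
    quietly count if !missing(`varlist')
    local n = r(N)
    return scalar pct = 100 * `above' / `n'
end

sysuse auto, clear
// Loop over parameter values instead of editing and re-running by hand
foreach c of numlist 4000 6000 8000 {
    share_above price, cutoff(`c')
    display as text "Share of cars with price > `c': " %5.1f r(pct) "%"
}
```

The same program can then be called from the table-producing code, so every number in the package is computed rather than typed in.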
// Header of main.do
// Define which steps should be run
global step1 1
global step2 1
global step3 1
do "code/00_setup.do"
if $step1 == 1 do "code/01_download_data.do"
if $step2 == 1 do "code/02_create_analysis_sample.do"
if $step3 == 1 do "code/03_analysis.do"
Here we always run the 00_setup.do file.
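One job 00_setup.do can take on is the earlier question of packages: install the user-written commands the project needs, but only if they are missing. A minimal sketch, assuming the project needs `estout` from SSC; the package list is a placeholder, and the check assumes each package provides a command with the same name.

```stata
// 00_setup.do: hypothetical sketch, package list is a placeholder
set more off

local packages "estout"
foreach p of local packages {
    // `which' returns a non-zero code if the command is not installed
    capture which `p'
    if _rc != 0 {
        display as text "Installing `p' from SSC"
        ssc install `p', replace
    }
}
```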
Then we conditionally run the other pieces, depending on the step1, step2, and step3 globals.
- Let's use a separate config.do file to contain configuration parameters:
// file locations
// code to set rootdir omitted
global inputdata "$rootdir/data/inputs"
global tempdata "$rootdir/temporary"
global outputs "$rootdir/tables-figures"
// ensure they are created
cap mkdir "$tempdata"
cap mkdir "$outputs"
So let's automate some of this:
include "config.do"
// define steps
global step1 1
global step2 1
global step3 1
// Nothing needs to be changed here
do "$rootdir/code/00_setup.do"
if $step1 == 1 do "$rootdir/code/01_download_data.do"
if $step2 == 1 do "$rootdir/code/02_create_analysis_sample.do"
if $step3 == 1 do "$rootdir/code/03_analysis.do"
Configure the steps based on certain conditions:
include "config.do"
// define steps
global step1 1
global step2 1
global step3 1
// verify if the result file has changed (only if it exists yet)
capture confirm file "$resultfile1"
if _rc == 0 {
    qui checksum "$resultfile1"
    // if it has not changed, don't run Step 2
    if `r(checksum)' == $checksum1 global step2 0
}
// Nothing needs to be changed here
do "$rootdir/code/00_setup.do"
if $step1 == 1 do "$rootdir/code/01_download_data.do"
if $step2 == 1 do "$rootdir/code/02_create_analysis_sample.do"
if $step3 == 1 do "$rootdir/code/03_analysis.do"
And config.do contains additional information:
// file locations
// code to set rootdir omitted
global inputdata "$rootdir/data/inputs"
global tempdata "$rootdir/temporary"
global outputs "$rootdir/tables-figures"
// ensure they are created
cap mkdir "$tempdata"
cap mkdir "$outputs"
// some key parameters
global resultfile1 "$outputs/table1.tex"
global checksum1 386698503
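The reference value in `checksum1` has to come from somewhere. One way, sketched here, is to run the pipeline once, compute the checksum of the output with Stata's `checksum` command, and paste the displayed number into config.do. The demo file `checksum-demo.txt` is purely illustrative; in practice you would run `checksum` on `$resultfile1`. The explicit `%12.0f` format matters: with the default display format, a 10-digit checksum is shown in exponential notation and cannot be copied exactly.

```stata
// Compute a reference checksum once, then copy the number into config.do
// (illustrative: writes a small demo file to the current folder)
tempname fh
file open `fh' using "checksum-demo.txt", write text replace
file write `fh' "Table 1 placeholder" _n
file close `fh'

quietly checksum "checksum-demo.txt"
// print the full integer, not the default exponential display
display as text "checksum: " %12.0f r(checksum)
// In practice: checksum "$resultfile1", then set global checksum1 to it
```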
Consider a final test of whether everything runs:
- delete the temporary/ and tables-figures/ folders
  - you might even delete the downloaded files
- then run the main.do file again.
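The deletion step can itself be scripted. A minimal sketch, using a hypothetical program name `clean_folder` and assuming each folder contains only files (no subfolders), since Stata's `rmdir` removes only empty directories. On Windows, the `dir` macro function may return lowercased file names unless its `respectcase` option is used.

```stata
// Hypothetical helper: delete all files in a folder, then the folder itself
capture program drop clean_folder
program define clean_folder
    args folder
    // list of (quoted) file names in the folder
    local files : dir "`folder'" files "*"
    foreach f of local files {
        erase "`folder'/`f'"
    }
    // rmdir only removes empty folders; capture avoids stopping if it fails
    capture rmdir "`folder'"
end

// Usage (assumes config.do has been included, defining the globals):
// clean_folder "$tempdata"
// clean_folder "$outputs"
// do "$rootdir/main.do"
```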