Changes to the manual

larsvilhuber · Jun 30, 2024 · c89e7c4
1 parent 6a2485d
Showing 6 changed files with 192 additions and 176 deletions.
diff --git a/00-targets.md b/00-targets.md
@@ -5,9 +5,9 @@ We want to check
 ::: {.incremental}
 
 - that your code runs without problem, after all the debugging.
-- that your code runs without manual intervention.
+- that it actually produces all the outputs
+- that your code runs without manual intervention, and with low effort.
 - that your code generates a log file that you can inspect, and that you could share with others.
 - that it will run on somebody else's computer
-- that it actually produces all the outputs
 
 :::
diff --git a/03-automatically_saving_figures.md b/03-automatically_saving_figures.md
@@ -4,9 +4,9 @@
 Say you have 53 figures and 23 tables, the latter created from 161 different specifications. That makes for a lot of work when re-running the code, if you haven't automated the saving of said figures and tables. 
 
 
-```{warning}
+:::{warning}
 We have seen instructions that tell the replicator to right-click and save the figures. While there is no substitute for comparing all these figures, that's too much work!
-```
+:::
 
 ## TL;DR
 
@@ -71,8 +71,16 @@ ggsave(ggp,file.path(figures,"figure1.png"))
 
 :::{tab-item} Python
 
+There are many ways to do this in Python, which is often used for "headless" (non-interactive) processing. Even when using Jupyter notebooks, you should save the figures to files!
+
 ```python
-Need example
+import matplotlib.pyplot as plt
+import os
+
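+plt.plot([1, 2, 3], [1, 4, 9])
+plt.savefig(os.path.join("figures",'foo.png'))  # the "figures" directory must already exist
+plt.savefig(os.path.join("figures",'foo.pdf'))
+plt.show()  # call after savefig(); saving after show() can produce an empty image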
 ```
 
 

diff --git a/04-01-creating_log_files_manually.md b/04-01-creating_log_files_manually.md
@@ -0,0 +1,69 @@
+(creating-log-files-explicitly)=
+# Creating log files explicitly
+
+We start by describing how to explicitly generate log files as part of the statistical processing code.
+
+::::{tab-set}
+
+
+:::{tab-item} Stata
+
+```stata
+global logdir "${rootdir}/logs"
+cap mkdir "$logdir"
+local c_date = c(current_date)
+local cdate = subinstr("`c_date'", " ", "_", .)
+local c_time = c(current_time)
+local ctime = subinstr("`c_time'", ":", "_", .)
+local globallog = "$logdir/logfile_`cdate'-`ctime'-`c(username)'.log"
+log using "`globallog'", name(global) replace text
+```
+
+To do this automatically each time Stata starts, see the [Stata manual](https://www.stata.com/manuals/gswb.pdf#gswB.3).
+
+:::
+
+:::{tab-item} R
+
+```R
+# This will only log output ("stdout") and warnings/messages ("stderr"), but not the commands themselves!
+
+logfile.name <- paste0("logfile_", Sys.Date(),"-",format(as.POSIXct(Sys.time()), format = "%H_%M"),"-",Sys.info()["user"], ".log")
+globallog    <- file(file.path(rootdir,logfile.name), open = "wt")
+# Send output to logfile
+sink(globallog, split=TRUE)
+sink(globallog, type = "message")
+
+## At the end of the script: revert output back to the console
+sink(type = "message")
+sink()
+close(globallog)
+```
+
+:::
+
+:::{tab-item} MATLAB
+
+```matlab
+diary('logfile.log')  % sketch: "diary" copies Command Window text to the named file (file name is only an example); "diary off" stops it
+```
+:::
+
+:::{tab-item} Python
+
+```python
+# The logging module can achieve this.
+import logging
+logging.warning('Watch out!')
+```
+This prints the following to the console (standard error), not to a file:
+
+```
+WARNING:root:Watch out!
+```
+
+:::
+
+::::
+
+While some software (Stata, MATLAB) creates log files that contain both commands and output, other software (R, Python) creates log files that contain only output.
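
Note on the Python tab in `04-01-creating_log_files_manually.md`: the `logging.warning` call shown there prints to the console and does not by itself create a file. A minimal sketch of a Python counterpart to the Stata and R tabs might look like the following. The function name `start_log`, the `logs` subdirectory, and the file-name pattern are illustrative assumptions modeled on the Stata example, not part of the commit.

```python
import getpass
import logging
import time
from pathlib import Path


def start_log(rootdir="."):
    """Open a timestamped log file under <rootdir>/logs and return its path.

    Hypothetical helper: name and layout mirror the Stata example above.
    """
    logdir = Path(rootdir) / "logs"
    logdir.mkdir(parents=True, exist_ok=True)
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # no user name available in this environment
    stamp = time.strftime("%Y-%m-%d-%H_%M_%S")
    logfile = logdir / f"logfile_{stamp}-{user}.log"
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        # Write every message to the file and also show it on the console.
        handlers=[logging.FileHandler(logfile), logging.StreamHandler()],
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return logfile


logfile = start_log()
logging.info("Starting main analysis")
logging.warning("Watch out!")
```

As with R's `sink()`, this records only what is passed to `logging`, not the commands themselves; plain `print()` output is not captured.
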
diff --git a/04-02-creating_log_files_automatically.md b/04-02-creating_log_files_automatically.md
@@ -0,0 +1,104 @@
+(creating-log-files-automatically)=
+# Creating log files automatically
+
+An alternative (or complement) to creating log files explicitly is to use the software's native functionality to create them. This is usually triggered by running the software from the **command line**, and thus may be considered an **advanced topic.** The examples below are for Linux/macOS, but similar functionality exists for Windows.
+
+
+:::::{tab-set}
+
+
+::::{tab-item} Stata
+
+To automatically create a log file, run Stata from the command line with the `-b` option:
+
+```bash
+stata -b do main.do
+```
+
+which will create a file `main.log` in the same directory as `main.do`. 
+
+:::{warning}
+For this to work, the filename cannot include spaces.
+:::
+
+On Windows, follow the instructions [here](https://www.stata.com/manuals/gswb.pdf#gswB.5).
+
+::::
+
+::::{tab-item} R
+
+To automatically create a log file, run R from the command line as follows:
+
+```bash
+R CMD BATCH main.R
+```
+
+which will create a file `main.Rout` in the same directory as `main.R`.
+
+:::{warning}
+If the R code itself redirects output, for example with `sink()`, the redirected output will be missing from the `main.Rout` file.
+:::
+
+::::
+
+::::{tab-item} MATLAB
+
+To automatically create a log file, run MATLAB from the command line as follows:
+
+```bash
+matlab -nodisplay -r "addpath(genpath('.')); main" -logfile matlab.log
+```
+
+A similar command on Windows would be:
+
+```bash
+start matlab -nosplash  -minimize -r  "addpath(genpath('.'));main"  -logfile matlab.log
+```
+
+::::
+
+::::{tab-item} Julia, Python
+
+To capture screen output in Julia and Python on Unix-like systems (Linux, macOS), run:
+
+```bash
+julia main.jl | tee main.log
+```
+
+or 
+
+```bash
+python main.py | tee main.log
+```
+
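+which uses the `tee` command to write the program's standard output to `main.log` while still displaying it on the console. Error messages (standard error) are only captured if you redirect them as well, for example by adding `2>&1` before the pipe.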
+
+::::
+
+:::::
+
+## Takeaways
+
+### What this does
+
+This ensures
+
+- that your code runs without problem, after all the debugging.
+- that your code runs without manual intervention.
+- that your code generates a log file that you can inspect, and that you could share with others.
+
+### What this does not do
+
+This does not ensure
+
+- that it will run on somebody else's computer
+  - because it does not guarantee that all the software is there
+  - because it does not guarantee that all the directories for input or output are there
+  - because many intermediate files might be present that are not in the replication package
+  - because it does not guarantee that all the directory names are correctly adjusted everywhere in your code
+- that it actually produces all the outputs
+  - because some outputs might be present from test runs
+
+### What to do next
+
+To solve some of these problems, let's go to the next step.
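
Note on the Julia/Python tab in `04-02-creating_log_files_automatically.md`: a plain `| tee` pipeline forwards only standard output, and `tee` is not available by default on Windows. One possible portable alternative is a small Python wrapper, sketched below. The function name `run_and_log` and the inline demo command are hypothetical stand-ins; in practice the command would be something like `["python", "main.py"]`.

```python
import subprocess
import sys


def run_and_log(cmd, logfile):
    """Run cmd, echo its stdout and stderr to the console, and copy both to logfile."""
    with open(logfile, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible on the console
            log.write(line)         # and saved to the log file
        return proc.wait()


# Hypothetical usage: a tiny inline script stands in for main.py.
demo = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
exit_code = run_and_log(demo, "main.log")
```

Returning the exit code lets a calling script stop when the logged program fails, which a bare `tee` pipeline does not do by default.
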