-
-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revision of Episode 2 (Using your GPU with CuPy) #105
Conversation
If plans work out as planned, I intend to provide all remarks on Episode 2 (Using your GPU with CuPy) in this dedicated branch. Additional sub-branches may concern (i) linguistic editing (ii) pedagogical suggestion. In case some suggestion involves both aspects of feedback, I will make choice based on judgement.
Semantic newlines means that one sentence takes one line only, hence all lines that contain sentences are one-liners. This makes it easy to isolate the changes the file undergoes. A few broken lines have been merged into one. More frequently, lines containing several sentences have been split at the full stop plus a whitespace. A side effect of this rule occurs with i.e. e.g. dev. std. vs. and suchlike. Especially troublesome is when numbers are followed by a dot and whitespace, like in array outputs. Sometimes a spurious blank line appeared. Mind also when you break a comment since the new line requires the # token at the start. I have applied the changes with the dialogue of the text editor, using regular expressions and then git diff, repeatedly until there were no side effects. No side effect means that the old version is one line and the new version is more lines, except when I blended broken lines; git diff shows the state of the play neatly. Sometimes I have changed a double into a single blank line (hopefully this is immaterial for styling). I have done this ahead of working on linguistic and/or pedagogic editing. This order should then be applied when editing other lesson episodes (in other branches), for consistency of method. Writing a sed or awk script to apply these changes correctly and consistently across different files is a TODO.
Being the developers from the Netherlands eScience CentER, I have applied a round of spellchecking in US English. This affected the body of the text as well as the name of variables (labelled > labeled). This very broad type of editing can be applied ahead of distinguishing linguistic and pedagogic issues.
This revision has aimed at improving the learninge experience by means of: (1) preserving content, barring a few sentences repeated within a short range, and some incidental observations that could still be inferred from the context; (2) merging, splitting and relocating sentences in order to provide a reading experience in consistent blocks, mostly ordered from general to specific notions, so that a topic is opened and closed within a narrow span of text; (3) changing the text at word level to reduce repetitions and/or convey more precision; (4) only comments have been edited in the code lines; editing the code lines is to be handled in a separate branch named E02-code. Although conciseness and brevity are different notions, the word count went from 4660 to 4568. INFORMATION CHANGES The revision deserves revision as far as meanings are concerned. Broadly speaking, editing was supposed to streamline the reading/learning experience and not to alter the content, which may have happened unintentionally. However, the following changes are deliberate and should be revised all the more: (1) the connection between CPU and GPU. Rather than with a cable. I propose they are more likely connected with some interconnect plug. I have add a Wikipedia link to interconnects and refrained to expand on hardware specifications too widely, for example mentioning NVLink and developments. (2) the status of cupyx as a namespace of CuPy. I even got somewhere that once upon a time cupyx used to be referred to as cupy.cupyx. At any rate, cupyx is documented inside Cupy. I have added a link to the official documentation. My new wording wanted to convey the relatedness of CuPy and cupyx and can be inaccurate, but also read (1) next. FUTURE Remarks not implemented but worth considering in the future are: (1) clarifying from the outset overlaps and distinctions between the terms API v libraries v packages v modules, and between methods v functions v routines. Then, stick to a preferred usage nor to confuse learners neither to reinforce confusions; (2) if not done in a previous episode, mentioning that we are only coding on Nvidia graphics cards because of the CUDA lock; (3) revising this episode's questions and objectives (at the top of the page) in a separate branch to be named E02-pedagogy. ONE-OFF NOTE ON LINGUISTIC EDITING As seen here, linguistic revision takes time and space. In terms of psychology of copy editing, it would be unsurprising if the authors of the original content feel uneasy about changes at first. I hope nobody takes umbrage at that, although this is a ritual moment of copy editing. Exposing the reasoning behind the changes is a handle to reframe the owner's view of a text and evaluate the new state of the play with a dispassionate eye. The risk of cutting corners is, naturally, to lose edge.
This continues commit db44928 with the following changes: (1) explain that the variable `deltas` refers to Dirac delta functions; (2) add warning about multiple meaning of word kernel, and use filters for convolution kernels; (3) clean badness that turn up rendering the Rmd (R markdown) file in RStudio; (4) polish expressions that could be more precise.
The revision work of commit 9640c8c is extended to the section 'NumPy routines on the GPU' included. The spirit is the same as in commit 9640c8c Some edits leaked in the rest of the documents, either by programmatic substitution or because of establishing an order of appearance (abbreviations). Left to do is the second half of the episode with the astronomy example.
It seems clear that the previous variable name 'deltas' did not express generic finite differences but an application of Dirac delta function to the image pixels. 'diracs' is therefore more precise insofar as it rules out any ambiguity. Since the convolution filter is called gauss, the pair diracs/gauss sounds also well balanced. Any other choice is of course possible, but without the drawbacks of the delta terminology
…filter The mean 'muu' is now 'origin' like in the illustration in the same section. 'dst' is now 'dist' which links more easily to the distance it means.
These changes concern the first episode half on image convolution. The notation is more uniform as to the usage of lower/uppercase and suffixes, and is possibly more concise. This is the table of changes: convolved_image_using_CPU > convolved_image_cpu convolved_image_using_GPU > convolved_image_gpu convolved_image_using_GPU_copied_to_host > convolved_image_gpu_in_host diracs_1d > diracs_1d_cpu execution_gpu > benchmark_gpu gauss_1d > gauss_1d_cpu gpu_avg_time > gpu_execution_avg transfer_compute_transferback > push_compute_pull (Git-inspired!)
I looked at the first half in its entirety with a fresh pair of eyes before moving to the second episode half on astronomy. I would call these edits either stylistic (reading flow) or pedagogic (announce what you explain). Although the work is not complete, there should be no unglued bits left.
… on Astronomy Here linguistic editing has been largely functional to pedagogy improvements. UNTIL THE SECTION ON BACKGROUND CHARACTERIZATION (STEP 1) The style goes too conversational about specialist knowledge, and seems to imply that just speaking of specialist knowledge is sufficient for learners to focus on the applications in GPU programming. In fact, the current implementation of a good idea approach engages uninitiated learners with considerable extraneous load at the expense of the cognitive load. This may prove quite tiresome during the course and while revising the material. The editing work so far to limit the side effects of a specialist application involved among others: (1) re-locate sentences so as to guide the readers from more- to less-known notions; (2) add a few Wikipedia links for exotic notions; (3) highlight/disclose assumptions, purposes and conclusions for the uninitiated readers. Hopefully, the uninitiated reader is better prepared to swallow the purpose of this section. Please double check for correctness. FROM SECTION ON SEGMENTING (STEP 2) ONWARD The text and logic become too hard to unravel and follow, both linguistically and pedagogically. The germane and extraneous loads for learners are considerable and possibly excessive. Intervening on these sections requires too much guesswork. Only a light linguistic pass has then been applied as a foothold for suggested work. OVERALL ON THE ASTRONOMY PART After having read the text and code several times, I guess that the purpose of this standard routine in radio astronomy is to find out sources lost WITHIN the background, which is then translated into finding the sources WITHIN the clipped dataset. Realising this (or anything that applies instead) should not be guesswork in the first place. The course should provide the necessary, accompanying and lightweight information early on. Also, the assisting commentary should be tuned in well with the overall statement of purpose of the method. Unglued specialist claims (for example, those on the probability of finding bright pixels at random) should fall into place without raising questions (how did you compute that probability? why is it equally relevant to the initial image of 2048² pixels and to the clipped image, which is arguably smaller?) The course should play ahead of these doubts. Shifting attention between inconsistent/incomplete specialist claims muddles the notions of GPU programming. As a loadstar, the learner should recognize in the text -- as easily as possible -- what to do, how to do that, and why to do that. (This extended commit message may well become the explanation of a pull request/issue.)
I previously I had unintendely merged into E02-language the E02-code on the fork (that is, on my GitHub repo). Here, I am just experimenting a bit. Here I intend to merge the branch E02-code branch that is on my local repository.
…attributes Using this native error message as example: Please use `.get()` to construct a NumPy array explicitly. I have applied the following usage: (1) Functions are typeset as `function()` (2) Methods as either `.method()` or `module.method()` (3) Variables as `variable` (4) Attributes as `.attribute` No decision made yet as to the Jupyter magics. This is an example of the implementation The previous code executes `convolve2d_gpu()` ten times, and stores the execution time of each run, in seconds, inside the `.gpu_times` attribute of the variable `benchmark_gpu`. We can then compute the average execution time and print it, as shown. For me: this commit shows the limit of the division into the branch E02-code and E0-language. The edits in this commit are in-line code text, so editing this code directly in the language-dedicated branch saves me one merge with conflicts
Thank you!Thank you for your pull request 😃 🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}. If you have files that automatically render output (e.g. R Markdown), then you should check for the following:
Rendered Changes🔍 Inspect the changes: https://github.com/carpentries-incubator/lesson-gpu-programming/compare/md-outputs..md-outputs-PR-105 The following changes were observed in the rendered markdown documents:
What does this mean?If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation. ⏱️ Updated at 2023-11-28 13:12:31 +0000 |
In order to help evaluate the changes, the attached files are the PDFs of the episode at the commits
The links above point to the documents. Take note that the number of pages of the documents differs because they were rendered with a different font size at print time. I could not even out this difference with elemental fiddling. Nonetheless, the original document contains 4659 words (Rmd source), the revised one 4519 words. Keep in mind, at any rate, that brevity and conciseness are two different notions, so a shorter document is not necessarily clearer. |
Seems like you put a lot of effort into this, great! Sadly, I can currently not review this, since lesson development has been suspended until further notice and at least until the end of this year. |
Sorry for the very long delay @wmotion, I approved your changes and we will test them in December at a new workshop. |
Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : 74c6fea Branch : md-outputs Author : GitHub Actions <[email protected]> Time : 2024-11-04 09:40:20 +0000 Message : markdown source builds Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
No. One needs to assess the statistics of the background pixels, i.e. the image pixel values outside sources. To do so, one has to separate background pixels from source pixels. A technique to do this is called The standard deviation of the background pixel values reflects the sensitivity of the observation. A common standard is to threshold at 5 times that standard deviation: every pixel value above that threshold has a very high probability of being a cosmic source and not a noise peak. |
You wrote:
That is certainly not the goal, the goal of |
Thanks for clarifying this. Please edit the lesson material so that this misunderstanding cannot arise any longer. |
See #105 (comment) for the same pedagogic approach. |
Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : 85da916 Branch : md-outputs Author : GitHub Actions <[email protected]> Time : 2024-11-05 01:21:58 +0000 Message : markdown source builds Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : 85da916 Branch : md-outputs Author : GitHub Actions <[email protected]> Time : 2024-11-05 01:21:58 +0000 Message : markdown source builds Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : 038206e Branch : md-outputs Author : GitHub Actions <[email protected]> Time : 2024-11-19 01:30:59 +0000 Message : markdown source builds Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
Auto-generated via `{sandpaper}` Source : f71341a Branch : md-outputs Author : GitHub Actions <[email protected]> Time : 2024-11-26 01:28:05 +0000 Message : markdown source builds Auto-generated via `{sandpaper}` Source : b28cc41 Branch : main Author : Alessio Sclocco <[email protected]> Time : 2024-11-04 09:34:14 +0000 Message : Merge pull request #105 from wmotion/Episode02 Revision of Episode 2 (Using your GPU with CuPy)
The revision in this pull request covers specifically episode two. At the time of writing, it did not cover the questions and objectives (top of page) and the keypoints (bottom of page). The last three sections have been revised only lightly; suggestions as to how to proceed are given. Specific issues may follow at a later time.
The topic branch tied to this pull request is
Episode-02
(thus, notmain
). Its first commit is 'empty' (that is, no change in working tree) just to mark the detachment from the main branch. This branch also contains merges from secondary branches dedicated to code changes, linguistic changes, pedagogic changes. The arrangement of these secondary branches in the fork is my own experimentation with content management: they should be of little import for the pull request.The beginning of the commit subject indicates whether the focus was on code changes, linguistic changes, or pedagogical changes. The last two are largely tied to each other: pedagogical changes implies more attention for the process of learning and for ways to encourage learning and retention. Some commit messages are lengthy with a (hoped-for) purpose to explain the underlying assumptions and possible alternatives.
Of course, it holds for all edits that the course developers/content experts should double-check whether the new text has stayed true to the original intentions. Long live version control!
SPECIAL COMMITS
The commits from 1abb082 through cdf743d regard semantic newlines (one sentence = one line) and rote spellchecking. This handling makes it possible exploit the Git toolbox to maximum clarity, using the same units Git uses for comparisons. Think of informative
git diff
andgit merge
operations. Therefore, pulling these commits should be fail-safe and is recommended at any rate.Commits db44928 and 591d289 explain the editing approach to the first half of the episode (Convolution), at least as is was conceived at the start. Other commits could also contain relevant annotations, though.
Commit 9dd5c8c explains the unfinished work on the second half of the episode (image processing in astronomy).
ACKNOWLEDGED ERRORS (SO FAR)
In commit 9dd5c8c I had not understood the direction of the image processing method yet. The standard deviation of a clipped data is used as standard deviation for the entire data set (as I understood it now). Please take this as a piece of evidence that the method is not explained clearly.