Releases: simonw/shot-scraper
0.9
- New shot-scraper javascript command for executing JavaScript against a web page and returning the result to the console as JSON: #38

      % shot-scraper javascript datasette.io document.title
      "Datasette: An open source multi-tool for exploring and publishing data"

  This can be used for web scraping and data extraction. Any JavaScript errors will cause the command to return an exit code of 1, so this can also be used to run tests against a website from within a continuous integration environment such as GitHub Actions.
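As a rough illustration of the scraping and CI use described above, here is a minimal Python sketch that wraps the command, decodes its JSON output, and treats a non-zero exit code as a failure. The function name `run_javascript` and its `runner` parameter are hypothetical (they are not part of shot-scraper). The sketch assumes only the command shape and exit-code behavior stated in these notes.

```python
import json
import subprocess


def run_javascript(url, expression, runner=subprocess.run):
    """Run `shot-scraper javascript URL EXPRESSION` and return the decoded JSON.

    `runner` is a hypothetical injection point (defaults to subprocess.run)
    so the wrapper can be exercised without a browser installed.
    """
    result = runner(
        ["shot-scraper", "javascript", url, expression],
        capture_output=True,
        text=True,
    )
    # Per the release notes, JavaScript errors produce an exit code of 1.
    if result.returncode != 0:
        raise RuntimeError(f"shot-scraper failed: {result.stderr.strip()}")
    # The command prints its result to the console as JSON.
    return json.loads(result.stdout)
```

Calling `run_javascript("datasette.io", "document.title")` would mirror the example above. In a CI job, the raised exception (or the underlying exit code) is what fails the build.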
- The shot-scraper pdf and shot-scraper accessibility commands can both now be used with local files in addition to URLs. #37
- The output: key is no longer required in YAML shot configuration: if omitted, an automatic filename will be used instead. #40
- An empty YAML file no longer produces an error. #41
0.8
0.7
- The shot-scraper shot and shot-scraper pdf commands both now default to writing a file to disk if no filename is specified, using a name derived from the URL. If you want to write the PNG or PDF content to standard output you can do so by passing -o - (a dash in place of the filename). #32
- New --retina flag for shot-scraper shot and shot-scraper multi which causes the screenshot to be taken with a device scale factor of 2. #33
- shot-scraper shot --devtools option opens an interactive browser window with the browser developer tools enabled. #34
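The standard-output mode mentioned above makes it possible to capture a screenshot in memory rather than on disk. The following is a minimal sketch under that assumption. The helper name `capture_png`, its `runner` parameter, and the PNG signature check are illustrative additions, not part of shot-scraper.

```python
import subprocess

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def capture_png(url, retina=False, runner=subprocess.run):
    """Return the PNG bytes written by `shot-scraper shot URL -o -`.

    `runner` is a hypothetical injection point (defaults to subprocess.run).
    """
    cmd = ["shot-scraper", "shot", url, "-o", "-"]
    if retina:
        # --retina takes the screenshot with a device scale factor of 2.
        cmd.append("--retina")
    # check=True makes subprocess.run raise if the command exits non-zero.
    result = runner(cmd, capture_output=True, check=True)
    data = result.stdout
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("output does not look like a PNG")
    return data
```

Keeping the bytes in memory avoids temporary files when the image is passed straight to another step, such as an upload or an image diff.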
0.6
- Now supports taking screenshots of pages that require authentication. #18

  The following command will open a browser window for the specified website, wait for you to manually authenticate and hit <enter> in the terminal, and then write the resulting authentication context out to auth.json:

      shot-scraper auth https://github.com/ auth.json

  You can then take authenticated screenshots like this:

      shot-scraper https://github.com/notifications \
        --auth auth.json -o notifications.png

  The -a/--auth option is also supported by the multi, pdf and accessibility commands.
- The shot-scraper command can now open a browser in which you can interact with a page before the screenshot is taken: #31

      shot-scraper https://simonwillison.net/ \
        -o after-interaction.png \
        --height 800 --interactive

  This will output:

      Hit <enter> to take the shot and close the browser window:
      # And after you hit <enter>...
      Screenshot of 'https://simonwillison.net/' written to 'after-interaction.png'
- You can now pass multiple CSS selectors in order to take a screenshot of the smallest area that encompasses all of the content referenced by those selectors: #21

      shot-scraper https://simonwillison.net/ \
        -s '#bighead' -s .overband \
        -o bighead-multi-selector.png

  Add --padding 20 to include an additional 20px of padding around the specified area.

  The YAML format used by shot-scraper multi also now supports multiple CSS selectors, which look like this:

      - output: bighead-multi-selector.png
        url: https://simonwillison.net/
        selectors:
        - "#bighead"
        - .overband
        padding: 20
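When many shots share the same shape, the YAML entries can be generated rather than written by hand. Below is a minimal sketch that renders one entry using only the keys shown above (url, output, selectors, padding). The function `render_shot` is hypothetical. It assumes the URL and output filename are simple values that need no YAML quoting.

```python
import json


def render_shot(url, output=None, selectors=(), padding=None):
    """Render one list entry for a shots YAML file as text.

    Assumes `url` and `output` are plain values that are safe unquoted.
    """
    lines = [f"- url: {url}"]
    if output is not None:
        lines.append(f"  output: {output}")
    if selectors:
        lines.append("  selectors:")
        for selector in selectors:
            # Always quote selectors: an unquoted leading "#" would start a
            # YAML comment. json.dumps emits a double-quoted, escaped string.
            lines.append(f"  - {json.dumps(selector)}")
    if padding is not None:
        lines.append(f"  padding: {padding}")
    return "\n".join(lines) + "\n"
```

Concatenating the rendered entries and writing them to a file such as shots.yml would give input for shot-scraper multi. Quoting every selector is a deliberate simplification, since it avoids deciding case by case whether a selector is safe to leave bare.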
- Scripted tests can now be run using tests/run_examples.sh. #29
0.5
- New shot-scraper pdf command for creating a PDF export of a web page. #24
- shot-scraper accessibility --javascript option for executing custom JavaScript before taking the accessibility snapshot. #23
- shot-scraper accessibility -o filename.json option. #25
- README demos section now links to @newshomepages Twitter bot by @palewire
- README now includes tips on executing JavaScript. #20
- README now includes the --help output of the various commands.
0.4
0.3
0.2
- shot-scraper --selector SELECTOR option to specify an element on the page using a CSS selector and take a screenshot of just that element. #8
- selector: ... key in YAML file to specify an element by CSS selector.
- --javascript SCRIPT option to specify custom JavaScript to be executed after the page has loaded but before the screenshot is taken. #12
- javascript: key in YAML to specify JavaScript to execute.
- --width and --height options to set the width and height of the browser window used for the screenshot. If a height is specified, the resulting screenshot will be that height rather than being the full height of the page. #13
- Equivalent width: and height: keys in the YAML configuration.
0.1
- Switched from npm Playwright to Python Playwright. #3
- New shot-scraper install command for installing the browser needed by Playwright. #6
- New shot-scraper shot URL command (also the default if you just run shot-scraper ...) which takes a single screenshot. #5
- shot-scraper multi shots.yml command now executes the YAML file with a list of shots in it.