This is a tool for cleaning up Rheinwerk openbooks before converting them to EPUB or PDF format.
Current state of development: v1.1.0 is feature complete, i.e. it can download, MD5-verify, unpack and convert all 36 openbooks available at release time.
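MD5 verification means hashing the bytes as they stream in and comparing the hex digest with a known value. The following is a minimal sketch that uses only the JDK (java.security.DigestInputStream and MessageDigest). It is not the tool's actual code, and the names Md5Check, md5Hex and verify are hypothetical. Because the bytes are read and then discarded, the same approach can check an archive without storing it on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of Openbook Cleaner.
final class Md5Check {

    // Reads the whole stream, discarding the bytes, and returns the MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream din = new DigestInputStream(in, md5)) {
            byte[] buffer = new byte[8192];
            while (din.read(buffer) != -1) {
                // Every byte read through din updates the digest; nothing is written to disk.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Compares the computed checksum with an expected one, ignoring hex letter case.
    static boolean verify(InputStream in, String expectedMd5) throws IOException, NoSuchAlgorithmException {
        return md5Hex(in).equalsIgnoreCase(expectedMd5);
    }

    public static void main(String[] args) throws Exception {
        InputStream data = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(md5Hex(data)); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

In a real download the InputStream would come from the HTTP connection, and the expected checksum from wherever the known values are kept. Both are assumptions here.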
History: For details about what has changed in each version, please see the change log.
Download: A precompiled, executable JAR file is available here.
Usage:
```
$ java -jar galileo_openbook_cleaner-1.1.0.jar --help
OpenbookCleaner usage: java ... [options] <book_id>*
Option                     Description
------                     -----------
-?, --help                 Display this help text
-c, --check-avail          Check config file
-d, --download-dir <File>  Download directory for openbooks; must
                           exist (default: .)
-l, --log-level <Integer>  Log level (0=normal, 1=verbose,
                           2=debug, 3=trace) (default: 0)
-m, --check-md5            Download all known books without
                           storing them, verifying their MD5
                           checksums (slow! >1 Gb download)
-t, --threading <Integer>  Threading mode (0=single, 1=multi);
                           single is slower, but better for
                           diagnostics) (default: 1)
-w, --write-config         Write editable book list to config.xml
book_id1 book_id2 ...      Books to be downloaded & converted
Legal book IDs:
all (magic value: all books), actionscript_1_und_2, actionscript_einstieg,
apps_iphone_ios5, apps_iphone_ios6, asp_net, c_von_a_bis_z, dreamweaver_8,
excel_2007, hdr_fotografie, it_handbuch, javascript_ajax, java_7, java_insel,
joomla_1_5, linux, linux_unix_prog, microsoft_netzwerk, oop, photoshop_cs2,
photoshop_cs4, php_pear, ruby_on_rails_2, shell_prog, ubuntu_10_04,
ubuntu_11_04, ubuntu_12_04, unix_guru, vb_2008, vb_2008_einstieg,
vb_2010_einstieg, vb_2012_einstieg, vcsharp_2008, vcsharp_2010, vcsharp_2012,
vmware, windows_server_2008
```
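The -t option switches between single-threaded and multi-threaded processing. One common way to implement such a switch in Java is a plain loop for mode 0 and a fixed-size ExecutorService for mode 1, sketched below. This is an assumption about the general technique, not the tool's implementation. BookRunner, processAll and the stand-in processBook are hypothetical names. The sketch sticks to Java 7 syntax (an anonymous Callable instead of a lambda) to match the project's Java version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical runner class, not part of Openbook Cleaner.
final class BookRunner {

    // Hypothetical stand-in for "download, verify, unpack and convert one book".
    static String processBook(String bookId) {
        return bookId + ": done";
    }

    // threadingMode mirrors the -t values: 0 = single-threaded, 1 = multi-threaded.
    static List<String> processAll(List<String> bookIds, int threadingMode) throws Exception {
        List<String> results = new ArrayList<>();
        if (threadingMode == 0) {
            // Sequential: everything runs on the calling thread, which is easiest to debug.
            for (String id : bookIds) {
                results.add(processBook(id));
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String id : bookIds) {
                futures.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() {
                        return processBook(id);
                    }
                }));
            }
            // Collect in submission order, so the result order matches mode 0.
            for (Future<String> future : futures) {
                results.add(future.get()); // rethrows a task's failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList("linux", "java_7"), 1));
        // prints [linux: done, java_7: done]
    }
}
```

Collecting the Future results in submission order keeps the output order identical in both modes. Running everything on one thread in mode 0 keeps log output and stack traces in a single sequence, which fits the help text's remark that single-threaded mode is better for diagnostics.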
Dependencies: Openbook Cleaner was developed in Java 7 and uses a few open source libraries:
- jsoup 1.7.2 for parsing the "dirty" openbook HTML, selecting and editing DOM elements, removing navigation elements, ads and other clutter, and finally writing a clean, pretty-printed HTML document back to disk
- JOpt Simple 4.3 for parsing command-line parameters and showing a help page (usage info)
- Apache Commons Compress 1.4.1 for unzipping downloaded openbook archives. Note: Once Java 7 is available on Mac OS X, this library might be removed again in favor of the built-in Java classes (java.util.zip).
- XStream 1.4.4 for parsing the config.xml file, which contains the openbook metadata
- AspectJ 1.7.4 for cross-cutting concerns such as logging, timing and tracing, which are not part of the main application logic. This keeps the core code clean and free of scattered code addressing secondary concerns.
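The note on Apache Commons Compress mentions reverting to the built-in Java classes for unzipping. A minimal sketch of that with java.util.zip.ZipInputStream follows. It uses the ZipInputStream(InputStream, Charset) constructor added in Java 7, which lets the caller choose how entry names are decoded. Whether entry-name encoding is why the library is needed is an assumption, and Unzipper is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of Openbook Cleaner.
final class Unzipper {

    // Extracts all entries of a ZIP stream into targetDir.
    // entryNameCharset is an assumption: archives do not always use UTF-8 for entry names.
    static void unzip(InputStream zipData, Path targetDir, Charset entryNameCharset) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(zipData, entryNameCharset)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../evil.txt" that would escape the target directory.
                if (!target.startsWith(root)) {
                    throw new IOException("Illegal ZIP entry path: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Reads only up to the end of the current entry and does not close zin.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                }
                zin.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Unzipper <archive.zip> <targetDir>
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            unzip(in, Paths.get(args[1]), StandardCharsets.UTF_8);
        }
    }
}
```

The startsWith check rejects entries that would otherwise be written outside the target directory (the so-called zip-slip problem), which matters whenever archives are downloaded from the network.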
Development environment:
- IDE: I originally started developing this project with Eclipse but have since switched to IntelliJ IDEA, which I personally prefer because of its superior Maven support. On the other hand, Eclipse has better AspectJ integration, so if you want to change any of the aspect code, you might want to use Eclipse anyway.
- Git support is needed in your IDE of choice (or at least from the command line) if you want to interact with the source code repository and not just download a ZIP archive from GitHub.
- Maven is used for dependency management and for the whole build and packaging cycle. Any Maven 3 version should be safe; I recommend using the latest stable version. Whether you build from the command line or via IDE integration is entirely up to you. In IntelliJ IDEA you should install the original Maven plugins; for Eclipse you need m2e and also the AspectJ Maven Configurator (can be installed from http://dist.springsource.org/release/AJDT/configurator/).
- AspectJ support is available for both Eclipse (AJDT, AspectJ Development Tools) and IntelliJ IDEA; I do not know about NetBeans or other IDEs. If you want to edit the aspect code comfortably, please install the corresponding AspectJ plugins for your IDE. This is optional, though, because Maven can still build the project, fetching all necessary dependencies including AspectJ.
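AspectJ weaves aspect code such as timing into the application's bytecode, which needs the AspectJ compiler or weaver. To illustrate the same separation of concerns with nothing but the JDK, the sketch below uses a different, simpler technique: a java.lang.reflect.Proxy that times every call on an interface without changing the implementation. It is not AspectJ and not how Openbook Cleaner does it. Converter and TimingProxy are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical business interface, not part of Openbook Cleaner.
interface Converter {
    String convert(String bookId);
}

// Hypothetical JDK-only stand-in for a timing aspect (dynamic proxy, not AspectJ weaving).
final class TimingProxy {

    // Wraps an interface implementation so every call is timed, without touching the target's code.
    @SuppressWarnings("unchecked")
    static <T> T withTiming(final T target, Class<T> iface) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                long start = System.nanoTime();
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception, not the reflection wrapper
                } finally {
                    long micros = (System.nanoTime() - start) / 1000;
                    System.err.println(method.getName() + " took " + micros + " us");
                }
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    public static void main(String[] args) {
        Converter plain = new Converter() {
            @Override
            public String convert(String bookId) {
                return "clean:" + bookId; // the business logic knows nothing about timing
            }
        };
        Converter timed = withTiming(plain, Converter.class);
        System.out.println(timed.convert("linux")); // prints clean:linux (timing goes to stderr)
    }
}
```

Unlike AspectJ, a JDK proxy only intercepts calls made through the interface reference, so calls inside the implementation itself are not timed. That limitation is one reason to reach for a real aspect weaver.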
Because I might later want to use this Git repository as a refactoring showcase for my developer workshops, I am doing all refactoring step by step, documenting progress in small, fine-grained Git commits, so that I can later review the evolutionary progress with others.
As you can see, I am mostly doing this little project for myself, but I like to share the results and receive some user feedback. I hope the openbook cleaner is useful to you. Enjoy! :-)
Alexander Kriegisch