Add unicode aware downcase and upcase functions
These work the same as ascii_downcase and ascii_upcase but
also correctly handle all Unicode characters in addition to ASCII.
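
The difference between the ASCII-only and the Unicode-aware functions can be illustrated with a minimal Python sketch. This is not jq's implementation (which is C on top of utf8proc): the helper `ascii_upcase` below only borrows the jq builtin's name and is reimplemented for illustration, and Python's `str.upper` stands in for the new `upcase` as an assumption.

```python
import string

# Translation table that maps only a-z to A-Z; every other code point is left alone.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upcase(s: str) -> str:
    """ASCII-only upcase: non-ASCII letters such as 'é' pass through unchanged."""
    return s.translate(_ASCII_UPPER)


def upcase(s: str) -> str:
    """Unicode-aware upcase, approximated here with Python's str.upper()."""
    return s.upper()


if __name__ == "__main__":
    print(ascii_upcase("useful but not for \u00e9"))  # the final 'é' stays lowercase
    print(upcase("useful for \u00e9"))                # the final 'é' becomes 'É'
```

Python's `str.upper` applies full case mappings (for example, "ß" becomes "SS"), so its results may differ from the utf8proc-based builtins for such characters.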
liquidaty authored and wader committed Mar 20, 2024
1 parent be437ec commit 27869e7
Showing 49 changed files with 24,327 additions and 3 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -4,6 +4,7 @@

# vendored files
src/decNumber/** linguist-vendored
modules/utf8proc/** linguist-vendored

# generated files
src/lexer.[ch] linguist-generated=true
96 changes: 96 additions & 0 deletions COPYING
@@ -134,3 +134,99 @@ STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.



jq uses parts of the open source C library "utf8proc", which is distributed
under the following license:

**utf8proc** is a software package originally developed
by Jan Behrens and the rest of the Public Software Group, who
deserve nearly all of the credit for this library, that is now maintained by the Julia-language developers. Like the original utf8proc,
whose copyright and license statements are reproduced below, all new
work on the utf8proc library is licensed under the [MIT "expat"
license](http://opensource.org/licenses/MIT):

*Copyright © 2014-2021 by Steven G. Johnson, Jiahao Chen, Tony Kelman, Jonas Fonseca, and other contributors listed in the git history.*

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

## Original utf8proc license ##

*Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany*

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

## Unicode data license ##

This software contains data (`utf8proc_data.c`) derived from processing
the Unicode data files. The following license applies to that data:

**COPYRIGHT AND PERMISSION NOTICE**

*Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
under the Terms of Use in http://www.unicode.org/copyright.html.*

Permission is hereby granted, free of charge, to any person obtaining a
copy of the Unicode data files and any associated documentation (the "Data
Files") or Unicode software and any associated documentation (the
"Software") to deal in the Data Files or Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, and/or sell copies of the Data Files or Software, and
to permit persons to whom the Data Files or Software are furnished to do
so, provided that (a) the above copyright notice(s) and this permission
notice appear with all copies of the Data Files or Software, (b) both the
above copyright notice(s) and this permission notice appear in associated
documentation, and (c) there is clear notice in each modified Data File or
in the Software as well as in the documentation associated with the Data
File(s) or Software that the data or software has been modified.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written
authorization of the copyright holder.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
registered in some jurisdictions. All other trademarks and registered
trademarks mentioned herein are the property of their respective owners.
35 changes: 33 additions & 2 deletions Makefile.am
@@ -190,12 +190,35 @@ jq.1: jq.1.prebuilt
	$(AM_V_GEN) cp $(srcdir)/jq.1.prebuilt $@

CLEANFILES += jq.1
SUBDIRS =

AM_CFLAGS += ${utf8proc_CFLAGS}

if BUNDLE_UTF8PROC
LIBJQ_SRC += modules/utf8proc/utf8proc.c
AM_CFLAGS += -I${srcdir}/modules/utf8proc
AM_CPPFLAGS += -I$(srcdir)/modules/utf8proc
else
if BUILD_UTF8PROC
BUILT_SOURCES += $(builddir)/libutf8proc.a
CLEANFILES += $(builddir)/libutf8proc.a
jq_LDADD += $(builddir)/libutf8proc.a

AM_CFLAGS += -I${srcdir}/modules/utf8proc
AM_CPPFLAGS += -I$(srcdir)/modules/utf8proc

$(builddir)/libutf8proc.a:
	$(MAKE) $(AM_MAKEFLAGS) -C $(srcdir)/modules/utf8proc builddir=$(shell cd $(builddir) && pwd) libutf8proc.a
else
jq_LDADD += ${utf8proc_LIBS}
endif
endif

### Build oniguruma

if BUILD_ONIGURUMA
libjq_la_LIBADD += modules/oniguruma/src/.libs/libonig.la
-SUBDIRS = modules/oniguruma
+SUBDIRS += modules/oniguruma
endif

AM_CFLAGS += $(onig_CFLAGS)
@@ -204,6 +227,10 @@ if WITH_ONIGURUMA
TESTS += tests/onigtest tests/manonigtest
endif

### utf8proc

AM_CFLAGS += $(utf8proc_CFLAGS)

### Packaging

install-binaries: $(BUILT_SOURCES)
@@ -217,6 +244,7 @@ DOC_FILES = docs/content docs/public docs/templates \
EXTRA_DIST = $(DOC_FILES) $(man_MANS) $(TESTS) $(TEST_LOG_COMPILER) \
jq.1.prebuilt jq.spec src/lexer.c src/lexer.h src/parser.c \
src/parser.h src/version.h src/builtin.jq scripts/version \
modules/utf8proc \
libjq.pc \
tests/base64.test tests/jq-f-test.sh tests/jq.test \
tests/modules/a.jq tests/modules/b/b.jq tests/modules/c/c.jq \
@@ -236,7 +264,7 @@ EXTRA_DIST = $(DOC_FILES) $(man_MANS) $(TESTS) $(TEST_LOG_COMPILER) \
tests/utf8-truncate.jq tests/jq-f-test.sh \
tests/no-main-program.jq tests/yes-main-program.jq

-AM_DISTCHECK_CONFIGURE_FLAGS=--with-oniguruma=builtin
+AM_DISTCHECK_CONFIGURE_FLAGS=--with-oniguruma=builtin --with-utf8proc=builtin

# README.md is expected in GitHub projects, good stuff in it, so we'll
# distribute it and install it with the package in the doc directory.
@@ -253,3 +281,6 @@ rpm: dist jq.spec
	rpmbuild -tb --define "_topdir ${PWD}/rpm" --define "_prefix /usr" --define "myver $(VERSION)" --define "myrel ${RELEASE}" rpm/SOURCES/jq-$(VERSION).tar.gz
	find rpm/RPMS/ -name "*.rpm" -exec mv {} ./ \;
	rm -rf rpm

dist-hook:
	make -C $(distdir)/modules/utf8proc clean
66 changes: 65 additions & 1 deletion configure.ac
@@ -288,9 +288,73 @@ AC_SUBST(onig_LDFLAGS)

AM_CONDITIONAL([BUILD_ONIGURUMA], [test "x$build_oniguruma" = xyes])
AM_CONDITIONAL([WITH_ONIGURUMA], [test "x$with_oniguruma" != xno])

dnl utf8proc
AC_ARG_WITH([utf8proc],
[AS_HELP_STRING([--with-utf8proc=prefix],
[specify the location of a custom-installed utf8proc library to use, 'builtin' to force the built-in copy, 'bundled' to use the built-in copy and bundle it into libjq, or 'auto' to use the default prefix and fall back to builtin])], ,
[with_utf8proc=auto])

utf8proc_CFLAGS=-DUTF8PROC_STATIC
utf8proc_LIBS=
build_utf8proc=yes
bundle_utf8proc=no

AS_IF([test "x$with_utf8proc" = "xbundled" ], [
bundle_utf8proc=yes
], [
AS_IF([test "x$with_utf8proc" = "xauto" ], [
test_prefix=
if test "x$prefix" != xNONE ; then
test_prefix=$prefix
elif test "x$ac_default_prefix" != x ; then
test_prefix=$ac_default_prefix
fi
if test "x$test_prefix" != "x" ; then
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS -I$test_prefix/include"
AC_CHECK_HEADER("utf8proc.h", [
with_utf8proc=$test_prefix
AC_MSG_NOTICE([utf8proc.h found in $test_prefix])
], [
with_utf8proc=builtin
])
CFLAGS="$save_CFLAGS"
fi
])
AS_IF([test "x$with_utf8proc" != xyes -a "x$with_utf8proc" != xbuiltin -a "x$with_utf8proc" != "x" ], [
save_CFLAGS="$CFLAGS"
save_LDFLAGS="$LDFLAGS"
utf8proc_CFLAGS="$utf8proc_CFLAGS -I$with_utf8proc/include"
# utf8proc_LIBS="-L$with_utf8proc/lib -l:libutf8proc.a" # -l: not supported by some compilers
utf8proc_LIBS="$with_utf8proc/lib/libutf8proc.a"
CFLAGS="$CFLAGS $utf8proc_CFLAGS"
AC_CHECK_HEADER("utf8proc.h", [
build_utf8proc=no
], [
AC_MSG_NOTICE([utf8proc.h not found in $with_utf8proc. Will use the packaged utf8proc.])
])
CFLAGS="$save_CFLAGS"
LDFLAGS="$save_LDFLAGS"
])
AS_IF([test "x$build_utf8proc" = xyes ], [
# utf8proc_LIBS used only for libjq.pc
utf8proc_LIBS=`pwd`/libutf8proc.a
])
])

AC_SUBST(utf8proc_CFLAGS)
AC_SUBST(utf8proc_LIBS)

AM_CONDITIONAL([BUILD_UTF8PROC], [test "x$build_utf8proc" = xyes])
AM_CONDITIONAL([BUNDLE_UTF8PROC], [test "x$bundle_utf8proc" = xyes])


AC_SUBST([BUNDLER], ["$bundle_cmd"])

AC_CONFIG_MACRO_DIRS([config/m4 m4])
AC_CONFIG_FILES([Makefile libjq.pc])
AC_OUTPUT
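
The `--with-utf8proc` handling above boils down to a small decision table. The following Python sketch is a simplified model of it, not part of the commit: `resolve_utf8proc`, `Utf8procMode` and the `has_header` callback are hypothetical names introduced for illustration, and the sketch assumes a non-empty default prefix and ignores Autoconf's header-check caching.

```python
from typing import Callable, NamedTuple


class Utf8procMode(NamedTuple):
    build: bool   # models BUILD_UTF8PROC: build libutf8proc.a from modules/utf8proc
    bundle: bool  # models BUNDLE_UTF8PROC: compile utf8proc.c directly into libjq
    prefix: str   # external install prefix in use, or "" for the packaged copy


def resolve_utf8proc(with_utf8proc: str,
                     default_prefix: str,
                     has_header: Callable[[str], bool]) -> Utf8procMode:
    """Model how the configure logic picks a utf8proc source.

    has_header(prefix) is a stand-in for AC_CHECK_HEADER finding
    utf8proc.h under <prefix>/include.
    """
    if with_utf8proc == "bundled":
        # build_utf8proc keeps its default of yes; bundling takes precedence in Makefile.am.
        return Utf8procMode(build=True, bundle=True, prefix="")
    if with_utf8proc == "auto":
        # Probe the default prefix; fall back to the packaged copy if the header is missing.
        with_utf8proc = default_prefix if has_header(default_prefix) else "builtin"
    if with_utf8proc not in ("yes", "builtin", ""):
        # Anything else is treated as an install prefix.
        if has_header(with_utf8proc):
            return Utf8procMode(build=False, bundle=False, prefix=with_utf8proc)
    # Header not found, or builtin requested: build the packaged utf8proc.
    return Utf8procMode(build=True, bundle=False, prefix="")


if __name__ == "__main__":
    print(resolve_utf8proc("auto", "/usr/local", lambda prefix: False))
    print(resolve_utf8proc("/opt/utf8proc", "/usr/local", lambda prefix: True))
```

In this model an explicit prefix whose header check fails quietly falls back to the packaged copy, which matches the notice printed by the configure script in that case.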

11 changes: 11 additions & 0 deletions docs/content/manual/manual.yml
@@ -1856,6 +1856,17 @@ sections:
input: '"useful but not for é"'
output: ['"USEFUL BUT NOT FOR é"']

- title: "`downcase`, `upcase`"
body: |
Emit a copy of the input string with its alphabetic characters
(including non-ASCII Unicode characters) converted to the
specified case.
examples:
- program: 'upcase'
input: '"useful for é"'
output: ['"USEFUL FOR É"']

- title: "`while(cond; update)`"
body: |
15 changes: 15 additions & 0 deletions jq.1.prebuilt

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions libjq.pc.in
@@ -8,4 +8,5 @@ URL: https://jqlang.github.io/jq/
Description: Library to process JSON using a query language
Version: @VERSION@
Libs: -L${libdir} -ljq
Libs.private: @utf8proc_LIBS@
Cflags: -I${includedir}