Nick Wellnhofer 4a513d5667 hash: Rewrite hash table code
This is a complete rewrite of the code in hash.c.

Move from a chained hash table implementation to open addressing with
Robin Hood probing. This makes it possible to increase the maximum fill
factor and further reduce the growth factor, saving considerable amounts
of memory without sacrificing performance.

To make this work, hash values are now cached in the table entry,
which also avoids many key comparisons.

Tables are created lazily with a smaller minimum size.

Insertion functions now report an error if growing the table resulted in
a memory allocation failure.

Some string comparisons were optimized to call directly into libc
instead of using the xmlstring API.

The length of inserted keys is computed along with the hash, improving
allocation performance.

Bounds checking was made more robust.

In dictionary-based mode, unneeded interning of strings is avoided.
2023-09-29 02:25:57 +02:00
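
The commit message above describes open addressing with Robin Hood probing and hash values cached in each table entry. The following is a minimal sketch of that general technique, not the code in hash.c. All names (`RhTable`, `rhInit`, `rhFind`, `rhInsert`) are hypothetical, the FNV-1a hash is chosen only for brevity, the table has a fixed capacity and never grows, and keys are not copied.

```c
#include <stddef.h>
#include <string.h>

#define RH_CAPACITY 8u              /* power of two; fixed to keep the sketch short */
#define RH_MASK (RH_CAPACITY - 1u)

typedef struct {
    unsigned hash;                  /* cached hash value of key */
    const char *key;                /* NULL marks an empty slot; not copied */
    int value;
} RhEntry;

typedef struct {
    RhEntry slots[RH_CAPACITY];
    unsigned count;
} RhTable;

static void rhInit(RhTable *t) {
    unsigned i;

    for (i = 0; i < RH_CAPACITY; i++)
        t->slots[i].key = NULL;
    t->count = 0;
}

/* FNV-1a string hash (an assumption of this sketch). */
static unsigned rhHash(const char *s) {
    unsigned h = 2166136261u;

    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* How far the entry stored at pos is from the slot its hash maps to. */
static unsigned rhDisplacement(unsigned hash, unsigned pos) {
    return (pos - hash) & RH_MASK;
}

static RhEntry *rhFind(RhTable *t, const char *key) {
    unsigned hash = rhHash(key);
    unsigned pos = hash & RH_MASK;
    unsigned dist;

    for (dist = 0; dist < RH_CAPACITY; dist++) {
        RhEntry *e = &t->slots[pos];

        /* An empty slot, or a resident closer to its home than we are
         * to ours, means the key cannot be stored further along. */
        if (e->key == NULL || rhDisplacement(e->hash, pos) < dist)
            return NULL;
        /* Comparing cached hashes first skips most string comparisons. */
        if (e->hash == hash && strcmp(e->key, key) == 0)
            return e;
        pos = (pos + 1) & RH_MASK;
    }
    return NULL;
}

/* Returns 1 if inserted, 0 if an existing key was updated, -1 if full. */
static int rhInsert(RhTable *t, const char *key, int value) {
    RhEntry *found = rhFind(t, key);
    RhEntry cur, tmp;
    unsigned pos, dist;

    if (found != NULL) {
        found->value = value;
        return 0;
    }
    if (t->count == RH_CAPACITY)
        return -1;

    cur.hash = rhHash(key);
    cur.key = key;
    cur.value = value;
    pos = cur.hash & RH_MASK;
    dist = 0;

    for (;;) {
        RhEntry *e = &t->slots[pos];
        unsigned edist;

        if (e->key == NULL) {
            *e = cur;
            t->count++;
            return 1;
        }
        edist = rhDisplacement(e->hash, pos);
        if (edist < dist) {
            /* Robin Hood step: the entry further from home takes the
             * slot and the displaced resident keeps probing. */
            tmp = *e;
            *e = cur;
            cur = tmp;
            dist = edist;
        }
        pos = (pos + 1) & RH_MASK;
        dist++;
    }
}
```

After `rhInit(&t)`, a call such as `rhInsert(&t, "xml", 1)` returns 1, and a second insert with the same key returns 0 and overwrites the value. A real implementation would also grow the table and report allocation failures, as the commit message notes.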

libxml2

libxml2 is an XML toolkit implemented in C, originally developed for the GNOME Project.

Official releases can be downloaded from https://download.gnome.org/sources/libxml2/

The git repository is hosted on GNOME's GitLab server: https://gitlab.gnome.org/GNOME/libxml2

Bugs should be reported at https://gitlab.gnome.org/GNOME/libxml2/-/issues

Documentation is available at https://gitlab.gnome.org/GNOME/libxml2/-/wikis

License

This code is released under the MIT License, see the Copyright file.

Build instructions

libxml2 can be built with GNU Autotools, CMake, or several other build systems in platform-specific subdirectories.

Autotools (for POSIX systems like Linux, BSD, macOS)

If you build from a Git tree, you have to install Autotools and start by generating the configuration files with:

./autogen.sh [configuration options]

If you build from a source tarball, extract the archive with:

tar xf libxml2-xxx.tar.gz
cd libxml2-xxx

Then you can configure and build the library:

./configure [configuration options]
make

The following options disable or enable code modules and relevant symbols:

--with-c14n             Canonical XML 1.0 support (on)
--with-catalog          XML Catalogs support (on)
--with-debug            debugging module and shell (on)
--with-history          history support for shell (off)
--with-readline[=DIR]   use readline in DIR (for shell history)
--with-html             HTML parser (on)
--with-http             HTTP support (on)
--with-iconv[=DIR]      iconv support (on)
--with-icu              ICU support (off)
--with-iso8859x         ISO-8859-X support if no iconv (on)
--with-lzma[=DIR]       use liblzma in DIR (on)
--with-mem-debug        memory debugging module (off)
--with-modules          dynamic modules support (on)
--with-output           serialization support (on)
--with-pattern          xmlPattern selection interface (on)
--with-push             push parser interfaces (on)
--with-python           Python bindings (on)
--with-reader           xmlReader parsing interface (on)
--with-regexps          regular expressions support (on)
--with-run-debug        runtime debugging module (off)
--with-sax1             older SAX1 interface (on)
--with-schemas          XML Schemas 1.0 and RELAX NG support (on)
--with-schematron       Schematron support (on)
--with-threads          multithreading support (on)
--with-thread-alloc     per-thread malloc hooks (off)
--with-tree             DOM like tree manipulation APIs (on)
--with-valid            DTD validation support (on)
--with-writer           xmlWriter serialization interface (on)
--with-xinclude         XInclude 1.0 support (on)
--with-xpath            XPath 1.0 support (on)
--with-xptr             XPointer support (on)
--with-zlib[=DIR]       use libz in DIR (on)

Other options:

--with-minimum          build a minimally sized library (off)
--with-legacy           maximum ABI compatibility (off)

Note that by default, no optimization options are used. You have to enable them manually, for example with:

CFLAGS='-O2 -fno-semantic-interposition' ./configure

Now you can run the test suite with:

make check

Please report test failures to the mailing list or bug tracker.

Then you can install the library:

make install

At that point you may have to rerun ldconfig or a similar utility to update your list of installed shared libs.

CMake (mainly for Windows)

Another option for compiling libxml is using CMake:

cmake -E tar xf libxml2-xxx.tar.gz
cmake -S libxml2-xxx -B libxml2-xxx-build [possible options]
cmake --build libxml2-xxx-build
cmake --install libxml2-xxx-build

Common CMake options include:

-D BUILD_SHARED_LIBS=OFF            # build static libraries
-D CMAKE_BUILD_TYPE=Release         # specify build type
-D CMAKE_INSTALL_PREFIX=/usr/local  # specify the install path
-D LIBXML2_WITH_ICONV=OFF           # disable iconv
-D LIBXML2_WITH_LZMA=OFF            # disable liblzma
-D LIBXML2_WITH_PYTHON=OFF          # disable Python
-D LIBXML2_WITH_ZLIB=OFF            # disable libz

You can also open the libxml source directory with its CMakeLists.txt directly in various IDEs such as CLion, QtCreator, or Visual Studio.

Dependencies

Libxml does not require any other libraries. A platform with somewhat recent POSIX support should be sufficient (please report any violation of this rule you may find).

However, if found at configuration time, libxml will detect and use the following libraries:

  • libz, a highly portable and widely available compression library.
  • liblzma, another compression library.
  • libiconv, a character encoding conversion library. The iconv function is part of POSIX.1-2001, so libiconv isn't required on modern UNIX-like systems like Linux, BSD or macOS.
  • ICU, a Unicode library. Mainly useful as an alternative to iconv on Windows. Unnecessary on most other systems.

Contributing

The current version of the code can be found in GNOME's GitLab at https://gitlab.gnome.org/GNOME/libxml2. The best way to get involved is by creating issues and merge requests on GitLab. Alternatively, you can start discussions and send patches to the mailing list. If you want to work with patches, please format them with git-format-patch and use plain text attachments.

All code must conform to C89 and pass the GitLab CI tests. Add regression tests if possible.
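
As a rough illustration of what C89 conformance means in practice, the sketch below uses a hypothetical helper, `countChar`, which is not part of libxml2. It declares all variables at the start of a block, uses only `/* ... */` comments, and avoids declarations inside the `for` statement.

```c
#include <stddef.h>

/* Counts occurrences of ch in the NUL-terminated string str. */
static size_t countChar(const char *str, char ch) {
    size_t count = 0;   /* C89: declarations precede statements in a block */
    size_t i;           /* C89: no "for (size_t i = 0; ...)" */

    for (i = 0; str[i] != '\0'; i++) {
        if (str[i] == ch)
            count++;
    }
    return count;
}
```

With GCC or Clang, compiling with `-std=c89 -pedantic` is one way to catch most violations locally. This is an assumption about local tooling; the GitLab CI may check conformance differently.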

Authors

  • Daniel Veillard
  • Bjorn Reese
  • William Brack
  • Igor Zlatkovic for the Windows port
  • Aleksey Sanin
  • Nick Wellnhofer