Commit Graph

131 Commits

Author SHA1 Message Date
Nick Wellnhofer
9b5cce7a71 include: Remove more unnecessary includes 2023-09-21 01:50:53 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
76d6b0d768 html: Don't escape ASCII chars in href attributes
In several cases, href attributes can contain ASCII characters which are
illegal in URIs. Escaping them often does more harm than good.

Fixes #321.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
David Kilzer
054e46b097 Restore behavior of htmlDocContentDumpFormatOutput()
Patch by J Pascoe of Apple.

* HTMLtree.c:
(htmlDocContentDumpFormatOutput):
- Prior to commit b79ab6e6d9, xmlDoc.type was set to
  XML_HTML_DOCUMENT_NODE before dumping the HTML output, then
  restored before returning.
2022-05-14 08:56:47 -07:00
David Kilzer
21561e833a Mark more static data as const
Similar to 8f5710379, mark more static data structures with
`const` keyword.

Also fix placement of `const` in encoding.c.

Original patch by Sarah Wilkin.
2022-04-07 12:01:23 -07:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
92d9ab4c28 Fix whitespace when serializing empty HTML documents
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Readd the newline when
serializing empty documents.

Fixes #266.
2021-06-07 15:09:53 +02:00
Nick Wellnhofer
85b1792e37 Work around lxml API abuse
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.

Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.

Fixes #255.
2021-05-21 12:19:25 +02:00
Nick Wellnhofer
e6495e4789 Remove unused encoding parameter of HTML output functions
The encoding string is unused. Encodings are set by way of the output
buffer.
2021-02-07 14:39:55 +01:00
Nick Wellnhofer
0b3c64d9f2 Handle dumps of corrupted documents more gracefully
Check parent pointers for NULL after the non-recursive rewrite of the
serialization code. This avoids segfaults with corrupted documents
which can apparently be seen with lxml, see issue #187.
2020-09-29 18:08:37 +02:00
Nick Wellnhofer
c1ba6f54d3 Revert "Do not URI escape in server side includes"
This reverts commit 960f0e2756.

This commit introduced

- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
  https://bugzilla.gnome.org/show_bug.cgi?id=769760

A better approach is to add an option not to escape URLs at all
which libxml2 should have possibly done in the first place.
2020-08-15 18:32:29 +02:00
Nick Wellnhofer
b79ab6e6d9 Make htmlNodeDumpFormatOutput non-recursive
Fixes stack overflow with deeply nested HTML documents.

Found by OSS-Fuzz.
2020-07-28 03:44:30 +02:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
d459831c1b Fix HTML serialization with UTF-8 encoding
If the encoding is specified as UTF-8, make sure to use a NULL encoding
handler.
2018-10-13 16:47:13 +02:00
Nick Wellnhofer
ee501f5449 Stop using doc->charset outside parser code
doc->charset does not specify the in-memory encoding which is always
UTF-8.
2018-10-13 16:47:01 +02:00
Shaun McCance
7607d9dd45 Allow HTML serializer to output HTML5 DOCTYPE
For https://bugzilla.gnome.org/show_bug.cgi?id=747301

Use simple HTML5 DOCTYPE for about:legacy-compat

HTML5 uses a DOCTYPE without a PUBLIC or SYSTEM identifier. It looks
like this:

<!DOCTYPE html>

I can't use XSLT to output this, because to get a DOCTYPE I have to
provide a PUBLIC or SYSTEM identifier. Luckily, the standards folks
recognized this and provided this semantically equivalent form for the
HTML DOCTYPE:

<!DOCTYPE html SYSTEM "about:legacy-compat">

But people don't like seeing the "legacy" identifier in their output.
They'd rather see the shiny new DOCTYPE. Since we know that
about:legacy-compat is defined by the W3C to be semantically equivalent
to the sans-SYSTEM DOCTYPE, we could just special-case it in the HTML
serializer in libxml2. So if you set the SYSTEM identifier to
"about:legacy-compat", you get an HTML5 short-form DOCTYPE.
2015-04-03 22:52:36 +08:00
Romain Bondue
960f0e2756 Do not URI escape in server side includes 2013-04-23 20:44:55 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Daniel Veillard
7b9b07198f Convert the HTML tree module to the new buffers
The new input buffers induced a couple of changes, the others
are related to the switch to xmlBuf in saving routines.
2012-07-23 14:24:27 +08:00
Daniel Veillard
39d027cdb7 Fix html serialization error and htmlSetMetaEncoding()
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken by not removing
the associated meta tag anymore
2012-05-11 12:38:23 +08:00
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Daniel Veillard
8d7c1b7ab2 582913 Fix htmlSetMetaEncoding() to be nicer
* HTMLtree.c: htmlSetMetaEncoding should not destroy existing meta
  encoding elements, plus it should not change things at all if the
  encoding is the same. Also fixed htmlSaveFileFormat() to ask for
  change if outputing to UTF-8.
2009-08-12 23:03:23 +02:00
Daniel Veillard
74eb54b5b7 575875 don't output charset=html
* HTMLtree.c: don't output charset=html in htmlSetMetaEncoding()
  as this is clearly a libxml2 only thingused for import only
2009-08-12 15:59:01 +02:00
Daniel Veillard
da3fee406d Borland C fix from Moritz Both regenerate, workaround a problem for buffer
* trionan.c: Borland C fix from Moritz Both
* testapi.c: regenerate, workaround a problem for buffer testing
* xmlIO.c HTMLtree.c: new internal entry point to hide even better
  xmlAllocOutputBufferInternal
* tree.c: harden the code around buffer allocation schemes
* parser.c: restore the warning when namespace names are not absolute
  URIs
* runxmlconf.c: continue regression tests if we get the expected
  number of errors
* Makefile.am: run the python tests on make check
* xmlsave.c: handle the HTML documents and trees
* python/libxml.c: convert python serialization to the xmlSave APIs
  and avoid some horrible hacks
Daniel

svn path=/trunk/; revision=3790
2008-09-01 13:08:57 +00:00
Daniel Veillard
fcd02adb71 htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODe fixes bug
* HTMLtree.c: htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODe
  fixes bug #438390
Daniel

svn path=/trunk/; revision=3631
2007-06-12 09:49:40 +00:00
Rob Richards
417b74d0b1 Add linefeeds to error messages allowing for consistant handling.
* HTMLtree.c xmlsave.c: Add linefeeds to error messages allowing
  for consistant handling.
2006-08-15 23:14:24 +00:00
Rob Richards
77b92ff6a8 fix bug #322136 in xmlNodeBufGetContent when entity ref is a child of an
* tree.c: fix bug #322136 in xmlNodeBufGetContent when entity ref is
  a child of an element (fix by Oleksandr Kononenko).
* HTMLtree.c include/libxml/HTMLtree.h: Add htmlDocDumpMemoryFormat.
2005-12-20 15:55:14 +00:00
Daniel Veillard
b8c8016044 fixed bug #310333 with a patch close to the provided patch for HTML UTF-8
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
  patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
2005-08-08 13:46:45 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra informations to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
aa9a983dbd fixing bug 168196, <a name=""> must be URI escaped too Daniel
* HTMLtree.c: fixing bug 168196, <a name=""> must be URI escaped too
Daniel
2005-03-29 20:30:17 +00:00
Daniel Veillard
d5cc0f7f51 augmented types supported a number of new bug fixes and documentation
* gentest.py testapi.c: augmented types supported
* HTMLtree.c tree.c xmlreader.c xmlwriter.c: a number of new
  bug fixes and documentation updates.
Daniel
2004-11-06 19:24:28 +00:00
Daniel Veillard
ce244ad595 fixed the way the generator works, extended the testing, especially with
* gentest.py testapi.c: fixed the way the generator works,
  extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
  of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
2004-11-05 10:03:46 +00:00
Daniel Veillard
3d97e669ec extending the tests coverage more fixes and cleanups Daniel
* gentest.py testapi.c: extending the tests coverage
* HTMLtree.c tree.c xmlsave.c xpointer.c: more fixes and cleanups
Daniel
2004-11-04 10:49:00 +00:00
Daniel Veillard
36e5cd5064 adding xmlMemBlocks() work on generator of an automatic API regression
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
  automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
  xmlstring.c: various API hardeing changes as a result of running
  teh first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, commited it
Daniel
2004-11-02 14:52:23 +00:00
William M. Brack
13dfa87e91 added the routine xmlNanoHTTPContentLength to the external API
* nanohttp.c, include/libxml/nanohttp.h: added the routine
  xmlNanoHTTPContentLength to the external API (bug151968).
* parser.c: fixed unnecessary internal error message (bug152060);
  also changed call to strncmp over to xmlStrncmp.
* encoding.c: fixed compilation warning (bug152307).
* tree.c: fixed segfault in xmlCopyPropList (bug152368); fixed
  a couple of compilation warnings.
* HTMLtree.c, debugXML.c, xmlmemory.c: fixed a few compilation
  warnings; no change to logic.
2004-09-18 04:52:08 +00:00
Daniel Veillard
42fd412637 change --html to make sure we use the HTML serialization rule by default
* xmllint.c: change --html to make sure we use the HTML serialization
  rule by default when HTML parser is used, add --xmlout to allow to
  force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
  solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
2003-11-04 08:47:48 +00:00
William M. Brack
76e95df055 Changed all (?) occurences where validation macros (IS_xxx) had
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
  SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
  testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
  xpath.c: Changed all (?) occurences where validation macros
  (IS_xxx) had single-byte arguments to use IS_xxx_CH instead
  (e.g. IS_BLANK changed to IS_BLANK_CH).  This gets rid of
  many warning messages on certain platforms, and also high-
  lights places in the library which may need to be enhanced
  for proper UTF8 handling.
2003-10-18 16:20:14 +00:00
Daniel Veillard
e2238d5617 converted too small cleanup Daniel
* HTMLtree.c include/libxml/xmlerror.h: converted too
* tree.c: small cleanup
Daniel
2003-10-09 13:14:55 +00:00
Daniel Veillard
a9cce9cd0d Okay this is scary but it is just adding a configure option to disable
* HTMLtree.c SAX2.c c14n.c catalog.c configure.in debugXML.c
  encoding.c entities.c nanoftp.c nanohttp.c parser.c relaxng.c
  testAutomata.c testC14N.c testHTML.c testRegexp.c testRelax.c
  testSchemas.c testXPath.c threads.c tree.c valid.c xmlIO.c
  xmlcatalog.c xmllint.c xmlmemory.c xmlreader.c xmlschemas.c
  example/gjobread.c include/libxml/HTMLtree.h include/libxml/c14n.h
  include/libxml/catalog.h include/libxml/debugXML.h
  include/libxml/entities.h include/libxml/nanohttp.h
  include/libxml/relaxng.h include/libxml/tree.h
  include/libxml/valid.h include/libxml/xmlIO.h
  include/libxml/xmlschemas.h include/libxml/xmlversion.h.in
  include/libxml/xpathInternals.h python/libxml.c:
  Okay this is scary but it is just adding a configure option
  to disable output, this touches most of the files.
Daniel
2003-09-29 13:20:24 +00:00
William M. Brack
3a6da760c5 Fixed bug 121394 - missing ns on attributes
* HTMLtree.c: Fixed bug 121394 - missing ns on attributes
2003-09-15 04:58:14 +00:00
Daniel Veillard
70bcb0ea24 hum try to avoid some troubles when the library is not initialized and one
* HTMLtree.c tree.c threads.c: hum try to avoid some troubles
  when the library is not initialized and one try to save, the
  locks in threaded env might not been initialized, playing safe
* xmlschemastypes.c: apply patch for hexBinary from Charles Bozeman
* test/schemas/hexbinary_* result/schemas/hexbinary_*: also added
  his tests to the regression suite.
Daniel
2003-08-08 14:00:28 +00:00
Daniel Veillard
5f5b7bb78e fixing bug #112904: html output method escaped plus sign character in URI
* HTMLtree.c: fixing  bug #112904: html output method escaped
  plus sign character in URI attribute.
Daniel
2003-05-16 17:19:40 +00:00
Daniel Veillard
645c690d49 patch from Vasily Tchekalkin to fix #109865 Daniel
* HTMLtree.c: patch from Vasily Tchekalkin to fix #109865
Daniel
2003-04-10 21:40:49 +00:00
Daniel Veillard
c7e9b194e7 Fixed reopening of #78662 <form action="..."> is an URI reference Daniel
* HTMLtree.c: Fixed reopening of #78662 <form action="...">
  is an URI reference
Daniel
2003-03-27 14:08:24 +00:00
Daniel Veillard
04ee2f2d00 avoid escaping ',' in URIs Daniel
* HTMLtree.c: avoid escaping ',' in URIs
Daniel
2003-03-23 20:31:46 +00:00