tc39/proposal-well-formed-stringify

Name: proposal-well-formed-stringify

Owner: Ecma TC39

Description: Proposal to prevent JSON.stringify from returning ill-formed strings

Created: 2018-03-24 05:17:39.0

Updated: 2018-05-24 03:52:51.0

Pushed: 2018-05-23 02:08:56.0

Homepage: https://tc39.github.io/proposal-well-formed-stringify/

Size: 34

Language: null

README

Well-formed JSON.stringify

A proposal to prevent JSON.stringify from returning ill-formed Unicode strings.

Status

This proposal is at stage 2 of the TC39 Process.

Champions
Motivation

RFC 8259 section 8.1 requires JSON text exchanged outside the scope of a closed ecosystem to be encoded using UTF-8, but JSON.stringify can return strings including code points that have no representation in UTF-8 (specifically, surrogate code points U+D800 through U+DFFF). And contrary to the description of JSON.stringify, such strings are not “in UTF-16” because “isolated UTF-16 code units in the range D800₁₆..DFFF₁₆ are ill-formed” per The Unicode Standard, Version 10.0.0, Section 3.4 at definition D91 and excluded from being “in UTF-16” per definition D89.

However, returning such invalid Unicode strings is unnecessary, because JSON strings can include Unicode escape sequences.

Proposed Solution

Rather than return unpaired surrogate code points as single UTF-16 code units, represent them with JSON escape sequences.
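The following is a minimal sketch of the idea, not the proposal's specification text. The helper name `escapeLoneSurrogates` is hypothetical and introduced here only for illustration. It walks a string one UTF-16 code unit at a time, copies well-formed surrogate pairs unchanged, and replaces each unpaired surrogate with a six-character `\uXXXX` escape sequence. The lowercase hexadecimal digits are an assumption of this sketch.

```javascript
// Hypothetical helper (not defined by the proposal): replace every unpaired
// surrogate code unit in `text` with a \uXXXX escape sequence.
function escapeLoneSurrogates(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    const isTrail = unit >= 0xdc00 && unit <= 0xdfff;

    if (isLead && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed surrogate pair: copy both code units unchanged.
        out += text[i] + text[i + 1];
        i++;
        continue;
      }
    }

    if (isLead || isTrail) {
      // Unpaired surrogate: emit an ASCII-only escape sequence instead.
      out += "\\u" + unit.toString(16).padStart(4, "0");
    } else {
      out += text[i];
    }
  }
  return out;
}

console.log(escapeLoneSurrogates("a\uDC00b")); // a\udc00b
```

A surrogate pair such as "\uD83D\uDE00" passes through this helper untouched, so only the ill-formed code units change.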

Discussion
Backwards Compatibility

This change is backwards-compatible, assuming that consumers comply with the JSON specification. User-visible effects will be limited to the replacement of some rare single UTF-16 code units in JSON.stringify output with equivalent six-character escape sequences that can be represented in both UTF-16 and UTF-8. It is the authors' opinion that any consumer accepting the current ill-formed output will be unaffected by this change (this is true in particular of ECMAScript JSON.parse). Any consumer rejecting the current ill-formed output will have a new opportunity to accept its well-formed representation. Such consumers may still reject input that specifies strings including Unicode code points that are not scalar values (e.g., because they only accept I-JSON input), but those that accept it must have mechanisms for dealing with unpaired surrogates (as mentioned in the specification of JSON).
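The claim about JSON.parse can be checked with a short sketch. The JSON texts below are built by hand rather than with JSON.stringify, so the result does not depend on whether the engine running it already implements this proposal. The variable names are illustrative only.

```javascript
// A string value containing a single unpaired lead surrogate.
const loneSurrogate = "\uD800";

// Ill-formed JSON text: the surrogate appears as a raw UTF-16 code unit.
const illFormed = '"' + loneSurrogate + '"';

// Well-formed JSON text: the same value written as a six-character escape.
const wellFormed = '"\\ud800"';

// JSON.parse accepts both texts and produces the same string value.
const fromIllFormed = JSON.parse(illFormed);
const fromWellFormed = JSON.parse(wellFormed);

console.log(fromIllFormed === fromWellFormed); // true
console.log(fromWellFormed === loneSurrogate); // true
```

Only the serialized text differs (three code units versus eight); the parsed value is identical in both cases.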

Validity

Unicode escape sequences are valid JSON and, being completely ASCII, are well-formed in both UTF-16 and UTF-8.
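As a rough illustration, the sketch below encodes both forms to UTF-8 with the standard `TextEncoder` API, assuming an environment that provides it as a global (current browsers and Node.js do). A raw unpaired surrogate cannot be encoded and is replaced with U+FFFD, whereas the escape sequence is encoded as six ASCII bytes.

```javascript
const encoder = new TextEncoder();

// A raw unpaired surrogate has no UTF-8 representation, so TextEncoder
// substitutes U+FFFD REPLACEMENT CHARACTER (bytes EF BF BD).
const rawBytes = Array.from(encoder.encode("\uD800"));

// The escape sequence is six ASCII characters: \ u d 8 0 0
const escapedBytes = Array.from(encoder.encode("\\ud800"));

// Every byte of the escaped form is below 0x80, i.e. plain ASCII.
const isAscii = escapedBytes.every((byte) => byte < 0x80);

console.log(rawBytes);     // [ 239, 191, 189 ]
console.log(escapedBytes); // [ 92, 117, 100, 56, 48, 48 ]
console.log(isAscii);      // true
```

The substitution in the first case is lossy: the original code unit cannot be recovered from the encoded bytes, which is the kind of problem the escape sequence avoids.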

Specification

The specification is available in ecmarkup or rendered HTML.

