particle-iot/ruby-unicode-display-width

Name: ruby-unicode-display-width

Owner: Particle

Description: Debian package for Ruby gem unicode-display-width

Created: 2016-10-27 21:59:06.0

Updated: 2016-10-27 22:01:34.0

Pushed: 2018-01-03 18:43:54.0

Homepage: null

Size: 13

Language: Ruby

README

Unicode::DisplayWidth

Determines the monospace display width of a string in Ruby. The implementation is based on EastAsianWidth.txt and other data and is written 100% in Ruby. Unlike wcwidth(), which serves a similar purpose, it does not rely on the OS vendor to provide an up-to-date method for measuring string width.

Unicode version: 9.0.0

Introduction to Character Widths

Guessing how much space a character will consume on a terminal is not easy. There is no single standard. Most implementations combine data from East Asian Width, some General Categories, and hand-picked adjustments.

How this Library Handles Widths

Rules listed higher up take precedence over those below them. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!

Width | Characters | Comment
------|------------|--------
X | (user defined) | Overwrites any other values
-1 | "\b" | Backspace (total width never below 0)
0 | "\0", "\x05", "\a", "\n", "\v", "\f", "\r", "\x0E", "\x0F" | C0 control codes that do not change horizontal width
1 | "\u{00AD}" | SOFT HYPHEN
2 | "\u{2E3A}" | TWO-EM DASH
3 | "\u{2E3B}" | THREE-EM DASH
0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
0 | "\u{1160}".."\u{11FF}" | HANGUL JUNGSEONG
2 | East Asian Width: F, W | Full-width characters
1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
1 | All other codepoints | -
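The precedence rules above can be sketched as a chain of early returns. The following is a minimal illustration only, not the gem's implementation: the names `sketch_char_width`, `sketch_width`, `WIDE_RANGES`, and `ZERO_WIDTH_CONTROLS` are hypothetical, the wide ranges are a small hand-picked stand-in for the East Asian Width F/W data, and the ambiguous-width rule and the Arabic exclusion are left out for brevity.

```ruby
# Toy illustration of the precedence order; NOT the gem's implementation.
# WIDE_RANGES is a tiny hand-picked stand-in for East Asian Width F/W data.
WIDE_RANGES = [0x3041..0x30FF, 0x4E00..0x9FFF, 0xAC00..0xD7A3, 0xFF01..0xFF60].freeze
# C0 control codes that do not change horizontal width.
ZERO_WIDTH_CONTROLS = [0x00, 0x05, 0x07, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F].freeze

# Width of a single character; earlier rules win over later ones.
def sketch_char_width(char, overwrite = {})
  cp = char.ord
  return overwrite[cp] if overwrite.key?(cp)        # user defined values win
  return -1 if cp == 0x08                           # backspace
  return 0 if ZERO_WIDTH_CONTROLS.include?(cp)      # C0 control codes
  return 1 if cp == 0x00AD                          # SOFT HYPHEN, checked before Cf
  return 2 if cp == 0x2E3A                          # TWO-EM DASH
  return 3 if cp == 0x2E3B                          # THREE-EM DASH
  return 0 if char.match?(/[\p{Mn}\p{Me}\p{Cf}]/)   # Arabic exclusion omitted here
  return 0 if (0x1160..0x11FF).cover?(cp)           # HANGUL JUNGSEONG
  return 2 if WIDE_RANGES.any? { |r| r.cover?(cp) } # sample of wide characters
  1                                                 # all other code points
end

# Width of a whole string; the total is never allowed below 0.
def sketch_width(string, overwrite = {})
  total = string.each_char.sum { |c| sketch_char_width(c, overwrite) }
  [total, 0].max
end
```

In this sketch a soft hyphen yields 1 rather than 0 because its rule sits above the general Cf rule, which is exactly what the ordering is for.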

Install

Install the gem with:

gem install unicode-display_width

Or add to your Gemfile:

gem 'unicode-display_width'

Usage

require 'unicode/display_width'

Unicode::DisplayWidth.of("⚀") # => 1
Unicode::DisplayWidth.of("一") # => 2

Ambiguous Characters

The second parameter sets the width returned for characters classified as ambiguous:

Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2
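The choice matters in practice: if a terminal renders ambiguous characters two cells wide while the program assumes one, padded columns drift out of alignment. The following is a minimal sketch of that effect. It assumes a hypothetical `sample_width` helper that stands in for the library call and knows only three ambiguous code points; `pad_right` and `AMBIGUOUS_SAMPLE` are likewise made up for illustration.

```ruby
# Tiny hand-picked sample of East Asian Width "A" code points: § ± ·
# (assumption: the real data set is far larger).
AMBIGUOUS_SAMPLE = [0x00A7, 0x00B1, 0x00B7].freeze

# Hypothetical stand-in for a width lookup: ambiguous characters count as
# `ambiguous` cells, everything else as 1.
def sample_width(string, ambiguous = 1)
  string.each_codepoint.sum { |cp| AMBIGUOUS_SAMPLE.include?(cp) ? ambiguous : 1 }
end

# Pads with spaces so the string occupies `columns` terminal cells.
def pad_right(string, columns, ambiguous = 1)
  string + " " * [columns - sample_width(string, ambiguous), 0].max
end

narrow = pad_right("a\u00B7b", 5, 1) # two spaces of padding
wide   = pad_right("a\u00B7b", 5, 2) # one space of padding
```

The same input receives a different amount of padding depending on the ambiguous width, so the value should match how the target terminal actually renders these characters.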

Custom Overwrites

You can overwrite how specific code points are handled by passing a hash (or even a proc) as the third parameter:

Unicode::DisplayWidth.of("a\tb", 1, 0x09 => 10) # => 12
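A hash and a proc can fill the same role because both respond to `[]` in Ruby. The sketch below illustrates that idea with a hypothetical `width_with_overwrite` helper that counts every non-overwritten code point as 1. It is not the gem's code, and it makes no claim about how the gem calls a proc internally.

```ruby
# Hypothetical helper, not part of the gem's API: every code point counts
# as 1 unless the overwrite object returns a width for it.
def width_with_overwrite(string, overwrite)
  # Hash#[] and Proc#[] share the same call shape, so either works here.
  string.each_codepoint.sum { |cp| overwrite[cp] || 1 }
end

tab_as_ten  = { 0x09 => 10 }                  # hash: fixed width per code point
tab_as_four = ->(cp) { cp == 0x09 ? 4 : nil } # proc: width computed on demand

hash_result = width_with_overwrite("a\tb", tab_as_ten)  # 1 + 10 + 1
proc_result = width_with_overwrite("a\tb", tab_as_four) # 1 + 4 + 1
```

A proc is convenient when the overwrite depends on a range or a computed condition, whereas a hash is enough for a handful of fixed code points.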

Usage with String Extension

The string extension is activated by default. It will be deactivated in version 2.0:

require 'unicode/display_width/string_ext'

"⚀".display_width #=> 1
'一'.display_width #=> 2

You can actively opt out of the string extension with: require 'unicode/display_width/no_string_ext'

Usage From the CLI

Use this one-liner to print out display widths for strings from the command line:

$ gem install unicode-display_width
$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- "一"

Replace "一" with the actual string to measure.

Other Implementations & Discussion

See unicode-x for more Unicode-related micro libraries.

Copyright & Info
