JavaScript regex and string literal highlighting in Emacs

The best JavaScript mode for Emacs is the one by Karl Landström. Its one weakness is that it has trouble correctly detecting strings and regexes.


That is no fun when you open the jQuery source. (Warning: this crashes Emacs!)

Also no fun: Emacs' built-in syntax table cannot cope with how heavily JavaScript overloads / (as the division operator, as the regex delimiter, and in two kinds of comment marker). Boo for Emacs!
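To make the overloading concrete, here is a small sketch of the four roles of / side by side. The variable names are made up for illustration. The last two lines show the nasty part: nearly the same characters are two divisions in one context and a single regex with a g flag in another.

```javascript
const total = 84;
const parts = 2;
const g = 2;

const half = total / parts;           // 1. division operator
const hasDigits = /\d+/.test("a42");  // 2. regex literal delimiter
// 3. two slashes start a line comment
/* 4. slash-star starts a block comment */

// Context decides how the slashes are read:
const divided = total / parts / g;                     // (84 / 2) / 2
const matched = "x parts y".replace(/ parts /g, "-");  // regex with the g flag

console.log(half, hasDigits, divided, matched);
```

A highlighter therefore has to look at what comes before a slash to decide what it means, which a plain per-character syntax table cannot do.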

The solution is to switch off all automatic quote detection and do it yourself. That way you can highlight syntax as complicated as you like. Hooray for Emacs!

Relevant passages:

;; --- Syntax Table And Parsing ---

(defvar javascript-mode-syntax-table
  (let ((table (make-syntax-table)))
    (c-populate-syntax-table table)

    ;; Switch off built-in quoted string detection,
    ;; since it makes regular expressions and comments
    ;; really hard to detect.
    ;; This also has the benefit that multiline strings
    ;; are no longer recognized as strings (since JavaScript does
    ;; not allow them).
    (modify-syntax-entry ?' "." table)
    (modify-syntax-entry ?\" "." table)

    ;; The syntax class of underscore should really be `symbol' ("_")
    ;; but that makes matching of tokens much more complex as e.g.
    ;; "\\<xyz\\>" matches part of e.g. "_xyz" and "xyz_abc". Define
    ;; it as word constituent for now.
    (modify-syntax-entry ?_ "w" table)

    ;; return the table as the value of the `let' form
    table)
  "Syntax table used in JavaScript mode.")
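;; A single- or double-quoted string whose closing quote is not
;; preceded by a backslash.
(defconst js-quoted-string-re "\\(\".*?[^\\]\"\\|'.*?[^\\]'\\)")
;; The same, plus /regex/ literals with optional trailing flags.
(defconst js-quoted-string-or-regex-re "\\(/.*?[^\\]/\\w*\\|\".*?[^\\]\"\\|'.*?[^\\]'\\)")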

(defconst js-font-lock-keywords-1
  (list
   (list js-function-heading-1-re 1 font-lock-function-name-face)
   (list js-function-heading-2-re 1 font-lock-function-name-face)

   ;; detect literal strings following a + operator
   (list (concat "+[ \t]*" js-quoted-string-re)  1 font-lock-string-face)

   ;; detect literal strings used in "literal object" keys
   (list (concat "[,{][ \t]*" js-quoted-string-re "[ \t]*:" ) 1 font-lock-string-face)

   ;; detects strings and regexes when assigned, passed, returned
   ;; used as an object key string (i.e. bla["some string"]), when used
   ;; as a literal object value (i.e. key: "string"), used as an array
   ;; element, or when they appear as the first expression on a line
   ;; and a few other hairy cases
   (list (concat "[=(:,;[][ \t]*" js-quoted-string-or-regex-re)  1 font-lock-string-face)
   (list (concat "^[ \t]*"      js-quoted-string-or-regex-re) 1 font-lock-string-face)
   (list (concat "return[ \t]*" js-quoted-string-or-regex-re) 1 font-lock-string-face)

   ;; detect "autoquoted" object properties... this clashes with
   ;; "switch { ...  default: }", so it may not be worth the trouble
   (list "\\(^[ \t]*\\|[,{][ \t]*\\)\\(\\w+\\):" 2 font-lock-string-face))

  "Level one font lock.")
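If you want to poke at the heuristic outside Emacs, here is a rough JavaScript port of the "literal after an operator" rule. It is only a sketch: `findLiteral` is a name I made up, and JavaScript's `\w` is not the same as Emacs' syntax-table-driven `\w`, so treat it as an approximation of the patterns above and not as the mode's actual behaviour.

```javascript
// Approximate port of js-quoted-string-or-regex-re.
// Like the Emacs version, it needs at least one character between
// the delimiters, and the closing delimiter must not follow a backslash.
const literal = String.raw`(\/.*?[^\\]\/\w*|".*?[^\\]"|'.*?[^\\]')`;

// A literal only counts when it follows one of = ( : , ; [
const afterOperator = new RegExp(String.raw`[=(:,;\[][ \t]*` + literal);

// Hypothetical helper: return the first literal found on a line, or null.
function findLiteral(line) {
  const m = afterOperator.exec(line);
  return m ? m[1] : null;
}

console.log(findLiteral('var re = /ab+c/gi;'));        // the regex, flags included
console.log(findLiteral('alert("hi / there");'));      // the whole string
console.log(findLiteral('var half = total / parts;')); // null: plain division
```

The division on the last line is left alone because the slash follows an identifier, not one of the trigger characters.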

And then:

A patch is on its way to Karl. In the meantime, you can download my version here.


2 Responses to “JavaScript regex and string literal highlighting in Emacs”

  1. Michael Alan Dorman Says:

    Joost, I just wanted to say thanks for posting this modified version of javascript.el. Karl's last published version was causing Emacs to fall over when looking at the source for prototype.js.

    That said, if you’re interested, there’s one expression in the latest version that causes some mis-highlighting. If you look at line 4160, there’s an XPath expression involving // that causes the rest of the file to be highlighted as a comment.

    Still, much much better than sending Emacs into some sort of tight loop! Many thanks!


  2. Joost Says:

    Hey Michael,

    I’m not sure that can be fixed. At least not with the changes I made – I disabled all built-in string detection and all built-in comment detection except for the /* … */ style comments – I left those in because I couldn’t work out how to check for multi-line constructs.

    Basically that means that /* and */ are detected before any other comment and string detection, so even inside a string or regex they will be interpreted as comment markers.

    As a workaround, you can escape the * character, which should not have any effect on the code:

    getElementsByXPath('.//\*' + cond, element)

    I’m going to try and find a better solution, but don’t hold your breath :-)
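
    For what it's worth, the workaround is easy to check in any JavaScript console. This is just a sketch with made-up variable names: in a string literal, a backslash before a character with no special escape meaning is dropped, so the escaped string has the same value while the source no longer contains a slash directly followed by a star.

```javascript
// "\*" is not a recognised escape sequence, so it evaluates to a plain "*".
const escaped = './/\*';
// The same value, built so that slash and star are never adjacent in the source.
const plain = './/' + '*';

console.log(escaped === plain);
console.log(escaped.length);
```

    Note that this only holds for string literals. Inside a regex literal a backslash before * changes the meaning, so the trick does not carry over there.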