The following 'real-life' natural language tokenizer is put together in a few lines of BinProlog, on top of a generic regular expression recognizer, using higher-order programming (call/N) and AGs.
% NL tokenizer - converts NL strings of chars to lists of words
chars_words(Cs,Ws):-dcg_def([32|Cs]),words(Ws),!,dcg_val([]).

% a sequence of words
words(Ws):-star(word,Ws),spaces.

% a token which is punctuation or a sequence of letters
word(W):-spaces,(plus(letter,Xs);one(punct,Xs)),!,name(W,Xs).

% 0 or more space characters
spaces:-star(space,_).

% recognizers
space(X):- #X, is_space(X).    % recognizes space
letter(X):- #X, is_an(X).      % recognizes alpha-numerics
punct(X):- #X, is_punct(X).    % recognizes punctuation

% regexp tools with AGs and call/N
one(F,[X]):- call(F,X).                     % recognizes one X of type F
star(F,[X|Xs]):- call(F,X),!,star(F,Xs).    % recognizes 0 or more
star(_,[]).
plus(F,[X|Xs]):- call(F,X),star(F,Xs).      % recognizes 1 or more
The interface predicate chars_words/2 initializes the input list with dcg_def/1 and constrains it to be empty at the end with dcg_val/1. Because the input is kept as implicit state, the clauses carry no extra arguments, and higher-order programming (in this case call/N) can be applied as usual. Parsing is done by combining basic recognizers (space/1, letter/1, punct/1), each of which consumes one input character with #/1 and succeeds only if that character is of the specified type. To combine words into sequences we reuse the same regular expression operators (star/2, plus/2, one/2) that combine characters into words.
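To make the role of the implicit input concrete, here is a minimal sketch of the same tokenizer in plain Prolog, with the input list threaded by hand as a pair of extra arguments S0/S. It is an illustration under stated assumptions, not BinProlog's implementation: the explicit state arguments replace dcg_def/1, #/1 and dcg_val/1, the definitions of is_space/1, is_an/1 and is_punct/1 are hypothetical ASCII-only stand-ins, and atom_codes/2 is used instead of name/2, so digit sequences become atoms rather than numbers.

```prolog
% Sketch only: the input is threaded explicitly as S0/S pairs instead of
% being kept as implicit state by dcg_def/1, #/1 and dcg_val/1.

% Start from [32|Cs], as in the original, and require the remaining input to be [].
chars_words(Cs, Ws) :- words(Ws, [32|Cs], []), !.

% a sequence of words, followed by optional trailing spaces
words(Ws, S0, S) :- star(word, Ws, S0, S1), spaces(S1, S).

% a token which is punctuation or a sequence of letters
word(W, S0, S) :-
    spaces(S0, S1),
    (   plus(letter, Xs, S1, S)
    ;   one(punct, Xs, S1, S)
    ), !,
    atom_codes(W, Xs).          % assumption: atom_codes/2 replaces name/2

% 0 or more space characters
spaces(S0, S) :- star(space, _, S0, S).

% recognizers: consume one character code X from the input, then check its type
space(X, [X|S], S)  :- is_space(X).
letter(X, [X|S], S) :- is_an(X).
punct(X, [X|S], S)  :- is_punct(X).

% Hypothetical ASCII-only character classes (assumptions, not BinProlog's own).
is_space(X) :- memberchk(X, [32, 9, 10, 13]).
is_an(X) :- X >= 0'a, X =< 0'z.
is_an(X) :- X >= 0'A, X =< 0'Z.
is_an(X) :- X >= 0'0, X =< 0'9.
is_punct(X) :- memberchk(X, [33, 44, 46, 58, 59, 63]).   % ! , . : ; ?

% regexp tools: call/N now passes the two state arguments as well
one(F, [X], S0, S) :- call(F, X, S0, S).
star(F, [X|Xs], S0, S) :- call(F, X, S0, S1), !, star(F, Xs, S1, S).
star(_, [], S, S).
plus(F, [X|Xs], S0, S) :- call(F, X, S0, S1), star(F, Xs, S1, S).
```

A query such as `atom_codes('hello, world!', Cs), chars_words(Cs, Ws)` can be used to try it out. Every clause here carries two extra arguments and every call/N passes them along; this is the bookkeeping that the AG version leaves implicit.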
Note that similar code based on conventional, translation-based DCGs or EDCGs would have to use phrase in rather intricate ways and would not benefit from an efficiently implemented call/N (as available in BinProlog 5.75).