Parsing is a technique for analyzing the structure of text or code in order to extract meaningful information. It breaks complex data formats or programming languages down into simpler parts that are easier to analyze. Parsing is used in many fields, including natural language processing, compilers, and data analysis tools.
For example, code written in a programming language must be converted into a form that a computer can process and execute. This involves parsing the code to identify its components, such as variables, functions, and statements. By breaking the code down, parsing lets the computer work out the syntax and structure of the program so that it can run the instructions accurately.
Another example is data parsing: the process of analyzing a structured format to extract information. For JSON (JavaScript Object Notation), this means interpreting data represented in JSON format. JSON is widely used for transmitting data between a server and a web application because it is lightweight and easy to read. You will see parsing functions in action later in our simple JSON parser implementation, but before that you can read the documentation here: MaxPC Documentation.
The first parser library I am using is MaxPC, a simple system for writing custom parsers as S-expressions. It supports parse tree transformation and error handling, and it can operate on sequences and streams. It provides five kinds of parsing functions: basic parsers, logical combinators, sequence combinators, transformation, and error handling. The MaxPC documentation also has a section titled "Caveat: Recursive Parsers", which we are not going to cover in this topic.
(ql:quickload :maxpc)
(defpackage :parsing-playground
  (:use :cl :maxpc))
(in-package :parsing-playground)
parsing-playground> (parse '(foo) (=element)) → FOO, T, T
parsing-playground> (parse '(x) (?eq 'y)) → NIL, NIL, NIL
Every parser follows the same template, and generalized booleans tell us whether it matched:
(function (input) (or input null) * boolean)
NOTE: parse returns three values: the result, match-p, and end-p. The last two are generalized booleans. Match-p is true if the parser matched the input source. End-p is true if the parser matched the complete input source.
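To make the three return values concrete, here is a minimal sketch that captures them with multiple-value-bind. It assumes Quicklisp is available, as in the setup above, and the helper name describe-parse is hypothetical (it is not part of MaxPC).

```lisp
(ql:quickload :maxpc)
(defpackage :parsing-playground
  (:use :cl :maxpc))
(in-package :parsing-playground)

;; Hypothetical helper (not part of MaxPC): collect the three values
;; returned by PARSE into one list so they are easy to inspect.
(defun describe-parse (input parser)
  "Return (result match-p end-p) for PARSER applied to INPUT."
  (multiple-value-bind (result match-p end-p) (parse input parser)
    (list result match-p end-p)))

;; The parser matches and the whole input is consumed.
(describe-parse '(foo) (=element))      ; → (FOO T T)

;; The parser matches the first element, but BAR is left over,
;; so end-p is false.
(describe-parse '(foo bar) (=element))  ; → (FOO T NIL)
```

Checking match-p and end-p separately is useful because a parser can succeed on a prefix of the input without consuming all of it.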
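Always remember the naming convention, which is based on the first character of a parser's name:
? parsers never produce a result value.
= parsers always produce a result value.
% parsers may or may not produce a result value, depending on their arguments.
(parse '(x) (?eq 'y)) → NIL, NIL, NIL
(parse '(foo) (=element)) → FOO, T, T
(parse '(z) (%maybe (=element))) → Z, T, T
(parse '() (%any (=element))) → NIL, T, T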
Now that you've explored the capabilities of MaxPC, let's apply it to a real example: a custom parser for a JSON-like object that decodes string keys and simple values (strings and numbers). The example will also show how to extend the parser to multiple object entries. Take note that this is just an example. Many JSON files you will find on the internet have more complex structures, which means they need a more complex parser.
Example .JSON file:
{
"Name": "nycto",
"Age": "22"
}
{
"Name": "sion",
"Age": "20"
}
{
"Name": "Bob",
"Age": "20"
}
For our utilities:
(defun ?whitespace ()
  "Match zero or more whitespace characters."
  (?seq (%any (maxpc.char:?whitespace))))
(defun ?digit-p ()
  "Match one or more digit characters."
  (?seq (%some (?satisfies 'digit-char-p))))
It's important to note that expressions are read from left to right. In the ?whitespace function, ?seq matches parsers in sequence, while %any matches a parser any number of times, including zero.
Similarly, in the ?digit-p function, %some matches a parser one or more times, and ?satisfies checks whether the input element meets the criterion defined by the function that the symbol digit-char-p names.
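The difference between %any and %some is easiest to see on empty input. The following is a small sketch under the same setup as before (Quicklisp assumed available); the helper matches-p is a hypothetical name introduced only for this illustration.

```lisp
(ql:quickload :maxpc)
(defpackage :parsing-playground
  (:use :cl :maxpc))
(in-package :parsing-playground)

;; Hypothetical helper (not part of MaxPC): keep only the match-p value.
(defun matches-p (input parser)
  "Return the second value of PARSE, i.e. whether PARSER matched INPUT."
  (nth-value 1 (parse input parser)))

;; %any accepts zero repetitions, so it matches even on empty input.
(matches-p "" (%any (?satisfies 'digit-char-p)))     ; true

;; %some needs at least one repetition, so empty input does not match.
(matches-p "" (%some (?satisfies 'digit-char-p)))    ; false

;; With at least one digit present, %some matches as well.
(matches-p "30" (%some (?satisfies 'digit-char-p)))  ; true
```

This is why ?whitespace is built on %any (whitespace is optional), while ?digit-p is built on %some (a number needs at least one digit).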
Result:
(parse '(#\ ) (?whitespace)) → NIL, T, T
(parse "30" (?digit-p)) → NIL, T, T
(parse "30 a b" (?digit-p)) → NIL, T, NIL
(parse '(#\3 #\a) (?digit-p)) → NIL, T, NIL
(parse '(#\3 #\0) (?digit-p)) → NIL, T, T
For matching the name and name's value:
(defun =key ()
  "Returns a key string."
  (=destructure (_ str &rest _)
      (=list (?eq #\")
             (=subseq (%some (?satisfies 'alphanumericp)))
             (?seq (?eq #\")
                   (?eq #\:)))
    str))
(defun =name-value ()
  "Return a value string."
  (=destructure (_ str &rest _)
      (=list (?eq #\")
             (=subseq (%some (?satisfies (lambda (c) (not (eq c #\"))))))
             (?seq (?eq #\")
                   (?eq #\,)))
    str))
As you can see, the two functions break the data down into components to match entries like "\"name\":" and "\"alice\",". If you're familiar with destructuring-bind, this process becomes easier. The underscore (_) is used to ignore result values, which lets us leave the quotation marks, colons, and commas out of the results.
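As a smaller illustration of how =destructure binds the results of =list and skips positions marked with an underscore, here is a sketch on a plain list instead of a string. It assumes the same Quicklisp setup as above; the input (name = "nycto") is made up for this example.

```lisp
(ql:quickload :maxpc)
(defpackage :parsing-playground
  (:use :cl :maxpc))
(in-package :parsing-playground)

;; =list yields one result per parser. =destructure binds those results
;; to the lambda list; positions named _ are ignored.
(parse '(name = "nycto")
       (=destructure (key _ value)
           (=list (=element) (?eq '=) (=element))
         (list key value)))
;; → (NAME "nycto"), T, T

;; &rest _ ignores everything after the positions we care about,
;; as =key does with the closing quote and the colon.
(parse '(a b c)
       (=destructure (head &rest _)
           (=list (=element) (=element) (=element))
         head))
;; → A, T, T
```

The body form decides what the combined parser returns, so the separator never reaches the caller.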
Result:
(parse "\"name\":" (=key)) → "name", T, T
(parse "\"nycto\"," (=name-value)) → "nycto", T, T
For matching the age's value:
(defun =age-value ()
  "Return a digit."
  (=destructure (_ age _)
      (=list (?eq #\")
             (=subseq (?digit-p))
             (?eq #\"))
    age))
This function accepts only digits as the value. The "age" key itself is matched by reusing the =key function.
Result:
(parse "\"age\":" (=key))
(parse "\"22\"" (=age-value))
Return the name and name's value; age and age's value:
(defun =key-value ()
  "Matches a key-value pair."
  (=destructure (key1 _ value1 _ key2 _ value2)
      (=list (=key)
             (?whitespace)
             (=name-value)
             (?whitespace)
             (=key)
             (?whitespace)
             (=age-value))
    (list key1 value1 key2 value2)))
This function matches and returns the two key-value pairs without the surrounding braces. It uses =list to run the parsers in sequence and collect their results into a list, as shown below.
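One detail of =list explains the shape of the lambda list in =key-value: parsers that produce no result still occupy a slot in the list, holding NIL. The sketch below shows this on a two-element list, again assuming the Quicklisp setup used earlier.

```lisp
(ql:quickload :maxpc)
(defpackage :parsing-playground
  (:use :cl :maxpc))
(in-package :parsing-playground)

;; =element produces a result, ?eq does not. =list still returns one
;; entry per parser, so the second slot is NIL.
(parse '(a b) (=list (=element) (?eq 'b)))
;; → (A NIL), T, T
```

This is why =key-value needs an underscore for every (?whitespace) position: each of them contributes a NIL that has to be skipped when destructuring.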
Result:
(parse "\"name\": \"nycto\", \"age\": \"22\"" (=key-value))
→ ("name" "nycto" "age" "22"), T, T
Top level to parse the entire object:
(defun =parse-json ()
  "Matches a single JSON object."
  (=destructure (_ _ result &rest _)
      (=list (?eq #\{)
             (?whitespace)
             (%some (=key-value))
             (?whitespace)
             (?eq #\})
             (?whitespace))
    (apply #'append result))) ;; To flatten the nested lists.
(defun parse-file (items)
  (parse items (=parse-json)))
(defun read-json-file (filename)
  "Read the contents of a JSON file and return it as a string."
  (with-open-file (stream filename)
    (let ((content (make-string (file-length stream))))
      (read-sequence content stream)
      content)))
(defun parse-json-file (filename)
  "Read a JSON file and return the parsed object."
  (let ((json-string (read-json-file filename)))
    (parse-file json-string)))
The =parse-json function matches the structure of a JSON object: an opening brace, whitespace, key-value pairs, and a closing brace. It then flattens the matched pairs into a single list. The parse-file function serves as a wrapper that applies the =parse-json parser to a given input. The read-json-file function reads the contents of a specified JSON file into a string. Finally, the parse-json-file function combines file reading and parsing: it first fetches the JSON string from the file and then applies the parsing function.
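One caveat with reading a file this way: file-length may report the size in bytes rather than characters on some implementations, so a file with multi-byte characters can leave unused space at the end of the string. A hedged variant, using the hypothetical name read-file-as-string, trims the buffer to the number of characters that read-sequence actually filled:

```lisp
;; Hypothetical variant of READ-JSON-FILE (the name is made up for this
;; sketch). READ-SEQUENCE returns the index of the first element it did
;; not fill, so SUBSEQ drops any unused space at the end of the buffer.
(defun read-file-as-string (filename)
  "Read FILENAME and return exactly the characters that were read."
  (with-open-file (stream filename)
    (let* ((buffer (make-string (file-length stream)))
           (end (read-sequence buffer stream)))
      (subseq buffer 0 end))))
```

For plain ASCII files such as the example above, both versions should return the same string.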
Result:
(parse "{\"name\": \"nycto\",
  \"age\": \"22\"
}"
       (=parse-json))
→ ("name" "nycto" "age" "22"), T, T
(read-json-file "your-specified-path/example.json")
→ "{
  \"name\": \"nycto\",
  \"age\": \"22\"
}"
(parse-json-file "your-specified-path/example.json")
→ ("name" "nycto" "age" "22"), T, T
To handle multiple objects, we can add a new top-level definition, the =parse-json-objects function. Note that =parse-json now returns its result without flattening it, as shown below:
(defun =parse-json ()
  "Matches a single JSON object."
  (=destructure (_ _ result &rest _)
      (=list (?eq #\{)
             (?whitespace)
             (%some (=key-value))
             (?whitespace)
             (?eq #\})
             (?whitespace))
    result))
(defun =parse-json-objects ()
  "Matches multiple JSON objects."
  (=destructure (objs)
      (=list (%some (=parse-json)))
    (apply #'append objs))) ;; Reduce 1 level of list
(defun parse-file (items)
  (parse items (=parse-json-objects)))
(defun parse-json-file (filename)
  "Read a JSON file and return the parsed object."
  (let ((json-string (read-json-file filename)))
    (parse-file json-string)))
Result:
(parse-json-file "your-specified-path/example.json")
→ (("Name" "nycto" "Age" "22") ("Name" "sion" "Age" "20") ("Name" "Bob" "Age" "20")), T, T
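The two uses of (apply #'append ...) each remove exactly one level of nesting, which can be checked in plain Common Lisp without MaxPC. The sample lists below are written by hand to mirror the shapes that %some would collect, based on the results shown above; they are assumptions for illustration, not captured parser output.

```lisp
;; Shape assumed for one object: %some collects a list of key-value lists.
(defparameter *one-object*
  '(("Name" "nycto" "Age" "22")))

;; Shape assumed for several objects when each object keeps its own
;; nesting: a list of the lists above.
(defparameter *all-objects*
  '((("Name" "nycto" "Age" "22"))
    (("Name" "sion" "Age" "20"))))

;; Single-object parser: flatten to one flat list.
(apply #'append *one-object*)
;; → ("Name" "nycto" "Age" "22")

;; Multi-object parser: remove one level, leaving one list per object.
(apply #'append *all-objects*)
;; → (("Name" "nycto" "Age" "22") ("Name" "sion" "Age" "20"))
```

Leaving the single-object result unflattened in the multi-object version is what keeps each object's entries grouped together in the final list.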