Class: Asciidoctor::Lexer
In: lib/asciidoctor/lexer.rb
Parent: Object
Public: Methods to parse lines of AsciiDoc into an object hierarchy representing the structure of the document. All methods are class methods and should be invoked from the Lexer class. The main entry point is ::next_block. No Lexer instances shall be discovered running around. (Any attempt to instantiate a Lexer will be futile.)
The object hierarchy created by the Lexer consists of zero or more Section and Block objects. Section objects may be nested and a Section object contains zero or more Block objects. Block objects may be nested, but may only contain other Block objects. Block objects which represent lists may contain zero or more ListItem objects.
Examples
  # Create a Reader for the AsciiDoc lines and retrieve the next block from it.
  # Lexer::next_block requires a parent, so we begin by instantiating an empty Document.

  doc = Document.new
  reader = Reader.new lines
  block = Lexer.next_block(reader, doc)
  block.class
  # => Asciidoctor::Block
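The containment rules of the object hierarchy can be pictured with a small, self-contained sketch. The `SectionNode`, `BlockNode` and `ListItemNode` structs and the `outline` helper are hypothetical stand-ins invented for illustration; they are not the Asciidoctor classes and carry none of their behavior.

```ruby
# Hypothetical stand-ins for Section, Block and ListItem (illustration only).
SectionNode  = Struct.new(:title, :children)   # children: SectionNode or BlockNode
BlockNode    = Struct.new(:context, :children) # children: BlockNode or ListItemNode
ListItemNode = Struct.new(:text, :children)    # children: BlockNode

# Return one indented label per node, walking the tree depth-first.
def outline(node, depth = 0)
  label =
    case node
    when SectionNode  then "Section(#{node.title})"
    when BlockNode    then "Block(#{node.context})"
    when ListItemNode then "ListItem(#{node.text})"
    end
  [('  ' * depth) + label] + node.children.flat_map { |child| outline(child, depth + 1) }
end

tree = SectionNode.new('Intro', [
  BlockNode.new(:paragraph, []),
  BlockNode.new(:ulist, [ListItemNode.new('first', []), ListItemNode.new('second', [])]),
  SectionNode.new('Details', [BlockNode.new(:listing, [])])
])

puts outline(tree)
```

The nested section and the list block with its two items mirror the rules stated above: sections nest and hold blocks, and list blocks hold list items.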
BlockMatchData = Struct.new(:context, :masq, :tip, :terminator)
Whether a block supports complex content should be a config setting. If terminator is false, all the lines in the reader should be parsed. NOTE: could invoke filter in here, before and after parsing.
Internal: Catalog any callouts found in the text, but don't process them
text - The String of text in which to look for callouts
document - The current document on which the callouts are stored
Returns nothing
Internal: Catalog any inline anchors found in the text, but don't process them
text - The String text in which to look for inline anchors
document - The current document on which the references are stored
Returns nothing
Internal: Collect the lines belonging to the current list item, navigating through all the rules that determine what comprises a list item.
Grab lines until a sibling list item is found, or the block is broken by a terminator (such as a line comment). Definition lists are more greedy if they don't have optional inline item text... they want that text
reader - The Reader from which to retrieve the lines.
list_type - The Symbol context of the list (:ulist, :olist, :colist or :dlist)
sibling_trait - A Regexp that matches a sibling of this list item or String list marker of the items in this list (default: nil)
has_text - Whether the list item has text defined inline (always true except for labeled lists)
Returns an Array of lines belonging to the current list item.
Internal: Initialize a new Section object and assign any attributes provided
The information for this section is retrieved by parsing the lines at the current position of the reader.
reader - the source reader
parent - the parent Section or Document of this Section
attributes - a Hash of attributes to assign to this section (default: {})
Public: Determines whether this line is the start of any of the delimited blocks
returns the match data if this line is the first line of a delimited block or nil if not
Public: Checks if these lines are a section title
line1 - the first line as a String
line2 - the second line as a String (default: nil)
returns the section level if these lines are a section title, false otherwise
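A minimal sketch of such a check, assuming the conventional AsciiDoc title forms: a single-line title prefixed with one to five `=` characters, or a two-line title underlined with `=`, `-`, `~`, `^` or `+`. The method name `section_title_level`, the regular expression and the one-character length tolerance are assumptions made for illustration, not the patterns the Lexer actually uses.

```ruby
# Assumed mapping from underline character to section level.
UNDERLINE_LEVELS = { '=' => 0, '-' => 1, '~' => 2, '^' => 3, '+' => 4 }.freeze

# Hypothetical helper: returns the section level, or false if the lines
# do not form a section title.
def section_title_level(line1, line2 = nil)
  title = line1.chomp

  # Single-line form: "== Title" is level 1, "=== Title" is level 2, and so on.
  if (match = title.match(/\A(={1,5})\s+\S/))
    return match[1].length - 1
  end

  return false if line2.nil?

  underline = line2.chomp
  return false if title.strip.empty? || underline.empty?

  char = underline[0]
  # The underline must consist of a single repeated, recognized character.
  return false unless UNDERLINE_LEVELS.key?(char) && underline == char * underline.length
  # Assumption: the underline length must be within one character of the title length.
  return false unless (underline.length - title.length).abs <= 1

  UNDERLINE_LEVELS[char]
end

p section_title_level("Foo\n", "~~~\n")
p section_title_level("==== Foo\n")
p section_title_level("Just text\n", "more text\n")
```

Returning either an Integer or `false` follows the contract described above, so callers can use the result directly in a conditional.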
Internal: Determine whether this line is a sibling list item according to the list type and trait (marker) provided.
line - The String line to check
list_type - The context of the list (:olist, :ulist, :colist, :dlist)
sibling_trait - The String marker for the list or the Regexp to match a sibling
Returns a Boolean indicating whether this line is a sibling list item given the criteria provided
Public: Make sure the Lexer object doesn't get initialized.
Raises RuntimeError if this constructor is invoked.
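The pattern of a class that exposes only class methods and refuses instantiation can be sketched as follows. `StaticParser` and its `shout` method are hypothetical names used purely for illustration; the error message is likewise an assumption.

```ruby
# Hypothetical class that, like the Lexer, offers only class methods.
class StaticParser
  # Calling raise with a String raises a RuntimeError.
  def initialize
    raise 'StaticParser is not meant to be instantiated'
  end

  # Class methods remain callable without an instance.
  def self.shout(text)
    text.upcase
  end
end

puts StaticParser.shout('hello')

begin
  StaticParser.new
rescue RuntimeError => e
  puts "refused: #{e.message}"
end
```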
Public: Return the next Section or Block object from the Reader.
Begins by skipping over blank lines to find the start of the next Section or Block. Processes each line of the reader in sequence until a Section or Block is found or the reader has no more lines.
Uses regular expressions from the Asciidoctor module to match Section and Block delimiters. The ensuing lines are then processed according to the type of content.
reader - The Reader from which to retrieve the next block
parent - The Document, Section or Block to which the next block belongs
Returns a Section or Block object holding the parsed content of the processed lines
Internal: Parse and construct a labeled (e.g., definition) list Block from the current position of the Reader
reader - The Reader from which to retrieve the labeled list
match - The Regexp match for the head of the list
parent - The parent Block to which this labeled list belongs
Returns the Block encapsulating the parsed labeled list
Internal: Parse and construct the next ListItem for the current bulleted (unordered or ordered) list Block, callout lists included, or the next term ListItem and definition ListItem pair for the labeled list Block.
First collect and process all the lines that constitute the next list item for the parent list (according to its type). Next, parse those lines into blocks and associate them with the ListItem (in the case of a labeled list, the definition ListItem). Finally, fold the first block into the item's text attribute according to rules described in ListItem.
reader - The Reader from which to retrieve the next list item
list_block - The parent list Block of this ListItem. Also provides access to the list type.
match - The match Array which contains the marker and text (first-line) of the ListItem
sibling_trait - The list marker or the Regexp to match a sibling item
Returns the next ListItem or ListItem pair (depending on the list type) for the parent list Block.
Internal: Parse and construct an outline list Block from the current position of the Reader
reader - The Reader from which to retrieve the outline list
list_type - A Symbol representing the list type (:olist for ordered, :ulist for unordered)
parent - The parent Block to which this outline list belongs
Returns the Block encapsulating the parsed outline (unordered or ordered) list
Public: Return the next section from the Reader.
This method processes block metadata, content and subsections for this section and returns the Section object and any orphaned attributes.
If the parent is a Document and has a header (document title), then this method will put any non-section blocks at the start of document into a preamble Block. If there are no such blocks, the preamble is dropped.
Since we are reading line-by-line, there's a chance that metadata that should be associated with the following block gets consumed. To deal with this case, the method returns a running Hash of "orphaned" attributes that get passed to the next Section or Block.
reader - the source Reader
parent - the parent Section or Document of this new section
attributes - a Hash of metadata that was left orphaned from the previous Section.
Examples
  source
  # => "Greetings\n---------\nThis is my doc.\n\nSalutations\n-----------\nIt is awesome."

  reader = Reader.new source.lines.entries
  # create empty document to parent the section
  # and hold attributes extracted from header
  doc = Document.new

  Lexer.next_section(reader, doc).first.title
  # => "Greetings"

  Lexer.next_section(reader, doc).first.title
  # => "Salutations"
returns a two-element Array containing the Section and Hash of orphaned attributes
Internal: Parse the table contained in the provided Reader
table_reader - a Reader containing the source lines of an AsciiDoc table
parent - the parent Block of this Asciidoctor::Table
attributes - attributes captured from above this Block
returns an instance of Asciidoctor::Table parsed from the provided reader
Public: Parses AsciiDoc source read from the Reader into the Document
This method is the main entry-point into the Lexer when parsing a full document. It first looks for and, if found, processes the document title. It then proceeds to iterate through the lines in the Reader, parsing the document into nested Sections and Blocks.
reader - the Reader holding the source lines of the document
document - the empty Document into which the lines will be parsed
options - a Hash of options to control processing
returns the Document object
Internal: Parse the next line if it contains metadata for the following block
This method handles lines with the following content:
Any attributes found will be inserted into the attributes argument. If the line contains block metadata, the method returns true, otherwise false.
reader - the source reader
parent - the parent of the current line
attributes - a Hash of attributes in which any metadata found will be stored
options - a Hash of options to control processing: (default: {})
  * :text indicates that lexer is only looking for text content and thus the block title should not be captured
returns true if the line contains metadata, otherwise false
Internal: Parse lines of metadata until a line of metadata is not found.
This method processes sequential lines containing block metadata, ignoring blank lines and comments.
reader - the source reader
parent - the parent to which the lines belong
attributes - a Hash of attributes in which any metadata found will be stored (default: {})
options - a Hash of options to control processing: (default: {})
  * :text indicates that lexer is only looking for text content and thus the block title should not be captured
returns the Hash of attributes including any metadata found
Internal: Parse the cell specs for the current cell.
The cell specs dictate the cell's alignments, styles or filters, colspan, rowspan and/or repeating content.
returns the Hash of attributes that indicate how to layout and style this cell in the table.
Internal: Parse the column specs for this table.
The column specs dictate the number of columns, relative width of columns, default alignments for cells in each column, and/or default styles or filters applied to the cells in the column.
Every column spec is guaranteed to have a width
returns a Hash of attributes that specify how to format and layout the cells in the table.
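How a column spec string might be expanded into per-column attributes, with every column receiving a width, can be sketched as below. This assumes a simplified subset of the AsciiDoc `cols` syntax (an optional `N*` repeat, an optional `<`, `^` or `>` horizontal alignment, and an optional integer width). The method name `parse_col_specs_sketch`, the `'width'` and `'halign'` keys and the Array return shape are assumptions for illustration; styles, filters and vertical alignment are omitted.

```ruby
# Assumed mapping from alignment character to alignment name.
HALIGN_NAMES = { '<' => 'left', '^' => 'center', '>' => 'right' }.freeze

# Hypothetical helper: expand a spec such as "2*,^3" into one Hash per column.
def parse_col_specs_sketch(spec)
  spec.split(',').flat_map do |entry|
    match = entry.strip.match(/\A(?:(\d+)\*)?([<^>])?(\d+)?\z/)
    next [] unless match # skip entries outside the simplified subset

    # Every column gets a width; default to 1 when none is given.
    column = { 'width' => (match[3] || 1).to_i }
    column['halign'] = HALIGN_NAMES[match[2]] if match[2]

    # The optional "N*" prefix repeats the column N times.
    Array.new((match[1] || 1).to_i) { column.dup }
  end
end

p parse_col_specs_sketch('2*,^3')
```

Defaulting the width to 1 is one way to uphold the guarantee that every column spec has a width.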
Public: Parses the document header of the AsciiDoc source read from the Reader
Reads the AsciiDoc source from the Reader until the end of the document header is reached. The Document object is populated with information from the header (document title, document attributes, etc). The document attributes are then saved to establish a save point to which to roll back after parsing is complete.
This method assumes that there are no blank lines at the start of the document; the reader removes them automatically.
returns the Hash of orphan block attributes captured above the header
Public: Consume and parse the two header lines (line 1 = author info, line 2 = revision info).
Returns the Hash of header metadata. If a Document object is supplied, the metadata is applied directly to the attributes of the Document.
reader - the Reader holding the source lines of the document
document - the Document we are building (default: nil)
Examples
  parse_header_metadata(Reader.new ["Author Name <author@example.org>\n", "v1.0, 2012-12-21: Coincide w/ end of world.\n"])
  # => {'author' => 'Author Name', 'firstname' => 'Author', 'lastname' => 'Name', 'email' => 'author@example.org',
  #     'revnumber' => '1.0', 'revdate' => '2012-12-21', 'revremark' => 'Coincide w/ end of world.'}
Internal: Parse the section title from the current position of the reader
Parse a single or double-line section title. After this method is called, the Reader will be positioned at the line after the section title.
reader - the source reader, positioned at a section title
document - the current document
Examples
  reader.lines
  # => ["Foo\n", "~~~\n"]

  title, level, id, single = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 2
  id
  # => nil
  single
  # => false

  line1
  # => "==== Foo\n"

  title, level, id, single = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 3
  id
  # => nil
  single
  # => true
returns an Array of [String, Integer, String, Boolean], representing the title, level, id and single-line flag of the Section, or nil.
Public: Parse the first positional attribute and assign named attributes
Parse the first positional attribute to extract the style, role and id parts, assign the values to their corresponding attribute keys and return both the original style attribute and the parsed value from the first positional attribute.
attributes - The Hash of attributes to process
Examples
  puts attributes
  => {1 => "abstract#intro.lead", "style" => "preamble"}

  parse_style_attribute(attributes)
  => ["abstract", "preamble"]

  puts attributes
  => {1 => "abstract#intro.lead", "style" => "abstract", "id" => "intro", "role" => "lead"}
Returns a two-element Array of the parsed style from the first positional attribute and the original style that was replaced
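The splitting of a first positional attribute such as `abstract#intro.lead` into style, id and role can be sketched with a few regular expressions. The method name `split_style_attribute` and the exact patterns are assumptions for illustration; the real method may tokenize the value differently.

```ruby
# Hypothetical helper: split the first positional attribute into style,
# id (after "#") and roles (after "."), and store them on the Hash.
def split_style_attribute(attributes)
  original_style = attributes['style']
  raw = attributes[1]
  return [nil, original_style] unless raw.is_a?(String) && !raw.empty?

  style = raw[/\A[^#.]+/]                   # leading text up to the first "#" or "."
  id    = raw[/#([^#.]+)/, 1]               # text following "#"
  roles = raw.scan(/\.([^#.]+)/).flatten    # every ".role" segment

  attributes['style'] = style if style
  attributes['id']    = id if id
  attributes['role']  = roles.join(' ') unless roles.empty?

  [style, original_style]
end

attributes = { 1 => 'abstract#intro.lead', 'style' => 'preamble' }
p split_style_attribute(attributes)
p attributes
```

Joining multiple roles with a space is a design choice of this sketch, not something the description above specifies.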
Internal: Parse the author line into a Hash of author metadata
author_line - the String author line
names_only - a Boolean flag that indicates whether to process line as names only or names with emails (default: false)
multiple - a Boolean flag that indicates whether to process multiple semicolon-separated entries in the author line (default: true)
returns a Hash of author metadata
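A simplified sketch of author-line parsing for the single-author case, assuming the form `First Last <email>` with the email optional. The method name `parse_author_sketch` and the regular expression are assumptions for illustration; the `names_only` and `multiple` options are not modeled, and middle names are not split out.

```ruby
# Hypothetical helper: parse one author entry into a Hash of metadata.
def parse_author_sketch(author_line)
  match = author_line.strip.match(/\A(?<names>[^<]+?)(?:\s*<(?<email>[^>]+)>)?\z/)
  return {} unless match

  parts = match[:names].split
  metadata = { 'author' => parts.join(' '), 'firstname' => parts.first }
  metadata['lastname'] = parts.last if parts.size > 1
  metadata['email'] = match[:email] if match[:email]
  metadata
end

p parse_author_sketch("Author Name <author@example.org>\n")
```

The keys mirror those shown in the parse_header_metadata example above.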
Remove the indentation (block offset) shared by all the lines, then, if an indent is specified, indent the lines by that amount
Trim the leading whitespace (indentation) equivalent to the length of the indent on the least indented line. If the indent argument is specified, indent the lines by this many spaces (columns).
The purpose of this method is to shift a block of text to align to the left margin, while still preserving the relative indentation between lines
lines - the Array of String lines to process
indent - the integer number of spaces to add to the beginning of each line; if this value is nil, the existing space is preserved (optional, default: 0)
Examples
  source = <<EOS
      def names
        @names.split ' '
      end
  EOS

  source.lines.entries
  # => ["    def names\n", "      @names.split ' '\n", "    end\n"]

  Lexer.reset_block_indent(source.lines.entries)
  # => ["def names\n", "  @names.split ' '\n", "end\n"]

  puts Lexer.reset_block_indent(source.lines.entries).join
  # => def names
  # =>   @names.split ' '
  # => end
returns the Array of String lines with block offset removed
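The behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the Lexer's implementation: indentation is measured in leading spaces only (tabs are not expanded), blank lines are ignored when finding the shared offset and are passed through unchanged, and the name `reset_block_indent_sketch` is hypothetical.

```ruby
# Hypothetical helper: strip the indentation shared by all lines, then
# prefix each line with `indent` spaces. A nil indent leaves lines as-is.
def reset_block_indent_sketch(lines, indent = 0)
  return lines if indent.nil? || lines.empty?

  # The block offset is the indentation of the least-indented non-blank line.
  offsets = lines.reject { |line| line.strip.empty? }
                 .map { |line| line[/\A */].length }
  offset = offsets.min || 0
  padding = ' ' * indent

  lines.map do |line|
    line.strip.empty? ? line : padding + line[offset..-1]
  end
end

lines = ["    def names\n", "      @names.split ' '\n", "    end\n"]
puts reset_block_indent_sketch(lines).join
```

Because only the shared offset is removed, the relative indentation between lines is preserved.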
Internal: Resolve the 0-index marker for this list item
For ordered lists, match the marker used for this list item against the known list markers and determine which marker is the first (0-index) marker in its number series.
For callout lists, return <1>.
For bulleted lists, return the marker as passed to this method.
list_type - The Symbol context of the list
marker - The String marker for this list item
ordinal - The position of this list item in the list
validate - Whether to validate the value of the marker
Returns the String 0-index marker for this list item
Internal: Resolve the 0-index marker for this ordered list item
Match the marker used for this ordered list item against the known ordered list markers and determine which marker is the first (0-index) marker in its number series.
The purpose of this method is to normalize the implicit numbered markers so that they can be compared against other list items.
marker - The marker used for this list item
ordinal - The 0-based index of the list item (default: 0)
validate - Perform validation that the marker provided is the proper marker in the sequence (default: false)
Examples
  marker = 'B.'
  Lexer::resolve_ordered_list_marker(marker, 1, true)
  # => 'A.'
Returns the String of the first marker in this number series
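The normalization step can be sketched as a lookup from marker pattern to the first marker of its series. The patterns below assume the common AsciiDoc explicit markers (arabic `1.`, alphabetic `a.` and `A.`, roman `i)` and `I)`); the table `ORDERED_SERIES` and the method name `first_marker_in_series` are hypothetical, and validation against the ordinal is omitted.

```ruby
# Assumed marker patterns, each paired with the first marker in its series.
ORDERED_SERIES = [
  [/\A\d+\.\z/,        '1.'],
  [/\A[a-z]\.\z/,      'a.'],
  [/\A[A-Z]\.\z/,      'A.'],
  [/\A[ivxlcdm]+\)\z/, 'i)'],
  [/\A[IVXLCDM]+\)\z/, 'I)']
].freeze

# Hypothetical helper: return the first marker of the series the given
# marker belongs to, or the marker itself if no series matches.
def first_marker_in_series(marker)
  _pattern, first = ORDERED_SERIES.find { |pattern, _first| pattern.match?(marker) }
  first || marker
end

p first_marker_in_series('B.')
p first_marker_in_series('iv)')
```

Markers that match no pattern, such as implicit `.` or `..` markers, are returned unchanged in this sketch so they can still be compared directly.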
Internal: Converts a Roman numeral to an integer value.
value - The String Roman numeral to convert
Returns the Integer for this Roman numeral
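A common way to perform this conversion is to add each digit's value, subtracting it instead when a larger digit follows. The sketch below uses that standard subtractive rule; the constant `ROMAN_VALUES` and the method name `roman_to_int_sketch` are hypothetical, and no claim is made that the Lexer uses the same approach.

```ruby
ROMAN_VALUES = {
  'M' => 1000, 'D' => 500, 'C' => 100, 'L' => 50, 'X' => 10, 'V' => 5, 'I' => 1
}.freeze

# Hypothetical helper: convert a Roman numeral String to an Integer.
# Raises KeyError if the String contains a character that is not a Roman digit.
def roman_to_int_sketch(value)
  digits = value.upcase.chars.map { |char| ROMAN_VALUES.fetch(char) }

  digits.each_with_index.sum do |digit, index|
    following = digits[index + 1]
    # A digit placed before a larger one is subtracted (for example, IV = 4).
    following && following > digit ? -digit : digit
  end
end

p roman_to_int_sketch('XIV')
```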
Public: Convert a string to a legal attribute name.
name - the String name of the attribute
Returns a String with the legal AsciiDoc attribute name.
Examples
  sanitize_attribute_name('Foo Bar')
  => 'foobar'

  sanitize_attribute_name('foo')
  => 'foo'

  sanitize_attribute_name('Foo 3 #-Billy')
  => 'foo3-billy'
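The examples above are consistent with removing every character that is not a word character or hyphen and then lowercasing the result. A minimal sketch under that assumption follows; the method name `sanitize_attribute_name_sketch` is hypothetical and the exact character class used by the Lexer may differ.

```ruby
# Hypothetical helper: drop characters other than word characters and
# hyphens, then lowercase what remains.
def sanitize_attribute_name_sketch(name)
  name.gsub(/[^\w-]/, '').downcase
end

p sanitize_attribute_name_sketch('Foo 3 #-Billy')
```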
Private: Get the Integer section level based on the characters used in the ASCII line under the section title.
line - the String line from under the section title.
Public: Store the attribute in the document and register attribute entry if accessible
name - the String name of the attribute to store
value - the String value of the attribute to store
doc - the Document being parsed
attrs - the attributes for the current context
returns a 2-element array containing the attribute name and value