Package scanner

import "text/scanner"

Overview

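Package scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes an io.Reader providing the source, which can then be tokenized through repeated calls to the Scan function. For compatibility with existing tools, the NUL character is not allowed. If the first character in the source is a UTF-8 encoded byte order mark (BOM), it is discarded.

By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.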

Example

example:3:1: if
example:3:4: a
example:3:6: >
example:3:8: 10
example:3:11: {
example:4:2: someParsable
example:4:15: =
example:4:17: text
example:5:1: }
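
The listing above is the output of a short scanning loop whose source did not survive on this page. A minimal sketch that should reproduce it follows; the input text and the helper name scanTokens are assumptions reconstructed from the printed positions.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// scanTokens is a hypothetical helper: it returns one "position: text"
// line for every token that Scan reports in src.
func scanTokens(filename, src string) string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = filename // the Scanner itself never sets Filename
	var b strings.Builder
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Fprintf(&b, "%s: %s\n", s.Position, s.TokenText())
	}
	return b.String()
}

func main() {
	// Assumed input: the comment on line 2 is skipped by default.
	const src = `
// This is scanned code.
if a > 10 {
	someParsable = text
}`
	fmt.Print(scanTokens("example", src))
}
```

Filename is assigned after Init because the Scanner leaves that field untouched and only uses it when formatting positions.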

Example (IsIdentRune)

default:1:1: %
default:1:2: var1
default:1:7: var2
default:1:11: %

percent:1:1: %var1
percent:1:7: var2
percent:1:11: %
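
The two listings above contrast the default identifier rules with a custom IsIdentRune predicate. The sketch below shows one way to obtain both; the input string and the names scanIdents and percentIdent are assumptions chosen to match the printed positions.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
	"unicode"
)

// scanIdents is a hypothetical helper that scans src with the given
// identifier predicate. A nil predicate keeps regular Go identifiers.
func scanIdents(filename, src string, isIdentRune func(ch rune, i int) bool) string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = filename
	s.IsIdentRune = isIdentRune
	var b strings.Builder
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Fprintf(&b, "%s: %s\n", s.Position, s.TokenText())
	}
	return b.String()
}

// percentIdent accepts '%' as the first rune of an identifier, letters
// anywhere, and digits after the first rune.
func percentIdent(ch rune, i int) bool {
	return ch == '%' && i == 0 || unicode.IsLetter(ch) || unicode.IsDigit(ch) && i > 0
}

func main() {
	const src = "%var1 var2%" // assumed input
	fmt.Print(scanIdents("default", src, nil))
	fmt.Println()
	fmt.Print(scanIdents("percent", src, percentIdent))
}
```

With the predicate installed, the leading '%' becomes part of the identifier, while the trailing '%' is scanned as a one-rune identifier of its own.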

Example (Mode)

comments:2:5: // Comment begins at column 5.
comments:6:1: /*
This multiline comment
should be extracted in
its entirety.
*/
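
The output above comes from a scanner whose Mode has been changed so that comments are returned instead of skipped. A minimal sketch, assuming an input with one line comment and one block comment; extractComments is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// extractComments is a hypothetical helper that returns only the
// comment tokens found in src, one "position: text" entry per comment.
func extractComments(filename, src string) string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = filename
	// GoTokens sets both ScanComments and SkipComments. Clearing
	// SkipComments makes Scan return Comment tokens.
	s.Mode &^= scanner.SkipComments
	var b strings.Builder
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		if tok == scanner.Comment {
			fmt.Fprintf(&b, "%s: %s\n", s.Position, s.TokenText())
		}
	}
	return b.String()
}

func main() {
	// Assumed input; the first comment is indented by four spaces.
	const src = `
    // Comment begins at column 5.

This line should not be included in the output.

/*
This multiline comment
should be extracted in
its entirety.
*/
`
	fmt.Print(extractComments("comments", src))
}
```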

Example (Whitespace)

[[aa ab ac ad] [ba bb bc bd] [ca cb cc cd] [da db dc dd]]

Constants

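Predefined mode bits to control recognition of tokens. For instance, to configure a Scanner such that it only recognizes (Go) identifiers and integers, and skips comments, set the Scanner's Mode field to: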

ScanIdents | ScanInts | SkipComments

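With the exception of comments, which are skipped if SkipComments is set, unrecognized tokens are not ignored. Instead, the scanner simply returns the respective individual characters (or possibly sub-tokens). For instance, if the mode is ScanIdents (not ScanStrings), the string "foo" is scanned as the token sequence '"' Ident '"'.

Use GoTokens to configure the Scanner such that it accepts all Go literal tokens, including Go identifiers. Comments will be skipped.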

const (
    ScanIdents     = 1 << -Ident
    ScanInts       = 1 << -Int
    ScanFloats     = 1 << -Float // includes Ints and hexadecimal floats
    ScanChars      = 1 << -Char
    ScanStrings    = 1 << -String
    ScanRawStrings = 1 << -RawString
    ScanComments   = 1 << -Comment
    SkipComments   = 1 << -skipComment // if set with ScanComments, comments become white space
    GoTokens       = ScanIdents | ScanFloats | ScanChars | ScanStrings | ScanRawStrings | ScanComments | SkipComments
)

The result of Scan is one of these tokens or a Unicode character.

const (
    EOF = -(iota + 1)
    Ident
    Int
    Float
    Char
    String
    RawString
    Comment
)

GoWhitespace is the default value for the Scanner's Whitespace field. Its value selects Go's white space characters.

const GoWhitespace = 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' '

func TokenString

func TokenString(tok rune) string

TokenString returns a printable string for a token or Unicode character.

type Position

Position is a value that represents a source position. A position is valid if Line > 0.

type Position struct {
    Filename string // filename, if any
    Offset   int    // byte offset, starting at 0
    Line     int    // line number, starting at 1
    Column   int    // column number, starting at 1 (character count per line)
}

func (*Position) IsValid

func (pos *Position) IsValid() bool

IsValid reports whether the position is valid.

func (Position) String

func (pos Position) String() string

type Scanner

A Scanner implements reading of Unicode characters and tokens from an io.Reader.

type Scanner struct {

    // Error is called for each error encountered. If no Error
    // function is set, the error is reported to os.Stderr.
    Error func(s *Scanner, msg string)

    // ErrorCount is incremented by one for each error encountered.
    ErrorCount int

    // The Mode field controls which tokens are recognized. For instance,
    // to recognize Ints, set the ScanInts bit in Mode. The field may be
    // changed at any time.
    Mode uint

    // The Whitespace field controls which characters are recognized
    // as white space. To recognize a character ch <= ' ' as white space,
    // set the ch'th bit in Whitespace (the Scanner's behavior is undefined
    // for values ch > ' '). The field may be changed at any time.
    Whitespace uint64

    // IsIdentRune is a predicate controlling the characters accepted
    // as the ith rune in an identifier. The set of valid characters
    // must not intersect with the set of white space characters.
    // If no IsIdentRune function is set, regular Go identifiers are
    // accepted instead. The field may be changed at any time.
    IsIdentRune func(ch rune, i int) bool // Go 1.4

    // Start position of most recently scanned token; set by Scan.
    // Calling Init or Next invalidates the position (Line == 0).
    // The Filename field is always left untouched by the Scanner.
    // If an error is reported (via Error) and Position is invalid,
    // the scanner is not inside a token. Call Pos to obtain an error
    // position in that case, or to obtain the position immediately
    // after the most recently scanned token.
    Position
    // contains filtered or unexported fields
}

func (*Scanner) Init

func (s *Scanner) Init(src io.Reader) *Scanner

Init initializes a Scanner with a new source and returns s. Error is set to nil, ErrorCount is set to 0, Mode is set to GoTokens, and Whitespace is set to GoWhitespace.

func (*Scanner) Next

func (s *Scanner) Next() rune

Next reads and returns the next Unicode character. It returns EOF at the end of the source. It reports a read error by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr. Next does not update the Scanner's Position field; use Pos() to get the current position.

func (*Scanner) Peek

func (s *Scanner) Peek() rune

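Peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at the last character of the source.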

func (*Scanner) Pos

func (s *Scanner) Pos() (pos Position)

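Pos returns the position of the character immediately after the character or token returned by the last call to Next or Scan. Use the Scanner's Position field for the start position of the most recently scanned token.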
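
The difference between the Position field and Pos is easiest to see on a single token. A minimal sketch, with tokenSpan as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenSpan is a hypothetical helper that scans the first token of src
// and returns where it starts (Position field) and the position just
// past its end (Pos method).
func tokenSpan(src string) (start, end scanner.Position) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Scan()
	return s.Position, s.Pos()
}

func main() {
	start, end := tokenSpan("foo bar")
	// "foo" occupies columns 1 through 3, so end points at column 4.
	fmt.Printf("start: line %d, column %d, offset %d\n", start.Line, start.Column, start.Offset)
	fmt.Printf("end:   line %d, column %d, offset %d\n", end.Line, end.Column, end.Offset)
}
```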

func (*Scanner) Scan

func (s *Scanner) Scan() rune

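Scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens t for which the respective Mode bit (1<<-t) is set. It returns EOF at the end of the source. It reports scanner errors (read and token errors) by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr.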
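
Errors can be collected rather than printed to os.Stderr by installing an Error function. The sketch below is one way to do that; the helper name scanWithErrors is hypothetical, and an unterminated string literal serves as the failing input.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// scanWithErrors is a hypothetical helper that scans all of src and
// returns the collected error messages along with ErrorCount.
func scanWithErrors(src string) (msgs []string, count int) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	// Init resets Error to nil, so the handler is installed afterwards.
	s.Error = func(s *scanner.Scanner, msg string) {
		pos := s.Position
		if !pos.IsValid() {
			pos = s.Pos() // not inside a token: fall back to Pos
		}
		msgs = append(msgs, fmt.Sprintf("%s: %s", pos, msg))
	}
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		// Tokens are discarded; only the errors are of interest.
	}
	return msgs, s.ErrorCount
}

func main() {
	msgs, n := scanWithErrors(`x = "abc`) // the string is never closed
	fmt.Println(n, msgs)
}
```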

func (*Scanner) TokenText

func (s *Scanner) TokenText() string

TokenText returns the string corresponding to the most recently scanned token. It is valid after calling Scan and in calls to Scanner.Error.
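
TokenText yields the raw source text, so turning a literal into a value is left to the caller. A minimal sketch that sums the integer tokens of an input with strconv.ParseInt; the helper name sumInts is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

// sumInts is a hypothetical helper that adds up every Int token in src
// and ignores all other tokens.
func sumInts(src string) (int64, error) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var sum int64
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		if tok != scanner.Int {
			continue
		}
		// Base 0 lets ParseInt handle prefixes such as 0x.
		n, err := strconv.ParseInt(s.TokenText(), 0, 64)
		if err != nil {
			return 0, err
		}
		sum += n
	}
	return sum, nil
}

func main() {
	total, err := sumInts("1 + 2 + 39")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total)
}
```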
