Tokenizing Java source code

From CodeCodex

Implementations

Java

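The java.io.StreamTokenizer class can be used for simple parsing of a Java source file into tokens. It can be configured to recognize Java-style comments (// and /* */) and discard them. It also recognizes single- and double-quoted literals and decodes C-style backslash escapes such as \n and \t, which largely overlap Java's. It is not a full Java lexer, however. Numbers are always returned as double values, and multi-character operators such as == or ++ come back as separate single-character tokens.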

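    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StreamTokenizer;

    public class JavaTokenizer {
        public static void main(String[] args) {
            // Create the tokenizer to read from a file; try-with-resources closes the reader
            try (Reader rd = new FileReader("filename.java")) {
                StreamTokenizer st = new StreamTokenizer(rd);

                // Prepare the tokenizer for Java-style tokenizing rules
                st.parseNumbers();       // already on by default; numbers are returned as doubles
                st.wordChars('_', '_');  // '_' and '$' are legal in Java identifiers
                st.wordChars('$', '$');
                st.eolIsSignificant(true);

                // If whitespace is not to be discarded, uncomment this call. Whitespace,
                // including '\n', is then returned as ordinary characters instead of TT_EOL.
                // st.ordinaryChars(0, ' ');

                // '/' is a single-line comment character by default, which would discard
                // everything after a division operator; make it an ordinary character.
                st.ordinaryChar('/');

                // These calls cause // and /* */ comments to be discarded
                st.slashSlashComments(true);
                st.slashStarComments(true);

                // Parse the file; reading inside the loop condition keeps the first token
                int token;
                while ((token = st.nextToken()) != StreamTokenizer.TT_EOF) {
                    switch (token) {
                    case StreamTokenizer.TT_NUMBER:
                        // A number was found; the value is in nval
                        double num = st.nval;
                        break;
                    case StreamTokenizer.TT_WORD:
                        // A word was found; the value is in sval
                        String word = st.sval;
                        break;
                    case '"':
                        // A double-quoted string was found; sval contains the contents
                        String dquoteVal = st.sval;
                        break;
                    case '\'':
                        // A single-quoted character literal was found; sval contains the contents
                        String squoteVal = st.sval;
                        break;
                    case StreamTokenizer.TT_EOL:
                        // End of line character found
                        break;
                    default:
                        // A regular character was found; the value is the token itself
                        char ch = (char) st.ttype;
                        break;
                    }
                }
            } catch (IOException e) {
                // Report I/O errors instead of silently ignoring them
                e.printStackTrace();
            }
        }
    }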
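The comment, quoting and escaping behavior described above can be checked on an in-memory string instead of a file. The following is a minimal sketch under stated assumptions. The class TokenizeDemo, its tokenize method and the "word:"/"symbol:" labels are hypothetical names chosen for this illustration. The sketch makes '/' an ordinary character, because by default it starts a single-line comment and would swallow a division operator. As an additional assumption, it also makes '.' and '-' ordinary. The default parseNumbers() setting marks both as numeric, so without this change "System.out" or "x-1" would be folded into a single word.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: tokenizes Java-like source held in a String
final class TokenizeDemo {

    // Returns one label per token, e.g. "word:x", "number:2.0", "string:...", "symbol:/"
    static List<String> tokenize(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');  // '/' is a comment char by default; keep it as an operator
        st.ordinaryChar('.');  // assumption: the default parseNumbers() marks '.' and '-'
        st.ordinaryChar('-');  // as numeric, which would glue "System.out" into one word
        st.wordChars('_', '_');
        st.wordChars('$', '$');
        st.slashSlashComments(true);  // discard // comments
        st.slashStarComments(true);   // discard /* */ comments

        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        tokens.add("word:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("number:" + st.nval);  // always a double
                        break;
                    case '"':
                        tokens.add("string:" + st.sval);  // escapes already decoded
                        break;
                    case '\'':
                        tokens.add("char:" + st.sval);
                        break;
                    default:
                        tokens.add("symbol:" + (char) st.ttype);  // any other single character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected with a StringReader
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "int x = y / 2; // half\n/* block */ String s = \"a\\tb\";";
        for (String token : tokenize(src)) {
            System.out.println(token);
        }
    }
}
```

With the sample input in main, both comments disappear. The expression y / 2 yields a symbol:/ token followed by number:2.0, and the string token holds a real tab character because \t was decoded. Hexadecimal literals such as 0x1F are still not understood: they come back as number:0.0 followed by the word x1F.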

Perl

    use strict;
    use warnings;
    use Parse::Java qw();

    my $ast = Parse::Java->new;
    $ast->parse_file('MyClass.java');
    while (my ($token, $value) = $ast->_next_token) {
        next if $token eq 'COMMENT_TK';    # skip comments
        print "number: $value\n" if $token eq 'FP_TK' or $token eq 'INTEGRAL_TK';
        print "string: $value\n" if $token eq 'STRING_TK' or $token eq 'CHAR_TK';
    }