Understanding Java Scanner – Text Parsing

Java provides a Scanner class that can be used as a text parser. This class can parse primitive types and strings using regular expressions.

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. Let us look at some usage scenarios of the Scanner class.
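To illustrate the conversion of tokens into typed values, here is a minimal sketch that pulls the integers out of a line of text. The class name TypedTokens and the method readInts are made up for this example; hasNextInt() and nextInt() are standard Scanner methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TypedTokens {
    // Hypothetical helper: collects every whitespace-separated token
    // that parses as an int and skips everything else.
    static List<Integer> readInts(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    values.add(scanner.nextInt());
                } else {
                    scanner.next(); // not an int: consume and ignore the token
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readInts("12 apples and 7 pears")); // prints [12, 7]
    }
}
```

Checking hasNextInt() before calling nextInt() avoids the InputMismatchException that nextInt() throws when the next token is not an integer.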

Java Scanner: Read a Whole File

To read a whole file into a single String, set the delimiter to \A, the regular expression anchor for the beginning of the input. Because that anchor matches only once, at the very start, the entire file comes back as one token.

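// Needs: import java.io.File; import java.util.Scanner;
try (Scanner scanner = new Scanner(new File(filename))) {
    // \A matches only at the start of the input, so the file is one token
    scanner.useDelimiter("\\A");
    // An empty file has no token; guard against NoSuchElementException
    String all = scanner.hasNext() ? scanner.next() : "";
}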
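The Scanner(File) constructor used above decodes the file with the platform's default charset. When the encoding is known, it can be passed explicitly. The following is a self-contained sketch under that assumption: the class name ReadWholeFile, the method readAll, and the temporary sample file are all invented for the demonstration, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ReadWholeFile {
    // Hypothetical helper: reads the entire file as one token, decoded as UTF-8.
    static String readAll(Path path) {
        try (Scanner scanner = new Scanner(path, "UTF-8")) {
            scanner.useDelimiter("\\A");
            // An empty file yields no token, so fall back to an empty string.
            return scanner.hasNext() ? scanner.next() : "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample file created only for this demonstration.
        Path sample = Files.createTempFile("scanner-demo", ".txt");
        try {
            Files.write(sample, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(readAll(sample));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The returned String keeps every line break from the file, including a trailing one if present.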

 

Read Text by Paragraph

Specifying an empty-line regex as the delimiter allows you to read text by paragraphs. The regular expression pattern sets the multi-line flag, so ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the whole input.

try (Scanner scanner = new Scanner(new File(filename))) {
    // (?m:^$) matches the zero-width position of an empty line
    scanner.useDelimiter("(?m:^$)");
    int ntoken = 0;
    while (scanner.hasNext()) {
        String token = scanner.next();
        ntoken++;
        System.out.printf("%3d) %s%n", ntoken, token);
    }
}
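Because the empty-line delimiter is zero-width, each token still carries the line breaks around it, and consecutive blank lines produce tokens that contain nothing but a line break. A small sketch of one way to tidy this up follows. It works on an in-memory String so it can run without a file; the class name Paragraphs and the method split are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Paragraphs {
    // Hypothetical helper: splits text at empty lines, trims each
    // paragraph, and drops paragraphs that are blank after trimming.
    static List<String> split(String text) {
        List<String> paragraphs = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("(?m:^$)");
            while (scanner.hasNext()) {
                String paragraph = scanner.next().trim();
                if (!paragraph.isEmpty()) {
                    paragraphs.add(paragraph);
                }
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\n\ngamma\n";
        List<String> paragraphs = split(text);
        for (int i = 0; i < paragraphs.size(); i++) {
            System.out.printf("%3d) %s%n", i + 1, paragraphs.get(i));
        }
    }
}
```

Note that a line containing only spaces is not empty as far as ^$ is concerned, so it does not separate paragraphs in this sketch.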

Count Words in a File

The default delimiter used by the Scanner is whitespace, so it returns the text tokens that whitespace separates. Let us use this fact to count the words in a file. The following code prints each word along with its running count. Note that Scanner implements the AutoCloseable interface, so we can use it in a try-with-resources block.

try (Scanner scanner = new Scanner(new File(filename))) {
    int nword = 0;
    // Each call to next() returns one whitespace-separated word
    while (scanner.hasNext()) {
        String word = scanner.next();
        nword++;
        System.out.printf("%3d) %s%n", nword, word);
    }
}
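With the whitespace delimiter, punctuation stays attached to a word, and words joined only by punctuation count as one. If that matters, one possible adjustment is to treat any run of non-letter characters as the delimiter. The sketch below compares the two on an in-memory String; the class name WordCount, the method countWords, and the choice of "[^\\p{L}]+" as the pattern are assumptions made for this example.

```java
import java.util.Scanner;

public class WordCount {
    // Hypothetical helper: counts tokens in text. A null delimiterRegex
    // keeps Scanner's default whitespace delimiter.
    static int countWords(String text, String delimiterRegex) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            if (delimiterRegex != null) {
                scanner.useDelimiter(delimiterRegex);
            }
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one,two three--four";
        // Default delimiter: "one,two" and "three--four" are the tokens.
        System.out.println(countWords(text, null));         // prints 2
        // Any run of non-letters separates words.
        System.out.println(countWords(text, "[^\\p{L}]+")); // prints 4
    }
}
```

A letters-only rule also splits contractions and drops numbers, so the right pattern depends on what should count as a word.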

 

Conclusion

Java's Scanner class makes simple text parsing straightforward. By choosing an appropriate delimiter, we can read a whole file, split text into paragraphs, or count words with only a few lines of code.
