Java - String

Updated: 2018-12-10

CharSequence vs String vs StringBuilder vs StringBuffer

  • String: immutable; StringBuilder/StringBuffer: modifiable
  • StringBuffer: thread-safe (its methods are synchronized where necessary, so one instance can be shared between threads).
  • StringBuilder: not thread-safe, but faster because it skips synchronization; prefer it in single-threaded code.
  • CharSequence: an interface. String, StringBuffer and StringBuilder all implement CharSequence.
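A small sketch of the points above. The class `CharSequenceDemo` and its methods are hypothetical names for illustration. It shows that one method taking a `CharSequence` accepts all three types, and that `String` is immutable while `StringBuilder` is modified in place:

```java
class CharSequenceDemo {
    // Accepts any CharSequence implementation: String, StringBuilder, StringBuffer, ...
    static int countChar(CharSequence seq, char target) {
        int count = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // String is immutable: concat() returns a new String and leaves s unchanged
    static String immutableConcat(String s) {
        s.concat("!"); // result discarded, so s is still the original value
        return s;
    }

    // StringBuilder is mutable: append() changes the builder itself
    static String mutableAppend(StringBuilder sb) {
        sb.append("!");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(countChar("hello", 'l'));                    // 2
        System.out.println(countChar(new StringBuilder("hello"), 'l')); // 2
        System.out.println(countChar(new StringBuffer("hello"), 'l'));  // 2
        System.out.println(immutableConcat("hi"));                      // hi
        System.out.println(mutableAppend(new StringBuilder("hi")));     // hi!
    }
}
```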

Format

String.format

width 12, right align

String.format("%12s", "my-text")

width 12, left align

String.format("%-12s", "my-text")

width 12 (11 for the number plus 1 for the literal % sign), 2 decimal places, followed by %

String.format("%11.2f%%", rate * 100)
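The three patterns above can be checked with a sketch like the following. `FormatDemo` is a hypothetical class name, and `rate = 0.125` is an assumed sample value. `Locale.US` is pinned for the float case so the decimal separator is always `.` regardless of the machine's default locale:

```java
import java.util.Locale;

class FormatDemo {
    static String rightAlign(String text) {
        return String.format("%12s", text);  // pad on the left to width 12
    }

    static String leftAlign(String text) {
        return String.format("%-12s", text); // pad on the right to width 12
    }

    static String percent(double rate) {
        // 11 chars for the number (2 decimals) + 1 literal '%' = 12 chars total
        return String.format(Locale.US, "%11.2f%%", rate * 100);
    }

    public static void main(String[] args) {
        System.out.println("[" + rightAlign("my-text") + "]"); // [     my-text]
        System.out.println("[" + leftAlign("my-text") + "]");  // [my-text     ]
        System.out.println("[" + percent(0.125) + "]");        // [      12.50%]
    }
}
```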

DecimalFormat

Import DecimalFormat first:

jshell> import java.text.DecimalFormat

Note that .789 was rounded to .8 because the pattern allows only one fractional digit:

jshell> new DecimalFormat("###,###.#").format(123456.789)
$1 ==> "123,456.8"

The same works for dollar amounts; $ has no special meaning in a DecimalFormat pattern, so it is output as a literal prefix:

jshell> new DecimalFormat("$###,###.##").format(123456.789)
$2 ==> "$123,456.79"
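One thing to watch with "$###,###.##": # is an optional digit, so trailing zeros are dropped (5.5 prints as $5.5). A sketch, assuming you want fixed cents, using 0 (a required digit) instead. `DollarFormat` and its methods are hypothetical names; US symbols are pinned because DecimalFormat otherwise uses the default locale's grouping and decimal separators:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DollarFormat {
    private static final DecimalFormatSymbols US = DecimalFormatSymbols.getInstance(Locale.US);

    // '#' = optional digit: trailing fractional zeros are omitted
    static String withHashes(double amount) {
        return new DecimalFormat("$###,###.##", US).format(amount);
    }

    // '0' = required digit: always at least one integer digit and exactly two decimals
    static String withZeros(double amount) {
        return new DecimalFormat("$#,##0.00", US).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(withHashes(123456.789)); // $123,456.79
        System.out.println(withHashes(5.5));        // $5.5
        System.out.println(withZeros(5.5));         // $5.50
        System.out.println(withZeros(1234567));     // $1,234,567.00
    }
}
```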

Substring

String s = "{asdf}";
System.out.println(s.substring(1, s.length() - 1));
//asdf

Parse ArrayList toString() result

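ArrayList's toString() produces a string like [0, 1, 2]. To parse it back, strip the brackets and split on commas and spaces (StringUtils is from Apache Commons Lang):

import org.apache.commons.lang.StringUtils;
// rawString is e.g. "[0, 1, 2]"
String trimmed = rawString.substring(1, rawString.length() - 1); // "0, 1, 2"
String[] parts = StringUtils.split(trimmed, ", "); // splits on ',' and ' ' -> ["0", "1", "2"]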
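If Commons Lang is not on the classpath, a JDK-only sketch works too. It assumes the elements themselves never contain ", " (toString() does no escaping), and `ListStringParser.parse` is a hypothetical helper. Note the empty-list case: "".split(", ") returns [""], not an empty array, so it is handled separately:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListStringParser {
    // Turns "[0, 1, 2]" back into ["0", "1", "2"]
    static List<String> parse(String raw) {
        String trimmed = raw.substring(1, raw.length() - 1); // drop '[' and ']'
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // "[]" -> empty list
        }
        return new ArrayList<>(Arrays.asList(trimmed.split(", ")));
    }

    public static void main(String[] args) {
        String raw = Arrays.asList(0, 1, 2).toString(); // "[0, 1, 2]"
        System.out.println(parse(raw));                 // [0, 1, 2]
        System.out.println(parse(raw).get(1));          // 1
        System.out.println(parse("[]").size());         // 0
    }
}
```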

new String() vs String Literal

Compare:

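String s1 = new String("foo");
String s2 = "foo";
  • new String(...): always creates a new object on the heap, even if an equal string is already in the pool; this costs extra time and memory.
  • String literal: stored only once in the string constant pool; every occurrence of the same literal refers to that single object.

See the == vs .equals() section below for examples.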

== vs .equals()

To test whether two strings have the same content, use .equals() (or .equalsIgnoreCase() to ignore case). == only checks whether both references point to the same object:

String a = new String("foo");
String b = new String("foo");

// false
System.out.println(a == b);

// true
System.out.println(a.equals(b));

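String literals, however, are pooled: identical literals refer to the same object, so == returns true for them. Still, prefer .equals() whenever you compare content: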

String c = "foo";
String d = "foo";

// true
System.out.println(c == d);

// true
System.out.println(c.equals(d));

// false
System.out.println(b == c);

// true
System.out.println(b.equals(c));

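More examples: "f" + "oo" is a compile-time constant expression, so the compiler folds it into the literal "foo", which is the same pooled object as c: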

String e = "f" + "oo";
// true
System.out.println(c == e);
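The folding only applies to constant expressions. In a sketch like the one below (`ConcatIdentity` and its methods are hypothetical names), the prefix is a parameter, so the concatenation happens at runtime and produces a new String object; intern() returns the pooled instance again:

```java
class ConcatIdentity {
    static final String LITERAL = "foo";

    // Constant expression: folded at compile time into the pooled literal "foo"
    static String compileTimeConcat() {
        return "f" + "oo";
    }

    // prefix is not a constant, so this concatenation runs at runtime
    // and creates a new String object
    static String runtimeConcat(String prefix) {
        return prefix + "oo";
    }

    public static void main(String[] args) {
        String e = compileTimeConcat();
        String g = runtimeConcat("f");
        System.out.println(LITERAL == e);          // true
        System.out.println(LITERAL == g);          // false
        System.out.println(LITERAL.equals(g));     // true
        System.out.println(LITERAL == g.intern()); // true
    }
}
```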

String Split

import org.apache.commons.lang.StringUtils;
String raw = "1|2|3|4";

for (String s : StringUtils.split(raw, "|")) {
    System.out.println(s);
}
/*
1
2
3
4*/

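String.split() takes a regular expression, and | is the regex alternation operator. An unescaped "|" matches the empty string, so it splits between every character:

for (String s : raw.split("|")) {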
    System.out.println(s);
}

/*
1
|
2
|
3
|
4*/

To split on a literal |, escape it (the Java string "\\|" is the regex \|):

for (String s : raw.split("\\|")) {
    System.out.println(s);
}
/*
1
2
3
4*/
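Another option, sketched below, is java.util.regex.Pattern.quote(), which wraps the delimiter in \Q...\E so every regex metacharacter is matched literally. The `SplitLiteral.splitOn` helper is a hypothetical name:

```java
import java.util.regex.Pattern;

class SplitLiteral {
    // Pattern.quote turns any delimiter into a literal-match regex
    static String[] splitOn(String raw, String delimiter) {
        return raw.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        for (String s : splitOn("1|2|3|4", "|")) {
            System.out.println(s); // 1, 2, 3, 4 on separate lines
        }
        System.out.println(String.join(",", splitOn("a.b.c", "."))); // a,b,c
    }
}
```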

Trailing empty strings are dropped: split(regex) removes empty strings from the end of the result, but keeps leading ones:

System.out.println("/".split("/").length);    // 0
System.out.println("/a".split("/").length);   // 2  -> ["", "a"]
System.out.println("//a/".split("/").length); // 3  -> ["", "", "a"]
System.out.println("//a".split("/").length);  // 3  -> ["", "", "a"]
System.out.println("///".split("/").length);  // 0
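To keep trailing empty strings, pass a negative limit to split(regex, limit); only a limit of zero (the default) discards them. A sketch with hypothetical helper names `countDefault` and `countKeepAll`:

```java
import java.util.Arrays;

class SplitLimit {
    static int countDefault(String s) {
        return s.split("/").length;     // limit 0: trailing empty strings removed
    }

    static int countKeepAll(String s) {
        return s.split("/", -1).length; // negative limit: trailing empty strings kept
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("//a/".split("/")));     // [, , a]
        System.out.println(Arrays.toString("//a/".split("/", -1))); // [, , a, ]
        System.out.println(countKeepAll("///"));                    // 4
    }
}
```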

Replace: String.replace() vs String.replaceAll()

Compare the 4 replace functions:

String replace(char oldChar, char newChar)
String replace(CharSequence target, CharSequence replacement)
String replaceAll(String regex, String replacement)
String replaceFirst(String regex, String replacement)

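The difference is regular expressions: replace() treats its arguments as plain text (and still replaces every occurrence, not just the first), while replaceAll() and replaceFirst() interpret the first argument as a regular expression. In their replacement string, $ and \ are also special.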
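A sketch of the difference on "a.b.c" (`ReplaceDemo` and its methods are hypothetical names). Since . in a regex matches any character, replaceAll(".", "-") replaces every character; Matcher.quoteReplacement() is used to make $ literal in a replacement string:

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    static String literalReplace(String s) {
        return s.replace(".", "-");        // plain text, all occurrences
    }

    static String regexReplaceAll(String s) {
        return s.replaceAll(".", "-");     // '.' is a regex matching any char
    }

    static String escapedReplaceFirst(String s) {
        return s.replaceFirst("\\.", "-"); // escaped dot, first match only
    }

    static String dollarReplacement(String s) {
        // quoteReplacement escapes '$' and '\' so they are inserted literally
        return s.replaceAll("\\.", Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace('.', '-'));      // a-b-c
        System.out.println(literalReplace(s));        // a-b-c
        System.out.println(regexReplaceAll(s));       // -----
        System.out.println(escapedReplaceFirst(s));   // a-b.c
        System.out.println(dollarReplacement(s));     // a$b$c
    }
}
```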

String vs StringBuffer vs CharArray

String: cannot change once defined. StringBuffer: can change.

Comparison:

  • only checks whether both references point to the same object:

    string1 == string2
  • checks whether the two strings have the same content:

    string1.equals(string2)

In Java, String is immutable. To modify individual characters, convert the String to a char array and back:

String s = "hello";

// String to CharArray
char[] c = s.toCharArray();

// CharArray to String
String s2 = new String(c);
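A sketch of that round trip in use. `capitalize` is a hypothetical helper: it edits a char[] copy and builds a new String, while the original String stays unchanged:

```java
class CharArrayEdit {
    static String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        char[] c = s.toCharArray();         // a copy of the String's characters
        c[0] = Character.toUpperCase(c[0]); // modify the copy in place
        return new String(c);               // build a new String from the array
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(capitalize(s)); // Hello
        System.out.println(s);             // hello (original is unchanged)
    }
}
```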

Convert StringBuffer to CharArray

StringBuffer strBuf = new StringBuffer("hello");

// StringBuffer to String to CharArray
char[] c = strBuf.toString().toCharArray();

// CharArray to String to StringBuffer
StringBuffer strBuf2 = new StringBuffer(new String(c));
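The intermediate String can also be skipped. A sketch, assuming you want a direct copy: StringBuffer.getChars() copies characters straight into a char[], and since StringBuffer has no char[] constructor, append(char[]) fills a new buffer. `toChars` and `fromChars` are hypothetical helper names:

```java
class BufferChars {
    // Copies the buffer's characters directly into a new char[]
    static char[] toChars(StringBuffer buf) {
        char[] c = new char[buf.length()];
        buf.getChars(0, buf.length(), c, 0);
        return c;
    }

    // append(char[]) accepts the array without creating a String first
    static StringBuffer fromChars(char[] c) {
        return new StringBuffer(c.length).append(c);
    }

    public static void main(String[] args) {
        StringBuffer strBuf = new StringBuffer("hello");
        char[] c = toChars(strBuf);
        c[0] = 'j';
        System.out.println(fromChars(c)); // jello
        System.out.println(strBuf);       // hello (the char[] is a copy)
    }
}
```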

Bytes

Convert a String to bytes (getBytes() with no argument uses the platform's default charset):

jshell> "abcd".getBytes()
$1 ==> byte[4] { 97, 98, 99, 100 }

Or use Charset:

jshell> import java.nio.charset.Charset;

jshell> Charset.forName("UTF-8").encode("abcd").array()
$2 ==> byte[4] { 97, 98, 99, 100 }

On this machine the default charset is UTF-8. It is platform-dependent on JDK 17 and earlier; since JDK 18 the default is UTF-8:

jshell> Charset.defaultCharset()
$3 ==> UTF-8

Now try UTF-16. Plain "UTF-16" (without a BE/LE suffix) encodes big-endian and prepends a BOM (byte order mark), FE FF, which shows up as -2, -1 because Java bytes are signed:

jshell> "abcd".getBytes("UTF-16")
$4 ==> byte[10] { -2, -1, 0, 97, 0, 98, 0, 99, 0, 100 }

With UTF-16BE or UTF-16LE, no BOM is written:

jshell> "abcd".getBytes("UTF-16BE")
$49 ==> byte[8] { 0, 97, 0, 98, 0, 99, 0, 100 }

jshell> "abcd".getBytes("UTF-16LE")
$50 ==> byte[8] { 97, 0, 98, 0, 99, 0, 100, 0 }

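However, the two approaches can return arrays of different lengths for the same encoding. encode() returns a ByteBuffer, and array() exposes its whole backing array, including the unused trailing 0s after the 6 encoded bytes: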

jshell> "你好".getBytes("UTF-8")
$5 ==> byte[6] { -28, -67, -96, -27, -91, -67 }

jshell> Charset.forName("UTF-8").encode("你好").array()
$6 ==> byte[11] { -28, -67, -96, -27, -91, -67, 0, 0, 0, 0, 0 }
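A sketch of two ways to get exactly the encoded bytes. `EncodeBytes` and its methods are hypothetical names. With the ByteBuffer, only the bytes between position and limit are data, so copying remaining() bytes avoids the trailing zeros; getBytes(Charset) also avoids the checked UnsupportedEncodingException thrown by getBytes(String). The Chinese text is written as \u escapes so the source compiles under any source encoding:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeBytes {
    // Copy only the encoded bytes (position..limit), not the whole backing array
    static byte[] viaCharset(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nihao = "\u4f60\u597d"; // "你好"
        System.out.println(viaCharset(nihao).length);  // 6
        System.out.println(viaGetBytes(nihao).length); // 6
        System.out.println(Arrays.equals(viaCharset(nihao), viaGetBytes(nihao))); // true
    }
}
```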