Summary: In this programming example, we will learn to remove HTML tags from a string using regex or Jsoup in Java.
Strings in Java are immutable, so their content cannot be changed in place. We can, however, assign a new string to the same variable (i.e., make the reference point to a different object) to change its value.
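To make the point about immutability concrete, here is a minimal sketch. The class name ImmutabilityDemo and the helper stripTags are hypothetical names used only for illustration. It shows that replaceAll() returns a new string and leaves the original untouched, so the result has to be assigned to a variable to be kept.

```java
class ImmutabilityDemo {
    // Hypothetical helper: returns a NEW string with the tags removed.
    // The argument itself is never modified, because String is immutable.
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String str = "<em>Hello</em>";

        // Calling the method without using its result changes nothing.
        stripTags(str);
        System.out.println(str); // prints "<em>Hello</em>"

        // Reassigning the variable makes it refer to the new string.
        str = stripTags(str);
        System.out.println(str); // prints "Hello"
    }
}
```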
Method 1: Using Regex
Using the regex pattern "\\<.*?>", we can remove the HTML tags from a string. The pattern matches a "<", followed by as few characters as possible, up to the next ">".
All we need to do is replace every substring matching "\\<.*?>" with "" (the empty string) using the replaceAll() method of the String class.
The general syntax for replaceAll() is:
String replaceAll(String regex, String replacement)
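As a side note, the Java documentation describes str.replaceAll(regex, replacement) as giving the same result as compiling the regex with Pattern and calling replaceAll() on a matcher. The sketch below illustrates that equivalence. The class name ReplaceAllDemo and the helpers viaString and viaPattern are hypothetical names chosen for this example, and the pattern is written without the backslash because "<" has no special meaning on its own in a Java regex.

```java
import java.util.regex.Pattern;

class ReplaceAllDemo {
    // Compiled once so the same pattern can be reused across many calls.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Variant 1: the convenience method on String.
    static String viaString(String html) {
        return html.replaceAll("<.*?>", "");
    }

    // Variant 2: the equivalent call through Pattern and Matcher.
    static String viaPattern(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hello, <b>world</b>!</p>";
        System.out.println(viaString(html));  // prints "Hello, world!"
        System.out.println(viaPattern(html)); // prints "Hello, world!"
    }
}
```

Reusing a precompiled Pattern may be worthwhile when the same expression is applied to many strings.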
So let’s see the implementation in Java:
public class Main {
    public static void main(String[] args) {
        String str = "<strong>Pencil Programmer</strong>";
        // Replace every tag with the empty string and reassign the result
        str = str.replaceAll("\\<.*?>", "");
        System.out.println(str);
    }
}
Output: Pencil Programmer
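The "?" after ".*" in the pattern matters once a string contains more than one tag. The following sketch compares the lazy form with a greedy one. The class QuantifierDemo and the helpers stripLazy and stripGreedy are hypothetical names introduced for this comparison only.

```java
class QuantifierDemo {
    // Lazy quantifier: each match stops at the first ">" it can reach.
    static String stripLazy(String html) {
        return html.replaceAll("<.*?>", "");
    }

    // Greedy quantifier: one match runs from the first "<" to the last ">".
    static String stripGreedy(String html) {
        return html.replaceAll("<.*>", "");
    }

    public static void main(String[] args) {
        String html = "<b>Pencil</b> <i>Programmer</i>";

        System.out.println(stripLazy(html));               // prints "Pencil Programmer"
        // Brackets make the empty result visible.
        System.out.println("[" + stripGreedy(html) + "]"); // prints "[]"
    }
}
```

With the greedy version, the text between the tags is swallowed along with them, which is why the lazy form is used in this method.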
Method 2: Using Jsoup
Jsoup is a Java library for parsing HTML documents.
We can use the Jsoup parser to remove HTML tags from a Java string as follows:
import org.jsoup.Jsoup;

public class Main {
    public static void main(String[] args) {
        String str = "<strong>Pencil Programmer</strong>";
        // Parse the string as HTML and keep only its text content
        str = Jsoup.parse(str).text();
        System.out.println(str);
    }
}
Output: Pencil Programmer
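Note: Jsoup is a third-party library, so you need to add it to your project (for example, by downloading the JAR and putting it on the classpath, or through a build tool such as Maven or Gradle) before using it in your program.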
These are two ways to remove HTML tags from a given string in Java.