We can remove HTML tags from a string in Java using either a regular expression (regex) or Jsoup. Let's see how.
Strings in Java are immutable, so their content cannot be changed. We can, however, assign a new string to the same variable, i.e., change which object the variable refers to.
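Using the regex pattern "\\<.*?>" (written here as a Java string literal) we can remove the HTML tags from a string. The pattern matches a "<", followed by as few characters as possible, up to the next ">", which is the shape of a single tag. The backslash before "<" is harmless but not required, since "<" has no special meaning at that position. All we need to do is replace every substring that matches the pattern with the empty string "".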
For this, we will use the replaceAll() method of the String class.
The general syntax for replaceAll() is:

```java
String replaceAll(String regex, String replacement)
```
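Because strings are immutable, replaceAll() does not modify the string it is called on. It returns a new string, which is why the result has to be assigned to a variable. The following is a minimal sketch of that behaviour. The class name ReplaceAllDemo and the helper removeMatches are hypothetical names chosen for illustration only.

```java
class ReplaceAllDemo {

    // Hypothetical helper: returns a new string in which every match
    // of the given regex has been replaced with the empty string.
    static String removeMatches(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String original = "<em>Hello</em>";
        String stripped = removeMatches(original, "\\<.*?>");

        // The original object is untouched; only the returned string differs.
        System.out.println(original); // prints <em>Hello</em>
        System.out.println(stripped); // prints Hello
    }
}
```

If the return value is ignored, the variable still refers to the original string, tags included.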
Let's see the implementation in Java.
```java
public class Main {
    public static void main(String[] args) {
        String str = "<strong>Pencil Programmer</strong>";
        str = str.replaceAll("\\<.*?>", "");
        System.out.println(str);
    }
}
```
Output:

```
Pencil Programmer
```
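The "?" after ".*" in the pattern matters: it makes the match lazy, so each match stops at the first ">" it finds. Without it, ".*" is greedy and a single match runs from the first "<" to the last ">" in the string, swallowing the text in between. The sketch below compares the two. The class and method names (LazyVsGreedyDemo, stripLazy, stripGreedy) are hypothetical and only serve the illustration.

```java
class LazyVsGreedyDemo {

    // Lazy quantifier: each match ends at the nearest '>', so tags are
    // removed one at a time and the text between them is kept.
    static String stripLazy(String html) {
        return html.replaceAll("\\<.*?>", "");
    }

    // Greedy quantifier: one match extends to the last '>' in the string,
    // so everything between the first '<' and the last '>' is removed.
    static String stripGreedy(String html) {
        return html.replaceAll("\\<.*>", "");
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>World</b></p>";

        System.out.println(stripLazy(html));   // prints Hello World
        System.out.println(stripGreedy(html)); // prints an empty line
    }
}
```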
Remove HTML Tags using Jsoup
Another way to remove HTML tags in Java is to use Jsoup.
Jsoup is a Java library for parsing HTML documents.
Its parse() method builds a document from the HTML string, and text() returns the document's text content without the tags. Let's see how we can do that.
```java
import org.jsoup.Jsoup;

public class Main {
    public static void main(String[] args) {
        String str = "<strong>Pencil Programmer</strong>";
        str = Jsoup.parse(str).text();
        System.out.println(str);
    }
}
```
Output:

```
Pencil Programmer
```
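Note: Jsoup is not part of the standard Java library. You need to add it to your project before using it in your program, for example by downloading its JAR and putting it on the classpath, or by declaring it as a Maven or Gradle dependency.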
If you have any doubts or suggestions then please comment below.