In Java, we can remove HTML tags from a string using either a regular expression (regex) or the Jsoup library. Let’s see how.

A String in Java is immutable, so its content cannot be changed. We can, however, assign a new string to the same variable, i.e., change which object the variable refers to.
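As a small sketch of this point (the class and method names here are hypothetical, chosen only for illustration): a string method returns a new object and leaves the original untouched, so the result has to be assigned back to the variable.

```java
class ImmutabilityDemo {

    // Returns an upper-cased copy; the argument itself is never modified.
    static String shout(String input) {
        return input.toUpperCase();
    }

    public static void main(String[] args) {
        String text = "hello";
        String original = text;       // second reference to the same String object

        text = shout(text);           // reassignment: 'text' now refers to a new object

        System.out.println(original); // prints "hello" (the old object is unchanged)
        System.out.println(text);     // prints "HELLO"
    }
}
```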

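Using the regex pattern "\\<.*?>" (written here as a Java string literal), we can remove HTML tags from a string. The pattern matches a "<", followed by as few characters as possible, up to the next ">". All we need to do is replace every matching substring with the empty string "".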

For this, we will use the replaceAll() method of the String class.

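The general syntax for replaceAll() is string.replaceAll(regex, replacement), where both arguments are strings. It returns a new string in which every substring matching regex has been replaced by replacement.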
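To make the two parameters concrete, here is a minimal sketch. The wrapper method replaceEvery and the sample inputs are assumptions made for illustration. It also shows that the first argument is treated as a regular expression rather than a literal string.

```java
class ReplaceAllDemo {

    // Thin wrapper that makes the two parameters of replaceAll() explicit.
    static String replaceEvery(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // The first argument is a regular expression, not a literal string.
        System.out.println(replaceEvery("a1b22c333", "[0-9]+", "#")); // prints "a#b#c#"

        // An unescaped "." matches any character...
        System.out.println(replaceEvery("1.2.3", ".", "-"));          // prints "-----"
        // ...so a literal dot has to be escaped.
        System.out.println(replaceEvery("1.2.3", "\\.", "-"));        // prints "1-2-3"
    }
}
```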

Let’s see the implementation in Java.
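The original listing is not reproduced here, so the following is a minimal sketch of the approach described above rather than the author's exact code. The class name HtmlTagRemover, the method removeTags, and the sample input are assumptions.

```java
class HtmlTagRemover {

    // The pattern from the text: a literal "<", then as few characters as
    // possible (".*?" is a reluctant quantifier), up to the next ">".
    static final String TAG_PATTERN = "\\<.*?>";

    // Returns a copy of the input with every tag-like substring removed.
    static String removeTags(String html) {
        return html.replaceAll(TAG_PATTERN, "");
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>World</b>!</p>";
        html = removeTags(html);   // reassign, since strings are immutable
        System.out.println(html);  // prints "Hello World!"
    }
}
```

Keep in mind that this is a purely textual match. By default "." does not match line terminators, so a tag that spans several lines is left in place, and a ">" inside an attribute value ends the match early. For input like that, a real HTML parser is the safer choice.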


Remove HTML Tags using Jsoup

Another way to remove HTML tags in Java is to use Jsoup.

Jsoup is a Java library for parsing HTML documents.

Let’s see how we can do that.
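The original listing is not reproduced here, so this is a minimal sketch under the assumption that Jsoup is available on the classpath. The class name JsoupTagRemover and the sample input are hypothetical. Jsoup.parse() builds a document from the string, and text() returns only its text content.

```java
import org.jsoup.Jsoup;

class JsoupTagRemover {

    // Parses the input as HTML and returns only its text content.
    static String removeTags(String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>World</b>!</p>";
        System.out.println(removeTags(html)); // prints "Hello World!"
    }
}
```

Unlike the regex approach, this parses the markup instead of matching characters. Note that text() also normalizes whitespace and decodes entities such as &amp;amp;, so the result is not always identical to what replaceAll() would produce.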


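Note: Jsoup is not part of the Java standard library. Before using it in your program, you need to download the Jsoup JAR and add it to your classpath, or declare it as a dependency in your build tool (its Maven coordinates are org.jsoup:jsoup).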

If you have any questions or suggestions, please comment below.
