The secret behind any successful business operation is strong data collection and planning business activities around that data. Data collection falls into two types: live data, and data that must be read at periodic intervals to check for updates or revisions. In both cases it is important to automate the collection, both to avoid manual effort and errors and to speed up the process. If the data is available through an API or web service, it can be retrieved with SOAP or REST calls. But if the data has to be extracted from web pages, we can use a headless browser to load and parse the page and extract the data.
HtmlUnit is a Java library that provides headless browser functionality and can simulate the behavior of the major browsers on the market (IE, Firefox, Chrome, etc.). We can also configure it to act like a mobile device browser. It is well supported by its developer community and is released frequently, with new features added along the way.
In this blog I will try to show how HtmlUnit can be used from Java to collect data from the internet, along with possible use cases where HtmlUnit can help gather the required data.