From ACT4D Project Wiki
With the rapid growth of the web, the amount of information available online has exploded and continues to grow at an enormous rate. It is difficult for users to keep track of such content and to make sense of it. The problem is particularly evident in the news domain, with thousands of news sites and blogs across the web. The evolving nature of news media makes matters worse: it is challenging both to stay up to date with current events and to understand how they relate to past events or incidents. There is therefore a growing need for automatic techniques and systems that extract semantic content from a given source and present it to the user in a meaningful and effective manner (Shahaf and Guestrin). This problem is referred to as information overload in the computing literature, and various methods and approaches have been proposed to tackle it.
We aim to develop a novel way of visualizing news that emphasizes the interactions between the entities involved in news stories.
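As a minimal sketch of what "interactions between entities" could mean in practice, the Python snippet below counts how often two entities are mentioned in the same article. Treating co-occurrence as a proxy for interaction is an assumption made here for illustration, not necessarily the measure used in this project. The function name `cooccurrence_counts` and the sample articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count how often each unordered pair of entities shares an article.

    `articles` is an iterable of entity-name lists, one list per article
    (hypothetical input shape; any NER tool could produce it).
    """
    counts = Counter()
    for entities in articles:
        # Deduplicate and sort so every pair has one canonical (a, b) order.
        unique = sorted(set(entities))
        counts.update(combinations(unique, 2))
    return counts


# Made-up entity lists standing in for NER output on three articles.
articles = [
    ["Alice", "Bob", "Acme Corp"],
    ["Alice", "Acme Corp"],
    ["Bob", "Alice", "Alice"],
]
pairs = cooccurrence_counts(articles)

# Strongest pairs first; these counts could serve as edge weights in a graph view.
for (a, b), n in pairs.most_common():
    print(f"{a} <-> {b}: {n}")
```

Repeated mentions within one article are counted once, so a long article about a single pair does not dominate the counts. Whether that is desirable depends on the visualization.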
Named Entity Recognition
We initially used the Calais web service to identify named entities and facts in news articles. We used the python-calais framework to connect to the API and retrieve the relevant content. The provided README explains the usage.
- Illinois Named Entity Recognizer
We ultimately used the Illinois NER to identify named entities. Download the code at