
Summarizing documents

Automatically extract summaries from texts

In the digital age, information travels mainly on the web. Millions of articles, blog posts and books are published online every day. All this information is an inexhaustible resource, but it is also very difficult to manage. Extracting the information of interest and presenting it concisely is a subject that has always fascinated me. During my university career I studied data mining to understand how to tackle this problem and find solutions. Working with digital documents opens up an immense field of research. Over the years hundreds, if not thousands, of approaches have been developed to manage document collections and analyze the information they contain. University courses often provide only the basics needed to begin addressing the problem. Some books go further and sometimes even show how to apply these techniques to concrete case studies. The rest of the knowledge is “hidden” in scientific articles that usually only PhD students and researchers read. After studying these issues in depth, I focused on text summarization.

But what is text summarization? Put simply, this branch of research studies methods for automatically extracting summaries from texts. Most approaches decompose the text of one or more documents into smaller units, such as sentences or words. These units are then analyzed with different techniques to identify the main concepts covered and, where possible, the dependencies between them. Finally, a new text of at most a hundred words is assembled from the most relevant phrases or concepts. The proposed solutions are therefore an attempt to replicate what the human mind does when it reads a school book and builds mental maps to prepare for a question or a university exam. Of course, computers can analyze large numbers of documents in a very short time, although the quality of the resulting summaries is not yet as good as it could be.
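As a minimal sketch of the pipeline described above, the following Python snippet splits a text into sentences, scores each sentence by the average frequency of its words, and keeps the best-scoring sentences within a 100-word budget. The function names, the tiny stop-word list and the frequency-based scoring are illustrative assumptions chosen for brevity; they are not the methods used in the articles or books mentioned on this page.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "on", "for", "with", "as", "by", "at", "be"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the words of a sentence and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, max_words=100):
    """Pick the highest-scoring sentences that fit within max_words."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text stand in for "main concepts".
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)

    chosen, budget = [], max_words
    for i in ranked:
        length = len(sentences[i].split())
        if length <= budget:
            chosen.append(i)
            budget -= length

    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Calling `summarize(document_text)` returns a string of at most a hundred words made of sentences copied from the input. This is the simplest extractive flavour of the idea; it does not model dependencies between concepts, which more advanced approaches try to capture.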

I have tried to make a contribution to this field too. I have written several articles on the subject and taken part in international conferences on these topics. If you are interested in the results I have obtained, you can consult the following articles:

To support the scientific community, I have also served as editor of some books on the subject:

These books collect work by other scholars who explore different approaches to the automatic generation of summaries and examine their current real-world applications in different fields. However, because they focus on advanced topics, they are aimed primarily at researchers, scholars and IT professionals who already have some knowledge of data mining.

If you are interested in understanding these topics better and building a solid background, I suggest the following books:

