Content tagged with "Demo"

Introduction

As part of my continuing work on Partridge, I’ve been working on improving the sentence splitting capability of SSSplit – the component used to split academic papers from PLosOne and PubMedCentral into separate sentences.

Papers arrive in our system as big blocks of text with the occasional diagram, formula or diagram and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends. Of course that means we also have to take into account the other ‘stuff’ (listed above) floating around in the documents too. We can’t just ignore formulae and citations – they’re pretty important! That’s what SSSplit does. It carves up papers into sentence () elements whilst also leaving the XML structure of the rest of the document in tact.

Read more...