Content tagged with "Partridge"

Here is a recording of my recent keynote talk at the York Doctoral Symposium on the power of Natural Language Processing, covering Watson and my academic/PhD topic, Partridge.
  • 0-11 minutes – history of mankind, invention and the acceleration of scientific progress (warming people to the idea that farming out your scientific reading to a computer is a much better idea than trying to read every paper written)
  • 11-26 minutes – My personal academic work – scientific paper annotation and cognitive scientific research using NLP
  • 26-44 minutes – Watson – Jeopardy, MSK and Ecosystem
  • 44-48 minutes – Q&A on Watson and Partridge
  • Please don’t cringe too much at my technical explanation of Watson – especially those of you who know much more about WEA and the original DeepQA setup than I do! This was me after a few days of reading the original 2011 and 2012 papers and making copious notes!


    Hoorah! After a number of weeks I’ve finally managed to get SAPIENTA running inside Docker containers on our EBI cloud instance. You can try it out at http://sapienta.papro.org.uk/.

    The project previously ran via a number of very precarious scripts that had a habit of stopping and not coming back up. The new Docker environment should be a lot more stable.

    Another improvement I’ve made is to create a WebSocket interface for calling the service and a Python-based command-line client. If you’re interested, I’m using socket.io and the relevant Python libraries (server and client). This means that anyone who needs to can now request annotations in large batches. I’m planning on using socket.io to interface Partridge with SAPIENTA, since they are hosted on separate servers and this approach avoids any complicated firewall issues.
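
For anyone curious what a batch request over socket.io might look like, here is a minimal sketch using the python-socketio client. It is not the actual SAPIENTA client: the event name `annotate`, the payload shape and the batch size are all assumptions made up for illustration, and the real service may expect something quite different.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def annotate_all(url, papers, size=10):
    """Send papers to the server in batches and collect the replies.

    Assumes a server that acknowledges a hypothetical "annotate" event
    with the annotated documents; this is NOT a documented SAPIENTA API.
    """
    import socketio  # third-party: pip install "python-socketio[client]"

    sio = socketio.Client()
    sio.connect(url)
    try:
        results = []
        for chunk in batches(papers, size):
            # call() emits the event and blocks until the server acknowledges
            results.append(sio.call("annotate", {"papers": chunk}, timeout=300))
        return results
    finally:
        sio.disconnect()
```

Because the client opens the connection outwards, only the server needs to accept inbound traffic, which is roughly why this kind of setup sidesteps firewall trouble between two separately hosted machines.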


    Introduction

    As part of my continuing work on Partridge, I’ve been working on improving the sentence splitting capability of SSSplit – the component used to split academic papers from PLOS ONE and PubMed Central into separate sentences.

    Papers arrive in our system as big blocks of text with the occasional diagram, formula or citation, and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends. Of course, that means we also have to take into account the other ‘stuff’ (listed above) floating around in the documents too. We can’t just ignore formulae and citations – they’re pretty important! That’s what SSSplit does. It carves up papers into sentence elements whilst also leaving the XML structure of the rest of the document intact.
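
To give a flavour of the problem, here is a rough sketch of splitting one paragraph into sentence elements while keeping inline markup such as citations in place. This is not SSSplit's actual algorithm: the `s` tag name, the `xref` element in the example and the naive boundary rule (sentence-ending punctuation, whitespace, then a capital letter) are all assumptions chosen for illustration.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Naive boundary: ., ! or ? followed by whitespace and a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_paragraph(p, tag="s"):
    """Return a copy of paragraph `p` with its content wrapped in <tag> elements."""
    out = ET.Element(p.tag, dict(p.attrib))
    state = {"sent": ET.SubElement(out, tag), "last": None}

    def append_text(text):
        # Text goes either at the start of the sentence or after the
        # most recent inline element (as that element's tail).
        if state["last"] is None:
            state["sent"].text = (state["sent"].text or "") + text
        else:
            state["last"].tail = (state["last"].tail or "") + text

    def feed(text):
        for i, part in enumerate(BOUNDARY.split(text)):
            if i > 0:  # a boundary was crossed, so open a new sentence
                state["sent"] = ET.SubElement(out, tag)
                state["last"] = None
            append_text(part)

    feed(p.text or "")
    for child in p:
        clone = copy.deepcopy(child)  # inline element is kept untouched
        tail, clone.tail = clone.tail or "", None
        state["sent"].append(clone)
        state["last"] = clone
        feed(tail)
    return out


p = ET.fromstring('<p>We use X <xref rid="b1">[1]</xref>. It works well. Done.</p>')
print(ET.tostring(split_paragraph(p), encoding="unicode"))
```

The sketch only looks for boundaries inside a single run of text, so a sentence break that falls exactly at the edge of an inline element is missed, and abbreviations like "Fig. A" would be split wrongly. It also drops the whitespace between sentences.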
