Content tagged with "Python"

Retrieve and Rank and Python

Published on November 16, 2015 by James Ravenscroft

Introduction

Retrieve and Rank (R&R), if you haven’t already heard about it, is IBM Watson’s new web service component for information retrieval and question answering. My colleague Chris Madison has summarised how it works at a high level here.

R&R is based on the Apache Solr search engine, with a machine learning ranking plugin that learns which answers are most relevant to an input query and presents them in the learnt “relevance” order.
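To make the two-stage idea concrete, here is a toy sketch in plain Python. It is not the Watson API and it does not use Solr: the `retrieve`, `features` and `rank` functions, the two features and the hand-picked weights are all hypothetical stand-ins for what a real search index and a trained ranker would provide.

```python
def retrieve(query, documents, limit=10):
    """Stage 1 (retrieve): keep documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    hits = [doc for doc in documents if terms & set(doc.lower().split())]
    return hits[:limit]


def features(query, doc):
    """Hypothetical features: count of query-term occurrences, and inverse length."""
    terms = set(query.lower().split())
    words = doc.lower().split()
    overlap = sum(1 for word in words if word in terms)
    return [overlap, 1.0 / len(words)]


def rank(query, candidates, weights):
    """Stage 2 (rank): order candidates by a weighted sum of their features.

    In a real system the weights would be learnt from labelled
    query/answer pairs; here they are simply passed in by hand.
    """
    def score(doc):
        return sum(w * f for w, f in zip(weights, features(query, doc)))

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        "solr is a search engine",
        "watson ranks search results by learnt relevance for search queries",
        "cats sleep all day",
    ]
    query = "search relevance"
    for doc in rank(query, retrieve(query, docs), weights=[1.0, 0.5]):
        print(doc)
```

The point of the split is that the first stage only has to be cheap and broad, while the second stage can afford a more expensive, learnt notion of relevance over a small candidate set.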

SSSplit Improvements

Published on July 15, 2015 by James Ravenscroft

#phd #demo #partridge #python #sapienta

Introduction

As part of my continuing work on Partridge, I’ve been working on improving the sentence splitting capability of SSSplit, the component used to split academic papers from PLOS ONE and PubMed Central into separate sentences.

Papers arrive in our system as big blocks of text with the occasional diagram, formula or citation, and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends. Of course, that means we also have to take into account the other ‘stuff’ (listed above) floating around in the documents. We can’t just ignore formulae and citations; they’re pretty important! That’s what SSSplit does: it carves up papers into sentence (<s>) elements whilst leaving the XML structure of the rest of the document intact.
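As a rough illustration of the problem (and not SSSplit's actual algorithm), the sketch below wraps the sentences of a single paragraph element in sentence elements while keeping inline children such as citations. The element name `s`, the `split_paragraph` helper and the naive boundary rule (sentence-ending punctuation, whitespace, then a capital letter) are all assumptions made for the example. The rule will trip over abbreviations, and a boundary that falls inside an inline element will make the result fail to parse.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Naive boundary: ., ! or ? followed by whitespace and a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_paragraph(p):
    """Return a copy of paragraph element `p` with each sentence wrapped in <s>."""
    # Serialise the paragraph's content: leading text plus each inline child.
    # ET.tostring() includes the text that follows each child (its tail).
    inner = escape(p.text or "")
    inner += "".join(ET.tostring(child, encoding="unicode") for child in p)

    sentences = [s for s in BOUNDARY.split(inner.strip()) if s]
    wrapped = "".join(f"<s>{s}</s>" for s in sentences)

    # Re-parse, then restore the original tag, attributes and trailing text.
    new_p = ET.fromstring(f"<p>{wrapped}</p>")
    new_p.tag = p.tag
    new_p.attrib.update(p.attrib)
    new_p.tail = p.tail
    return new_p


if __name__ == "__main__":
    para = ET.fromstring(
        '<p>Cells divide <xref rid="b1">[1]</xref>. '
        "The rate is x = 2. We measured it.</p>"
    )
    print(ET.tostring(split_paragraph(para), encoding="unicode"))
```

Splitting the serialised markup rather than the bare text is what lets the citation element survive inside its sentence; a more robust splitter would walk the tree instead of relying on a regular expression.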


Tidying up XML in one click

Published on June 28, 2015 by James Ravenscroft

#phd #sapienta #python

When I’m working on Partridge and SAPIENTA, I find myself dealing with a lot of badly formatted XML. I used to manually run xmllint --format against every file before opening it, but that gets annoying very quickly (even if you have it saved in your bash history). So I decided to write a Nemo script that does it automatically for me.

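    #!/bin/bash
    # Reformat every selected .xml file in place using xmllint.
    # [[ ... ]] is a bash feature, hence bash rather than sh.
    # Note: paths containing spaces are still split by this loop.
    for xmlfile in $NEMO_SCRIPT_SELECTED_FILE_PATHS; do
        if [[ $xmlfile == *.xml ]]; then
            # Write to a temporary file and only replace the original if xmllint succeeds.
            xmllint --format "$xmlfile" > "$xmlfile.tmp" && mv "$xmlfile.tmp" "$xmlfile"
        fi
    done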
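For comparison, the same tidy-up could be sketched in Python using only the standard library. This is an illustration, not a drop-in replacement for xmllint: `ET.indent` needs Python 3.9 or later, and ElementTree discards comments and the DOCTYPE when it parses a file. The helper names `format_xml_file` and `format_selected` are made up for the example, and the sketch assumes the selected paths arrive newline-separated in `NEMO_SCRIPT_SELECTED_FILE_PATHS`.

```python
import os
import xml.etree.ElementTree as ET


def format_xml_file(path):
    """Re-indent one XML file in place (requires Python 3.9+ for ET.indent)."""
    tree = ET.parse(path)
    ET.indent(tree, space="  ")
    tmp_path = path + ".tmp"
    tree.write(tmp_path, encoding="utf-8", xml_declaration=True)
    # Swap the formatted copy into place only after it was written successfully.
    os.replace(tmp_path, path)


def format_selected(selection):
    """Format every .xml path in a newline-separated selection string."""
    formatted = []
    for path in selection.splitlines():
        if path.endswith(".xml"):
            format_xml_file(path)
            formatted.append(path)
    return formatted


if __name__ == "__main__":
    # Assumption: the file manager passes the selection via this variable.
    format_selected(os.environ.get("NEMO_SCRIPT_SELECTED_FILE_PATHS", ""))
```

Splitting the selection on newlines rather than on any whitespace means file names containing spaces are handled correctly.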
