Kusuri is a Named Entity Recognizer optimized to detect and extract drug mentions in tweets. Its interface is a RESTful service. Kusuri classifies tweets using either a CNN or a lexicon + BERT classifier, and it has been optimized for speed so that it can process millions of tweets in a matter of hours. This version is an updated and simplified version of the original system described in [Weissenbacher et al., JAMIA 2019](https://doi.org/10.1093/jamia/ocz156).
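To picture what a lexicon contributes, here is a minimal, hypothetical sketch of lexicon lookup. It assumes case-insensitive, whole-word matching and leaves out the CNN and BERT classifiers entirely, so it is not Kusuri's implementation. The helper names `lexicon_candidates` and `lexicon_predict` are illustrative only; the output reuses the field names of the service's response.

```python
import re
from typing import Dict, List, Optional


def lexicon_candidates(text: str, lexicon: List[str]) -> Optional[List[str]]:
    """Return the spans of `text` that match a lexicon entry, or None if nothing matches.

    Hypothetical illustration: case-insensitive, whole-word matching is an
    assumption, not a description of Kusuri's actual matching rules.
    """
    found = []
    for drug in lexicon:
        # Word boundaries stop "first" from matching inside "firstly";
        # re.escape makes any regex metacharacters in a drug name match literally
        match = re.search(r"\b" + re.escape(drug) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0))
    return found or None


def lexicon_predict(tweet: Dict, lexicon: List[str]) -> Dict:
    """Label a tweet 1 if any lexicon entry occurs in it, using the response field names."""
    detected = lexicon_candidates(tweet["text"], lexicon)
    return {
        "tweetID": tweet["tweetID"],
        "text": tweet["text"],
        "drugDetected": detected,
        "prediction": 1 if detected else 0,
    }


drugs = ["first", "drug2", "drug3"]
print(lexicon_predict({"tweetID": "10", "text": "My first tweet"}, drugs)["prediction"])   # 1
print(lexicon_predict({"tweetID": "11", "text": "My second tweet"}, drugs)["prediction"])  # 0
```

Whole-word matching is chosen here only to avoid obvious false positives on substrings; a real system would also have to deal with misspellings and variants, which is where a learned classifier helps.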
Kusuri provides a single service, `predict`, accessed through a POST request. The prediction parameters must be posted as JSON in the following format:
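`{"tweets": [{"tweetID": 1, "text": "the first tweet"}, {"tweetID": "2", "text": "the second tweet"}, ...], "classifier": "classifier_available", "lexicon": [{"drug": "drug1"}, {"drug": "drug2"}, {"drug": "drug3"}]}`

Here `classifier` is the name of one of the available classifiers (the client script below uses `lexicon`), and `lexicon` is the list of drug names to use.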
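A minimal sketch for building request bodies in this format from Python follows. The helper names `build_payload` and `batched_payloads` are hypothetical, and the batch size of 1000 is an arbitrary assumption: this README does not state whether the service limits request size. Building a dict and serializing it with `json` (or passing it to `requests.post(..., json=...)`) guarantees double-quoted, valid JSON.

```python
import json
from typing import Dict, Iterator, List


def build_payload(tweets: List[Dict], drugs: List[str], classifier: str = "lexicon") -> Dict:
    """Assemble a predict request body in the documented format.

    "lexicon" is the classifier name used by the client script; other valid
    names are not listed here. Only tweetID and text are copied, so extra
    fields (e.g. the script's "label") are dropped.
    """
    return {
        "tweets": [{"tweetID": t["tweetID"], "text": t["text"]} for t in tweets],
        "classifier": classifier,
        "lexicon": [{"drug": d} for d in drugs],
    }


def batched_payloads(tweets: List[Dict], drugs: List[str], batch_size: int = 1000,
                     classifier: str = "lexicon") -> Iterator[Dict]:
    """Yield one request body per chunk of `batch_size` tweets (hypothetical batch size)."""
    for start in range(0, len(tweets), batch_size):
        yield build_payload(tweets[start:start + batch_size], drugs, classifier)


tweets = [{"tweetID": str(i), "text": f"tweet number {i}"} for i in range(2500)]
payloads = list(batched_payloads(tweets, ["drug1", "drug2"]))
print(len(payloads))  # 3
# json.dumps always emits double-quoted keys and strings, i.e. valid JSON
print(json.dumps(build_payload(tweets[:1], ["drug1"])))
```

Each payload could then be sent with `requests.post(url, json=payload)`, as the client script does for a single request.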
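If the prediction succeeds, Kusuri returns JSON in the following format, where `prediction` is 1 if the tweet is classified as mentioning a drug and 0 otherwise, and `drugDetected` lists the detected drug mentions (null when there are none):
`{"tweets": [{"tweetID": 1, "text": "the first tweet", "drugDetected": null, "prediction": 0}, {"tweetID": "2", "text": "the second tweet", "drugDetected": ["the second"], "prediction": 1}]}`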
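Once the response has been decoded (for example with `resp.json()`), a minimal sketch like the following can pull out the positive tweets. The helper name `drug_mentions` is hypothetical; it assumes the response follows the format above and, like the client script, treats an `errors` key as a failure.

```python
from typing import Dict, List, Tuple


def drug_mentions(response: Dict) -> List[Tuple[str, List[str]]]:
    """Return (tweetID, detected mentions) for every tweet predicted to mention a drug.

    Assumes the documented response format: `prediction` is 1 for positive
    tweets and `drugDetected` is either None (JSON null) or a list of mentions.
    """
    if "errors" in response:
        raise ValueError(f"Kusuri returned errors: {response['errors']}")
    mentions = []
    for tweet in response.get("tweets", []):
        if tweet.get("prediction") == 1:
            # Normalize IDs to strings, since the examples mix 1 and "2"
            mentions.append((str(tweet["tweetID"]), tweet.get("drugDetected") or []))
    return mentions


example = {"tweets": [
    {"tweetID": 1, "text": "the first tweet", "drugDetected": None, "prediction": 0},
    {"tweetID": "2", "text": "the second tweet", "drugDetected": ["the second"], "prediction": 1},
]}
print(drug_mentions(example))  # [('2', ['the second'])]
```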
'''
Created on Aug 24, 2020
@author: dweissen
'''
import requests
import pandas as pd
import json
import logging as lg
if __name__ == '__main__':
    # Show the lg.info() messages below; the default logging level is WARNING
    lg.basicConfig(level=lg.INFO)
    exJSON = {}
    exJSON['lexicon'] = [{"drug": "first"}, {"drug": "drug2"}, {"drug": "drug3"}]
    exJSON["classifier"] = "lexicon"
    #exJSON['tweets'] = json.loads(examples.to_json(orient="records"))
    # For testing the service if needed (this replaces the request built above):
    exJSON = {
        "tweets": [
            {"tweetID": "10", "text": "My first tweet", "label": "1"},
            {"tweetID": "11", "text": "My second tweet", "label": "2"},
            {"tweetID": "12", "text": "My third tweet", "label": "1"},
            {"tweetID": "13", "text": "My fourth tweet", "label": "3"}],
        "lexicon": [
            {"drug": "first"},
            {"drug": "drug2"},
            {"drug": "drug3"}],
        "classifier": "lexicon"
    }
    # Send the tweets to the predict service
    resp = requests.post('https://hlp.ibi.upenn.edu/kuuri/v0.1/predict', json=exJSON)
    if resp.status_code != 200:
        raise Exception(f'POST /predict/ ERROR: {resp.status_code}')
    else:
        exJSON = resp.json()
        if 'errors' in exJSON:
            raise Exception(f'POST /predict/ ERROR: {exJSON}')
        elif 'tweets' in exJSON:
            # TODO: check the format returned and normalize it into a DataFrame
            df = pd.json_normalize(exJSON['tweets'])
            lg.info(f'Prediction done for {len(df)} tweets.')
            lg.info(f'First 10 tweets:\n{df.head(10)}')
            df.to_csv('/tmp/outDrug.tsv', sep='\t')
        else:
            raise Exception(f'POST /predict/ ERROR: unexpected JSON received from the REST service: {exJSON}')