March 18, 2016

Development of the BibleTag API - Collaborative Tagging of Text, Sentiment Analysis, and More

Background Thoughts on Sentiment Analysis

Text mining, sentiment analysis, NLP, etc. are all the rage these days, and (at least partially) for good reason. There are some exciting projects coming out in this area that will have a huge impact. It's great to see projects utilizing machine learning to intelligently translate text, cluster and mine news articles, and more.

However, I have come to the following general conclusions related to the BibleTag-API project I will discuss below:

  1. Finding the right questions to answer or actionable metrics to calculate from textual data sets is the bulk of the work: Although there is power in keyword searches through text, there is so much more power in returning relevant textual content based on a query or event. We can think we're being very clever indexing our system logs or scraping certain things from across the web, but it is all for naught if those indexed logs or scraped things answer questions that are not at the heart of a problem. This is further discussed here.

  2. Sentiment analysis is hard (to get right): Although many sentiment analysis packages are being developed (see here or here) that allow you to quickly determine positivity levels, I have seen a general trend of doing this analysis without any validation of the results. I think this trend stems from the fact that validation is difficult in many situations, and, thus, training a good model (i.e., "getting it right") also becomes difficult. Take this study for example. How would one validate this analysis? If an emotional/sentiment analysis of the Bible reveals it is "angrier" than the Quran, how could we validate such a result? As we saw in point (1), keyword analyses are prone to mistakes.

Well, I believe that the Bible is not an "angry" book, although it deals with topics of wrath, judgement, etc. An analysis of emotions based on keyword search is prone to error in predicting actual sentiments.

Instead of imagining what we would need to do to predict sentiments from Bible verses (which is actually a small data problem), let's think about how we could develop a system that would return specific Bible passages based on sentiments, topics, or tags and would grow a repository of actually relevant tagged sentiments in the Bible.

The Ideas Behind the BibleTag-API

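Ok, so we want a system that both returns specific Bible passages based on sentiments, topics, or tags and grows a repository of actually relevant tagged sentiments in the Bible.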

For example, a certain passage of scripture (e.g., a few verses) may contain words like "wrath," "sin," "destruction," etc., but that passage may, overall, be speaking about God's salvation from such things. Thus, users of the system may tag this passage with tags like "salvation," "love," etc. that are actually relevant to the passage (more relevant, in fact, than an analysis based solely on keywords).

To accomplish these goals and to build a Bible tagging system that could be utilized by developers in many different contexts, we (a few other developers and I at the Code for the Kingdom hackathon in Nashville, TN) decided to build a free and open source REST API for collaborative tagging of the Bible, called the BibleTag-API.

Development/Usage of the BibleTag-API

The BibleTag-API (source code here) is written in Go and uses RethinkDB as a datastore. All the infrastructure is currently deployed on Digital Ocean via Docker. Here is the setup:

There is a GET endpoint to return verses based on tags and a POST endpoint to tag portions of the Bible. When returning verses, the API uses the Digital Bible Platform API to get the actual text. This provides a lot of flexibility. After some further development, the hope is to return tagged Bible verses in any format supported by the DBP including text, audio, a variety of languages, etc.

Below are example calls using cURL:

GET Request:

curl http://bibletag.xyz:8080/tag/Gospel  

GET Response:

[
  {
    "book_id": "John",
    "book_name": "John",
    "book_order": "58",
    "chapter_id": "3",
    "chapter_title": "Chapter 3",
    "paragraph_number": "68",
    "verse_id": "16",
    "verse_text": "\u201cFor God so loved the world, that he gave his only Son, that whoever believes in him should not perish but have eternal life."
  },
  {
    "book_id": "John",
    "book_name": "John",
    "book_order": "58",
    "chapter_id": "3",
    "chapter_title": "Chapter 3",
    "paragraph_number": "68",
    "verse_id": "17",
    "verse_text": "For God did not send his Son into the world to condemn the world, but in order that the world might be saved through him."
  }
]

POST Request:

curl -X POST \
     -H "Content-Type: application/json" \
     -H "Cache-Control: no-cache" \
     -H "Postman-Token: aced34fd-2249-bff2-af6a-905ad2eeddfa" \
     -d '{
         "tag": "worry",
         "book": "1 peter",
         "chapter": 5,
         "startVerse": 7,
         "endVerse": 7
       }' "http://bibletag.xyz:8080/tag"

POST Response:

{
  "code": 200,
  "text": "Tagged Passage"
}

When the BibleTag-API receives a GET request, it searches previously received tags and returns the "most relevant" verse based on them. This definition of "most relevant" will be developed over time. In a first iteration, it is simply the passage tagged most often with the requested tag. However, future development could utilize algorithms that determine relevancy by taking into account the number of times a passage is pulled, voting/flagging, connections between a tag and other tags or between references, etc.

Beyond this, over time the BibleTag-API will build a repository of tagged Bible verses that can be mined to determine the sentiments actually expressed in the text of the Bible. We will be able to analyze a significant number of tags for many scripture passages and perform robust statistical analyses of sentiment that can be validated.

Initial/Example Integrations

The BibleTag-API has implications for interesting data analysis in the future. However, it is already useful for developers wanting to build search/tag functionality into mobile or web applications. A couple of these have already been developed:

The BibleTag SlackBot

This SlackBot allows you to insert relevant passages of scripture into Slack via /bibletag [tag]. The source code is available here, and this app will soon be on the listing of Slack apps for use with your team.

AngularJS Web App

This AngularJS web app provides a responsive web UI for searching and tagging Bible verses. The source code is available here, and the app will soon be finalized and hosted for your use.

Conclusions

More to come soon from the BibleTag-API, and I hope to provide some interesting analyses via this blog as the tags start flowing in. This is a totally open source project, so please submit issues, pull requests, feedback, etc. here or below in the comments. Enjoy!
