How Sentdex and Sentiment Analysis work


For every topic we follow, be it stocks, politics, or something else entirely, we track the sentiment surrounding it.

"Sentiment" here means the views, or opinions, people hold about a topic, measured on a scale from positive to negative.


Sources of data for sentiment analysis

Our sources differ depending on the topic.

For stocks, Sentdex pulls from over 20 sources, mainly: Reuters, Bloomberg, WSJ, LA Times, CNBC, Forbes, Business Insider, and Yahoo Finance.

For politics, we pull from: CNN, Fox, USA Today, ABC, MSNBC, CNBC, CBS, Huffington Post, Yahoo Politics, Washington Post, and Reuters.

Geographical sentiment is pulled from Twitter.

Bitcoin sentiment is pulled mainly from Reddit, though most of the content digested there contains links that lead off Reddit. The algorithm reads both the articles linked from Reddit posts and the comments on those posts. The algorithm does *not* read votes on threads.


How Sentdex judges Sentiment

At its heart, Sentdex is a bot that reads the news much as you or I would. As Sentdex reads articles, it pulls out what are known as "Named Entities" (the people, companies, places, and other things being discussed) using a natural language processing technique called "named entity recognition." Once Sentdex decides what an article, paragraph, or even a single sentence is talking about, it then looks for opinions about it.
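
To make the idea of "pulling out Named Entities" concrete, here is a minimal sketch that splits text into sentences and tags each one with the entities it mentions. It is not Sentdex's code: it swaps in the simplest form of entity recognition, a dictionary ("gazetteer") lookup, in place of a trained statistical model, and the GAZETTEER contents and all function names are hypothetical.

```python
import re

# HYPOTHETICAL gazetteer: lower-cased surface forms -> canonical entity names.
# This dictionary lookup stands in for a trained NER model.
GAZETTEER = {
    "apple": "Apple Inc.",
    "apple inc": "Apple Inc.",
    "tesla": "Tesla",
    "bitcoin": "Bitcoin",
    "btc": "Bitcoin",
}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence):
    """Return canonical names of gazetteer entries found as whole words."""
    lowered = sentence.lower()
    found = []
    for surface, name in GAZETTEER.items():
        # \b boundaries stop "apple" from matching inside "pineapple".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            if name not in found:  # several surface forms share one name
                found.append(name)
    return found


def entities_by_sentence(text):
    """Pair each sentence with the entities it mentions."""
    return [(s, find_entities(s)) for s in split_sentences(text)]


print(entities_by_sentence("Apple beat estimates. Bitcoin fell again."))
```

Run as-is, this prints [('Apple beat estimates.', ['Apple Inc.']), ('Bitcoin fell again.', ['Bitcoin'])]. The naive sentence splitter will break on abbreviations such as "Inc.", which is one reason real systems use trained tokenizers and NER models instead.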

This is the more challenging part of using Natural Language Processing to derive sentiment from written language. At its core, this step involves a lot of what is called "chunking": grouping bits of text into noun phrases, which carry the adjectives and adverbs describing the noun, along with some other information about what is being said.
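
A rough illustration of chunking: the sketch below groups words that have already been part-of-speech tagged into noun-phrase chunks. It assumes Penn Treebank tags (which a trained tagger such as NLTK's pos_tag would produce), and the chunk pattern, roughly <DT>?<RB.*|JJ.*>*<NN.*>+ in NLTK RegexpParser notation, is an illustrative choice, not Sentdex's actual grammar. The function name is hypothetical.

```python
def chunk_noun_phrases(tagged):
    """Group (word, Penn Treebank tag) pairs into noun-phrase chunks.

    Pattern: an optional determiner, any adverbs or adjectives,
    then one or more nouns.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        # Modifiers: RB, RBR, RBS, JJ, JJR, JJS.
        while j < n and tagged[j][1].startswith(("RB", "JJ")):
            j += 1
        k = j
        # Nouns: NN, NNS, NNP, NNPS.
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun, so tagged[i:k] is a chunk
            chunks.append(tagged[i:k])
            i = k
        else:  # no noun followed; slide the start forward one token
            i += 1
    return chunks


sentence = [("Tesla", "NNP"), ("reported", "VBD"), ("surprisingly", "RB"),
            ("strong", "JJ"), ("quarterly", "JJ"), ("deliveries", "NNS"),
            (".", ".")]
for chunk in chunk_noun_phrases(sentence):
    print(" ".join(word for word, _ in chunk))
```

This prints "Tesla" and then "surprisingly strong quarterly deliveries": the second chunk bundles the opinion-bearing adjective "strong" with the noun it describes, which is what the opinion step needs.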

From here, Sentdex decides, mainly from the adjectives (while watching for negations, such as the difference between "good" and "not good"), whether the author of the text has a positive or a negative attitude toward the subject, or Named Entity, in question.
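
One simple way to score adjectives while respecting negation is a polarity lexicon with a negation window. The sketch below is only that, a common lexicon-based technique, and not a description of Sentdex's scoring. The LEXICON values, the NEGATORS set, the window size of 3, and the function names are all hypothetical; "n't" appears in NEGATORS on the assumption that a Penn Treebank-style tokenizer has split contractions like "isn't" into "is" and "n't".

```python
# HYPOTHETICAL polarity lexicon; a real system would use a far larger one.
LEXICON = {"good": 1.0, "great": 2.0, "strong": 1.0,
           "bad": -1.0, "terrible": -2.0, "weak": -1.0}
# HYPOTHETICAL set of words that reverse a following opinion word.
NEGATORS = {"not", "never", "no", "n't"}


def score_tokens(tokens, window=3):
    """Sum lexicon polarities over tokens.

    An opinion word within `window` tokens after a negator has its
    polarity flipped; each negator affects at most one opinion word.
    """
    total = 0.0
    negation_left = 0  # tokens still covered by the most recent negator
    for token in tokens:
        word = token.lower()
        if word in NEGATORS:
            negation_left = window
        elif word in LEXICON:
            polarity = LEXICON[word]
            total += -polarity if negation_left > 0 else polarity
            negation_left = 0  # this negation has been used up
        elif negation_left > 0:
            negation_left -= 1
    return total


def label(score):
    """Map a numeric score onto a coarse positive/negative/neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["good", "not good", "not very good", "not bad at all"]:
    s = score_tokens(text.split())
    print(text, s, label(s))
```

Under these assumptions "good" scores 1.0, "not good" and "not very good" both score -1.0, and "not bad at all" scores 1.0. A fixed window is a blunt instrument; parsing-based approaches decide the scope of a negation from sentence structure instead.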