文本情感分析(也称为意见挖掘)是指用自然语言处理、文本挖掘以及计算机语言学等方法来识别和提取原素材中的主观信息。(维基百科)
简单的文本情感分析可借助已有工具包,以黑箱式操作完成。
from textblob import TextBlob
text="I am happy today. I feel sad today."
blob=TextBlob(text)
for sentence in blob.sentences:
print(sentence,sentence.sentiment)
I am happy today. Sentiment(polarity=0.8, subjectivity=1.0) I feel sad today. Sentiment(polarity=-0.5, subjectivity=1.0)
TextBlob的情感极性取值范围是[-1, 1],-1代表完全负面,1代表完全正面。
训练集为影评。
from snownlp import SnowNLP
text="我今天很快乐。我今天很愤怒。"
s=SnowNLP(text)
for sentence in s.sentences:
print(sentence,SnowNLP(sentence).sentiments)
我今天很快乐 0.971889316039116 我今天很愤怒 0.07763913772213482
SnowNLP的情感分析取值范围是[0,1],表达的是“这句话代表正面情感的概率”。
训练集为购物评价。
import jieba
sentence='这样的酒店配这样的价格还算不错'
wordList=jieba.cut(sentence)
for word in wordList:
print (word)
Building prefix dict from the default dictionary ... Dumping model to file cache C:\Users\ZZJASW~1\AppData\Local\Temp\jieba.cache Loading model cost 3.007 seconds. Prefix dict has been built succesfully.
这样 的 酒店 配 这样 的 价格 还 算 不错
import jieba
sentence='这样的酒店配这样的价格还算不错'
wordList=jieba.cut(sentence)
#####read in the stop word file
file='ChineseStopWord.txt'
f=open(file,'r',encoding='utf-8')
stopList=[]
for line in f:
line=line.strip()
stopList.append(line)
f.close()
#####remove stop words from the wordList
newWordList=[]
for word in wordList:
if word not in stopList:
newWordList.append(word)
#####examine the results
for word in newWordList:
print (word)
酒店 配 价格 算 不错
情感词典:BosonNLP数据
“假设情感值满足线性叠加原理;然后我们将句子进行分词,如果句子分词后的词语向量包含相应的词语,就加上向前的权值,其中,否定词和程度副词会有特殊的判别规则,否定词会导致权值反号,而程度副词则让权值加倍。最后,根据总权值的正负性来判断句子的情感。”
def classifyWords(wordDict):
# (1) 情感词
senList = readLines('BosonNLP_sentiment_score.txt')
senDict = defaultdict()
for s in senList:
senDict[s.split(' ')[0]] = s.split(' ')[1]
# (2) 否定词
notList = readLines('notDict.txt')
# (3) 程度副词
degreeList = readLines('degreeDict.txt')
degreeDict = defaultdict()
for d in degreeList:
degreeDict[d.split(',')[0]] = d.split(',')[1]
senWord = defaultdict()
notWord = defaultdict()
degreeWord = defaultdict()
for word in wordDict.keys():
if word in senDict.keys() and word not in notList and word not in degreeDict.keys():
senWord[wordDict[word]] = senDict[word]
elif word in notList and word not in degreeDict.keys():
notWord[wordDict[word]] = -1
elif word in degreeDict.keys():
degreeWord[wordDict[word]] = degreeDict[word]
return senWord, notWord, degreeWord
def scoreSent(senWord, notWord, degreeWord, segResult):
W = 1
score = 0
# 存所有情感词的位置的列表
senLoc = senWord.keys()
notLoc = notWord.keys()
degreeLoc = degreeWord.keys()
senloc = -1
# notloc = -1
# degreeloc = -1
# 遍历句中所有单词segResult,i为单词绝对位置
for i in range(0, len(segResult)):
# 如果该词为情感词
if i in senLoc:
# loc为情感词位置列表的序号
senloc += 1
# 直接添加该情感词分数
score += W * float(senWord[i])
# print "score = %f" % score
if senloc < len(senLoc) - 1:
# 判断该情感词与下一情感词之间是否有否定词或程度副词
# j为绝对位置
for j in range(senLoc[senloc], senLoc[senloc + 1]):
# 如果有否定词
if j in notLoc:
W *= -1
# 如果有程度副词
elif j in degreeLoc:
W *= float(degreeWord[j])
# i定位至下一个情感词
if senloc < len(senLoc) - 1:
i = senLoc[senloc + 1]
return score
###Source:http://www.jianshu.com/p/4cfcf1610a73
from textblob.classifiers import NaiveBayesClassifier
train = [
('I love this car', 'pos'),
('This view is amazing', 'pos'),
('I feel great', 'pos'),
('I am so excited about the concert', 'pos'),
("He is my best friend", 'pos'),
('I do not like this car', 'neg'),
('This view is horrible', 'neg'),
("I feel tired this morning", 'neg'),
('I am not looking forward to the concert', 'neg'),
('He is an annoying enemy', 'neg')
]
test = [
('feel happy this morning', 'pos'),
('Oh I love my friend', 'pos'),
('not like that man', 'neg'),
("this hourse not great", 'neg'),
('your song annoying', 'neg')
]
cl = NaiveBayesClassifier(train)
for sentence in test:
print (sentence[0],':',cl.classify(sentence[0]))
print ('accuracy is:', cl.accuracy(test))
feel happy this morning : neg Oh I love my friend : pos not like that man : neg this hourse not great : neg your song annoying : neg accuracy is: 0.8
from urllib.request import urlopen
html=urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())
b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
from bs4 import BeautifulSoup
html=urlopen("http://pythonscraping.com/pages/page1.html")
bsobj=BeautifulSoup(html.read(),"html.parser")
print (bsobj.html.body.h1)
print (bsobj.body.h1)
print (bsobj.html.h1)
<h1>An Interesting Title</h1> <h1>An Interesting Title</h1> <h1>An Interesting Title</h1>
fileName="D://example.txt"
p=open(fileName,"w") #open for writing, truncating the file first
print("hello",file=p)
print("world",file=p)
p.close()
p=open(fileName,"a") #open for writing, appending to the end of the file if it exists
print("hello world again",file=p)
p.close()