** Histogram Plots **
A histogram is a graphical representation of the frequency distribution of numerical data. It is drawn as rectangles of equal width along the horizontal axis, each with a height equal to the frequency of the values falling in that interval.
To construct a histogram, we first divide the range of possible x values into adjacent intervals, or bins, which are usually of equal size.
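The binning step described above can be sketched with NumPy alone. This is a minimal illustration on a small hand-made sample; the names `data`, `edges` and `counts` are chosen here for the example and do not appear elsewhere in the notebook.

```python
import numpy as np

# Small hand-made sample so the counts are easy to check by eye.
data = np.array([1, 2, 2, 3, 5, 6, 6, 6, 9, 10])

# Split the range [0, 10] into 5 adjacent, equal-width bins.
edges = np.linspace(0, 10, 6)  # 0, 2, 4, 6, 8, 10

# Count how many values fall into each bin.
# Every bin is half-open [left, right) except the last, which includes its right edge.
counts, _ = np.histogram(data, bins=edges)

for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {c}")
```

Each count is the height of one rectangle, and the counts add up to the number of data points.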
# import
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# generating some data points
# random_integers is deprecated; randint excludes the upper bound, hence 51
X = np.random.randint(20, 51, 1000)
Y = np.random.randint(20, 51, 1000)
** Plotting Histogram **
plt.hist(X)
plt.xlabel("Value of X")
plt.ylabel("Freq")
gaussian_numbers = np.random.normal(size=10000)
plt.hist(gaussian_numbers)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
n, bins, patches = plt.hist(gaussian_numbers)
print("n: ",n, np.sum(n)) # freq
print("bins: ", bins)
print("patches: ", patches)
for p in patches:
    print(p)
n: [ 8. 75. 460. 1367. 2628. 2800. 1834. 638. 176. 14.] 10000.0
bins: [-3.84331083 -3.09940042 -2.35549002 -1.61157962 -0.86766922 -0.12375882 0.62015159 1.36406199 2.10797239 2.85188279 3.59579319]
patches: <a list of 10 Patch objects>
Rectangle(-3.84331,0;0.74391x8)
Rectangle(-3.0994,0;0.74391x75)
Rectangle(-2.35549,0;0.74391x460)
Rectangle(-1.61158,0;0.74391x1367)
Rectangle(-0.867669,0;0.74391x2628)
Rectangle(-0.123759,0;0.74391x2800)
Rectangle(0.620152,0;0.74391x1834)
Rectangle(1.36406,0;0.74391x638)
Rectangle(2.10797,0;0.74391x176)
Rectangle(2.85188,0;0.74391x14)
By default, hist uses 10 equal-width bins to plot the data. We can change this number with the bins parameter, for example bins=n.
n, bins, patches = plt.hist(gaussian_numbers, bins=100)
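To see what the bins parameter changes without drawing anything, the sketch below uses np.histogram, which plt.hist relies on for the counting. The seeded generator `rng` and the variable names are assumptions made for this example only.

```python
import numpy as np

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
sample = rng.normal(size=1000)

# Default binning: 10 equal-width bins spanning min(sample) to max(sample).
counts10, edges10 = np.histogram(sample)

# Requesting 100 bins gives 100 counts and 101 bin edges.
counts100, edges100 = np.histogram(sample, bins=100)

print(len(counts10), len(edges10))
print(len(counts100), len(edges100))
```

More bins show finer detail but make each count noisier, since the same 1000 points are spread over more intervals.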
Another important keyword parameter of hist is "density" (named "normed" in older Matplotlib releases; that name has since been removed). It is optional and defaults to False. If it is set to True, the first element of the returned tuple holds the counts normalized to form a probability density,
i.e., "n/(len(x)*dbin)", where dbin is the bin width, so that the integral of the histogram is 1.
# 'density' replaces the 'normed' keyword of older Matplotlib releases
n, bins, patches = plt.hist(gaussian_numbers, bins=100, density=True)
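The normalization formula can be checked numerically. The sketch below recomputes the density by hand as counts / (len(x) * bin width) and compares it with what np.histogram returns for density=True. The seeded generator and the variable names are assumptions of this example.

```python
import numpy as np

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
x = rng.normal(size=10000)

# Raw counts and bin edges for 100 equal-width bins.
counts, edges = np.histogram(x, bins=100)
widths = np.diff(edges)

# Manual normalization: n / (len(x) * dbin).
density_manual = counts / (len(x) * widths)

# The same quantity as computed by NumPy.
density_numpy, _ = np.histogram(x, bins=100, density=True)

# Area under the histogram: sum of height * width over all bins.
area = np.sum(density_manual * widths)

print(np.allclose(density_manual, density_numpy))
print(area)
```

Note that the bar heights themselves do not sum to 1; it is the total area (height times width) that does.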
If both 'density' ('normed' in older Matplotlib releases) and 'stacked' are set to True, the sum of the histograms is normalized to 1. With a single dataset, as here, 'stacked' makes no visible difference.
plt.hist(gaussian_numbers,
         bins=100,
         density=True,  # 'normed' in older Matplotlib releases
         stacked=True,
         edgecolor="#6A9662",
         color="#DDFFDD")
plt.show()
We can also plot the data as a cumulative distribution function by setting the parameter 'cumulative' to True.
plt.hist(gaussian_numbers,
         bins=100,
         density=True,  # 'normed' in older Matplotlib releases
         stacked=True,
         cumulative=True)
plt.show()
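The cumulative histogram can be reproduced numerically as a running sum of the area of each density bar. This is a minimal sketch using NumPy only; the seeded generator and the name `cdf` are assumptions of this example.

```python
import numpy as np

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
x = rng.normal(size=10000)

# Density-normalized histogram with 100 bins.
density, edges = np.histogram(x, bins=100, density=True)

# Running sum of bar areas: an empirical cumulative distribution function
# evaluated at the right edge of each bin.
cdf = np.cumsum(density * np.diff(edges))

print(cdf[0], cdf[-1])
```

The values never decrease from one bin to the next, and the last one is 1 up to floating-point rounding, because every data point lies at or below the right edge of the final bin.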
** Bar Plots **
bars = plt.bar([1,2,3,4], [1,4,9,16])
bars[0].set_color('green')
plt.show()
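f = plt.figure()
ax = f.add_subplot(1, 1, 1)
ax.bar([1, 2, 3, 4], [1, 4, 9, 16])
# get_children() returns every artist on the axes, not only the bars;
# which artist sits at a given index can vary between Matplotlib versions
children = ax.get_children()
children[3].set_color('g')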
years = ('2010', '2011', '2012', '2013', '2014')
visitors = (1241, 50927, 162242, 222093, 296665 / 8 * 12)
index = np.arange(len(visitors))
bar_width = 1.0
plt.bar(index, visitors, bar_width, color="green")
plt.xticks(index, years) # bars are centered on index by default (Matplotlib 2.0+), so labels get centered
plt.show()