#!/usr/bin/env python
# coding: utf-8

# # One-dimensional data structure: Series

# In[1]:


import numpy as np
import pandas as pd


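# A `Series` is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, and so on).
# 
# Its axis labels are collectively called the `index`. The basic constructor call is
# 
#     s = pd.Series(data, index=index)
#     
# where `data` can be any of the following:
# 
# - a dict
# - an `ndarray`
# - a scalar, for example `5`
# 
# and `index` is the list of labels for the single axis.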

# ## Constructing from an ndarray

# If `data` is an `ndarray`, `index` must have the same length as `data`:

# In[2]:


s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

s


# Inspect the `index`:

# In[3]:


s.index


# If `index` is omitted, the default is `[0, ..., len(data) - 1]`:

# In[4]:


pd.Series(np.random.randn(5))
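

# As a minimal sketch of the default index (the name `default_demo` and its values are made up for illustration), the labels produced when `index` is omitted coincide with the positions, and pandas stores them as a `RangeIndex`:

# In[ ]:


import pandas as pd

# Hypothetical example data; no index is passed.
default_demo = pd.Series([10.0, 20.0, 30.0])

# The default labels run from 0 to len(data) - 1.
default_labels = list(default_demo.index)
default_labels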


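# ## Constructing from a dict

# If `data` is a `dict` and no `index` is given, the index is built from the dict's keys. On recent versions of pandas (0.23 and later, with Python 3.6+) the keys keep their insertion order; older versions sorted them: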

# In[5]:


d = {'a': 0., 'b': 1., 'c': 2.}

pd.Series(d)
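

# The ordering of the index is easiest to see with keys that are deliberately out of order. The sketch below assumes a recent pandas (0.23 or later on Python 3.6+), where the index follows the dict's insertion order; `unsorted_dict`, `order_demo` and `sorted_demo` are illustrative names. If sorted labels are wanted, `sort_index` gives them explicitly:

# In[ ]:


import pandas as pd

# Hypothetical dict whose keys are not in alphabetical order.
unsorted_dict = {"c": 2.0, "a": 0.0, "b": 1.0}

# On recent pandas the index keeps the insertion order of the keys.
order_demo = pd.Series(unsorted_dict)

# Sort by label explicitly when a sorted index is needed.
sorted_demo = order_demo.sort_index()
sorted_demo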


# If an `index` is given, its entries are used as keys to look up the corresponding values in the dict. A key that is missing from the dict yields `NaN` (not a number, the default missing-value marker in pandas):

# In[6]:


pd.Series(d, index=['b', 'd', 'a'])
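

# To find out which requested labels had no entry in the dict, one option is to test for missing values with `isna`. This is only a sketch; `lookup`, `reindexed` and `missing_labels` are illustrative names:

# In[ ]:


import pandas as pd

# Hypothetical dict; the label "d" is requested but not present.
lookup = {"a": 0.0, "b": 1.0, "c": 2.0}
reindexed = pd.Series(lookup, index=["b", "d", "a"])

# isna() marks the entries that were filled with NaN.
missing_labels = reindexed[reindexed.isna()].index.tolist()
missing_labels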


# ## Constructing from a scalar

# If `data` is a scalar, an `index` should be specified; the value is repeated so that the resulting `Series` has the same length as `index`:

# In[7]:


pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])


# ## Using a Series like an ndarray

# In[8]:


s


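# Indexing by integer position is supported. Because this index holds string labels, `.iloc` is used here: recent pandas versions deprecate the positional fallback of plain `s[0]`:

# In[9]:


s.iloc[0]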


# Slicing:

# In[10]:


s[:3]
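

# Slices can be taken by position or by label, and the two differ at the end point. As a small sketch with made-up data (`slice_demo` is an illustrative name), a positional slice excludes its stop position, whereas a label slice through `.loc` includes both end labels:

# In[ ]:


import pandas as pd

slice_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

# Positional slice: positions 0, 1, 2 (the stop position 3 is excluded).
first_three = slice_demo.iloc[:3]

# Label slice: everything up to and including the label "c".
through_c = slice_demo.loc[:"c"]

first_three.equals(through_c)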


# Boolean mask indexing:

# In[11]:


s[s > s.median()]


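# Fancy indexing with a list of integer positions, using `.iloc` because the index labels are strings:

# In[12]:


s.iloc[[4, 3, 1]]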
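

# The same selection can be written by position or by label. The sketch below uses made-up data (`pos_demo` is an illustrative name) to show that `.iloc` with positions and `.loc` with the matching labels return the same elements in the requested order:

# In[ ]:


import pandas as pd

pos_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

# Select by integer position.
by_position = pos_demo.iloc[[4, 3, 1]]

# Select the same elements by their labels.
by_label = pos_demo.loc[["e", "d", "b"]]

by_position.equals(by_label)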


# NumPy functions are supported:

# In[13]:


np.exp(s)
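

# One detail worth checking is what a NumPy function returns. In this sketch (`ufunc_demo` and `exp_demo` are illustrative names), applying `np.exp` to a `Series` gives back a `Series` with the original labels, and `to_numpy` is available when a plain `ndarray` is needed:

# In[ ]:


import numpy as np
import pandas as pd

ufunc_demo = pd.Series([0.0, 1.0], index=["x", "y"])

# The ufunc is applied element-wise and the index is preserved.
exp_demo = np.exp(ufunc_demo)

# Drop the labels and get the underlying values as an ndarray.
exp_values = exp_demo.to_numpy()
exp_demo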


# ## Using a Series like a dict

# A `Series` can also be used like a dict, with the index labels acting as keys:

# In[14]:


s["a"]


# Modify a value:

# In[15]:


s["e"] = 12.

s


# Test whether a key is present:

# In[16]:


"e" in s


# In[17]:


"f" in s


# When indexing by key without knowing whether the key exists, use the `get` method. It returns `None`, or a default value you specify, if the key is missing:

# In[18]:


s.get("f", np.nan)
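

# The difference between `get` and bracket indexing can be sketched as follows (`get_demo` and the other names are illustrative): bracket indexing with a missing label raises `KeyError`, while `get` falls back to `None` or to the supplied default:

# In[ ]:


import pandas as pd

get_demo = pd.Series([1.0, 2.0], index=["a", "b"])

# No default given: a missing key yields None.
missing_default = get_demo.get("f")

# Explicit default for a missing key.
missing_custom = get_demo.get("f", -1.0)

# Bracket indexing raises KeyError for the same key.
try:
    get_demo["f"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

key_error_raised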


# ## Vectorized operations

# Simple vectorized operations behave the same way as on an `ndarray`:

# In[19]:


s + s


# In[20]:


s * 2


# Where `Series` differs from `ndarray` is that operations align on `index` labels by default, not on relative position:

# In[21]:


s[1:] + s[:-1]


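# For two `Series` that do not align completely, as above, the index of the result is the union of the two indexes, and the labels that cannot be matched are treated as missing values.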
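
# A small sketch with made-up data makes the alignment visible (`align_demo`, `left` and `right` are illustrative names). Labels present on only one side give `NaN` with the `+` operator; if treating the absent side as zero is acceptable, the `add` method with `fill_value` is one way to avoid the missing values:

# In[ ]:


import pandas as pd

align_demo = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
left = align_demo.iloc[1:]    # labels "b", "c"
right = align_demo.iloc[:-1]  # labels "a", "b"

# Only "b" exists on both sides, so "a" and "c" become NaN.
aligned_sum = left + right

# Treat a label missing on one side as 0.0 before adding.
filled_sum = left.add(right, fill_value=0.0)
filled_sum
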

# ## The name attribute

# The `name` attribute can be set at construction time:

# In[22]:


s = pd.Series(np.random.randn(5), name='something')
s.name
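

# The name can also be changed afterwards. As a brief sketch (`named` and `renamed` are illustrative names), calling `rename` with a scalar returns a new `Series` carrying the new name and leaves the original untouched:

# In[ ]:


import pandas as pd

named = pd.Series([1.0, 2.0], name="something")

# rename with a scalar sets the name on a new Series.
renamed = named.rename("different")
renamed.name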