Predict the age of abalone from physical measurements

The goal is to predict the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem.

Examples with missing values were removed from the original data (the majority having the predicted value missing), and the ranges of the continuous values were scaled for use with an ANN (artificial neural network) by dividing by 200.
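Because the continuous values were divided by 200, multiplying by 200 should recover the original units. The following is a minimal sketch of that step, assuming the column names used in the import cell of this notebook; the helper `restore_units` and the sample row are hypothetical and the values are made up for illustration.

```python
import pandas as pd

SCALE = 200  # the dataset description states continuous values were divided by 200

# Continuous columns, named as in the import cell of this notebook
CONTINUOUS_COLUMNS = ['Length', 'Diameter', 'Height', 'Whole_Weight',
                      'Shucked_Weight', 'Viscera_Weight', 'Shell_Weight']

def restore_units(df, columns=CONTINUOUS_COLUMNS, scale=SCALE):
    """Return a copy of df with the scaled columns multiplied back to original units."""
    restored = df.copy()
    restored[columns] = restored[columns] * scale
    return restored

# Illustrative row (made-up values, not taken from the dataset)
sample = pd.DataFrame([{'Sex': 'M', 'Length': 0.5, 'Diameter': 0.4, 'Height': 0.1,
                        'Whole_Weight': 0.5, 'Shucked_Weight': 0.25,
                        'Viscera_Weight': 0.1, 'Shell_Weight': 0.15, 'Rings': 10}])
restored_sample = restore_units(sample)
print(restored_sample)
```

The `Sex` and `Rings` columns are left untouched, since only the continuous measurements were scaled.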

- Dataset contains accurate observations
- No missing values in the dataset
- Data format is already prepared for importing and analysis

- Iris Plant Class Prediction Model

To practice building a prediction model using machine learning techniques.

- Gain a deeper understanding of machine learning techniques.
- Obtain more experience with Python and data science projects.
- Be able to adapt the model to other projects.

The model could be used for other problems that require prediction analysis.

The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope.

Predict age based on size of abalone.

Write a program that estimates the age based on the overall size of the abalone. It would return an age range based on the size.
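One simple way to return an age range from a size is to group the observations into size bins and look up the youngest and oldest abalone seen in each bin. The sketch below illustrates that idea only; it is not the model built in this notebook. The function names, the choice of `Length` as the size measure, the number of bins, and the toy data are all assumptions.

```python
import pandas as pd

def build_age_table(df, size_col='Length', n_bins=3):
    """Summarise the observed age range (min, max) within each size bin."""
    ages = df['Rings'] + 1.5  # age in years: rings + 1.5, per the dataset description
    bins = pd.qcut(df[size_col], q=n_bins, duplicates='drop')
    return ages.groupby(bins, observed=True).agg(['min', 'max'])

def estimate_age_range(size, age_table):
    """Return (min_age, max_age) for the bin containing size, or None if no bin matches."""
    for interval, row in age_table.iterrows():
        if size in interval:
            return (row['min'], row['max'])
    return None

# Toy data (made-up values, not taken from the dataset)
toy = pd.DataFrame({'Length': [0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
                    'Rings': [4, 6, 8, 9, 11, 14]})
age_table = build_age_table(toy)
estimate = estimate_age_range(0.45, age_table)
print(age_table)
print(estimate)
```

A lookup like this cannot extrapolate: sizes outside the observed range return `None`, which is one reason a fitted model may be preferable.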

- Number of instances: **4177**
- Number of attributes: **8**
- Target variable: **Age**

Each attribute is listed with its name, data type, measurement unit, and a brief description. The number of rings is the value to predict, either as a continuous value or as a classification problem.

Name | Data Type | Unit | Description |
---|---|---|---|
Sex | nominal | | M, F, and I (infant) |
Length | continuous | mm | Longest shell measurement |
Diameter | continuous | mm | Perpendicular to length |
Height | continuous | mm | With meat in shell |
Whole weight | continuous | grams | Whole abalone |
Shucked weight | continuous | grams | Weight of meat |
Viscera weight | continuous | grams | Gut weight (after bleeding) |
Shell weight | continuous | grams | After being dried |
Rings | integer | | +1.5 gives the age in years |
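The two ways of framing the target can be sketched as follows. The regression target applies the "+1.5" rule from the table; the classification target groups ring counts into classes. The cut points and the labels `young`, `adult`, and `old` are assumptions chosen for illustration, and the ring counts are made up.

```python
import pandas as pd

# Illustrative ring counts (made-up values, not taken from the dataset)
rings = pd.Series([4, 9, 10, 15], name='Rings')

# Regression target: age in years, using the "+1.5" rule from the attribute table
age_years = rings + 1.5

# Classification target: a hypothetical three-way grouping of ring counts.
# Intervals are right-closed: (0, 8], (8, 10], (10, inf)
age_group = pd.cut(rings, bins=[0, 8, 10, float('inf')],
                   labels=['young', 'adult', 'old'])

print(pd.DataFrame({'Rings': rings, 'Age': age_years, 'Group': age_group}))
```

Treating the target as continuous keeps the ordering of ages, while grouping trades precision for a simpler problem with fewer, better-populated classes.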

In [1]:

```
# Import libraries
import pandas as pd
import numpy as np
```

In [2]:

```
# Import Data
data_location = "/Users/wmemorgan/Google Drive/Computer_Data_Science_Lab/abalone/data/02_prepared_data/abalone.data"
column_names = ['Sex','Length','Diameter','Height','Whole_Weight',
'Shucked_Weight','Viscera_Weight','Shell_Weight','Rings']
data = pd.read_csv(data_location, names=column_names)
```

In [3]:

```
#Verify number of observations
len(data)
```

In [4]:

```
# Shape
print(data.shape)
```

In [5]:

```
# Preview the first five rows
data.head()
```

In [6]:

```
# Column data types and non-null counts
data.info()
```

In [7]:

```
# Summary statistics for the numeric columns
data.describe()
```

In [8]:

```
# Class distribution by Sex
print(data.groupby('Sex').size())
```

In [9]:

```
# Class distribution by Rings
print(data.groupby('Rings').size())
```

In [10]:

```
# Import plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
```

In [11]:

```
# box and whisker plots
data.plot(kind='box', subplots=True, layout=(2,4), figsize=(12,11), sharex=False, sharey=False)
plt.show()
```

In [12]:

```
# histograms
data.hist(figsize=(12,8))
plt.show()
```

In [13]:

```
# pairwise relationships between numeric columns, colored by ring count
sns.pairplot(data=data, hue="Rings")
plt.show()
```

In [33]:

```
# scatter plot matrix
from pandas.plotting import scatter_matrix
scatter_matrix(data, figsize=(12,8))
plt.show()
```