Lecture 0: Introduction to the Course Logistics
Welcome to the Cornell Data Science Training program!
Through these lectures and exercises, we aim to teach you basic data science concepts and how to apply them using Python, a popular language for data science. The topics we will cover include, but are not limited to:
We'll start with the basics of the language and work our way up to more advanced concepts and applications. By the end of the course, you will have the foundation and basic skills to contribute to any Cornell Data Science subteam, and to build a career in this rapidly growing field.
Recap: Why Data Science? And why Python?
Before we begin to explore NumPy and all of its applications, we need to remember why data science is so important today and why so many data scientists choose Python. Data science can be thought of as the basis for empirical research, since data is used to inform our hypotheses and provide observations. In many cases, businesses or scientists use this data to inform their understanding of a phenomenon. We use a combination of exploratory data analysis and modeling to draw conclusions from large troves of data. Data science allows us to:
Our recent ability to collect data in real time from many sources, including websites, smartphones, and environmental sensors, makes data science incredibly relevant to almost every industry, scientific discipline, and engineering endeavor today.
So why is Python a good programming language for data science? Well, it is:
Thanks to the efforts of its large open-source community, Python offers an ever-growing set of libraries for data management, analytical processing, and visualization, such as NumPy, pandas, and scikit-learn. These libraries make Python applicable to nearly every aspect of data science. Lastly, but very importantly, Jupyter Notebooks make Python-based analysis more reproducible. Because a notebook keeps code, its output, and explanatory text together in one document, it is also a convenient tool for teaching, communicating results, and collaborating as a team.
Getting Started with Jupyter Notebooks
Jupyter Notebooks have helped Python rapidly gain broad acceptance within the data science community. Here are some of the key features of Jupyter Notebooks:
So how do we set up the Jupyter Notebook on our computer? Here are the instructions you need:
Instructions for Installing Jupyter Notebook
If you already have Python 3 installed, you can install Jupyter from your terminal with:
pip3 install jupyter
where pip3 is the package installer (pip) for Python 3.
However, if you are new to Python, we highly recommend downloading Anaconda.
Anaconda is a Python distribution that provides everything you need for data science in Python, including the Python language itself, many commonly used libraries, and a package manager (conda). If you don't really understand what we are talking about here, no worries! The main point is that if you are new to Python, downloading the Anaconda distribution will make your life easier.
Follow the Anaconda setup instructions and make sure Jupyter Notebook is installed (the full Anaconda distribution includes it). If you have any questions, please come to office hours and get help from the course staff. Make sure that Jupyter runs on your computer before the project begins!
Once you finish setting it up, type the following command in your terminal (or Command Prompt on Windows):
jupyter notebook
This launches the Jupyter Notebook interface in your computer's default web browser. To check that everything works, download this notebook to your computer and open it in Jupyter Notebook!
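As a quick sanity check, you could run a cell like the one below in your first notebook. This is a minimal sketch, not part of the official course materials: it prints which Python interpreter the notebook is using and reports whether the libraries mentioned above (NumPy, pandas, scikit-learn) can be imported. The helper name check_packages is hypothetical, chosen only for illustration.

```python
import importlib.util
import sys

# Import names of the libraries mentioned in this lecture.
# Note: scikit-learn is installed as "scikit-learn" but imported as "sklearn".
LIBRARIES = ["numpy", "pandas", "sklearn"]


def check_packages(names):
    """Hypothetical helper: map each module name to True if Python can find it.

    importlib.util.find_spec returns None for a top-level module that is not
    installed, so nothing is actually imported here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Which interpreter is this notebook running? (Useful if you have both
    # a pip-installed Python and Anaconda on the same machine.)
    print("Python executable:", sys.executable)
    print("Python version:   ", sys.version.split()[0])
    for name, found in check_packages(LIBRARIES).items():
        print(f"{name:10s} {'installed' if found else 'MISSING'}")
```

If a library shows up as MISSING, install it with pip3 (using the package name, e.g. scikit-learn rather than sklearn) or, under Anaconda, with conda, and then restart the notebook kernel.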