Created by Nathan Kelber and Ted Lawless for JSTOR Labs under Creative Commons CC BY License
For questions/comments/improvements, email [email protected]

To start this interactive lesson, click on the rocket ship in the top navigation and then select "Binder" to launch the Jupyter Notebook.

The rocket and binder logos

When you are in the interactive environment, you will see a Jupyter logo in the upper left-hand corner.

The Jupyter logo


Python Basics 1

Description: This lesson describes operators, expressions, data types, variables, and basic functions. Complete this lesson if you are familiar with Jupyter notebooks or have completed Getting Started with Jupyter Notebooks, but do not have any experience with Python programming. This is part 1 of 3 in the series Python Basics that will prepare you to do text analysis using the Python programming language.

Use Case: For Learners (Detailed explanation, not ideal for researchers)

Difficulty: Beginner

Completion Time: 75 minutes

Knowledge Required:

  • Getting Started with Jupyter Notebooks

Knowledge Recommended: None

Data Format: None

Libraries Used: None

Research Pipeline: None


Introduction

Python is one of the fastest-growing languages in computer programming. Learning Python is a great choice because Python is:

  • Widely adopted in the digital humanities and data science
  • Regarded as an easy-to-learn language
  • Flexible, having wide support for working with numerical and textual data
  • A skill desired by employers in academic, non-profit, and private sectors

The second most-popular language for digital humanities and data science work is R. We plan to create additional support for learning R soon. If you are interested in helping develop open educational resources for R, please reach out to Nathan Kelber ([email protected]).

The skills you'll learn in Python Basics 1-3 are general-purpose Python skills, applicable for any of the text analysis notebooks that you may explore later. They are also widely applicable to many other kinds of tasks in Python beyond text analysis.

Making Mistakes is Important

Every programmer at every skill level gets errors in their code. Making mistakes is how we all learn to program. Programming is a little like solving a puzzle where the goal is to get the desired outcome through a series of attempts. You won't solve the puzzle if you're afraid to test if the pieces match. An error message will not break your computer. Remember, you can always reload a notebook if it stops working properly or you misplace an important piece of code. Under the edit menu, there is an option to undo changes. (Alternatively, you can use command z on Mac and control z on Windows.) To learn any skill, you need to be willing to play and experiment. Programming is no different.

Expressions and Operators

The simplest form of Python programming is an expression using an operator. An expression is a simple mathematical statement like:

1 + 1

The operator in this case is +, sometimes called "plus" or "addition". Try this operation in the code box below. Remember to click the "Run" button or press Ctrl + Enter (Windows) or Shift + Return (macOS) on your keyboard to run the code.

In [ ]:
# Type the expression in this code block. Then run it.

Python can handle a large variety of expressions. Let's try subtraction in the next code cell.

In [ ]:
# Type an expression that uses subtraction in this cell. Then run it.

We can also do multiplication (*) and division (/). While you may have used an "×" to represent multiplication in grade school, Python uses an asterisk (*). In Python,

2 × 2

is written as

2 * 2

Try a multiplication and a division in the next code cell.

In [ ]:
# Try a multiplication in this cell. Then try a division.
# What happens if you combine them? What if you combine them with addition and/or subtraction?

When you run, or evaluate, an expression in Python, the order of operations is followed. (In grade school, you may remember learning the shorthand "PEMDAS".) This means that expressions are evaluated in this order:

  1. Parentheses
  2. Exponents
  3. Multiplication and Division (from left to right)
  4. Addition and Subtraction (from left to right)
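
As a small illustration of the order above, the sketch below evaluates a few expressions that mix operators. It uses the print() function, which is covered later in this lesson, only to display each result; the specific numbers are arbitrary examples.

```python
# Multiplication is evaluated before addition
print(2 + 3 * 4)      # 14

# Parentheses are evaluated first, which changes the result
print((2 + 3) * 4)    # 20

# Exponents are evaluated before multiplication
print(2 * 3 ** 2)     # 18

# Addition and subtraction are evaluated from left to right
print(10 - 4 + 2)     # 8
```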

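Python can evaluate parentheses and exponents, as well as a number of additional operators you may not have learned in grade school. Here are the main operators that you might use, presented in the order they are evaluated. Modulus, division, and multiplication share the same priority and are evaluated from left to right, as are subtraction and addition. Note that division always produces a number with a decimal point, so 30 / 6 evaluates to 5.0 rather than 5.

Operator | Operation         | Example | Evaluation
**       | Exponent/Power    | 3 ** 3  | 27
%        | Modulus/Remainder | 34 % 6  | 4
/        | Division          | 30 / 6  | 5.0
*        | Multiplication    | 7 * 8   | 56
-        | Subtraction       | 18 - 4  | 14
+        | Addition          | 4 + 3   | 7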
In [ ]:
# Try operations in this code cell.
# What happens when you add in parentheses?

Data Types (Integers, Floats, and Strings)

All expressions evaluate to a single value. In the above examples, our expressions evaluated to a single numerical value. Numerical values come in two basic forms: integers and floats.

An integer, what we sometimes call a "whole number", is a number without a decimal point that can be positive or negative. When a value uses a decimal, it is called a float or floating-point number. Two numbers that are mathematically equivalent could be in two different data types. For example, mathematically 5 is equal to 5.0, yet the former is an integer while the latter is a float.

Of course, Python can also help us manipulate text. A snippet of text in Python is called a string. A string can be written with single or double quotes. A string can use letters, spaces, line breaks, and numbers. So 5 is an integer, 5.0 is a float, but '5' and '5.0' are strings. A string can also be blank, such as ''.

Familiar Name | Programming Name | Examples
Whole number  | integer          | -3, 0, 2, 534
Decimal       | float            | 6.3, -19.23, 5.0, 0.01
Text          | string           | 'Hello world', '1700 butterflies', '', '1823'
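
If you are ever unsure which data type a value has, one way to check is Python's built-in type() function. This lesson does not otherwise cover type(), so treat the sketch below as an optional aside; print() is explained later in the lesson.

```python
# Ask Python for the data type of a few values
print(type(5))      # <class 'int'>   (integer)
print(type(5.0))    # <class 'float'> (float)
print(type('5'))    # <class 'str'>   (string)
print(type(''))     # <class 'str'>   (a blank string is still a string)
```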

The distinction between each of these data types may seem unimportant, but Python treats each one differently. For example, an integer can be equal to a float, but a string is never equal to an integer or a float, even when they look alike.

To evaluate whether two values are equal, we can use two equals signs between them. The expression will evaluate to either True or False.

In [ ]:
# Run this code cell to determine whether the values are equal
42 == 42.0
In [ ]:
# Run this code cell to compare an integer with a string
15 == 'fifteen'
In [ ]:
# Run this code cell to compare an integer with a string
15 == '15'

When we use the addition operator on integers or floats, they are added to create a sum. When we use the addition operator on strings, they are combined into a single, longer string. This is called concatenation.

In [ ]:
# Combine the strings 'Hello' and 'World'

Notice that the strings are combined exactly as they are written. There is no space between the strings. If we want to include a space, we need to add the space to the end of 'Hello' or the beginning of 'World'. We can also concatenate multiple strings.
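
To make the role of the space concrete, here is a short sketch using two arbitrary example words. It shows the result without a space, with a space added to the first string, and with the space supplied as its own string.

```python
# No space: the strings are joined exactly as written
print('text' + 'analysis')         # textanalysis

# A space at the end of the first string
print('text ' + 'analysis')        # text analysis

# A space supplied as a separate string
print('text' + ' ' + 'analysis')   # text analysis
```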

In [ ]:
# Combine three strings

When we use the addition operator, the values must be all numbers or all strings. Mixing numbers and strings will cause an error.

In [ ]:
# Try adding a string to an integer
'55' + 23

Here, we receive the error TypeError: can only concatenate str (not "int") to str. Python assumes we would like to join two strings together, but it does not know how to join a string to an integer. Put another way, Python is unsure if we want:

'55' + 23

to become

'5523'

or

78
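
We can remove the ambiguity by telling Python which result we want. The sketch below uses the str() and int() functions, which are introduced in the Functions section later in this lesson, to get each of the two outcomes.

```python
# Treat both values as strings: the result is a longer string
print('55' + str(23))   # 5523

# Treat both values as integers: the result is a sum
print(int('55') + 23)   # 78
```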

We can multiply a string by an integer. The result is simply the string repeated the appropriate number of times.
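
For instance, with a couple of arbitrary example strings, repetition might look like the sketch below. The order of the string and the integer does not matter.

```python
# Repeat a string three times
print('ha' * 3)     # hahaha

# The integer can also come first
print(3 * 'ha')     # hahaha

# Repetition is handy for drawing a simple divider line
print('-' * 10)     # ----------
```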

In [ ]:
# Multiply a string by an integer

Variables

A variable is like a container that stores information. There are many kinds of information that can be stored in a variable, including the data types we have already discussed (integers, floats, and strings). We create (or initialize) a variable with an assignment statement. The assignment statement gives the variable an initial value.

In [ ]:
# Initialize an integer variable and add 22 
new_integer_variable = 5
new_integer_variable + 22

The value of a variable can be overwritten with a new value.

In [ ]:
# Overwrite the value of my_favorite_number when the commented out line of code is executed. 
# Remove the # in the line "#my_favorite_number = 9" to turn the line into executable code.

my_favorite_number = 7
#my_favorite_number = 9
my_favorite_number
In [ ]:
# Overwriting the value of a variable using its original value
cats_in_house = 1
cats_in_house = cats_in_house + 2
cats_in_house
In [ ]:
# Initialize a string variable and concatenate another string
new_string_variable = 'Hello '
new_string_variable + 'World!'

You can create a variable with almost any name, but there are a few guidelines that are recommended.

Variable Names Should be Descriptive

If we create a variable that stores the day of the month, it is helpful to give it a name that makes the value stored inside it clear, such as day_of_month. From a logical perspective, we could call the variable almost anything (hotdog, rabbit, flat_tire). As long as we are consistent, the code will execute the same. When it comes time to read, modify, and understand the code, however, it will be confusing to you and others. Consider this simple program that lets us change the days variable to compute the number of seconds in that many days.

In [ ]:
# Compute the number of seconds in 3 days
days = 3
hours_in_day = 24
minutes_in_hour = 60
seconds_in_minute = 60

days * hours_in_day * minutes_in_hour * seconds_in_minute

We could write a program that is logically the same, but uses confusing variable names.

In [ ]:
hotdogs = 60
sasquatch = 24
example = 3
answer = 60

answer * sasquatch * example * hotdogs

This code gives us the same answer as the first example, but it is confusing. Not only does this code use variable names that are confusing, it also does not include any comments to explain what the code does. It is not clear that we would change example to set a different number of days. It is not even clear what the purpose of the code is. As code gets longer and more complex, having clear variable names and explanatory comments is very important.

Variable Naming Rules

In addition to being descriptive, variable names must follow 3 basic rules:

  1. Must be one word (no spaces allowed)
  2. Only letters, numbers and the underscore character (_)
  3. Cannot begin with a number
In [ ]:
# Which of these variable names are acceptable? 
# Comment out the variables that are not allowed in Python and run this cell to check if the variable assignment works. 
# If you get an error, the variable name is not allowed in Python.

$variable = 1
a variable = 2
a_variable = 3
4variable = 4
variable5 = 5
variable-6 = 6
variAble = 7
Avariable = 8
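
After trying the exercise above, one way to double-check your answers without triggering errors is the string method isidentifier(), which reports whether a piece of text follows Python's naming rules. This method is not otherwise covered in this lesson, so the sketch below is an optional aside; each candidate name is written as a string.

```python
# True means the text follows the naming rules; False means it does not
print('$variable'.isidentifier())    # False (contains $)
print('a variable'.isidentifier())   # False (contains a space)
print('a_variable'.isidentifier())   # True
print('4variable'.isidentifier())    # False (begins with a number)
print('variable5'.isidentifier())    # True
print('variable-6'.isidentifier())   # False (contains a hyphen)
print('variAble'.isidentifier())     # True
print('Avariable'.isidentifier())    # True
```

One caveat: isidentifier() also returns True for words that Python reserves for its own use, such as for, and those cannot be used as variable names.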

Variable Naming Style Guidelines

The three rules above describe absolute rules of Python variable naming. If you break those rules, your code will create an error and fail to execute properly. There are also style guidelines that, while they won't break your code, are generally advised for making your code readable and understandable. These style guidelines are written in PEP 8, the style guide among the Python Enhancement Proposals (PEPs).

The current version of the style guide advises that variable names should be written:

lowercase, with words separated by underscores as necessary to improve readability.

If you have written code before, you may be familiar with other styles, but these notebooks will attempt to follow the PEP guidelines for style. Ultimately, the most important thing is that your variable names are consistent so that someone who reads your code can follow what it is doing. As your code becomes more complicated, writing detailed comments with # will also become more important.

Functions

Many different kinds of programs often need to do very similar operations. Instead of writing the same code over again, you can use a function. Essentially, a function is a small snippet of code that can be quickly referenced. There are three kinds of functions: native functions built into Python, functions others have written that you can import, and functions you write yourself.

We'll address functions you write yourself in Python Basics 2. For now, let's look at a few of the native functions. One of the most common functions used in Python is the print() function, which simply prints a string.

In [ ]:
# A print function that prints: Hello World!
print('Hello World!')

We could also define a variable with our string 'Hello World!' and then pass that variable into the print() function. It is common for functions to take an input, called an argument, that is placed inside the parentheses ().

In [ ]:
# Define a string and then print it
our_string = 'Hello World!'
print(our_string)

There is also an input() function for taking user input.

In [ ]:
# A program to greet the user by name
print('Hi. What is your name?') # Ask the user for their name
user_name = input() # Take the user's input and put it into the variable user_name
print('Pleased to meet you, ' + user_name) # Print a greeting with the user's name

We defined a string variable user_name to hold the user's input. We then called the print() function to print the concatenation of 'Pleased to meet you, ' and the user's input that was captured in the variable user_name. Remember that we can use a + to concatenate, meaning join these strings together.

Here are a couple more tricks we can use. You can pass a string into the input() function to serve as a prompt, and you can use an f-string to place a variable inside the printed string without using the + operator to concatenate the strings.

In [ ]:
# A program to greet the user by name
user_name = input('Hi. What is your name? ')
print(f'Pleased to meet you, {user_name}')

We can concatenate many strings together, but we cannot concatenate strings with integers or floats.

In [ ]:
# Concatenating many strings within a print function
print('Hello, ' + 'all ' + 'these ' + 'strings ' + 'are ' + 'being ' + 'connected ' + 'together.')
In [ ]:
# Trying to concatenate a string with an integer causes an error
print('There are ' + 7 + 'continents.')

We can transform a value from one data type into another with the str(), int(), and float() functions. Let's convert the integer above into a string so we can concatenate it.

In [ ]:
print('There are ' + str(7) + ' continents.')
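
The sketch below gathers a few more conversions in one place. The variable name continents and the sample values are only illustrations. It also shows that an f-string, introduced above, can insert a number into a string without calling str() ourselves.

```python
continents = 7

# Convert an integer to a string so it can be concatenated
print('There are ' + str(continents) + ' continents.')

# An f-string inserts the number without an explicit str() call
print(f'There are {continents} continents.')

# Convert strings that contain numbers into integers or floats
print(int('15') + 5)       # 20
print(float('2.5') * 2)    # 5.0

# Converting a float to an integer drops everything after the decimal point
print(int(9.99))           # 9
```

Note that int() only accepts strings that look like whole numbers; a string such as 'fifteen' will cause an error.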

Mixing strings with floats and integers can have unexpected results. See if you can spot the problem with the program below.

In [ ]:
# A program to tell a user how many months old they are
user_age = input('How old are you? ') # Take the user input and put it into the variable user_age
number_of_months = user_age * 12 # Define a new variable number_of_months that multiplies the user's age by 12

print('That is more than ' + number_of_months + ' months old!' ) # Print a response that tells the user they are at least number_of_months old

In order to compute the variable number_of_months, we multiply user_age by 12. The problem is that user_age is a string. Multiplying a string by 12 simply makes the string repeat 12 times. After the user gives us their age, we need that input to be converted to an integer. Can you fix the program?


Lesson Complete

Congratulations! You have completed Python Basics 1. There are two more lessons in Python Basics:

  • Python Basics 2
  • Python Basics 3

Python Basics 1 Quiz

If you would like to check your understanding of this lesson, you can take this quick quiz.

Start Next Lesson: Python Basics 2
