A data structure consisting of an ordered collection of items of a single type, i.e. an indexed list.
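As a rough illustration of this definition, the sketch below uses the `array` module from Python's standard library. The variable names and values are invented for the example; an ordinary Python list would also be indexed but would not enforce a single type.

```python
from array import array

# An array of signed integers (type code "i"); every item must share this type.
temperatures = array("i", [12, 15, 9, 21])

first = temperatures[0]    # items are reached by index, counting from 0
temperatures.append(18)    # another integer is accepted

try:
    temperatures.append("warm")  # a string does not match the declared type
    rejected = False
except TypeError:
    rejected = True
```

Here `rejected` ends up `True`, because the array refuses an item of a different type.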
A machine-learning algorithm that determines the class of an input element based on a set of features.
A character (most typically a comma) used to specify boundaries between words or regions in plain text.
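A minimal sketch of how such a boundary character is used in Python, assuming a made-up line of comma-separated text:

```python
# A hypothetical line of text in which commas mark the boundaries between fields.
line = "Austen,Pride and Prejudice,1813"

# Splitting on the comma recovers the individual fields.
fields = line.split(",")

# The same fields can be re-joined with a different boundary character, here a tab.
tab_line = "\t".join(fields)
```

Note that the space inside "Pride and Prejudice" is left alone: only the chosen character is treated as a boundary.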
A tree-like structure which represents the organization and hierarchy of files within a directory. Terms such as parent and child are used to describe relationships between files and folders within this system.
A cloned copy of a project which is set up on an independent branch, separate from the original. Often used as a development tool in open-source software, where anyone can create a fork of the program and work on it as a distinct piece of software. GitHub is an example of a tool which facilitates this sharing and development process.
Put simply, functions provide functionality to a program. They are blocks of organized code which, in Python, begin with the keyword def followed by the name of the function you wish to define and a pair of parentheses. The code block begins after a colon and must be indented.
Empty spaces used as a formatting tool to designate blocks of code in programming. In Python, indentation is used to indicate a block of code, and four spaces are typically used. Each line of code in the block must be indented by the same number of spaces, otherwise an error may occur.
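By way of example, a small Python function might look like the sketch below. The name `count_words` and the sample sentence are invented for illustration.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    # The indented lines after the colon form the body of the function.
    return len(text.split())


# Calling the function by name runs its block of code.
total = count_words("the quick brown fox")
```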
The repetition of a procedure in the form of a loop to obtain successively closer approximations to the solution of a problem.
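One possible illustration is the classic averaging method for estimating a square root, chosen here only as an example. The function name, starting guess and number of steps are assumptions made for this sketch.

```python
def approximate_sqrt(number, steps=6):
    """Return the list of successive estimates of the square root of number."""
    guess = 1.0                  # arbitrary starting point
    history = [guess]
    for _ in range(steps):       # repeat the same procedure in a loop
        # Average the guess with number / guess to get a closer estimate.
        guess = (guess + number / guess) / 2
        history.append(guess)
    return history


estimates = approximate_sqrt(2.0)
```

Each pass through the loop brings the estimate closer to the true square root of 2.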
The core computer program of the operating system, which can control all system processes. The IPython kernel runs the code in the background for Jupyter notebooks.
A lemma is the canonical form of a word. Lemmatization is the process of grouping together inflected forms of a word to be analysed as a single item, i.e. determining the original lemma for the words.
Placing objects or elements in a hierarchical arrangement within a set (a collection of objects).
A process of transforming text into a single canonical form, thereby facilitating data consistency for further processing. Examples include removing non-alphanumeric characters or changing to lower case.
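The idea can be sketched with a tiny hand-written lookup table. This is a toy stand-in, not how a real lemmatizer works: the table `LEMMAS` and the function `lemmatize` are hypothetical, and a real tool would draw on a full dictionary and grammatical information.

```python
# Hypothetical lookup table mapping inflected forms to their lemma.
LEMMAS = {
    "is": "be",
    "was": "be",
    "were": "be",
    "mice": "mouse",
    "ran": "run",
    "running": "run",
}


def lemmatize(word):
    """Return the lemma for word, or the lower-cased word if it is not in the table."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


lemmas = [lemmatize(w) for w in ["Was", "were", "mice", "running", "cat"]]
```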
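The two examples mentioned above might be sketched in Python as follows. The function name `normalize` is invented, and the pattern is deliberately simple: it keeps only unaccented letters, digits and whitespace, so accented characters would also be removed.

```python
import re


def normalize(text):
    """Lower-case text and strip characters other than a-z, 0-9 and whitespace."""
    lowered = text.lower()
    # Remove anything that is not a lower-case letter, a digit or whitespace.
    return re.sub(r"[^a-z0-9\s]", "", lowered)


cleaned = normalize("Hello, World! 2024")
```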
Data which has attributes or values AND a defined behaviour.
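In Python this pairing of values and behaviour is usually written as a class. The class `Book`, its attributes and the 300-page threshold below are all made up for the sake of the example.

```python
class Book:
    """A hypothetical object with two attributes and one behaviour."""

    def __init__(self, title, pages):
        # Attributes: the values the object carries.
        self.title = title
        self.pages = pages

    def is_long(self):
        # Behaviour: something the object can do with its own values.
        return self.pages > 300


novel = Book("Example Novel", 880)
long_read = novel.is_long()
```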
Text which includes only data related to the readable material, that is, without data related to graphical presentation, formatting or other objects such as images. It is encoded using Unicode standards and typically edited in a text editor such as TextEdit on Mac or WordPad on PC. Plain texts are particularly useful for archival storage as they are not confined to proprietary software and can be opened and edited on many systems, thereby ensuring a more universal accessibility and preservation.
A sequence of characters which defines a search pattern. These patterns are useful for performing string operations such as find or find and replace.
A central location where data is stored and managed. More specifically, in revision control systems a repository stores metadata for sets of files or directory structure.
The process of reducing a word to its base form or word stem, e.g. added/adding would reduce to add.
A list of words which are programmed to be ignored or filtered in analysis and search queries. Lists of stop-words often contain high frequency function words such as the, of, and, etc.
A string is a sequence of characters such as letters, numbers or symbols.
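Both operations can be tried with Python's built-in `re` module. The sample sentence and the pattern (three digits, a hyphen, four digits) are invented for illustration.

```python
import re

text = "Call 555-0134 or 555-0199 before noon."

# Search pattern: three digits, a hyphen, then four digits.
pattern = r"\d{3}-\d{4}"

# Find: collect every piece of text that matches the pattern.
matches = re.findall(pattern, text)

# Find and replace: swap every match for a placeholder.
masked = re.sub(pattern, "XXX-XXXX", text)
```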
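A deliberately naive sketch of the idea is given below. It only strips a few common English endings and is not one of the established stemming algorithms; the function name `naive_stem` and the suffix list are assumptions for this example.

```python
def naive_stem(word):
    """Strip one common suffix, provided at least three letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["added", "adding", "adds", "add"]]
```

A rule this crude soon shows its limits (for instance it turns "running" into "runn"), which is why real stemmers apply many more rules.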
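Filtering with such a list might look like the following sketch. The short list `STOP_WORDS` is invented for the example; lists used in practice are much longer.

```python
# A hypothetical, very short list of words to ignore.
STOP_WORDS = {"the", "of", "and", "a", "in"}


def remove_stop_words(text):
    """Return the lower-cased words of text that are not in STOP_WORDS."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


kept = remove_stop_words("The history of the book and the press")
```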
A data set used to train a model in machine learning. Specific examples are chosen to fit the parameters of the model for training and the subsequent results are compared with a testing dataset.
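One simple way to set aside the two groups of examples is sketched below, assuming a made-up list of labelled phrases and an 80/20 split. The proportion is a common convention rather than a rule, and real projects usually shuffle the data first.

```python
# Hypothetical labelled examples: (text, label) pairs.
labelled = [
    ("good film", "pos"),
    ("bad plot", "neg"),
    ("great cast", "pos"),
    ("dull script", "neg"),
    ("fine score", "pos"),
]

# Keep the first four fifths for training and the rest for testing.
split_at = len(labelled) * 4 // 5
training_set = labelled[:split_at]
testing_set = labelled[split_at:]
```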
An immutable (fixed) sequence of objects. Tuples are created by separating values using commas within a set of parentheses, e.g. (1, 2, 3, 4, 5).
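The fixed nature of a tuple can be seen in a short Python sketch; the variable names are invented for illustration.

```python
numbers = (1, 2, 3, 4, 5)

second = numbers[1]     # items can be read by index, counting from 0

try:
    numbers[0] = 99     # but an existing item cannot be reassigned
    changed = True
except TypeError:
    changed = False
```

Here `changed` stays `False`, because Python refuses the assignment.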
A variable stores a piece of data and gives it a specific name. Common data types which are stored in variables in Python include numbers and Boolean values.
An industry standard in computing for encoding (representing) text. Letters, numbers and symbols are assigned unique numeric values which facilitate universal application across different programs and platforms. A fun example of the utility of Unicode is the emoji keyboard used on smartphones when sending messages. The universal nature of Unicode allows emoji to be accurately represented on most modern phones regardless of their differing operating systems (such as Android, iOS or BlackBerry).
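The numeric values behind characters can be inspected directly in Python, as in this short sketch. The two characters are chosen arbitrarily for the example.

```python
import unicodedata

# Every character has a unique number (its code point).
e_acute = "\u00e9"              # the letter e with an acute accent
code_point = ord(e_acute)       # the character's numeric value

# The reverse also works: a number can be turned back into its character.
smiley = chr(0x1F600)
smiley_name = unicodedata.name(smiley)

# When stored or sent, code points are encoded as bytes, for example in UTF-8.
smiley_bytes = smiley.encode("utf-8")
```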