Introduction to Version Control with git - Standalone Lesson#

Overview

Questions:

  • What is version control?

  • How do I use git to keep a record of my project?

  • What is a branch and why would I use it?

  • How do I tell git to ignore files?

Objectives:

  • Explain the purpose of version control.

  • Introduce common git commands.

  • Understand how to view and check out previous versions of files.

Prerequisites

  • Completed the previous lesson.

  • Created GitHub account (described in set-up instructions)

  • Configured git (described in set-up instructions)

Version Control#

Version control keeps a complete history of your work on a given project. It facilitates collaboration: everyone can work freely on different parts of a project without overwriting others’ changes. You can move between past versions and roll back when needed.
You can also review the history of your project through commit messages that describe changes to the source code, and see exactly what was modified in any given commit.

This is greatly beneficial whether you are working independently or within a team. We recommend always using git when working on a programming project.

git vs. GitHub

git is the software used for version control, while GitHub is a hosting service. You can use git locally (without using an online hosting service), or you can use it with other hosting services such as GitLab or Bitbucket. Other examples of version control software include SVN and Mercurial.

MolSSI recommends using the software git for version control, and GitHub as a hosting service, though there are other options.

Using git to keep a record of your project#

You should have git installed and configured from the setup instructions.

In this section, we are going to create a file with some Python functions and use git to track changes to our project.

First, use a terminal to cd into the directory where you are keeping your files for the workshop (molssi_best_practices). Then, create a folder for your git lesson.

cd molssi_best_practices
mkdir git-lesson
cd git-lesson

Challenge

Use the skills you used in the previous lesson to create a directory called git-lesson.

Then, change directories into the folder you just created.

Make sure you execute the commands that are in the solution of the challenge above before continuing.

We will do our first git project in this folder.

In order for git to keep track of your project, or any changes in your project, you must first tell git that you want it to start a project and track your changes. After starting a project, you must manually create checkpoints if you wish to have points to return to. To tell git that the folder we are working in represents a project, we use the command git init. In your folder, use the command

git init

You will see an output message similar to the following, except with the path to your directory:

Initialized empty Git repository in /PATH/TO/REPOSITORY

Now, when you check the contents of the directory using ls, it will still look empty. However, if you look at the hidden files (files or folders beginning with a dot .) using ls -a, you will see that the directory is no longer empty:

ls -a
.   ..  .git

The presence of the .git folder indicates to us that the git software is now watching the folder for changes. .git is a directory where git stores the repository data. We can tell from this output that we are in a git repository.

Another way we can tell if we are in a git repository is to use the command git status.

git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

The command git status gives us the current state of our repository. This message tells us that we are on the main branch, and that we haven’t yet created a checkpoint, or commit, of our project.
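If you want a yes-or-no answer to "am I inside a git repository?", the following is a minimal sketch using git rev-parse. It runs in a throwaway directory (the variable name demo is only a placeholder) so that your lesson folder is not touched.

```shell
# Work in a throwaway directory so the lesson folder is untouched.
demo=$(mktemp -d)
cd "$demo"

# Outside a repository the command fails, so we fall back to "false".
before=$(git rev-parse --is-inside-work-tree 2>/dev/null || echo "false")

git init -q

# Inside a repository the command prints "true".
after=$(git rev-parse --is-inside-work-tree)

echo "before git init: $before"
echo "after git init:  $after"
```

This assumes the temporary directory is not itself located inside another git repository.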

The 3 steps of a commit#

A particular version of your project is called a “commit”. There is a specific procedure to follow when making a commit: (1) make changes to your project, (2) mark the changes you want to record (git add), and (3) create a version, or “commit”, in the repository (you can also think of this as a project checkpoint).

Now that we’ve covered the steps, let’s see how to make versions of our project and view the project history.

Create a file called README.md in your text editor of choice. A README typically accompanies a git repository and gives information about the project. We will write it in Markdown, which we covered in the last lesson, so that it is nicely formatted when viewed online.

Add the following to your README

# Git Lesson

This lesson covers the basics of git for version control.

Save this file and return to the terminal. Now that we have made a change, check the status of your project again.

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md

nothing added to commit but untracked files present (use "git add" to track)

Notice how git is watching our repository! It tells us that it sees that we have added a new file (README.md), but git is not actually tracking that file yet. Remember that git only watches what we tell it to watch, and only tracks changes in files we tell it to track changes in. Next, we will want to tell git to watch README.md and to make a version of our project which includes the file. In other words, we want to commit our changes.

git add, git status, git commit#

Making a commit is like making a checkpoint for a particular version of your code. You can easily return to, or revert to that checkpoint.

To create the checkpoint, we first have to make changes to our project. We might modify many files at a time in a repository. Thus, the first step in creating a checkpoint (or commit) is to tell git which files we want to include in the checkpoint. We do this with a command called git add. This adds files to what is called the staging area.

Let’s look at our output from git status again.

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md

nothing added to commit but untracked files present (use "git add" to track)

Git even tells us to use git add to include what will be committed. Let’s follow the instructions and tell git that we want to create a checkpoint with the current version of README.md

git add README.md
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md

We are now on the second step of creating a commit. We have added our files to the staging area. In our case, we only have one file in the staging area, but we could add more if we had more files.

To create the checkpoint, or commit, we will now use the git commit command. We add a -m after the command for “message.” Whenever you create a commit, you should write a message about what the commit does.

git commit -m "add README with information about project"
[main (root-commit) dc466ff] add README with information about project
 1 file changed, 3 insertions(+)
 create mode 100644 README.md

Check your status after the commit

git status
On branch main
nothing to commit, working tree clean

This message means nothing in the directory has changed since our last checkpoint, or commit.
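The three steps are easiest to see when more than one file has changed. The following is a minimal sketch, run in a throwaway repository so it does not affect the lesson folder. The file names notes.md and draft.txt and the identity Example User are placeholders chosen for illustration.

```shell
# Throwaway repository with a placeholder identity.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Step 1: make changes (here, two new files).
echo "# Notes" > notes.md
echo "not ready yet" > draft.txt

# Step 2: stage only the file we want in the checkpoint.
git add notes.md

# Step 3: create the commit from what is staged.
git commit -q -m "add notes file"

# --short prints one line per changed file; "??" marks an untracked file.
remaining=$(git status --short)
echo "$remaining"
```

Only notes.md is part of the commit. The last command prints `?? draft.txt`, showing that the file we did not stage is still untracked.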

git log#

Next, type

git log

You will get an output resembling the following:

commit dc466ff70070312b622ab0041f4d770bd37bb248 (HEAD -> main)
Author: Jessica Nash <janash@vt.edu>
Date:   Wed Jul 8 15:59:57 2020 -0400

    add README with information about project

Each line of this log tells you something important about the commit, or checkpoint, that exists for the project. On the first line,

commit dc466ff70070312b622ab0041f4d770bd37bb248 (HEAD -> main)

you have a unique identifier for the commit (dc466…). You can use this identifier to reference this checkpoint. It is unique for every commit, so yours will differ from the one shown above.

Then, git records the name of the author who made the change.

Author: Your Name <your_email@something.com>

This should be your information. This way, anyone who is working with this project can see who made each commit. Note that this name and email address match what you specified when you configured git in the setup.

Next, it lists the date and time the commit was made.

Date:   Wed Jul 8 15:59:57 2020 -0400

Finally, there will be a blank line followed by a commit message.

    add README with information about project

The commit message is whatever the person who made the commit chose to write, but it should describe the change that took place when the commit was made. You’ll recognize this message from what you just wrote when you used the git commit command.

To exit the log, press the q key.

When we have more commits (or versions) of our code, git log will show a history of these commits, and they will all have the same format discussed above. Right now, we have only one commit.
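The commit identifier described above can also be printed on its own. The following is a minimal sketch using git rev-parse in a throwaway repository. The identity Example User and the variable names are placeholders.

```shell
# Throwaway repository with a placeholder identity and one commit.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "add README"

full_id=$(git rev-parse HEAD)           # full identifier of the latest commit
short_id=$(git rev-parse --short HEAD)  # abbreviated form of the same identifier

echo "full:  $full_id"
echo "short: $short_id"

# The short form is enough to refer to the commit.
git log -n 1 --oneline "$short_id"
```

The short identifier is a prefix of the full one. Git accepts it wherever a commit ID is expected, as long as the prefix is unambiguous within the repository.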

Let’s continue to edit this readme to include more information. This is a file which will describe what is in this directory. Open README.md in your text editor of choice and add the following to the end

This lesson is for the MolSSI Best Practices Workshop.

Let’s make another version of our file that contains this information:

git add README.md
git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   README.md
git commit -m "add more information to README"

Check your understanding

Add the following information about how to make a commit to your README.md and commit the change.

To make a commit ("version" or "checkpoint") of your files, follow this procedure:

1. Make changes to your project you would like to keep.
1. When you have your changes, tell git you are ready to create a checkpoint of the files using `git add filename`
1. Create a checkpoint using `git commit -m "message about what you did"`

Now, check git status and git log. You should see the following:

git status
On branch main
nothing to commit, working tree clean
git log

We now have a log with three commits. This means there are three versions of the repository we are working in.

git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier, the commit’s author, when it was created, and the commit title.
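To see those same fields in a more compact form, git log accepts a --format option. The following is a minimal sketch in a throwaway repository. The identity Example User and the choice of fields are assumptions made for illustration.

```shell
# Throwaway repository with a placeholder identity and two commits.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Demo" > README.md
git add README.md
git commit -q -m "add README"

echo "More text." >> README.md
git add README.md
git commit -q -m "add more information to README"

# %h = short identifier, %an = author name, %ad = author date, %s = title
history=$(git log --date=short --format="%h | %an | %ad | %s")
echo "$history"
```

Each commit takes one line, and the newest commit is listed first.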

We can see differences in files between commits using git diff.

git diff HEAD~1

Here HEAD refers to the current point in our commit history (the most recent commit on the current branch). When we use ~1, we are asking git to show us the difference between the current state of our files and the commit one step before HEAD.

Lines that have been added are indicated in green with a plus sign next to them (‘+’), while lines that have been deleted, if we had any, would be indicated in red with a minus sign next to them (‘-’).
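To see what this output looks like on a small, known change, here is a minimal sketch in a throwaway repository. The file name notes.txt and the identity Example User are placeholders.

```shell
# Throwaway repository with a placeholder identity.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "add notes"

printf 'first line\nsecond line\n' > notes.txt
git add notes.txt
git commit -q -m "add second line"

# Compare the current files with the commit one step before HEAD.
changes=$(git diff HEAD~1)
echo "$changes"
```

The output contains the line `+second line`, because that line was added in the most recent commit. The unchanged first line appears without a plus or minus sign.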

Viewing previous versions#

If you need to check out a previous version, use

git checkout COMMIT_ID

This will temporarily revert the repository to whatever the state was at the specified commit ID.

Let’s check out the version before we made the most recent edit to the README.

git log --oneline
fe357b0 (HEAD -> main) add information about how to make a commit to the readme
8c39357 add more information to README
dc466ff add README with information about project

In this log, the commit ID is the string at the start of each line.

To go back to the version of the repository where we first edited the readme, use the git checkout command with the appropriate commit ID.

git checkout 8c39357

If you now view your readme, it is the previous version of the file.

To return to the most recent commit on main, use

git switch main
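The round trip (step back to an older commit, then return to main) can be sketched in a throwaway repository. The file name notes.txt and the identity Example User are placeholders, and HEAD~1 stands in for a commit ID.

```shell
# Throwaway repository with a placeholder identity and two commits.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first version"
git branch -M main        # make sure the branch is called main

echo "version 2" > notes.txt
git add notes.txt
git commit -q -m "second version"

git checkout -q HEAD~1    # step back one commit
old=$(cat notes.txt)

git switch -q main        # return to the newest commit on main
new=$(cat notes.txt)

echo "old: $old"
echo "new: $new"
```

Nothing is lost by checking out the older commit: the newer version is still stored in the repository and reappears after switching back to main. If you only want to read an old version of one file, `git show HEAD~1:notes.txt` prints it without changing your working directory.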

Creating new features - using branches#

When you are working on a project to implement new features, it is a good practice to isolate the changes you are making and work on one particular topic at a time. To do this, you can use something called a branch in git. Working on branches allows you to isolate particular changes. If you make sure that your code works before merging to your main branch, you will ensure that you always have a working version of code on your main branch.

By default, you are typically in the main branch. To create a new branch and move to it, you can use the command

git switch -c new_branch_name

The command git switch switches branches when followed by a branch name. When you use the -c option, git will create a branch with the specified name and switch to it. For this exercise, we will add a new feature: we are going to add a Python function to print “Hello, World!” (the famous first programming exercise).

First, we’ll create a new branch:

git switch -c hello_world

Next, create a new file called quotes.py. We are going to add the ability to print “Hello, World!”

Add the function to your quotes.py file.

"""
Some quotes.
"""

def hello_world():
    quote = "Hello, World!"
    return quote

Verify that this function works in the interactive Python prompt.

>>> import quotes
>>> print(quotes.hello_world())

Next, commit this change:

git add quotes.py
git commit -m "add function to print Hello World"

Let’s switch back to the main branch to see what it is like. You can see a list of all the branches in your repo by using the command

git branch

This will list all of your branches. The active branch, or the branch you are on, will be noted with an asterisk (*).

To switch back to the main branch, use

git switch main

On your main branch, you should see that there is no quotes.py module.

You can further verify this by using the git log command.
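The following is a minimal sketch of that check in a throwaway repository. It compares the log of main with the log of the feature branch. The identity Example User and the file contents are placeholders.

```shell
# Throwaway repository with a placeholder identity.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Demo" > README.md
git add README.md
git commit -q -m "add README"
git branch -M main        # make sure the branch is called main

# Commit a new file on a feature branch.
git switch -q -c hello_world
echo 'print("Hello, World!")' > quotes.py
git add quotes.py
git commit -q -m "add quotes module"

git switch -q main

# git log accepts a branch name, so we can compare without switching again.
echo "main:"
git log --oneline main
echo "hello_world:"
git log --oneline hello_world
```

The commit that adds quotes.py is listed only under hello_world, and the file itself is absent from the working directory while main is checked out.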

Suppose that, at the same time, there are other changes or features we’d like to implement. Let’s make a branch to do a documentation update.

Create a new branch

git switch -c doc_update

Let’s add some information about developing on branches to the README. Update your README to include this information:

## Adding Features
Features should be developed on branches. To create and switch to a branch, use the command

`git switch -c new_branch_name`

To switch to an existing branch, use

`git switch branch_name`

Save and commit this change.

Getting Changes to Your main Branch#

To incorporate these changes in main, you will need to do a git merge. When you do a merge, you should be on the branch you would like to merge into. In this case, we will first merge the changes from our doc_update branch, then our hello_world branch, so we should be on our main branch. Next we will use the git merge command.

The syntax for this command is

git merge branch_name

where branch_name is the name of the branch you would like to merge.

We can merge our doc_update branch to get changes from our doc_update branch to our main branch:

git switch main
git merge doc_update

Now our changes from the branch are on main.

We can merge our hello_world branch to get our changes on main:

git merge hello_world

This time, you will see a different message, and a text editor will open for a merge commit message. Save and close the editor to accept the default message.

Merge made by the 'recursive' strategy.

This is because main and hello_world had development histories which had diverged (their commit histories were different). Git had to do some work in this case to merge the branches. A merge commit was created. (Recent versions of git, 2.34 and later, name the 'ort' strategy in this message instead of 'recursive'.)

Merge commits create a branched git history. We can visualize the history of our project by adding --graph to git log. There are other workflows you can use to make the commit history more linear, but we will not discuss them in this course.

git log --graph
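The whole sequence (two branches, two merges, and the resulting graph) can be reproduced in a throwaway repository. This is a minimal sketch under assumptions: the identity Example User and the file contents are placeholders, and --no-edit is used so that git accepts the default merge message instead of opening an editor.

```shell
# Throwaway repository with a placeholder identity.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Demo" > README.md
git add README.md
git commit -q -m "add README"
git branch -M main        # make sure the branch is called main

# Two branches that both start from the same commit on main.
git switch -q -c doc_update
echo "More documentation." >> README.md
git add README.md
git commit -q -m "update README"

git switch -q main
git switch -q -c hello_world
echo 'print("Hello, World!")' > quotes.py
git add quotes.py
git commit -q -m "add quotes module"

git switch -q main
git merge -q doc_update              # main has no new commits, so it simply moves forward
git merge -q --no-edit hello_world   # histories have diverged, so a merge commit is created

git log --graph --oneline
```

In the graph, the top entry is the merge commit, and the two lines beneath it show where the histories of main and hello_world ran side by side.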

Once we are done with a feature branch, we can delete it:

git branch -d hello_world
git branch -d doc_update

Ignoring Files#

Sometimes while you work on a project, you may end up creating some temporary files. For example, if your text editor is Emacs, you may end up with lots of files called <filename>~. Git does not track these files unless you add them, but it does list every one of them as untracked. This tends to be annoying, since it means that any time you run git status, all of these unimportant files show up.

We are now going to find out how to tell Git to ignore these files, so that it doesn’t keep telling us about them every time we run git status. Even if you aren’t working with Emacs, someone else working on your project might, so let’s do the courtesy of telling Git not to track these temporary files. First, let’s ensure that we have a few dummy files. Create empty files in your text editor called quotes.py~ and README.md~.

Now check what Git says about these files:

git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	README.md~
	quotes.py~

nothing added to commit but untracked files present (use "git add" to track)

Now we will make Git stop telling us about these files. We do this with a file called .gitignore. A .gitignore does what it sounds like: it tells git which files or directories to ignore. If we had created our repository on GitHub and cloned it to our computer, we could have chosen to create the repository with a .gitignore. We could have told GitHub what language we were planning to use, and it would have given us a starting .gitignore with files we would be likely to want to ignore.

Navigate here to get a good starting gitignore for python. Copy the contents of this file to a file in your repository called .gitignore.

Look at the contents of .gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

...

Git looks at .gitignore and ignores any files or directories that match one of the lines.

Commit your gitignore to the repository.

git add .gitignore
git commit -m "add gitignore to repository"

Next, let’s ignore those emacs temporary files we added.

Add the following to the end of .gitignore:

# emacs
*~

Now do “git status” again. Notice that the files we added are no longer recognized by git.

git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)

	modified:   .gitignore

no changes added to commit (use "git add" and/or "git commit -a")

We want these additions to .gitignore to become a permanent part of the repository, so do

git add .gitignore
git commit -m "ignore Emacs temporary files"
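If you are ever unsure whether a file is being ignored, or which line of .gitignore is responsible, git check-ignore can tell you. The following is a minimal sketch in a throwaway repository that uses only the Emacs pattern from above. The directory variable demo is a placeholder.

```shell
# Throwaway repository; no commits are needed for this check.
demo=$(mktemp -d)
cd "$demo"
git init -q

# A two-line .gitignore containing only the Emacs pattern.
printf '# emacs\n*~\n' > .gitignore
touch README.md "README.md~"

# -v prints the ignore file, line number, and pattern that matched the path.
git check-ignore -v "README.md~"

# Ignored files no longer appear in the list of untracked files.
git status --short
```

The first command reports that README.md~ matches the `*~` pattern on line 2 of .gitignore. The status output then lists only .gitignore and README.md as untracked.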

More Tutorials#

If you want more git, see the following tutorials.

Basic git#

Key Points

  • Git provides a way to track changes in your project.

  • Making a commit (or version or checkpoint) of your project requires several steps, which allow you to choose which files you want to include.

  • Git is software for version control, and is separate from GitHub.