Git & HTML Recitation
Overview of git, how it relates to GitLab, and how to build a documentation website
If you have any questions or run into issues with these topics - reach out! camronb [at] mit.edu
what is git?
git is an open-source, distributed version control system and the de facto standard for most software development.
git allows any number of people to collaborate on large projects simultaneously while easily tracking different “branches” of work. This allows for nonlinear development without fear of losing work.
OK - what does this mean?
it’s easiest to describe with a story . . .
imagine a world with no git . . .
Let’s say you are building a website. You’re new at this, so you spend some time building a very simple, barebones HTML site. It’s not the prettiest, but it’s functional.
You’re content with the website at first, but as time goes by you become more comfortable with HTML. You decide that you’d like to add fancier styles, fonts, and colors to your site with CSS (Cascading Style Sheets) and some scripts. You realize that to keep the styles and scripts modular and clean, it’s best to move them out of the head of your HTML file and into separate local files and directories. BUT doing so could break the simple, working HTML file that you have now . . .
so you copy the directory with the simple html website and paste it into a folder called “backup” outside of your current working directory to keep it untouched and safe.
now you are safe to fall down the css/html rabbit hole:
it’s beautiful. you’re happy. But now you have a friend who has way more experience with javascript web development and they want to work with you on maintaining your site. You take the directory you’ve been working in, zip it up, and email it to your friend. They quickly download it, make some changes, and send the directory back to you.
You then download and unzip the modified directory, review the changes, and decide you’re happy with all of their edits. You move your now outdated directory to the “backup” folder next to the old simple version, and change your current working directory to the one your friend sent.
Now you and your friend both have the most up-to-date version of your site. In the next couple of days, you decide you want to change the format around the headers and your friend decides to change the borders around the images. You have both changed the ‘styles.css’ file! The different versions of your directories need to be consolidated, or “merged”! They send you their changes, you compare the differences, and decide to keep the image borders. You consolidate your ‘styles.css’ file to include their change, rezip the directory, and send them the updated version. They compare the differences, and match their ‘styles.css’ file to update the header format. You guys are both working on the same up-to-date version again - merge conflict resolved :)
this is a grossly inefficient way of doing conceptually the same thing that git handles for us!
When you create a git repository on your computer, you periodically “commit” versions of your directory as snapshots stored in a hidden ‘.git’ directory. This behaves like the “backup” folder from our story, but it stores all these versions waaay more efficiently than copy/pasting, by deduplicating and compressing the data. Each commit is identified by a unique hash and records an author name, a date, and a commit message. The commit message is a quick summary of the changes made in that version, which is helpful when later reflecting on the project’s development or reverting to an older version.
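To make the “snapshot” idea concrete, here is a minimal sketch (assuming git 2.28 or newer, which added the -b option to git init) that creates a throwaway repository, makes two commits, and prints the hash, author, date, and message git recorded for each. The file contents, name, and email are placeholders.

```shell
# Throwaway repository in a temporary directory
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

# Placeholder identity, stored only in this repository's config
git config user.name "Example Student"
git config user.email "student@example.com"

# First snapshot: the simple site
echo '<h1>My site</h1>' > index.html
git add index.html
git commit -q -m "Add simple html site"

# Second snapshot: add a stylesheet
echo 'h1 { color: teal; }' > styles.css
git add styles.css
git commit -q -m "Add css styles"

# Each commit records a hash, author, date, and message (newest first)
git log --format='%h  %an  %ad  %s' --date=short

# Older versions stay retrievable from the hidden .git directory
git show HEAD~1:index.html
```

The last command prints index.html exactly as it was in the first commit, which is the “backup folder” from the story without any manual copying.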
what is GitLab?
GitLab is a web-based git manager that hosts the project in the cloud or on a secure server for global access, so many people in many different places can push changes to the same shared repository.
GitLab also has a ton of other features for managing all sorts of DevOps functions, like issue trackers, wikis, CI/CD pipelines, and user privilege management.
You may have also heard of GitHub or Bitbucket. They’re the same thing, just run by different companies, with slightly different features and use cases.
how do we use it?
GitLab UI
If you reaaalllyyy want, you could manage all of your HTM(a)A documentation from the GitLab browser interface … I don’t recommend this, but you ~could~. To demonstrate, here’s how to set up your website:
- Go to your section page:
  ~ Architecture
  ~ EECS
  ~ CBA
  ~ Harvard
- Click into the “people” directory. Click the plus on the top right and create a new directory with your name (NOTE: don’t use spaces or slashes in directory names, because they can confuse different operating systems and make navigating from the command line awkward).
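- Inside your directory, make a new file called “index.html”. This will be the main page of your website. The simplest HTML page you can write is a <!DOCTYPE html> declaration followed by an <html> element containing a <head> (with a <title>) and a <body>.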
- Now you need to edit the main section page so that your individual site is added as a hyperlink. Go to “index.html” on the section page, and click “Edit”. Then add a line to the list of people like the following:
<a href="people/your_directory/index.html">Your Name</a>
A link with your name, pointing to your webpage, should now appear on your section site: Architecture, CBA, EECS, or Harvard. Note that you do not need to edit the global people page; only staff have access to that repo.
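If you are working from a local clone (see the local git section) rather than the browser, the same steps can be done from the command line. A minimal sketch, run from the top of the section repo; your_directory and Your Name are the placeholders from the link above, and the page text is only a starting point:

```shell
# Your personal directory: no spaces or slashes in the name
mkdir -p people/your_directory

# The simplest useful page: doctype, html, head, and body
cat > people/your_directory/index.html <<'EOF'
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Your Name</title>
  </head>
  <body>
    <h1>Your Name</h1>
    <p>My documentation will go here.</p>
  </body>
</html>
EOF

# The line to paste into the list of people in the section's index.html
echo '<a href="people/your_directory/index.html">Your Name</a>'
```

The href is relative, so it works both on GitLab pages and when you open the section’s index.html locally.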
GitLab Web IDE
As of this year, GitLab has an integrated Web IDE, making it much easier to edit files from the browser. You can access it by trying to edit a file and selecting “Open in Web IDE”:
A new window that looks and works much like the VS Code editor will open. You can navigate to different files in the repo and make as many edits as you want.
Once you have made all the changes you want, you must share them with the rest of the class: open the source control menu in the left-hand sidebar, add a commit message, and click “commit to main” (for the purposes of this class, we will always use the main branch).
local git
Using the GitLab UI from your browser is manageable, but it doesn’t harness the full power of git: you’re stuck writing all of your code in the browser without your favorite text editors and IDEs, AND you must always be connected to the internet :/
To fix this, it’s best to copy, or “clone”, the repo to your computer, where you can do all of your development in an editor of your choice, easily test and view your changes, and then “push” these edits back to the global repo. Most of these steps require the command line, so if you’re not familiar with it at all, try checking out this tutorial, or this one.
to start, we’ll review some simple git vocabulary:
- clone: copying a global repository from GitLab to your local directory
- pull: update your local branch to match the changes that have been made in the global repo
- branch: a movable pointer to the latest commit in one line of development of the project. This won’t be super important for this class, since you will be doing all of your development on the main branch.
- staging/add: the limbo state in which a snapshot of a file’s current state has been saved but not yet committed to a branch. The add command is used to add files to the staging area.
- commit: the snapshot that is saved to the git history. it is designated by a hash, author, date, and commit message
- push: send local commits to the global project. whatever you push will be visible to everyone and downloaded onto everyone else’s computer on their next pull!
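The “snapshot” part of staging is easy to miss: git add records a file as it is at that moment, and edits made afterwards are not staged until you add again. A minimal sketch in a throwaway repository (file name, contents, and identity are placeholders; -b main needs git 2.28 or newer):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Student"
git config user.email "student@example.com"

echo 'version 1' > notes.html
git add notes.html              # staging area now holds "version 1"

echo 'version 2' > notes.html   # edit again without re-adding

git diff --cached --stat        # what the next commit will contain
git diff                        # unstaged edits on top of that

git commit -q -m "Commit the staged snapshot"
git show HEAD:notes.html        # prints "version 1", not "version 2"
git status --short              # " M notes.html": version 2 is still unstaged
```

Running git add notes.html again before committing would have staged version 2 instead.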
configure git
You most likely have git installed on your computer already. Check by opening the command line and running git --version. If you do not, follow the instructions here to install it.
There are endless resources online to help you get familiar with git from the command line; the GitLab guide is a good place to start.
Basically, you will create a global configuration for your local git using your GitLab username and email. This tags every commit you push to the remote repo with your username, so that a history of project contributions is saved.
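In practice the configuration is two commands. A sketch, with placeholder values that you should replace with the username and email on your GitLab account:

```shell
# Stored in your user-level git config, so every repo on this computer uses it
git config --global user.name "your_gitlab_username"
git config --global user.email "you@mit.edu"

# Check what git will stamp on your commits
git config --global --get user.name
git config --global --get user.email
```

git config --global --list shows everything configured so far.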
GitLab needs some way to authenticate your local git against your account on the global server every time you pull or push changes. This can be done two different ways:
- HTTPS authentication: git clone the repo over HTTPS. This requires you to enter your credentials (GitLab username and password) every time you clone, push, or pull. That works, but it gets tiring to type your password every time, so SSH (below) is recommended.
- SSH authentication: you generate an SSH key pair; the private key stays on your computer and the public key is added to your GitLab account. When you clone, push, or pull, git uses the private key to verify your identity. A guide on how to set this up can be found here. Note that when setting up your RSA key, you do not need to protect it with a passphrase; that’s an unnecessary level of security for our use case.
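A minimal sketch of the key-generation step, assuming OpenSSH’s ssh-keygen. ~/.ssh/id_rsa is ssh’s default RSA key file, so ssh finds it without extra configuration; -N "" creates the key without a passphrase, matching the note above, and the -C email is just a placeholder label. The guard skips generation if you already have a key there, so nothing is overwritten.

```shell
# Make sure ~/.ssh exists with the permissions ssh expects
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Generate an RSA key pair with no passphrase (-N ""), unless one already exists
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -b 4096 -C "you@mit.edu" -f ~/.ssh/id_rsa -N ""

# Print the PUBLIC half; this is what you paste into GitLab's SSH key settings
cat ~/.ssh/id_rsa.pub
```

Only the .pub file goes into GitLab; the file without .pub is the private key and should never leave your computer.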
A note for more advanced git users: if you already have an SSH key set up for another git host (like GitHub), you can configure two SSH keys on the same machine that point to different hosts. Here’s a helpful guide, but feel free to reach out to me if you run into issues.
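One way to set up two keys, as a sketch under several assumptions: your existing GitHub key is ~/.ssh/id_rsa, the second key is named gitlab_rsa (a placeholder name), and the class GitLab is served from gitlab.cba.mit.edu (check the host in your repo’s clone URL and adjust). Run it once, since it appends to ~/.ssh/config.

```shell
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# A separate key just for the class GitLab (placeholder file name)
[ -f ~/.ssh/gitlab_rsa ] || ssh-keygen -q -t rsa -b 4096 -f ~/.ssh/gitlab_rsa -N ""

# Map each host to its own key; IdentitiesOnly stops ssh from trying other keys
cat >> ~/.ssh/config <<'EOF'

Host github.com
  HostName github.com
  User git
  IdentityFile ~/.ssh/id_rsa
  IdentitiesOnly yes

Host gitlab.cba.mit.edu
  HostName gitlab.cba.mit.edu
  User git
  IdentityFile ~/.ssh/gitlab_rsa
  IdentitiesOnly yes
EOF
chmod 600 ~/.ssh/config
```

After pasting ~/.ssh/gitlab_rsa.pub into your GitLab account, ssh -T git@ followed by the host name is a quick way to check that the right key is accepted.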
git clone
Now you have git configured! Go to your section repo, click “Clone” in the top right corner, and copy the path that matches the authentication method you chose. Then, in the command line, navigate to the directory you want to clone the class repo into and run
git clone path/copied/from/clipboard
obviously, replacing the path with your copied value.
Now you have a local version of the section website!
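You can try the clone step without touching the class repo by cloning a local “bare” repository, which plays the role of GitLab here. All paths and names are placeholders; with the real repo you would use the HTTPS or SSH path copied from the Clone button instead.

```shell
demo=$(mktemp -d)

# A bare repository (no working files) stands in for the GitLab server
git init -q --bare -b main "$demo/section.git"

# Seed it with one commit so there is something to clone
git init -q -b main "$demo/seed"
git -C "$demo/seed" config user.name "Example Student"
git -C "$demo/seed" config user.email "student@example.com"
echo '<h1>Section site</h1>' > "$demo/seed/index.html"
git -C "$demo/seed" add index.html
git -C "$demo/seed" commit -q -m "Initial section page"
git -C "$demo/seed" push -q "$demo/section.git" main

# The clone step itself: same command you'd run with the GitLab path
cd "$demo"
git clone -q "$demo/section.git" section
cd section
git remote -v    # "origin" points back at the repo you cloned from
ls               # index.html
```

git clone names the source “origin”, which is why later commands like git push and git pull don’t need the path again.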
local editing
You are free to edit outside of the GitLab browser! You can open the files in whatever editor you are most comfortable with (if you don’t have one yet, I recommend VS Code). Your changes are made to your local copies of the files and will not be visible to anyone else in the class until you stage → commit → push them.
Once you are happy with your changes, ALWAYS run the git status command. This tells you which files you have changed, how many commits ahead of the remote repo you are, and what is staged for the next commit.
If you are happy with the files listed as “modified” or “untracked” by git status, run git add . to add all of them to the staging area. If there are many modified files and you only want to commit changes from a few, you can stage specific files by name, for example git add specific_files.html.
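Here is what selective staging looks like in a throwaway repository, as a sketch (file names, contents, and identity are placeholders):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Student"
git config user.email "student@example.com"
echo '<h1>week 1</h1>' > week1.html
git add week1.html
git commit -q -m "Add week 1 page"

# Two kinds of change: an edited tracked file and a brand-new file
echo '<p>more notes</p>' >> week1.html
echo 'scratch notes' > todo.txt

git status --short    # " M week1.html" (modified) and "?? todo.txt" (untracked)

# Stage only the file you mean to commit
git add week1.html
git status --short    # "M  week1.html" (staged) and "?? todo.txt" (left out)
```

In the short format, the left column is the staging area and the right column is your working copy, so the M moving from right to left is the add taking effect.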
Once you have files staged, run git commit -m "type out commit message" to make a commit on your local branch.
Finally, run git push to push your latest commits to the remote repository, where everyone else will download them on their next pull.
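The whole status → add → commit → push loop, as a sketch you can run safely: a local bare repository stands in for GitLab, and git remote add wires it up by hand (git clone does this for you with the real repo). Names, identity, and the page are placeholders; -u on the first push records origin/main as the default, so later pushes can be a plain git push.

```shell
demo=$(mktemp -d)
git init -q --bare -b main "$demo/section.git"   # stand-in for the GitLab repo

# A working copy connected to that "remote" (git clone sets this up for you)
git init -q -b main "$demo/work"
cd "$demo/work"
git remote add origin "$demo/section.git"
git config user.name "Example Student"
git config user.email "student@example.com"

mkdir -p people/your_directory
echo '<h1>Your Name</h1>' > people/your_directory/index.html

git status                              # people/ is listed as untracked
git add people/your_directory
git commit -q -m "Add my personal page"
git push -q -u origin main              # -u remembers origin/main for future "git push"

git log --oneline origin/main           # your commit is now on the remote branch
```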
MOST important git rules
don’t push large media files!!!!
As we’ve learned, everything you push to the remote repo is downloaded onto everyone else’s computer. As Neil will point out many times, if everyone pushes large image or video files (> a few hundred KB) to each week’s page, this quickly adds up to a huge amount of storage, and that resolution is unnecessary for web images anyway. Be sure to compress your image files before adding them to the staging area. ffmpeg is a good tool for this; Neil has a cheat sheet of ffmpeg commands here.
For those who are Python-savvy, I wrote a Python command line script that runs ffmpeg to automatically compress and overwrite images in a given directory. Feel free to modify it for your needs: img_format.py
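A rough pre-staging check, as a sketch: large_images lists image files over a size threshold (300 KB here, an arbitrary cutoff standing in for “a few hundred KB”) so you can shrink them before git add. compress_image is a hypothetical wrapper around ffmpeg’s scale filter that caps the width at 1000 px (also an arbitrary choice) and keeps the aspect ratio; it is not Neil’s cheat sheet or img_format.py, and it needs ffmpeg on your PATH.

```shell
# List images larger than ~300 KB, skipping the .git directory
large_images() {
  find "${1:-.}" -name .git -prune -o -type f \
    \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    -size +300k -print
}

# Re-encode one image at most 1000 px wide into a NEW file; -2 keeps the
# aspect ratio and rounds the height to an even number
compress_image() {
  ffmpeg -loglevel error -y -i "$1" -vf "scale='min(1000,iw)':-2" "$2"
}

large_images .
```

For example, compress_image photo.jpg photo_small.jpg writes a smaller copy; check it looks fine before staging it in place of the original.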
ALWAYS run git status
it’s just a good habit to always check what will be staged, what is left out, how many commits ahead you are from the last time you pulled, etc.
For example, macOS typically creates hidden directory metadata files called “.DS_Store”, which can unnecessarily sneak into git commits. They can be added to .gitignore (more info here), but it’s always good practice to check for things like that.
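A minimal sketch of the .gitignore fix in a throwaway repository (the files are empty placeholders):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

# Simulate macOS metadata sitting next to a real page
touch .DS_Store index.html

# A pattern with no slash matches that name in every directory of the repo
echo '.DS_Store' >> .gitignore

git status --short               # "?? .gitignore" and "?? index.html", but no .DS_Store
git check-ignore -v .DS_Store    # reports the .gitignore line that matched
```

.gitignore only affects untracked files: if a .DS_Store was already committed, git rm --cached .DS_Store removes it from the repo while leaving your local copy alone.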
you will run into a merge conflict
You will most likely run into a merge conflict at some point during the semester. This is like the last part of our git story, when you and your friend both changed the same file at the same time. It happens when you pull the latest version of the remote repo and, in the time it takes you to stage, commit, and push your changes, someone else pushes a newer commit to the remote branch. (Strictly, git first rejects your push; it only becomes a true merge conflict if you both changed overlapping parts of the same file.) To avoid this, it’s best practice to always run git pull right before running git commit (after staging with git add). But when you inevitably clash with someone else, it’s not a problem: you’ll just need to pull and merge, which should be straightforward since each person should be making changes only in their own directory. Follow git’s warning and error messages to resolve the conflict, or learn more here.
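Here is that situation in miniature, as a runnable sketch: a local bare repository stands in for GitLab, and two hypothetical classmates (alice and bob) each add a page in their own directory. ident is a hypothetical helper that sets a placeholder identity per repo. This shows the recovery path (pull after a rejected push) using git pull --rebase, which replays your commit on top of the newer remote one; plain git pull, which makes a merge commit instead, also works, though recent git versions may stop and ask which of the two you prefer.

```shell
demo=$(mktemp -d)

# Hypothetical helper: placeholder identity for each demo repository
ident() { git -C "$1" config user.name "$2"; git -C "$1" config user.email "$2@example.com"; }

# Shared "GitLab" repo with one starting commit
git init -q --bare -b main "$demo/section.git"
git init -q -b main "$demo/seed" && ident "$demo/seed" staff
echo '<h1>Section</h1>' > "$demo/seed/index.html"
git -C "$demo/seed" add index.html
git -C "$demo/seed" commit -q -m "Initial section page"
git -C "$demo/seed" push -q "$demo/section.git" main

# Two classmates clone the same version
git clone -q "$demo/section.git" "$demo/alice" && ident "$demo/alice" alice
git clone -q "$demo/section.git" "$demo/bob" && ident "$demo/bob" bob

# Alice adds her page and pushes first
mkdir -p "$demo/alice/people/alice"
echo '<h1>Alice</h1>' > "$demo/alice/people/alice/index.html"
git -C "$demo/alice" add people
git -C "$demo/alice" commit -q -m "Add Alice's page"
git -C "$demo/alice" push -q

# Bob commits his own page; his push is rejected because origin moved ahead
mkdir -p "$demo/bob/people/bob"
echo '<h1>Bob</h1>' > "$demo/bob/people/bob/index.html"
git -C "$demo/bob" add people
git -C "$demo/bob" commit -q -m "Add Bob's page"
if ! git -C "$demo/bob" push -q; then
  # Fetch Alice's commit and replay Bob's on top; different files, so no conflict
  git -C "$demo/bob" pull -q --rebase
  git -C "$demo/bob" push -q
fi

git -C "$demo/bob" log --oneline    # three commits: initial, Alice's, Bob's
```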
When all else fails (or even before then), you can always clone a fresh version of the global repository and copy your modified files into the clean directory.
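The fresh-clone escape hatch can be wrapped in a small function. This is a sketch: fresh_clone is a hypothetical name, and it assumes your work lives in people/<your_directory> as set up earlier. It clones next to your old copy (adding a -fresh suffix) and copies only your own directory across, leaving everyone else’s files exactly as they are on the server.

```shell
# Hypothetical helper: clone a fresh copy next to the old one and carry over
# only your own directory.
# usage: fresh_clone <clone-url> <old-clone-dir> <your_directory>
fresh_clone() {
  local url="$1" old="$2" mine="$3"
  local new="$old-fresh"
  git clone -q "$url" "$new" || return 1
  mkdir -p "$new/people"
  cp -R "$old/people/$mine" "$new/people/"   # overwrite with your edited files
  git -C "$new" status --short               # review before add, commit, push
}
```

From there, the usual git add, git commit, and git push run inside the new directory.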
someone will probably break a repo
it’s ok. the beauty of git is that nothing can really be deleted or lost (only hidden in a complicated net of old commits) and arguably the best way to learn git is to seriously screw it up once and have to dive deep into debugging :-)
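As a small example of “hidden, not lost”: even after a hard reset throws away your latest commit, the reflog still remembers it. A sketch in a throwaway repository (placeholder files and identity). The reflog is local to your clone and its entries do expire eventually (by default after 90 days, or 30 for commits no branch points to), so this is a safety net, not a backup.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Student"
git config user.email "student@example.com"

echo 'v1' > page.html && git add page.html && git commit -q -m "v1"
echo 'v2' > page.html && git commit -q -am "v2"

# "Lose" the newest commit by moving the branch back one step
git reset -q --hard HEAD~1
cat page.html                  # v1

# The reflog still records where HEAD pointed before the reset
git reflog
git reset -q --hard 'HEAD@{1}'
cat page.html                  # v2 again
```

HEAD@{1} means “where HEAD was one move ago”; the quotes keep shells like zsh from interpreting the braces.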
Web Design
The ongoing assignment of the course is building a documentation website. The git motivation story above also gives a quick guide to building websites from scratch with HTML, CSS, and JavaScript. The link to each of these websites is underneath its image, and you can find the repo for the source code here. At the bottom of each website are links to tutorials and resources for learning HTML.
It’s also helpful to look through previous years’ websites for inspiration and to see how other people set up their directories. Instead of crawling through the old GitLab repositories, you can quickly see the source code for any website by right-clicking and selecting “View Page Source” (at least in Chrome, but it should be similar in other browsers as well).
There are also a lot of free website templates online, like Bootstrap templates, or more blogpost/marketing style ones like this. They can offer a good starting point to expand your site with more javascript functions.
If you like typing all of your documentation in markdown and want to automatically convert your markdown into a website with Bootstrap themes, then Strapdownjs does exactly that!
If you want to get fancier with your markdown documentation, you can use a static site generator like Jekyll or Hugo. Erik’s 2019 git recitation site goes into more detail on these options. [[ this website was generated with jekyll alembic theme ]]
More Resources
- recording of the covid era’s recitation covering this same content
- git tutorial for beginners
- git cheat sheet
- if you want a much better and more in-depth version of the git story, check out Tom Preston-Werner’s blog post, The Git Parable (this guy co-founded GitHub).