Do you use GitHub?
If GitHub were down for any reason, how would you make sure your developers could still get to their code? And how many people actually run their code from GitHub directly into their environment?
Why do you need to back up GitHub?
As mentioned above, something could happen to your access to GitHub. That doesn’t just mean a site failure at their end; it could also be internet connectivity, or an issue within your own environment, that stops you reaching GitHub.
What if one of your developers or GitHub administrators takes down an important repository, or makes a change that needs to be rolled back? A backup covers that, and the same approach can also be used to back up any other GitHub repository that you have watched or starred.
How did we get to this topic?
Well, it was thanks to a couple of conversations, but the trigger for actually exploring things further was a quick chat with Ruairi McBride. That pushed me to go and do some digging, which led me to some articles that I will also mention as they could be useful.
The first resource I found was from Volkan Paksoy. Volkan is a software developer, so although he approached this with backup in mind he also talks about some tools that are not the norm for us infrastructure people, but he covers things really well here. The bulk of the script I used is based on Volkan’s work; I have just added some additional steps to it.
Do I need to back up my GitHub?
My argument is: how important is the code base and project work that you have within your GitHub account? Can you afford to lose it? Yes, you most likely have a version of GitHub Desktop running somewhere, but what if mistakes occur? What if you lost that copy, or were compromised? If you feel you should have a backup, there are lots of different scripts and open source tools out there, as well as some paid-for offerings, that you can use to create one.
How can I start backing up GitHub?
As I have said, there are many ways to make this happen; as with any backup methodology, it’s down to what you want to achieve. As a test, I decided to create a daily backup of my GitHub repositories. I had no concern for space, as my GitHub only really contains PowerShell or other code-based repositories and nothing of any great size, so I chose to take a full backup every day.
Volkan states in the blog above that he already had Git installed. Software developers generally will have; in my case I did not, so installing it was the first step towards getting some level of backup.
Another resource to help with this –
https://www.atlassian.com/git/tutorials/install-git#windows
We then need to connect to your GitHub. This involves a few commands that can be found here, but I will also include them below.
Open a terminal/shell and type:
$ git config --global user.name "Your name here"
$ git config --global user.email your_email@example.com
Next we need to set up SSH on your machine. In my instance this is purely a standalone machine that looks after this backup and other backup tasks; it is not a developer machine, or anywhere I am likely to consume the source code we are backing up.
If you have not generated an SSH key for access to GitHub, this resource will also help.
Connect Git to your GitHub – https://kbroman.org/github_tutorial/pages/first_time.html
Not sure if this is needed but this helped me get some folder structure in place - git clone https://hostname/YOUR-USERNAME/YOUR-REPOSITORY
https://help.github.com/en/enterprise/2.18/user/articles/cloning-a-repository
Creating personal access token with Repo Scope – https://github.com/settings/tokens
How to then compress a group of files – https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.archive/compress-archive?view=powershell-6
To create the public/private key pair, open a terminal/shell and type:
$ ssh-keygen -t rsa -C your_email@example.com
On Windows you will find the generated key files here: C:\Users\username\.ssh
- Go to your GitHub Account Settings
- Click “SSH Keys” on the left.
- Click “Add SSH Key” on the right.
- Add a label “backup” and paste the public key from id_rsa.pub into the text box
Then we can test if the above worked by running
ssh -T git@github.com
If that worked then you will get a return of
Hi username! You've successfully authenticated, but GitHub does not provide shell access.
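Because this machine will run the backup unattended, it can be handy to check for that greeting from a script. Here is a minimal sketch, written as a POSIX shell function to match the commands above (on Windows it should work in the Git Bash shell that comes with Git). The function name github_greeting_ok is a hypothetical helper, not part of Git or GitHub, and it only looks for the phrase from the message above rather than relying on an exit code.

```shell
# Hypothetical helper: succeeds when the text passed in contains the
# success phrase from GitHub's SSH greeting, and fails otherwise.
github_greeting_ok() {
    case "$1" in
        *"successfully authenticated"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not run here; needs network access and a registered key):
#   greeting=$(ssh -T git@github.com 2>&1)
#   github_greeting_ok "$greeting" || echo "SSH authentication to GitHub failed" >&2
```

The helper takes the message as an argument so the check can be tried without touching the network.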
OK, so we now have Git installed and connected to our GitHub account. Next we go back to Volkan’s page for the backup script. I have added some additional steps here, as I want a scheduled point-in-time copy of my GitHub repositories that I can access if GitHub is not available, or if someone with access maliciously deletes or edits my repositories.
#Script Original from https://volkanpaksoy.com/archive/2017/11/30/Backing-up-GitHub-Account-with-PowerShell/
#Define these three variables based on your own environment.
$backupDirectory = 'BACKUP LOCATION'
$backupretention = 'COMPRESSEDBACKUPLOCATION'
$token = 'GITUSERNAME:PERSONALACCESSTOKEN'
# Basic authentication expects the username:token pair to be Base64 encoded.
$base64Token = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($token))
$headers = @{
    Authorization = 'Basic {0}' -f $base64Token
}
Set-Location -Path $backupDirectory
$page = 1
$perPage = 30
# Request one page of repositories at a time until an empty page comes back.
Do
{
    Write-Host "Getting page: $page"
    $response = Invoke-RestMethod -Headers $headers -Uri "https://api.github.com/user/repos?page=$page&per_page=$perPage"
    foreach ($repo in $response)
    {
        $repoName = $repo.name
        $repoPath = Join-Path $backupDirectory $repoName
        Write-Host "Processing repo at path: $repoPath"
        if (-not (Test-Path $repoPath))
        {
            Write-Host "Repo doesn't exist, clone it"
            # The current location is $backupDirectory, so the clone lands in $repoPath.
            git clone $repo.ssh_url
        }
        else
        {
            Write-Host "Repo exists, update"
            # Change to repo directory to fetch updates
            Set-Location -Path $repoPath
            # fetch only updates the remote-tracking branches; the checked-out files stay as they were
            git fetch --all
            #git reset --hard origin/master
            # Change back to root backup directory
            Set-Location -Path $backupDirectory
        }
    }
    $page = $page + 1
}
While ($response.Count -gt 0)
# Optional: remove or comment out the Compress-Archive command if you do not wish to store retention points for your GitHub repositories.
# It takes a compressed point-in-time copy of the backed-up repositories, puts the date in the file name and stores it in the compressed backup location.
# -Path should be your local GitHub repository location ($backupDirectory above); this could also be a landing area that another scheduled script keeps up to date from the live GitHub repositories.
# -DestinationPath should be the target location where you wish your backups to reside, and potentially then be further protected by your backup software.
# Note: the Compress-Archive documentation says it ignores hidden files and folders, and Git for Windows marks .git as hidden by default, so open one archive and confirm the .git folders are inside.
# $backupretention must not be wrapped in single quotes here, otherwise the text '$backupretention' is used literally instead of the variable's value.
Compress-Archive -Path $backupDirectory -CompressionLevel Optimal -DestinationPath (Join-Path $backupretention ((Get-Date -Format yyyyMMdd) + '_GitHubBackup.zip')) -Force
This is what I have started to run on a scheduled basis, so I have at least one copy of my scripts and completed work outside of GitHub. The next challenge is going to be restoring that back into GitHub. If anyone has a workflow for that then please let me know and I will add it to this post.
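One possible starting point for the restore, sketched as a POSIX shell function (it should run in the Git Bash shell that comes with Git on Windows). Treat it as a sketch under assumptions rather than a proven procedure: it assumes the archive has been extracted, that the repository folder still contains its .git directory, that the clone's remote is called origin (the git clone default), and that an empty repository already exists to push into. The function name restore_repo, the remote name restore-target and the URL in the usage line are illustrative placeholders.

```shell
# Hypothetical helper: push every backed-up branch and tag to an empty repository.
#   $1 = path to one repository extracted from the backup
#   $2 = URL (or path) of the empty repository to restore into
restore_repo() {
    backup_repo=$1
    target_url=$2
    (
        cd "$backup_repo" || exit 1

        # The backup only runs "git fetch", so the newest commits sit in the
        # remote-tracking refs (refs/remotes/origin/*), not in local branches.
        # Remove the symbolic origin/HEAD first so it is not pushed as a
        # branch literally named "HEAD"; ignore the error if it is absent.
        git remote set-head origin --delete >/dev/null 2>&1 || true

        # Register the restore target under a separate remote name.
        git remote add restore-target "$target_url" || exit 1

        # Recreate each backed-up branch on the target, then push the tags.
        git push restore-target 'refs/remotes/origin/*:refs/heads/*' || exit 1
        git push restore-target --tags
    )
}

# Usage (placeholder URL, not run here):
#   restore_repo /c/Restore/MyRepo git@github.com:YOUR-USERNAME/YOUR-REPOSITORY.git
```

This only brings back what Git itself stores, meaning commits, branches and tags. Anything that lives only on GitHub, such as issues or pull requests, is not part of these clones.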
Also, you can consider a third-party service, like https://cloudback.it for automated GitHub backups
Thanks Victor, good alternative to explore also.