OSDC
Introduction
Overview of the Course
-
Goal: Become familiar with the tools and processes used for software development, both in industry and in academic institutions, through contributions to open source projects.
-
The tools we learn are all used in both academia and industry, and they could be used better in both.
-
Overview of the Open Source Development Course
-
git
-
GitHub
- Issues
- Pull Request
- Pages
- Actions (Workflows)
-
(GitLab)
-
Markdown (blog/journal, issues, etc.)
-
Docker
-
Programming languages: Python, JavaScript
-
(Functional) Testing
-
Static analysis
-
Communication
- Slack
Expected End results
- Blog posts (journal entries).
- Personal web site.
- Issues (tickets) opened on various projects.
- Pull-Requests sent to various projects.
- Development of a personal open source project.
Background of the lecturer
- Self employed
- Training
- Introducing testing, CI etc. to teams in corporations.
Planned Assignments
-
Will be in some public GitHub or GitLab repositories
-
At the end of each assignment you'll write a report - a blog post / journal entry.
-
You will add it to your personal JSON file and send a Pull-Request with the change. (We'll learn these soon)
-
To make a substantial contribution to an established project you would probably have to spend tens of hours learning about the project.
-
Contributing to small, relatively new projects needs less time as there are a lot more "easy" things to do.
-
Contributing "meta" data, such as instructions on how to set up a development environment, needs almost no internal knowledge of the project.
-
Setting up CI needs knowledge about the programming language and the tooling of that language, but not the specific project.
-
Writing tests requires you to know how to USE the project, but not necessarily the internals.
-
We start with a few projects developed by the lecturer - that way you can get fast feedback.
-
Then we'll have a few relatively simple changes to a number of open source projects.
-
Then we'll have a few slightly more difficult contributions.
Grades if relevant
- Grades are based on the work done during the course.
- There is no end-project or exam at the end.
Version Control
-
Why use version control?
-
Version 1, 2, 3 of some document
-
Version by date
-
Collaboration
-
Two people cannot edit the same file at the same time
-
Who changed what, when and why?
Version Control in Wikipedia
-
Wikipedia and the version control there. Recommended to watch:
GitHub
-
GitHub: process of contributing to an Open Source project using the GitHub web site.
-
Creating a file and sending a Pull-Request.
-
Use the cm-demo user to add the JSON file of the user.
-
Show how the CI fails when we add an incorrectly formatted file.
-
What is JSON?
-
Show the Git repository of the project and the web site generated from it.
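To make "incorrectly formatted file" concrete, here is a minimal sketch of the kind of check a CI job could run on a JSON file. The function name check_json and the fields of the sample record are assumptions made for illustration; they are not the actual check or file layout used by the project.

```python
import json


def check_json(text):
    """Return (True, data) if text is valid JSON, otherwise (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # The exception carries the position of the first problem it found.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


# Hypothetical user record; the real file may have different fields.
good = '{"name": "Foo Bar", "github": "foobar"}'
# A trailing comma is accepted by many programming languages, but not by JSON.
bad = '{"name": "Foo Bar", "github": "foobar",}'

print(check_json(good))
print(check_json(bad))
```

A CI job would typically run such a check on every file in the Pull-Request and exit with a non-zero status if any of them fails.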
Docker
Which open source project to work on?
What is the motivation?
- Do you get paid for it? - Then probably you are told which project(s) to work on.
- Are you passionate about it?
- To "show off"? Probably won't work. Find the passion!
- To learn a new language or technique.
- To learn the social interaction.
- To "give back" to the world.
There are people who work at a company that happens to develop an open source product (e.g. GitLab, WordPress) or an open source library (e.g. a driver to their own proprietary database). In this case the people working on open source do so because that's what they were told to do. They get paid for their work. In many cases these projects have two version control systems and two bug-tracking systems: one internal to the company, where they might discuss requests from clients and their own priorities, and the public VCS with its public issue-tracker.
Other people work on open source projects because they are passionate about them. They might have built a solution for themselves that caught on and now they keep maintaining and developing that piece of software. They might have used an open source project written by others, but wanted to make improvements or just wanted to "give back".
There are others who think that contributing to open source will help them with their career. I have been promoting this idea for many years, as have the people in the Maakaf community and probably others elsewhere. There were various efforts (e.g. Google Summer of Code) to introduce more people to the world of open source contribution. These programs usually manage to get people to make some contributions, but as far as I can tell very few of the participants became long-term open source contributors. Once the monetary incentive was gone, very few retained the passion. So while I still think that contributing to open source can help with your career (e.g. by gaining experience), these days I think that you need to work on your motivation, on your passion to contribute, and to see that your volunteer contribution makes the world a better place.
Your own project
Pro: You can decide on everything:
- programming language
- technology stack
- architecture
- code layout
- testing
Contra: You have to do everything yourself.
- programming
- documentation
- examples
- design
- logo
- web site for the project
Drawbacks:
- No users.
- No cooperation from others (at least at the beginning).
- No community.
Advice:
- Something like that probably already exists. It may be better to find it and contribute to that project.
- Take an existing proprietary product and implement an Open Source alternative
Should you start your own open source project or should you contribute to an existing one?
When talking about Open Source or contributing to Open Source, many people immediately think of developing something new on their own and "making it Open Source", that is, publishing it under an Open Source license.
Writing your own project is really nice. It lets you make all the decisions in the world. It is also much easier to get started on a new green-field project than to contribute to an existing code-base written by someone else, or even just to finish a project you started earlier.
We all start projects in the hope that many people will use them, but the reality is that most projects have very few users. I personally started a lot of open source projects and most of them were never used by anyone. Some I have not even used myself. I think the 80/20 rule applies here in an even more extreme form: 1% of the open source libraries are used by 99% of the projects, and a huge percentage of projects are not used by anyone. Not even the author.
As a practice to gain experience it is great to write your own project, but unless your project somehow hits that sweet spot that makes it popular, the impact will be small. Writing your own project, unless it becomes popular, also means you don't learn how to read other people's code, and you don't get to interact with users and developers who want to contribute to your project.
I don't want to discourage you from starting your own project, but I'd like to encourage you to also contribute to existing projects.
Here is an idea:
If you have an idea for a project, start writing it. As you pull in dependencies, check those dependencies and try to contribute to them and to their dependencies as well. This will give you the opportunity to experience both sides: working on your own with all the freedom to make decisions on one hand, and working with other people within the constraints of their projects on the other.
A well-known project
- Linux kernel - source code
- SQLite - source code
- PostgreSQL
- VS Code - GitHub
- Blender - Get Involved
- VLC - GitLab
- Moodle - GitHub
- WordPress
- MediaWiki - GitHub (Wikipedia)
- Django - GitHub
- GitLab
Pro: Sounds really good.
Contra: Very difficult.
- All the simple problems are already solved.
- The code-base is huge and quite complex.
Opportunities:
- There might be need for documentation, translation, examples.
Contributing to well-known, established projects used by millions of people sounds really nice. However, it has its difficulties.
For one, those projects have been around for quite a while, so any simple feature is probably already implemented and any simple bug is probably already fixed. So adding a new feature or fixing a bug will probably require a lot of time, both to understand the code and to make the changes.
These projects are also usually managed in a much stricter way than less established projects. That means you'll probably have to adhere to coding standards, commit-message standards, etc. much more than in a less established project. It is actually beneficial for you to learn such practices, but it will make it harder to contribute to such a project.
However, even established projects have typos, and they also lack documentation and good usage examples. Many have translations, either as part of the application (localization) or as translated documentation. They also probably have lots of users that need help.
So there are many ways to contribute to well-known projects as well, just keep in mind that it might need more investment on your part.
Join a brand new project
- Explore GitHub
- Pick the topic and the language
- Sort by "Recently updated" and pick something with 0-1 stars
Pro:
- Lots of opportunities to contribute.
- You don't have to come up with an idea for a project.
Contra:
- The author might not be interested in external contributors (yet).
- It might not go anywhere.
Something that you use
- A (well known?) desktop or web application.
- A web site you use.
- Wikipedia (MediaWiki)
- DEV.to
- PyPI
- Readthedocs
- Crates.io
- MetaCPAN
- API of a web site you use.
- Dependencies of the applications at work.
A project that is missing something
-
Missing meta data from the package.
-
Missing Continuous Integration.
-
Not enough tests (low test coverage).
-
PyDigger (Python)
-
CPAN Digger (Perl)
A project by an organization
- Open Source projects by organizations
- Universities
- Companies
- Governments
- Non-profits
Good first issues
Issues labeled as good first issue
Awesome lists
There are many collections of "important" or "interesting" or "awesome" projects. These lists are manually maintained. They can be a nice source of, well, interesting projects.
By country or by (human) language
The topics are set by the developers of the project. It is unclear (to me) what each developer means by setting a country name or a human language as a topic, but these filters might be useful.
Type of project
- Operating System specific application? (e.g. something for Windows, macOS, or Linux only?)
- Desktop application
- Web applications
Desktop applications
Web application
-
Learning Management Systems (LMSes)
JavaScript frameworks
HTML/CSS frameworks
Databases
Compilers
Networking (TCP/IP)
Static Site Generators
CMS - Content Management System
In the OSDC
- Start with small, non-coding changes to projects maintained by the mentor (get used to the process, fast feedback)
- Open Source by organization
- Coding projects of each-other.
- Small contributions to other projects.
Other
- Open source Co-pilot
- Open source AI systems?
Entry points
- Assuming the project has a web site visit that web site.
- Look for links to "Community", "Forum". Maybe "Support" or "help"
- Maybe "contact".
- Look for "Docs", "Documentation" and/or "FAQ".
- Look for links to "Source", "Source code", "GitHub" or Octocat, the GitHub logo.
Collaborative Development and Open Source Projects
Videos
The material covered in these slides has also been recorded. You can check out the videos by following the links here.
- Hebrew
- English - TBD
Book
A much more detailed version of the material is also available in eBook format.
Who is this for?
- For non-programmers who might not know git and GitHub yet.
- For programmers who don't know git or GitHub.
- For programmers who know git, but don't know GitHub well enough.
- For people who know git and GitHub, but never contributed to an Open Source project.
Why do it?
- Why is it important to be able to contribute to Open Source projects?
Reasons to contribute
-
Why do people contribute to open source projects?
-
What is the motivation of people to contribute to open source projects?
-
That is the job.
-
Want to fix a bug / add a feature to something that the person uses.
-
Hope for better employment opportunities.
-
A project used by the company has a bug, or needs a feature, documentation, or tests.
-
You want to learn something new. Doing it is the best way to learn it.
-
For fun.
You use an open source product, e.g. Firefox, VLC, WordPress, or Moodle. There is a bug that annoys you or a feature you really want to have. On one hand the promise of Open Source is that you can make those changes both technically as you have access to the source code and legally as you have the proper licenses. On the other hand these are large and very complex applications and are probably written in a programming language that you are not familiar with.
How to contribute to a large, established Open Source project?
I think we can assume that there are no simple bugs or simple missing features in large, established projects. What they are
Scratch your own itch
Historically the most common reason to contribute to Open Source was "scratching your own itch". That is, a person used a piece of open source software that had an annoying bug, or the person wanted a feature that did not exist. With proprietary software one cannot do anything. Even reporting the bug or submitting the feature request is almost impossible in most cases. Getting the company that developed the software to acknowledge and fix the problem hardly ever happens.
With open source it is usually very easy to report the problem, and if the person has the technical knowledge then, at least in theory, the bug can be fixed or the feature can be added.
In reality it can be a lot more complex than "just fix it", depending on the complexity of the project and the culture around it, but as a primary motivation this is probably the strongest. So if you would like to contribute to an open source project, one of the best directions might be to find a project you use that has some issues and fix them.
I am going to go over a number of projects to see how easy or difficult it might be to contribute to that project.
Customer support - help - documentation
Developers might think that customer support is not a fancy thing, but it turns out that help for the users is sorely missing in most open source projects, both in large projects such as Moodle or VLC and in smaller ones like mdbook.
There are always people who don't know how to do things with the software or who encounter things that don't work as they expected. Each Open Source project has some forum where people can ask questions or report problems. Figuring out the solution and verifying whether a reported problem is indeed a problem with the product takes a lot of time. Taking that off the hands of the core developers will help them a lot, and it is also an excellent way to familiarize yourself with the application and the code base.
In many cases the reported issue comes in because the user did not find the documentation describing how that part works. If there is no such documentation, then this is an opportunity to add it. If there is such documentation, then maybe the wording has to be adjusted. There are many cases where users are not familiar with the jargon used in the project, or for some other reason use words different from the ones in the documentation.
For example, recently I was trying to figure out, as a reader, how to get a notification when a new version is published, but I could not find the answer. I sent an email to their support. Within a few hours I got a link to the explanation and a note that they had updated the response with the keywords I used.
We are all different, we use different words, most of us are not native English speakers, and even the English speakers use different words or different spellings for the same thing depending on their country.
So by improving the documentation you can reduce the frustration of the users and the time wasted on support. You can do it proactively by writing documentation, or you can do it in response to questions from users.
Do you need to be a programmer to contribute to open source projects?
A common misconception is that only programmers can contribute to open source projects. Being a programmer of course makes it possible for you to make changes to the source code of the application, but there are tons of other things that need to be done in a project, especially if it is a large, end-user facing project such as Firefox, VLC, Moodle, or mdbook.
A few of the areas where one can help:
- QA - Quality Assurance. There is always a need for help with checking the quality of these open source projects.
- Someone needs to act as the product manager trying to understand what the users would like to have and if that's something the product should actually do.
- Someone needs to provide customer support.
- Someone needs to write documentation.
- Maybe there is a need for creating "marketing material". That could be a nice web site for the project, a logo, nice images etc.
- There is also a need to help with fundraising. Many open source projects and many open source developers could do a lot more if they got some payment for their time.
Overview: Git - GitHub - Travis-CI
- Git - the most popular Open Source Distributed VCS
- GitHub - the most popular cloud-based hosting service for Git repositories.
- BitBucket
- GitLab
- Pull-Request
- Travis-CI - cloud-based Continuous Integration service for GitHub based projects.
Why use a Version Control System - VCS?
- Replace manual versioning.
- Easier collaboration.
- Easy to look at history and go back to earlier versions.
- Safe to experiment.
- Safe to delete old stuff.
Why Git?
- Most popular Open Source VCS
- Distributed VCS (DVCS)
- De-facto standard. Now.
Why GitHub?
- GitHub
- Cloud based hosting for Git repositories
- Public vs Private
- fork
- pull-request
CI = Continuous Integration
- Continuous Integration with Travis-CI
- Unit and integration tests
- Appveyor CI for Windows
- Circle-CI
Travis-CI
- Travis-CI
- Cloud based Continuous Integration service for GitHub based projects.
- Open projects free, closed projects $$
- Virtual Machine for each push and for each pull-request
- Run any code to check your project.
- Usually automated tests
Register on GitHub
- GitHub
- Privacy!
- e.g. use: username+github@gmail.com
Hacktoberfest
GitHub names
- The "official" repository of a project.
- clone of the "official" repo on the computer of the main developer.
- fork the project (copy to your GitHub account).
- clone your fork to your computer.
Task: Edit the README file
- edit-readme
- Edit the README.md on GitHub adding your name to it. Then send a Pull-request.
- Create a file YOUR-NAME.md with some content and send a PR.
- GitHub flavored Markdown
- Raw
- Edit file
- Send Pull Request
Task: Edit a CSV file
- What is a CSV file?
- edit-csv
- Edit the file on GitHub, adding the name of Snow White both in English and Hungarian.
- Fork manually!
- .travis.yml
- test.py
- Observe Travis
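As an illustration of what a test script for a CSV file might do, here is a minimal sketch that reports rows with the wrong number of fields. The column layout (English name, Hungarian name), the sample rows, and the function name check_rows are assumptions for this example; they are not the contents of the actual test.py in the repository.

```python
import csv
import io


def check_rows(text, expected_columns=2):
    """Return (line_number, row) pairs for rows that do not have the expected number of fields."""
    reader = csv.reader(io.StringIO(text))
    # reader.line_num is the number of the line that was read most recently.
    return [(reader.line_num, row) for row in reader if len(row) != expected_columns]


# Hypothetical content: one header row and one data row.
sample = "English,Hungarian\nSnow White,Hófehérke\n"
# The same content with a row that is missing its second field.
broken = sample + "Grumpy\n"

print(check_rows(sample))
print(check_rows(broken))
```

When a check like this runs in CI on every Pull-Request, a contributor who forgets one of the two names sees the failure before a maintainer has to look at the change.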
Task: Edit a JSON file
- What is a JSON file?
- edit-participants
- Edit the file on your computer adding your username and name. Send a PR.
Git
-
GitHub: visit edit-participants and fork the project.
-
git clone git@github.com:cm-demo/participants.git
-
git branch add-myself
-
git checkout add-myself
-
edit the participants.json file
-
git status
-
git diff
-
git add participants.json
-
git commit -m "some excuse"
-
git push
-
git push --set-upstream origin add-myself
-
GitHub: Send Pull-Request
-
gitk --all
-
git pull
-
git remote
-
git rebase
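The part of the command list above that takes a change from a local edit to a pushed branch can be collected in a small helper. This is only a sketch: the function names are made up for this example, it assumes that the repository has already been cloned and that the file has already been edited, and it only prints the commands unless run_all is called explicitly.

```python
import shlex
import subprocess


def contribution_commands(branch, filename, message):
    """Return the git commands, as argument lists, to run inside an existing clone."""
    return [
        ["git", "branch", branch],          # create the branch
        ["git", "checkout", branch],        # switch to it (uncommitted edits come along)
        ["git", "status"],                  # which files changed?
        ["git", "diff"],                    # what changed in them?
        ["git", "add", filename],           # stage the edited file
        ["git", "commit", "-m", message],   # record the change locally
        # On a new branch a plain "git push" usually stops with a hint,
        # because the branch has no upstream yet. This form sets it.
        ["git", "push", "--set-upstream", "origin", branch],
    ]


def run_all(commands, cwd):
    """Run each command in the directory cwd, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = contribution_commands(
    "add-myself", "participants.json", "Add myself to the participants"
)
for command in commands:
    print(shlex.join(command))
```

The commit message in the example describes the change, which is what a reviewer of the Pull-Request will want to read.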
Task: Update Code-Maven articles or these slides
-
slides is the repository of some of my slides.
-
slider-py is the code that generates the slides.
-
code-maven.com is the repository of all the content.
-
Code-Maven workshops and on GitHub
-
If you have encountered any issue with the slides or with the (public) articles, you can help me fix them.
Task: Code and Talk
- Code and Talk.
- GitHub repository.
- Pick an event from the missing_events.md file (not all the entries are events, some might be already included).
- Add the JSON file representing the event to the data/events directory.
- See EVENTS for details.
Task: Awesome for beginners and non-programmers
- Awesome for beginners.
- Awesome for non-programmers
- List of projects with tasks that can be done by beginner contributors.
Task: Pydigger
- PyDigger
- Look at the stats page and find a recent PyPI package on GitHub without Travis-CI.
- Try to use it.
- If it has tests, configure Travis-CI, add .travis.yml
- If no tests, then first write a test and then set up Travis-CI.
Testing and CI
- Testing libraries is relatively easy
- Testing plugins is harder
- UI/GUI testing is to be avoided
- Setup-teardown is usually the hard part
- Mocking
- CI - Continuous Integration
- CD - Continuous Delivery (or even Deployment)
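Setup-teardown and mocking are easier to discuss with a small example at hand. The sketch below uses the standard unittest module. The two functions under test, count_participants and fetch_status, are hypothetical and exist only for this example. setUp creates a temporary JSON file before each test, tearDown removes it afterwards, and the mock replaces the network call so the test also runs without internet access.

```python
import json
import os
import tempfile
import unittest
import urllib.request
from unittest import mock


def count_participants(path):
    """Hypothetical function under test: number of entries in a JSON list file."""
    with open(path) as fh:
        return len(json.load(fh))


def fetch_status(url):
    """Hypothetical function under test: HTTP status code of a URL."""
    with urllib.request.urlopen(url) as response:
        return response.status


class ExampleTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test: create a fresh directory and data file.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, "participants.json")
        with open(self.path, "w") as fh:
            json.dump([{"github": "foobar"}], fh)

    def tearDown(self):
        # Runs after every test, even a failing one: remove the directory.
        self.tmpdir.cleanup()

    def test_count(self):
        self.assertEqual(count_participants(self.path), 1)

    def test_status_without_network(self):
        # Replace urlopen with a fake whose response reports status 200.
        with mock.patch("urllib.request.urlopen") as fake:
            fake.return_value.__enter__.return_value.status = 200
            self.assertEqual(fetch_status("https://example.org/"), 200)
        fake.assert_called_once_with("https://example.org/")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would live in their own file and a CI job would run them with a test runner on every push.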
Projects
Some large or well-known Open Source projects and how to contribute to them.
How to contribute to Moodle?
Moodle is a web-based open source learning platform, a.k.a. Learning Management System (LMS), written in PHP. Most people encounter it as students or teachers at some academic institution. For example, at the Weizmann Institute of Science we use Moodle to communicate with students, to announce assignments, and to grade them. It also collects the recordings of the lectures, and students can access the recordings from their Moodle account.
I also encountered it at Azrieli College of Engineering in Jerusalem when I taught the Open Source Development Course there for a semester.
The gap
One of the difficulties with such an application is that when you encounter a problem or when some functionality is missing, you don't know why.
It is unclear if the problem you encounter is due to a decision of the local admins, due to using an old version of Moodle, or a real issue that is also present in the development version of Moodle.
So you don't know if you need to ask the local admins to change the configuration, or to ask them to upgrade the version of Moodle, or if someone really needs to send a fix to the developers of Moodle.
If you find out that it is really an issue with Moodle and you fix it, how long will it take till it is released and the administrators upgrade the local installation of Moodle?
So I guess, unless you have a long-term need for that Moodle installation, it will be a hard sell to get you to invest energy in improving it.
Steps to take
In any case if you as a student, teacher, or administrative worker encounter a problem, probably the best thing is to get help from the system administrators of the local installation or your local support people.
If they can't help then you might want to explore the various support channels of Moodle.
Who will contribute to Moodle?
Based on this, my expectation is that only those who work as system administrators and install Moodle for their educational institutions will have any interest in contributing upstream to the Moodle project. They have control over the installation and thus they can decide when to upgrade. So if there is an improvement in Moodle, they are the ones who can bring that change to the local user-base.
There might be a case to get a bunch of Computer Science (or programming) students and as part of their course teach them how to contribute to Moodle, but this needs investment from the teacher and maybe from the core developers of Moodle as well.
What to contribute to Moodle?
On the web site of Moodle you can find many Moodle communities in various languages. For many contributors probably the first step is to get involved in the local community and help them.
Maybe get involved with the localization (translation) efforts.
There is a link to the documentation, which, I guess, always needs improvement.
Moodle has a plugin system that probably means you can add features by developing or improving a plugin. So it might not be necessary to get involved in the main project to have an impact.
How to contribute to nmap?
As described on its web site nmap is a Network exploration tool and security / port scanner.
It is a command line tool that also has an official GUI called Zenmap.
As a Cyber Security expert you are probably already familiar with it or if not yet then this is a good opportunity to learn how to use it.
In any case the first step to contribute to a project is to learn how to use it. Try to accomplish various tasks. This will bring up questions. Try to find answers to those questions. If you cannot find a good answer in the official documentation, that can be the first thing to contribute, once you learn the correct answer.
Visiting the project web site I noticed that it is basically maintained by a single person, as shown on the about/contact page.
Under docs I see it has its documentation translated to 15 languages. Is yours among them? Does that translation need help? (They almost always do.) If your language is not there, maybe consider doing the translation.
Visiting the GitHub repository of nmap, I saw that several languages are used in the repository: C, Lua, C++, Shell, Python. This gives more people the opportunity to contribute, but it might also make it a lot more difficult to contribute code, as you might need to be familiar with more than one language.
It also seems that the project actually uses Subversion as its main repository and the GitHub is only a mirror. However there are some 276 Open and 617 Closed Pull-requests on the GitHub repository so apparently you don't need to deal with Subversion in order to contribute.
There are also 591 Open issues that need attention.
Running zenmap on the command line revealed that it is written in Python. It seems to be in the same repository as nmap itself.
Apparently nmap was featured in some really high-profile movies, such as The Matrix Reloaded and Ocean's 8.
PyPI
PyPI is the central registry of all the 3rd party Python libraries. When you use pip or any other tool to install a dependency, by default it consults the API of PyPI to get the distribution and all of its dependencies.
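To get a feeling for what "consulting the API of PyPI" means, here is a minimal sketch that queries the JSON API of PyPI for the metadata of a package. Note that pip itself talks to the "simple" index of PyPI. The JSON endpoint used here is a separate interface that is easier to read for humans. The function names are made up for this example, and latest_version needs network access, so it is defined but not called.

```python
import json
import urllib.request


def pypi_json_url(package):
    """URL of the JSON metadata of a package on PyPI."""
    return f"https://pypi.org/pypi/{package}/json"


def latest_version(package):
    """Fetch the metadata over the network and return the latest released version."""
    with urllib.request.urlopen(pypi_json_url(package)) as response:
        data = json.load(response)
    return data["info"]["version"]


print(pypi_json_url("requests"))
```

Calling latest_version("requests") on a machine with internet access returns the version string of the most recent release of that package.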
When people release a new version of their Python package they upload it to PyPI.
So when people talk about contributing to Python they usually talk about improving one of the packages and uploading it to PyPI, but who maintains PyPI? Can one contribute to it?
If you visit PyPI and scroll to the bottom you can see that it is available in a number of languages including Hebrew, which indicates it should also support RTL (Right-to-left) rendering. Those translations need maintenance and more translations could be added.
Also at the bottom of the page I found a link to the warehouse in the GitHub organization of PyPI. That organization has a number of other projects in it as well, but looking at the warehouse one can see that there are 75 open pull-requests and 441 open issues. There are certainly things to do.
One of the nice things about working on a project like PyPI itself is that you can also get involved in the operational aspects of a real high-load system. It is not like contributing to a framework, which might be important and satisfying, but is quite distant from operations.