I have 42 repos that are now going through a process (not THE process, because I’m pretty sure they make this up as they go…) to be released as open source.
I needed to add a license comment to JavaScript, Python, YAML, Ruby, ERB, Haml, JSON, XML, Bash shell, and Groovy source code files.
Each comment takes a different form and in some cases we want to sidecar a filename.license file with the license info.
And they need to be validated and checked and then checked back in.
So, I wrote a script. It’s not refactored, and it truly is a hacked-together way of doing this, but it works.
The process is
- create a directory licensing
- cd licensing
- clone open_source repo from Nebula org
git clone --recurse-submodules reponame
./license_inserter.sh pipeline-demo
git checkout -b add_license
git add -A
git commit -m "add licensing"
git push --set-upstream origin add_license
The script:
#!/bin/bash

### 1. gather a list using grep of this type of file from the file extension
### 2. parse the list and add license after shebang or at top of file or as a sidecar file containing the license info (json, xml)
### 3. check master_list and NOT_licensed_list for any mis-identified or missed files
### 4. remove the created files
### 5. git checkout -b add_license
### 6. PR add_license to rc in Nebula

### ARGS
# $1 is the target directory
if [[ (-z "$1") ]]
then
  echo "USAGE: $0 <target_directory>"
  exit 1
fi

echo ""
echo "FILES..."
echo ""
read -p "Press enter to continue and create lists of files to license..."
echo ""

### GATHER LISTS
# instantiate master list
date +%Y%m%d%H%M%S > master_list

## grab any files with license|License|LICENSE string present and list them separately for eval
echo "creating list of files with license string present in them for later review"
# list all files
du -ak $1 | grep -v .git | awk '{ print $2 }' > /tmp/allfiles_list
date +%Y%m%d%H%M%S > license_list
for i in `cat /tmp/allfiles_list`
do
  if [[ ! -d ${i} ]]; then
    # skip directory
    if [[ `grep -i license ${i}` ]]; then
      echo "found license string"
      grep -i license ${i}
      echo $i >> license_list
    fi
  fi
done
rm -f /tmp/allfiles_list
echo "completed look for license string..."

echo "start exec for create files"
# the extension patterns escape the dot, so that e.g. .erb files do not also land in rb_list
# ruby
if [[ `du -ak $1 | grep -v .git | grep -E '\.rb$' | awk '{ print $2 }'` ]]; then
  echo "found rb files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.rb$' | awk '{ print $2 }' > rb_list
  du -ak $1 | grep -v .git | grep -E '\.rb$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find rb files to list"
fi

# javascript
if [[ `du -ak $1 | grep -v .git | grep -E '\.js$' | awk '{ print $2 }'` ]]; then
  echo "found js files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.js$' | awk '{ print $2 }' > js_list
  du -ak $1 | grep -v .git | grep -E '\.js$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find js files to list"
fi

# erb
if [[ `du -ak $1 | grep -v .git | grep -E '\.erb$' | awk '{ print $2 }'` ]]; then
  echo "found erb files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.erb$' | awk '{ print $2 }' > erb_list
  du -ak $1 | grep -v .git | grep -E '\.erb$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find erb files to list"
fi

# haml
if [[ `du -ak $1 | grep -v .git | grep -E '\.haml$' | awk '{ print $2 }'` ]]; then
  echo "found haml files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.haml$' | awk '{ print $2 }' > haml_list
  du -ak $1 | grep -v .git | grep -E '\.haml$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find haml files to list"
fi

# yml
if [[ `du -ak $1 | grep -v .git | grep -E '\.yml$' | awk '{ print $2 }'` ]]; then
  echo "found yml files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.yml$' | awk '{ print $2 }' > yml_list
  du -ak $1 | grep -v .git | grep -E '\.yml$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find yml files to list"
fi

# sh
if [[ `du -ak $1 | grep -v .git | grep -E '\.sh$' | awk '{ print $2 }'` ]]; then
  echo "found sh files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.sh$' | awk '{ print $2 }' > sh_list
  du -ak $1 | grep -v .git | grep -E '\.sh$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find sh files to list"
fi

# j2
if [[ `du -ak $1 | grep -v .git | grep -E '\.j2$' | awk '{ print $2 }'` ]]; then
  echo "found j2 files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.j2$' | awk '{ print $2 }' > j2_list
  du -ak $1 | grep -v .git | grep -E '\.j2$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find j2 files to list"
fi

# init
if [[ `du -ak $1 | grep -v .git | grep -E '\.init$' | awk '{ print $2 }'` ]]; then
  echo "found init files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.init$' | awk '{ print $2 }' > init_list
  du -ak $1 | grep -v .git | grep -E '\.init$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find init files to list"
fi

# py
if [[ `du -ak $1 | grep -v .git | grep -E '\.py$' | awk '{ print $2 }'` ]]; then
  echo "found py files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.py$' | awk '{ print $2 }' > py_list
  du -ak $1 | grep -v .git | grep -E '\.py$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find py files to list"
fi

# css
if [[ `du -ak $1 | grep -v .git | grep -E '\.css$' | awk '{ print $2 }'` ]]; then
  echo "found css files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.css$' | awk '{ print $2 }' > css_list
  du -ak $1 | grep -v .git | grep -E '\.css$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find css files to list"
fi

# sql
if [[ `du -ak $1 | grep -v .git | grep -E '\.sql$' | awk '{ print $2 }'` ]]; then
  echo "found sql files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.sql$' | awk '{ print $2 }' > sql_list
  du -ak $1 | grep -v .git | grep -E '\.sql$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find sql files to list"
fi

# xml
if [[ `du -ak $1 | grep -v .git | grep -E '\.xml$' | awk '{ print $2 }'` ]]; then
  echo "found xml files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.xml$' | awk '{ print $2 }' > xml_list
  du -ak $1 | grep -v .git | grep -E '\.xml$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find xml files to list"
fi

# groovy
if [[ `du -ak $1 | grep -v .git | grep -E '\.groovy$' | awk '{ print $2 }'` ]]; then
  echo "found groovy files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.groovy$' | awk '{ print $2 }' > groovy_list
  du -ak $1 | grep -v .git | grep -E '\.groovy$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find groovy files to list"
fi

# json
if [[ `du -ak $1 | grep -v .git | grep -E '\.json$' | awk '{ print $2 }'` ]]; then
  echo "found json files, listing..."
  du -ak $1 | grep -v .git | grep -E '\.json$' | awk '{ print $2 }' > json_list
  du -ak $1 | grep -v .git | grep -E '\.json$' | awk '{ print $2 }' >> master_list
else
  echo "failed to find json files to list"
fi

# jenkinsfile (look under the target directory, $1)
if [[ -f $1/cicd/pipeline/Jenkinsfile ]]; then
  echo "found cicd/pipeline/Jenkinsfile..."
  echo "$1/cicd/pipeline/Jenkinsfile" > jenkinsfile_list
  echo "$1/cicd/pipeline/Jenkinsfile" >> master_list
fi
if [[ -f $1/Jenkinsfile ]]; then
  echo "found Jenkinsfile at top level of repo (alternate location)"
  echo "$1/Jenkinsfile" >> jenkinsfile_list
  echo "$1/Jenkinsfile" >> master_list
fi

### MASTER LIST, NOT_licensed_list
# list all other files in target repo NOT collected (looking for anomalies)
echo "creating list of files NOT licensed in this run..."
# grep -v -f file2 file1
# create a list of all files
du -ak $1 | grep -v .git | awk '{ print $2 }' > all_list
# collect files NOT processed for licensing
grep -v -f master_list all_list > NOT_licensed_list
# drop directories from NOT listed
rm -f /tmp/NOT_temped
for i in `cat NOT_licensed_list`
do
  if [[ ! -d ${i} ]]; then
    echo "${i}" >> /tmp/NOT_temped
  fi
done
mv /tmp/NOT_temped NOT_licensed_list
rm -f all_list

echo ""
echo "PARSE..."
echo ""
read -p "Press enter to continue & parse the list files..."
echo ""

### LICENSING ADDED
echo "start parsing files"
## check for each list
if [[ -f rb_list ]]; then
  echo "found rb list"
  for i in `cat rb_list`
  do
    echo "$i"
    # check if directory
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

if [[ -f js_list ]]; then
  echo "found js list"
  for i in `cat js_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "// Copyright 2018, Oath Inc." >> /tmp/temped
        echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "// Copyright 2018, Oath Inc." >> /tmp/temped
        echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

if [[ -f erb_list ]]; then
  echo "found erb list"
  for i in `cat erb_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "<%# Copyright 2018, Oath Inc. %>" >> /tmp/temped
        echo "<%# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms. %>" >> /tmp/temped
        echo "" >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "<%# Copyright 2018, Oath Inc. %>" >> /tmp/temped
        echo "<%# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms. %>" >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

if [[ -f haml_list ]]; then
  echo "found haml list"
  for i in `cat haml_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "-# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "-# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "-# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "-# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

if [[ -f yml_list ]]; then
  echo "found yml list"
  for i in `cat yml_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    elif [[ "${i}" =~ "installer.yml" ]]; then
      echo "skipping installer.yml file $i"
    else
      # assumes the first line of the file is "---"; it is written back here and dropped by tail below
      echo "---" >> /tmp/temped
      echo "" >> /tmp/temped
      echo "# Copyright 2018, Oath Inc." >> /tmp/temped
      echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
      echo "" >> /tmp/temped
      #cat ${i} >> /tmp/temped
      tail -n +2 ${i} >> /tmp/temped
      mv /tmp/temped $i
    fi
  done
fi

if [[ -f sh_list ]]; then
  echo "found sh list"
  for i in `cat sh_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

# j2 (sidecar license)
if [[ -f j2_list ]]; then
  echo "found j2 list"
  for i in `cat j2_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      echo "Copyright 2018, Oath Inc." > ${i}.license
      echo "Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> ${i}.license
    fi
  done
fi

# init
if [[ -f init_list ]]; then
  echo "found init list"
  for i in `cat init_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

# py
if [[ -f py_list ]]; then
  echo "found py list"
  for i in `cat py_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "# Copyright 2018, Oath Inc." >> /tmp/temped
        echo "# Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

# css
if [[ -f css_list ]]; then
  echo "found css list"
  for i in `cat css_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      # comment added at start to temp file, place file
      echo "/* Copyright 2018, Oath Inc. */" >> /tmp/temped
      echo "/* Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms.*/" >> /tmp/temped
      echo "" >> /tmp/temped
      cat ${i} >> /tmp/temped
      mv /tmp/temped $i
    fi
  done
fi

# sql
if [[ -f sql_list ]]; then
  echo "found sql list"
  for i in `cat sql_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      # comment added at start to temp file, place file
      echo "/* Copyright 2018, Oath Inc. */" >> /tmp/temped
      echo "/* Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms.*/" >> /tmp/temped
      echo "" >> /tmp/temped
      cat ${i} >> /tmp/temped
      mv /tmp/temped $i
    fi
  done
fi

# xml
if [[ -f xml_list ]]; then
  echo "found xml list"
  for i in `cat xml_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      # firstline will be xml_version e.g. <?xml version="1.0" encoding="UTF-8"?>
      # license comes right after that
      FIRSTLINE=`head -n 1 ${i}`
      echo "version string: ${FIRSTLINE}"
      # create temp file and then add back the version line, cat added content in, move into place with comment placed
      echo "${FIRSTLINE}" > /tmp/temped
      echo "" >> /tmp/temped
      echo "<!-- Copyright 2018, Oath Inc. -->" >> /tmp/temped
      echo "<!-- Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms. -->" >> /tmp/temped
      echo "" >> /tmp/temped
      #cat ${i} >> /tmp/temped
      tail -n +2 ${i} >> /tmp/temped
      mv /tmp/temped $i
    fi
  done
fi

# groovy
if [[ -f groovy_list ]]; then
  echo "found groovy list"
  for i in `cat groovy_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      FIRSTLINE=`head -n 1 ${i}`
      if [ "`head -c 2 ${i}`" == '#!' ]; then
        echo "hashbang present"
        # create temp file and then add back hashbang, cat added content in, move into place with comment placed
        echo "${FIRSTLINE}" > /tmp/temped
        echo "" >> /tmp/temped
        echo "// Copyright 2018, Oath Inc." >> /tmp/temped
        echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        #cat ${i} >> /tmp/temped
        tail -n +2 ${i} >> /tmp/temped
        mv /tmp/temped $i
      else
        echo "no hashbang"
        # comment added at start to temp file, place file
        echo "// Copyright 2018, Oath Inc." >> /tmp/temped
        echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
        echo "" >> /tmp/temped
        cat ${i} >> /tmp/temped
        mv /tmp/temped $i
      fi
    fi
  done
fi

## json
# json (sidecar license)
if [[ -f json_list ]]; then
  echo "found json list"
  for i in `cat json_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      echo "Copyright 2018, Oath Inc." > ${i}.license
      echo "Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> ${i}.license
    fi
  done
fi

## Jenkinsfile at cicd/pipeline/Jenkinsfile or at top level (Jenkinsfile)
if [[ -f jenkinsfile_list ]]; then
  echo "found jenkinsfile list"
  for i in `cat jenkinsfile_list`
  do
    echo "$i"
    if [[ -d ${i} ]]; then
      echo "skipping $i as directory..."
    else
      echo "// Copyright 2018, Oath Inc." > /tmp/temped
      echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
      echo "" >> /tmp/temped
      cat ${i} >> /tmp/temped
      mv /tmp/temped $i
    fi
  done
fi

# NOTE: ${i} below is whatever the last loop left behind, not the repo root
if [[ -f ${i}/cicd/pipeline/Jenkinsfile ]]; then
  echo "// Copyright 2018, Oath Inc." > /tmp/temped
  echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
  echo "" >> /tmp/temped
  cat ${i}/cicd/pipeline/Jenkinsfile >> /tmp/temped
  mv /tmp/temped ${i}/cicd/pipeline/Jenkinsfile
  echo "${i}/cicd/pipeline/Jenkinsfile" > jenkinsfile_list
fi
if [[ -f ${i}/Jenkinsfile ]]; then
  echo "// Copyright 2018, Oath Inc." > /tmp/temped
  echo "// Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms." >> /tmp/temped
  echo "" >> /tmp/temped
  cat ${i}/Jenkinsfile >> /tmp/temped
  mv /tmp/temped ${i}/Jenkinsfile
  echo "${i}/Jenkinsfile" > jenkinsfile_list
fi

### Verify
echo ""
echo ""
echo "******************* VERIFY *********************"
echo ""
read -p "Press enter to continue and verify list files..."
echo ""
echo ""
for i in *_list
do
  echo "--------------------------------------------------------------"
  echo $i
  echo "--------------------------------------------------------------"
  cat $i
  echo ""
  if [[ ("${i}" == 'license_list') || ("${i}" == 'master_list') || ("${i}" == 'NOT_licensed_list') ]]; then
    echo "skipping list ${i}..."
  else
    for j in `cat ${i}`
    do
      echo "--------------------------------------------------------------"
      echo ${j}
      echo "--------------------------------------------------------------"
      head -n 5 ${j}
      echo ""
    done
  fi
done

### verify before deletion
echo ""
echo "FILES..."
echo ""
read -p "Press enter to continue and delete list files..."
echo ""

### CLEANUP
echo "before rm section"
## delete each list if present
for i in rb js erb haml yml sh j2 init py css sql xml groovy master NOT_licensed license json jenkinsfile
do
  if [[ -f "${i}_list" ]]; then
    echo "found $i list, removing..."
    rm ${i}_list
  fi
done
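Most of the per-extension blocks in the script differ only in the comment prefix, so the hashbang handling could be pulled into one function. What follows is only a sketch of that idea, not what the script does today: the function name prepend_license and its two arguments are made up for illustration, it uses mktemp rather than a fixed /tmp/temped, and it writes the result back with cat instead of mv so the target file keeps its permissions.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Prepends the two-line notice to a file, keeping a leading "#!" line first.
#   $1 = file to modify
#   $2 = line-comment prefix, e.g. "#", "//" or "-#"
prepend_license() {
  local file="$1" prefix="$2" tmp start=1
  tmp="$(mktemp)" || return 1

  if [ "$(head -c 2 "$file")" = '#!' ]; then
    head -n 1 "$file" > "$tmp"   # keep the hashbang as the first line
    echo "" >> "$tmp"
    start=2                      # the rest of the file starts at line 2
  fi

  {
    echo "$prefix Copyright 2018, Oath Inc."
    echo "$prefix Licensed under the terms of the Apache 2.0 license. See LICENSE file in https://github.com/NebulaCICD/licenses/blob/master/Apache_2.0_license for terms."
    echo ""
    tail -n "+$start" "$file"
  } >> "$tmp"

  # overwrite in place so mode bits (e.g. the executable bit) are preserved
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

With something like this, the rb, sh, py and init loops would each reduce to a call such as prepend_license "$i" "#", and the js and groovy loops to prepend_license "$i" "//". The block-comment forms (css, sql, erb, xml) would still need their own handling.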
With any luck this’ll go through and get released in the next several weeks…
— doug
Updated 20190128 Monday
Or not. It looks like this will not be directly released from Oath (or Verizon Media Group) as Open Source. That’s always somewhat problematic from a company view of the world, and in this case, no joy.