Tooling Landscape: Alignment with OpenChain terminology #61

Open
misappi opened this issue Sep 3, 2019 · 14 comments
@misappi
Collaborator

misappi commented Sep 3, 2019

Hi,
I did a comparison between the glossary of the tooling landscape (TL) and the terminology around tooling of the OpenChain project (OC). I propose the following changes in the glossary (I can make the changes, but first I would like to discuss them with the community):

  • "Component Analysis Service" -> "Component Scanner".
    This tool scans software (source code or binaries) to identify the contained (open source) components. OC distinguishes between binary scanners and source code scanners. IMO, we do not need to make this distinction in the TL. However, in the TL we should distinguish between the identification of contained components and the identification of the corresponding licenses (OC does this, the TL has not done so far).

For documentation purposes I list additional differences between the TL and OC terminology below. I do not think that we should keep the TL terminology in these cases but want to document this decision:

| OpenChain | Tooling Landscape | Remark |
| --- | --- | --- |
| License Scanner | License & Copyright Scanner | TL term seems more appropriate |
| Notices File | FOSS Compliance Bundle | Though the TL term is more complex it explains the object better |
| Component Catalogue | Product Metadata Repository, Component Metadata Repository, License Metadata Repository | The OC terminology is quite coarse grained. The TL terminology is more explicit and more detailed. Therefore, I prefer the TL terms. |

Thanks,
Michael

@blaumeiser-at-bosch
Collaborator

I am a bit confused, you say that you would like to change the terminology towards OC, but your remarks indicate that the TL definitions are more accurate.

Concerning the component analysis service, the intention was to express that there is a service that runs the scan, not only a tool, because IMO this is a manual step that requires an expert to deal with the raw results coming from the tool, the License and Copyright Scanner. The point you make about identification is, I admit, a bit hidden. The "Build Tool" description contains the sentences: "During this process, the build technology has a technology dependent way to identify and provide dependencies needed to build and run the software. This information is used to run the compliance check on the project." I agree that we could highlight this a bit more.

@misappi
Collaborator Author

misappi commented Sep 6, 2019

@blaumeiser-at-bosch : Thanks for your remarks. Just for clarification: I looked at OC to figure out where it could make sense to adopt some of the OC components and/or terminology into our TL and where that doesn't make sense (and to document this decision).

IMO, the description of the "build tool" is OK, but the description of the "component analysis service" is misleading. With the current description, the service is a collection of tools that provide license information (at least this is my understanding). I question whether we need such a component in addition to the "license & copyright scanner" (which is another component in the landscape). If the "component analysis service" is mainly a manual service, we should discuss whether we want to model manual services in the landscape and, if so, adjust the description.

Do you agree to the content I put in the table - the things that we should not adopt from OC?

@shanecoughlan
Collaborator

I would strongly suggest we align with OpenChain terminology wherever possible. OpenChain will shortly be the ISO standard for open source compliance. Any deviation in terminology will serve to cause confusion and require constant re-explanation to new stakeholders.

@blaumeiser-at-bosch
Collaborator

@misappi : My reason for adding this as a service was that there is a break in the tooling: someone has to do something manually that is not automated. If you modeled the flow in a sequence diagram without this service, you would see the license & copyright scanner being called at some point, coming back with a result, and that result being used directly in the compliance automation. Perhaps we can model this differently, but IMO we have to express this break in the workflow. If Thomas modeled this, he might do it differently, because ORT, for example, gathers the information completely and generates a report that is then analysed in a later step on a project basis. My understanding of how MCJ uses SW360 with Fossology, however, is that the request is sent to Fossology automatically, but in the end someone has to clean up the raw data and come back with an analysis report.
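To make the break in the workflow concrete, here is a rough Python sketch. It is only an illustration under assumptions: the names `RawFinding`, `ReviewedFinding`, `scan`, `review` and `compliance_check` are hypothetical and do not correspond to the API of ORT, SW360, Fossology or any other tool. The only point is that the compliance step consumes reviewed findings, so the manual review shows up as its own step in the flow.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RawFinding:
    """One unreviewed result as it might come back from a scanner (hypothetical shape)."""
    path: str
    detected_license: str


@dataclass
class ReviewedFinding:
    """A finding after a person has confirmed or corrected it."""
    path: str
    concluded_license: str
    reviewer: str


def scan(detections: Dict[str, str]) -> List[RawFinding]:
    # Stand-in for the automated scanner call; no real scanner is invoked here.
    return [RawFinding(path, lic) for path, lic in sorted(detections.items())]


def review(raw: List[RawFinding], corrections: Dict[str, str], reviewer: str) -> List[ReviewedFinding]:
    # The manual step: an expert keeps or overrides each raw detection.
    return [
        ReviewedFinding(f.path, corrections.get(f.path, f.detected_license), reviewer)
        for f in raw
    ]


def compliance_check(findings: List[ReviewedFinding], allowed: Set[str]) -> List[str]:
    # Downstream automation: returns the paths whose concluded license is not allowed.
    return [f.path for f in findings if f.concluded_license not in allowed]


raw = scan({"src/a.c": "MIT", "src/b.c": "GPL-2.0-only"})
unchanged = review(raw, {}, reviewer="analyst")
corrected = review(raw, {"src/b.c": "BSD-3-Clause"}, reviewer="analyst")
allowed = {"MIT", "BSD-3-Clause"}
```

Here `compliance_check(unchanged, allowed)` flags `src/b.c`, while `compliance_check(corrected, allowed)` flags nothing, so the outcome depends on the review step and not only on the scanner.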

@misappi
Collaborator Author

misappi commented Sep 6, 2019

@blaumeiser-at-bosch : I don't mind having "component analysis service" in the model, but then we need to rephrase the description to make it clearer.

@blaumeiser-at-bosch
Collaborator

@shanecoughlan : I agree with you in principle, but given the findings from @misappi I would recommend reconsidering the terminology in OC, because at least the names imply a subset of what is really needed from my point of view. Why?

  • Because we need the copyright information as well
  • The notices file might not be enough to fulfil compliance
  • We have three kinds of information plus metadata: licenses, component releases, and deliverable products with their trace to the component releases they use.

I have not looked into the definition of OC and since a name is only a tag for a concept, it might be ok, but at least with the Notices File, my impression is that there is a conceptual mismatch to what we need, e.g., the sources that need to be passed to a customer because of requirements in the license.

@misappi
Collaborator Author

misappi commented Sep 6, 2019

@shanecoughlan : Regarding the differences I list in the table:

  • 1 and 2 are more or less only about the naming
  • 3 is - also - about the structure. IMO we could have a component "component catalogue" in the TL that has the mentioned TL components as sub components.

@shanecoughlan
Collaborator

@misappi and @blaumeiser-at-bosch, I think we might be talking at cross purposes here.

Let me set context for OpenChain. From the spec:
“This specification defines the key requirements of a quality Open Source license compliance program. The objective is to provide a benchmark that builds trust between organizations exchanging software solutions comprised of Open Source software. Specification conformance provides assurance that a Program has been designed to produce the required Compliance Artifacts (i.e., legal notices, source code and so forth) for each software solution. The OpenChain Specification focuses on the “what” and “why” aspects of a Program rather than the “how” and “when”. This ensures flexibility for different organizations of different sizes in different markets to choose specific policy and process content that fits their size, goals and scope.”

This specification has been developed in a four year process with over 150 contributors, including significant contributions from companies involved in this tooling group prior to beginning the larger process of collaborating with our Japanese and other stakeholders. Each term used in the OpenChain Spec has been discussed at length by technical, legal and management representatives from companies of different sizes across three continents. The terms are purposeful and intended to provide clarity in the context of managing open source license obligations globally. We cannot and should not seek to unilaterally redefine any aspects of this industry standard outside of the established processes: the work teams, the calls and the face to face meetings of the Steering Committee. It has been mindfully created, recorded and can be audited by any party to ensure each decision has a clear beginning and end point. The primary “single source of truth” is the specification mailing list.

Illustrating one example from the beginning of this thread:

License Scanner or License & Copyright Scanner
Comment was:
“TL term seems more appropriate”

Not from a legal perspective. Each and every open source license is a copyright license, applying terms on distribution of the code (and not otherwise), with those terms potentially including references to other matters such as patents but at no point deviating from the mechanism of legal rights provided under copyright. The mechanism is simple: to distribute the copyrighted third party IP you need to agree to the conditions of the IP holder as described in the license.

The TL phrase identified is redundant and potentially confusing, suggesting that the licenses and copyright may be two different things.

The key item that we should align around is the specification here:
https://wiki.linuxfoundation.org/_media/openchain/openchainspec-2.0.pdf
(Version 2 of the industry standard for oss compliance, already heading into ISO)

The key area of the OpenChain specification itself is Section 3.1:
3.1 Bill of Materials
A process exists for creating and managing a bill of materials that includes each Open Source component (and its Identified Licenses) from which the Supplied Software is comprised.
Verification Material(s):
3.1.1 A documented procedure for identifying, tracking, reviewing, approving, and archiving information about the collection of Open Source components from which the Supplied Software is comprised.
3.1.2 Open Source component records for the Supplied Software that demonstrates the documented procedure was properly followed.
Rationale:
To ensure a process exists for creating and managing an Open Source component bill of materials used to construct the Supplied Software. A bill of materials is needed to support the systematic review and approval of each component’s license terms to understand the obligations and restrictions as it applies to the distribution of the Supplied Software.

From a quick read, Component Catalogue is not in the defined terms of the Specification itself so there may be flexibility around how we address it in FAQs and so forth. However, the appropriate space to discuss that is to the international audience of hundreds of companies engaging with OpenChain, not in an isolated manner on this issue. In Japan alone we have 68 companies - mostly multinationals - syncing their efforts and building shared consensus regarding how we can refer to, share information and develop practical processes - some tooling processes, some legal processes, some business processes - that will be shared with thousands of companies across the global supply chain. In two weeks we have over a dozen Chinese companies in Shenzhen doing a face to face to discuss the same. In Taipei local companies meet two days later. Meanwhile, global activities such as SPDX are seeing co-development of production ready SBOMs using SPDX-Lite, again explicitly in collaboration between continents.

We see usage of the Component Catalogue terminology from OpenChain members such as Toshiba discussing sw360 in this presentation:
https://events.linuxfoundation.org/wp-content/uploads/2018/07/OpenSourceSummitJapan_final.pdf
Here they are expressing how sw360 is a repository of Software Bill of Materials they derive from using the FOSSology license scanner. The best course forward is to reach out to these stakeholders, cite where terms are used, and build consensus or clarification from there. However, we should not be assuming that any term used in the context of OpenChain has been chosen without detailed open discussion with an open invitation to all parties over a period of years. Each choice has been made in the context of open source license compliance - legal, technical and business process workshop, and not confined to one area such as tooling. We cannot seek to arbitrarily change these choices at this juncture.

@misappi
Collaborator Author

misappi commented Sep 6, 2019

> From a quick read, Component Catalogue is not in the defined terms of the Specification itself so there may be flexibility around how we address it in FAQs and so forth.

@shanecoughlan : I took the OC components from the OC curriculum slides.

@shanecoughlan
Collaborator

@misappi, Adding some notes for the rest of the thread.
The definition was taken from Chapter 10, slide 41 onward: Tooling Types.

Component Catalogue: Introduction
  • Purpose:
      • Collect information about used software components and their use in products or projects is centrally collected and can be reused
  • Other purposes:
      • A component catalogue captures also the used components in a product or project, maintains a so-named BOM
  • Also interesting:
      • Enables also vulnerability management or reuse of export classifications

Component Catalogue: Solved Problem
  • Problem: Once analysed component w.r.t. license compliance shall not require repeated analyses, but reuse of information shall be possible
  • Component catalogue:
      • Maps component usage in products or projects
      • Makes sense if an organisation has actually multiple products
      • Shows organisation the important software components
      • Allows for a comprehensive overview about involved licensing per product

Component Catalogue: Technical
  • A component catalogue can be viewed as a portal
  • Database holding the catalogue information
  • Another use case is archiving OSS distributions / source code
  • Storing also multiple other files, for example license analysis reports, SPDX files
  • Provides reporting output, for example OSS product documentation
  • Component catalogue can be implemented as Web portal, thus accessible from various client computers in organisation

Component Catalogue: More Remarks
  • Component catalogue can be integrated with other license compliance tooling: scanners can directly feed the analyses
  • Also integration in Dev Ops tooling is useful to automatically create BOM of products
  • Component catalogues can also serve uses cases for vulnerability management
  • Another related topic is license management and license metadata
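As a rough illustration of the catalogue described in these slides, the following Python sketch stores each analysed component release once, maps releases to the products that use them, and derives a per-product BOM and license overview. Everything in it is an assumption made for illustration: the class and method names (`ComponentCatalogue`, `register`, `use_in`, `bom`, `licenses_for`, `products_using`) and the sample components are hypothetical and are not taken from the slides, from sw360, or from any other tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (component name, version)


@dataclass(frozen=True)
class ComponentRelease:
    """One analysed component release with its license identifiers."""
    name: str
    version: str
    licenses: Tuple[str, ...]


@dataclass
class ComponentCatalogue:
    """Hypothetical in-memory catalogue: releases plus their usage in products."""
    releases: Dict[Key, ComponentRelease] = field(default_factory=dict)
    usage: Dict[str, Set[Key]] = field(default_factory=dict)

    def register(self, release: ComponentRelease) -> ComponentRelease:
        # Keep the first stored analysis so it is reused rather than repeated.
        return self.releases.setdefault((release.name, release.version), release)

    def use_in(self, product: str, name: str, version: str) -> None:
        key = (name, version)
        if key not in self.releases:
            raise KeyError(f"{name} {version} has not been analysed yet")
        self.usage.setdefault(product, set()).add(key)

    def bom(self, product: str) -> List[ComponentRelease]:
        # Bill of materials for one product, sorted by (name, version).
        return [self.releases[k] for k in sorted(self.usage.get(product, set()))]

    def licenses_for(self, product: str) -> Set[str]:
        # Overview of the licenses involved in one product.
        return {lic for rel in self.bom(product) for lic in rel.licenses}

    def products_using(self, name: str, version: str) -> List[str]:
        # Reverse lookup, e.g. for vulnerability management.
        return sorted(p for p, keys in self.usage.items() if (name, version) in keys)


catalogue = ComponentCatalogue()
catalogue.register(ComponentRelease("libfoo", "1.0", ("MIT",)))
catalogue.register(ComponentRelease("libbar", "2.3", ("Apache-2.0",)))
catalogue.use_in("product-a", "libfoo", "1.0")
catalogue.use_in("product-a", "libbar", "2.3")
catalogue.use_in("product-b", "libfoo", "1.0")
```

With this data, `catalogue.bom("product-a")` lists `libbar` and `libfoo`, and `catalogue.products_using("libfoo", "1.0")` returns both products, which is the reuse across multiple products that the slides describe.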

Personally, I do not see any overly coarse-grained or confusing material with respect to this definition in the slides:
A Component Catalogue:
"Collect information about used software components and their use in products or projects is centrally collected and can be reused"
+
"A component catalogue captures also the used components in a product or project, maintains a so-named BOM"
+
"Enables also vulnerability management or reuse of export classifications"

I do not see a benefit in confining the discussion to "Product Metadata Repository, Component Metadata Repository, License Metadata Repository"

Note: the above tooling slides were created in a collaboration between principals in the Software Compliance Academy. The principals in that organization are @mcjaeger, Catharina Maracke (Lawyer) and Miriam Ballhausen (Lawyer).

@shanecoughlan
Collaborator

Adding @OliverFendt

@blaumeiser-at-bosch
Collaborator

@shanecoughlan Thank you Shane for the clarifications! I think this is valuable input for our discussion and I understand now better the idea behind the terms.

@OliverFendt
Contributor

Hi
regarding the "Component Analysis Service": in the Wednesday afternoon meeting there was a proposal to rename this to "Forensic Code Analysis Service". I personally think that the proposed name is more accurate. What do you think?

Concerning the entire discussion, I will walk through it and provide my view.

@jthDEV
Collaborator

jthDEV commented Sep 6, 2019

Hi,
just to add to the confusion, I would like to point out that by selecting a term you somehow indicate what you want to achieve. I would distinguish between three approaches:

  1. identify the declared license => Component Analysis (outside view)
  2. identify effective license => Source code scan of component (inside view)
  3. identify provenance and security => forensic source code scan (further assessing inside view)

This is not a given definition but would be my internal matching...

6 participants