article-seirdy-an-experiment-to-test-github-copilot-s-legality.mw - tgtimes - T…
---
.SH seirdy
An experiment to test GitHub Copilot's legality
.2C 157v
.
.QP
This article was posted on 2022-07-01 by Rohan Kumar
.FS
https://seirdy.one/posts/2022/07/01/experiment-copilot-legality/
gemini://seirdy.one/posts/2022/07/01/experiment-copilot-legality/index.g…
.FE
and is now republished in this newspaper, with permission (CC-BY-SA 4.0).
.
.
.IP "Preface"
.
.PP
I am not a lawyer.
This post is satirical commentary on:
.
.IP \(bu
The absurdity of Microsoft and OpenAI's legal justification for GitHub C…
.
.IP \(bu
The oversimplifications people use to argue against GitHub Copilot (I do…
.
.IP \(bu
The relationship between capital and legal outcomes.
.
.IP \(bu
How civil cases seem like sporting events where people “win” or “l…
.
.PP
In the process, I intentionally misrepresent how the judicial system wor…
I portray the system the way people like to imagine it works.
Please don't make any important legal decisions based on anything I say.
.
.PP
The only section you should take seriously is “Context:
the relevant technologies”.
.
.
.IP "Introduction"
.
.PP
GitHub is enabling copyleft violation \fBat scale\fR with Copilot.
GitHub Copilot encourages people to make derivative works of source code…
This facilitates the creation of permissively-licensed or proprietary de…
.
.PP
Unfortunately, challenging Microsoft (GitHub's parent company) in court …
their legal budget probably ensures their victory, and they likely alrea…
How can we determine Copilot's legality on a level playing field? We can…
.
.PP
A chat with Matt Campbell about a speech synthesizer gave me a horrible …
I think I know a way to find out if GitHub Copilot is legal:
we could use its legal justification against another software project wi…
Specifically, against a speech synthesizer.
The outcome of our actions could set a legal precedent to determine the …
.
.
.IP "Context: the relevant technologies"
.
.PP
Let's cover the technologies and actors at play before I start my evil m…
.
.
.IP "Exhibit A: GitHub Copilot"
.
.PP
GitHub Copilot is a predictive autocompletion service for writing softwa…
It's powered by OpenAI Codex,
.FS
https://openai.com/blog/openai-codex/
.FE
a language model based on GPT-3.
.FS
https://en.wikipedia.org/wiki/GPT-3
.FE
It was trained using the source code of public repositories hosted on Gi…
In response to a Request for Comments from the US Patent and Trademark O…
.FS
See Comment Regarding Request for Comments on Intellectual Property Prot…
for Artificial Intelligence Innovation submitted by OpenAI to the USPTO.
https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-581…
.FE
.
.PP
Many of the code snippets it suggests are exact copies of source code fr…
For an example, see this tweet:
“I don't want to say anything but that's not the right license Mr Copilot”
.FS
https://nitter.net/mitsuhiko/status/1410886329924194309
https://twitter.com/mitsuhiko/status/1410886329924194309
.FE
by Armin Ronacher.
.FS
https://lucumr.pocoo.org/about/
.FE
It contains a screen recording of Copilot suggesting this Quake code.
.FS
https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_m…
At line 552
.FE
When prompted to do so, it obediently fills in a permissive license.
That permissive license violates the Quake code's GPL-2.0 license.
Copilot provides no indication that a license violation is taking place.
.
.PP
GitHub performed its own research into the matter.
.FS
I doubt anybody worth their salt would count on a company to hold itself
accountable, but at least they tried.
.FE
You can read about it on their blog:
GitHub Copilot research recitation,
.FS
https://github.blog/2021-06-30-github-copilot-research-recitation/
.FE
by Albert Ziegler.
.FS
https://github.com/wunderalbert
.FE
I'm not convinced that it accounts for the fact that suggested code migh…
.
.
.IP "Exhibit B: The Eloquence speech synthesizer"
.
.PP
I recently had a chat with Matt on IRC about screen readers and differen…
I mentioned that while I do like some variety, I always find myself retu…
.FS
https://github.com/espeak-ng/espeak-ng/
.FE
He shared some of my fondness, and also shared his preference for a simi…
.
.PP
Downloads of Eloquence are easy to find (it's even included with the JAW…
Nuance acquired Eloquent Technology, the developer of Eloquence.
Microsoft later acquired Nuance.
.
.
.IP "Eloquence sample audio"
.
.PP
Matt recorded this sample audio clip of Eloquence reading some text.
.FS
https://seirdy.one/a/eloquence.mp3
.FE
The text is from the introduction of Best practices for inclusive textua…
.FS
https://seirdy.one/posts/2020/11/23/website-best-practices/
.FE
.
.QP
My primary focus is inclusive design.
Specifically, I focus on supporting underrepresented ways to read a page.
Not all users load a page in a common web-browser and navigate effortles…
Authors often neglect people who read through accessibility tools, tiny …
I list more niches in the conclusion.
Compatibility with so many niches sounds far more daunting than it reall…
if you only selectively override browser defaults and use plain-old, sem…
.
.PP
I like the Eloquence speech synthesizer.
It sounds similar to the robotic yet predictable voice of my beloved eSp…
Unfortunately, Eloquence is proprietary.
.
.
.IP "Exhibit C: Deep learning speech synthesis"
.
.PP
Deep learning speech synthesis
.FS
https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis
.FE
is a recent approach to speech synthesizer creation.
It involves training a deep neural network on voice samples, and using t…
One synthesizer using deep learning speech synthesis is Mozilla's TTS.
.FS
https://github.com/mozilla/TTS
.FE
.
.PP
Zero-shot approaches could allow a pre-trained model to generate multipl…
YourTTS
.FS
https://doi.org/10.48550/arXiv.2112.02418
.FE
is one such example.
This could allow us to synthetically re-create a person's voice more eas…
.
.
.IP "My horrible plan"
.
.PP
My horrible plan revolves around going through two different lawsuits to…
.
.PP
If this succeeds, we have new legal justification that GitHub Copilot is…
It's a win-win situation.
.
.
.IP "Part One: set a precedent"
.
.IP 1.
Train a modern text-to-speech (TTS) engine using the voice of a proprietary…
Keep the model's internals hidden.
.
.IP 2.
Then release the final TTS under a permissive license.
Remember, we're still keeping the machine-learning model hidden!
.
.IP 3.
Wait for that company to file suit.
.FS
If the stars align, you could file an anticipatory suit against the comp…
It's common for declaratory judgement regarding intellectual property ri…
https://en.wikipedia.org/wiki/Declaratory_judgment
.FE
.
.IP 4.
Win or lose the case.
.
.
.IP "Part Two: use that precedent against Microsoft's Nuance"
.
.PP
Our goal here is to get the same legal outcome as the low-stakes “tria…
.
.PP
Microsoft owns Nuance.
Nuance previously bought Eloquent Technology, the developers of the Eloq…
.
.IP 1.
Repeat Part One against Nuance speech synthesizers, including Eloquence.
Go to court.
.
.IP 2.
Have the ruling from Part One cited as legal precedent.
.
.IP 3.
Achieve the same outcome as Part One, demonstrating that we have indeed …
.
.
.IP "Implications of the outcomes"
.
.PP
If we \fIwin\fR both cases:
Microsoft has the legal high ground.
Making a derivative of a copyrighted work using a machine-learning algor…
.
.PP
If we \fIlose\fR both cases:
Microsoft does not have the legal high ground.
We have good judicial precedent against Microsoft to use when filing sui…
.
.PP
Either way, it's an absolute win for free software.
Taking down Copilot protects copyleft from enabling proprietary derivati…
But if we accidentally win these two low-stakes “test” cases, we sti…
we can liberate huge swaths of proprietary software, starting with speec…
.
.
.IP "Update: on satire"
.
.PP
This post isn't “satire through-and-through” like something from The…
Rather, my intent was to make some clear points, but extrapolate them to…
I don't think I was clear enough when doing this.
I'm sorry.
.
.PP
Copilot has been found to suggest significant amounts of code that is da…
It does this without disclosing obligations that come with those works' …
Training a model on copyrighted works may not be wrong in and of itself;…
Copilot's users could apply proprietary licenses to the generated works,…
.
.PP
When a tool almost exclusively encourages problematic behavior, the make…
GitHub and OpenAI have not demonstrated a sufficiently careful approach.
.
.PP
I don't think that “going after” a smaller player just to manipulate…
The fact that this idea seems plausible to some of my readers shows how …
Even if it's accurate (I doubt it's accurate, but I'm not certain), it's…
Judicial systems incentivise too much predatory behavior.
.
.
.IP "Corrections"
.
.PP
It's come to my attention that Eloquence may or may not still belong to …
Further research is needed.
Eloquent Technology was acquired by SpeechWorks in 2000.