[2]About Myself[3]Find Me On...[4]Open Source[5]Archive

[1]Dawid Ciężarkiewicz aka `dpc`

The faster you unlearn OOP, the better for you and your software

November 20, 2018

Object-oriented programming is an exceptionally bad idea which could
only have originated in California.

— Edsger W. Dijkstra

Maybe it's just my experience, but [6]Object-Oriented Programming seems
like the default, most common paradigm of software engineering: the one
typically taught to students, featured in online material and, for some
reason, spontaneously applied even by people who didn't intend to use
it.

I know how seductive it is, and how great an idea it seems on the
surface. It took me years to break its spell, and to understand clearly
how horrible it is and why. Because of this perspective, I have a
strong belief that it's important that people understand what is wrong
with OOP, and what they should do instead.

Many people have discussed the problems with OOP before, and I will
provide a list of my favorite articles and videos at the end of this
post. Before that, I'd like to give my own take on it.

Data is more important than code

At its core, all software is about manipulating data to achieve a
certain goal. The goal determines how the data should be structured,
and the structure of the data determines what code is necessary.

This part is very important, so I will repeat it: goal -> data
architecture -> code. One must never change the order here! When
designing a piece of software, always start by figuring out what you
want to achieve, then at least roughly think about the data
architecture: the data structures and infrastructure you need to
achieve it efficiently. Only then write your code to work within that
architecture. If the goal changes over time, alter the architecture
first, then change your code.

In my experience, the biggest problem with OOP is that it encourages
ignoring the data model architecture and applying a mindless pattern of
storing everything in objects, promising some vague benefits. If it
looks like a candidate for a class, it goes into a class. Do I have a
Customer? It goes into class Customer. Do I have a rendering context?
It goes into class RenderingContext.

Instead of building a good data architecture, the developer's attention
is moved toward inventing “good” classes, relations between them,
taxonomies, inheritance hierarchies and so on. Not only is this a
useless effort, it's actually deeply harmful.

Encouraging complexity

When explicitly designing a data architecture, the result is typically
a minimum viable set of data structures that support the goal of our
software. When thinking in terms of abstract classes and objects, there
is no upper bound to how grandiose and complex our abstractions can be.
Just look at [7]FizzBuzz Enterprise Edition: the reason such a simple
problem can be implemented in so many lines of code is that in OOP
there's always room for more abstractions.

OOP apologists will respond that keeping abstractions in check is a
matter of developer skill. Maybe. But in practice, OOP programs tend
to only grow and never shrink, because OOP encourages it.

Graphs everywhere

Because OOP requires scattering everything across many, many tiny
encapsulated objects, the number of references to these objects
explodes as well. OOP requires passing long lists of arguments
everywhere or holding references to related objects directly to
shortcut it.

Your class Customer has a reference to class Order and vice versa.
class OrderManager holds references to all Orders, and thus indirectly
to Customers. Everything tends to point to everything else because, as
time passes, there are more and more places in the code that need to
refer to a related object.

[8]You wanted a banana but what you got was a gorilla holding the
banana and the entire jungle.

Instead of a well-designed data store, OOP projects tend to look like a
huge spaghetti graph of objects pointing at each other and methods
taking long argument lists. When you start to design Context objects
just to cut down on the number of arguments passed around, you know
you're writing real OOP Enterprise-level software.

Cross-cutting concerns

The vast majority of essential code does not operate on just one
object: it actually implements cross-cutting concerns. Example: when
class Player hits() a class Monster, where exactly do we modify data?
Monster's hp has to decrease by Player's attackPower, and Player's xps
increase by Monster's level if Monster got killed. Does it happen in
Player.hits(Monster m) or Monster.isHitBy(Player p)? What if there's a
class Weapon involved? Do we pass it as an argument to isHitBy or does
Player have a currentWeapon() getter?

This oversimplified example with just 3 interacting classes is already
becoming a typical OOP nightmare. A simple data transformation becomes
a bunch of awkward, intertwined methods that call each other for no
reason other than the OOP dogma of encapsulation. Adding a bit of
inheritance to the mix gets us a nice example of what stereotypical
“Enterprise” software is about.
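
For contrast, here is a rough Python sketch of the same rule written as
one plain function over plain data. The field names follow the example
above; resolve_hit and weapon_bonus are hypothetical names invented for
this illustration, and the damage formula is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Player:
    attack_power: int
    xp: int = 0


@dataclass
class Monster:
    hp: int
    level: int


def resolve_hit(player: Player, monster: Monster, weapon_bonus: int = 0) -> bool:
    """Apply one hit and return True if this hit killed the monster.

    weapon_bonus is a hypothetical stand-in for whatever a Weapon adds.
    """
    was_alive = monster.hp > 0
    damage = player.attack_power + weapon_bonus
    monster.hp = max(0, monster.hp - damage)
    killed = was_alive and monster.hp == 0
    if killed:
        # Rule from the text: XP grows by the monster's level on a kill.
        player.xp += monster.level
    return killed


player = Player(attack_power=7)
monster = Monster(hp=10, level=3)
first = resolve_hit(player, monster)                   # 3 hp left
second = resolve_hit(player, monster, weapon_bonus=2)  # hp clamped to 0
```

The whole transformation lives in one place, so the question of which
class "owns" the hit does not come up.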

Object encapsulation is schizophrenic

Let's look at the definition of [9]Encapsulation:

Encapsulation is an object-oriented programming concept that binds
together the data and functions that manipulate the data, and that
keeps both safe from outside interference and misuse. Data
encapsulation led to the important OOP concept of data hiding.

The sentiment is good, but in practice, encapsulation at the
granularity of an object or a class often leads to code trying to
separate everything from everything else (and from itself). It
generates tons of boilerplate: getters, setters, multiple constructors,
odd methods, all trying to protect from mistakes that are unlikely to
happen, on a scale too small to matter. The metaphor that I give is
putting a padlock on your left pocket, to make sure your right hand
can't take anything from it.

Don't get me wrong – enforcing constraints, especially on [10]ADTs is
usually a great idea. But in OOP with all the inter-referencing of
objects, encapsulation often doesn't achieve anything useful, and it's
hard to address the constraints spanning across many classes.

In my opinion, classes and objects are just too granular, and the right
places to focus on isolation, APIs etc. are the boundaries of
“modules”/“components”/“libraries”. And in my experience, OOP
(Java/Scala) codebases are usually the ones in which no
modules/libraries are employed. Developers focus on putting boundaries
around each class, without much thought about which groups of classes
together form a standalone, reusable, consistent logical unit.

There are multiple ways to look at the same data

OOP requires an inflexible data organization: splitting the data into
many logical objects, which in turn defines the data architecture: a
graph of objects with associated behavior (methods). However, it's
often useful to have multiple ways of logically expressing data
manipulations.

If program data is stored e.g. in a tabular, data-oriented form, it's
possible to have two or more modules each operating on the same data
structure, but in a different way. If the data is split into objects
with methods it's no longer possible.
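
A small Python sketch of that idea, with made-up rows and function
names: two independent "modules" read the same tabular data, each
grouping it in its own way.

```python
from collections import defaultdict

# One tabular data set; every value here is invented for illustration.
orders = [
    {"order_id": 1, "customer_id": 10, "total": 25.0, "day": "2018-11-19"},
    {"order_id": 2, "customer_id": 11, "total": 40.0, "day": "2018-11-19"},
    {"order_id": 3, "customer_id": 10, "total": 15.0, "day": "2018-11-20"},
]


def revenue_per_customer(rows):
    """A "billing" view: the same rows grouped by customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["total"]
    return dict(totals)


def revenue_per_day(rows):
    """A "reporting" view: the same rows grouped by day."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["day"]] += row["total"]
    return dict(totals)


by_customer = revenue_per_customer(orders)
by_day = revenue_per_day(orders)
```

Neither function needs the other, and neither needs the rows to be
wrapped in a class that decides up front which grouping is the natural
one.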

That's also the main reason for [11]Object-relational impedance
mismatch. While relational data architecture might not always be the
best one, it is typically flexible enough to be able to operate on the
data in many different ways, using different paradigms. However, the
rigidness of OOP data organization causes incompatibility with any
other data architecture.

Bad performance

The combination of data scattered between many small objects, heavy use
of indirection and pointers, and the lack of a proper data architecture
in the first place leads to poor runtime performance. Nuff said.

What to do instead?

I don't think there's a silver bullet, so I'm going to just describe
how it tends to work in my code nowadays.

First, data considerations come first. I analyze what the inputs and
outputs are going to be, along with their format and volume; how the
data should be stored at runtime and how it should be persisted; and
what operations will have to be supported and how fast (throughput,
latencies) etc.

Typically the design is something close to a database for the data that
has any significant volume. That is: there will be some object like a
DataStore with an API exposing all the necessary operations for
querying and storing the data. The data itself will be in the form of
ADT/PoD structures, and any references between the data records will
take the form of an ID (number, uuid, or a deterministic hash). Under
the hood, it typically closely resembles, or actually is backed by, a
relational database: Vectors or HashMaps storing the bulk of the data
by Index or ID, some other ones for the “indices” that are required for
fast lookup, and so on. Other data structures like LRU caches etc. are
also placed there.
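
A minimal Python sketch of such a store, assuming a Customers/Orders
domain. The record fields, method names and the total_spent helper are
all hypothetical; a real DataStore would expose whatever operations the
program needs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Customer:
    id: int
    name: str


@dataclass(frozen=True)
class Order:
    id: int
    customer_id: int  # reference by ID, not by object pointer
    total: float


class DataStore:
    def __init__(self) -> None:
        self._customers: Dict[int, Customer] = {}
        self._orders: Dict[int, Order] = {}
        # Secondary "index": customer ID -> order IDs, for fast lookup.
        self._orders_by_customer: Dict[int, List[int]] = {}

    def add_customer(self, customer: Customer) -> None:
        self._customers[customer.id] = customer

    def add_order(self, order: Order) -> None:
        if order.customer_id not in self._customers:
            raise KeyError(f"unknown customer {order.customer_id}")
        if order.id in self._orders:
            raise ValueError(f"duplicate order {order.id}")
        self._orders[order.id] = order
        self._orders_by_customer.setdefault(order.customer_id, []).append(order.id)

    def orders_of(self, customer_id: int) -> List[Order]:
        ids = self._orders_by_customer.get(customer_id, [])
        return [self._orders[order_id] for order_id in ids]


def total_spent(store: DataStore, customer_id: int) -> float:
    """Business logic as a plain function that takes the store."""
    return sum(order.total for order in store.orders_of(customer_id))


store = DataStore()
store.add_customer(Customer(id=1, name="Ada"))
store.add_order(Order(id=100, customer_id=1, total=25.0))
store.add_order(Order(id=101, customer_id=1, total=15.0))
spent = total_spent(store, 1)
```

Because records refer to each other only by ID, there is no object
graph to untangle: every lookup goes through the store.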

The bulk of actual program logic takes a reference to such DataStores,
and performs the necessary operations on them. For concurrency and
multi-threading, I typically glue different logical components via
message passing, actor-style. Examples of actors: stdin reader, input
data processor, trust manager, game state, etc. Such “actors” can be
implemented as thread-pools, elements of pipelines etc. When required,
they can have their own DataStore or share one with other “actors”.
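
As a rough illustration in Python, one such "actor" can be a thread
that reads messages from an inbox queue and writes results to an outbox
queue. The upper-casing "processor" and the STOP sentinel are
assumptions made up for this sketch, not a prescribed design.

```python
import queue
import threading

STOP = object()  # sentinel message telling an actor to shut down


def processor_actor(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Hypothetical "input data processor": upper-cases each line."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            break
        outbox.put(msg.upper())


def run_pipeline(lines):
    """Feed lines to the actor and collect everything it emits."""
    raw, processed = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=processor_actor, args=(raw, processed))
    worker.start()
    for line in lines:
        raw.put(line)
    raw.put(STOP)

    results = []
    while True:
        msg = processed.get()
        if msg is STOP:
            break
        results.append(msg)
    worker.join()
    return results


output = run_pipeline(["hello", "world"])
```

The actor only knows about its two queues, so it can be swapped for a
thread pool or another pipeline stage without touching its neighbours.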

Such an architecture gives me nice testing points: DataStores can have
multiple implementations via polymorphism, and actors communicating via
messages can be instantiated separately and driven through a test
sequence of messages.
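
A sketch of both testing points in Python, under invented names: the
logic depends only on a small store interface (ScoreStore), the test
supplies an in-memory implementation, and the message handler is driven
by a hand-written sequence of messages.

```python
from typing import Dict, List, Protocol


class ScoreStore(Protocol):
    """What the logic needs from a store; any matching class fits."""

    def get(self, player_id: int) -> int: ...

    def add(self, player_id: int, points: int) -> None: ...


class InMemoryScoreStore:
    """Test implementation: a plain dict instead of a real database."""

    def __init__(self) -> None:
        self._scores: Dict[int, int] = {}

    def get(self, player_id: int) -> int:
        return self._scores.get(player_id, 0)

    def add(self, player_id: int, points: int) -> None:
        self._scores[player_id] = self.get(player_id) + points


def score_actor(store: ScoreStore, messages: List[dict]) -> int:
    """Handle each message in order; return how many were applied."""
    applied = 0
    for msg in messages:
        if msg["kind"] == "scored":
            store.add(msg["player_id"], msg["points"])
            applied += 1
    return applied


store = InMemoryScoreStore()
test_sequence = [
    {"kind": "scored", "player_id": 1, "points": 5},
    {"kind": "ping"},  # unrelated message, ignored by this actor
    {"kind": "scored", "player_id": 1, "points": 3},
]
applied = score_actor(store, test_sequence)
```

A production store backed by a real database would implement the same
two methods, and score_actor would not need to change.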

The main point is: just because my software operates in a domain with
concepts of e.g. Customers and Orders doesn't mean there is any
Customer class with methods associated with it. Quite the opposite:
the Customer concept is just a bunch of data in a tabular form in one
or more DataStores, and “business logic” code manipulates the data
directly.

Follow-up read

As with many things in software engineering, the critique of OOP is not
a simple matter. I might have failed at clearly articulating my views
and/or convincing you. If you're still interested, here are some links
for you:
* Two videos by Brian Will where he makes plenty of great points
against OOP: [12]Object-Oriented Programming is Bad and
[13]Object-Oriented Programming is Garbage: 3800 SLOC example
* [14]CppCon 2018: Stoyan Nikolov “OOP Is Dead, Long Live
Data-oriented Design” where the author beautifully goes through an
example OOP codebase and points out problems with it.
* [15]Arguments Against Oop on wiki.c2.com for a list of common
arguments against OOP.
* [16]Object Oriented Programming is an expensive disaster which must
end by Lawrence Krubner – this one is long and goes in depth into
many ideas

Feedback

I've been receiving comments and more links, so I'm putting them here:
* [17]Quora: Is C++ OOP slower than C? If yes, is the difference
significant?

[18]#programming [19]#oop [20]#opinion
__________________________________________________________________

published with [21]write.as

References

1. https://dpc.pw/
2. https://dpc.pw/about
3. https://dpc.pw/social
4. https://dpc.pw/open-source
5. https://dpc.pw/archive
6. https://en.wikipedia.org/wiki/Object-oriented_programming
7. https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition
8. https://www.johndcook.com/blog/2011/07/19/you-wanted-banana/
9. https://en.wikipedia.org/wiki/Object-oriented_programming#Encapsulation
10. https://en.wikipedia.org/wiki/Abstract_data_type
11. https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
12. https://www.youtube.com/watch?v=QM1iUe6IofM
13. https://www.youtube.com/watch?v=V6VP-2aIcSc
14. https://www.youtube.com/watch?v=yy8jQgmhbAU
15. http://wiki.c2.com/?ArgumentsAgainstOop=
16. http://www.smashcompany.com/technology/object-oriented-programming-is-an-expensive-disaster-which-must-end
17. https://www.quora.com/Is-C++-slower-than-C-If-yes-is-the-difference-significant/answer/Simon-Hardy-Francis
18. https://dpc.pw/tag:programming
19. https://dpc.pw/tag:oop
20. https://dpc.pw/tag:opinion
21. https://write.as/