You’re All Doing Entity Framework Wrong – Michael Hoagland – Medium

Michael Hoagland
May 8, 2017 · 11 min read
You’re All Doing Entity Framework Wrong
Preface
This is a conversational talk about a deeply problematic trend I’ve seen
with Entity Framework utilization across organizations small and large
with teams of juniors to teams of architects. This isn’t a how-to. This
also isn’t for juniors. If something sparks a thought or you’re curious
about something I mention, Google is your friend. This is also my first
blog entry. Critique is welcome.
History and Feature Introduction
by Version
First, let’s simply review the feature rollout of EF over time. This isn’t
exhaustive by any means and certainly doesn’t list things out by
updates to a major version. It simply serves as a reminder of the story so
far with EF.
EF / EF 3.5
· DB First
EF 4
· Lazy Loading
· Migrations
· POCOs
EF 5
· Enums
· Spatial
EF 6
· Async
· Interception
· Logging
· NuGet Installation
· Recovery
EF 7 / Core
· Code First Only
· In-Memory Support
· Limited Batching
· Nonrelational Support
Seeing that kind of fractured rollout and Microsoft’s general reputation
in the development space, there’s little surprise Entity Framework has
gotten the bad rap it has, not that it excuses the issues I’m going to
cover. Features pop into existence seemingly at random depending on
which minor version you’re working on. So, you get used to something,
go to another environment, make claims, and try it out on their existing
framework. Even though the same major version is installed, it doesn’t
work, and you get that “told you so” look which only serves to further
deepen already entrenched positions.
Basically, the typical story with EF goes something like this:
Senior Person: “Let’s use EF and the repository pattern!”
Other Devs: “Idk, haven’t heard good things about it.”
Senior: “No, it’s great! See this example?”
Devs: “Hmm, OK.”
At first, it works somewhere between acceptable and great. However, as
the application grows, the slowness sets in and people grumble. Because
of the utterly poor state of paying technical debt in our industry, or
due to wholly refusing to see the technical debt in the repository
pattern to begin with, whole departments of supposedly smart people fold
their arms and simply conclude that EF is garbage, not that their
usage of it is garbage. I’m here to make the case it’s more the latter
and to show you how to avoid that trap.
Not an Abstraction
Early on in my career, I started using classic ASP and SQL Server
directly via ADO. I worked in an extremely small web department so I
often had to dive into the database myself to create tables and do tasks.
I got quickly familiar with all the knobs of SQL Server in a flurry of
copy/paste deployments, testing in production, and so on. “What about
this? Nope, that product page still doesn’t load. What about that?! Nope,
still doesn’t load. C’mon… this?? Success!” And through almost literally
stumbling around in the dark I got quite intimate with indexes, views,
replication, security permissions and so on, even barely out of high
school.
Enter my first few places with more structured environments and my
first brush with Entity Framework. It was completely devoid of any of
the options I was used to. So, I jumped aboard but it didn’t take long for
the grumbles to set in. In case you have the memory of a goldfish, let
me reiterate my being used to being able to tinker at will with all the
levers. When there were problems, I would investigate. Often, I found
critical components wholly ignored, such as index utilization. When I
would bring these problems up, I got told EF was at fault for not
knowing how to utilize them and that we were here to work on
business problems and not do Microsoft’s job for them. I was still
largely a junior dev, if I’m honest. Who was I to disagree?
The Repository Pattern is a Throne of Lies
Enter the repository pattern. The problem with the repository pattern is
two-fold. First, it requires you to declare up front how your application
will be bound to interact with the database. Even if you build these
super complex methods that let you pass in expressions, dictionaries, or
(shudder) dynamics, and you get inventive, all you’ve accomplished is
creating a maintenance nightmare.
“But the callers can define what they need!” No, they can’t. Sure, they
get to point at an entity and generally define the shape of the data to be
selected, but they can’t determine things like field selection. They have
zero way to say, in a friendly way, whether they need the data loaded up
front or delayed. They can’t say in one call that they also need data
from here or there but in the next only go after the targeted entity
unless the blessed repository lets them do so. Instead, you get these
all-or-nothing decisions that you chain your applications with, and we
wonder why it quickly degrades. I sure hope your crystal ball is better
than mine.
Second, even Microsoft’s own examples don’t use the proper interfaces
that some intern probably coded anyway. Therefore, I say -everyone- is
doing EF wrong. The common wisdom with EF is to use the repository
pattern, and since the repository pattern’s own documentation and
examples aren’t correct, nobody is letting EF do what it was
designed to do: the source of knowledge is poisoned. In the face of
this, I’ve heard lots of complaints from lots of people that the
MVC tutorials which butt right up to and use the DbContext directly
are not SOLID, not that hardly anyone does SOLID either, but that’s
another blog post. (Most software jumps straight to ID and ignores the
rest.)
Let the Database Be a Database
SQL Server, because it’s the most common backing data store used with
EF, is not a clean piece of software. It’s messy. It has a TON of features
for a TON of scenarios. If you want to let your applications actually use
even a fraction of what you’re paying those huge licensing fees for, stop
constraining SQL to an EF-driven hellish wasteland of something little
better than SELECT *. Then, we like to complain when things get slow.
If you don’t let EF utilize features for the correct scenario, you can’t
possibly realize the potential of your platform. There must be billions of
dollars wasted in licensing and development costs that only ever see
single-digit utilization out of SQL Server in terms of distinct features
used, even as applications grow in wildly different paths. This is a gut
feeling, but seeing the stupidly naïve repository implementations I’ve
seen out of companies small and large, I find it hard to see me being
grossly incorrect here. This is a disservice to ourselves, to our
employers, and to each other.
Entity Framework is still locked, step by step, to the way the underlying
data store works. In SQL Server, this means join performance, view and
index utilization, stored procedure calls, and so on. This is like calling
a latex glove on a hand an abstraction for a hand. It’s not, and neither
is EF an abstraction for the storage mechanism it relies upon. It is
instead a set of common APIs that let us access data in a uniform way.
This is not an abstraction for the very reasons I just stated: we cannot
deny or mitigate the behavior of the underlying implementation in any
way. Therefore we must account for those behaviors in our code,
breaking the abstraction either explicitly or implicitly. The only thing
we can do if we want to pretend it is an abstraction is to bury our heads
in the sand and simply continue to groan when things get clumsy.
Most recently, I had -architects- almost in awe at the suggestion to let
the database define views and to point EF at the views instead of tables,
you know, letting DBAs actually do their job and giving the database
the ability to change without breaking application code. This isn’t hard
stuff, but the problem is endemic so most can’t see past their noses in
environments they’re too familiar with. So, what do we do about it?
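As a rough sketch of what pointing EF at a view can look like, the mapping below assumes EF Core 3.0 or later with a relational provider installed. The view name vw_ActiveCustomers, the ActiveCustomerRow class, and ReportingContext are all hypothetical names made up for illustration.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical read model: its shape matches the columns the view returns,
// not any particular table.
public class ActiveCustomerRow
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";
    public decimal LifetimeValue { get; set; }
}

public class ReportingContext : DbContext
{
    public ReportingContext(DbContextOptions<ReportingContext> options)
        : base(options) { }

    public DbSet<ActiveCustomerRow> ActiveCustomers => Set<ActiveCustomerRow>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Keyless and mapped to a view: EF queries it like a table but
        // never tries to insert, update, or delete through it.
        modelBuilder.Entity<ActiveCustomerRow>()
            .HasNoKey()
            .ToView("vw_ActiveCustomers");
    }
}
```

The DBA can then change the tables, joins, and indexes behind the view at will; as long as the view keeps returning the same columns, the application code does not move.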
Use IQueryable; Not IEnumerable
The first step to using Entity Framework correctly is to break the love
affair with IEnumerable. It’s simply bad when talking about
disconnected stores. The only thing IEnumerable gives us is delayed
execution. If that’s the only feature you want out of your ORM, then
you don’t need an ORM. The thing that makes IEnumerable insidious when
working with data stores is that the queries behind it are pinned once
and for all time in their representation. Even as an application grows,
even as repositories get new methods added to them, the old
implementations returning IEnumerable are blind, deaf, and dumb to the
new world they live in. You are literally forcing your code to work with
your data layout and expectations as they were when it was first
implemented years ago. This is a developer’s fault yet the blame gets
levied upon EF.
IQueryable, however, can morph and change to its given context. Even
when it is passed around and clauses are added to it, it can evaluate,
instance for instance, the needs of the individual call. The DbContext
can still retrieve entities from the cache if it has already fetched the
data before, giving very fast speeds to repeated calls and making hot
paths a bit cooler. Even more, it exposes features such as letting us
stream data if the underlying provider supports it, loading data without
needing to instantiate List objects to be more heap friendly, inspecting
the underlying type so we can make smart decisions in complex workflows,
accessing the underlying context, and so on.
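To make the composition point concrete, here is a minimal sketch that uses only System.Linq, with an in-memory list standing in for a DbSet via AsQueryable. The Order class and the OrderQueries helpers are hypothetical names for illustration, not part of EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public bool Shipped { get; set; }
}

public static class OrderQueries
{
    // Stand-in for a DbSet<Order>: hands back a composable query
    // rather than an already-decided result.
    public static IQueryable<Order> Orders(IEnumerable<Order> store) =>
        store.AsQueryable();

    // Caller-side composition: every clause is appended to the expression
    // tree, so a real provider would see all of it before running anything.
    public static IQueryable<Order> UnshippedOver(
        IQueryable<Order> orders, decimal minimum) =>
        orders.Where(o => !o.Shipped)
              .Where(o => o.Total >= minimum)
              .OrderByDescending(o => o.Total);

    // The composed query can be inspected before it executes.
    public static string Describe(IQueryable<Order> query) =>
        query.Expression.ToString();
}
```

Had Orders returned IEnumerable<Order>, the two Where calls and the ordering would run in application memory over every row the original method chose to fetch. With IQueryable, a provider such as EF gets the whole tree and can turn it into a single statement with a WHERE and an ORDER BY.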
These are all features that let your code actually understand what’s
going on without breaking the abstraction barrier since EF is not an
abstraction. The abstraction, by the way, should be the component that
EF is being used in, not EF itself. I scratch my head at the many
discussions we programmers have where some need is expressed but
we balk at many solutions out of hand in the name of “abstraction” and
the ensuing hoops we gleefully contort ourselves in just so we can
continue the delusion of being SOLID.
Get Comfortable with Anonymous Types
Perhaps the single biggest complaint I’ve heard about EF is how much
damn data it retrieves. Who defined the entities? EF? No! You did. You
can’t much be blamed, per se, as an entity-per-table approach seems to
be all anyone can see. Still, we don’t need to be hindered by entities, no
matter how large. Projecting to an anonymous type in an EF query will
cause EF to select only the fields you defined. That monster table that
is dozens of columns and “can’t be refactored” can be chopped down to
the 3 or 4 fields you actually need. The fascination with selecting whole
entities at a time and pretending there’s nothing to be done about it can
only be described as a form of mass hysteria where we plug our ears
and shout “I can’t see you!”
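A small sketch of that projection, again using only System.Linq so it runs without a database. The Customer class and the MailingLabels helper are hypothetical; with a real DbSet<Customer> as the source, the same Select is what the provider translates into a column list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "monster" entity: many columns, only a few needed here.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string City { get; set; } = "";
    public string PostalCode { get; set; } = "";
    public string Country { get; set; } = "";
    public string Notes { get; set; } = "";
    public byte[] Photo { get; set; } = new byte[0];
}

public static class CustomerQueries
{
    public static List<string> MailingLabels(
        IQueryable<Customer> customers, string country)
    {
        // Only Name, City, and PostalCode are part of the projection,
        // so Notes and Photo never need to leave the data store.
        var rows = customers
            .Where(c => c.Country == country)
            .OrderBy(c => c.Name)
            .Select(c => new { c.Name, c.City, c.PostalCode })
            .ToList();

        return rows
            .Select(r => $"{r.Name}, {r.City} {r.PostalCode}")
            .ToList();
    }
}
```

The anonymous type stays inside the method because it has no name to put in a signature. If the shape needs to cross a method boundary, a small purpose-built class in the Select does the same job.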
Use the Right Tool for the Job
You know all those Microsoft Press books with the various tools on the
cover? There’s a reason for that, you know, beyond some person just
picking a random image. Most of the tools aren’t just a screwdriver or
planer. There are some truly odd ones that don’t have obvious
applications, but, assuredly, they have their purpose and they excel at
it. The mantra of “right tool” is often repeated, but we don’t pause
to really think about the job, let alone the tool for it. Here are a few
for EF.
SqlBulkCopy for the Win Since .NET 2.0
Another large complaint, a close runner-up to the amount of data
EF retrieves, is how it supposedly can’t handle large amounts of data. I
love developers’ dualities. I would have you know that by combining
AsStreaming as talked about below, reactive extensions, and
SqlBulkCopy, I can retrieve, transform, and push millions of records a
minute without breaking a sweat, creating a perfectly good ETL solution
that is completely code based for any workload from small up to the
crest of moderately large, say 5–10 billion records, and still have good
performance. If you need more, there are more specialized tools.
However, don’t say Entity Framework can’t handle large amounts of
data. Your code can’t handle large amounts of data. EF is fine. The sad
part is we’ve had SqlBulkCopy since 2005 yet we pretend there is this
large hole in our toolbox. The problem is already solved. There is zero
reason to reinvent the wheel. Guess what? It supports streaming too!
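For the push side, a minimal sketch of a streaming bulk load might look like the following. It assumes the Microsoft.Data.SqlClient package (System.Data.SqlClient exposes the same members) and a destination table whose columns line up with the reader; BulkLoader and its parameters are hypothetical names, and the batch size is an arbitrary illustrative value.

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class BulkLoader
{
    // Copies every row from 'source' into 'destinationTable' without
    // materializing the whole result set in memory first.
    public static void Copy(
        IDataReader source,
        string destinationConnectionString,
        string destinationTable)
    {
        using var bulk = new SqlBulkCopy(destinationConnectionString)
        {
            DestinationTableName = destinationTable,
            BatchSize = 10_000,      // rows sent per batch (illustrative)
            EnableStreaming = true,  // stream from the reader instead of buffering
            BulkCopyTimeout = 0      // no timeout for long-running loads
        };

        bulk.WriteToServer(source);
    }
}
```

The source can be any IDataReader, so the retrieve and transform steps are free to live wherever they make sense; SqlBulkCopy only cares that rows keep arriving.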
AsTracking vs. AsNoTracking
I feel like a broken record. Yet another large complaint about EF is
its caching of data. You could almost always tell the DbContext not to
keep cached entities for a given query with AsNoTracking. Recently,
though, we gained the ability to set that as the default policy in
Entity Framework Core. Now we can be selective about what we want to
track rather than what we don’t. There is one annoyance that I will
gladly acknowledge: you still need to detach entities.
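A sketch of that opt-in style, assuming Entity Framework Core: the context below turns tracking off by default and a single query asks for it back with AsTracking. Product, CatalogContext, and CatalogOperations are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public class CatalogContext : DbContext
{
    public CatalogContext(DbContextOptions<CatalogContext> options)
        : base(options)
    {
        // Default policy for this context: queries do not track results.
        ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
    }

    public DbSet<Product> Products => Set<Product>();
}

public static class CatalogOperations
{
    // Read-only path: inherits the no-tracking default.
    public static List<Product> ListForDisplay(CatalogContext db) =>
        db.Products.OrderBy(p => p.Name).ToList();

    // Write path: opts back in so SaveChanges can see the modification.
    public static void Reprice(CatalogContext db, int productId, decimal newPrice)
    {
        var product = db.Products.AsTracking().Single(p => p.Id == productId);
        product.Price = newPrice;
        db.SaveChanges();
    }
}
```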
AsStreaming
Queries in Entity Framework normally buffer all the results before
returning. Streaming gets around that and immediately lets you start
processing data as it enters your application. You can both start work
more quickly and be more memory friendly to your servers.
Special Snowflakes
There is a disturbing trend I’ve seen among developers. There is a lack
of desire to explore and invent. We want out of the box solutions that
“just work” while being ignorant of the details. We still believe in the
magic of the unseen even though code is not magic.
The general approach I would take, instead of writing these super
pinned-down repositories, is to build extensions that let our
applications behave in the unique ways we need them to. Want the
benefits of cached data in a moderately long-running process but not
have it persist outside of that given operation? Sounds like a perfect
extension method to DbContext to me: one that takes some entities,
processes them while gaining the benefits of caching, and then clears
the cache before returning. Another extension method would be one that
detaches all those entities after an operation is complete.
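Those two extension methods could be sketched roughly as follows, assuming Entity Framework Core. DetachAll and WithScopedTracking are hypothetical names; nothing like them ships with EF under those names.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public static class DbContextExtensions
{
    // Detaches every entity the context is currently tracking.
    public static void DetachAll(this DbContext context)
    {
        // Snapshot the entries first: changing State alters the
        // tracker's own collection while we iterate.
        foreach (var entry in context.ChangeTracker.Entries().ToList())
        {
            entry.State = EntityState.Detached;
        }
    }

    // Runs 'work' with normal tracking (and its caching benefits),
    // then clears the tracked entities before returning the result.
    public static TResult WithScopedTracking<TContext, TResult>(
        this TContext context, Func<TContext, TResult> work)
        where TContext : DbContext
    {
        try
        {
            return work(context);
        }
        finally
        {
            context.DetachAll();
        }
    }
}
```

Two caveats with this sketch: any changes not yet saved are discarded when the entities are detached, so the work delegate should call SaveChanges itself, and it should return materialized results rather than a query that has not run yet.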
Don’t Fear the Reaper
I’m talking about the DbContext here because that’s how many people
treat it. It’s seen as the big, bulky, unwieldy thing that will steal your
kids if you’re not careful. We go to extraordinary lengths to keep
the DbContext’s existence known to only a select few components. This
strangles our implementations to repositories even further. Since we
must go through the repository to get any kind of data, we need to
violate the Open/Closed principle on a regular basis as change occurs
or be forced to accept the tradeoff of the bloat of decisions the
repository dictates and be extra careful when we call it.
Let the DbContext out. If a module needs data, don’t delude yourself
that the DbContext isn’t a dependency already. I can promise you that if
you get comfortable with it being accessible and take away the mysticism
of “what if people make mistakes?!” (gasp!), it will actually make us
better as a whole. If someone can commit naughty code and it makes it to
production untested at least once, you truly have no release control or
quality checks. Policies like hiding the DbContext are stopgaps on an
already bleeding wound in your organization that do nothing to
actually mitigate the real problem.
No More Excuses
We programmers need to stop acting like solutions to the problems we
face are ever evasive or somehow must be divined using the right mix
of node.js and dapper. Not that they don’t have their legitimate uses,
but they are often held up at the expense of Entity Framework
when it is a fine tool for what it does. The tools we’ve had for a decade
now have been sufficient for most of our needs. It is the culmination of
bad decision after bad decision that has led us to the straitjacket
we’re in. Get comfortable with your tools. Try new things. One thing is
for sure, we only have ourselves to blame.