Posted by Uncle Bob
on 09/11/2009
For years authors and consultants (like me) have been telling us that functions should do one thing. They should do it well. They should do it only.
The question is: What the hell does “one thing” mean?
After all, one man’s “one thing” might be someone else’s “two things”.
Consider this class:
class SymbolReplacer {
protected String stringToReplace;
protected List<String> alreadyReplaced = new ArrayList<String>();
SymbolReplacer(String s) {
this.stringToReplace = s;
}
String replace() {
Pattern symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)");
Matcher symbolMatcher = symbolPattern.matcher(stringToReplace);
while (symbolMatcher.find()) {
String symbolName = symbolMatcher.group(1);
if (getSymbol(symbolName) != null && !alreadyReplaced.contains(symbolName)) {
alreadyReplaced.add(symbolName);
stringToReplace = stringToReplace.replace("$" + symbolName, translate(symbolName));
}
}
return stringToReplace;
}
protected String translate(String symbolName) {
return getSymbol(symbolName);
}
}
It’s not too hard to understand. The replace() function searches through a string looking for $NAME and replaces each instance with the appropriate translation of NAME. It also makes sure that it doesn’t replace a name more than once. Simple.
Of course the words “It also…” pretty much proves that this function
does more than one thing. So we can probably split the function up
into two functions as follows:
String replace() {
Pattern symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)");
Matcher symbolMatcher = symbolPattern.matcher(stringToReplace);
while (symbolMatcher.find()) {
String symbolName = symbolMatcher.group(1);
replaceAllInstances(symbolName);
}
return stringToReplace;
}
private void replaceAllInstances(String symbolName) {
if (getSymbol(symbolName) != null && !alreadyReplaced.contains(symbolName)) {
alreadyReplaced.add(symbolName);
stringToReplace = stringToReplace.replace("$" + symbolName, translate(symbolName));
}
}
OK, so now the replace() function simply finds all the symbols that need replacing, and the replaceAllInstances() function replaces them if they haven’t already been replaced. So do these function do one thing each?
Well, the replace() compiles the pattern and build the Matcher() Maybe those actions should be moved into the constructor?
class SymbolReplacer {
protected String stringToReplace;
protected List<String> alreadyReplaced = new ArrayList<String>();
private Matcher symbolMatcher;
private final Pattern symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)");
SymbolReplacer(String s) {
this.stringToReplace = s;
symbolMatcher = symbolPattern.matcher(s);
}
String replace() {
while (symbolMatcher.find()) {
String symbolName = symbolMatcher.group(1);
replaceAllInstances(symbolName);
}
return stringToReplace;
}
private void replaceAllInstances(String symbolName) {
if (getSymbol(symbolName) != null && !alreadyReplaced.contains(symbolName)) {
alreadyReplaced.add(symbolName);
stringToReplace = stringToReplace.replace("$" + symbolName, translate(symbolName));
}
}
protected String translate(String symbolName) {
return getSymbol(symbolName);
}
}
OK, so now certainly the replace() function is doing one thing? Ah, but I see at least two. It loops, extracts the symbolName and then does the replace. OK, so how about this?
String replace() {
for (String symbolName = nextSymbol(); symbolName != null; symbolName = nextSymbol())
replaceAllInstances(symbolName);
return stringToReplace;
}
private String nextSymbol() {
return symbolMatcher.find() ? symbolMatcher.group(1) : null;
}
I had to restructure things a little bit. The loop is a bit ugly. I wish I could have said for (String symbolName : symbolMatcher) but I guess Matchers don’t work that way.
I kind of like the nextSymbol() function. It gets the Matcher nicely out of the way.
So now the replace() and nextSymbol() functions are certainly doing one thing. Aren’t they?
Well, I suppose I could separate the loop from the return in replace().
String replace() {
replaceAllSymbols();
return stringToReplace;
}
private void replaceAllSymbols() {
for (String symbolName = nextSymbol(); symbolName != null; symbolName = nextSymbol())
replaceAllInstances(symbolName);
}
I don’t see how I could make these functions smaller. They must be doing one thing. There’s no way to extract any other functions from them!
Uh… Wait. Is that the definition of one thing? Is a function doing one thing if, and only if, you simply cannot extract any other functions from it? What else could
“one thing” mean? After all, If I can extract one function out of
another, the original function must have been doing more than one thing.
So does that mean that for all these years the authors and
consultants (like me) have been telling us to extract until you can’t
extract anymore?
Let’s try that with the rest of this class and see what it looks like…
class SymbolReplacer {
protected String stringToReplace;
protected List<String> alreadyReplaced = new ArrayList<String>();
private Matcher symbolMatcher;
private final Pattern symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)");
SymbolReplacer(String s) {
this.stringToReplace = s;
symbolMatcher = symbolPattern.matcher(s);
}
String replace() {
replaceAllSymbols();
return stringToReplace;
}
private void replaceAllSymbols() {
for (String symbolName = nextSymbol(); symbolName != null; symbolName = nextSymbol())
replaceAllInstances(symbolName);
}
private String nextSymbol() {
return symbolMatcher.find() ? symbolMatcher.group(1) : null;
}
private void replaceAllInstances(String symbolName) {
if (shouldReplaceSymbol(symbolName))
replaceSymbol(symbolName);
}
private boolean shouldReplaceSymbol(String symbolName) {
return getSymbol(symbolName) != null && !alreadyReplaced.contains(symbolName);
}
private void replaceSymbol(String symbolName) {
alreadyReplaced.add(symbolName);
stringToReplace = stringToReplace.replace(
symbolExpression(symbolName),
translate(symbolName));
}
private String symbolExpression(String symbolName) {
return "$" + symbolName;
}
protected String translate(String symbolName) {
return getSymbol(symbolName);
}
}
Well, I think it’s pretty clear that each of these functions is doing one thing. I’m not sure how I’d extract anything further from any of them.
Perhaps you think this is taking things too far. I used to think so
too. But after programming for over 40+ years, I’m beginning to come
to the conclusion that this level of extraction is not taking things too
far at all. In fact, to me, it looks just about right.
So, my advice: Extract till you just can’t extract any more. Extract till you drop.
After all, with modern tools it takes very little time. It
makes each function almost trivial. The code reads very nicely. It
forces you to put little snippets of code into nicely named functions.
And, well gosh, extracting till you drop is kind of fun!
Leave a response I like the 2nd and 3rd iterations best. By the time the last iteration is reached the code feels a lot less compact and succinct. I think that extracting ad nauseum on every method in a class would balloon the size of the class and make it hard to keep all in mind at once. P.S. It would be interesting to see patch sets with a commit after each change of the full class. Watch out! Uncle Bob’s coding style is infectious, and there is no going back if you catch the bug! Readable code has come to mean just that: code that reads naturally. Not just that code is easy to figure out, but that you can read it like English. What’s not to like? Aaron about 1 hour later: Following the flow of the data through the fully extracted version becomes difficult, since the developer will need to jump around constantly throughout the body of the class. If the goal is to make development and maintenance easier and fully extracting the class makes it more difficult for a developer to follow the flow of the data is it better to fully extract just for the sake of following a rule? As you mentioned, you were looking for support of the Iterator interface on the Matcher class. Creating a utility Adapter to the Matcher class would have moved the responsibility nextSymbol out of this class. But that isn’t my point. My point is that patterns in code are easier to see when things are not broken down into such small chunks. At the fully decomposed state it isn’t obvious that an Adapter on the Matcher would simply fit into place. By decomposing the methods so fine you lose context, so much so it isn’t evident how the method relates to the accomplishing the goal of the class. Two critical issues with over extracting: 1. Extraction takes time. In many cases the first working version of the code is a one, long, do-it-all, method. Only then comes extraction/refactoring. So when you extract you spend time which slows you down. Hopefully, in the long run this slowdown will be worth it due to ease of maintenance, less bugs, etc. Thus, the million dollar question is that of the breakeven point: What is the point after which the benefits of extraction will not match its slow down? No one can calculate the breakeven point, but nonetheless, it exists. Smart developers try to stay on the right side of this point. 2. A method offers a sense of encapsulation: client code can see its signature, but it cannot see/touch its internals. When a method is decomposed, its internals are exposed as new (extracted) methods. Even if the extracted methods are private, they are still visible within the containing class. Some of them will become protected. Soon, you will lose the ability to modify them since other parts of the code depend on them. In short: extraction implies less encapsulation which implies increased coupling. This does not mean you should never extract. It just means that you should weigh the consequences before you do so. Aaron about 2 hours later: Itay Maman: I agree with your second point. But your first point does not fly with me. Assuming you are using TDD (which you may not be) you should not wind up with a behemoth method that needs to be decomposed into smaller methods. The decomposition should occur while developing the method. Otherwise you will likely wind up using a ton of copy/pasted code in your test cases as you attempt to handle each branch to make sure the state is correctly setup. Why should functions do one – and only one – thing? What are we trying to achieve? My answer is readability AND maintainability. If extracting ad infinitum drives your code to less readability, just stop to extract methods. Kind Regards I’ve had a similar thought when following the Single Responsibility Principle: extract classes until you can not extract any more. In my current pet project (http://dimdwarf.sourceforge.net/) that has resulted in very small and focused classes. I feel good about the small classes. On the other hand, when a class begins to approach 200 LOC (Java), it begins to smell and I try harder to find behaviour that could be extracted into its own class. Often times I find something and extracting it improves the design. Here are some statistics of that project: “On 2009-05-21, the core module has 156 classes of which 75% are I’ve had a similar thought when following the Single Responsibility Principle: extract classes until you can not extract any more. In my current pet project (http://dimdwarf.sourceforge.net/) that has resulted in very small and focused classes. I feel good about the small classes. On the other hand, when a class begins to approach 200 LOC (Java), it begins to smell and I try harder to find behaviour that could be extracted into its own class. Often times I find something and extracting it improves the design. Here are some statistics of that project: “On 2009-05-21, the core module has 156 classes of which 75% are <50 LOC, and 15 classes are over 100 LOC, the longest file being 194 LOC.” I don’t want to state whether it’s too much / too little extracting. Each one to their own. From my humble experience (a lot less than 40 years), having small methods - really helps a ton in making OO stuff inherit, compose & collaborate without breaking your brain and cursing a lot.
- is the basis to ensure that developers will not miss Ctrl+C, Ctrl+V (I swear in my next project I will map both those shortcuts in the IDE to Alt+F4 for all team members).
Wedge about 4 hours later: If you find that extracting methods starts to generate too many methods, that’s probably a code smell that you need to start extracting classes from your new methods. I have rarely, if ever, found code that was over extracted (if that’s even possible). It’s far, far more common to have bloated methods and bloated classes. Shannon Cornish about 5 hours later: Good article Uncle Bob. I had a quick question regarding your replaceAllSymbols function. private void replaceAllSymbols() {
for (String symbolName = nextSymbol(); symbolName != null; symbolName = nextSymbol())
replaceAllInstances(symbolName);
}
Would a while loop be better in this case as you could remove the duplicate symbolName = nextSymbol()? Something like. private void replaceAllSymbols() {
while ((String symbolName = nextSymbol()) != null) {
replaceAllInstances(symbolName);
}
}
Not meant as a troll, just an honest question. Brian Slesinsky about 6 hours later: It looks like there’s a bug. Suppose we have these symbols and replacements: $a -> _$b_
$b -> c
Here are the results of translating two inputs. (I’ve only checked this mentally.) $a $b --> _$b_ $b --> _c_ c
$b $a --> c $a --> c _$b_
The order of the symbols in the input determines whether a symbol will replaced once or multiple times in the output. This is sure to be confusing to a user who stumbles onto it. Doing token replacement by repeated calls to String.replace() isn’t a good idea unless the replacements are hard-coded and you’re careful to think about whether order matters and test any edge cases. It’s good enough to HTML-escape something if you’re careful, but for anything more complicated, better to scan the input and build up the output using a StringBuilder. Refactoring the code to use smaller methods doesn’t help you find the bug; if anything it obfuscates it by introducing unnecessary abstractions. Better to spend time writing some tests and thinking about how to break it. Also, the API is a bit odd. From the caller’s point of view, passing the input string to the constructor doesn’t help with dependency injection, so I don’t see an advantage to an API that looks like “out = new SymbolReplacer(input).replace()” versus “out = SymbolReplacer.replace(input)”, and it leaves open the possibility that someone will call the constructor once and the replace() method multiple times, which would result in further bugs and possibly even race conditions if it were done in separate threads. (The use of non-final instance variables in what should be a pure function is a code smell.) It seems like a better approach would be to start out with a single static method, possibly extract other static methods for clarity once you’re sure the code works, and put off creating a real class until it’s more clearly needed. Tim Stewart about 8 hours later: Just curious. Where is getSymbol defined? Filip van Laenen about 14 hours later: I’m also wondering what getSymbol is, and how it is different from translate. It’s really interesting reading the stages that you went through from the original code to a stage where every function does exactly one thing. I like the step where you pulled out the nextSymbol() function the best. Question – how come you instantiate the symbolPattern and alreadyReplaced fields at the same time as the field definition instead of in the constructor as well? Is there some rule of thumb to know when each way is preferable? JonB about 19 hours later: If I had to guess at a rule of thumb from the example it would be something along the lines of “instantiate in the constructor if to instantiate you need the arguments of the constructor”. From memory, instantiations at definition time occur between the call to new in the client class and the start of the constructor in the class in question so it’s like they’re included at the start of the constructor without adding to the bulk required to get an understanding of what the constructor is doing. Note however that symbolPattern is declared final ergo it must be instantiated at definition time. Also alreadyReplaced is instantiated in the original example so “if it ain’t broke don’t fix it” potentially applies as well. If the sample had several instances of :- this.alreadyReplaced = new ArrayList();
this.stringToReplace = s;
symbolMatcher = symbolPattern.matcher(s);
in it then there’d be an argument to bring the alreadyReplaced instantiation inside the constructor and then extract all 3 lines into an initialiser function (resetSymbolPattern perhaps?) and then replace all occurences with that. JonB about 19 hours later: Note however that symbolPattern is declared final ergo it must be instantiated at definition time.
[cough] ok… must fact check next time before submitting. That statement is not true, you can instantiate final variables in the constructor. I’d probably consider it poor form but it is legal code. Nothing to see here :) “For years authors and consultants (like me) have been telling us that functions should do one thing. They should do it well. They should do it only.” A function by definition, returns 1 result from 1 input. If there’s no reuse, there is no “should”. Decomposition is for reuse, not just to decompose. Depending on the language/compiler there may be additional decision weights. What I see from the example is you’ve gone and polluted your namespace with increasingly complex,longer,more obscure, function name mangling which could have been achieved (quick and readable) with whitespace and comments. To mirror a previous poster, I rather see a javadoc with proper commenting than to trace what’s going on for such a simplistic case. I’m afraid to ask what you plan to do when the class is more complex and symbolExpression(..) isnt descriptive enough! Code review often allows other developers to point out “that’s a bit complex”. This doesnt mean “because you can decompose, you should”, but that you need better documentation. Name mangling is not enough. That’s just my experience with very large applications. While I agree with decomposition is general, I find this excessive for three more reasons: 1. When you read such a decomposed piece of code you find yourself constantly jumping from one place to another in order to understand what’s going on. Having more stuff in one place is easier to understand: our brains are very good at quickly analyzing complex pictures, but bad at following links. 2. Excessive decomposition not only increases the amount of code, but also the number of parameters that need to be passed to helper functions. In the all-in-one-method solution, these parameters are just in scope, so do not need to be explicitly declared. 3. Perhaps most importantly: I often find excessive decompositions very unstable: a requirement change that would result in a small change in the all-in-one solution, often requires a significant refactoring of the decomposed one. One might say that excessive decomposition tries to extracts a structure that is not really there. This code is smelly to me: String replace() {
replaceAllSymbols();
return stringToReplace;
}
The Composed Method pattern says that all operations in a method should be at the same level of abstraction. But I think that one more principle should be added to it: all operations in a method should be at a lower level of abstraction than the name of the method. In the above code “replace” and “replaceAllSymbols” are at about the same level of abstraction. Also “replace” does not describe the method as well as “replaceAllSymbols”, so it would be better renamed to “replaceAllSymbols”. I would prefer the “replace” method to be changed to: String replaceAllSymbols() {
String symbol;
while ((symbol = nextSymbol()) != null)
replaceAllInstancesOf(symbol);
return stringToReplace;
}
...and with some refactoring, it would read better as: String replaceAllSymbols() {
for (String symbol : symbolsToBeReplaced())
replaceAllInstancesOf(symbol);
return stringToReplace;
}
Brian Slesinsky: Thanks for finding that potential bug. It’s not an actual bug because in the context of the system in which it was written, symbols can’t contain other symbols. Still, it’s a good spot, and something I should fix. The API remark is also interesting. This class actually came about through an “Extract Method Object” refactoring. You are probably right that this makes it feel strange. Reminded me of a quote by Antoine de Saint-Exupery: “A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” Had he been into software, he might have said “Extract till you Drop”! Overall, I agree that extracting smaller methods from bigger ones is a good idea. It improves readability at different levels of abstraction, encourages reuse, and makes it easy to spot subsequent refactorings (move method, extract class, etc). I cannot agree with some of the comments on this post that suggest javadocs or inline comments instead of decomposing methods: comments can quickly get outdated or become an unnecessary overhead. http://c2.com/cgi/wiki?CommentCostsAndBenefits Along with extracting methods, I believe the inverse “Inlining methods” is also important at times while refactoring. Depends on where the refactorings take you. As for this code example, like someone else pointed out, it can lead to unexpected results (at least difficult to track) depending on how many times replace() is called. I’d much prefer such “verb classes” be immutable. I do still prefer that the original string be made a field (final one) injected via a constructor, rather than have a static method that takes the string as an argument, because static methods when decomposed lead to most of the decomposed methods also having the same arguments as the original, and the values need to be passed around (instead of directly being referenced as a field). I think a functional style is more elegant. an Erlang implementation: -module(sr). -export([replace/1]). replace(Str) -> {ok,MP} = re:compile(”\\$([a-zA-Z]\\w*)”), {match,Matches}=re:run(Str,MP,[global]), replace(Str,[{B+1,L} || [_,{B,L}] <- Matches]). replace(Str,Indices) -> replace(Str,Indices, 1, []). replace(,[],,Acc)-> lists:append(lists:reverse(Acc)); replace(Str,[{B,L}|T],C,Acc)-> S1=string:substr(Str,C,B-C), S2=string:substr(Str,B,L), replace(Str,T,B+L,[get_replacement(S2),S1|Acc]). get_replacement(Symbol) -> .... I think a functional style is more elegant. an Erlang implementation:
-module(sr).
-export([replace/1]).
replace(Str) ->
{ok,MP} = re:compile(”\\$([a-zA-Z]\\w*)”),
{match,Matches}=re:run(Str,MP,[global]),
replace(Str,[{B+1,L} || [_,{B,L}] <- Matches]).
replace(Str,Indices) ->
replace(Str,Indices, 1, []).
replace(,[],,Acc)->
lists:append(lists:reverse(Acc));
replace(Str,[{B,L}|T],C,Acc)->
S1=string:substr(Str,C,B-C),
S2=string:substr(Str,B,L),
replace(Str,T,B+L,[get_replacement(S2),S1|Acc]).
get_replacement(Symbol) -> ....
Hej, Uncle Bob, Thanks for another, fine post. I hope you won’t be too bored if I make two points. Firstly, I tend to agree with Itay Maman’s post. Encapsulation theory tells us that a benefit of encapsulation is that it minimises the number of potential dependencies with the highest probability of modification event propagation [1]. In his landmark paper on cyclomatic complexity, Thomas McCabe describes source code where, “Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program.” Thus, we can view a method as encapsulating blocks of sequential code, where only the first block (the default entry-point into a method) is accessible from outside that method, all other blocks being information-hidden. If all your methods have only one block, then none of the blocks is information-hidden. With n such blocks (and thus n methods), the number of potential dependencies between your blocks will be n(n-1). In your case n=9; so the, “Potential coupling,” as it’s known, will be 9(9-1)=72. It’s possible to calculate the number of methods to minimise this potential coupling; this minimisation occurs when n=3, i.e., with 3 methods (which you had originally, though for minimisation, the blocks must be evenly spread over the 3 methods). At minimisation, the potential coupling will be 36. If that all sounds like EXTREME, nerdy overkill, then, it is: for a small problem like the one you posted, the semantics will probably dominate; but it is something to think about before making all your blocks accessible to one another within a class. Secondly, consider how your idea might scale. It’s easy to view programs as a graph stack; a graph here being a region encapsulating nodes. On the bottom, sequential blocks are encapsulated into methods. At the second graph, methods are encapsulated into classes. At the third graph, classes are encapsulated into packages. The point is that there’s no reason why the rules for encapsulating one graph should be very different from any other. Thus if you think that, “Doing one thing only,” implies decomposing all methods in single blocks, then does that mean that all classes should contain only one method, and that all packages should contain only one class? I’m sure you do not think so, and there may be good reason why this advice holds for block/method graphs but not for method/class or class/package graphs; again, I just wanted to offer some food for thought. Thanks, again, for the fine post. Regards, Ed Kirwan. [1] http://www.edmundkirwan.com/encap/l2/paper6.html Great post. I had wondered in my own code if I was often going too far with small methods. I felt the results worked well, but its often hard to communicate the benefits with others who haven’t given it a try. I found this part particularly interesting: “Perhaps you think this is taking things too far. I used to think so too. But after programming for over 40+ years, I’m beginning to come to the conclusion that this level of extraction is not taking things too far at all. In fact, to me, it looks just about right.” For those you disagree with the approach, I would be curious to hear whether you had actual negative results with a project of decent size? Again, notice that Uncle Bob also used to think it was excessive, but changed his mind after using it for a while… Just something to think about. I wanted to post more here:http://chrismelinn.wordpress.com/2009/09/14/extract-method-how-much-is-too-much/ Thanks again! @Chris Melinn No, so far my problems are twofold: a) Methods that are too large and mungified b) Methods that are split badly (chronologically, etc) I find the Table Of Contents method to work pretty well. Extract method seems to be the simplest refactoring to think of and the most benfical one – people who use it say it let them reclaim their code :). Yet because you need good names for your methods as well as to organize them in the source wisely it becomes more tricky for some people, escpecially those whose native language is not English – so it’s sometimes costs more effort. If decomposed code is to be more readable for a new programmer naming and organization is critical. You NEED to take a potential reader perspective. It’s not emphasised enough in my opinion. And here I thought Tomas Malmsten was a loon and had misread Bob’s book. Now I’m not so sure. I’ll disregard the high cost of doing “contains”-tests on a list. That’s easily corrected. What I cannot let slide is the “stringToReplace” clearly a temporary variable spilled over into a field to satisfy Bobs unquenchable thirst for splitting methods into atoms. Without seeing the usage It’s hard to tell for sure, but it looks like the “alreadyReplaced” list is a moronic cache to cover for the fact that the symbolPattern matcher is still happilly working on the original string, while String.replace is a hammer which replaces everything in one blow. It is never needed or used for anything else, which means that all of this nasty state can fit neatly into a static utility method if not for the authors perversions. if (shouldReplaceSymbol(symbolName)) replaceSymbol(symbolName); Is probably a fun read for a non programmer but it’s pointless filler. Kudos to you for programming for 40+ years. Have you considered when It’s time to stop? @Tomek It is certainly emphasized in the first few chapters of the Clean Code book. I agree that extracting smaller private methods helps with readability at varying levels of abstraction (at least, personally I prefer this, rather than having to deal with looping contructs or nested if’s when I want to quickly know what a method deals with). Additional advantages include reduced duplication, and quick identification of opportunities to apply other refactorings like Move Method, Extract Class, etc. I’d also like to highlight that the inverse refactoring of Inline Method is almost as important. That is helpful again as one goes about refactoring code. I disagree with some comments on this post that suggest documentation or inline comments as a better alternative to extracting methods. In my experience, comments quickly begin to lie (get outdated or be totally wrong because of a copy/paste + code modification): http://c2.com/cgi/wiki?CommentCostsAndBenefits As far as the code example of this post goes: it makes me uncomfortable that the field stringToReplace is modified in place by the public methods. These can easily lead to unexpected/uncontrolled changes as someone else pointed out (by repeated calls to replace()). In my experience, “verb classes” are best immutable. I still prefer injecting the string to operate upon in the constructor and kept as a field (just final) over being passed as an argument to a static method. The problem with the latter is that when you extract smaller methods, the original argument will need to be passed around too as those new methods will have to be static too and can’t refer a field. Henrik Gustafsson 5 days later: Personally I would prefer an approach such as this:
public class SymbolReplacer {
private final Pattern pattern = Pattern.compile("\\$([a-zA-Z]\\w*)");
private final SymbolLookup lookup;
public interface SymbolLookup {
String lookupValue(String symbol);
};
SymbolReplacer(SymbolLookup lookup) {
this.lookup = lookup;
}
public String replace(CharSequence stringToReplace) {
final Matcher matcher = pattern.matcher(stringToReplace);
return buildReplacement(matcher).toString();
}
private StringBuffer buildReplacement(Matcher matcher) {
final StringBuffer buffer = new StringBuffer(matcher.regionEnd() - matcher.regionStart());
while (matcher.find()) {
appendMatch(buffer, matcher);
}
matcher.appendTail(buffer);
return buffer;
}
private void appendMatch(StringBuffer buffer, Matcher matcher) {
if (shouldExpandMatch(matcher))
appendExpandedMatch(buffer, matcher);
else
appendOriginalMatch(buffer, matcher);
}
private String getMatchExpansion(Matcher matcher) {
return lookup.lookupValue(matcher.group(1));
}
private void appendExpandedMatch(StringBuffer buffer, Matcher matcher) {
matcher.appendReplacement(buffer, getMatchExpansion(matcher));
}
private void appendOriginalMatch(StringBuffer buffer, Matcher matcher) {
matcher.appendReplacement(buffer, Matcher.quoteReplacement(matcher.group(0)));
}
private boolean shouldExpandMatch(Matcher matcher) {
return lookup.lookupValue(matcher.group(1)) != null;
}
}
It avoids mutable state making it thread safe and allows reuse of the same instance for many strings. It also allows injection of the symbol dictionary for testing, and using the matchers built-in functionality for replacements to avoid the replacement mis-features that have been highlighted above :) @Angry Man your tone is disrespectful. While you may or may not have a valid point, your point is lost in your vitriol and disrespect. Everything can be overdone. While I agree that simple expressions are better than complex ones, that shorter statements are more readable than longer ones, and that short methods are more readable than long ones, the old addage of “everything in moderation” still holds true. Let one method do one thing, sure. But let that one thing be specified at the correct level of abstraction. Do one thing in the problem domain, even though that one thing may be divided into a handfull of statements at the implementation level. The first piece of code on this page is to me the simplest and most readable by a clear margin, even though Henrik Gustafssons implementation is better. “A shorter class with fewer methods is better than a long class with many methods” might be the missing rule in Bob’s strategy. The ideal is not to have all methods containing only one statement. Natural language programming has pretty much failed because natural language is ill equipped to provide the unambiguity required by computers. I question the need to emulate it in Java by maximizing the number of verbosely named methods. Splitting everything into tiny fragments creates wonderfully succinct methods which are easilly understood, but the big picture of what replace() or even the entire class tries to accomplish is obfuscated. @Sam for you convenience I’ve placed any disrespectful comments at the end: It’s not vitriol, it’s bile. A direct result of the gastrointestinal discomfort provoked by Bob’s code. It is more vile to me than any words I can think of. As many have noticed, this sort of “extraction” makes the complete code harder to follow. Just look at the difference between the original code and the final one. The final code is a lot messier, as if Java wasn’t verbose enough as it is. What is particularly bad in this post is the fact that Bob Martin doesn’t even try to justify this kind of approach by arguing how it benefits the programmer. He’s “splitting functions for the sake of following a rule”, as Aaron pointed out. Perhaps all that butchering improves readability for non programmers, but really, can anyone imagine working with an API that splits every single abstract idea you want to code into 5 times as many functions as necessary? Abstraction goes down the drain with such an approach, and saying “I’ve been programming for 40 years, I know what I’m doing” is no substitute for rational argument. I think Angry Man really nailed this down. I quote: “A shorter class with fewer methods is better than a long class with many methods”. Maybe they should come up with some easily digestible “principle” (which are not really principles, but guidelines) to take that fact into account. In my experience, if a class needs to do complex internal work, it should have a set of private/protected functions. Those functions, however, should be split into logical units within the highest possible level of abstraction. That makes them easy to understand and work with. It makes things as complex as they have to be, but no more. What Martin seems to be advocating is the complete opposite. IMO, extracting methods to the depth that Uncle Bob describes doesn’t hurt the follow-ability of the code. As a developer, we owe it to ourselves to make our lives easier. Extracting methods allows us to think at a level of abstraction that is appropriate to the level of the method. We’re cutting out the low level details and putting them where they belong: a concise method that is easy to refer to if we need that level of detail. Angry Man is angry 18 days later: @Sam you can ignore this one altogether. @Stacy well you’re a moron. And so is Unclebob, and it’s about time he was exposed as the pathetic amateurish fraud he is. Unfortunately he is Emperor among the people who have no business writing code in the first place, of which i suspect you belong. I guess I don’t have to tell you what kind of clothes I believe the Emperor is wearing? For those who have yet to drink the kool-aid, Take a look at how they are working throughhttp://butunclebob.com/ArticleS.UncleBob.ThePrimeFactorsKata inhttp://www.vimeo.com/2499161 It is a sad exercise in misuse of Test driven development. See if you can comprehend the level of insanity that’s going on there. Take note of Bob’s answer on how he knows he’s done at #9. It at about 6 minutes in. It’s cargo cult programming masqueraded as philosophy. Insight is claimed but it is false, imagined insight. It is like insight of the paranormal. The strange thing is that when you read Bob’s articles a lot of the time they seem perfectly resonable. Fun fact: Bob’s prime factor algorithm performs 2x as many divisions as necessary. PatrickWelsh 18 days later: Every class has at least two DSLs (Domain Specific Languages) going on: 1. What I am Trying To Do Here, (the “What” DSL) 2. How I Have Implemented That (the “How” DSL) To me, the main reason to learn to “extract till you drop” is that we need to keep those DSLs separate. What you are “dropping” is the conflation of those 2 DSLs. The “How” DSL is all about the tips, tricks, and workarounds of the language in question. Matchers and StringBuilders an whatnot. That’s important, but less important than the “What” DSL. This is because, when it is time to extend or maintain SymbolReplacer, it is the local semantics of the “What” DSL that usually is being changed. We need to get a handle on “What” the class is doing quickly, before we dive into the “How.” The nextSymbol() extraction, for example, makes replaceAllSymbols() more expressive, not less. It hides silly Java implementation cruft behind a locally-useful, extensible new noun. It pushes a bit more “How” away, refining the “What” a bit more. Personally, I will gladly pay the tax of a few more methods, in exchange for quickly being able to wrap my head around that “What” DSL quickly. Uncle Bob is right. Extract till you drop. Extract till your drop every bit of coupling between your “What” DSL and your “How” DSL. Angrier Man 18 days later: @PatrickWelsh you’ve split the how into grains of sand. Replace whatever grain you want but what you’ve actually done is semented the algorithm into stone. You have defined the algorithm explicitly, not only in the written code, but in the nouns of the methods and how they call each other. You have done the exact opposite of what you claim to. It is impossible to change the how without pretty much changing every method below the one you are working on, because that is the only way of making a change that actually changes anything at all. The thinly sliced “hows “are reduced to the point that they cannot be anything else and still be part of the same context, and still they depend on class fields providing state making the whole exercise even more pointless. Buck nekkid I tell you. Buck nekkid. @Angrier Man You actually have a point. Lots of tiny methods may be make code readable, but they definitely make changes harder. Good. Perhaps you are now ready to drop all side effects and go fully functional? After all, the benefits of that have been mathematically demonstrated a long time ago. Forgive me for seeming to go off on a tangent for a short while (all will become clear soon)... The way we structure code is directly related to how we structure solutions within the mind. Psychologists model memory as a network of chunks (a chunk is defined as “memory elements that have strong associations with each other but weak associations with elements in other chunks”). Each node on the network has an image associated with it, this can be a word, a visual image etc. (anyone familiar with mind maps will recognise this structure). This is detailed more in the paper I presented at OOPSLA this year (apologies for the shameless self promotion), see http://www.chunkinganalogies.com for the presentation (with script) and the paper. When we write code, we group together elements that are strongly associated (the cohesion/coupling principle). Functions & methods are a way of grouping such elements and providing an image (method name). Saying that a function should do one thing is the same as identifying it as a chunk. However, using the indirection of a function has time and capacity penalties for our limited cognitive process (see the section “Code for Experts and Novices” in the paper). We can use statement structure and comments to identify chunks and provide images rather than revert to formal functions (and the associated indirection cost). I would envisage that each person’s view of the most appropriate level of extraction is different and probably primarily dependent upon programming expertise. Tom ChaosEngine 6 months later: The best for me are 1st, 2nd and maybe 3rd iterations. The last is by far too divided and complex to understand by human. Usability and simplicity is much more important in that piece of code
|
|