F# How should I think about delimiting items in sequences?

  • A+
Category:Languages

Apologies for a rookie question. I'm trying to change my mental paradigm from procedural to functional.

For instance, suppose I have a list of names that I want to print like this "John, Paul, George, and Ringo." But this code does not satisfy:

let names = [ "John"; "Paul"; "George"; "Ringo" ] names |> Seq.iter (fun s -> printf "%s, " s) 

My procedural instinct is to seek a way to insinuate a predicate into that lambda so that it can branch between ", " or ", and " or ". " depending upon where we're at iterating the sequence. I think that's wrong, but I'm feeling around for what's right.

Would it be better to split the sequence in parts?

In this case it seems that we want to split the sequence into parts corresponding to distinct delimiter behaviors. We want to split it at the end, so we can't use Seq. But we can use List.splitAt instead.

let start, ending = List.splitAt (names.Length - 1) names let penultimate, last = List.splitAt 1 ending start |> Seq.iter (fun s -> printf "%s, " s) penultimate |> Seq.iter (fun s -> printf "%s, and " s) last |> Seq.iter (fun s -> printf "%s. " s) 

Is this a righteous approach? Is there a better solution I've overlooked? Am I thinking along the right lines?

 


The general approach I take to tackle these kind of problems is to split them into smaller parts and solve individually:

  • an empty list [] results in ""
  • one element ["a"] results in "a."
  • two elements [ "a"; "b" ] result in "a and b."
  • more elements (that is a :: rest) result in "a, " + takeCareOf rest, where takeCareOf follows above rules. Note that we don't need to know the length of the full list.

Above recipe directly translates to F# (and functional languages in general):

let rec commaAndDot' = function | [] -> () | [ a ] -> printfn "%s." a | a :: [ b ] -> printfn "%s and %s." a b | a :: rest -> printf "%s, " a; commaAndDot' rest 

Are we done yet? No, commaAndDot' violates the Single Responsibility Principle because the function implements our 'business logic' and prints to the console. Let's fix that:

let rec commaAndDot'' = function | [] -> "" | [ a ] -> sprintf "%s." a | a :: [ b ] -> sprintf "%s and %s." a b | a :: rest -> sprintf "%s, " a + commaAndDot'' rest 

As an additional benefit we can now call the function in parallel and the output does not get mixed up anymore.

Are we done yet? No, above function is not tail-recursive (we need to compute commaAndDot'' rest before concatenating it to the current result) and would blow the stack for large lists. A standard approach to fixing this is to introduce an accumulator acc:

let  commaAndDot''' words =     let rec helper acc = function     | [] -> acc     | [ a ] -> sprintf "%s%s." acc a     | a :: [ b ] -> sprintf "%s%s and %s." acc a b     | a :: rest ->  helper (acc + sprintf "%s, " a) rest     helper "" words 

Are we done yet? No, commaAndDot''' creates a lot of strings for intermediate results. Thanks to F# not being a pure language, we can leverage local (private, non-observable) mutation to optimize for memory and speed:

let  commaAndDot words =     let sb = System.Text.StringBuilder()     let rec helper = function     | [] -> sb     | [ a ] -> sprintf "%s." a |> sb.Append     | a :: [ b ] -> sprintf "%s and %s." a b |> sb.Append     | a :: rest ->         sprintf "%s, " a |> sb.Append |> ignore         helper rest     helper words |> string 

Are we done yet? Probably... at least this is something I would consider idiomatic F# and happily commit. For optimising further (e.g. Appending commas and dots separately or changing the order of the patterns) I'd first write micro-benchmarks before sacrificing readability.

All versions generate the same output:

commaAndDot []                          // "" commaAndDot [ "foo" ]                   // "foo." commaAndDot [ "foo"; "bar" ]            // "foo and bar." commaAndDot [ "Hello"; "World"; "F#" ]  // "Hello, World and F#." 

Update: SCNR, created a benchmark... results are below as a HTML snippet (for nice tabular data).

BuilderOpt is the StringBuilder version with the [] case moved to the bottom, BuilderChained is with chained Append calls, e.g. sb.Append(a).Append(" and ").Append(b) and BuilderFormat is e.g. sb.AppendFormat("{0} and {1}", a, b). Full source code available.

As expected, 'simpler' versions perform better for small lists, the larger the list the better BuilderChained. Concat performs better than I expected but does not produce the right output (missing ".", lacking one case). Yield gets rather slow...

<!DOCTYPE html> <html lang='en'> <head> <meta charset='utf-8' /> <title>Benchmark.CommaAndDot</title>  <style type="text/css">  table { border-collapse: collapse; display: block; width: 100%; overflow: auto; }  td, th { padding: 6px 13px; border: 1px solid #ddd; }  tr { background-color: #fff; border-top: 1px solid #ccc; }  tr:nth-child(even) { background: #f8f8f8; } </style> </head> <body> <pre><code> BenchmarkDotNet=v0.11.1, OS=Windows 10.0.16299.726 (1709/FallCreatorsUpdate/Redstone3) Intel Core i7 CPU 950 3.07GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores Frequency=2998521 Hz, Resolution=333.4977 ns, Timer=TSC   [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit LegacyJIT-v4.7.3190.0 DEBUG   DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0 </code></pre> <pre><code></code></pre>  <table> <thead><tr><th>  Method</th><th>Verbosity</th><th>    Mean</th><th>Error</th><th>StdDev</th><th>  Median</th><th>Scaled</th><th>ScaledSD</th> </tr> </thead><tbody><tr><td>Concat</td><td>0</td><td>39.905 ns</td><td>0.0592 ns</td><td>0.0494 ns</td><td>39.906 ns</td><td>1.02</td><td>0.11</td> </tr><tr><td>Yield</td><td>0</td><td>27.235 ns</td><td>0.0772 ns</td><td>0.0603 ns</td><td>27.227 ns</td><td>0.69</td><td>0.07</td> </tr><tr><td>Accumulator</td><td>0</td><td>1.956 ns</td><td>0.0109 ns</td><td>0.0096 ns</td><td>1.954 ns</td><td>0.05</td><td>0.01</td> </tr><tr><td>Builder</td><td>0</td><td>32.384 ns</td><td>0.2986 ns</td><td>0.2331 ns</td><td>32.317 ns</td><td>0.82</td><td>0.09</td> </tr><tr><td>BuilderOpt</td><td>0</td><td>33.664 ns</td><td>1.0371 ns</td><td>0.9194 ns</td><td>33.402 ns</td><td>0.86</td><td>0.09</td> </tr><tr><td>BuilderChained</td><td>0</td><td>39.671 ns</td><td>1.2097 ns</td><td>3.5669 ns</td><td>41.339 ns</td><td>1.00</td><td>0.00</td> </tr><tr><td>BuilderFormat</td><td>0</td><td>40.276 ns</td><td>0.8909 ns</td><td>1.8792 ns</td><td>39.494 ns</td><td>1.02</td><td>0.12</td> </tr><tr><td>Concat</td><td>1</td><td>153.116 ns</td><td>1.1592 ns</td><td>0.9050 ns</td><td>152.706 ns</td><td>0.87</td><td>0.01</td> </tr><tr><td>Yield</td><td>1</td><td>154.522 ns</td><td>0.2890 ns</td><td>0.2256 ns</td><td>154.479 ns</td><td>0.88</td><td>0.00</td> </tr><tr><td>Accumulator</td><td>1</td><td>223.342 ns</td><td>0.3678 ns</td><td>0.2872 ns</td><td>223.412 ns</td><td>1.27</td><td>0.00</td> </tr><tr><td>Builder</td><td>1</td><td>232.194 ns</td><td>0.2951 ns</td><td>0.2465 ns</td><td>232.265 ns</td><td>1.32</td><td>0.00</td> </tr><tr><td>BuilderOpt</td><td>1</td><td>232.016 ns</td><td>0.5654 ns</td><td>0.4722 ns</td><td>232.170 ns</td><td>1.31</td><td>0.00</td> </tr><tr><td>BuilderChained</td><td>1</td><td>176.473 ns</td><td>0.3918 ns</td><td>0.3272 ns</td><td>176.341 ns</td><td>1.00</td><td>0.00</td> </tr><tr><td>BuilderFormat</td><td>1</td><td>219.262 ns</td><td>6.7995 ns</td><td>6.3603 ns</td><td>217.003 ns</td><td>1.24</td><td>0.03</td> </tr><tr><td>Concat</td><td>10</td><td>1,284.042 ns</td><td>1.7035 ns</td><td>1.4225 ns</td><td>1,283.443 ns</td><td>1.68</td><td>0.05</td> </tr><tr><td>Yield</td><td>10</td><td>6,532.667 ns</td><td>12.6169 ns</td><td>10.5357 ns</td><td>6,533.504 ns</td><td>8.55</td><td>0.24</td> </tr><tr><td>Accumulator</td><td>10</td><td>2,701.483 ns</td><td>4.8509 ns</td><td>4.5376 ns</td><td>2,700.208 ns</td><td>3.54</td><td>0.10</td> </tr><tr><td>Builder</td><td>10</td><td>1,865.668 ns</td><td>5.0275 ns</td><td>3.9252 ns</td><td>1,866.920 ns</td><td>2.44</td><td>0.07</td> </tr><tr><td>BuilderOpt</td><td>10</td><td>1,820.402 ns</td><td>2.7853 ns</td><td>2.3258 ns</td><td>1,820.464 ns</td><td>2.38</td><td>0.07</td> </tr><tr><td>BuilderChained</td><td>10</td><td>764.334 ns</td><td>19.8528 ns</td><td>23.6334 ns</td><td>756.988 ns</td><td>1.00</td><td>0.00</td> </tr><tr><td>BuilderFormat</td><td>10</td><td>1,177.186 ns</td><td>1.9584 ns</td><td>1.6354 ns</td><td>1,177.897 ns</td><td>1.54</td><td>0.04</td> </tr><tr><td>Concat</td><td>100</td><td>25,579.773 ns</td><td>824.1504 ns</td><td>688.2028 ns</td><td>25,288.873 ns</td><td>5.33</td><td>0.14</td> </tr><tr><td>Yield</td><td>100</td><td>421,872.560 ns</td><td>902.5023 ns</td><td>753.6302 ns</td><td>421,782.071 ns</td><td>87.87</td><td>0.23</td> </tr><tr><td>Accumulator</td><td>100</td><td>80,579.168 ns</td><td>227.7392 ns</td><td>177.8038 ns</td><td>80,547.868 ns</td><td>16.78</td><td>0.05</td> </tr><tr><td>Builder</td><td>100</td><td>15,047.790 ns</td><td>26.2248 ns</td><td>21.8989 ns</td><td>15,048.903 ns</td><td>3.13</td><td>0.01</td> </tr><tr><td>BuilderOpt</td><td>100</td><td>15,287.117 ns</td><td>39.8679 ns</td><td>31.1262 ns</td><td>15,293.739 ns</td><td>3.18</td><td>0.01</td> </tr><tr><td>BuilderChained</td><td>100</td><td>4,800.966 ns</td><td>11.3614 ns</td><td>10.0716 ns</td><td>4,801.450 ns</td><td>1.00</td><td>0.00</td> </tr><tr><td>BuilderFormat</td><td>100</td><td>8,382.896 ns</td><td>87.8963 ns</td><td>68.6236 ns</td><td>8,368.400 ns</td><td>1.75</td><td>0.01</td> </tr></tbody></table> </body> </html>

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: