Getting started

Start a new Clojure project in Eclipse using New → Project → Clojure → Clojure Project. Name the new project lab9.

Open the file src/lab9/core.clj. Create your functions in this file. When you are ready to test your functions, start a REPL by right-clicking in the editor and choosing Clojure → Load file in REPL. When you modify your code, do the same thing to reload your definitions in the REPL.

If you would prefer to use Light Table, or edit your code in a text editor and run the Clojure REPL from the command line, feel free.

Your task

Complete the exercises shown below.

Word count

Write a function named count-words. It should take a sequence of strings as a parameter, and return a map indicating, for each word in the input sequence, how many occurrences of that word there were.

Requirement: your function must be tail-recursive. Suggestion: use the loop construct, where the loop variables are the current sequence of words and the current accumulator (the partially constructed of words to word counts.)

Example calls:

=> (count-words ["clojure" "is" "fun" "and" "jello" "is" "yummy"])
{"yummy" 1,
 "jello" 1,
 "and" 1,
 "fun" 1,
 "is" 2,
 "clojure" 1}

=> (count-words ["hey" "ya" "hey" "ya"])
{"ya" 2, "hey" 2}

Hints:

Sort by frequency

Write a function called sort-by-frequency. It should take a sequence of strings as a parameter, and return a sequence of strings as a result. The result sequence should contain the strings in the original sequence, sorted by frequency, so that the most frequent strings are returned first. To break ties (when two words occur with equal frequency), the strings themselves should be compared. So, if "apples" and "oranges" occur with equal frequency, "apples" should be ordered before "oranges" because "apples" compares as less than "oranges".

Example calls:

=> (sort-by-frequency ["apples" "apples" "lemons" "pears" "apples" "oranges" "oranges" "lemons"])
["apples" "lemons" "oranges" "pears"]
=> (seq (sort-by-frequency ["pump" "up" "the" "jam" "pump" "it"]))
("pump" "it" "jam" "the" "up")

Note how in the first example, "lemons" is earlier in the result sequence than "oranges": their counts are the same (2), but the string "lemons" compares as less than "oranges".

Here are some hints.

Have sort-by-frequency call count-words.

Use the seq function to convert the map of strings to counts into a sequence. Each member of the sequence will be a vector whose first element is a string (word) and whose second element is an integer (count).

Sort the sequence of string/integer vectors using the sort function. Here is a comparator function that will probably be useful:

(defn pair-gt [[lkey lval] [rkey rval]]
  (let [valcmp (compare rval lval)  ; a higher count should compare as "earlier"
        keycmp (compare lkey rkey)] ; use the usual sort order for keys (tiebreaker)
    (if (= valcmp 0)
      keycmp
      valcmp)))

This is essentially a Clojure version of a Java comparator: it compares two key/value pairs. The Clojure compare function (used to compare the keys and values in the two pairs being compared) is like Java's compareTo method, returning a negative value if the left value is less than the right, 0 if they are equal, and a positive value if the left value is greater than the right.

Once the key/value pairs are sorted, use the map or mapv functions to extract just the first element from each pair in the sorted sequence of pairs.