3 Property-Based Randomized Testing

8.17

contents ← prev up next →

3 Property-Based Randomized Testing🔗

3.1 PBT: Property-Based Testing

3.2 QCheck: Property-based Randomized Testing

3.3 Another example: Delete an item from the list

3.4 Arbitrary

3.5 Example: Even Number

3.1 PBT: Property-Based Testing🔗

To ensure a program behaves as expected, we write unit tests. However, writing good unit tests that cover all edge cases can be challenging. This is where property-based testing becomes useful.

Property-based testing is a testing approach where you define general properties (invariants) that your code should always satisfy, and the testing framework generates random inputs to check that those properties hold true in a wide variety of cases.

Instead of writing specific test cases with fixed inputs and outputs (as in unit testing), you write rules or properties about the expected behavior, and let the testing framework generate many random test cases to verify those rules.

Let’s test ‘rev‘ (list reverse) with a unit test

OCaml REPL


let rec rev l = 
  match l with 
   [] -> []
   | h::t -> rev t @ [h];;

let test_reverse = rev [1;2;3] = [3;2;1];;


# val rev : 'a list -> 'a list = <fun>
# val test_reverse : bool = true
#

But this test does not show if the ‘rev‘ function works correctly with large lists, lists with large integers, lists with duplicate values, string lists with unicode characters, etc.

Instead of unit tests on specific inputs and outputs, what if we could test properties that hold for all inputs? For example, ‘rev‘ function has the following property: reversing a list twice gives back the original list.

let prop_reverse l = rev (rev l) = l

In other words, each of the following evaluates to true:


prop_reverse []
prop_reverse [1; 2; 3]
prop_reverse [1.0; 2.22]

Now, we can randomly generate large lists of various types and feed them as input to the property prop_reverse. If prop_reverse returns false for any input, we’ve found a bug. If it returns true for millions of inputs, it doesn’t guarantee the absence of bugs — but it gives us a high level of confidence in the correctness of the rev function.

Here some examples of the propeties we can test:

Sorting: the output must be sorted and it is the permutation of the input
HashMap: Inserting a key value pair, and searching the key must return the value. Delete a key and searching for the key must ruturn Not_Found. Inserting the same key multiple times, and searching for the key must return the last value insrted etc.
Valid Triangle: Adding any two sides must be greater than the third side.

Coming up with a correct and meaningful properties is a challnge. In this lecture, we will study how to test OCaml programs using property based testing library called QCheck.

3.2 QCheck: Property-based Randomized Testing🔗

QCheck is a Property-Based Testing framework for OCaml. It is a framework that repeatedly generates random inputs, and uses them to confirm that properties hold. For example, it randomly generaly lists to confirm the ‘prop-reverse‘ property of the ‘rev‘ function hold.

let prop_reverse l = rev (rev l) = l;;

QCheck tests are described by a generator that generates random inputs and a property, a bool-valued function. The input generator generates a random input and tests if the property holds with the input. If the property holds, it will try again with another random input. If the property does not hold, we just found a counterexample, or a bug.

3.2.1 Setting Up QCheck🔗

To install QCheck module, run
```
opam install qcheck
```
In the test module, open the Qcheck module
```
open QCheck
```
In utop, before open QCheck, run
```
#require "qcheck"
```
In a dune project, add the following to the dune file:
```
(libraries qcheck)
```

Here is a complete example of using QCheck to test a property:

OCaml REPL


(* List reverse *)
let rec rev l = 
  match l with 
   [] -> []
   | h::t -> rev t @ [h];;
(* a property that holds for the list reverse *)
let prop_reverse l = rev (rev l) = l;;

#require "qcheck";;
open QCheck;;

(* A QCheck test *)
let test =
   Test.make      (* make a test *)
   ~count:10    (* number of random tests. Change this to a 
                  larger number if you want to test more *)
   ~name:"reverse_test"  (* name of the test *)
   (list small_int) (* an arbitrary. Here is generates a list 
                       of random ints *)
  (fun x-> prop_reverse x) (* calls the property *)
;;
(* run the QCheck Test *)
QCheck_runner.run_tests ~verbose:true [test];;


# val rev : 'a list -> 'a list = <fun>
# val prop_reverse : 'a list -> bool = <fun>
# # # val test : QCheck.Test.t = QCheck2.Test.Test <abstr>
# [2Krandom seed: 516024767
generated error fail pass / total     time test name
[2K[ ]    0    0    0    0 /   10     0.0s reverse_test[2K[[32;1m✓[0m]   10    0    0   10 /   10     0.0s reverse_test
================================================================================
[32;1msuccess[0m (ran 1 tests)
- : int = 0
#

It shows that QCheck generated 10 random tests and all of the 10 tests passed.

3.2.2 Buggy Reverse🔗

Assume the ‘rev‘ function is implemented as:

let rev l = l  (* returns the same list *)

Obviously, this implementation of ‘rev‘ is not correct. A simple unit test would catch the bug:

let test_reverse = rev [1;2;3] = [3;2;1]

However, the property ‘prop_reverse‘ does not catch the bug!

let prop_reverse l = rev (rev l) = l

because the property always holds if ‘rev‘ returns the original list without reversing it. Therefore, we need a better property.


let prop_reverse2 l1 m l2 =
   rev (l1 @ [m] @ l2) = rev l2 @ [m] @ rev l1

This property does not hold for the buggy implementation of rev. Fox example:


rev [1;2]@[3]@[4;5] = rev [4;5] @ rev [3] @ rev [1;2]
    [1;2;3;4;5] = [4;5;3;1;2] (* property fails *)

This property holds only if rev is implemented correctly. Lesson learned: garbage in, garbage out — if the property itself is flawed, property-based testing won’t catch the bug.

3.3 Another example: Delete an item from the list🔗

We want to test the property that if ‘x‘ is deleted from the list, it shoud not be a member of this list.

OCaml REPL

#require "qcheck";; 
 open QCheck;; 
 
 let rec delete x l = match l with 
     [] -> [] 
     | (y::ys) -> if x = y then ys 
                  else y::(delete x ys);; 
 
 let prop_delete x l = not (List.mem x (delete x l));; 
 
 let test = 
    Test.make 
      ~count:1000 
      ~name:"delete_test" 
      (pair small_int (list small_int)) 
      (fun(x,l)-> prop_delete  x l);; 
 
 QCheck_runner.run_tests [test];;

# # # val delete : 'a -> 'a list -> 'a list = <fun>
# val prop_delete : 'a -> 'a list -> bool = <fun>
# val test : QCheck.Test.t = QCheck2.Test.Test <abstr>
# 
[2Krandom seed: 15265360

--- [31;1mFailure[0m --------------------------------------------------------------------

Test delete_test failed (18 shrink steps):

(0, [0; 0])
================================================================================
[31;1mfailure[0m (1 tests failed, 0 tests errored, ran 1 tests)

- : int = 1

It immediately finds a counterexample. The counterexample shows that when are duplicates in the list, The ‘delete‘ only deleted the first occurrence. For example:

delete 2 [2;2;3]  returns [2;3]

3.4 Arbitrary🔗

An ‘’a arbitrary‘ represents an "arbitrary" value of type ‘’a‘. It is used to describe how to

generate random values
shrink them (make counter-examples as small as possible)
print them.

Here a some examples of the arbitraries QCehck offers:


small_int:  int arbitrary
list:       'a arbitrary -> 'a list arbitrary
triple:     'a arbitrary -> 'b arbitrary -> 
            'c arbitrary -> ('a * 'b * 'c) arbitrary

The type ‘arbitrary‘ is defined as:


type 'a arbitrary = {
   gen: 'a Gen.t;
   print: ('a ‑> string) option; (** print values *)
   small: ('a ‑> int) option; (** size of example *)
   shrink: 'a Shrink.t option; (** shrink to smaller examples *)
   collect: ('a ‑> string) option; (** map value to tag, and group by tag *)
   stats : 'a stat list;  (** statistics to collect and print *)
}

We can build custom an ‘arbitrary‘ by calling ‘QCheck.make‘.


make :
  ?print:'a Print.t ->
  ?small:('a -> int) ->
  ?shrink:'a Shrink.t ->
  ?collect:('a -> string) ->
  ?stats:'a stat list -> 'a Gen.t -> 'a arbitrary

For example: Let’s build an arbitrary that generates random ints:


# make (Gen.int);;
- : int arbitrary =
{gen = <fun>; print = None; small = None; shrink = None; collect = None;stats = []}

3.4.1 Random Generator🔗

’a QCheck.Gen.t is a function that takes in a Pseudorandom number generator, uses it to produce a random value of type ‘‘a‘. For example, ‘QCheck.Gen.int‘ generates random integers, while QCheck.Gen.string generates random strings. Let us look at a few more of them:


module Gen : 
    sig
      val int : int t
      val small_int : int t
      val int_range : int -> int -> int t
      val list : 'a t -> 'a list t
      val string : ?gen:char t -> string t
      val small_string : ?gen:char t -> string t
      ...
    end

3.4.2 Sampling Generators🔗

Generate a random small integer

Gen.generate1 Gen.small_int
 7

Generate 10 random small integers


Gen.generate ~n:10 Gen.small_int
 int list =[6;8;78;87;9;9;6;2;3;27]

Generate 5 int lists


let t = Gen.generate ~n:5 (Gen.list Gen.small_int);;
val t : int list list =[[4;2;7;8;…];…;[0;2;97]]

Generate two string lists


let s = Gen.generate ~n:2 (Gen.list Gen.string);;
val s : string list list =[[ “A”;”B”;…]; [“C”;”d”;…]]

3.4.3 Combining Generators🔗


frequency:(int * ‘a) list ->‘a ‘a Gen.t
oneof: 'a Gen.t list -> 'a Gen.t

Generate 80% letters, and 20% space


 Gen.generate ~n:10 
    (Gen.frequency [(1,Gen.return ' ‘);
    (4,Gen.char_range 'a' 'z')]);;
- : char list=['i';' ';'j';'h';'t';' ';' ';' ';'k';'b']

Generate one of the parenthesis ‘[,],(,),{,}‘


Gen.generate1 (Gen.oneof [
              return "[";
              return "(";
              return "{";
              return "]";
              return ")";
              return "}"
            ]);;
- : string = "]"

3.4.4 Shrinking🔗

Shrinking is used to reduce the size of a counter-example. It tries to make the counter-example smaller by decreasing it, or removing elements, until the property to test holds again; then it returns the smallest value that still made the test fail. Therefore, a shrinker attempts to cut a counterexample down to something more comprehensible for humans. A QCheck shrinker is a function from a counterexample to an iterator of simpler values:

'a Shrink.t = 'a -> 'a QCheck.Iter.t

Without shrinking, the ‘delete‘ test would return the following counterexample.


--- Failure -------------------------------
Test anon_test_1 failed (0 shrink steps):

(7, [0; 4; 3; 7; 0; 2; 7; 1; 1; 2])

With shrinking, it returns:


--- Failure -------------------------------
Test anon_test_1 failed (8 shrink steps):

(2, [2; 2])

It is much easier to debug the ‘delete‘ function with the counterexample ‘[2;2]‘ than ‘[0; 4; 3; 7; 0; 2; 7; 1; 1; 2]‘.

How does shrink work? How do we go from:


(7, [0; 4; 3; 7; 0; 2; 7; 1; 1; 2])

to:

(2, [2; 2])

Given a counterexample, QCheck calls the iterator to find a simpler value, that is still a counterexample. The steps are as follows:

Given a shrinking function f ::‘a -> ‘a list
And a counterexample x :: ‘a
Try all elements of (f x) to find another failing input…
Repeat until a minimal one is found.

QCheck’s Shrink contains a number of builtin shrinkers:

Shrink.nil performs no shrinking
Shrink.int for reducing integers
Shrink.char for reducing characters
Shrink.string for reducing strings
Shrink.list for reducing lists
Shrink.pair for reducing pairs
Shrink.triple for reducing triples

For implementing a custom shrink, look at the examples Even Numner and Balanced Brackets.

3.4.5 Printers🔗

Printers print a values of type ’a. Type:

type ‘a printer = ‘a -> string

Printers for primitives:

val pr_bool : bool printer
val pr_int : int printer
val pr_list : ‘a printer -> ‘a list printer

3.5 Example: Even Number🔗

Property-based testing generates numerous random inputs. When a test fails, the generated input may be extremely large or complex (e.g., a massive list or lengthy string). This makes debugging difficult—you don’t want to inspect a 10,000-element list to understand why a property failed. A shrinker takes the failing input and systematically reduces it to a minimal counterexample.

In property-based testing frameworks (such as QCheck), a shrinker is a function or mechanism that attempts to make a failing test input smaller and simpler while still preserving the failure. The following example demonstrates how custom shrinkers work. It shows a buggy implementation of the function even, which checks whether a given number is even. However, there is a bug in the code: if the input is greater than 101, the function always returns false. We want to find the bug using the property

even n = (n mod 2 = 0)

. The custom shrinker shrink_int32 shrinks the counterexample by repeatedly subtracting 2. It stops when the counterexample becomes minimal.

even_number/even.ml
  open QCheck

  (* buggy even number function. It fails if the input is greater than 101 *)
  let even n =
    if n > 101 then false
    else n mod 2 = 0
  ;;

  (* Srinks the integer by subtracting 2 *)
  let shrink_int32 x yield =
      if x>0 then yield (x-2);
      ()

  let g = set_shrink shrink_int32  (int_range 1 200);;
  let t = set_shrink Shrink.nil  (int_range 1 1000);;


  (* Property test for even  *)
  let  test_even =
    Test.make
      ~name:"test_even"
      ~count:1000 (* number of tests *)
     (set_shrink shrink_int32  (int_range 1 200))
      (fun n ->
        even n  = (n mod 2 = 0)
      )
  ;;

  (* run the test *)
  (*
  debug_shrink shows the steps of shrinking a random integer by repeatedly subtracting 2*)

  QCheck_runner.run_tests
    ~verbose:true
    ?debug_shrink:(Some (Some Stdio.stdout))
    [
      test_even
    ]
  ;;

even_number/dune
  (tests
      (names even)
      (libraries stdio qcheck)
  )

Here is the result of running dune test. As you can see, QCheck found a counterexample of 108, and the shrinker reduced it to 106, then 104, and finally stopped at 102, which is the minimal counterexample


random seed: 257932476            
generated error fail pass / total     time test name
[ ]    0    0    0    0 / 1000     0.0s test_even
~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test test_even successfully shrunk counter example (step 0) to:

108

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test test_even successfully shrunk counter example (step 1) to:

106

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test test_even successfully shrunk counter example (step 2) to:

104

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test test_even successfully shrunk counter example (step 3) to:

102
[✗]    4    0    1    3 / 1000     0.0s test_even

--- Failure --------------------------------------------------------------------

Test test_even failed (3 shrink steps):

102
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)

contents ← prev up next →

1	Functional Programming with OCaml
2	Imperative Programming with OCaml
3	Property-Based Randomized Testing
4	OCaml Regular Expressions
5	Finite Automata
6	Context Free Grammars
7	Parsing
8	Operational Semantics
9	Type Checking

3.1	PBT: Property-Based Testing
3.2	QCheck: Property-based Randomized Testing
3.3	Another example: Delete an item from the list
3.4	Arbitrary
3.5	Example: Even Number