A major focus of my day job right now is cleaning up the PHP in a decades-old monolith. This includes tests written for two different test runners (PHPUnit and an in-house library) by hundreds of engineers over the years.
I could write a book on the horrors I’ve seen (and currently have at least half a dozen blog posts in draft state), but I’m not interested in raking anyone over the coals for past engineering decisions—honestly, it’s to be expected with any project this size and age. Instead, I wanted to take a moment to talk about one of the most prevalent oversights made by engineers of all levels: strict equality.
Not all equals are made equal
In PHP, JavaScript, and many other loosely-typed languages, there are two ways to determine if something is equal:
- Loose equality (e.g. `==`, or “double equals”)
- Strict equality (e.g. `===`, or “triple equals”)
How do these compare? The main difference is that loose equality checks don’t account for types when making a comparison; when comparing a numeric string against an integer, PHP converts the string to a number first, so `123 == '123'`. In fact, all of the following evaluate as true when using loose equality:
```php
# Truthy values
true == 'some string';
true == 1;
true == 1.0;
true == ['some', 'array'];
true == new stdClass();

# Falsey values
false == '';
false == '0';
false == 0;
false == 0.0;
false == null;
false == [];
```
If you’re familiar with PHP’s `empty()` function, those “falsey” values should look familiar, as all of them are considered to be empty.
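To make that overlap concrete, here is a minimal sketch that checks each of the falsey values above against both `== false` and `empty()`. The "near miss" strings are my own additions for illustration; they look empty-ish to a human but PHP treats them as neither empty nor loosely false.

```php
<?php
// The "falsey" values from the list above.
$falsey = ['', '0', 0, 0.0, null, [], false];

$falseyResults = [];
foreach ($falsey as $value) {
    // True only when the value is loosely equal to false AND considered empty.
    $falseyResults[] = ($value == false) && empty($value);
}

// Illustrative near misses (not from the list above).
$nearMisses = ['0.0', ' ', 'false'];

$nearMissResults = [];
foreach ($nearMisses as $value) {
    // True if either check treats the value as falsey.
    $nearMissResults[] = ($value == false) || empty($value);
}

var_dump($falseyResults);   // every entry is bool(true)
var_dump($nearMissResults); // every entry is bool(false)
```

The takeaway is that “falsey” is a specific list rather than an intuition: `'0'` is on it, while `'0.0'` is not.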
Now, if we replaced the double equal sign with a triple equals, all of those evaluations would be false:
```php
var_dump(true === 'some string');
#=> bool(false)

var_dump(true === 1);
#=> bool(false)

var_dump(true === ['some', 'array']);
#=> bool(false)

// ...and so on.
```
This is because strict equality accounts for the type of a variable: an object is not the same as an array, which is not the same as an integer nor a boolean.
When we think about equality, this makes sense: should an object—even one without any properties assigned to it—be equivalent to the boolean true?
Loose equality laughs in the face of the Transitive Property
If you think back to grade school math classes, you might recall the Transitive Property of Equality, which states:
If two values are equal, and either of those two values is equal to a third value, then all the values must be equal.
Written algebraically, if `a = b` and `b = c`, then `a = c`.
Sadly, loose equality must have been out sick the day this was taught because the Transitive Property does not apply to loose equality:
```php
123 == true;
#=> bool(true)

true == 'hello, there!';
#=> bool(true)

123 == 'hello, there!';
#=> bool(false)
```
In a world where fundamental mathematical principles don’t apply, you can see how bugs might sneak in!
Loose equality and conditionals
Conditional statements (e.g. `if/else`) are a foundational part of [nearly] every programming language, and for good reason: if some expression evaluates as true, do this; otherwise, do that:
```php
if ($expr) {
    // $expr is truthy
} else {
    // $expr is falsey
}
```
It’s worth noting that PHP handles conditionals with loose equality, so anything loosely equal to true will evaluate as such:
```php
if (1) {
    // This will always evaluate as true
}

if ('some string') {
    // This will always evaluate as true
}

if (['some', 'array']) {
    // This will always evaluate as true
}
```
However, we can also make our conditionals use strict equality by including strict comparisons in our expressions:
```php
if (true === 1) {
    // This will never be reached
}

if (true === 'some string') {
    // This will never be reached
}

if (true === ['some', 'array']) {
    // This will never be reached
}
```
This works because `true === 'some string'` is an expression itself, which evaluates as false.
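If you’re ever unsure which way a bare conditional will go, a quick sketch like the one below can settle it. The candidate strings are my own picks for illustration; the point is that only the empty string and `'0'` are treated as false, no matter how “false-looking” the others are.

```php
<?php
// Illustrative strings that a human might read as "nothing" or "false".
$candidates = ['0', '', '0.0', ' ', 'false', 'null'];

$truthy = [];
$falsey = [];
foreach ($candidates as $candidate) {
    // A bare if () converts the value to a boolean, just like (bool) would.
    if ($candidate) {
        $truthy[] = $candidate;
    } else {
        $falsey[] = $candidate;
    }
}

var_dump($falsey); // contains only '0' and ''
var_dump($truthy); // contains '0.0', ' ', 'false', and 'null'
```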
Why is this important? Let’s consider PHP’s `strpos()` function, which returns the index of a substring within a string, or false if the substring is not found:
```php
$string = 'Hello, there!';

strpos($string, 'there');
#=> int(7)

strpos($string, 'Hello');
#=> int(0)

strpos($string, 'world');
#=> bool(false)
```
Observant readers might already see the problem here: the string uses a zero-based index, so the first character has an index of `0`. If we’re not careful, we could easily fail to detect the string:
```php
# Try this for yourself: https://3v4l.org/S0eW2
$string = 'Hello, there!';

if (!strpos($string, 'Hello')) {
    echo 'Did not detect "Hello" in string (default)' . PHP_EOL;
}

if (false == strpos($string, 'Hello')) {
    echo 'Did not detect "Hello" in string (loose equality)' . PHP_EOL;
}

if (false === strpos($string, 'Hello')) {
    echo 'Did not detect "Hello" in string (strict equality)' . PHP_EOL;
}
```
When we run this code, we’ll get the following output:
```
Did not detect "Hello" in string (default)
Did not detect "Hello" in string (loose equality)
```
Wait, what happened? Since when does “Hello, there!” not contain the substring “Hello”?!
Remember: the integer `0` (the index of “Hello” in the string) is loosely equal to the boolean false, so the first two conditions evaluate as true and their “Did not detect” messages are printed. However, when our conditional is specifically “is the return value strictly equal to the boolean false?” then the condition fails (and thus we don’t print a “Did not detect” message).
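One way to keep this mistake from spreading is to wrap the strict check in a small helper. The sketch below is just that, a sketch: `containsSubstring()` is a hypothetical name of my own, not a PHP built-in.

```php
<?php
/**
 * Hypothetical helper: report whether $needle occurs anywhere in $haystack.
 */
function containsSubstring(string $haystack, string $needle): bool
{
    // strpos() returns int(0) for a match at the very start of the string,
    // so the result must be compared against false strictly.
    return strpos($haystack, $needle) !== false;
}

$string = 'Hello, there!';

var_dump(containsSubstring($string, 'Hello')); // bool(true)
var_dump(containsSubstring($string, 'there')); // bool(true)
var_dump(containsSubstring($string, 'world')); // bool(false)
```

If you’re on PHP 8.0 or newer, the built-in `str_contains()` covers the same need and returns a boolean directly.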
Loose equality makes for poor tests
One of the places where loose equality can be most dangerous is in tests, because you’d better be sure that the result you’re getting is exactly what you expected.
Imagine you’re writing tests for the following `User` model:
```php
<?php

class User
{
    private ?string $email = null;

    /**
     * Set the user's email address.
     *
     * @param string $email The user's email address.
     *
     * @return void
     */
    public function setEmail(string $email): void
    {
        $this->email = $email;
    }

    /**
     * Return the user's email address (if one has been set),
     * otherwise null.
     *
     * @return ?string The user's email address or null.
     */
    public function getEmail(): ?string
    {
        return $this->email ?? null;
    }
}
```
Passing around email addresses as strings? Did the author of that code not read my post about value objects?!
Anyway, it’s not uncommon to find tests for a class like this written in this way:
```php
use PHPUnit\Framework\TestCase;

final class UserTest extends TestCase
{
    public function testEmailMethods(): void
    {
        $user = new User();

        $this->assertEquals(
            null,
            $user->getEmail(),
            'Email address should be null by default.'
        );

        // Placeholder address used for illustration.
        $user->setEmail('test@example.com');
        $this->assertEquals(
            'test@example.com',
            $user->getEmail(),
            'The user email address should have been set.'
        );
    }

    public function testEmailMethodsWithEmptyEmail(): void
    {
        $user = new User();
        $user->setEmail('');

        $this->assertEquals(
            null,
            $user->getEmail(),
            'The user email should be unchanged'
        );
    }
}
```
Here we have two tests:
- `testEmailMethods()` ensures that the user’s email starts off null and, after we set one, `getEmail()` will give us back that value.
- `testEmailMethodsWithEmptyEmail()` verifies that if we try to set an empty email address we still get null (presumably because of validation somewhere).
However, there’s a small but significant difference here: in the second test, the user’s email is not unchanged, as it went from null to an empty string. The test still passes because `assertEquals()` compares loosely, and `null == ''`. As it turns out, our `User` model doesn’t have any sort of validation, so while an empty string is not a valid email address it is a valid string.
While this may seem trivial, perhaps you need to run a report that returns all users that don’t have an email address (e.g. `SELECT COUNT(*) FROM users WHERE email IS NULL`); be aware that users with empty (but not null) email addresses will not be counted!
Similarly, you may find yourself running into database errors while saving records, especially if you have a unique index on the email column: in most databases multiple null values are permitted, but only one user can have the email address of `""`.
We can save ourselves the need to troubleshoot this later by using PHPUnit’s `assertSame()` assertion:
```php
use PHPUnit\Framework\TestCase;

final class UserTest extends TestCase
{
    public function testEmailMethods(): void
    {
        $user = new User();

        $this->assertSame(
            null,
            $user->getEmail(),
            'Email address should be null by default.'
        );

        // Placeholder address used for illustration.
        $user->setEmail('test@example.com');
        $this->assertSame(
            'test@example.com',
            $user->getEmail(),
            'The user email address should have been set.'
        );
    }

    public function testEmailMethodsWithEmptyEmail(): void
    {
        $user = new User();
        $user->setEmail('');

        $this->assertSame(
            null,
            $user->getEmail(),
            'The user email should be unchanged'
        );
    }
}
```
When we run the updated test suite, we’ll see a failure:
```
1) UserTest::testEmailMethodsWithEmptyEmail
The user email should be unchanged
Failed asserting that '' is identical to null.
```
By swapping out `assertEquals()` for `assertSame()`, our test suite is able to tell us that the code is not behaving how we expect!
Of course, this is just one example. Think about how many places in your codebase a function might return null or false when an error occurs. Should these have the same semantic meaning as `0`, `""`, or `[]`?
As a general rule, you should always default to strict equality checks! With few exceptions, strict equality will produce more-resilient and less-buggy code.
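Defaulting to strict checks applies to more than the `===` operator. As a small sketch (the `$ids` array is made up for illustration), PHP’s `in_array()` and `array_search()` compare loosely unless you pass `true` as the third argument, and `array_search()` has the same “key 0 versus false” trap as `strpos()`:

```php
<?php
// Illustrative list of integer IDs.
$ids = [0, 1, 2];

// Loose (the default): the string '1' matches the integer 1.
$looseMatch = in_array('1', $ids);

// Strict: the types must match too, so the string '1' is not found.
$strictMatch = in_array('1', $ids, true);

// array_search() returns the key of the match, or false when there is none.
// The first element lives at key 0, so compare against false strictly.
$firstKey = array_search(0, $ids, true);
$foundFirst = ($firstKey !== false);

$missingKey = array_search('5', $ids, true);
$foundMissing = ($missingKey !== false);

var_dump($looseMatch);   // bool(true)
var_dump($strictMatch);  // bool(false)
var_dump($firstKey);     // int(0)
var_dump($foundFirst);   // bool(true)
var_dump($foundMissing); // bool(false)
```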
Caveat: strict equality of objects
Given what we’ve discussed so far, what are you having for lunch?
```php
if (new stdClass() === new stdClass()) {
    echo "It's pizza day!";
} else {
    echo "Who's feeling tacos?";
}
```
If you were hoping for pizza, I’m sorry to disappoint you: two objects—even if they contain the exact same values—are only strictly equal if they represent the same instance.
Without getting too far into details, each time you instantiate an object (e.g. via the `new` keyword), a new instance of that class is created, which has its own unique ID and memory allocation:
```php
$obj1 = new stdClass();
$obj2 = new stdClass();

var_dump(
    spl_object_id($obj1),
    spl_object_id($obj2),

    // Different instances are not strictly equal,
    // but *are* loosely equal.
    $obj1 === $obj2,
    $obj1 == $obj2,

    // An instance is strictly equal to itself
    $obj1 === $obj1,
    $obj2 === $obj2
);
#=> int(1)
#=> int(2)
#=> bool(false)
#=> bool(true)
#=> bool(true)
#=> bool(true)
```
This is unique to objects: two empty arrays are strictly equal to one another, as are integers, floats, booleans, strings, and nulls.
However, since no two objects are the same you need to be careful in how you make assertions against them in tests. A few strategies that work well include:
```php
# 1. Compare specific properties (like IDs)
$this->assertSame($expected->getID(), $actual->getID());

# 2. Compare only the underlying data
$this->assertSame(json_encode($expected), json_encode($actual));
$this->assertSame(serialize($expected), serialize($actual));
$this->assertSame($expected->toArray(), $actual->toArray());

# 3. Just use assertEquals()
$this->assertEquals($expected, $actual);
```
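One thing to keep in mind when choosing between these: loosely comparing two objects also compares their properties loosely. The sketch below uses plain `==` and `===` rather than PHPUnit, and the two `stdClass` records are hypothetical, but since `assertEquals()` is also a loose comparison, it’s reasonable to assume strategy 3 could let a type mismatch like this one slip through where strategy 2 would catch it.

```php
<?php
// Hypothetical records: same class, same property, different value types.
$expected = new stdClass();
$expected->id = 1;

$actual = new stdClass();
$actual->id = '1';

// Loose object comparison compares each property with ==, so 1 == '1' passes.
$looselyEqual = ($expected == $actual);

// Comparing the underlying data strictly notices the int/string mismatch.
$sameData = ((array) $expected === (array) $actual);

var_dump($looselyEqual); // bool(true)
var_dump($sameData);     // bool(false)
```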
Whichever route you take, remember that strict comparisons produce better code!