Você pode ler este post em português também.
Hello! The first article of this year talks about data modeling. A situation that makes me angry for years is when you have attributes that work as flags inside tables. Think twice (or maybe three times) before create them because usually, they are unnecessary. And why? We create them to mark a state of a record, which can be done, many times, querying another attributes, generating duplicity situations (so much problematic, per si). Let’s see examples:
mysql> SELECT
- FROM tb\_orders LIMIT 5;
+------------ ... +--------------+------------ ... +----------------+
| order\_id ... | sent\_method | sent\_date ... | sent\_flag |
+------------ ... +--------------+------------ ... +----------------+
| 986565 ... | 1 | 2010-11-18 ... | Y |
| 659813 ... | NULL | NULL ... | N |
| 986542 ... | 2 | 2010-11-13 ... | Y |
| 321598 ... | 1 | 2010-01-14 ... | Y |
| 112982 ... | NULL | NULL ... | N |
+------------ ... +--------------+------------ ... +----------------+
5 rows in set (0.00 sec)
mysql>
The example above shows us a common situation: a flag to show something which another tuple already show. It’s redundant, and it can, easily, broke the data integrity. An error of code or a simple forget during an insert or update operation can result at the situation below:
mysql> SELECT
- FROM tb\_orders LIMIT 5;
+------------ ... +--------------+------------ ... +----------------+
| order\_id ... | sent\_method | sent\_date ... | sent\_flag |
+------------ ... +--------------+------------ ... +----------------+
| 986565 ... | 1 | 2010-11-18 ... | Y |
**| 659813 ... | NULL | NULL ... | Y |**
| 986542 ... | 2 | 2010-11-13 ... | Y |
| 321598 ... | 1 | 2010-01-14 ... | Y |
| 112982 ... | NULL | NULL ... | N |
+------------ ... +--------------+------------ ... +----------------+
5 rows in set (0.00 sec)
mysql>
Or even:
mysql> SELECT
- FROM tb\_pedidos LIMIT 5;
+------------ ... +--------------+------------ ... +----------------+
| order\_id ... | sent\_method | sent\_date ... | sent\_flag |
+------------ ... +--------------+------------ ... +----------------+
| 986565 ... | 1 | 2010-11-18 ... | S |
**| 659813 ... | 1 | 2010-05-12 ... | N |** | 986542 ... | 2 | 2010-11-13 ... | S |
| 321598 ... | 1 | 2010-01-14 ... | S |
| 112982 ... | NULL | NULL ... | N |
+------------ ... +--------------+------------ ... +----------------+
5 rows in set (0.00 sec)
mysql>
How we can proceed to treat this kind of situation? The answer is to analyze each flag as follow:
Verify each flag you want to create. There is another attribute (or a group of them, even in another tables) that be able to show the required situation? If so, maybe this flag is useless, and you can perform the following actions:
Create this flag as a calculated fields in the analyzed table;
Create views, and inside them create your flags as (again) calculated fields;
And finally, use can use triggers to populate theses flags automatically, watching the attribute/tuple that can show the situation. Sure, there are situations where the flags are essential. In this cases, I clap for them (see the “gender” and “record_active” attributes below):
mysql> SELECT
- FROM tb\_pessoas LIMIT 5;
+------------ ... +------- ... +------------+ ... +-----------------+ ...
| person\_name ... | gender ... | reg\_date | ... | record\_active | ...
+------------ ... +------- ... +------------+ ... +-----------------+ ...
| Carlos ... | M ... | 2006-01-18 | ... | S | ...
| Sandro ... | M ... | 2005-01-12 | ... | S | ...
| Henrique ... | M ... | 2008-02-13 | ... | S | ...
| Luiza ... | F ... | 2006-01-14 | ... | S | ...
| Antonio ... | M ... | 2008-01-14 | ... | N | ...
+------------ ... +------- ... +------------+ ... +-----------------+ ...
5 rows in set (0.00 sec)
mysql>
A good tip would be to ask the following question to yourself before to create an attribute of indicator/flag type: “This attribute will works to show a situation whose state can change during the life of the record?” If the answer tends to YES then imagine the semaphore should be, at least, in yellow. Then proceed with caution and consider whether you can instead of create more one physical attribute, to create a calculated field, a view or a trigger to populate it automatically.
Do you agree/disagree or have something to add? Comment!!
See you! ;-)