Schema and transformers
Since version 1.7.0, Datafaker supports transformation schemas. It also provides a set of ready-to-use transformers:
- CSV
- JSON
- SQL
- YAML
- XML
- Java Object
Schema
A schema is a set of rules describing how to transform data from Datafaker's representation into one of the supported formats. One of the main advantages of a schema is that the same schema can be used to transform data into several different formats.
A schema can be used in two ways: to generate data from scratch, or to transform existing data.
Example of schema definition:
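A minimal sketch, assuming the `Schema` and static `field` imports from the `net.datafaker.transformations` package:

```java
import net.datafaker.Faker;
import net.datafaker.transformations.Schema;

import static net.datafaker.transformations.Field.field;

Faker faker = new Faker();
// Each field pairs a name with a supplier of values.
Schema<String, String> schema = Schema.of(
    field("firstName", () -> faker.name().firstName()),
    field("lastName", () -> faker.name().lastName()));
```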
Nested (composite) fields are also supported, e.g.:
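A sketch using `compositeField`, assuming the static imports from the same package:

```java
import net.datafaker.transformations.Field;

import static net.datafaker.transformations.Field.compositeField;
import static net.datafaker.transformations.Field.field;

// "address" is a nested field composed of two sub-fields.
Schema<String, String> schema = Schema.of(
    field("firstName", () -> faker.name().firstName()),
    compositeField("address", new Field[] {
        field("country", () -> faker.address().country()),
        field("city", () -> faker.address().city())}));
```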
CSV transformation
A CSV transformer can be built with the help of CsvTransformer.CsvTransformerBuilder, e.g.:
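A minimal sketch of the builder; the option values here are illustrative:

```java
import net.datafaker.transformations.CsvTransformer;

CsvTransformer<String> transformer =
    CsvTransformer.<String>builder()
        .header(true)      // include a header row
        .separator(",")    // field separator
        .build();
```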
The following can be configured:
- the separator and quote characters can be specified with separator() and quote()
- whether a header row is generated can be specified with header()
To generate data based on a schema, just call generate with the schema:
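For example (the row count here is arbitrary):

```java
// Generates 5 CSV rows from the schema above.
String csv = transformer.generate(schema, 5);
```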
It is also possible to use schemas to transform existing data. For example, given a collection of Name objects, we can build a CSV of first and last names from that collection:
```java
Schema<Name, String> schema =
    Schema.of(field("firstName", Name::firstName), field("lastname", Name::lastName));

CsvTransformer<Name> transformer =
    CsvTransformer.<Name>builder().header(false).separator(" : ").build();
String csv =
    transformer.generate(
        faker.<Name>collection().suppliers(faker::name).maxLen(limit).build(),
        schema);
```

```kotlin
val faker = BaseFaker()
val schema = Schema.of(field("firstName", Name::firstName), field("lastname", Name::lastName))

val transformer = CsvTransformer.builder<Name>().header(false).separator(" : ").build()
val csv = transformer.generate(
    faker.collection<Name>().suppliers(Supplier { faker.name() }).maxLen(limit).build(), schema
)
```
JSON transformation
JSON transformation is very similar to CSV. The main difference is that JSON supports nested values, which can be handled with the help of compositeField.
Example of JSON generation:
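A hedged sketch, assuming JsonTransformer and its builder from the `net.datafaker.transformations` package:

```java
import net.datafaker.transformations.Field;
import net.datafaker.transformations.JsonTransformer;
import net.datafaker.transformations.Schema;

import static net.datafaker.transformations.Field.compositeField;
import static net.datafaker.transformations.Field.field;

JsonTransformer<String> transformer = JsonTransformer.<String>builder().build();
Schema<String, String> schema = Schema.of(
    field("firstName", () -> faker.name().firstName()),
    compositeField("address", new Field[] {
        field("city", () -> faker.address().city()),
        field("country", () -> faker.address().country())}));

// Generates 2 JSON objects, each with a nested "address" object.
String json = transformer.generate(schema, 2);
```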
To use composite fields, they only need to be defined at the Schema level; nothing more is required.
SQL transformation
Note: right now only INSERT is supported.
It generates a number of INSERT statements. There are two modes: batch and non-batch generation. Batch generation means that one INSERT statement contains several rows to insert. Since different databases have different syntax, there is initial support for dialects. The dialect can be specified while building the SqlTransformer, e.g.:
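A minimal sketch; the dialect value here is illustrative:

```java
import net.datafaker.transformations.sql.SqlDialect;
import net.datafaker.transformations.sql.SqlTransformer;

SqlTransformer<String> transformer =
    new SqlTransformer.SqlTransformerBuilder<String>()
        .dialect(SqlDialect.POSTGRES) // pick the dialect of your target database
        .build();
```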
The dialect also determines identifier quoting, value quoting, and other dialect-specific syntax.
An example of batch mode:
```java
Faker faker = new Faker();
Schema<String, String> schema =
    Schema.of(field("firstName", () -> faker.name().firstName()),
        field("lastName", () -> faker.name().lastName()));

SqlTransformer<String> transformer =
    new SqlTransformer.SqlTransformerBuilder<String>()
        .batch(5)
        .tableName("MY_TABLE")
        .dialect(SqlDialect.POSTGRES)
        .build();
String output = transformer.generate(schema, 10);
```

```kotlin
val faker = Faker()
val schema: Schema<String, String> = Schema.of(
    field("firstName", Supplier { faker.name().firstName() }),
    field("lastName", Supplier { faker.name().lastName() })
)

val transformer = SqlTransformer.SqlTransformerBuilder<String>()
    .batch(5)
    .tableName("MY_TABLE")
    .dialect(SqlDialect.POSTGRES)
    .build()
val output = transformer.generate(schema, 10)
```
will generate two INSERT statements, each containing 5 rows, e.g.:
```sql
INSERT INTO MY_TABLE ("firstName", "lastName")
VALUES ('Billy', 'Wintheiser'),
       ('Fernando', 'Sanford'),
       ('Jamey', 'Torp'),
       ('Nicolette', 'Wiza'),
       ('Sherman', 'Miller');
INSERT INTO MY_TABLE ("firstName", "lastName")
VALUES ('Marcell', 'Walsh'),
       ('Kareen', 'Bode'),
       ('Jules', 'Homenick'),
       ('Lashay', 'Gaylord'),
       ('Tyler', 'Miller');
```
Advanced SQL types
It also supports generation of ARRAY, MULTISET and ROW types. Please be aware that not every database engine supports these types, and Datafaker does not support them for every dialect.
- To generate an ARRAY, the schema field should supply an array.
- To generate a MULTISET, the schema field should supply a list (an SQL MULTISET may contain duplicates).
- To generate a ROW, the schema field should supply a compositeField.
For example, the hedged sketch below combines all three cases in a single schema, together with the kind of SQL each one leads to.
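The exact SQL rendering of ARRAY, MULTISET and ROW values depends on the chosen dialect; the transformer settings here are illustrative, and `faker` is assumed to be in scope:

```java
import java.util.List;

Schema<String, String> schema = Schema.of(
    field("ints", () -> new int[] {1, 2, 3}),            // supplies an array  -> ARRAY value
    field("names", () -> List.of("one", "two", "two")),  // supplies a list    -> MULTISET value
    compositeField("address", new Field[] {              // composite field    -> ROW value
        field("city", () -> faker.address().city())}));

String output =
    new SqlTransformer.SqlTransformerBuilder<String>()
        .tableName("MY_TABLE")
        .build()
        .generate(schema, 1);
```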
Spark SQL
Some engines, like Spark, stand out with support for complex types such as STRUCT and MAP.
The Spark dialect doesn't support batch inserts and will throw an exception if you attempt to generate them.
The following schema:
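A hedged reconstruction matching the output shown below; the name of the Spark SqlDialect constant is an assumption:

```java
import java.util.Map;

Schema<String, String> schema = Schema.of(
    field("string", () -> "string"),
    field("array", () -> new int[] {1, 2, 3}),
    field("map", () -> Map.of("key", "value")),
    compositeField("struct", new Field[] {field("name", () -> "2")}));

SqlTransformer<String> transformer =
    new SqlTransformer.SqlTransformerBuilder<String>()
        .tableName("MyTable")
        .dialect(SqlDialect.SPARKSQL) // assumed constant name for the Spark dialect
        .build();
String output = transformer.generate(schema, 1);
```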
will lead to:
```sql
INSERT INTO `MyTable` (`string`, `array`, `map`, `struct`)
VALUES ('string', ARRAY(1, 2, 3), MAP('key', 'value'), NAMED_STRUCT('name', '2'));
```
YAML transformation
YAML transformation is very similar to CSV. The following is an example of how to use it:
```java
final BaseFaker faker = new BaseFaker();

YamlTransformer<Object> transformer = new YamlTransformer<>();
Schema<Object, ?> schema = Schema.of(
    field("name", () -> faker.name().firstName()),
    field("lastname", () -> faker.name().lastName()),
    field("phones", () -> Schema.of(
        field("worknumbers", () -> ((Stream<?>) faker.<String>stream().suppliers(() -> faker.phoneNumber().phoneNumber()).maxLen(2).build().get())
            .collect(Collectors.toList())),
        field("cellphones", () -> ((Stream<?>) faker.<String>stream().suppliers(() -> faker.phoneNumber().cellPhone()).maxLen(3).build().get())
            .collect(Collectors.toList()))
    )),
    field("address", () -> Schema.of(
        field("city", () -> faker.address().city()),
        field("country", () -> faker.address().country()),
        field("streetAddress", () -> faker.address().streetAddress())
    ))
);

System.out.println(transformer.generate(schema, 1));
```
will generate YAML with nested fields:
```yaml
name: Mason
lastname: Bechtelar
phones:
  worknumbers:
    - (520) 205-2587 x2139
    - (248) 225-6912 x4880
  cellphones:
    - 714-269-8609
    - 1-512-606-8850
    - 1-386-909-7996
address:
  city: Port Wan
  country: Trinidad and Tobago
  streetAddress: 6510 Duncan Landing
```
Java Object transformation
A Java object transformer can be built with the help of JavaObjectTransformer. You should provide a class to be used as a template for the generated objects, along with a schema for that class.
```java
JavaObjectTransformer jTransformer = new JavaObjectTransformer();
Schema<Object, ?> schema = Schema.of(
    field("firstName", () -> faker.name().firstName()),
    field("lastName", () -> faker.name().lastName()),
    field("birthDate", () -> faker.date().birthday()),
    field("id", () -> faker.number().positive()));

System.out.println(jTransformer.apply(Person.class, schema));
```

```kotlin
val jTransformer = JavaObjectTransformer()
val schema: Schema<Any, Any> = Schema.of(
    field("firstName", Supplier { faker.name().firstName() }),
    field("lastName", Supplier { faker.name().lastName() }),
    field("birthDate", Supplier { faker.date().birthday() }),
    field("id", Supplier { faker.number().positive() }))

println(jTransformer.apply(Person::class.java, schema))
```
This will generate an object whose fields are populated with random values produced by the specified suppliers.
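The Person class used above is a hypothetical template; all that matters is that its field names and types match the schema fields:

```java
import java.util.Date;

// Hypothetical template class; the transformer populates the fields
// whose names match the schema's field names.
public class Person {
    private String firstName;
    private String lastName;
    private Date birthDate;
    private int id;
}
```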
Populating Java Object with predefined Schema
You can populate a Java object with a predefined schema, or with a default schema for the class. The schema should be declared as a static method with return type Schema<Object, ?>. The class used as a template for generated objects should be annotated with @FakeForSchema, whose value is the path to the schema method.
Note: if the default schema and the class template are in the same class, you can omit the full path to the method and use just the method name.
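A hedged sketch of such a template class; the field set is illustrative, and the annotation is assumed to live in net.datafaker.annotations:

```java
import net.datafaker.annotations.FakeForSchema;

// The schema method is in the same class, so the method name alone is enough.
@FakeForSchema("defaultSchema")
public class Person {
    private String firstName;
    private String lastName;

    public static Schema<Object, ?> defaultSchema() {
        Faker faker = new Faker();
        return Schema.of(
            field("firstName", () -> faker.name().firstName()),
            field("lastName", () -> faker.name().lastName()));
    }
}
```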
Then you can use net.datafaker.providers.base.BaseFaker.populate(java.lang.Class<T>) to populate an object with the default predefined schema, or net.datafaker.providers.base.BaseFaker.populate(java.lang.Class<T>, net.datafaker.schema.Schema<java.lang.Object, ?>) to populate it with a custom schema.
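A short usage sketch of both overloads, assuming the Person template above; customSchema stands for any Schema<Object, ?> you define:

```java
// Populate using the default schema referenced by @FakeForSchema.
Person person = BaseFaker.populate(Person.class);

// Populate using an explicitly supplied schema.
Person other = BaseFaker.populate(Person.class, customSchema);
```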