With an increasing number of specialized databases, each having its own query language, data analysts have a hard time combining data from multiple sources. To mitigate this issue, Facebook created Presto, a high-performance, distributed SQL query engine for big data. I will be creating a small simulation using Metabase, a web-based open-source BI solution, to visualize data from MongoDB.

Containers package software applications and all their dependencies, which helps enable reproducible infrastructure. Docker Hub contains many container images, so we have a starting point and don't have to package everything ourselves. This is extremely helpful, especially during development.

Each container should only have one function. For example, a simple scenario when doing back-end development is to use one container for the application and another for the database. To connect the containers in a local environment, Docker Compose is the easiest solution, but for production the recommended approach is to use an orchestration system like Kubernetes.

Presto

Following the rise of NoSQL databases, many specialized query languages exist today. This leads to huge data integration efforts and expensive ETL processes. In the Hadoop world, a number of solutions emerged to enable the usage of SQL to retrieve data, Presto being the most interesting in my opinion.

The main advantage of Presto is that it has many data connectors, such as Kafka, Cassandra, Elasticsearch, MongoDB and Postgres. It is then able to infer the schema automatically and handle semi-structured data. Arrays, nested objects, multi-database joins and the regular SQL operations are all supported.

Apache Drill is a very similar alternative to Presto, although it appears to be less popular and doesn't support as many data sources. Performance comparisons are out of scope for this post, but Presto is used by big players in data-intensive environments. Amazon created Athena, which is based on Presto and heavily integrated in AWS. There is another recent popular Presto fork called Trino.

There is a lot of commercial Business Intelligence software out there. Metabase stands out for being an open-source BI technology that is easy to use even for people who don't know SQL, allowing them to explore the data and create web dashboards. Some advanced visualizations still require SQL knowledge. Other alternatives include Redash and Superset. From my research, these support more chart types but are less user friendly, and deployment is a bit more complicated.

In MongoDB, documents are stored as JSON objects, making it a good choice for semi-structured data with a flexible schema. It does not support SQL out of the box, which makes it harder for analysts to extract data because they have to learn another query language.

I created 4 containers: 2 for different databases, 1 for Presto and the last for Metabase. Application data should be stored in volumes. Configuration files can either be copied and stored as part of the image or we can follow the volume approach. On Windows, named volumes must be used in some cases because of file permission issues. I'm including here some commands that I used for testing. JSON Generator is a very nice tool to create datasets.
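For the test datasets mentioned in this post, a small script can stand in when JSON Generator is not at hand. The following is only a minimal sketch in Python using the standard library. It is not JSON Generator's template syntax, and the field names (`address`, `tags`, `age`), the sample values and the function names are all hypothetical, chosen to illustrate semi-structured documents with a nested object, an array and an optional field.

```python
import json
import random

# Hypothetical sample values; the original post does not show its dataset.
CITIES = ["Lisbon", "Porto", "Berlin", "Madrid"]
TAGS = ["new", "vip", "trial", "churned"]


def make_document(doc_id, rng):
    """Build one semi-structured document with a nested object and an array."""
    doc = {
        "id": doc_id,
        "name": f"user_{doc_id}",
        "address": {"city": rng.choice(CITIES), "zip": rng.randint(1000, 9999)},
        "tags": rng.sample(TAGS, k=rng.randint(0, len(TAGS))),
    }
    # Flexible schema: only some documents carry the optional "age" field.
    if rng.random() < 0.5:
        doc["age"] = rng.randint(18, 90)
    return doc


def make_dataset(count, seed=42):
    """Return `count` documents; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    return [make_document(i, rng) for i in range(1, count + 1)]


def to_json_lines(docs):
    """Serialize documents as newline-delimited JSON, one document per line."""
    return "\n".join(json.dumps(doc) for doc in docs)


if __name__ == "__main__":
    print(to_json_lines(make_dataset(5)))
```

Writing one document per line keeps the output easy to inspect and to load with common import tools. The optional field is there on purpose, so that the flexible schema shows up when the data is queried later.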