How to Pass the Google Certified Professional Data Engineer Exam
I obtained the GCP Associate Cloud Engineer and Professional Data Engineer certifications with 3 years of work experience and 1 year of cloud experience. Here are some thoughts on preparing for them:
- Pick the exam that matches your specialization. I am a data engineer, so the two exams felt similarly difficult to me: I do not have much DevOps/Architect experience, so the Associate Cloud Engineer exam was not easy for me compared to the Professional Data Engineer exam. I have experience in the data science field, so my next certification will be the ML one. Similarly, if you come from a Linux IT/cybersecurity background, go for the security certification; if you are a developer, go for DevOps.
- Go with the lowest level of certification if you do not have cloud experience in Azure, GCP, or AWS. Before obtaining these two certifications, I held the AWS Cloud Practitioner certification. That exam was easy, and in the process of preparing you will gain a solid understanding of the services the cloud platforms provide. It will also give you a sense of the gap between the study materials and the actual exam: the exam questions are generally harder than the practice exams.
- Hands-on practice is more important than the exam itself. I do not recommend taking classes on Udemy and then just sitting the exam. I finished my Udemy class on the Associate Cloud Engineer exam and found it hard to remember all the concepts without applying them. Enterprises hire cloud experts to do the job, not to recite concepts. I recommend ACloudGuru instead because they provide hands-on labs with very detailed step-by-step instructions in their own sandbox environment. The sandbox behaves exactly like the real cloud, so you do not have to worry about the costs you generate. ACloudGuru also covers essential skills beyond the concepts: Linux commands, CLI commands, SDK setup, container deployment, network setup, etc.
Below is how I structured my knowledge for the Data Engineer exam:
Data Engineer structure: Data Source -> ETL -> Storage/Database Management -> Analysis/Application Feed -> Security & Monitoring. With these five concepts in mind, you can expand your knowledge in a systematic way.
Most of the questions come in the format of:
In this situation, with these requirements, what should you do?
The requirements usually point to the answer:
Real-time = Pub/Sub, data warehouse = BigQuery, unstructured = Dataproc, Beam = Dataflow, global = Cloud Spanner, relational under 30 TB = Cloud SQL, non-developer = Dataprep, etc.
Note: small details will show up too (actual BigQuery queries and actual YAML files). These hints are there to help you even if you do not know the answer to the question at all.
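As a study aid, here is a minimal Python sketch that records the keyword-to-service mapping above as a dictionary you can quiz yourself with. The keyword phrasings are my own shorthand, not official exam wording.

```python
# Quick self-quiz helper: map common exam keywords to the GCP service they
# usually point to. The pairings restate the list above; they are study
# hints, not an official answer key.
KEYWORD_TO_SERVICE = {
    "real-time messaging": "Pub/Sub",
    "data warehouse": "BigQuery",
    "unstructured / Spark workloads": "Dataproc",
    "Apache Beam": "Dataflow",
    "globally distributed relational": "Cloud Spanner",
    "relational under 30 TB": "Cloud SQL",
    "non-developer data preparation": "Dataprep",
}

def quiz(keyword: str) -> str:
    """Return the service usually implied by an exam keyword."""
    return KEYWORD_TO_SERVICE.get(keyword, "unknown -- re-read the question")

if __name__ == "__main__":
    for kw, svc in KEYWORD_TO_SERVICE.items():
        print(f"{kw:35s} -> {svc}")
```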
Very Important Concepts to pass the exam:
- Apache Beam: Dataflow will show up multiple times in the exam, and it is important to understand the Beam programming model (see the Beam sketch after this list).
- BigQuery: as the data warehouse solution, query performance questions will show up multiple times too. You should learn the concepts of partitioning, clustering, and table design (see the BigQuery sketch after this list).
- Dataproc: managed data lake solution built on top of Hadoop/Spark.
- It is highly recommended that you read the Google documentation on these services, as some questions come directly from the docs. They are listed under the Products tab of https://cloud.google.com/
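To make the Beam point concrete, here is a minimal sketch of a batch pipeline written with the Apache Beam Python SDK. The bucket paths are placeholders I made up for illustration; the same code can target Dataflow just by changing the runner options.

```python
# A minimal Apache Beam pipeline sketch (Python SDK).
# The gs:// paths below are hypothetical placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Runs locally with the DirectRunner by default; pass
    # --runner=DataflowRunner (plus project, region, temp_location)
    # to execute the same pipeline on Dataflow.
    options = PipelineOptions()

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/events*.csv")
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "KeyByUser" >> beam.Map(lambda fields: (fields[0], 1))
            | "CountPerUser" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(lambda kv: f"{kv[0]},{kv[1]}")
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output/user_counts")
        )


if __name__ == "__main__":
    run()
```

The fact that one Beam program can run locally or on Dataflow by swapping the runner is exactly the Beam/Dataflow relationship the exam keeps probing.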
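For the BigQuery partitioning and clustering concepts, here is a small sketch using the google-cloud-bigquery Python client. The project, dataset, table, and column names are invented for illustration.

```python
# Sketch: create a date-partitioned, clustered BigQuery table.
# "my-project.analytics.events" and the columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

table_id = "my-project.analytics.events"
schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
]

table = bigquery.Table(table_id, schema=schema)

# Partition by day on the timestamp column so queries that filter on
# event_ts only scan the relevant partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)

# Cluster within each partition by customer_id to reduce bytes scanned
# for queries that filter or aggregate on that column.
table.clustering_fields = ["customer_id"]

client.create_table(table)
```

Queries that filter on event_ts prune partitions, and filters on customer_id benefit from clustering; that cost-and-performance reasoning is what the BigQuery exam questions usually test.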
Final Tips:
- If you have taken any basic machine learning class, there is no need to review much; the machine learning questions are very basic.
- If you do not have experience with the Apache family (I believe a lot of data analysts do not), learn the concepts before you move on to preparing for the Data Engineer exam. Sharpening the axe does not delay the cutting of firewood; good preparation saves time later.
- Do it on GCP: actual deployment will help you remember the concepts and truly demonstrate your skills in your organization.
Lastly, I hope you pass your exam soon!
Please leave a thumbs-up if you like my article.