Interview with Vladimir Ninov, Head of Organic Growth at 365 Data Science

Vladimir Ninov is a marketing professional and Head of Organic Growth at 365 Data Science, with rich experience in PR, growth hacking, and fintech. He holds a Master’s degree in Business Administration and has founded several companies, including a consultancy firm, a PR agency, and a digital marketing blockchain startup. Vladimir is also a contributing writer for Forbes and Entrepreneur magazines. He has been a marketing advisor on a wide range of projects, one of which won the CESA award for Best Blockchain Startup in Bulgaria.

Hi, Vladimir! Could you briefly introduce yourself to our readers?

Hi, my name is Vladimir and I am a digital marketing ninja and a tech-savvy enthusiast.

Yes, you definitely are – 24/7! But since our readers haven’t had the chance to meet you yet – what do you do at the company and what are the projects you have worked on so far?

I am part of the 365 Data Science marketing team and also Head of Organic Growth. So far, I’ve worked on numerous social media, SEO, and organic growth initiatives.

You always set in motion ambitious and exciting projects. What are you working on right now?

My current task is onboarding publishers to the 365 Data Science Affiliate Program. This is an extremely exciting and challenging project, because affiliate marketing is an entirely separate segment of digital marketing, and the mindset of the people in it contrasts greatly with that of other segments. We’re already seeing amazing results from the 365 Affiliate Program.

But let’s turn back the clock to see how you started on this professional path…

Well, I obtained my bachelor’s degree in Business Administration with a concentration in Marketing. I also hold a Master’s degree in Business Administration. Since 2008, I’ve worked in various fields like international trade, recruitment, finance, clinical trials for the pharmaceutical sector, digital marketing, PR, consulting, growth hacking, blockchain, technology, and fintech. Moreover, over the past 8 years, I have founded several companies, including a consultancy firm, a PR agency, and a digital marketing blockchain startup. In 2018, I received an invitation to join The Oracles and became a contributing writer for Forbes and Entrepreneur magazines. Since then, I have been a marketing advisor for several fintech and blockchain projects, one of which won the CESA award for Best Blockchain Startup in Bulgaria.

That is quite the record, Vladimir! In your view, what does the future hold for blockchain?

The blockchain space has been one of the fastest-growing sectors for the past couple of years. In addition, according to the LinkedIn report “The Most In-Demand Hard and Soft Skills of 2020”, blockchain is the top in-demand hard skill valued by companies. I think that within the next 5 years, this technology will completely transform the banking and financial sectors. In fact, blockchain today is analogous to the boom of the internet, of mobile application development, and of debit and credit cards in their respective eras.

From blockchain to online data science training – what is your favorite aspect of your job at 365 Data Science?

In all honesty, although I find satisfaction in all aspects of my job, the part that excites me the most is the enthusiasm of the 365 team. Everyone is consistently open to exploring new ideas that help this amazing educational platform progress.
Now, we’ve all been curious… Given your experience, what secrets can you share from the cryptic world of cryptocurrency?

Something that many people are not aware of is that companies and cryptocurrency exchanges increasingly use AI and machine learning algorithms to automate their trading and manage volumes.

Vladimir, we all know you’re very tech-savvy. Are there any interesting or useful tools you use for work which you’d like to recommend to our readers?

Great question! On a daily basis, I use the Brave browser, which I find particularly useful for work, as it blocks ads and web trackers. This lets me browse safely and without interruptions when doing research on websites full of repetitive ads. Furthermore, I am a big fan of the Mix extension, because it allows me to collect articles and return to them whenever I’d like.

We heard you are a big fan of…

Well, I absolutely enjoy exotic destinations. Hopefully, after the current global pandemic, traveling will be back to normal, as I’d love to finally visit my dream destination – Bora Bora.

Fingers crossed! Now, this next question comes from a certain 365 team member. If you were a transformer, what vehicle would you transform into?

If I were a transformer, I would transform into a Tesla… Just kidding. I am a huge fan of the Ferrari Enzo.

Okay, last question before you transform into one! Any advice for digital marketers who want to enter the data science field?

Yes! They should consider joining the data science field as soon as possible. Back in 2012, Harvard Business Review called data scientist “the sexiest job of the 21st century”. Nowadays, the demand for data scientists is growing like never before, and this still-young field will open a new world of opportunities for any digital marketer who decides to make the move.

Vladimir, thank you so much for this interview! We wish you countless exciting 365 projects ahead and tons of bold ideas you set forward and turn into reality!
Python Tuples Explained in 2020

Python tuples are another type of data sequence but, unlike lists, they are immutable. Tuples cannot be changed or modified; you cannot append or delete elements. The syntax that tells you you are dealing with a tuple and not a list is that a tuple’s elements are placed within parentheses, not brackets. In fact, the tuple is the default sequence type in Python, so if you simply list three values, the computer will treat the new variable as a tuple. We could also say the three values are packed into a tuple.

For the same reason, we can assign a number of values to the same number of variables. Do you remember we went through that a few lectures ago? On the left side of the equality sign, we just add a tuple of variables, and on the right, a tuple of values. That’s why the relevant technical term for this activity is tuple assignment.

Just as with lists, we can index values by indicating their position in brackets. In addition, we can place Python tuples within lists, and each tuple then becomes a separate element within the list.

Python tuples are similar to lists, but there are some subtle differences we should not overlook. They can be quite useful when dealing with comma-separated values. For example, say we have age and years of school as variables, and the respective numbers arrive as a single string, separated by a comma (hence the name comma-separated values). The split method, with the comma separator indicated within its parentheses, will assign 30 as the value of age and 17 as the value of years of school. We can print the two variables separately to check the outcome… Everything seems to be correct – awesome!

Last, functions can provide tuples as return values. This is useful because a function (which can otherwise only return a single value) can produce a tuple holding multiple values. For instance, given only the length of the side of a square, a “square info” function can return a tuple containing the area and the perimeter of the square.

This is how we can work with tuples in Python! Thanks for watching! Curious to learn more? Check out our 365 Data Science Training. It covers everything you need, from Mathematics and Statistics, through R, Python, and SQL, to Machine Learning and Deep Learning (with TensorFlow).
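Here is a runnable reconstruction of the examples narrated above; the exact variable names are assumptions where the transcript does not spell them out:

```python
# Tuple packing: parentheses are optional, the commas make the tuple
coordinates = 12.3, 45.6, 78.9

# Tuple assignment: a tuple of variables on the left, a tuple of values on the right
x, y, z = coordinates

# Indexing works the same way as with lists
print(coordinates[0])            # 12.3

# Tuples can also be elements within a list
list_of_tuples = [(1, 2), (3, 4)]

# Splitting comma-separated values and assigning them in one step
age, years_of_school = "30,17".split(",")
print(age)              # '30'
print(years_of_school)  # '17'

# A function can return several values at once by packing them into a tuple
def square_info(side):
    area = side ** 2
    perimeter = 4 * side
    return area, perimeter       # returned as the tuple (area, perimeter)

print(square_info(3))            # (9, 12)
```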
Basic Database Terminology

Learning more about basic database terminology is a necessary step that will help us when we start coding. Let’s go through the entire process of creating a database. Assume our database containing customer sales data has not been set up yet, ok? So, imagine you are the shop owner: you realize you have been selling goods quite well recently, and you have more than a million rows of data. What do you need, then? A database! But you know nothing about databases. Who do you call – an SQL specialist? No. You need a database designer.

Database Designer

The database designer will be in charge of deciding how to organize the data into tables and how to set up the relations between these tables. This step is crucial. If the database design is not right from the beginning, your system will be difficult to work with, it won’t support your business needs, and you will have to start over. Considering the time, data (…and money!) involved in the process, you want to avoid going back to square one.

What do database designers actually do? They plot the entire database system on a canvas using a visualization tool. There are two main ways to do that. One is drawing an Entity-Relationship diagram – an ER diagram for short. As its name suggests, the different figures in it represent different data entities, and the lines connecting them indicate the specific relationships between those entities. This way of representing databases is powerful and professional, but it is complicated. We will not focus in-depth on ER diagrams in this post, but you should know they exist and that they belong to the process of database design.

Relational Schema

Another form of representation of a database is the relational schema. It is useful when you are already certain of the structure and organization of the database you would like to create. A relational schema represents a table as a rectangle: the name of the table sits at the top of the rectangle, and the column names are listed below it. All the relational schemas in a database form the database schema, and lines indicate how the tables are related. There are some other features as well, but we will leave their explanation for later.

Once the database design process is done, the next step is to create the database. Up to this moment, everything has been ideas, planning, abstract thinking, and design. At this stage, SQL is used to set up the database physically, as opposed to contriving it abstractly. Then you can enjoy the advantages of data manipulation, which allows you to use your dataset to extract business insights that improve the performance and efficiency of the business you are working for. Remember – well-thought-out databases that are carefully designed and created are crucial prerequisites for data manipulation. If these steps have been done well, we can write effective queries and navigate a database rather quickly.

Database Management

You will often hear the term database management. It comprises all the steps a business undertakes to design, create, and manipulate its databases successfully. Finally, database administration is the most frequently encountered job of all. A database administrator is the person providing the daily care and maintenance of a database. Her scope of responsibilities is narrower than that of a database manager, but she is still indispensable to the database department of a company.

This post introduced terms you will be using often when dealing with databases. Check out our other database explainers: Database vs Spreadsheet, and Relational Database Essentials.
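To tie the terminology together, here is a minimal sketch using Python’s built-in sqlite3 module that physically creates two related tables – the table and column names are invented for illustration, not taken from the post:

```python
import sqlite3

# In-memory database for illustration; a real project would use a file path
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables, 'customers' and 'sales' (names and columns are invented)
cur.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT
    )
""")
cur.execute("""
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount      REAL,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    )
""")

# The foreign key plays the role of the 'line' in a relational schema: it links
# each sale to exactly one customer, turning two flat tables into a related design.
conn.commit()
conn.close()
```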
What Do You Need to Become a Data Scientist in 2019?

It’s hardly a surprise to anyone in the tech and related industries that “data scientist” is the best job to have in the States. After all, this is what sources like the Harvard Business Review and Glassdoor have reported for what is now four years in a row. And even if we take the base salary of $117,000 out of the equation, the position is still plenty attractive on all other dimensions.

For example, the current shift towards using machine learning to fast-track business growth secures a steady cross-industry need for skilled professionals capable of handling data and the emerging technologies. To quantify the need: 151,717 data scientists would be warmly welcomed into the workforce, according to the LinkedIn Workforce Report. And let’s not forget the on-the-job satisfaction levels – most data scientists are actually happy (their words, not ours).

But this article is not an overview of data science as a smart career decision. Its purpose is to feed the “data scientist”, the collective entity, to the number-crunching algorithm and understand what makes a data scientist. Reverse engineering the data scientist entails analyzing their skillset, employment history, the industry they work for, academic background, and formal qualifications. Knowing that, the aspiring data scientist can take informed professional steps towards securing the title. When we at 365 Data Science first attempted to dismantle the data scientist in 2018, we revealed a rich professional profile. Twelve months have passed since our initial research, and the replicated study suggests that the field is evolving and, with it, the typical professional as well.

A note on methodology

The collective ‘data scientist’ profile was informed by a study of 1,001 professionals currently employed as data scientists. The data was collected from the participants’ LinkedIn profiles according to a series of prerequisites. Forty percent of the sample were employed at a Fortune 500 company, whereas the remainder worked elsewhere; in addition, location quotas were introduced to limit bias: US (40%), UK (30%), India (15%), and other countries (15%). The selection was based on preliminary research on the most popular countries for data science where information is public.

Is the typical data scientist of 2018 relevant in 2019?

At a glance, absolutely! The domain is still strongly dominated by men (69%), who can hold a conversation in at least two languages (not to be confused with programming languages, which, if included, would at least double this number). They have been in the workforce for 8 years, but have only been working as data scientists for 2.3 of them. They can proudly frame a second-cycle academic degree (74% hold either a Master’s or a PhD), and they do a lot more than program “Hello World” in Python or R (73%), often both. Lucky for those of us who are female, or have not yet earned our doctorates, the segmentation of the data tells a richer and truer story.

Does ‘data scientist’ imply Doctor of Philosophy?

Just as the field is not closed to women, a PhD is not a prerequisite for the position. In fact, less than a third of the data scientists in the cohort hold a doctorate (28%). This is comparable to last year’s 27%, which suggests the industry does not demand an unattainable degree of academic prowess. On the other hand, if a Master’s degree is something into which the aspiring data scientist is willing to invest time and effort, it seems to be the gold standard for academic qualifications (46% of the sample hold a Master’s). One trend does appear to be taking shape, however: the percentage of professionals in a ‘data scientist’ position who hold a second-cycle academic degree looks set to decrease, evened out by data scientists entering the field with only a Bachelor’s.
The data corroborates this speculation, as there is a 4-percentage-point increase compared to last year in the share of data scientists with only a Bachelor’s degree (19% in 2019 vs. 15% in 2018). Finally, the fact that some graduate from law school (one in 1,001) and still make it as data scientists leaves us some wiggle room when it comes to the level of academic degree the aspiring data scientist can get away with.

Level of education and work experience

From university, to an internship, to the final destination of ‘data scientist’: this is the story of 8% of the data scientists in our cohort. These are the professionals for whom landing the best job in the USA took one internship position and a Master’s degree (71%) or a Bachelor’s degree (18%). For others, the path led straight from academia (9%). As can be expected, the split here between data scientists with a Master’s degree (47%) and those with a PhD (44%) is pretty equal, with other levels of education barely having any presence at all. So, apart from interning or coming straight from academia, are there other ways to open the door to a data science career?

Two men and a woman walk into a room: who will be the next data scientist?
According to our data, all of these are gateway positions into data science with a comparable success rate: 9%, 9%, and 8%, respectively. While this might not be the revelation some of our readers are hoping for, these numbers begin to paint the picture of a profession that has many entry points. Consider also the Data Analyst position (13%), the Consultant (6%), and of course the cluster of Other jobs (13%) the current data scientists may have held (comprising no fewer than 15 different positions and titles).

If you are mathematically inclined – which we hope you are, because data science implies some analytical prowess – you will have noticed that a large share of the previous-job analysis is missing. Indeed, 42% of the people who are currently data scientists in our cohort were already working as data scientists in their previous position. Therein lies the paradox: the best way to secure the title ‘data scientist’ is to already have it. We leave it to the reader to speculate whether data scientists simply like to move jobs (they know their value, too!), or whether self-reported data is not always the best data.

Should I study Computer Science and Mathematics, or can I study Botany and still make it as a data scientist?

Alright. To become a medical practitioner, you would go to medical school; to become a lawyer – law school; a police officer – a special academy, and so on. Data science schools are scarce as of the time of writing, if existent at all, so what do data scientists study? As a matter of fact, a respectable chunk of our cohort studied data science and analysis – but a note on notation before we proceed. Due to the massive number of unique degrees available for academic pursuit, we clustered them into seven areas of academic study.
That said, 12% of the data scientists in our research studied Data Science and Analysis. Although the field itself is new, a growing number of universities offer specialized degrees to prepare you for a future in data science. Given what the typical data scientist profile reveals about the level of education received, it comes as no surprise that most of these are at a Master’s level.

However, Data Science and Analysis is not the most prevalently pursued degree in our cohort. Instead, that would be Computer Science (22%). A substantial part of the data scientist’s toolbox is programming languages and number-crunching tools, and this degree is a natural fit, for lack of more widely accessible alternatives. Surprisingly, the runner-up degree in this year’s study is not Statistics and Mathematics (which comes in a solid third with 16% prevalence), but Economics and Social Sciences (21%). Nonetheless, Engineering graduates make up another 9% of the cohort, which lends support to the idea that the holy trinity of degrees that can best prepare you for processing and handling big data is still Computer Science, Statistics and Mathematics, and Engineering (collectively, 50% of the cohort).

On the other hand, the heavy prevalence of Economics and Social Sciences graduates in the sample (21%) might be welcome news for the less mathematically inclined aspiring data scientists! Almost as many current professionals come into the field from the humanities side of the spectrum as from the Computer Science field alone. And a further 11% come from a Natural Sciences background. Considering that data science is often programming with a statistics twist (or statistics with a programming twist), this result is quite interesting.

Ivy League or ‘Bye-bye, data science…’?

Data science is definitely not a private playing field for Ivy League graduates. Although a third of the professionals in our sample graduated from a Top 50 university (according to the Times Higher Education World University Ranking for 2019), the second-largest cohort is represented by graduates of universities not even ranked by the Times (23%). A collective sigh of relief – the field is not gated for professionals with limited access to world-class higher education. The data suggests that data science is a field in which you can be successful based on skill and merit alone, which is not entirely shocking. After all, analysing and processing data are practical, hands-on skills, and the most enthusiastic of us can learn to be exceptional with practice, curiosity, and resourcefulness.

The remaining clusters in the ranking are relatively equally populated. The reader should notice, however, that the clusters are of different sizes (e.g., *1-50 vs. *301-500 or *501-1000). Nonetheless, this is all great news for the aspiring data scientist: the field is not only open to people from various backgrounds and levels of academic study, but also welcoming to professionals from universities ranking anywhere on the Times scale.

Self-preparation and online courses

It’s no secret that data scientists come from many different backgrounds. Unlike many other disciplines, data science does not yet have a strong educational infrastructure in place. This means that many people wanting to become successful in the field need to take on the responsibility of learning the skills themselves. But how exactly can we measure who did this?
The most reliable way is to look at online certificates posted on their profiles. With a plethora of online platforms offering quality courses at prices comparable to a romantic dinner date, building a tailor-made set of skills has never been easier (or cheaper). We found that 43% of the profiles we gathered had posted at least one online course, with 3 certificates being the average. Of course, some data scientists may well have taught themselves through other means, while others may not post all, or any, of the certificates they have received. If they don’t believe them to be relevant once they have gained more experience, why would they waste time or space on their digital resume? This is worth keeping in mind, given the platform where the data was collected.

So, what can we do with this information? Firstly, we can see whether there are any correlations between the backgrounds of people who took (or at least posted) online courses and those who didn’t. Is level of academic education a factor? It may not be a stretch to assume that individuals coming from lower-ranked universities would take more online courses. But maybe students from higher-ranked universities are more devoted to their education – so let’s not assume at all, and instead look at the data.

Online courses and university ranking

There is, in fact, a very interesting result when comparing these two factors. Before we review it, though, remember that the university ranking clusters vary in size, both in terms of participants in the study and university ranks. We also want to note that the *1000+ cluster contains only 7 participants, which is not enough data, so we won’t treat it as a valid result. That said, we get some very interesting insights from the remaining data.

The first fascinating result is that universities ranked in the four clusters making up the Top 500 show only marginally different results. This is not overly surprising, but it does mark a distinct difference from last year’s study, where graduates of the Top 100 universities posted significantly fewer online courses. Perhaps this goes to show that self-preparation is valuable and valued, even for students of prestigious universities.

The next interesting result is seen in the *501-1000 ranking cluster, which shows a dramatic increase in the number of certificates. Again, this isn’t shocking: we are willing to speculate that the lower-ranked the university you attended, the stronger the desire to stand out with a high number of online certificates. What is surprising, and introduces doubt into the validity of our hypothesis, is the behavior of the next cluster (not ranked). There, the number of certificates drops right back down to a level comparable to the Top 500. Although it is hard to know exactly why, it’s not tricky to see the importance of self-preparation – and, considering that the number has risen since last year, especially among top-tier universities, aspiring data scientists are realizing that too.

Gaining online certificates is one thing, but which skills are the most beneficial to learn? Which ones do employers look for? Well, of course, we’ve got the data to discover just that.

Country of employment and programming skills

Due to the country-of-employment quotas placed, we could not only look at aggregates but also make reliable comparisons across countries.
We have divided the sample into four regions – the US, the UK, India, and others – applying the same sample weights as in last year’s research (see methodology above). Before we go into geographic segmentation, we need some context, which is best provided by the aggregate data. The most prevalent programming language in the data science community is Python, followed by R. It is worth noting that R has experienced a 10% decrease in popularity compared to last year, which, given the versatility of Python, is not entirely surprising. But let’s segment further.

Looking at region segmentation, the findings are not too far from the aggregates. For the sake of simplicity, we’ve looked into the 4 most used languages: Python, R, SQL, and MATLAB. The most prominent difference from last year’s edition is that Java has been replaced by MATLAB as the fourth most used programming language. The leading trio remains unchanged in terms of contestants, but not in composition. For a couple of years now, Python has been eating away at R, and that’s what we see here, too. Python is hands-down the most used data science language worldwide, with R keeping pace only in the US and India. What’s worth noting is that relational database usage seems flat across the globe, with SQL being equally employed everywhere. With India being the outsourcing haven lately, it may be worth taking a look at yet another segmentation – industry.

Country of employment and industry

We’ve divided our sample into 4 big clusters of industries: Industrial, Healthcare, Financial, and Tech. Healthcare is an insignificant part of the whole, so it makes sense to focus on the rest. In fact, the data does not show great cross-country differences (see below), apart from the UK’s data science still being more ‘Financial’ than ‘Tech’, and India’s data science being less ‘Industrial’ than the rest. While the latter is far from surprising, what’s worth noting is that in our previous study the Tech sector boasted 70% of the data scientist talent. In the past year, this seems to have changed drastically. Finally, finance-related data science has grown dramatically in India and the rest of the world, catching up with other industries. For US-based data scientists, we can say that the Tech and Industrial clusters are acquiring most of the talent.

Country of employment and academic degree

One of the most interesting segmentations for us was that by academic degree. Subconsciously, we expect PhDs to dominate the field. Last year, the data begged to differ, with only a third of the sample being a ‘Doctor of Philosophy’; ‘just’ a Master’s degree seemed sufficient in more than half of the cases. This year, we see little difference. For the UK, India, and the others, there is no change: a Bachelor’s degree is enough, but holding a Master’s is preferable. What does catch the eye, though, is the increase in PhDs in the US. Pair that with the fact that the Tech industry has always held a bigger share in America, and you’ll have it – highly educated, tech-oriented data scientists, probably working harder than ever to deliver the futuristic data science we’re all looking forward to. If you are shy of a PhD and are on our list, chances are you are from India. But how does that translate when it comes to actual work experience?

Country of employment and work experience

Which location leads to the fastest career progression?
To get that insight, it is worth looking at the experience the data ‘wizards’ had before achieving their dream of becoming a data scientist. In the 2018 edition, we saw a tremendous gap between countries: more than half of the data scientists in the USA had more than 5 years of experience. This year? Nothing like it. They may have quit their jobs… or become managers. After all, being a data scientist can be frustrating, too. But the ‘youngsters’ are making their way in – even more so in India and the UK. It seems there are more job openings, and even without prior experience you have a 20-30% chance of landing a position!

Company size and programming language

While Python is still more widely used in non-F500 companies, in almost every other category F500 companies are moving hand-in-hand with non-F500 companies. This is great news for anyone new to data science who prefers the technology of the day – R and Python. There’s been a lot of catching up compared to last year’s results…

Company size and university ranking

Surprisingly, university ranking doesn’t make a difference when it comes to the data scientist’s employer. Quoting ourselves from last year: ‘Data scientists are needed everywhere. From F500 to a tech start-up!’ This also reinforces our belief that personal skills and self-preparation are much more prominent factors in becoming a successful data scientist, and employers know that.

Conclusion

Hopefully, this article does not make you doubt whether the data scientist profession is something you could realistically pursue. Instead, we hope to have lent a reassuring hand. One of the main messages we extracted from our study, both last year and this year, is that if you have the skill base that makes a data scientist, you can be a data scientist. It will be interesting to see how the profession changes in the next 2-5 years, but right now a universal data scientist profile appears to be taking shape: a unique programming language toolbox desired across industries and locations; preferably a Master’s degree, or a Bachelor’s and proof of practical abilities; and a confident learning-on-the-go attitude are the currencies of the field.

One final note: we are aiming to create a complete and useful profile of the data scientist as it develops through the years. If anyone has suggestions or comments, feedback and ideas for things we can do better or examine more closely, please let us know! Open conversation is crucial for helping the aspiring data scientist make informed career decisions. Good luck with your data science journey, and thank you for reading!

Lightmatter Introduces Optical Processor to Speed Compute for Next-Generation Artificial Intelligence

Lightmatter, a leader in silicon photonics processors, announced its artificial intelligence (AI) photonic processor, a general-purpose AI inference accelerator that uses light to compute and transport data. Using light to calculate and communicate within the chip reduces heat, leading to an orders-of-magnitude reduction in energy consumption per chip and dramatic improvements in processor speed. Since 2010, the amount of compute power needed to train a state-of-the-art AI algorithm has grown at five times the rate of Moore’s Law scaling – doubling approximately every three and a half months. Lightmatter’s processor addresses the growing need for computation to support next-generation AI algorithms.
The 3D-stacked chip package contains over a billion FinFET transistors, tens of thousands of photonic arithmetic units, and hundreds of record-setting data converters. Lightmatter’s photonic processor runs standard machine learning frameworks, including PyTorch and TensorFlow, enabling state-of-the-art AI algorithms. This new architecture is a major advancement in the development of photonic processors. Its performance provides proof that Lightmatter’s approach to processor design delivers scalable speed and energy-efficiency advantages over the current electronic compute paradigm, and it is the starting point for a roadmap of chips with dramatic performance improvements.

Source: InsideBigData

How to choose a cloud machine learning platform in 2020

In order to create effective machine learning and deep learning models, you need copious amounts of data, a way to clean the data and perform feature engineering on it, and a way to train models on your data in a reasonable amount of time. Then you need a way to deploy your models, monitor them for drift over time, and retrain them as needed.

You can do all of that on-premises if you have invested in compute resources and accelerators such as GPUs, but you may find that if your resources are adequate, they are also idle much of the time. On the other hand, it can sometimes be more cost-effective to run the entire pipeline in the cloud, using large amounts of compute resources and accelerators as needed, and then releasing them. The major cloud providers – and a number of minor clouds too – have put significant effort into building out their machine learning platforms to support the complete machine learning lifecycle, from planning a project to maintaining a model in production. How do you determine which of these clouds will meet your needs? Here are 12 capabilities every end-to-end machine learning platform should provide.

Be close to your data

If you have the large amounts of data needed to build precise models, you don’t want to ship it halfway around the world. The issue here isn’t distance, however, it’s time: data transmission speed is ultimately limited by the speed of light, even on a perfect network with infinite bandwidth. Long distances mean latency. The ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent. The next best case is for the data to be on the same high-speed network as the model-building software, which typically means within the same data center. Even moving the data from one data center to another within a cloud availability zone can introduce a significant delay if you have terabytes (TB) or more. You can mitigate this by doing incremental updates. The worst case would be moving big data long distances over paths with constrained bandwidth and high latency. The trans-Pacific cables going to Australia are particularly egregious in this respect.

Support an ETL or ELT pipeline

ETL (extract, transform, and load) and ELT (extract, load, and transform) are two data pipeline configurations that are common in the database world. Machine learning and deep learning amplify the need for these, especially the transform portion.
ELT gives you more flexibility when your transformations need to change, as the load phase is usually the most time-consuming for big data. In general, data in the wild is noisy, and that noise needs to be filtered out. Additionally, data in the wild has varying ranges: one variable might have a maximum in the millions, while another might have a range of -0.1 to -0.001. For machine learning, variables must be transformed to standardized ranges to keep the ones with large ranges from dominating the model. Exactly which standardized range to use depends on the algorithm used for the model.
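As a concrete illustration of that standardization step – a minimal sketch using scikit-learn (one of several libraries offering such scalers); the numbers are invented to mirror the ranges mentioned above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two variables on wildly different scales, as described above:
# one in the millions, one between -0.1 and -0.001
X = np.array([
    [2_500_000, -0.004],
    [1_200_000, -0.050],
    [4_800_000, -0.010],
])

# Standardize each column to mean 0 and standard deviation 1 so that
# neither variable dominates the model simply because of its range
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)

# Note: fit the scaler on training data only, then reuse it
# (scaler.transform) on validation, test, and production data.
```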
Support an online environment for model building

The conventional wisdom used to be that you should import your data to your desktop for model building. The sheer quantity of data needed to build good machine learning and deep learning models changes the picture: you can download a small sample of data to your desktop for exploratory data analysis and model building, but for production models you need access to the full data. Web-based development environments such as Jupyter Notebooks, JupyterLab, and Apache Zeppelin are well suited for model building. If your data is in the same cloud as the notebook environment, you can bring the analysis to the data, minimizing the time-consuming movement of data.

Support scale-up and scale-out training

The compute and memory requirements of notebooks are generally minimal, except for training models. It helps a lot if a notebook can spawn training jobs that run on multiple large virtual machines or containers. It also helps a lot if the training can access accelerators such as GPUs, TPUs, and FPGAs; these can turn days of training into hours.

Support AutoML and automatic feature engineering

Not everyone is good at picking machine learning models, selecting features (the variables that are used by the model), and engineering new features from the raw observations. Even if you’re good at those tasks, they are time-consuming and can be automated to a large extent. AutoML systems often try many models to see which result in the best objective function values, for example the minimum squared error for regression problems. The best AutoML systems can also perform feature engineering, and use their resources effectively to pursue the best possible models with the best possible sets of features.

Support the best machine learning and deep learning frameworks

Most data scientists have favorite frameworks and programming languages for machine learning and deep learning. For those who prefer Python, Scikit-learn is often a favorite for machine learning, while TensorFlow, PyTorch, Keras, and MXNet are often top picks for deep learning. In Scala, Spark MLlib tends to be preferred for machine learning. In R, there are many native machine learning packages and a good interface to Python. In Java, H2O.ai rates highly, as do Java-ML and Deep Java Library. The cloud machine learning and deep learning platforms tend to have their own collection of algorithms, and they often support external frameworks in at least one language or as containers with specific entry points. In some cases you can integrate your own algorithms and statistical methods with the platform’s AutoML facilities, which is quite convenient. Some cloud platforms also offer their own tuned versions of major deep learning frameworks. For example, AWS has an optimized version of TensorFlow that it claims can achieve nearly linear scalability for deep neural network training.

Offer pre-trained models and support transfer learning

Not everyone wants to spend the time and compute resources to train their own models – nor should they, when pre-trained models are available. For example, the ImageNet dataset is huge, and training a state-of-the-art deep neural network against it can take weeks, so it makes sense to use a pre-trained model for it when you can. On the other hand, pre-trained models may not always identify the objects you care about. Transfer learning can help you customize the last few layers of the neural network for your specific data set without the time and expense of training the full network.

Offer tuned AI services

The major cloud platforms offer robust, tuned AI services for many applications, not just image identification. Examples include language translation, speech to text, text to speech, forecasting, and recommendations. These services have already been trained and tested on more data than is usually available to businesses. They are also already deployed on service endpoints with enough computational resources, including accelerators, to ensure good response times under worldwide load.

Manage your experiments

The only way to find the best model for your data set is to try everything, whether manually or using AutoML. That leaves another problem: managing your experiments. A good cloud machine learning platform will have a way for you to see and compare the objective function values of each experiment for both the training sets and the test data, as well as the size of the model and the confusion matrix. Being able to graph all of that is a definite plus.

Support model deployment for prediction

Once you have a way of picking the best experiment given your criteria, you also need an easy way to deploy the model. If you deploy multiple models for the same purpose, you’ll also need a way to apportion traffic among them for A/B testing.

Monitor prediction performance

Unfortunately, the world tends to change, and data changes with it. That means you can’t deploy a model and forget it. Instead, you need to monitor the data submitted for predictions over time. When the data starts changing significantly from the baseline of your original training data set, you’ll need to retrain your model.

Control costs

Finally, you need ways to control the costs incurred by your models. Deploying models for production inference often accounts for 90% of the cost of deep learning, while training accounts for only 10%. The best way to control prediction costs depends on your load and the complexity of your model. If you have a high load, you might be able to use an accelerator to avoid adding more virtual machine instances. If you have a variable load, you might be able to dynamically change the size or number of instances or containers as the load goes up or down. And if you have a low or occasional load, you might be able to use a very small instance with a partial accelerator to handle the predictions.
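To make the drift-monitoring capability above concrete – a minimal sketch that compares a production feature against its training baseline with a two-sample Kolmogorov-Smirnov test from SciPy; the data is simulated and the alert threshold is an arbitrary illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Baseline: one feature from the original training set
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)

# Recent production data for the same feature, with a simulated shift
production_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)

# Two-sample Kolmogorov-Smirnov test: has the distribution changed?
statistic, p_value = ks_2samp(training_feature, production_feature)

# The 0.01 threshold is illustrative; pick one that suits your tolerance
# for false alarms, and schedule retraining when it trips.
if p_value < 0.01:
    print(f"Drift detected (KS statistic={statistic:.3f}); consider retraining.")
else:
    print("No significant drift detected.")
```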
Source: InfoWorld

Free Data Science Tutorial: Introduction to Machine Learning

Creating a machine learning algorithm ultimately means building a model that outputs correct information, given that we’ve provided input data. Think of this model as a black box: we feed in input, and it delivers an output. For instance, we may want to create a model that predicts the weather tomorrow, given meteorological information for the past few days. The input we’ll feed to the model could be metrics such as temperature, humidity, and precipitation, and the output we obtain would be the weather forecast for tomorrow.

Training is a central concept in machine learning, as it is the process through which the model learns how to make sense of the input data. Once we have trained our model, we can simply feed it data and obtain an output.
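As a minimal sketch of that train-then-predict loop – scikit-learn is assumed here, and the numbers are toy values rather than real meteorological data:

```python
from sklearn.linear_model import LinearRegression

# Each row: [temperature, humidity, precipitation] for one day (toy numbers)
past_days = [
    [21.0, 0.60, 0.0],
    [19.5, 0.72, 1.2],
    [18.0, 0.80, 3.4],
    [22.5, 0.55, 0.0],
]
# What we want to predict: the next day's temperature (toy numbers)
next_day_temperature = [20.0, 18.5, 19.0, 23.0]

# Training: the 'black box' learns how inputs map to outputs
model = LinearRegression()
model.fit(past_days, next_day_temperature)

# Once trained, we simply feed the model new data and obtain an output
print(model.predict([[20.0, 0.65, 0.5]]))
```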
Machine Learning in R: Building a Classification Model

In this video, I cover the concepts and practical aspects of building a classification model using the R programming language: loading the iris dataset, splitting the dataset, building the training and cross-validation models using a support vector machine, evaluating prediction performance, and assessing feature importance.

Timestamps:
0:39 Download code from the Data Professor GitHub
0:48 Import the iris dataset
0:59 Check for missing values
1:56 Data splitting
2:57 Data splitting in R
5:28 Practice: make a scatter plot comparing the Training and Testing set distributions
7:35 Mean centering
11:16 Building Training and CV models in R
15:38 Model performance metrics
16:54 Feature importance

Web Scraping and API Fundamentals in Python 2020

Learn Web Scraping with Beautiful Soup and requests-html; harness APIs whenever available; automate data collection!

What you’ll learn

Requirements
Description

Are you tired of manually copying and pasting values into a spreadsheet? Do you want to learn how to obtain interesting, real-time, and even rare information from the internet with a simple script? Are you eager to acquire a valuable skill to stay ahead of the competition in this data-driven world? If the answer is yes, then you have come to the right place at the right time! Welcome to Web Scraping and API Fundamentals in Python – the definitive course on data collection!

Web scraping is a technique for obtaining information from web pages or other sources of data, such as APIs, through the use of intelligent automated programs. It allows us to gather data from potentially hundreds or thousands of pages with a few lines of code. From reporting to data science, automating data extraction from the web avoids repetitive work.

For example, if you have worked in a serious organization, you certainly know that reporting is a recurring topic. There are daily, weekly, monthly, quarterly, and yearly reports. Whether they organize website data, transactional data, customer data, or even more easy-going information like the weather forecast, reports are indispensable in the current world. And while sometimes it is the intern’s job to take care of them, very few tasks are more cost-saving than the automation of reports. When it comes to data science, more and more data comes from external sources: webpages, downloadable files, and APIs. Knowing how to extract and structure that data quickly is an essential skill that will set you apart in the job market. Yes, it is time to up your game and learn how to automate the use of APIs and the extraction of useful info from websites.

In the first part of the course, we start with APIs. APIs are specifically designed to provide data to developers, so they are the first place to check when searching for data. We will learn about GET requests, POST requests, and the JSON format. These concepts are all explored through interesting examples and in a straight-to-the-point manner.

Sometimes, however, the information may not be available through an API but is contained on a webpage. What can we do in this scenario? Visit the page and write down the data manually? Please don’t ever do that! We will learn how to leverage powerful libraries such as Beautiful Soup and requests-html to scrape any website out there, no matter what combination of languages is used – HTML, JavaScript, and CSS. Certainly, in order to scrape, you’ll need to know a thing or two about web development, which is why we have also included an optional section that covers the basics of HTML. Consider that a bonus to all the knowledge you will acquire!

We will also explore several scraping projects. We will obtain and structure data about movies from a “Rotten Tomatoes” rank list, examining each step of the process in detail. This will help you develop a feel for what scraping is like in the real world. We’ll also tackle how to scrape data from many webpages at once, an all-too-common need when it comes to data extraction. And then it will be your turn to practice what you’ve learned with several projects we’ll set out for you.
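For a taste of both approaches, here is a minimal sketch using requests and Beautiful Soup; the endpoints and the tag targeted are stand-ins, not examples from the course itself:

```python
import requests
from bs4 import BeautifulSoup

# API flavor: a GET request to an endpoint that answers in JSON.
# The GitHub API root is used as a stand-in - substitute any JSON API.
response = requests.get("https://api.github.com")
print(response.status_code)   # 200 if all went well
print(response.json())        # the JSON body parsed into a Python dict

# Scraping flavor: when there is no API, parse the page's HTML instead.
# example.com is a stand-in; the tag you target depends on the site.
page = requests.get("https://example.com")
soup = BeautifulSoup(page.text, "html.parser")
print(soup.find("h1").get_text())   # the page's first-level heading
```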
But there’s even more! Web scraping may not always go as planned (after all, that’s part of why you will be taking this course). Different websites are built in different ways, and our bots may often be obstructed. Because of this, we make an extra effort to explore common roadblocks that you may encounter while scraping and present you with ways to work around or deal with those problems. These include request headers and cookies, log-in systems, and JavaScript-generated content. Don’t worry if you are familiar with few or none of these terms… We will start from the basics and build our way to proficiency.

Moreover, we are firm believers that practice makes perfect, so this course leans less on theory and adopts more of a hands-on approach. What’s more, it contains plenty of homework exercises, downloadable files and notebooks, as well as quiz questions and course notes.

We, the 365 Data Science Team, are committed to providing only the highest-quality content to you – our students. And while we love creating our content in-house, this time we’ve decided to team up with a true industry expert – Andrew Treadway. Andrew is a Senior Data Scientist for the New York Life Insurance Company. He holds a Master’s degree in Computer Science with Machine Learning from the Georgia Institute of Technology and is an outstanding professional with more than 7 years of experience in data-related Python programming. He’s also the author of the yahoo_fin package, widely used for scraping historical stock price data from Yahoo.

As with all of our courses, you have a 30-day money-back guarantee if at some point you decide that the training isn’t the best fit for you. You’ve got nothing to lose – and everything to gain. So, what are you waiting for? Click the ‘Buy now’ button and let’s start collecting data together!

Who this course is for:
Join now!

Customer Analytics in Python 2020

Beginner and Advanced Customer Analytics in Python: PCA, K-means Clustering, Elasticity Modeling & Deep Neural Networks.

What you’ll learn
Requirements
Description

Data science and marketing are two of the key driving forces that help companies create value and stay on top in today’s fast-paced economy. Welcome to Customer Analytics in Python – the place where marketing and data science meet! This course is the best way to distinguish yourself with a very rare and extremely valuable skillset.

What will you learn in this course? This course is packed with knowledge, covering some of the most exciting methods used by companies, all implemented in Python. Since customer analytics is a broad topic, we have created 5 different parts to explore various sides of the analytical process, each with its strong sides and shortcomings. We will explore both sides of the coin for each part, while making sure to provide you with nothing but the most important and relevant information! Here are the 5 major parts:

1. We will introduce you to the relevant theory that you need to start performing customer analytics. We have kept this part as short as possible in order to provide you with more practical experience. Nonetheless, this is where marketing beginners will learn the marketing fundamentals and the reasons why we take advantage of certain models throughout the course.

2. Then we will perform cluster analysis and dimensionality reduction to help you segment your customers. Because this course is based in Python, we will be working with several popular packages – NumPy, SciPy, and scikit-learn. In terms of clustering, we will show both hierarchical and flat clustering techniques, ultimately focusing on the K-means algorithm. Along the way, we will visualize the data appropriately to build your understanding of the methods even further. When it comes to dimensionality reduction, we will employ Principal Components Analysis (PCA), once more through the scikit-learn (sklearn) package. Finally, we’ll combine the two models to reach an even better insight into our customers. And, of course, we won’t forget about model deployment, which we’ll implement through the pickle package.

3. The third step consists in applying descriptive statistics as the exploratory part of your analysis. Once segmented, customers’ behavior will require some interpretation, and there is nothing more intuitive than obtaining the descriptive statistics by brand and by segment and visualizing the findings. This is the part of the course where you will have the ‘Aha!’ effect. Through the descriptive analysis, we will form our hypotheses about our segments, ultimately setting the ground for the subsequent modeling.

4. After that, we will be ready to engage with elasticity modeling for purchase probability, brand choice, and purchase quantity. In most textbooks, you will find elasticities calculated as static metrics depending on price and quantity. But the concept of elasticity is in fact much broader. We will explore it in detail by calculating purchase probability elasticity, brand choice own-price elasticity, brand choice cross-price elasticity, and purchase quantity elasticity. We will employ linear regressions and logistic regressions, once again implemented through the sklearn library. We implement state-of-the-art research on the topic to make sure that you have an edge over your peers. While we focus on about 20 different models, you will have the chance to practice with more than 100 different variations of them, all providing you with additional insights!
5. Finally, we’ll leverage the power of deep learning to predict future behavior. Machine learning and artificial intelligence are at the forefront of the data science revolution – that’s why we could not help but include them in this course. We will take advantage of the TensorFlow 2.0 framework to create a feedforward neural network (also known as an artificial neural network). This is the part where we will build a black-box model, essentially helping us reach 90%+ accuracy in our predictions about the future behavior of our customers.
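To give a flavor of the segmentation workflow from part 2 – a minimal sketch with scikit-learn; the customer columns, the two retained components, and the choice of four clusters are all illustrative assumptions, not the course’s actual dataset or settings:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in customer data: [age, income, spending score]; real columns differ
customers = np.column_stack([
    rng.normal(38, 10, 200),          # age
    rng.normal(55_000, 15_000, 200),  # income
    rng.uniform(0, 100, 200),         # spending score
])

# 1) Standardize so income's large range doesn't dominate the distances
scaled = StandardScaler().fit_transform(customers)

# 2) Reduce dimensionality with PCA, keeping two components for plotting
scores = PCA(n_components=2).fit_transform(scaled)

# 3) Segment with K-means; four segments is an illustrative choice
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
segments = kmeans.fit_predict(scores)

print(np.bincount(segments))   # how many customers fall in each segment
```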
An Extraordinary Teaching Collective

We at 365 Careers have 550,000+ students here on Udemy and believe that the best education requires two key ingredients: a remarkable teaching collective and a practical approach. That’s why we ticked both boxes. Customer Analytics in Python was created by 3 instructors working closely together to provide the most beneficial learning experience. The course author, Nikolay Georgiev, is a Ph.D. who largely focused on marketing analytics during his academic career and later gained significant practical experience while working as a consultant on numerous world-class projects. He is therefore the perfect expert to help you build the bridge between theoretical knowledge and practical application. Elitsa and Iliya also played a key part in developing the course; all three instructors collaborated to provide the most valuable methods and approaches that customer analytics can offer. In addition, this course is as engaging as possible: high-quality animations, superb course materials, quiz questions, handouts, and course notes, as well as notebook files with comments, are just some of the perks you will get by enrolling.

Why do you need these skills?

1. Salary/Income – careers in the field of data science are some of the most popular in the corporate world today. All B2C businesses are realizing the advantages of working with the customer data at their disposal to understand and target their clients better.

2. Promotions – even if you are a proficient data scientist, the only way for you to grow professionally is to expand your knowledge. This course provides a very rare skill, applicable to many different industries.

3. Secure Future – the demand for people who understand numbers and data, and can interpret it, is growing exponentially. You’ve probably heard of the number of jobs that will be automated soon, right? Well, the marketing departments of companies are already being revolutionized by data science, and riding that wave is your gateway to a secure future.

Why wait? Every day is a missed opportunity. Click the “Buy Now” button and let’s start our customer analytics journey together!

Who this course is for:

Join now!